Hello, World! And answers to your Noiszy questions

Hello, World!

On Friday, some amazing things happened.  First, my guest post on mathbabe.org (about Noiszy) went up.  I woke up to my phone buzzing with form submissions from Noiszy.com.  A friend sent an IM saying, "uhhh, you're number 2 on hacker news!"  Several threads about Noiszy appeared on Reddit.  We were mentioned in a front page article on LifeHacker (more on that later).  A conversation started on Twitter.

This is awesome, to say the least!  Noiszy has multiple goals, but one personal goal for Noiszy is for people to think and talk about data & algorithms and how they should be used responsibly.  That's happening, and there is so much to discuss.  I've also had a lot of feedback on Noiszy (the plugin) and I'd like to address the top few items before we get into the more philosophical bits.

"Can I see the source code?"

Yes!  Not literally today, but we're going to make this open-source in the immediate future.  I'll keep you updated on that.

"Can I get this for Firefox/IE/Opera/mobile/etc.?"

These are all important.  For Firefox, you can use Foxified to enable Chrome extensions, and then install Noiszy from the Chrome Web Store - this appears to work well, but we're working on a Firefox version.  Also, by going open-source, we should be able to develop support for additional browsers much more quickly.  Stay tuned for that by following @noiszytech on Twitter.

"Why doesn't noiszy just visit random sites? can i update the list of sites?"

Updating the list of sites is definitely high on the list of things to do.  A future post will delve into the depths of how personalization works and why it makes sense to choose a short-ish list of sites, but for now:

  1. To have any effect in the near term, Noiszy data has to be focused on a limited number of sites.  More to come on that topic.

  2. Yes, we can update/change the list of sites, and I'd love to hear from you about what sites you'd like to direct traffic to and why.  Please submit this form, or tweet to @noiszytech with the hashtag #NoiszySites.

"Are you actually helping these sites by sending traffic to them?"

No.  Sites love traffic from real people, because people can buy things and be influenced; bots just drain resources.  For sites that make money from ads, clicks drive revenue, not impressions.  People click ads; bots do not (or, when they do, they don't actually buy things, which is even worse).

Noiszy is not a bot - it's traffic from real people, using their real browsers - but its traffic isn't meaningful, so it's kind of bot-like.  Since it doesn't click ads, it's highly unlikely that anyone is making money from your Noiszy visits.  (As a side note, there are some programs where sites make money from impressions, but it's a very small amount relative to clicks.  Also, in those cases, if the ads don't get clicked, the sites are penalized, so in the long run they make less money from all impressions, Noiszy or not.)

Thank you, thank you, thank you to everyone who came to learn about Noiszy and the idea behind it, and especially to those who downloaded and used Noiszy and to everyone who contributed to the online conversation about it!  This is amazing and I'm so excited to keep the conversation going.  I'll be posting here and also on Twitter, so please follow @noiszytech for updates.

3 ways fake data can stop fake news

The phenomenon of fake news is an awesome (if horrifying) case study in gaming algorithms.
The idea behind fake news - using a compelling headline to get eyeballs on your content - is really the basic premise of marketing.  What makes fake news different is:

  1. The marketer doesn’t need the viewer to actually *do* anything beyond pay attention
  2. Traffic and attention (real or fake, it doesn’t matter) can be easily bought
  3. Algorithms use “people paying attention” as a way to choose what to surface, so there's a tipping point where content with a lot of attention - no matter whether the attention is real or fake - is re-promoted automatically, creating a feedback loop.
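To make the feedback loop in point 3 concrete, here's a toy model in Python.  It's a sketch under loud assumptions, not how any real platform ranks content: the function `simulate_feed` and its `boost` and `base` parameters are hypothetical, and "attention" is just a view count.  Each round, the item with the most attention gets surfaced and picks up extra views, so a modest amount of bought traffic can lock in the top slot.

```python
def simulate_feed(attention: dict[str, int], rounds: int,
                  boost: int = 50, base: int = 10) -> dict[str, int]:
    """Toy ranking loop (hypothetical): each round, the item with the most
    attention is surfaced and gains `boost` views; every other item gains
    only `base` views."""
    attention = dict(attention)  # copy so the caller's dict isn't modified
    for _ in range(rounds):
        top = max(attention, key=attention.get)  # the "trending" slot
        for item in attention:
            attention[item] += boost if item == top else base
    return attention


# Organic attention only: the real story leads and stays on top.
print(simulate_feed({"real story": 120, "fake story": 50}, rounds=5))
# {'real story': 370, 'fake story': 100}

# Same stories, but the fake one buys 100 extra views up front.
print(simulate_feed({"real story": 120, "fake story": 50 + 100}, rounds=5))
# {'real story': 170, 'fake story': 400}
```

The model is deliberately crude.  The point is only that "surface whatever already has attention" amplifies whatever got there first, whether that attention was real or bought.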

Noiszy creates “fake data” - meaningless digital noise - that’s designed to befuddle algorithms.  Here are three ways “fake data” can help stop fake news:

  1. Break recommendation engines.  On most news sites, the experience is personalized so you get more of whatever you click.  That "Keeping Up with the Kardashians" click means you'll be getting more celebrity gossip, whether you want it or not.  Those Kardashian links can be tempting, so when more are shown, the likelihood of clicking goes up, and then more are shown, more clicks, more shown...  Eventually it would be easy to think that the fall TV lineup was the biggest thing going on in the world.  That’s the “filter bubble” phenomenon at work.

    But if that fateful Kardashian click is just one of a much larger stream of clicks, it has far less significance.  Noiszy’s random clicks make news sites feed me more sports, stocks, health, travel, investment, and other types of news.  It’s a more balanced experience, and a way to break out of the filter bubble.
  2. Stop fake news outlets from making $$.  Fake news sites make money by hosting ads on their pages, and gaming algorithms to generate the traffic that leads to clicks.  One of the success measures is "clickthrough rate" (CTR) - basically, Clicks divided by Views.  You can also think of it as "the % of people who view that also click"*.  A higher CTR is generally better, and you'll have a higher CTR if you get more clicks from fewer viewers.

    When random pageviews flood the site, the number of viewers goes up but the number of clicks doesn't, so the CTR drops.  Google interprets that low CTR as a low “quality” indicator for the site, and devalues it.
  3. Make (some) targeting obsolete.  There are good things about targeting - if I really liked the handbag I saw Kim Kardashian (ahem) carrying, but couldn’t stomach the price, I’d be happy to be targeted when it went on sale.  But there are also so many bad things about targeting - ways our feelings and behaviors can be exploited by combining seemingly unrelated data.  Many of these methods rely on correlations: “users who saw A are X% more likely to do B,” or even better, “users who saw C, D, and E are Y% more likely to do F.”  C, D, and E could be unrelated things on the surface, but they may reveal something about an underlying personality type or belief system.

    Those correlations are built on large sets of clean data - but they could be polluted with “fake data”.  If more users with unrelated belief systems randomly saw C, D, and E, the “are Y% more likely to do F” part of the equation can’t be determined - because on the whole, those randomized users are *not* more likely to do F.  Algorithm.  Broken.
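Points 2 and 3 both come down to arithmetic about dilution.  Here's a small Python sketch using the simplified CTR formula above (clicks divided by views, which the footnote says isn't the real formula) and a simple "lift" ratio standing in for "Y% more likely."  All numbers are hypothetical, and the helper names `ctr`, `rate`, and `lift` are just for illustration.

```python
def ctr(clicks: int, views: int) -> float:
    """Clickthrough rate, simplified: clicks divided by views."""
    return clicks / views if views else 0.0


def rate(did_f: int, users: int) -> float:
    """Fraction of a group of users who went on to do F."""
    return did_f / users


def lift(group_rate: float, baseline_rate: float) -> float:
    """'Users who saw C, D, and E are Y% more likely to do F' as a ratio;
    1.0 means no more likely than everyone else."""
    return group_rate / baseline_rate


# CTR dilution: hypothetical site with 1,000 real views and 50 ad clicks.
print(ctr(50, 1000))         # 0.05
# 4,000 noise pageviews arrive; they add views but no clicks.
print(ctr(50, 1000 + 4000))  # 0.01

# Correlation dilution, hypothetical numbers.
baseline = 0.10                       # 10% of all users do F
real = rate(300, 1000)                # 30% of real C/D/E viewers do F
# 9,000 randomized users also saw C, D, E but do F at the baseline 10%.
mixed = rate(300 + 900, 1000 + 9000)
print(round(lift(real, baseline), 2))   # 3.0
print(round(lift(mixed, baseline), 2))  # 1.2
```

Quadrupling the views with no extra clicks cuts CTR from 5% to 1%.  Likewise, mixing in randomized users who "saw C, D, and E" but behave like everyone else shrinks the apparent 3x lift to 1.2x; with enough noise it approaches 1.0, where C, D, and E tell you nothing about F.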

Your mission, if you choose to accept it, is to click everything.  Even the things you don’t want to click - no, especially the things you don’t want to click.  But if you need a little help, there’s Noiszy.  The algorithms are listening - make some noise!

*Not the real formula, but it works for this explanation.

New technology protects internet privacy not by creating less data, but by creating more

Today, the House of Representatives approved a resolution making it legal for ISPs to collect and sell our browsing data without our consent, forever. Since it already passed in the Senate, it's on its way to the President's desk now. I've been thinking about what we can do to work against this loss of data privacy. I mean, I love collecting and analyzing data - it's what I do for a living - but not when users have no control over how, when, or whether it's done. 

I realized that instead of trying to hide the data I send, I should be sending lots of additional, meaningless data - creating a noisy cover that makes my true behavior hard to discern. And, if a lot of people did that, we could make entire datasets effectively unanalyzable. I think *that* is the way we can leverage the power of the data we create, and demand that it's used ethically.

So, I created Noiszy. Noiszy is a browser plugin that runs in the background and creates real-but-meaningless web data - digital "noise."  When you run the plugin, it sends your browser (I use it on a tab in the background) on a random trip around a short list of pre-approved news sites. It picks a site at random and then clicks links around that site for a while, then eventually starts again with another site from the pre-approved list. (We can, and will, add to the list.) Ideally these should be sites that you don't often visit. This meaningless data dilutes the significance of the "real" data by creating a campaign of misinformation.
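The Noiszy source isn't published yet, so this is not the plugin's code.  It's a minimal Python sketch of the trip described above, assuming a hypothetical `SITES` list of placeholder domains and a stubbed `get_links` function that stands in for reading links off a real page.  It also skips the pauses between clicks that a real browser session would have.

```python
import random

# Hypothetical placeholder list; the real plugin ships its own pre-approved news sites.
SITES = ["news-a.example", "news-b.example", "news-c.example"]


def get_links(page: str) -> list[str]:
    """Hypothetical stub standing in for reading links off a loaded page:
    pretend every page links to five articles on the same site."""
    site = page.split("/")[0]
    return [f"{site}/article-{i}" for i in range(5)]


def noise_trip(rng: random.Random, sites: list[str], max_clicks: int = 4) -> list[str]:
    """One trip: pick a site at random, then click 1..max_clicks random links on it."""
    page = rng.choice(sites)
    visits = [page]
    for _ in range(rng.randint(1, max_clicks)):
        page = rng.choice(get_links(page))  # stay on the same site
        visits.append(page)
    return visits


def run(trips: int, seed: int = 0) -> list[list[str]]:
    """Run several trips back to back, starting over with a new random site each time."""
    rng = random.Random(seed)
    return [noise_trip(rng, SITES) for _ in range(trips)]


for trip in run(3):
    print(" -> ".join(trip))
```

Seeding the random generator makes a run reproducible for testing; a real tool would want unpredictable choices.  Keeping each trip on one site for several clicks, rather than jumping to a new random site on every click, mirrors the "clicks links around that site for a while" behavior described above.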

Why should we care?

Organizations rely on large amounts of data to “target” users online and serve them relevant (more likely to be clicked) advertisements.  Plenty of targeting is innocuous and can be genuinely helpful.  Getting a sale offer on a product you recently viewed can be a win-win; the company makes a sale, and the customer is happy about the discount.  Targeting (and re-targeting) makes that possible.

However, when the pool of data gets larger and covers more areas, the implications are different.  For example: let's imagine that “Jane Internet” loves cats, and visits cats.com several times a day.  One day she's considering how to vote on a local building resolution, and she does some research by visiting two political news sites at opposite ends of the spectrum.  She reads one article on each site, getting a balanced view of the issue.  Let's imagine that one of these sites has access to this blended data and retargeting capabilities.  Soon, Jane starts to see advertisements related to the resolution, encouraging her to vote for the resolution because, the ad says, that vote will be best for the local wildlife.

Jane has no way of knowing this, but that message has been chosen specifically for her, because of her past visits to cats.com.  Without that awareness and context, Jane believes that the pro-wildlife message is one of the campaign's primary talking points, and is encouraged to vote in agreement.  The other side never has any opportunity to discuss or debate this point - and in fact, they don't even know that this animal-related topic has been raised, because they've never even been exposed to it.  Jane's attempt to be a well-informed voter has been usurped by retargeting.  And, perhaps most importantly, Jane doesn't even know this has happened.

Using Noiszy camouflages personal behavior online, making Jane less susceptible to remarketing and the “filter bubble.”  If different sites are visited more often than cats.com, the importance of those cats.com visits is diminished.  The algorithm can’t discern how important animals are to Jane, and can’t exploit that knowledge.  Instead, Jane sees more general information about the measure - or perhaps, no relevant advertising at all.  The playing field is leveled again.
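Here's the cats.com effect as back-of-the-envelope Python.  It assumes, purely for illustration, that an algorithm estimates Jane's interests from each site's share of her visits; real profiling is far more sophisticated, and the visit counts and `.example` domains are made up.

```python
from collections import Counter


def interest_shares(visits: Counter) -> dict[str, float]:
    """Share of all visits each site accounts for: a crude stand-in for
    'how much Jane cares' about that site's topic."""
    total = sum(visits.values())
    return {site: n / total for site, n in visits.items()}


# Hypothetical week of browsing for Jane.
jane = Counter({"cats.com": 30, "politics-left.example": 1, "politics-right.example": 1})

# Hypothetical noise: 20 visits to each of 10 unrelated news sites.
noise = Counter({f"news-{i}.example": 20 for i in range(10)})

before = interest_shares(jane)["cats.com"]
after = interest_shares(jane + noise)["cats.com"]
print(f"cats.com share before: {before:.1%}, after: {after:.1%}")
```

With these made-up numbers, cats.com goes from roughly 94% of Jane's visits to roughly 13%.  Jane's actual cat browsing hasn't changed at all; it's just much harder to pick out.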

Good for one, good for all

Noiszy is good for Jane Internet, but when we all use Noiszy, there is a much bigger impact: online data as a whole becomes less meaningful, and therefore less exploitable.  Companies and organizations lose the ability to "figure us out” from our data, and they will have to work harder to build products and content that people willingly engage with, rather than chasing clicks and impressions.  Could this be the end of “fake news”?

Ultimately, that's the shift I want to see.  (There's a lot more about my thoughts on this on the website, here, here, and here.)

Noiszy is free and will remain free.  I don't want to make money from this; I want to start a data revolution, and I'd love to hear your thoughts.

Today, the House of Representatives is voting on HJR86

The Verge's report (among many others) about House Joint Resolution 86 lays out a good description of what's at stake here:

"ISPs like Comcast, AT&T, and Charter will be free to sell your personal information to the highest bidder without your permission — and no one will be able to protect you. The Federal Trade Commission has no legal authority to oversee ISP practices, and the bill under consideration ensures that the FCC cannot adopt “substantially similar” rules. So unless the bill fails in the House, the nation’s strongest privacy protections will not only be eliminated, they cannot be revived by the FCC."

The EFF has a great page explaining how to take action to stop the House from approving this bill.

They've also posted a list of Five Creepy Things Your ISP Could Do if Congress Repeals the FCC’s Privacy Protections.

Fingers crossed.  This legislation is what drove me to create Noiszy.  While Noiszy can't stop the legislation, it CAN work to prevent others from profiting off of your browsing data.