3 ways fake data can stop fake news

The phenomenon of fake news is an awesome (if horrifying) case study in gaming algorithms.
The idea behind fake news - using a compelling headline to get eyeballs on your content - is really the basic premise of marketing.  What makes fake news different is:

  1. The marketer doesn’t need the viewer to actually *do* anything beyond pay attention
  2. Traffic and attention (real or fake, it doesn’t matter) can be easily bought
  3. Algorithms use “people paying attention” as a signal for choosing what to surface, so there's a tipping point where content with a lot of attention - real or fake, it doesn't matter - gets re-promoted automatically, creating a feedback loop.
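
That feedback loop is easy to sketch as a toy simulation (all numbers made up - real ranking systems are far more complex):

```python
# Toy model of attention-based re-promotion: each round, the platform
# hands out new views in proportion to the attention a story already has.
# Bought (fake) attention is indistinguishable from real attention,
# so it compounds through every round.

def rerank(attention, rounds=5, new_views=1000):
    """Distribute each round's new views in proportion to current attention."""
    for _ in range(rounds):
        total = sum(attention.values())
        attention = {story: views + new_views * views / total
                     for story, views in attention.items()}
    return attention

organic = {"real_story": 300, "fake_story": 50}
boosted = {"real_story": 300, "fake_story": 50 + 500}  # 500 bought views

print(rerank(organic))  # the real story stays on top
print(rerank(boosted))  # the fake story out-ranks it, and the gap keeps growing
```

Once the bought views are in the pool, the algorithm amplifies them just like real ones - that's the tipping point.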

Noiszy creates “fake data” - meaningless digital noise - that’s designed to befuddle algorithms.  Here are three ways “fake data” can help stop fake news:

  1. Break recommendation engines.  On most news sites, the experience is personalized so you get more of whatever you click.  That "Keeping up with the Kardashians" click means you'll be getting more celebrity gossip, whether you want it or not.  Those Kardashian links can be tempting, so when more are shown, the likelihood of clicking goes up - and then more are shown, more clicks, more shown...  Eventually it would be easy to think that the fall TV lineup was the biggest thing going on in the world.  That’s the “filter bubble” phenomenon at work.

    But if that fateful Kardashian click is just one of a much larger stream of clicks, it has far less significance.  Noiszy’s random clicks make news sites feed me more sports, stocks, health, travel, investment, and other types of news.  It’s a more balanced experience, and a way to break out of the filter bubble.
  2. Stop fake news outlets from making $$.  Fake news sites make money by hosting ads on their pages, and gaming algorithms to generate the traffic that leads to clicks.  One of the success measures is "clickthrough rate" (CTR) - basically, Clicks divided by Views.  You can also think of it as "the percentage of viewers who also click"*.  A higher CTR is generally better, and you'll have a higher CTR if you get more clicks from fewer viewers.

    When random pageviews flood the site, the number of viewers increases without increasing the number of clicks - resulting in a lower CTR.  Google interprets that low CTR as a low “quality” indicator for the site, and devalues it.
  3. Make (some) targeting obsolete.  There are good things about targeting - if I really liked the handbag I saw Kim Kardashian (ahem) carrying, but couldn’t stomach the price, I’d be happy to be targeted when it went on sale.  But there are also plenty of bad things about targeting - ways our feelings and behaviors can be exploited by combining seemingly unrelated data.  Many of these methods rely on correlations: “users who saw A are X% more likely to do B,” or even better, “users who saw C, D, and E are Y% more likely to do F.”  C, D, and E could be unrelated things on the surface, but they may reveal something about an underlying personality type or belief system.

    Those correlations are built on large sets of clean data - but they could be polluted with “fake data”.  If more users with unrelated belief systems randomly saw C, D, and E, the “are Y% more likely to do F” part of the equation can’t be determined - because on the whole, those randomized users are *not* more likely to do F.  Algorithm.  Broken.
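
The CTR dilution from point 2 is just arithmetic - here's a sketch with made-up numbers:

```python
# Clickthrough rate, using the simplified formula from above:
# clicks divided by views.

def ctr(clicks, views):
    """The percentage of viewers who also click, as a fraction."""
    return clicks / views

clicks, real_views = 50, 1000
print(ctr(clicks, real_views))  # 0.05 - 5% of viewers click

noise_views = 4000  # random, never-clicking pageviews flood the site
print(ctr(clicks, real_views + noise_views))  # 0.01 - CTR collapses to 1%
```

The clicks never changed - only the denominator did.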
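
And the correlation-breaking from point 3 can be simulated with toy data (the probabilities and user counts here are invented for illustration):

```python
import random

random.seed(0)  # make the toy simulation repeatable

def saw_all(user, pages):
    return all(p in user["saw"] for p in pages)

def lift(users, pages=("C", "D", "E")):
    """How much more likely users who saw all of C, D, E are to do F,
    compared to everyone else.  Lift near 1 means no signal."""
    exposed = [u["did_f"] for u in users if saw_all(u, pages)]
    rest = [u["did_f"] for u in users if not saw_all(u, pages)]
    return (sum(exposed) / len(exposed)) / (sum(rest) / len(rest))

# Clean data: a personality type that sees C, D, and E and often does F.
clean = (
    [{"saw": {"C", "D", "E"}, "did_f": random.random() < 0.6} for _ in range(500)]
    + [{"saw": {"A", "B"}, "did_f": random.random() < 0.2} for _ in range(500)]
)

# Fake data: random users whose noise visits also hit C, D, and E,
# but whose behavior is unrelated (just the 20% baseline chance of F).
noise = [{"saw": {"C", "D", "E"}, "did_f": random.random() < 0.2}
         for _ in range(2000)]

print(lift(clean))          # well above 1: the correlation looks real
print(lift(clean + noise))  # pulled back toward 1: the signal drowns
```

With enough random visitors mixed in, "saw C, D, and E" no longer predicts anything.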

Your mission, if you choose to accept it, is to click everything.  Even the things you don’t want to click - no, especially the things you don’t want to click.  But if you need a little help, there’s Noiszy.  The algorithms are listening - make some noise!

*Not the real formula, but it works for this explanation.