New technology protects internet privacy not by creating less data, but by creating more

Today, the House of Representatives approved a resolution repealing the FCC privacy rules that would have required ISPs to get our consent before collecting and selling our browsing data - and barring similar rules from being issued in the future. Since it has already passed the Senate, it's on its way to the President's desk now. I've been thinking about what we can do to push back against this loss of data privacy. I mean, I love collecting and analyzing data - it's what I do for a living - but not when users have no control over how, when, or whether it's done.

I realized that instead of trying to hide the data I send, I should be sending lots of additional, meaningless data - creating a noisy cover that makes my true behavior hard to discern. And, if a lot of people did that, we could make entire datasets effectively unanalyzable. I think *that* is the way we can leverage the power of the data we create, and demand that it's used ethically.

So, I created Noiszy. Noiszy is a browser plugin that runs in the background and creates real-but-meaningless web data - digital "noise."  When you run the plugin, it sends your browser (I run it in a background tab) on a random trip around a short list of pre-approved news sites. It picks a site at random, clicks links around that site for a while, and then eventually starts again with another site from the pre-approved list. (We can, and will, add to the list.) Ideally these should be sites that you don't often visit. This meaningless data dilutes the significance of the "real" data - a small, personal campaign of misinformation.
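If you're curious how that random walk might look in code, here's a minimal sketch in TypeScript. This is not Noiszy's actual source: the site list, the timing constants, and the fetch-based "visit" are all placeholder assumptions for illustration.

```typescript
// Minimal sketch of the random-walk idea - not Noiszy's actual source.
// SITES, the timing values, and the fetch-based "visit" are illustrative assumptions.

const SITES: string[] = [
  "https://news-site-a.example",
  "https://news-site-b.example",
];

function pick<T>(items: T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// "Visit" a page and collect only the links that stay on the same site.
async function sameOriginLinks(url: string): Promise<string[]> {
  const html = await (await fetch(url)).text();
  const doc = new DOMParser().parseFromString(html, "text/html");
  const origin = new URL(url).origin;
  const links: string[] = [];
  for (const a of Array.from(doc.querySelectorAll("a[href]"))) {
    try {
      const href = new URL(a.getAttribute("href")!, url);
      if (href.origin === origin) links.push(href.toString());
    } catch {
      // skip malformed hrefs
    }
  }
  return links;
}

// Pick an approved site, wander its internal links for a while, then move on.
async function wander(clicksPerSite = 8): Promise<never> {
  while (true) {
    let url = pick(SITES);
    for (let i = 0; i < clicksPerSite; i++) {
      await sleep(5_000 + Math.random() * 25_000); // irregular, human-ish pacing
      const links = await sameOriginLinks(url);
      if (links.length === 0) break; // dead end: restart from a new site
      url = pick(links); // follow a random internal link
    }
  }
}
```

The real plugin drives an actual browser tab, so pages load their scripts, trackers, and ads the way a human visit would; the point here is just the selection logic - random site, random internal links, irregular timing.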

Why should we care?

Organizations rely on large amounts of data to “target” users online and serve them relevant (more likely to be clicked) advertisements.  Plenty of targeting is innocuous and can be genuinely helpful.  Getting a sale offer on a product you recently viewed can be a win-win; the company makes a sale, and the customer is happy about the discount.  Targeting (and re-targeting) makes that possible.

However, when the pool of data gets larger and covers more areas, the implications are different.  For example: let's imagine that “Jane Internet” loves cats, and visits cats.com several times a day.  One day she's considering how to vote on a local building resolution, and she does some research by visiting two political news sites at opposite ends of the spectrum.  She reads one article on each site, getting a balanced view of the issue.  Let's imagine that one of these sites has access to this blended data and retargeting capabilities.  Soon, Jane starts to see advertisements related to the resolution, encouraging her to vote for the resolution because, the ad says, that vote will be best for the local wildlife.

Jane has no way of knowing this, but that message has been chosen specifically for her because of her past visits to cats.com.  Without that awareness and context, Jane believes that the pro-wildlife message is one of the campaign's primary talking points, and is encouraged to vote in agreement.  The other side never gets a chance to discuss or debate this point - in fact, they don't even know the animal-related topic has been raised, because the ad was never shown to them.  Jane's attempt to be a well-informed voter has been usurped by retargeting.  And, perhaps most importantly, Jane doesn't even know it has happened.

Using Noiszy camouflages personal behavior online, making Jane less susceptible to remarketing and the “filter bubble.”  When noise visits to other sites outnumber her visits to cats.com, the weight of those cats.com visits shrinks.  The algorithm can’t discern how important animals are to Jane, and can’t exploit that knowledge.  Instead, Jane sees more general information about the resolution - or perhaps no relevant advertising at all.  The playing field is leveled again.
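To make that dilution concrete, here's a toy calculation (every category name and visit count is invented) showing how noise visits shrink any single category's share of a frequency-based interest profile:

```typescript
// Illustrative arithmetic only: how noise dilutes a frequency-based profile.
// All category names and visit counts are made up for this example.

type Profile = Record<string, number>; // category -> visit count

function topCategoryShare(profile: Profile): [string, number] {
  const total = Object.values(profile).reduce((sum, n) => sum + n, 0);
  const [category, count] = Object.entries(profile)
    .reduce((best, entry) => (entry[1] > best[1] ? entry : best));
  return [category, count / total];
}

// Jane's real browsing: a tracker sees an overwhelming "pets" signal.
const real: Profile = { pets: 40, news: 6, shopping: 4 };
console.log(topCategoryShare(real)); // ["pets", 0.8] - easy to target

// The same browsing plus random noise visits spread across other categories.
const withNoise: Profile = { pets: 40, news: 56, shopping: 44, sports: 50, travel: 60 };
console.log(topCategoryShare(withNoise)); // ["travel", 0.24] - "pets" no longer stands out
```

Once no single category dominates the profile, a targeting model has far less to latch onto.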

Good for one, good for all

Noiszy is good for Jane Internet, but when we all use Noiszy, there is a much bigger impact: online data as a whole becomes less meaningful, and therefore less exploitable.  Companies and organizations lose the ability to “figure us out” from our data.  Could this be the end of “fake news”?

Ultimately, I want organizations to work harder to build products and content that people genuinely engage with, rather than chasing clicks and impressions. (There's a lot more about my thoughts on this on the website, here, here, and here.)

Noiszy is free and will remain free.  I don't want to make money from this; I want to start a data revolution, and I'd love to hear your thoughts.