Minimum Viable Data Collection

The gist of the EU’s General Data Protection Regulation (GDPR) is simple: protect citizens from data-hungry corporations. It gives people control over how their personal data is collected, as it requires website-owners to stick to a set of rules when dealing with people’s data (I’m paraphrasing). In short: it gives rights to citizens and duties to (digital) businesses.

Most websites in Europe have taken action to comply with parts of their duties, by implementing cookie consent banners. In the least bad scenario, the response of such website-owners was: let’s find out which cookies we set and which ones we allow others to set. Then ask users for permission and set cookies if permitted. So, basically, business as usual, but with a banner for explanation.

Of course, cookies themselves are not the problem per se. The problem is the broader strategy, which hardly requires cookies: trying to get as much information from users as possible and combining everything you can find to create detailed profiles of website visitors. This is enabled by and helps enable surveillance capitalism, which threatens our societies and people in many ways.

What if website-owners used the logical opposite of that strategy? What if our industry championed… Minimum Viable Data Collection? This strategy would collect the minimum information from users. Maybe it would actively destroy and anonymise any personal data. You would find out what kind of tracking you do or enable (via third-parties) and reduce it. Ideally to zero. This strategy respects citizens’ rights much better.

Does this idealist blogger need more realism? Well, maybe. Let’s still explore this idea.

The current state of affairs

How do websites currently deal with their privacy-related obligations? Well, many ask permission to set cookies. Some use pointy formulations like ‘would you mind some cookies?’, completed with a picture of a cookie jar. Oatly, a company that makes vegan milk (yay!), is just one of many going down this route:

Screenshot: “Cookies go nicely with oat drinks. As it happens, the digital kind do too. So is it okay with you if we use cookies on this site?”

These are sometimes funny and original, but, also, they indicate these companies do not see online tracking as a serious problem. That’s understandable, it’s a complex problem, maybe we can’t expect marketeers to do their homework. But isn’t this too important to be turned into a joke?

Why do websites track?

The number of cookie banners in the wild may lead us to conclude: websites really need tracking to function. Yet they don’t. So why do websites track?

From the perspective of surveillance capitalist companies, the need for tracking is clear: their business is built upon it. Again, see Shoshana Zuboff’s The Age of Surveillance Capitalism.

What I’m interested in: why do website owners wilfully welcome bad trackers on their web properties? Surely, it would be easier to comply with data collection regulations by not collecting data at all? Minimum Viable Data Collection seems to be the most straightforward solution (Ockham’s razor!).

I think these are some of the common reasons why sites don’t focus on minimising data collection yet:

Unaware of the harm: Some organisations may not have read up on the consequences of using whatever part of their tools comes with naughty trackers. This is not a valid excuse, but it may explain a majority of cases.
User research: Often, website owners want to track users to inform their design decisions, through A/B tests (which aren’t trivial to get right) or other quantitative UX research on live websites.
Social media: A lot of websites embed content from third-parties, like videos from YouTube. Some also include buttons that allow social sharing, like Tweet This buttons. All of these come with tracking code more often than not. It’s unclear if people actually use them (but it seems some do).
Data collection for marketing: Some websites, especially those in the business of sales (like hotel or flight booking sites), collect user data so that they can make better (automated) decisions on offers and pricing.

Instead of maximising consent, minimise collection

The question all website owners should be asking, is: do we really, really need these trackers to exist on our pages? Can’t we do the things we want to do without trackers?

I think in many cases we can:

Unaware of the harm: More awareness is needed. As people who make websites and know how tracking works, we have to tell our clients and bosses.
User research: Maybe prioritise qualitative research? Or urge vendors of this software to design privacy-first?
Social media: Social sharing buttons could just be regular links (<a> elements) with information passed via URL parameters. Video embeds could be activated only when clicked.
Data collection for marketing: Maybe just don’t? If trust is good for business, dark patterns are not.
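
To make the social sharing idea a bit more concrete, here is a minimal sketch of a share button that is nothing more than a link. It assumes Twitter's web intent URL with its `url` and `text` parameters; the function name `buildTweetLink` and the example address are made up for illustration.

```typescript
// Hypothetical helper: builds a plain share link, so no third-party
// script (and no tracker) has to load on the page itself.
// Assumes the Twitter web intent endpoint and its `url`/`text` parameters.
function buildTweetLink(pageUrl: string, text: string): string {
  const query =
    "url=" + encodeURIComponent(pageUrl) +
    "&text=" + encodeURIComponent(text);
  return "https://twitter.com/intent/tweet?" + query;
}

// Example values are placeholders, not real pages.
const shareLink = buildTweetLink(
  "https://example.com/post",
  "Minimum Viable Data Collection"
);
```

The result can go straight into the `href` of an ordinary link. Nothing is sent to the third party until someone actually clicks it.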

The no tracking movement

Excitingly, some websites choose to avoid or minimise tracking already.

The mission statement

Privacy and accessibility expert Laura Kalbag describes why she chose not to have trackers on her site:

For me, “no tracking” is both a fact and a mission statement. I’m not a fan of tracking. (…) That means that you will not find any analytics, article limits or cookies on my site. You also won’t find any third-party scripts, content delivery networks or third-party fonts. I won’t let anyone else track you either.

(From I don’t track you)

Laura was inspired by designer and front-end engineer Karolina Szczur, whose personal website sports ‘No tracking’ in the footer. So cool, and very thoughtful.

Dutch public broadcasters go cookie-less

I was pleasantly surprised to see STER, the department that runs advertising for Dutch public broadcasters on tv, radio and internet, decided to go cookieless for all of their web advertising in 2020. Instead of profiling users, they profile their own content. STER classifies their content into 23 “contexts” (like “sport/fitness” and “cooking/food”). They then allow advertisers to choose in which context they want to promote their brand. Cool, they can be in the advertising business, offer contextual advertising spots, without the need for tracking all the things.
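
As a rough illustration of the idea (not STER's actual system, which I haven't seen), contextual ad selection could look something like the sketch below. Only the two context labels come from the description above; the types, function and advertiser names are all made up.

```typescript
// Hypothetical sketch of contextual ad selection.
// Only the two context labels are taken from the text; everything else is invented.
type AdContext = "sport/fitness" | "cooking/food";

interface Ad {
  advertiser: string;
  contexts: AdContext[]; // contexts the advertiser chose to appear in
}

// Picks ads using only the context of the page being viewed.
// No cookies, no user identifier, no profile.
function adsForPage(pageContext: AdContext, ads: Ad[]): Ad[] {
  return ads.filter((ad) => ad.contexts.indexOf(pageContext) !== -1);
}

const exampleAds: Ad[] = [
  { advertiser: "Running Shoes Co", contexts: ["sport/fitness"] },
  { advertiser: "Pan Maker", contexts: ["cooking/food"] },
];

const recipePageAds = adsForPage("cooking/food", exampleAds);
```

The point of the sketch: the only input is what the page is about, so there is nothing about the visitor to store or ask consent for.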

NRC: “our journalism is our product”

Dutch newspaper NRC also stopped using third party cookies, and takes a clear stance:

Our journalism is our product. You are not. So we do not sell your data. Never. To nobody. We do not collect data for collection’s sake. We only save data with a tangible goal, like dealing with your subscription.

From NRC privacy (translation mine)

I like the stance, but should note my tracker blocker still has to filter 5 known trackers out when I read the news (also goes for paying users).

New York Times: less targeting, more revenue

Last year, The New York Times swapped behavioural targeting for targeting based on location and context. This did not decrease their revenue. Jean-Christophe Demarta, Senior Vice President for global advertising at the paper: “We have not been impacted from a revenue standpoint, and, on the contrary, our digital advertising business continues to grow nicely.”

Digiday editor Jessica Davies:

The publisher’s reader-revenue business model means it fiercely guards its readers’ user experience. Rather than bombard readers with consent notices or risk a clunky consent user experience, it decided to drop behavioral advertising entirely.

(Emphasis mine; both quotes from: After GDPR, The New York Times cut off ad exchanges in Europe - and kept growing ad revenue)

Again, my tracker blocker found 3 trackers when I visited nytimes.com this morning.

For large scale websites, it seems minimising data collection is easier said than done. Currently, there may not be any large sites that are truly tracker-free (besides Wikipedia).

GitHub: “Pretty simple, really”

In 2020, code platform GitHub decided to get rid of their cookie banner, by getting rid of their cookies. CEO Nat Friedman writes:

we have removed all non-essential cookies from GitHub, and visiting our website does not send any information to third-party analytics services.

(from: No cookie for you)

This fits well within their values, Friedman explains, “developers should not have to sacrifice their privacy to collaborate on GitHub.” I’d still love them to stop working with ICE.

Wrapping up

It is a concept for mostly inspirational purposes, but really… I think minimum viable data collection can be a great strategy for businesses who want to comply with the duties of privacy regulations. The advantages:

you need to build less complex cookie consent mechanisms
users stay on your site as they are less frustrated
you can serve customers that use tracking protection (see below)
your company gets hundreds of internet points

To users who need to browse today’s web, I strongly recommend using a browser with strong tracking protection, like Firefox (has Enhanced Tracking Protection) or Safari (has Intelligent Tracking Prevention). For nerds who want to read more, check out the intelligent tracking posts on the WebKit blog, including Preventing Tracking Prevention Tracking, or the Firefox Tracking Protection section on MDN.

Update 17 December 2020: added GitHub example