On this blog, I have repeatedly discussed online privacy, either by commenting on current affairs involving peer-to-peer networks like TorrentSpy or by talking about the future of privacy in light of the data openness we have been experiencing lately.
Yesterday, I visited one of my favorite annual exhibitions in Berlin, the transmediale, whose theme this time was “conspire”. Among other things, I attended a conference called “Web 3.0: Conspiring to keep the Net Public”, hoping to hear about the evolution of the upcoming semantic web. To my surprise, the talk concentrated mostly on the privacy aspects of the web. To be honest, the conference as a whole didn’t blow my mind (it was hard to follow), but Seda Gürses’s presentation was a pleasant exception.
She offered some very interesting insights on privacy in cyberspace, which I would like to discuss here.
So what is privacy?
In her presentation, Seda showed a mathematical formula for privacy, which says that:
privacy = the right to be left alone / concealment of data × k-anonymity.
This means that privacy consists of our fundamental need and right to be left alone, which can be achieved through concealment of data and k-anonymity. Let’s get a bit more specific about these terms.
Concealment of Data
Whenever you sign up for a site, there is always a registration form with asterisks next to the fields you must fill in (your email, your age, your zip code, etc.), and there is always a little box you must tick saying “I have read the Terms of Service and agree with the policy”. If the service is a commercial one, it may pass this information on to so-called ‘data miners’.
These are marketers who collect vast amounts of information and then plan marketing strategies around the patterns they find.
For example, they might find that 50% of the Facebook users who have installed the vampire application buy Dungeons & Dragons books on Amazon, and then place a D&D ad next to the vampire application.
Data mining versus privacy is an important issue that extends beyond the online world into politics.
But that doesn’t mean there is no solution. As Bruce Schneier noted:
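To make the kind of pattern data miners look for a bit more concrete, here is a minimal sketch of how a “50% of vampire-app users buy D&D books” figure could be computed. The user records, app names and product names are all hypothetical, made up for illustration; the number computed is what data-mining folks call the “confidence” of the rule “installed vampires → bought a D&D book”.

```python
# Hypothetical user records (made-up data): installed apps and purchases.
users = [
    {"apps": {"vampires"}, "purchases": {"dnd_handbook"}},
    {"apps": {"vampires", "quiz"}, "purchases": set()},
    {"apps": {"quiz"}, "purchases": {"dnd_handbook"}},
    {"apps": {"vampires"}, "purchases": {"dnd_handbook", "novel"}},
    {"apps": {"vampires"}, "purchases": {"novel"}},
]


def rule_confidence(records, app, product):
    """Share of users with `app` installed who also bought `product`.

    This is the confidence of the association rule app -> product.
    Returns 0.0 if nobody has the app installed.
    """
    with_app = [r for r in records if app in r["apps"]]
    if not with_app:
        return 0.0
    both = [r for r in with_app if product in r["purchases"]]
    return len(both) / len(with_app)


conf = rule_confidence(users, "vampires", "dnd_handbook")
print(f"{conf:.0%} of vampire-app users bought a D&D book")  # 50% ...
```

Note that nothing in this calculation needs to know *who* the users are, only which apps and purchases appear together, which is exactly the opening for the privacy-preserving approaches below.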
there are many ways to analyze data without knowing details of the data, [...] it’s just that there is little incentive to use them.
Concealment of data means that information such as name, age, location, etc. remains private. But how can this be achieved?
K-Anonymity
That’s where k-anonymity comes in handy. It can keep both data miners and privacy advocates satisfied. K-anonymity simply says that:
A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release.
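The definition above translates almost directly into a check: group the released records by the fields that could identify someone and make sure no group is smaller than k. Here is a minimal sketch, assuming you have already decided which columns count as identifying (in the literature these are usually called quasi-identifiers); the field names and rows are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records, i.e. each person is indistinguishable
    from at least k-1 others on those fields."""
    counts = Counter(
        tuple(r[attr] for attr in quasi_identifiers) for r in records
    )
    return all(c >= k for c in counts.values())


# Hypothetical, already-generalized release.
rows = [
    {"age": "20-30", "zip": "10xxx", "bought": "dnd_handbook"},
    {"age": "20-30", "zip": "10xxx", "bought": "novel"},
    {"age": "30-40", "zip": "12xxx", "bought": "dnd_handbook"},
    {"age": "30-40", "zip": "12xxx", "bought": "cookbook"},
]

print(is_k_anonymous(rows, ["age", "zip"], k=2))            # True
print(is_k_anonymous(rows, ["age", "zip", "bought"], k=2))  # False
```

The second call shows why the choice of identifying fields matters: once the purchase itself is treated as identifying, every row becomes unique.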
K-anonymity can be achieved by two methods:
- Generalization.
Instead of saying: this subject is 26 years old, you say it is 20-30 years old. Instead of saying he lives in 10247 Berlin, you say 10xxx Berlin. And so on.
- Perturbation. Another interesting way is perturbing the data. This means that:
The actual value can be replaced with a random value out of the standard distribution of values for that field. In this way, the overall distribution of values for that field will remain the same, but the individual data values will be wrong.
In other words, you can change the individual data in such a way that the collective data still remain the same.
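Both methods can be sketched in a few lines. Generalization below turns an exact age into a ten-year band and a German postal code into its first two digits plus “xxx”, as in the examples above. For perturbation, I use one simple variant, often called data swapping: randomly shuffling one field’s values across records. This assumes we read “the distribution of values for that field” as the values actually observed; shuffling keeps that collection exactly the same while most individual records end up with someone else’s value. The records and field names are hypothetical.

```python
import random
from collections import Counter


def generalize(record):
    """Coarsen identifying fields: exact age -> ten-year band,
    5-digit postal code -> first two digits plus 'xxx'."""
    low = (record["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 10}",
        "zip": record["zip"][:2] + "xxx",
        "bought": record["bought"],
    }


def perturb(records, field, rng):
    """Data swapping: randomly permute one field's values across records.

    The multiset of values (the field's overall distribution) is unchanged,
    but individual records generally carry someone else's value.
    """
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


people = [
    {"age": 26, "zip": "10247", "bought": "dnd_handbook"},
    {"age": 23, "zip": "10115", "bought": "novel"},
    {"age": 34, "zip": "12043", "bought": "dnd_handbook"},
    {"age": 38, "zip": "12049", "bought": "cookbook"},
]

# Generalization: each (age band, zip prefix) group now has 2 people.
generalized = [generalize(p) for p in people]
groups = Counter((r["age"], r["zip"]) for r in generalized)
print(min(groups.values()))  # 2

# Perturbation: ages are shuffled, but the set of ages is the same.
swapped = perturb(people, "age", random.Random(42))
print(sorted(r["age"] for r in swapped) == sorted(p["age"] for p in people))  # True
```

Averages or histograms of age computed on `swapped` match the original exactly, while any single row’s age can no longer be trusted, which is the trade-off the quote describes.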
The right to be left alone
I left this one for the end on purpose. We all take this right for granted, and in a way it is a given. But if you think about it, its boundaries are very flexible. Privacy is not only about concealing data, but also about negotiating what is private and what is not. Years ago, for instance, people debated whether domestic violence was a private matter or not.
The best question ever
Seda Gürses presented an interesting theory (with a cute video), which concluded with the best question ever. It is the theory of a Swedish scientist whose name she (sadly) couldn’t remember.
If we really want to stay private and anonymous, concealing our personal information is surely not enough. There are many other characteristics that distinguish us from one another.
True and absolute anonymity can only be achieved if:
- Everyone wore an identical box, as wide and tall as the widest and tallest person on earth, so that telling people apart by their external characteristics would be impossible.
- Everyone walked at the same pace, so that telling people apart by their gait would be impossible.
- Everyone left home at the same time, so that no one could be identified by their schedule.
- Each time someone went out, they took a different route, so that categorizing people by their movements would be impossible. And so on.
To avoid loneliness and isolation, people would also be allowed to have a pet.
So in a world of true anonymity, the only thing distinguishing one person from another would be their pet.
The question is: do we really want to live in a world of true anonymity?

Rick responded on 04 Feb 2008 at 4:57 pm
Interesting post, and I always enjoy being linked to. Some thoughts:
In politics, if there were no repercussions for supporting and voting for a particular candidate, my privacy as a voter would not matter to me.
If I got divorced, it would not matter to me if my employer knew, so long as it did not lead the company to make decisions that hurt me or negatively affect my work environment.
I think that much of privacy/anonymity is about the negative consequences such knowledge would create if it were widely available. That’s not to say it is the only aspect, but it is the most troublesome. There is also, I believe, a human need to have a space that is one’s own, and a lack of privacy destroys the ability to feel like it’s yours.
Such knowledge doesn’t always have to be true or bad to negatively affect you.
Paul M. Banas responded on 05 Feb 2008 at 4:33 pm
I have been trying to understand where lines should be drawn on privacy and the net, and I think your post is really getting at the core issue.
As someone who analyzes large sets of consumer data as part of my day job, I realize the value that this information provides to advertisers and marketers, who, like it or not, provide all the cash (or potential of cash) that makes things like Google, Facebook, and many other parts of the web experience possible.
While most marketers want to know consumer information in aggregate, we couldn’t care less about knowing consumer information on an individual basis. Which is where your point on anonymity comes in.
A good example is grocery store sales data. Every purchase you make in a grocery store is compiled and sent to data clearinghouses like AC Nielsen or IRI, where it is analyzed by marketing researchers. The data is cut at such a macro level that your individual purchasing habits are never even seen. Your data is therefore public, but still anonymous.
Sorry for the long-winded comment, but I think this is a very thought-provoking post.
PMB
robojiannis responded on 05 Feb 2008 at 7:00 pm
@ Rick
On the one hand, I understand what you are saying: if we stand by our decisions and ideas (politics, social life, etc.), why keep them private? But on the other hand, it is an open secret that such information is used for various purposes that may have nothing to do with our intentions. So I say keep it private, and that’s where k-anonymity seems to be really helpful.
@PMB
I’ve heard about this grocery store tactic. I still don’t agree with this macro-research, but I think keeping data anonymous is surely a positive step. I wonder, though, whether the data really remain anonymous.
Here we have grocery store loyalty cards. The consumer collects points when she shops, and after a certain number of points she gets a set of dishes or something.
I have the impression - correct me if I’m wrong - that all the information (what the consumer bought, when, etc.) is actually stored in this grocery card. Does that remain anonymous, or do marketers target specific customers with specific ads?
Rick responded on 05 Feb 2008 at 7:55 pm
robojiannis: I don’t really think about it that way. I am saying that if there is never going to be a negative consequence, there is little need for privacy. In reality, almost every situation can lead to negative consequences, even if hypothetical.
For example, many US citizens oppose gun registries because, if the government ever becomes completely power-hungry or the US gets invaded, whoever is in charge will have a ready-made list of people who might oppose them. Some feel the government has already abused such lists, and believe it used them in the aftermath of Hurricane Katrina to decide where to focus firearm confiscation efforts.
Paul M. Banas responded on 06 Feb 2008 at 5:00 am
@robojiannis
I wasn’t thinking about shopper card data (versus simply register sales), but we can use that. Participation in shopper card programs is voluntary. In return for sharing some personal data, you can get additional money saving incentives or even a set of dishes. And yes, you may get targeted ads sent to you. Or you can opt out and remain virtually invisible. In the end, I can’t think of any company that has any need or interest to know you as an individual.
The fact is that individual consumers are simply too “individual” for most companies to care about. It is truly the crowds that make meaning in datasets. Most data miners would probably agree with all your points about k-anonymity.
PMB
robojiannis responded on 07 Feb 2008 at 12:49 am
Rick, I agree that if there were no negative consequences privacy concerns would be unnecessary. Sadly it isn’t so.
Interesting points, Paul. I had the impression that individual consumers are now the companies’ target, since companies are starting to notice the impact of word-of-mouth marketing.