So far this book has explored generative successes and the problems they cause at the technical and content layers of the Internet. This chapter takes up a case study of a problem at the social layer: privacy. Privacy showcases issues that can worry individuals who are not concerned about some of the other problems discussed in this book, like copyright infringement, and it demonstrates how generativity puts old problems into new and perhaps unexpected configurations, calling for creative solutions. Once again, we test the notion that solutions that might solve the generative problems at one layer—solutions that go light on law, and instead depend on the cooperative use of code to cultivate and express norms—might also work at another.
The heart of the next-generation privacy problem arises from the similar but uncoordinated actions of individuals that can be combined in new ways thanks to the generative Net. Indeed, the Net enables individuals in many cases to compromise privacy more thoroughly than the government and commercial institutions traditionally targeted for scrutiny and regulation. The standard approaches that have been developed to analyze and limit institutional actors do not work well for this new breed of problem, which goes far beyond the compromise of sensitive information.
In 1973, a blue-ribbon panel reported to the U.S. Secretary of Health, Education, and Welfare (HEW) on computers and privacy. The report could have been written today:
It is no wonder that people have come to distrust computer-based record-keeping operations. Even in non-governmental settings, an individual’s control over the personal information that he gives to an organization, or that an organization obtains about him, is lessening as the relationship between the giver and receiver of personal data grows more attenuated, impersonal, and diffused. There was a time when information about an individual tended to be elicited in face-to-face contacts involving personal trust and a certain symmetry, or balance, between giver and receiver. Nowadays an individual must increasingly give information about himself to large and relatively faceless institutions, for handling and use by strangers—unknown, unseen and, all too frequently, unresponsive. Sometimes the individual does not even know that an organization maintains a record about him. Often he may not see it, much less contest its accuracy, control its dissemination, or challenge its use by others.1
The report pinpointed troubles arising not simply from powerful computing technology that could be used both for good and ill, but also from its impersonal quality: the sterile computer processed one’s warm, three-dimensional life into data handled and maintained by faraway faceless institutions, viewed at will by strangers. The worries of that era are not obsolete. We are still concerned about databases with too much information that are too readily accessed; databases with inaccurate information; and having the data from databases built for reasonable purposes diverted to less noble if not outright immoral uses.2
Government databases remain of particular concern, because of the unique strength and power of the state to amass information and use it for life-altering purposes. The day-to-day workings of the government rely on numerous databases, including those used for the calculation and provision of government benefits, decisions about law enforcement, and inclusion in various licensing regimes.3 Private institutional databases also continue to raise privacy issues, particularly in the realms of consumer credit reporting, health records, and financial data.
Due to political momentum generated by the HEW report and the growing controversy over President Richard Nixon’s use of government power to investigate political enemies, the U.S. Congress enacted comprehensive privacy legislation shortly after the report’s release. The Privacy Act of 1974 mandated a set of fair information practices, including disclosure of private information only with an individual’s consent (with exceptions for law enforcement, archiving, and routine uses), and established the right of the subject to know what was recorded about her and to offer corrections. While it was originally intended to apply to a broad range of public and private databases to parallel the HEW report, the Act was amended before passage to apply only to government agencies’ records.4 Congress never enacted a comparable comprehensive regulatory scheme for private databases. Instead, private databases are regulated only in narrow areas of sensitivity such as credit reports (addressed by a complex scheme passed in 1970 affecting the handful of credit reporting agencies)5 and video rental data,6 which has been protected since Supreme Court nominee Robert Bork’s video rental history was leaked to a newspaper during his confirmation process in 1987.7
The HEW report expresses a basic template for dealing with the informational privacy problem: first, a sensitivity is identified at some stage of the information production process—the gathering, storage, or dissemination of one’s private information—and then a legal regime is proposed to restrict these activities to legitimate ends. This template has informed analysis for the past thirty years, guiding battles over privacy both between individuals and government and between individuals and “large and faceless” corporations. Of course, a functional theory does not necessarily translate into successful practice. Pressures to gather and use personal data in commerce and law enforcement have increased, and technological tools to facilitate such data processing have matured without correspondingly aggressive privacy protections.8 (Consider Chapter Five’s description of the novel uses of tethered appliances to conduct surveillance.) In 1999, Scott McNealy, CEO of Sun Microsystems, was asked whether a new Sun technology to link consumer devices had any built-in privacy protection. “You have zero privacy anyway,” he replied. “Get over it.”9
McNealy’s words raised some ire at the time; one privacy advocate called them “a declaration of war.”10 McNealy has since indicated that he believes his answer was misunderstood.11 But the plain meaning of “getting over it” seems to have been heeded: while poll after poll indicates that the public is concerned about privacy,12 the public’s actions frequently belie these claims. Apart from momentary spikes in privacy concerns that typically arise in the wake of high-profile scandals—such as Watergate or the disclosure of Judge Bork’s video rentals—we routinely part with personal information and at least passively consent to its use, whether by surfing the Internet, entering sweepstakes, or using a supermarket discount card.
Current scholarly work on privacy tries to reconcile people’s nonchalant behavior with their seemingly heartfelt concerns about privacy. It sometimes calls for industry self-regulation rather than direct governmental regulation as a way to vindicate privacy interests, perhaps because such regulation is seen as more efficient or just, or because direct governmental intervention is understood to be politically difficult to achieve. Privacy scholarship also looks to the latest advances in specific technologies that could further weaken day-to-day informational privacy.13 One example is the increasing use of radio frequency identifiers (RFIDs) in consumer items, allowing goods to be scanned and tracked at a short distance. One promise of RFID is that a shopper could wheel her shopping cart under an arch at a grocery store and obtain an immediate tally of the price of its contents; one peril is that a stranger could drive by a house with an RFID scanner and instantly inventory its contents, from diapers to bacon to flat-screen TVs, immediately discerning the sort of people who live within.
This work on privacy generally hews to the original analytic template of 1973: both the analysis and suggested solutions talk in terms of institutions gathering data, and of developing ways to pressure institutions to better respect their customers’ and clients’ privacy. This approach is evident in discussions about electronic commerce on the Internet. Privacy advocates and scholars have sought ways to ensure that Web sites disclose to people what they are learning about consumers as they browse and buy. The notion of “privacy policies” has arisen from this debate. Through a combination of regulatory suasion and industry best practices, such policies are now found on many Web sites, comprising little-read boilerplate answering questions about what information a Web site gathers about a user and what it does with the information. Frequently the answers are, respectively, “as much as it can” and “whatever it wants”—but, to some, this is progress. It allows scholars and companies alike to say that the user has been put on notice of privacy practices.
Personal information security is another area of inquiry, and there have been some valuable policy innovations in this sphere. For example, a 2003 California law requires firms that unintentionally expose their customers’ private data to others to alert the customers to the security breach.14 This has led to a rash of well-known banks sending bashful letters to millions of their customers, gently telling them that, say, a package containing tapes with their credit card and social security numbers has been lost en route from one processing center to another.15 Bank of America lost such a backup tape with 1.2 million customer records in 2005.16 That same year, a MasterCard International security breach exposed information of more than 40 million credit card holders.17 Boston College lost 120,000 alumni records to hackers as a result of a breach.18 The number of incidents shows little sign of decreasing,19 despite the incentives provided by the embarrassment of disclosure and the existence of obvious ways to improve security practices. For minimal cost, firms could minimize some types of privacy risks to consumers—for example, by encrypting their backup tapes before shipping them anywhere, making them worthless to anyone without a closely held digital key.
Addressing Web site privacy and security has led to elaborations on the traditional informational privacy framework. Some particularly fascinating issues in this framework are still unfolding: is it fair, for example, for an online retailer like Amazon to record the average number of nanoseconds each user spends contemplating an item before clicking to buy it? Such data could be used by Amazon to charge impulse buyers more, capitalizing on the likelihood that this group of consumers does not pause long enough to absorb the listed price of the item they just bought. A brief experiment by Amazon in differential pricing resulted in bad publicity and a hasty retreat as some buyers noticed that they could save as much as $10 on a DVD by deleting browser cookies that indicated to Amazon that they had visited the site before.20 As this example suggests, forthrightly charging one price to one person and another price to someone else can generate resistance. Offering individualized discounts, however, can amount to the same thing for the vendor while appearing much more palatable to the buyer. Who would complain about receiving a coupon for $10 off the listed price of an item, even if the coupon were not transferable to any other Amazon user? (The answer may be “someone who did not get the coupon,” but to most people the second scenario is less troubling than the one in which different prices were charged from the start.)21
If data mining could facilitate price discrimination for Amazon or other online retailers, it could operate in the tangible world as well. As a shopper uses a loyal-customer card, certain discounts are offered at the register personalized to that customer. Soon, the price of a loaf of bread at the store becomes indeterminate: there is a sticker price, but when the shopper takes the bread up front, the store can announce a special individualized discount based on her relationship with the store. The sticker price then becomes only that, providing little indication of the price that shoppers are actually paying. Merchants can also vary service. Customer cards augmented with RFID tags can serve to identify those undesirable customers who visit a home improvement store, monopolize the attention of the attendants, and exit without having bought so much as a single nail. With these kinds of cards, the store would be able to discern the “good” (profitable) customers from the “bad” (not profitable) ones and appropriately alert the staff to flee from bad customers and approach good ones.
While privacy issues associated with government and corporate databases remain important, they are increasingly dwarfed by threats to privacy that do not fit the standard analytical template for addressing privacy threats. These new threats fit the generative pattern also found in the technical layers for Internet and PC security, and in the content layer for ventures such as Wikipedia. The emerging threats to privacy serve as an example of generativity’s downsides on the social layer, where contributions from remote amateurs can enable vulnerability and abuse that calls for intervention. Ideally such intervention would not unduly dampen the underlying generativity. Effective solutions for the problems of Privacy 2.0 may have more in common with solutions to other generative problems than with the remedies associated with the decades-old analytic template for Privacy 1.0.
The Era of Cheap Sensors
We can identify three successive shifts in technology from the early 1970s: cheap processors, cheap networks, and cheap sensors.22 The third shift has, with the help of the first two, opened the doors to new and formidable privacy invasions.
The first shift was cheap processors. Moore’s Law tells us that processing power doubles every eighteen months or so.23 A corollary is that existing processing power gets cheaper. The cheap processors available since the 1970s have allowed Bill Gates’s vision of a “computer on every desk” to move forward. Cheap processors also underlie information appliances: thanks to Moore’s Law, there is now sophisticated microprocessor circuitry in cars, coffeemakers, and singing greeting cards.
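The corollary described above is simple compounding arithmetic. As an illustration only, assuming the popular eighteen-month doubling period (the function name and time spans below are invented for this sketch, not drawn from the text):

```python
# Illustrative arithmetic only: if processing power doubles every 18 months,
# a fixed amount of computing falls to half its price over the same period.

def relative_cost(years: float, doubling_period_years: float = 1.5) -> float:
    """Cost of a fixed amount of computing, as a fraction of its cost today."""
    doublings = years / doubling_period_years
    return 0.5 ** doublings

# Over the three decades since the early 1970s, the compounding is dramatic:
for years in (1.5, 15, 30):
    print(f"after {years:>4} years: {relative_cost(years):.2e} of original cost")
```

Thirty years is twenty doublings, leaving a given unit of processing power at roughly a millionth of its original cost, which is why microprocessors can sit in coffeemakers and greeting cards.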
Cheap networks soon followed. The pay-per-minute proprietary dial-up networks gave way to an Internet of increasing bandwidth and dropping price. The all-you-can-eat models of measurement meant that, once established, idle network connections were no cheaper than well-used ones, and a Web page in New York cost no more to access from London than one in Paris. Lacking gatekeepers, these inexpensive processors and networks have been fertile soil for whimsical invention to take place and become mainstream. This generativity has occurred in part because the ancillary costs to experiment—both for software authors and software users—have been so low.
The most recent technological shift has been the availability of cheap sensors. Sensors that are small, accurate, and inexpensive are now found in cameras, microphones, scanners, and global positioning systems. These characteristics have made sensors much easier to deploy—and then network—in places where previously it would have been impractical to have them.
The proliferation of cheap surveillance cameras has empowered the central authorities found within the traditional privacy equation. A 2002 working paper estimated that the British government had spent several hundred million dollars on closed-circuit television systems, with many networked to central law enforcement stations for monitoring.24 Such advances, and the analysis that follows them, fit the template of Privacy 1.0: governments have access to more information thanks to more widely deployed monitoring technologies, and rules and practices are suggested to prevent whatever our notions might be of abuse.25 To see how cheap processors, networks, and sensors create an entirely new form of the problem, we must look to the excitement surrounding the participatory technologies suggested by one meaning of “Web 2.0.” In academic circles, this meaning of Web 2.0 has become known as “peer production.”
The Dynamics of Peer Production
The aggregation of small contributions of individual work can make once-difficult tasks seem easy. For example, Yochai Benkler has approvingly described the National Aeronautics and Space Administration’s (NASA’s) use of public volunteers, or “clickworkers.”26 NASA had a tedious job involving pictures of craters from the moon and Mars. These were standard bitmap images, and NASA wanted the craters to be vectorized: in other words, it wanted people to draw circles around the circles they saw in the photos. Writing some custom software and deploying it online, NASA asked Internet users at large to undertake the task. Much to NASA’s pleasant surprise, the clickworkers accomplished in a week what a single graduate student would have needed a year to complete.27 Cheap networks and PCs, coupled with the generative ability to costlessly offer new code for others to run, meant that those who wanted to pitch in to help NASA could do so.
The near-costless aggregation of far-flung work can be applied in contexts other than the drawing of circles around craters—or the production of a free encyclopedia like Wikipedia. Computer scientist Luis von Ahn, after noting that over nine billion person-hours were spent playing Windows Solitaire in a single year, devised the online “ESP” game, in which two remote players are randomly paired and shown an image. They are asked to guess the word that best describes the image, and when they each guess the same word they win points.28 Their actions also provide input to a database that reliably labels images for use in graphical search engines—improving the ability of image search engines to identify images. In real time, then, people are building and participating in a collective, organic, worldwide computer to perform tasks that real computers cannot easily do themselves.29
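The ESP game's core mechanic, agreement between strangers, can be sketched in a few lines. This is a hedged illustration of the idea described above, not von Ahn's actual implementation; the function and data-structure names are invented:

```python
# A minimal sketch of the ESP game's agreement mechanic: two randomly paired
# players submit guesses for the same image, and a label is recorded only when
# both independently produce the same word. All names here are hypothetical.
from collections import defaultdict

# Accumulated labels: image id -> {label: number of agreeing player pairs}
labels = defaultdict(lambda: defaultdict(int))

def play_round(image_id, guesses_a, guesses_b):
    """Return the first word both players guessed, recording it as a label."""
    seen_b = {g.lower() for g in guesses_b}
    for guess in guesses_a:
        word = guess.lower()
        if word in seen_b:               # agreement between strangers
            labels[image_id][word] += 1  # one more vote for this label
            return word
    return None                          # no match: no label recorded

match = play_round("img42", ["dog", "puppy"], ["animal", "puppy"])
print(match)                  # puppy
print(dict(labels["img42"]))  # {'puppy': 1}
```

Because the two players cannot communicate, a matching guess is strong evidence that the word genuinely describes the image, which is what makes the aggregated database useful to an image search engine.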
These kinds of grid applications produce (or at least encourage) certain kinds of public activity by combining small, individual private actions. Benkler calls this phenomenon “coordinate coexistence producing information.”30 Benkler points out that the same idea helps us find what we are looking for on the Internet, even if we do not go out of our way to play the ESP game; search engines commonly aggregate the artifacts of individual Internet activity, such as webmasters’ choices about where to link, to produce relevant search results. Search engines also track which links are most often clicked in ordered search results, and then feature those links more prominently in future searches.31 The value of this human-derived wisdom has been noted by spammers, who create “link farms” of fake Web sites containing fragments of text drawn at random from elsewhere on the Web (“word salad”) that link back to the spammers’ sites in an attempt to boost their search engine rankings. The most useful links are ones placed on genuinely popular Web sites, though, and the piles of word salad do not qualify.
As a result, spammers have turned to leaving comments on popular blogs that ignore the original entry to which they are attached and instead simply provide links back to their own Web sites. In response, the authors of blogging software have incorporated so-called captcha boxes that must be navigated before anyone can leave a comment on a blog. Captchas—now used on many mainstream Web sites including Ticketmaster.com—ask users to prove that they are human by typing in, say, a distorted nonsense word displayed in a small graphic.32 Computers can start with a word and make a distorted image in a heartbeat, but they cannot easily reverse engineer the distorted image back to the word. This need for human intervention was intended to force spammers to abandon automated robots to place their blog comment spam. For a while they did, reportedly setting up captcha sweatshops that paid people to solve captchas from blog comment prompts all day long.33 (In 2003, the going rate was $2.50/hour for such work.)34 But spammers have continued to explore more efficient solutions. A spammer can write a program to fill in all the information but the captcha, and when it gets to the captcha it places it in front of a real person trying to get to a piece of information—say on a page a user might get after clicking a link that says, “You’ve just won $1000! Click here!”35—or perhaps a pornographic photo.36 The captcha had been copied that instant from a blog where a spammer’s robot was waiting to leave a comment, and then pasted into the prompt for the human wanting to see the next page. 
The human’s answer to the captcha was then instantly ported back over to the blog site in order to solve the captcha and leave the spammed comment.37 Predictably, companies have also sprung up to meet this demand, providing custom software to thwart captchas on a contract basis of $100 to $5,000 per project.38 Generative indeed: the ability to remix different pieces of the Web, and to deploy new code without gatekeepers, is crucial to the spammers’ work. Other uses of captchas are more benign but equally subtle: a project called reCAPTCHA provides an open API to substitute for regular captchas where a Web site might want to test to see if it is a human visiting.39 reCAPTCHA creates an image that pairs a standard, automatically generated test word image with an image of a word from an old book that a computer has been unable to properly scan and translate. When the user solves the captcha by entering both words, the first word is used to validate that the user is indeed human, and the second is used to put the human’s computing power to work to identify one more word of one more book that otherwise would be unscannable.
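The two-word pairing that reCAPTCHA uses can be sketched as follows. This is a simplified illustration of the logic described above, with hypothetical function names and a made-up agreement threshold; it is not the actual reCAPTCHA API:

```python
# A sketch of reCAPTCHA's pairing logic: one "control" word with a known
# answer validates that the user is human, while the second, unscannable
# word from an old book collects a transcription vote. Names hypothetical.
from collections import Counter

# Votes for each unknown word, keyed by a scan fragment id.
transcription_votes = {}

def check_response(control_word, user_control, scan_id, user_scan):
    """Validate the user against the known word; if human, record their
    reading of the unknown word as one vote toward a transcription."""
    if user_control.strip().lower() != control_word.lower():
        return False  # failed the known word: likely a bot, discard the vote
    votes = transcription_votes.setdefault(scan_id, Counter())
    votes[user_scan.strip().lower()] += 1
    return True

def agreed_transcription(scan_id, threshold=3):
    """Accept a transcription once enough independent humans agree."""
    votes = transcription_votes.get(scan_id, Counter())
    word, count = (votes.most_common(1) or [(None, 0)])[0]
    return word if count >= threshold else None

# Four humans pass the control word; three read the scan the same way.
for reading in ("morrow", "morrow", "marrow", "morrow"):
    check_response("upon", "upon", "scan-17", reading)
print(agreed_transcription("scan-17"))  # morrow
```

The control word does the gatekeeping; agreement among several validated humans does the digitizing. Each solved captcha thus yields both a spam check and a fragment of a book that a computer could not read on its own.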
What do captchas have to do with privacy? New generative uses of the Internet have made the solutions proposed for Privacy 1.0 largely inapplicable. Fears about “mass dataveillance”40 are not misplaced, but they recognize only part of the problem, and one that represents an increasingly smaller slice of the pie. Solutions such as disclosure41 or encryption42 still work for Privacy 1.0, but new approaches are needed to meet the challenge of Privacy 2.0, in which sensitive data is collected and exchanged peer-to-peer in configurations as unusual as that of the spammers’ system for bypassing captchas.
The power of centralized databases feared in 1973 is now being replicated and amplified through generative uses of individual data and activity. For example, cheap sensors have allowed various gunshot-detecting technologies to operate through microphones in public spaces.43 If a shot is fired, sensors associated with the microphones triangulate the shot’s location and summon the police. To avoid false alarms, the system can be augmented with help from the public at large, minimizing the need for understaffed police to make the initial assessment about what is going on when a suspicious sound is heard. Interested citizens can review camera feeds near a reported shot and press a button if they see something strange happening on their computer monitors. Should a citizen do so, other citizens can be asked for verification. If the answer is yes, the police can be sent.
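The triangulation step works from time differences of arrival (TDOA): a shot reaches each microphone at a slightly different moment, and the source is the point whose predicted differences best match the measured ones. The sketch below is a deliberately simplified illustration, using a coarse grid search instead of the closed-form solvers real systems employ, with invented sensor positions and noiseless timings:

```python
# A simplified sketch of how networked microphones can locate a gunshot:
# each sensor records an arrival time, and the source is the point whose
# predicted time differences of arrival (TDOA) best match the observed ones.
# Real deployments solve this analytically; a grid search keeps the idea
# visible. The sensor layout and timings here are invented for illustration.
import math

SPEED_OF_SOUND = 343.0  # meters per second, at roughly 20 degrees C

def locate(sensors, arrival_times, area=200, step=1.0):
    """Grid-search the (x, y) whose TDOAs best match the measurements."""
    ref_t = arrival_times[0]
    best, best_err = None, float("inf")
    x = 0.0
    while x <= area:
        y = 0.0
        while y <= area:
            d0 = math.dist((x, y), sensors[0])
            err = 0.0
            for (sx, sy), t in zip(sensors[1:], arrival_times[1:]):
                predicted = (math.dist((x, y), (sx, sy)) - d0) / SPEED_OF_SOUND
                err += (predicted - (t - ref_t)) ** 2  # squared TDOA residual
            if err < best_err:
                best, best_err = (x, y), err
            y += step
        x += step
    return best

# Simulate a shot at (60, 80) heard by four microphones at known positions.
sensors = [(0, 0), (200, 0), (0, 200), (200, 200)]
shot = (60, 80)
times = [math.dist(shot, s) / SPEED_OF_SOUND for s in sensors]
print(locate(sensors, times))  # (60.0, 80.0)
```

Once the sensors have produced a candidate location automatically, the human steps described above, citizens reviewing nearby camera feeds and confirming one another's reports, take over from the machines.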
In November of 2006, the state of Texas spent $210,000 to set up eight webcams along the Mexico border as part of a pilot program to solicit the public’s help in reducing illegal immigration.44 Webcam feeds were sent to a public Web site, and people were invited to alert the police if they thought they saw suspicious activity. During the month-long trial the Web site took in just under twenty-eight million hits. No doubt many were from the curious rather than the helpful, but those wanting to volunteer came forward, too. The site registered over 220,000 users, and those users sent 13,000 e-mails to report suspicious activity. At three o’clock in the morning one woman at her PC saw someone signal a pickup truck on the webcam. She alerted police, who seized over four hundred pounds of marijuana from the truck’s occupants after a high-speed chase. In separate incidents, a stolen car was recovered, and twelve undocumented immigrants were stopped. To some—especially state officials—this was a success beyond any expectation;45 to others it was a paltry result for so much investment.46
Beyond any first-order success of stopping crime, some observers welcome involvement by members of the public as a check on law enforcement surveillance.47 Science fiction author David Brin foresaw increased use of cameras and other sensors by the government and adopted an if-you-can’t-beat-them-join-them approach to dealing with the privacy threat. He suggested allowing ubiquitous surveillance so long as the watchers themselves were watched: live cameras could be installed in police cars, station houses, and jails. According to Brin, everyone watching everywhere would lessen the likelihood of unobserved government abuse. What the Rodney King video did for a single incident48—one that surely would have passed without major public notice but for the amateur video capturing what looked like excessive force by arresting officers—Brin’s proposal could do for nearly all state activities. Of course, Brin’s calculus does not adequately account for the invasions of privacy that would take place whenever random members of the public could watch—and perhaps record—every interaction between citizens and authorities, especially since many of those interactions take place at sensitive moments for the citizens. And ubiquitous surveillance can lead to other problems. The Sheriff’s Office of Anderson County, Tennessee, introduced one of the first live “jailcams” in the country, covering a little area in the jail where jailors sit and keep an eye on everything—the center of the panopticon.49 The Anderson County webcam was very Web 2.0: the Web site included a chat room where visitors could meet other viewers, there was a guestbook to sign, and a link to syndicated advertising to help fund the webcam. However, some began using the webcam to make crank calls to jailors at key moments and even, it is claimed, to coordinate the delivery of contraband.50 The webcam was shut down.
This example suggests a critical difference between Privacy 1.0 and 2.0. If the government is controlling the observation, then the government can pull the plug on such webcams if it thinks they are not helpful, balancing whatever policy factors it chooses.51 Many scholars have considered the privacy problems posed by cheap sensors and networks, but they focus on the situations where the sensors serve only government or corporate masters. Daniel Solove, for instance, has written extensively on emergent privacy concerns, but he has focused on the danger of “digital dossiers” created by businesses and governments.52 Likewise, Jerry Kang and Dana Cuff have written about how small sensors will lead to “pervasive computing,” but they worry that the technology will be abused by coordinated entities like shopping malls, and their prescriptions thus follow the pattern established by Privacy 1.0.53 Their concerns are not misplaced, but they represent an increasingly smaller part of the total picture. The essence of Privacy 2.0 is that government or corporations, or other intermediaries, need not be the source of the surveillance. Peer-to-peer technologies can eliminate points of control and gatekeeping from the transfer of personal data and information just as they can for movies and music. The intellectual property conflicts raised by the generative Internet, where people can still copy large amounts of copyrighted music without fear of repercussion, are rehearsals for the problems of Privacy 2.0.54
The Rodney King beating was filmed not by a public camera, but by a private one, and its novel use in 1991 is now commonplace. Many private cameras, including camera-equipped mobile phones, fit the generative mold as devices purchased for one purpose but frequently used for another. The Rodney King video, however, required news network attention to gain salience. Videos depicting similar events today gain attention without the prior approval of an intermediary.55 With cheap sensors, processors, and networks, citizens can quickly distribute to anywhere in the world what they capture in their backyard. Therefore, any activity is subject to recording and broadcast. Perform a search on a video aggregation site like YouTube for “angry teacher” or “road rage” and hundreds of videos turn up. The presence of documentary evidence not only makes such incidents reviewable by the public at large, but for, say, angry teachers it also creates the possibility of getting fired or disciplined where there had not been one before. Perhaps this is good: teachers are on notice that they must account for their behavior the way that police officers must take responsibility for their own actions.
If so, it is not just officers and teachers: we are all on notice. The famed “Bus Uncle” of Hong Kong upbraided a fellow bus passenger who politely asked him to speak more quietly on his mobile phone.56 The mobile phone user learned an important lesson in etiquette when a third person captured the argument and then uploaded it to the Internet, where 1.3 million people have viewed one version of the exchange.57 (Others have since created derivative versions of the exchange, including karaoke and a ringtone.) Weeks after the video was posted, the Bus Uncle was beaten up in a targeted attack at the restaurant where he worked.58 In a similar incident, a woman’s dog defecated on the floor of a South Korean subway. She refused to clean it up, even when offered a tissue—though she cleaned the dog—and left the subway car at the next stop. The incident was captured on a mobile phone camera and posted to the Internet, where the poster issued an all points bulletin seeking information about the dog owner and her relatives, and about where she worked. She was identified by others who had previously seen her and the dog, and the resulting firestorm of criticism apparently caused her to quit her job.59
The summed outrage of many unrelated people viewing a disembodied video may be disproportionate to whatever social norm or law is violated within that video. Lives can be ruined after momentary wrongs, even if merely misdemeanors. Recall verkeersbordvrij theory from Chapter Six: it suggests that too many road signs and driving rules change people into automatons, causing them to trade in common sense and judgment for mere hewing to exactly what the rules provide, no more and no less. In the same way, too much scrutiny can also turn us into automatons. Teacher behavior in a classroom, for example, is largely a matter of standards and norms rather than rules and laws, but the presence of scrutiny, should anything unusual happen, can halt desirable pedagogical risks if there is a chance those risks could be taken out of context, misconstrued, or become the subject of pillory by those with perfect hindsight.
These phenomena affect students as well as teachers, regular citizens rather than just those in authority. And ridicule or mere celebrity can be as chilling as outright disapprobation. In November 2002 a Canadian teenager used his high school’s video camera to record himself swinging a golf ball retriever as though it were a light saber from Star Wars.60 By all accounts he was doing it for his own amusement. The tape was not erased, and it was found the following spring by someone else who shared it, first with friends and then with the Internet at large. Although individuals want privacy for themselves, they will line up to see the follies of others, and by 2006 the “Star Wars Kid” was estimated to be the most popular word-of-mouth video on the Internet, with over nine hundred million cumulative views.61 It has spawned several parodies, including ones shown on prime time television. This is a consummately generative event: a repurposing of something made for completely different reasons, taking off beyond any expectation, and triggering further works, elaborations, and commentaries—both by other amateurs and by Hollywood.62 It is also clearly a privacy story. The student who made the video has been reported to have been traumatized by its circulation, and in no way did he seek to capitalize on his celebrity.
In this hyperscrutinized reality, people may moderate themselves instead of expressing their true opinions. To be sure, people have always balanced between public and private expression. As Mark Twain observed: “We are discreet sheep; we wait to see how the drove is going, and then go with the drove. We have two opinions: one private, which we are afraid to express; and another one—the one we use—which we force ourselves to wear to please Mrs. Grundy, until habit makes us comfortable in it, and the custom of defending it presently makes us love it, adore it, and forget how pitifully we came by it. Look at it in politics.”63
Today we are all becoming politicians. People in power, whether at parliamentary debates or press conferences, have learned to stick to carefully planned talking points, accepting the drawbacks of appearing stilted and saying little of substance in exchange for the benefits of predictability and stability.64 Ubiquitous sensors threaten to push everyone toward treating each public encounter as if it were a press conference, creating fewer spaces in which citizens can express their private selves.
Even the use of “public” and “private” to describe our selves and spaces is not subtle enough to express the kind of privacy we might want. By one definition they mean who manages the space: a federal post office is public; a home is private. A typical restaurant or inn is thus also private, yet it is also a place where the public gathers and mingles: someone there is “in public.” But while activities in private establishments open to the public are technically in the public eye,65 what transpires there is usually limited to a handful of eyewitnesses— likely strangers—and the activity is ephemeral. No more, thanks to cheap sensors and cheap networks to disseminate what they glean. As our previously private public spaces, like classrooms and restaurants, turn into public public spaces, the pressure will rise for us to be on press conference behavior.
There are both significant costs and benefits inherent in expanding the use of our public selves into more facets of daily life. Our public face may be kinder, and the expansion may cause us to rethink our private prejudices and excesses as we publicly profess more mainstream standards and, as Twain says, “habit makes us comfortable in it.” On the other hand, as law professors Eric Posner and Cass Sunstein point out, strong normative pressure can prevent outlying behavior of any kind, and group baselines can themselves be prejudiced. Outlying behavior is the generative spark found at the social layer, the cultural innovation out of left field that can later become mainstream. Just as our information technology environment has benefited immeasurably from experimentation by a variety of people with different aims, motives, and skills, so too is our cultural environment bettered when commonly held—and therefore sometimes rarely revisited—views can be challenged.66
The framers of the U.S. Constitution embraced anonymous speech in the political sphere as a way of being able to express unpopular opinions without having to experience personal disapprobation.67 No defense of a similar principle was needed for keeping private conversations in public spaces from becoming public broadcasts—disapprobation that begins with small “test” groups but somehow becomes society-wide—since there were no means by which to perform that transformation. Now that the means are there, a defense is called for lest we run the risk of letting our social system become metaphorically more appliancized: open to change only by those few radicals so disconnected from existing norms as to not fear their imposition at all.
Privacy 2.0 is about more than those who are famous or those who become involuntary “welebrities.” For those who happen to be captured doing particularly fascinating or embarrassing things, like Star Wars Kid or an angry teacher, a utilitarian might say that nine hundred million views is first-order evidence of a public benefit far exceeding the cost to the student who made the video. It might even be pointed out that the Star Wars Kid failed to erase the tape, so he can be said to bear some responsibility for its circulation. But the next-generation privacy problem cannot be written off as affecting only a few unlucky victims. Neither can it be said to affect only genuine celebrities who must now face constant exposure not only to a handful of professional paparazzi but also to hordes of sensor-equipped amateurs. (Celebrities must now contend with the consequences of cell phone videos of their slightest aberrations—such as one in which a mildly testy exchange with a valet parker is quickly circulated and exaggerated online68—or more comprehensive peer-produced sites like Gawker Stalker,69 where people send in local sightings of celebrities as they happen. Gawker strives to relay the sightings within fifteen minutes and place them on a Google map, so that if Jack Nicholson is at Starbucks, one can arrive in time to stand awkwardly near him before he finishes his latte.)
Cybervisionary David Weinberger’s twist on Andy Warhol’s famous quotation is the central issue for the rest of us: “On the Web, everyone will be famous to fifteen people.”70 Although Weinberger made his observation in the context of online expression, explaining that microaudiences are worthy audiences, it has further application. Just as cheap networks made it possible for businesses to satisfy the “long tail,” serving the needs of obscure interests every bit as much as popular ones71 (Amazon is able to stock a selection of books virtually far beyond the best sellers found in a physical bookstore), peer-produced databases can be configured to track the people who are of interest only to a few others.
How will the next-generation privacy problem affect average citizens? Early photo aggregation sites like Flickr were premised on a seemingly dubious assumption that turned out to be true: not only would people want an online repository for their photos, but they would often be pleased to share them with the public at large. Such sites now boast hundreds of millions of photos,72 many of which are also sorted and categorized thanks to the same distributed energy that got Mars’s craters promptly mapped. Proponents of Web 2.0 sing the praises of “folksonomies” rather than taxonomies—bottom-up tagging done by strangers rather than expert-designed and -applied canonical classifications like the Dewey Decimal System or the Library of Congress schemes for sorting books.73 Metadata describing the contents of pictures makes images far more useful and searchable. Combining user-generated tags with automatically generated data makes pictures even more accessible. Camera makers now routinely build cameras that use global positioning systems to mark exactly where on the planet each picture they snap was taken and, of course, to time- and datestamp them. Web sites like Riya, Polar Rose, and MyHeritage are perfecting facial recognition technologies so that once photos of a particular person are tagged a few times with his or her name, their computers can then automatically label all future photos that include the person—even if their image appears in the background.
In August 2006 Google announced the acquisition of Neven Vision, a company working on photo recognition, and in May 2007 Google added a feature to its image search so that only images of people could be returned (to be sure, still short of identifying which image is which).74 Massachusetts officials have used such technology to compare mug shots in “Wanted” posters to driver’s license photos, leading to arrests.75 Mash together these technologies and functionalities through the kind of generative mixing allowed by their open APIs and it becomes trivial to receive answers to questions like: Where was Jonathan Zittrain last year on the fourteenth of February?, or, Who could be found near the entrance to the local Planned Parenthood clinic in the past six months? The answers need not come from government or corporate cameras, which are at least partially secured against abuse through well-considered privacy policies from Privacy 1.0. Instead, the answers come from a more powerful, generative source: an army of the world’s photographers, including tourists sharing their photos online without firm (or legitimate) expectations of how they might next be used and reused.
As generativity would predict, those uses may be surprising or even offensive to those who create the new tools or provide the underlying data. The Christian Gallery News Service was started by antiabortion activist Neal Horsley in the mid 1990s. Part of its activities included the Nuremberg Files Web site, where the public was solicited for as much information as possible about the identities, lives, and families of physicians who performed abortions, as well as about clinic owners and workers.76 When a provider was killed, a line would be drawn through his or her name. (The site was rarely updated with new information, and it became entangled in a larger lawsuit lodged under the U.S. Freedom of Access to Clinic Entrances Act.77 The site remains accessible.) An associated venture solicits the public to take pictures of women arriving at clinics, including the cars in which they arrive (and corresponding license plates), and posts the pictures in order to deter people from nearing clinics.78
With image recognition technology mash-ups, photos taken as people enter clinics or participate in protests can be instantly cross-referenced with their names. One can easily pair this type of data with Google Maps to provide fine-grained satellite imagery of the homes and neighborhoods of these individuals, similar to the “subversive books” maps created by computer consultant and tinkerer Tom Owad, tracking wish lists on Amazon.79
This intrusion can reach places that the governments of liberal democracies refuse to go. In early 2007, a federal court overseeing the settlement of a class action lawsuit over New York City police surveillance of public activities held that routine police videotaping of public events was in violation of the settlement: “The authority . . . conferred upon the NYPD ‘to visit any place and attend any event that is open to the public, on the same terms and conditions of the public generally,’ cannot be stretched to authorize police officers to videotape everyone at a public gathering just because a visiting little old lady from Dubuque . . . could do so. There is a quantum difference between a police officer and the little old lady (or other tourist or private citizen) videotaping or photographing a public event.”80
The court expressed concern about a chilling of speech and political activities if authorities were videotaping public events. But police surveillance becomes moot when an army of little old ladies from Dubuque is naturally videotaping and sharing nearly everything—protests, scenes inside a mall (such that amateur video exists of a random shootout in a Salt Lake City, Utah, mall),81 or picnics in the park. Peer-leveraging technologies are overstepping the boundaries that laws and norms have defined as public and private, even as they are also facilitating beneficial innovation. Cheap processors, networks, and sensors enable a new form of beneficial information flow as citizen reporters can provide footage and frontline analysis of newsworthy events as they happen.82 For example, OhmyNews is a wildly popular online newspaper in South Korea with citizen-written articles and reports. (Such writers provide editors with their names and national identity numbers so articles are not anonymous.) Similarly, those who might commit atrocities within war zones can now be surveilled and recorded by civilians so that their actions may be watched and ultimately punished, a potential sea change for the protection of human rights.83
For privacy, peer-leveraging technologies might make for a much more constrained world rather than the more chaotic one that they have wrought for intellectual property. More precisely, a world where bits can be recorded, manipulated, and transmitted without limitation means, in copyright, a free-for-all for the public and constraint upon firms (and perhaps upstream artists) with content to protect. For privacy, the public is variously creator, beneficiary, and victim of the free-for-all. The constraints—in the form of privacy invasion that Jeffrey Rosen crystallizes as an “unwanted gaze”—now come not only from the well-organized governments or firms of Privacy 1.0, but from a few people generatively drawing upon the labors of many to greatly impact rights otherwise guaranteed by a legal system.
Privacy and Reputation
At each layer where a generative pattern can be discerned, this book has asked whether there is a way to sift out what we might judge to be bad generative results from the good ones without unduly damaging the system’s overall generativity. This is the question raised at the technical layer for network security, at the content layer for falsehoods in Wikipedia and failures of intellectual property protection, and now at the social layer for privacy. Can we preserve generative innovations without giving up our core privacy values? Before turning to answers, it is helpful to explore a final piece of the Privacy 2.0 mosaic: the impact of emerging reputation systems. This is both because such systems can greatly impact our privacy and because this book has suggested reputational tools as a way to solve the generative sifting problem at other layers.
Search is central to a functioning Web,84 and reputation has become central to search. If people already know exactly what they are looking for, a network needs only a way of registering and indexing specific sites. Thus, IP addresses are attached to computers, and domain names to IP addresses, so that we can ask for www.drudgereport.com and go straight to Matt Drudge’s site. But much of the time we want help in finding something without knowing the exact online destination. Search engines help us navigate the petabytes of publicly posted information online, and for them to work well they must do more than simply identify all pages containing the search terms that we specify. They must rank them in relevance. There are many ways to identify what sites are most relevant. A handful of search engines auction off the top-ranked slots in search results on given terms and determine relevance on the basis of how much the site operators would pay to put their sites in front of searchers.85 These search engines are not widely used. Most have instead turned to some proxy for reputation. As mentioned earlier, a site popular with others—with lots of inbound links—is considered worthier of a high rank than an unpopular one, and thus search engines can draw upon the behavior of millions of other Web sites as they sort their search results.86 Sites like Amazon deploy a different form of ranking, using the “mouse droppings” of customer purchasing and browsing behavior to make recommendations—so they can tell customers that “people who like the Beatles also like the Rolling Stones.” Search engines can also more explicitly invite the public to express its views on the items it ranks, so that users can decide what to view or buy on the basis of others’ opinions. Amazon users can rate and review the items for sale, and subsequent users then rate the first users’ reviews. 
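The inbound-link intuition behind such rankings can be sketched as a bare-bones PageRank-style iteration. This is an illustrative simplification, not Google’s actual formula: the four site names, the damping factor, and the link graph are all invented for the example.

```python
# Simplified link-based ranking: each page's rank is split evenly among the
# pages it links to, so pages with many inbound links accumulate rank.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# A hypothetical four-site web.
web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],  # d links out, but no one links to d
}
ranks = pagerank(web)
# c.example, with the most inbound links, ends up ranked highest;
# d.example, with no inbound links, ends up ranked lowest.
```

The point of the sketch is that the ranking emerges from the aggregate linking behavior of other sites, not from any editorial judgment by the search engine.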
Sites like Digg and Reddit invite users to vote for stories and articles they like, and tech news site Slashdot employs a rating system so complex that it attracts much academic attention.87
eBay uses reputation to help shoppers find trustworthy sellers. eBay users rate each other’s transactions, and this trail of ratings then informs future buyers how much to trust repeat sellers. These rating systems are crude but powerful. Malicious sellers can abandon poorly rated eBay accounts and sign up for new ones, but fresh accounts with little track record are often viewed skeptically by buyers, especially for proposed transactions involving expensive items. One study confirmed that established identities fare better than new ones, with buyers willing to pay, on average, over 8 percent more for items sold by highly regarded, established sellers.88 Reputation systems have many pitfalls and can be gamed, but the scholarship seems to indicate that they work reasonably well.89 There are many ways reputation systems might be improved, but at their core they rely on the number of people rating each other in good faith well exceeding the number of people seeking to game the system—and a way to exclude robots working for the latter. For example, eBay’s rating system has been threatened by the rise of “1-cent eBooks” with no shipping charges; sellers can create alter egos to bid on these nonitems and then have the phantom users highly rate the transaction.90 One such “feedback farm” earned a seller a thousand positive reviews over four days. eBay intervenes to some extent to eliminate such gaming, just as Google reserves the right to exact the “Google death penalty” by de-listing any Web site that it believes is unduly gaming its chances of a high search engine rating.91
These reputation systems now stand to expand beyond evaluating people’s behavior in discrete transactions or making recommendations on products or content, into rating people more generally. This could happen as an extension of current services—as one’s eBay rating is used to determine trustworthiness on, say, another peer-to-peer service. Or, it could come directly from social networking: Cyworld is a social networking site that has twenty million subscribers; it is one of the most popular Internet services in the world, largely thanks to interest in South Korea.92 The site has its own economy, with $100 million worth of “acorns,” the world’s currency, sold in 2006.93
Not only does Cyworld have a financial market, but it also has a market for reputation. Cyworld includes behavior monitoring and rating systems that make it so that users can see a constantly updated score for “sexiness,” “fame,” “friendliness,” “karma,” and “kindness.” As people interact with each other, they try to maximize the kinds of behaviors that augment their ratings in the same way that many Web sites try to figure out how best to optimize their presentation for a high Google ranking.94 People’s worth is defined and measured precisely, if not accurately, by the reactions of others. That trend is increasing as social networking takes off, partly due to the extension of online social networks beyond the people users already know personally as they “befriend” their friends’ friends’ friends.
The whole-person ratings of social networks like Cyworld will eventually be available in the real world. Similar real-world reputation systems already exist in embryonic form. Law professor Lior Strahilevitz has written a fascinating monograph on the effectiveness of “How’s My Driving” programs, where commercial vehicles are emblazoned with bumper stickers encouraging other drivers to report poor driving.95 He notes that such programs have resulted in significant accident reductions, and analyzes what might happen if the program were extended to all drivers. A technologically sophisticated version of the scheme dispenses with the need to note a phone number and file a report; one could instead install transponders in every vehicle and distribute TiVo-like remote controls to drivers, cyclists, and pedestrians. If someone acts politely, say by allowing you to switch lanes, you can acknowledge it with a digital thumbs-up that is recorded on that driver’s record. Cutting someone off in traffic earns a thumbs-down from the victim and other witnesses. Strahilevitz is supportive of such a scheme, and he surmises it could be even more effective than eBay’s ratings for online transactions since vehicles are registered by the government, making it far more difficult to escape poor ratings tied to one’s vehicle. He acknowledges some worries: people could give thumbs-down to each other for reasons unrelated to their driving—racism, for example. Perhaps a bumper sticker expressing support for Republicans would earn a thumbs-down in a blue state. Strahilevitz counters that the reputation system could be made to eliminate “outliers”—so presumably only well-ensconced racism across many drivers would end up affecting one’s ratings. According to Strahilevitz, this system of peer judgment would pass constitutional muster if challenged, even if the program is run by the state, because driving does not implicate one’s core rights.
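Strahilevitz’s idea of eliminating “outliers” could be approximated with something as simple as a trimmed mean over thumbs-up and thumbs-down votes. This is a sketch under stated assumptions: the vote values, trim fraction, and scenario are invented, and a real system would also need to authenticate raters and deduplicate votes.

```python
# Outlier-resistant reputation scoring: discard the most extreme slice of
# votes at each end before averaging, so a small hostile minority cannot
# sink a score, while broad consensus still registers.

def trimmed_score(votes, trim_fraction=0.1):
    """Average thumbs-up (+1) / thumbs-down (-1) votes after dropping the
    top and bottom trim_fraction of raters."""
    if not votes:
        return 0.0
    ordered = sorted(votes)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

# Ninety fair raters give a courteous driver thumbs-up; ten hostile raters
# (motivated by, say, a bumper sticker) pile on thumbs-down.
mixed = trimmed_score([+1] * 90 + [-1] * 10)   # hostile minority trimmed away
panned = trimmed_score([-1] * 100)             # genuine consensus survives
```

The design choice mirrors Strahilevitz’s intuition: only disapproval that is “well-ensconced” across many raters survives the trimming, while idiosyncratic grudges are filtered out.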
“How’s My Driving?” systems are too minor to warrant extensive judicial review. But driving is only the tip of the iceberg.
Imagine entering a café in Paris with one’s personal digital assistant or mobile phone, and being able to query: “Is there anyone on my buddy list within 100 yards? Are any of the ten closest friends of my ten closest friends within 100 yards?” Although this may sound fanciful, it could quickly become mainstream. With reputation systems already advising us on what to buy, why not have them also help us make the first cut on whom to meet, to date, to befriend? These are not difficult services to offer, and there are precursors today.96 These systems can indicate who has not offered evidence that he or she is safe to meet—as is currently solicited by some online dating sites—or they may use Amazon-style matching to tell us which of the strangers who have just entered the café is a good match for people who have the kinds of friends we do. People can rate their interactions with each other (and change their votes later, so they can show their companion a thumbs-up at the time of the meeting and tell the truth later on), and those ratings will inform future suggested acquaintances. With enough people adopting the system, the act of entering a café can be different from one person to the next: for some, the patrons may shrink away, burying their heads deeper in their books and newspapers. For others, the entire café may perk up upon entrance, not knowing who it is but having a lead that this is someone worth knowing. Those who do not participate in the scheme at all will be as suspect as brand new buyers or sellers on eBay.
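The proximity query itself is elementary geometry once devices report their coordinates, which is part of why such services are “not difficult to offer.” A minimal sketch using the haversine great-circle formula; the café location, buddy names, and coordinates are all invented for illustration.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Invented positions: a Paris café and three phones reporting locations.
cafe = (48.8530, 2.3499)
buddies = {
    "ana":  (48.8531, 2.3501),    # a few meters away
    "ben":  (48.8600, 2.3400),    # several hundred meters away
    "chen": (40.7128, -74.0060),  # another continent entirely
}
# 100 yards is roughly 91.4 meters.
nearby = [name for name, (lat, lon) in buddies.items()
          if distance_m(*cafe, lat, lon) <= 91.44]
```

The hard part of such a service is not the math but the social consequence: the same filter that answers “who on my buddy list is nearby?” can answer “who is this stranger near me?”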
Increasingly, difficult-to-shed indicators of our identity will be recorded and captured as we go about our daily lives and enter into routine transactions— our fingerprints may be used to log in to our computers or verify our bank accounts, our photo may be snapped and tagged many times a day, or our license plate may be tracked as people judge our driving habits. The more our identity is associated with our daily actions, the greater opportunities others will have to offer judgments about those actions. A government-run system like the one Strahilevitz recommends for assessing driving is the easy case. If the state is the record keeper, it is possible to structure the system so that citizens can know the basis of their ratings—where (if not by whom) various thumbs-down clicks came from—and the state can give a chance for drivers to offer an explanation or excuse, or to follow up. The state’s formula for meting out fines or other penalties to poor drivers would be known (“three strikes and you’re out,” for whatever other problems it has, is an eminently transparent scheme), and it could be adjusted through accountable processes, just as legislatures already determine what constitutes an illegal act, and what range of punishment it should earn.
Generatively grown but comprehensively popular unregulated systems are a much trickier case. The more that we rely upon the judgments offered by these private systems, the more harmful that mistakes can be.97 Correcting or identifying mistakes can be difficult if the systems are operated entirely by private parties and their ratings formulas are closely held trade secrets. Search engines are notoriously resistant to discussing how their rankings work, in part to avoid gaming—a form of security through obscurity.98 The most popular engines reserve the right to intervene in their automatic rankings processes—to administer the Google death penalty, for example—but otherwise suggest that they do not centrally adjust results. Hence a search in Google for “Jew” returns an anti-Semitic Web site as one of its top hits,99 as well as a separate sponsored advertisement from Google itself explaining that its rankings are automatic.100 But while the observance of such policies could limit worries of bias to search algorithm design rather than to the case-by-case prejudices of search engine operators, it does not address user-specific bias that may emerge from personalized judgments.
Amazon’s automatic recommendations also make mistakes; for a period of time the Official Lego Creator Activity Book was paired with a “perfect partner” suggestion: American Jihad: The Terrorists Living Among Us Today. If such mismatched pairings happen when discussing people rather than products, rare mismatches could have worse effects while being less noticeable since they are not universal. The kinds of search systems that say which people are worth getting to know and which should be avoided, tailored to the users querying the system, present a set of due process problems far more complicated than a state-operated system or, for that matter, any system operated by a single party. The generative capacity to share data and to create mash-ups means that ratings and rankings can be far more emergent—and far more inscrutable.
SOLVING THE PROBLEMS OF PRIVACY 2.0
Cheap sensors generatively wired to cheap networks with cheap processors are transforming the nature of privacy. How can we respond to the notion that nearly anything we do outside our homes can be monitored and shared? How do we deal with systems that offer judgments about what to read or buy, and whom to meet, when they are not channeled through a public authority or through something as suable, and therefore as accountable, as Google?
The central problem is that the organizations creating, maintaining, using, and disseminating records of identifiable personal data are no longer just “organizations”— they are people who take pictures and stream them online, who blog about their reactions to a lecture or a class or a meal, and who share on social sites rich descriptions of their friends and interactions. These databases are becoming as powerful as the ones large institutions populate and centrally define. Yet the sorts of administrative burdens we can reasonably place on established firms exceed those we can place on individuals—at some point, the burden of compliance becomes so great that the administrative burdens are tantamount to an outright ban. That is one reason why so few radio stations are operated by individuals: it need not be capital intensive to set up a radio broadcasting tower—a low-power neighborhood system could easily fit in someone’s attic—but the administrative burdens of complying with telecommunications law are well beyond the abilities of a regular citizen. Similarly, we could create a privacy regime so complicated as to frustrate generative developments by individual users.
The 1973 U.S. government report on privacy crystallized the template for Privacy 1.0, suggesting five elements of a code of fair information practice:
- There must be no personal data record-keeping systems whose very existence is secret.
- There must be a way for an individual to find out what information about him is in a record and how it is used.
- There must be a way for an individual to prevent information about him that was obtained for one purpose from being used or made available for other purposes without his consent.
- There must be a way for an individual to correct or amend a record of identifiable information about him.
- Any organization creating, maintaining, using, or disseminating records of identifiable personal data must assure the reliability of the data for their intended use and must take precautions to prevent misuse of the data.101
These recommendations present a tall order for distributed, generative systems. It may seem clear that the existence of personal data record-keeping systems ought not to be kept secret, but this issue was easier to address in 1973, when such systems were typically large consumer credit databases or government dossiers about citizens, which could more readily be disclosed and advertised by the relevant parties. It is harder to apply the antisecrecy maxim to distributed personal information databases. When many of us maintain records or record fragments on one another, and through peer-produced social networking services like Facebook or MySpace share these records with thousands of others, or allow them to be indexed to create powerful mosaics of personal data, then exactly what the database is changes from one moment to the next—not simply in terms of its contents, but its very structure and scope. Such databases may be generally unknown while not truly “secret.”102
Further, these databases are ours. It is one thing to ask a corporation to disclose the personal data and records it maintains; it is far more intrusive to demand such a thing of private citizens. Such disclosure may itself constitute an intrusive search upon the citizen maintaining the records. Similarly, the idea of mandating that an individual be able to find out what an information gatherer knows—much less to correct or amend the information—is categorically more difficult to implement when what is known is distributed across millions of people’s technological outposts. To be sure, we can Google ourselves, but this does not capture those databases open only to “friends of friends”—a category that may not include us but may include thousands of others. At the same time, we may have minimal recourse when the information we thought we were circulating within social networking sites merely for fun and, say, only among fellow college students, ends up leaking to the world at large.103
What to do? There is a combination of steps drawn from the solutions sketched in the previous two chapters that might ameliorate the worst of Privacy 2.0’s problems, and even provide a framework in which to implement some of the Privacy 1.0 solutions without rejecting the generative framework that gives rise to Privacy 2.0 in the first place.
The Power of Code-Backed Norms
The Web is disaggregated. Its pieces are bound together into a single virtual database by private search engines like Google. Google and other search engines assign digital robots to crawl the Web as if they were peripatetic Web surfers, clicking on one link after another, recording the results, and placing them into a concordance that can then be used for search.104
Early on, some wanted to be able to publish material to the Web without it appearing in search engines. In the way a conversation at a pub is a private matter unfolding in a public (but not publicly owned) space, these people wanted their sites to be private but not secret. The law offers one approach to vindicate this desire for privacy but not secrecy. It could establish a framework delineating the scope and nature of a right in one’s Web site being indexed, and providing for penalties for those who infringe that right. An approach of this sort has well-known pitfalls. For example, it would be difficult to harmonize such doctrine across various jurisdictions around the world,105 and there would be technical questions as to how a Web site owner could signal his or her choice to would-be robot indexers visiting the site.
The Internet community, however, fixed most of the problem before it could become intractable or even noticeable to mainstream audiences. A software engineer named Martijn Koster was among those discussing the issue of robot signaling on a public mailing list in 1993 and 1994. Participants, including “a majority of robot authors and other people with an interest in robots,” converged on a standard for “robots.txt,” a file that Web site authors could create that would be inconspicuous to Web surfers but in plain sight to indexing robots.106 Through robots.txt, site owners can indicate preferences about what parts of the site ought to be crawled and by whom. Consensus among some influential Web programmers on a mailing list was the only blessing this standard received: “It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and there [sic] no guarantee that all current and future robots will use it. Consider it a common facility the majority of robot authors offer the WWW community to protect WWW server [sic] against unwanted accesses by their robots.”107
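A minimal robots.txt makes the convention concrete. The directory paths and the crawler name below are invented for illustration; the `User-agent` and `Disallow` directives are the ones the 1994 standard defined.

```text
# robots.txt — placed at the root of a Web site, read by well-behaved crawlers.

# These rules apply to all robots:
User-agent: *
Disallow: /private/
Disallow: /drafts/

# A hypothetical crawler singled out by name and asked to stay out entirely:
User-agent: ExampleBot
Disallow: /
```

Note what the file does not do: it sets no passwords and erects no technical barrier. A robot that ignores it faces no enforcement mechanism at all; compliance rests entirely on the norm the mailing-list consensus established.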
Today, nearly all Web programmers know robots.txt is the way in which sites can signal their intentions to robots, and these intentions are respected by every major search engine across differing cultures and legal jurisdictions.108 On this potentially contentious topic—search engines might well be more valuable if they indexed everything, especially content marked as something to avoid—harmony was reached without any application of law. The robots.txt standard did not address the legalities of search engines and robots; it merely provided a way to defuse many conflicts before they could even begin. The apparent legal vulnerabilities of robots.txt—its lack of ownership or backing by a large private standards-setting organization, and the absence of private enforcement devices—may in fact be essential to its success.109 Law professor Jody Freeman and others have written about the increasingly important role played by private organizations in the formation of standards across a wide range of disciplines and the ways in which some organizations incorporate governmental notions of due process in their activities.110 Many Internet standards have been forged much less legalistically but still cooperatively.111
The questions not preempted or settled by such cooperation tend to be clashes between firms with some income stream in dispute—and there the law has at times partially weighed in. For example, eBay sued data aggregator Bidder’s Edge for using robots to scrape its site even after eBay clearly objected both in person and through robots.txt. eBay won, in a case that has since made its way into most cyberlaw casebooks and even into a few general property casebooks—a testament to how rarely such disputes enter the legal system.112
Similarly, the safe harbors of the U.S. Digital Millennium Copyright Act of 1998 give some protection to search engines that point customers to material that infringes copyright,113 but they do not shield the actions required to create the search database in the first place. The act of creating a search engine, like the act of surfing itself, is something so commonplace that it would be difficult to imagine deeming it illegal—but this is not to say that search engines rest on any stronger of a legal basis than the practice of using robots.txt to determine when it is and is not appropriate to copy and archive a Web site.114 Only recently, with Google’s book scanning project, have copyright holders really begun to test this kind of question.115 That challenge has arisen over the scanning of paper books, not Web sites, as Google prepares to make them searchable in the same way Google has indexed the Web.116 The long-standing practice of Web site copying, guided by robots.txt, made that kind of indexing uncontroversial even as it is, in theory, legally cloudy.
The lasting lesson from robots.txt is that a simple, basic standard created by people of good faith can go a long way toward resolving or forestalling a problem containing strong ethical or legal dimensions. The founders of Creative Commons created an analogous set of standards to allow content creators to indicate how they would like their works to be used or reused. Creative Commons licenses purport to have the force of law behind them—one ignores them at the peril of infringing copyright—but the main force of Creative Commons as a movement has not been in the courts, but in cultural mindshare: alerting authors to basic but heretofore hidden options they have for allowing use of the photos, songs, books, or blog entries they create, and alerting those who make use of the materials to the general orientation of the author.
Creative Commons is robots.txt generalized. Again, the legal underpinnings of this standard are not particularly strong. For example, one Creative Commons option is “noncommercial,” which allows authors to indicate that their material can be reused without risk of infringement so long as the use is noncommercial. But the definition of noncommercial is a model of vagueness, the sort of definition that could easily launch a case like eBay v. Bidder’s Edge.117 If one aggregates others’ blogs on a page that has banner ads, is that a commercial use? There have been only a handful of cases over Creative Commons licenses, and none testing the meaning of noncommercial.118 Rather, people seem to know a commercial (or derivative) use when they see it: the real power of the license may have less to do with a threat of legal enforcement and more to do with the way it signals one’s intentions and asks that they be respected. Reliable empirical data is absent, but the sense among many of those using Creative Commons licenses is that their wishes have been respected.119
Applying Code-Backed Norms to Privacy: Data Genealogy
As people put data on the Internet for others to use or reuse—data that might be about other people as well as themselves—there are no tools to allow those who provide the data to express their preferences about how the data ought to be indexed or used. There is no Privacy Commons license to request basic limits on how one’s photographs ought to be reproduced from a social networking site. There ought to be. Intellectual property law professor Pamela Samuelson has proposed that in response to the technical simplicity of collecting substantial amounts of personal information in cyberspace, a person should have a protectable right to control this personal data. She notes that a property-based legal framework is more difficult to impose when one takes into account the multiple interests a person might have in her personal data, and suggests a move to a contractual approach to protecting information privacy based in part on enforcement of Web site privacy policies.120 Before turning to law directly, we can develop tools to register and convey authors’ privacy-related preferences unobtrusively.
On today’s Internet, the copying and pasting of information takes place with no sense of metadata.121 It is difficult enough to make sure that a Creative Commons license follows the photograph, sound, or text to which it is related as those items circulate on the Web. But there is no standard at all for passing along, with a given work, who recorded it, with what devices,122 and, most important, what the subject is comfortable having others do with it. If there were, links could become two-way. Those who place information on the Web could more readily canvass the public uses to which that information had been put and by whom. In turn, those who wish to reuse information would have a way of getting in touch with its original source to request permission. Some Web 2.0 outposts have generated promising rudimentary methods for this. Facebook, for example, offers tools to label the photographs one submits and to indicate what groups of people can and cannot see them. Once a photo is copied beyond the Facebook environment, however, these attributes are lost.123
The Web is a complex social phenomenon with information contributed not only by institutional sources like Britannica, CNN, and others that place large amounts of structured information on it, but also by amateurs like Wikipedians, Flickr contributors, and bloggers. Yet a Google search intentionally smoothes over this complexity; each linked search result is placed into a standard format to give the act of searching structure and order. Search engines and other aggregators can and should do more to enrich users’ understanding of where the information they see is coming from. This approach would shadow the way that Theodor Nelson, coiner of the word “hypertext,” envisioned “transclusion”—a means not to simply copy text, but also to reference it to its original source.124 Nelson’s vision was drastic in its simplicity: information would repose primarily at its source, and any quotes to it would simply frame that source. If it were deleted from the original source, it would disappear from its subsequent uses. If it were changed at the source, downstream uses would change with it. This is a strong version of the genealogy idea, since the metadata about an item’s origin would actually be the item itself. It is data as service, and insofar as it leaves too much control with the data’s originator, it suffers from many of the drawbacks of software as service described in Chapter Five. For the purposes of privacy, we do not need such a radical reworking of the copy-and-paste culture of the Web. Rather, we need ways for people to signal whether they would like to remain associated with the data they place on the Web, and to be consulted about unusual uses.
This weaker signaling-based version of Nelson’s vision does not answer the legal question of what would happen if the originator of the data could not come to an agreement with someone who wanted to use it. But as with robots.txt and Creative Commons licenses, it could forestall many of the conflicts that will arise in the absence of any standard at all.125 Most importantly, it would help signal authorial intention not only to end users but also to the intermediaries whose indices provide the engines for invasions of privacy in the first place. One could indicate that photos were okay to index by tag but not by facial recognition, for example. If search engines of today are any indication, such restrictions could be respected even without a definitive answer as to the extent of their legal enforceability. Indeed, by attaching online identity—if not physical identity—to the various bits of data that are constantly mashed up as people copy and paste what they like around the Web, it becomes possible for people to get in touch with one another more readily to express thanks, suggest collaboration, or otherwise interact as people in communities do. Similarly, projects like reCAPTCHA could seek to alert people to the extra good their solving of captchas is doing—and even let them opt out of solving the second word in the image, the one that is not testing whether they are human but instead is being used to perform work for someone else. Just as Moore v. Regents of the University of California struggled with the issue of whether a patient whose tumor was removed should be consulted before the tumor is used for medical research,126 we will face the question of when people ought to be informed when their online behaviors are used for ulterior purposes—including beneficial ones.
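What such a tag vocabulary might look like can be sketched in a few lines of Python. Everything here is invented for illustration: the field names (`index-by-tag`, `index-by-face`) and the default-deny rule correspond to no existing standard, and as with robots.txt, compliance would be purely cooperative:

```python
# Hypothetical privacy preferences attached to a photo as metadata.
# Field names are invented for illustration; no such standard exists.
photo_prefs = {
    "owner": "alice@example.com",
    "index-by-tag": True,    # fine to index via human-applied tags
    "index-by-face": False,  # please do not apply facial recognition
}

def may_process(prefs, operation):
    # A cooperative indexer consults the tag before acting; absent a
    # stated preference, it defaults to the more restrictive choice.
    return prefs.get(operation, False)

print(may_process(photo_prefs, "index-by-tag"))   # True
print(may_process(photo_prefs, "index-by-face"))  # False
```

The `owner` field is what would make links two-way: an indexer that wished to do something unanticipated would have someone to ask.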
Respect for robots.txt, Creative Commons licenses, and privacy “tags,” and an opportunity to alert people and allow them to opt in to helpful ventures with their routine online behavior like captcha-solving, both require and promote a sense of community. Harnessing some version of Nelson’s vision is a self-reinforcing community-building exercise—bringing people closer together while engendering further respect for people’s privacy choices. It should be no surprise that people tend to act less charitably in today’s online environment than they would act in the physical world.127 Recall the discussion of verkeersbordvrij in Chapter Six, where the elimination of most traffic signs can counterintuitively reduce accidents. Today’s online environment is only half of the verkeersbordvrij system: there are few perceived rules, but there are also few ways to receive, and therefore respect, cues from those whose content or data someone might be using.128 Verkeersbordvrij depends not simply on eliminating most legal rules and enforcement, but also, in the view of its proponents, crucially on motorists’ ability to roll down their windows and make eye contact with other motorists and pedestrians, to signal each other, and to pull themselves away from the many distractions like mobile phones and snacking that turn driving into a mechanical operation rather than a social act. By devising tools and practices to connect distant individuals already building upon one another’s data, we can promote the feedback loops found within functioning communities and build a framework to allow the nicely part of Benkler’s “sharing nicely” to blossom.129
Enabling Reputation Bankruptcy
As biometric readers become more commonplace in our endpoint machines, it will be possible for online destinations routinely to demand unsheddable identity tokens rather than disposable pseudonyms from Internet users. Many sites could benefit from asking people to participate with real identities known at least to the site, if not to the public at large. eBay, for one, would certainly profit by making it harder for people to shift among various ghost accounts. One could even imagine Wikipedia establishing a “fast track” for contributions if they were done with biometric assurance, just as South Korean citizen journalist newspaper OhmyNews keeps citizen identity numbers on file for the articles it publishes.130 These architectures protect one’s identity from the world at large while still making it much more difficult to produce multiple false “sock puppet” identities. When we participate in other walks of life—school, work, PTA meetings, and so on—we do so as ourselves, not wearing Groucho mustaches, and even if people do not know exactly who we are, they can recognize us from one meeting to the next. The same should be possible for our online selves.
As real identity grows in importance on the Net, the intermediaries demanding it ought to consider making available a form of reputation bankruptcy. Like personal financial bankruptcy, or the way in which a state often seals a juvenile criminal record and gives a child a “fresh start” as an adult, we ought to consider how to implement the idea of a second or third chance into our digital spaces. People ought to be able to express a choice to deemphasize if not entirely delete older information that has been generated about them by and through various systems: political preferences, activities, youthful likes and dislikes. If every action ends up on one’s “permanent record,” the press conference effect can set in. Reputation bankruptcy has the potential to facilitate desirably experimental social behavior and break up the monotony of static communities online and offline.131 As a safety valve against excess experimentation, perhaps the information in one’s record could not be deleted selectively; if someone wants to declare reputation bankruptcy, we might want it to mean throwing out the good along with the bad. The blank spot in one’s history indicates a bankruptcy has been declared—this would be the price one pays for eliminating unwanted details.
The key is to realize that we can make design choices now that work to capture the nuances of human relations far better than our current systems, and that online intermediaries might well embrace such new designs even in the absence of a legal mandate to do so.
More, Not Less, Information
Reputation bankruptcy provides for the possibility of a clean slate. It works best within informationally hermetic systems that generate their own data through the activities of their participants, such as a social networking site that records who is friends with whom, or one that accumulates the array of thumbs-up and thumbs-down votes that could be part of a “How’s My Driving”–style judgment.
But the use of the Internet more generally to spread real-world information about people is not amenable to reputation bankruptcy. Once injected into the Net, an irresistible video of an angry teacher, or a drunk and/or racist celebrity, cannot be easily stamped out without the kinds of network or endpoint control that are both difficult to implement and, if implemented, unacceptably corrosive to the generative Internet. What happens if we accept this as fact, and also assume that legal proscriptions against disseminating sensitive but popular data will be highly ineffective?132 We might turn to contextualization: the idea, akin to the tort of false light, that harm comes from information plucked out of the rich thread of a person’s existence and expression.133 We see this in political controversies—even the slightest misphrasing of something can be extracted and blown out of proportion. It is the reason that official press conferences are not the same as bland conversation; they are even blander.
Contextualization suggests that the aim of an informational system should be to allow those who are characterized within it to augment the picture provided by a single snippet with whatever information, explanation, or denial they think helps frame what is portrayed. Civil libertarians have long suggested that the solution to bad speech is more speech, while recognizing the difficulty of linking the second round of speech to the first without infringing the rights of the first speaker.134 Criticisms of the “more speech” approach have included the observation that a retraction or amendment of a salacious newspaper story usually appears much less prominently than the original. This is particularly true for newspapers, where those seeing one piece of information may not ever see the follow-up. There is also the worry that the fog of information generated by a free-for-all is no way to have people discern facts from lies. Generative networks invite us to find ways to reconcile these views. We can design protocols to privilege those who are featured or described online so that they can provide their own framing linked to their depictions. This may not accord with our pre-Web expectations: it may be useful for a private newspaper to provide a right of reply to its subjects, but such an entity would quickly invoke a First Amendment–style complaint of compelled speech if the law were to provide for routine rights of reply in any but the narrowest of circumstances.135 And many of us might wish to discuss Holocaust deniers or racists without giving them a platform to even link to a reply. The path forward is likely not a formal legal right but a structure to allow those disseminating information to build connections to the subjects of their discussions. In many cases those of us disseminating may not object—and a properly designed system might turn what would have otherwise been one-sided exchanges into genuine dialogues.
We already see some movement in this direction. The Harvard Kennedy School’s Joseph Nye has suggested that a site like urban legend debunker snopes.com be instituted for reputation, a place that people would know to check to get the full story when they see something scandalous but decontextualized online.136 The subjects of the scandalous data would similarly know to place their answers there—perhaps somewhat mitigating the need to formally link it to each instance of the original data. Google invites people quoted or discussed within news stories to offer addenda and clarification directly to Google, which posts these responses prominently near its link to the story when it is a search result within Google News.137 Services like reputationdefender.com will, for a fee, take on the task of trying to remove or, failing that, contextualize sensitive information about people online.138 ReputationDefender uses a broad toolkit of tactics to try to clear up perceived invasions of privacy—mostly moral suasion rather than legal threat.
To be sure, contextualization addresses just one slice of the privacy problem, since it only adds information to a sensitive depiction. If the depiction is embarrassing or humiliating, the opportunity to express that one is indeed embarrassed or humiliated does not much help. It may be that values of privacy are implacably in tension with some of the fruits of generativity. Just as the digital copyright problem could be solved if publishers could find a way to profit from abundance rather than scarcity, the privacy problem could be solved if we could take Sun Microsystems CEO Scott McNealy’s advice and simply get over it. This is not a satisfying rejoinder to someone whose privacy has been invaded, but, amazingly, this may be precisely what is happening: people are getting over it.
THE GENERATIONAL DIVIDE: BEYOND INFORMATIONAL PRIVACY
The values animating our concern for privacy are themselves in transition. Many have noted an age-driven gap in attitudes about privacy perhaps rivaled only by the 1960s generation gap on rock and roll.139 Surveys bear out some of this perception.140 Fifty-five percent of online teens have created profiles on sites like MySpace, though 66 percent of those use tools that the sites offer to limit access in some way.141 Twice as many teens as adults have a blog.142 Interestingly, while young people appear eager to share information online, they are more worried than older people about government surveillance.143 Some also see that their identities may be discovered online, even with privacy controls.144
A large part of the personal information available on the Web about those born after 1985 comes from the subjects themselves. People routinely set up pages on social networking sites—in the United States, more than 85 percent of university students are said to have an entry on Facebook—and they impart reams of photographs, views, and status reports about their lives, updated to the minute. Friends who tag other friends in photographs cause those photos to be automatically associated with everyone mentioned—a major step toward the world in which simply showing up to an event is enough to make one’s photo and name permanently searchable online in connection with the event.
Worries about such a willingness to place personal information online can be split into two categories. The first is explicitly paternalistic: children may lack the judgment to know when they should and should not share their personal information. As with other decisions that could bear significantly on their lives—signing contracts, drinking, or seeing movies with violent or sexual content—perhaps younger people should be protected from rash decisions that facilitate infringements of their privacy. The second relies more on the generative mosaic concern expressed earlier: people might make rational decisions about sharing their personal information in the short term, but underestimate what might happen to that information as it is indexed, reused, and repurposed by strangers. Both worries have merit, and to the extent that they do we could deploy the tools of intermediary gatekeeping to try to protect people below a certain age until they wise up. This is just the approach of the U.S. Children’s Online Privacy Protection Act of 1998 (COPPA).145 COPPA fits comfortably but ineffectually within a Privacy 1.0 framework, as it places restrictions on operators of Web sites and services that knowingly gather identifiable information from children under the age of thirteen: they cannot do so without parental consent. The result is discernible in most mainstream Web sites that collect data; each now presents a checkbox for the user to affirm that he or she is over thirteen, or asks outright for a birthday or age.
The result has been predictable: kids quickly learn simply to enter an age greater than thirteen in order to get to the services they want.146 To achieve limits on the flow of information about kids requires levels of intervention that so far exceed what any jurisdiction has been willing to undertake.147 The most common scheme to separate kids from adults online is to identify individual network endpoints as used primarily or frequently by kids and then limit what those endpoints can do: PCs in libraries and public schools are often locked down with filtering software, sometimes due to much-litigated legal requirements.148
A shift to tethered appliances could greatly lower the costs of discerning age online. Many appliances could be initialized at the time of acquisition with the birthdays of their users, or sold assuming use by children until unlocked by the vendor after receiving proof of age. This is exactly how many tethered mobile phones with Internet access are sold,149 and because they do not allow third-party code they can be much more securely configured to only access certain approved Web sites. With the right standards in place, PCs could broadcast to every Web site visited that they have not been unlocked for adult browsing, and such Web sites could then be regulated through a template like COPPA to restrict the transmission of certain information that could harm the young users. This is a variant of Lessig’s idea for a “kid enabled browser,” made much more robust because a tethered appliance is difficult to hack.150
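One possible shape for such a broadcast can be sketched in Python. The header name `X-Child-Device` is invented here for illustration; no such standard exists, and the point is only that a flag set by the device vendor, rather than a self-reported birthday, shifts the age determination to a party harder for the child to circumvent:

```python
# Hypothetical: a tethered device that has not been unlocked for adult
# browsing attaches an invented header to every request it sends.
request_headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "X-Child-Device": "1",  # invented header name; not a real standard
}

def coppa_restricted(headers):
    # A COPPA-style site could curtail data collection whenever the
    # device-set flag is present, instead of trusting a checkbox.
    return headers.get("X-Child-Device") == "1"

print(coppa_restricted(request_headers))          # True
print(coppa_restricted({"User-Agent": "x/1.0"}))  # False
```

The design choice mirrors the tethering argument in the surrounding text: the signal is trustworthy only to the degree that the endpoint is locked down.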
These paternalistic interventions assume that people will be more careful about what they put online once they grow up. And even those who are not more careful and regret it have exercised their autonomy in ways that ought to be respected. But the generational divide on privacy appears to be more than the higher carelessness or risk tolerance of kids. Many of those growing up with the Internet appear not only reconciled to a public dimension to their lives—famous for at least fifteen people—but eager to launch it. Their notions of privacy transcend the Privacy 1.0 plea to keep certain secrets or private facts under control. Instead, by digitally furnishing and nesting within publicly accessible online environments, they seek to make such environments their own. MySpace—currently the third most popular Web site in the United States and sixth most popular in the world151—is evocatively named: it implicitly promises its users that they can decorate and arrange their personal pages to be expressive of themselves. Nearly every feature of a MySpace home page can be reworked by its occupant, and that is exactly what occupants do, drawing on tools provided by MySpace and outside developers.152 This is generativity at work: MySpace programmers creating platforms that can in turn be directed and reshaped by users with less technical talent but more individualized creative energy. The most salient feature of privacy for MySpace users is not secrecy so much as autonomy: a sense of control over their home bases, even if what they post can later escape their confines. Privacy is about establishing a locus which we can call our own without undue intervention or interruption—a place where we can vest our identities.
That can happen most directly in a particular location—“your home is your castle”—and, as law professor Margaret Radin explains, it can also happen with objects.153 She had in mind a ring or other heirloom, but an iPod containing one’s carefully selected music and video can fit the bill as well. Losing such a thing hurts more than the mere pecuniary value of obtaining a fresh one. MySpace pages, blogs, and similar online outposts can be repositories for our identities for which personal control, not secrecy, is the touchstone.
The 1973 U.S. government privacy report observed:
An agrarian, frontier society undoubtedly permitted much less personal privacy than a modern urban society, and a small rural town today still permits less than a big city. The poet, the novelist, and the social scientist tell us, each in his own way, that the life of a small-town man, woman, or family is an open book compared to the more anonymous existence of urban dwellers. Yet the individual in a small town can retain his confidence because he can be more sure of retaining control. He lives in a face-to-face world, in a social system where irresponsible behavior can be identified and called to account. By contrast, the impersonal data system, and faceless users of the information it contains, tend to be accountable only in the formal sense of the word.
In practice they are for the most part immune to whatever sanctions the individual can invoke.154
Enduring solutions to the new generation of privacy problems brought about by the generative Internet will have as their touchstone tools of connection and accountability among the people who produce, transform, and consume personal information and expression: tools to bring about social systems to match the power of the technical one. Today’s Internet is an uncomfortable blend of the personal and the impersonal. It can be used to build and refine communities and to gather people around common ideas and projects.155 In contrast, it can also be seen as an impersonal library of enormous scale: faceless users perform searches and then click and consume what they see. Many among the new generation of people growing up with the Internet are enthusiastic about its social possibilities. They are willing to put more of themselves into the network and are more willing to meet and converse with those they have never met in person. They may not experience the same divide that Twain observed between our public and private selves. Photos of their drunken exploits on Facebook might indeed hurt their job prospects156—but soon those making hiring decisions will themselves have had Facebook pages. The differential between our public and private selves might be largely resolved as we develop digital environments in which views can be expressed and then later revised. Our missteps and mistakes will not be cause to stop the digital presses; instead, the good along with the bad will form part of a dialogue with both the attributes of a small town and a “world where anyone, anywhere may express his or her beliefs, no matter how singular, without fear of being coerced into silence or conformity.”157 Such an environment will not be perfect: there will be Star Wars Kids who wish to retract their private embarrassing moments and who cannot. 
But it will be better than one without powerful generative instrumentalities, one where the tools of monitoring are held and monopolized by the faceless institutions anticipated and feared in 1973.
Posted by The Editors on March 16, 2008