No Real Privacy Victory in Google Subpoena Case

Posted on March 18th, 2006 by

Late yesterday afternoon, US District Judge James Ware denied the Department of Justice’s request to compel Google to turn over a sample of its users’ search queries. Privacy advocates have been quick to celebrate a victory. However, although there are several very encouraging signs, a close reading of the reasoning in the judge’s opinion shows that it is too early to celebrate. The denial of access to the search queries was expressly not grounded on privacy concerns, but rather on another, quite shaky, line of reasoning.

First, some background. The DOJ has been engaged in a multi-year court battle with the ACLU and other plaintiffs over the constitutionality of the Child Online Protection Act (COPA), which provides stiff criminal penalties for anyone who makes material “harmful to minors” too readily available on the web. The question in that case is whether the same protection of minors could be achieved in some other way that would cause less collateral damage to communication between adults, communication that is protected by the First Amendment. In the course of this litigation, the DOJ decided it needed to do a statistical study of the effectiveness of filtering software, and it was for that purpose that it subpoenaed Google to produce a list of all the URLs in its index and all the search queries executed by its users over a two-month period. (See the original subpoena for these remarkably broad demands.)

Google resisted this subpoena. In the negotiations between their lawyers and the DOJ, the DOJ scaled back the request to a sample of one million URLs from the index and all search queries from a single week. I’ll skip linking those documents in, but the bottom line is that DOJ said they really needed this much and Google said it was still too much, so the DOJ went to Judge Ware’s court in San Jose and asked him to enforce the subpoena (in its narrowed form).

Interestingly, once the DOJ was in a courtroom, forced to explain exactly why their request was reasonable, they scaled back yet further; now they were asking for only 50,000 URLs from the index and, in the biggest change, only a random sample of 5000 user queries, as opposed to a comprehensive list from a whole week or (originally) two months. Thus, the first very encouraging sign for privacy advocates is that the DOJ has crumpled when put under pressure. A comprehensive list of search queries would be sure to include all sorts of rare but scary queries; a sample of a few thousand is probably going to be all “Brad Pitt”, “Britney Spears”, and similar banality.

However, even this drastically scaled back request was too much for Judge Ware. His ruling grants the DOJ’s request for the sample of 50,000 URLs from the index (with some restrictions and some finger wagging regarding what it is to be used for), but very importantly denies the request for any user search queries at all, even the sample of 5000. That outcome is a second very encouraging sign for privacy advocates. A third is that Judge Ware specifically wrote into his opinion a clear acknowledgment that privacy issues are implicated by search queries, even when they are not attached to identification of the users submitting the queries.

However, and this is a big However, the privacy issues raised by Judge Ware didn’t actually lead him to his denial of access to the queries. As best this nonlawyer can tell, that leaves his remarks with very scant precedential value; the fact that some district judge was “[given] pause” by the privacy concerns, but ultimately did not find them conclusive, is hardly a solid foundation for the future. Looking at the actual grounds for the ruling does not add to my sense of comfort.

The pivotal point about which Judge Ware’s ruling turns is his determination that the sample of user search queries would be “duplicative” because the DOJ would use them for the same ultimate purpose as the sample of URLs from the index: providing a set of URLs with which to test filters. (The queries would be turned into URLs by submitting them to Google and seeing what the top hits were.) Thus, in Judge Ware’s view, the DOJ could reasonably request one or the other sample, but not both. As they did not specify a preference, he chose to grant them the index sample but not the query sample.

This reasoning is far from comforting on two grounds. First, it quite explicitly suggests that if the DOJ had simply refrained from asking for the index sample, they would have been granted the search query sample — something to keep in mind for future fishing expeditions. Second, it is based on such shaky reasoning that it cries out for reversal on appeal. The index sample would tell how filters perform on the pages that are out there; the query sample would tell how filters perform on what searchers actually encounter — a very different question.

Suppose some litigation concerned how frequently a computer speech synthesizer mangled people’s first names. One of the litigants had access to a book of “names for your baby” that listed each conceivable first name once. Would that render it duplicative to also sample the names that show up in a phone book or other list of individuals? No, because the phone book would show which names occur frequently and which ones infrequently, so as to gain some insight into how often the synthesizer would mangle names in practice. The same reasoning suggests that sampling Google’s queries, rather than only their index, would add value to the DOJ’s study of filter effectiveness, and therefore would not be duplicative.
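
For readers who like to see the arithmetic, here is a minimal sketch of the analogy in Python. Everything in it is an assumption made up for illustration: the names, the counts, the set of mangled names, and the two helper functions. It says nothing about the DOJ's actual study design; it only shows that counting each distinct name once and weighting names by how often they occur can give very different error rates.

```python
from fractions import Fraction

# Hypothetical data, invented purely for illustration: how many phone-book
# entries carry each first name, and which names the synthesizer mangles.
PHONE_BOOK_COUNTS = {
    "John": 500,
    "Mary": 450,
    "Aoife": 25,
    "Siobhan": 20,
    "Xiomara": 5,
}
MANGLED = {"Aoife", "Siobhan", "Xiomara"}


def baby_book_rate(counts, mangled):
    """Mangling rate when every distinct name counts once (the baby-name book)."""
    hits = sum(1 for name in counts if name in mangled)
    return Fraction(hits, len(counts))


def phone_book_rate(counts, mangled):
    """Mangling rate when each name is weighted by how often it occurs."""
    hits = sum(n for name, n in counts.items() if name in mangled)
    return Fraction(hits, sum(counts.values()))


if __name__ == "__main__":
    print("one entry per name:   ", baby_book_rate(PHONE_BOOK_COUNTS, MANGLED))
    print("weighted by frequency:", phone_book_rate(PHONE_BOOK_COUNTS, MANGLED))
```

With these made-up numbers, three of the five distinct names are mangled, but only 50 of the 1,000 phone-book entries are. The two samples answer different questions, which is why one does not duplicate the other.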

Privacy advocates should breathe some sigh of relief. Certainly the DOJ was running scared. Moreover, Judge Ware’s comments about privacy lay good groundwork for some future day in which the issue is squarely confronted. However, it is too early to celebrate. At this point, the only actual legal impediment to a government fishing expedition for search queries is a shaky and context-specific argument about duplicativeness.

