Books and Browsers: Privacy for Digital Library Patrons

Patron privacy has long been a concern for libraries and library users in the digital age. But in light of recent events (for example, reports that Adobe has been collecting data on users of its Digital Editions (ADE) app), it’s become a red-hot issue, and several articles and essays in the library press in recent weeks have pointed to the importance of safeguarding user privacy in the digital realm. Generally, these essays assert that public libraries are among the last protectors of privacy in contemporary society, and that, as librarians, we must redouble our efforts to offer our users a refuge in an age marred by data breaches and surveillance.

I think there is a lot of bunk in the emerging polemic, however. I know—for many librarians those are fighting words. But to cling to an ideal of the public library as an assured safe haven does a disservice both to our users and to our libraries. In today’s digital world, libraries cannot guarantee the absolute privacy of our users. But, more importantly, for our own purposes, we shouldn’t want to.

Like most modern Web-based organizations, libraries today use cloud-based services, open Web stacks, and externally hosted databases, all of which rely on a range of third parties that are out of our control. Thus, to tell any user that the data generated by his or her library transactions is secure is misleading at best.

But the deeper reality is that libraries today must view the collection and use of user data not as an ethical breach but as an imperative. If we are to stay relevant to our communities, we must develop and provide services that offer the same kind of fluidity and personalization that today’s data-rich commercial platforms offer, and that our time-pressed and information-overloaded customers clearly desire.

Libraries today are far removed from the sheltered monasteries that once chained books to benches, and our increasingly tech-savvy users understand this. They also understand that we must put their data to work in order to serve them better. But with data collection also comes responsibility. And to keep the trust of our users, we must consistently demonstrate that we are protecting them as best we can, and we must educate and inform them about what happens—or what might happen—with their data.

Are You Secure?

To begin with, libraries need to address the most egregious points of data loss, privacy invasions, and data insecurity in our services. As librarians, we cannot condone the sloppiness with which we have gleefully set Google Analytics to work, or our ignorance of the advertising networks that silently inhabit our user relationships.

In his blog Go to Hellman, Eric Hellman has documented the “leakage” of user data from common Web tools that libraries employ. Often unknowingly, hidden tools can result in the setting of cookies, trackers, and beacons across our websites. Such persistent cookies and trackers can, for example, associate a user’s Amazon searches with those on his or her library’s website. No one should be shocked at this vulnerability, of course—anyone who has ever searched for a gift on Amazon has probably, at some point, been unnerved to see related ads suddenly appear on the local newspaper’s website or on other Web pages.

Gary Price of InfoDocket has also been a leader in using tools like Ghostery to demonstrate the ease of plucking these everyday queries from our network connections. Ghostery is a browser extension that allows you to see otherwise-invisible trackers that are monitoring your browsing habits—indeed, it can be an eye-opening experience to see who is following your Internet activity. Sure, sometimes such tracking can be useful, but it can also be dangerous. Unencrypted searches for police brutality, in concert with queries on tear gas defensive measures, may not be the kind of thing you want surveillance tools to pick up as you sip your latte.
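For libraries that want a rough first look at their own pages before installing anything, the same idea can be sketched in a few lines of Python. This is only an illustration, not how Ghostery works: it assumes you already have a page's HTML in hand, the names `ThirdPartyFinder` and `third_party_hosts` are made up for this example, and it sees only resources referenced directly in the markup, not trackers that scripts load later.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ThirdPartyFinder(HTMLParser):
    """Collect the hosts of resources a page loads from outside the library's own domain."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # Only look at tags that commonly pull in remote content.
        if tag not in ("script", "img", "iframe", "link"):
            return
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            host = urlsplit(value).hostname  # None for relative URLs
            if host is None:
                continue
            is_first_party = (
                host == self.site_host or host.endswith("." + self.site_host)
            )
            if not is_first_party:
                self.hosts.add(host)


def third_party_hosts(html, site_host):
    """Return a sorted list of outside hosts referenced by the given HTML."""
    finder = ThirdPartyFinder(site_host)
    finder.feed(html)
    return sorted(finder.hosts)


# Hypothetical page fragment; the domains are placeholders.
sample_page = (
    '<script src="https://www.google-analytics.com/analytics.js"></script>'
    '<img src="/logo.png">'
    '<img src="//tracker.example.net/pixel.gif">'
    '<script src="https://catalog.examplelibrary.org/app.js"></script>'
)
found = third_party_hosts(sample_page, "examplelibrary.org")
```

Every host in the resulting list is a party that learns something about a patron's visit, and each one deserves a deliberate decision about whether it belongs on the page.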

If we are to maintain the trust of our users, it is incumbent upon libraries to restrict the scope of such unknown tracking as best we can. One of the most basic things we can do to reduce data leakage is to turn on secure Web connections through HTTPS by default. This at least will prevent someone in a cafe from snooping on Web search transactions against the library catalogue or licensed databases.
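What "secure by default" means in practice can be sketched briefly. The Python below is a minimal illustration, not a drop-in configuration for any particular catalogue or Web server: the function names `https_redirect` and `hsts_header` are invented for this example, and it assumes the secure site answers on the standard HTTPS port. The first function turns any plain-HTTP request into a permanent redirect to the secure address; the second builds the standard `Strict-Transport-Security` header, which tells browsers to skip plain HTTP on future visits.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect(url):
    """Return (status, headers) redirecting an absolute http:// URL to https://.

    Returns None when the URL is not plain HTTP or has no host,
    meaning no redirect is needed or possible.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        return None
    # Drop any explicit port (assumption: HTTPS is served on the default port 443).
    secure_url = urlunsplit(
        ("https", parts.hostname, parts.path, parts.query, parts.fragment)
    )
    return 301, [("Location", secure_url)]


def hsts_header(max_age_days=365):
    """Build the header a server sends over HTTPS to make browsers stay on HTTPS."""
    return ("Strict-Transport-Security", f"max-age={max_age_days * 86400}")


# Hypothetical catalogue address, used only for illustration.
redirect = https_redirect("http://catalog.example.org:80/search?q=privacy")
```

Most Web servers and hosted platforms offer an equivalent setting, so in many cases this is a configuration change rather than new code.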

And the seemingly daily press reports about data breaches should remind us that there are vast numbers of user records in vendor-hosted databases in third-party network co-location sites. The possibility of waking up to a headline that a library’s patron data is available for download on a torrent site is chilling. A recent hack against the Wyoming State Library’s catalogue system revealed the vulnerability of our ILS data stores, which are rarely written in an encrypted format.

At a panel at last month’s member meeting for the Coalition for Networked Information (CNI), Marshall Breeding documented the lack of library vendor data security measures. Yet the most basic, common-sense safeguards would have, at most, a negligible impact on the performance of these vendors’ products. So what are we waiting for? Again, we cannot provide guarantees; no one can. But this is a point of vulnerability that we can, and should, actively diminish.

The Power of Data

At the same time, there is no denying that tracking and data collection are now part of the fabric of the contemporary Web. Let’s not pretend that we can eliminate them entirely. Rather, our responsibility should be to educate our users about the kinds of data that will inevitably leak out from their library’s Web services, and the potential consequences of that leakage. And, importantly, we should also disclose our own desire to create services that may harvest this same kind of data, with the user’s full knowledge and consent. For all the risks and threats, let’s also educate our users on how responsible data collection benefits them.

As I wrote in my column last month, NYPL and other libraries are beginning to establish location-based apps that ask users to reveal where they are in the city so we can better serve them. Want to know the hours of the library closest to you? Sharing your location will help us tell you. And single sign-on systems, which are already commonplace in customer-facing sites ranging from those of airlines to media outlets, allow libraries to link your activity across the breadth of digital and physical library interactions (for instance, when you swipe your library card at a branch or log in to a library website), offering significant convenience.

For decades, libraries’ efforts to prevent privacy breaches entailed throwing away user data as soon as possible, as a matter of policy. Now, the risk of social irrelevance requires us to use that data to develop services that will ensure our continued viability in the digital age. To do this, we have to establish the baseline data collection and privacy expectations users must accept to take advantage of Web-based library services, while educating them about other services they can opt in to if they allow additional data harvesting.

For libraries, that’s the challenge: to mitigate the dangers of data collection for our users (dangers we can limit but not fully control), while balancing our data collection practices with our users’ privacy expectations, in order to enable valuable, innovative new services. It won’t be easy. But it is necessary.

Libraries must address users’ privacy concerns while also collecting the data needed to support new services.