Making books accessible to as many people as possible is a core tenet of librarianship. And at the Coalition for Networked Information meeting in Seattle earlier this spring, I learned that librarians and data specialists at the University of California, Davis, are among those hard at work converting book metadata into a new format called BIBFRAME, which promises significantly better book discovery and accessibility than its predecessor, the MARC record.
For decades, MARC has been the metadata format libraries have used to catalogue information about books. But MARC is an old standard from the 1960s, built on the presumption that catalogue data would be processed on mainframe computers with scarce memory and storage. BIBFRAME, on the other hand, encodes data in a more modern and flexible form of expression called Resource Description Framework (RDF), which lets libraries work with catalogue data as if it were addressable on the Web. (When such data is published openly and linked to other data sets, it is called linked open data.)
Essentially, with RDF, every book gets its own URL, as do authors, publishers, and so on. The result is that users can browse a catalogue along many different vectors, searching for books by criteria of their own choosing. And here’s the kicker: if publishers were to start providing their data in RDF, libraries would no longer have to perform routine cataloging for newly released books. Their systems could simply ingest the data and add it to their collection, building on it as needed.
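To make the "every book gets its own URL" idea concrete, here is a minimal sketch in Python. It assumes a toy catalogue stored as RDF-style (subject, predicate, object) triples; the example.org URIs and the `prop/author`, `prop/publisher`, and `prop/title` predicates are hypothetical stand-ins, not BIBFRAME's actual vocabulary. Because authors and publishers are identified by URIs rather than text strings, the same small lookup function can browse the catalogue from either direction.

```python
# Hypothetical URIs for illustration only; example.org is reserved for examples
# and these are NOT real BIBFRAME identifiers or properties.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple, as in RDF.
# Objects are either URIs (links to other things) or plain literal values.
triples = [
    (EX + "book/pride-and-prejudice", EX + "prop/title", "Pride and Prejudice"),
    (EX + "book/pride-and-prejudice", EX + "prop/author", EX + "person/jane-austen"),
    (EX + "book/pride-and-prejudice", EX + "prop/publisher", EX + "org/t-egerton"),
    (EX + "book/emma", EX + "prop/title", "Emma"),
    (EX + "book/emma", EX + "prop/author", EX + "person/jane-austen"),
    (EX + "book/emma", EX + "prop/publisher", EX + "org/john-murray"),
    (EX + "person/jane-austen", EX + "prop/name", "Jane Austen"),
]


def subjects_with(triples, predicate, obj):
    """Return every subject linked to `obj` via `predicate`, sorted by URI."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def value_of(triples, subject, predicate):
    """Return the first object recorded for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


# Browse by author: follow the author link backwards to the books.
austen_books = subjects_with(triples, EX + "prop/author", EX + "person/jane-austen")
austen_titles = [value_of(triples, b, EX + "prop/title") for b in austen_books]
print(austen_titles)

# Browse by publisher instead: same function, different vector.
murray_books = subjects_with(triples, EX + "prop/publisher", EX + "org/john-murray")
print(murray_books)
```

A production system would use an RDF library and a query language such as SPARQL rather than list scans, but the principle is the same: any linked entity can serve as a starting point for a search.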
Think about this for a second. If publishers contributed descriptive book metadata to a publicly shared repository, then all a library would have to do is record that it owns or licenses the book and include a pointer to the relevant record in that database. Imagine what libraries could do as a community with that kind of data set: essentially a registry for all published literature.
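The registry model described above can be sketched in a few lines of Python. This is a hypothetical illustration, not any existing system: the `REGISTRY` dictionary stands in for the publicly shared repository, and the `Holding` record and its `call_number` field are invented for the example. The library stores only what it uniquely knows (that it owns or licenses the book, plus a local detail) and a pointer to the shared record.

```python
from dataclasses import dataclass

# Hypothetical shared registry: publisher-contributed descriptive records,
# keyed by each book's URI. The URI and structure are illustrative only.
REGISTRY = {
    "http://example.org/book/emma": {"title": "Emma", "author": "Jane Austen", "year": 1815},
}


@dataclass
class Holding:
    """A library's local record: only what the library itself knows."""
    record_uri: str   # pointer into the shared registry
    status: str       # e.g. "owned" or "licensed"
    call_number: str  # hypothetical local shelving detail


def describe(holding: Holding, registry: dict) -> str:
    """Dereference the pointer and combine shared and local data for display."""
    record = registry[holding.record_uri]
    return (f'{record["title"]} by {record["author"]} ({record["year"]}), '
            f"{holding.status}, {holding.call_number}")


h = Holding("http://example.org/book/emma", "owned", "FIC AUSTEN")
print(describe(h, REGISTRY))
```

Because every library points at the same record, a correction made once in the registry shows up in every catalogue that references it, with no local re-cataloging.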
Actually, you don’t have to imagine it: you can see a working prototype of this approach in an application developed by some of my former colleagues at NYPL Labs, the division of the New York Public Library working to reformat the library’s knowledge for the Internet age. Software engineers Matt Miller and Leonard Richardson took a sample of public domain e-books from Project Gutenberg and combined their descriptive metadata with information from DBpedia (a project that extracts structured data from Wikipedia and publishes it in RDF) to create an entirely new way of browsing and searching for books.
It’s just a prototype, but it already does a lot. For example, entering a word such as London into the NYPL app brings up suggestions based on the city itself, on works that include the word London, on events that involve London, and on people from London. Each of those suggestions can be used to launch new searches and queries. It’s still early days, but the prototype lets us imagine what an entire national library of data about books would look like, and how powerful it would be.
Librarians, of course, love this kind of work. But publishers should too. There’s a treasure trove of information about books within books themselves. And by mining the tremendous amount of information that is held in any one book, and combining it with information from other books, we can finally begin to grasp ideas and knowledge beyond the current reach of publishers, booksellers, and libraries.
The world of literature is vast, and there is so much we can do to enable readers to find, discover, and enjoy literature in ways that have never before been possible. RDF is just one of the ways that we can make book discovery more effective, and more fun.
The ALA Annual Conference will feature many programs on metadata, including an all-day RDA-themed event on Friday, June 26: the RDA Jane-athon (8:30 a.m.–4 p.m., MC 2024W). Drawing on the “hackathon” model, ALA Publishing and the RDA Development Team will come together to catalogue Jane Austen resources with software specifically designed to use RDA (Resource Description and Access, the cataloging standard). It should be a powerful crash course, to say the least.
PW columnist Peter Brantley is the director of online strategy at the University of California, Davis, Library.