On the Tyranny of Keywords

When I started my metadata career, at the dawn of mass Web media in 1995, I was immediately confronted with the chaos of an uncontrolled vocabulary. According to the W3C’s working draft on its Semantic Web Activity, a fundamental, founding principle of the Web is that “anyone can say anything about anything.” And if anyone can say anything about anything, everybody will.

Take, for example, the sandwich. A hoagie is a grinder is a sub is a po’boy is a hero, depending on where you live in the U.S. We use different words to refer to the same thing—and when the Web was new, it was not smart enough to understand that. (Now, of course, it’s considerably more sophisticated.) If you typed hoagie into a search engine, you would not find any references to sub. If you typed in tennis shoes, you would not see references to sneakers or kicks or trainers.

Keywords, in the form of “folksonomies” or tag clouds, made this abundantly clear. And as someone who spent considerable time in BISAC category meetings, helping the book industry reach consensus on what to call the subjects books can be about, I found that keywords rankled me.

I found a lot of joy in creating systems that organized the unruly world of books. The BISAC taxonomy is the product of enormous cooperation—publishers, booksellers, wholesalers, and other book industry stakeholders sit around a table (well, these days, much is done via Webex) and debate vocabulary words. How to describe different Bible editions, whether a book should be classified as occult; new age; or body, mind, and spirit—the sheer variety of people in the room ensures that we have a living body of descriptive terms that evolves as our cultural vocabulary evolves. Controlled vocabularies—BISAC, Library of Congress subject headings, and other mutually agreed-upon descriptive systems—are the result of humanity at its most social. “Here are the things that everybody who works in this business agrees upon.” That’s quite an accomplishment.

Keywords, on the other hand, are individualistic, unstandardized, unpredictable, and meant to break patterns (to make something stand out) rather than create them. I’ve spent so much of my career playing Whac-a-Mole with metadata errata that this kind of independent tagging went against everything I stood for.

Any system developed by humans will eventually be gamed. As a species, we are really good at figuring out what the rules are and circumventing them for our own benefit. Just as, at the beginning of the Web, people would insert popular search terms such as Oprah or Britney Spears or sex into the metatags of websites that had nothing to do with any of those things, so too will keyword lists be polluted with popular search terms that have nothing to do with the content of the book in question.

There are safeguards against this. Amazon will accept hundreds of keywords from established publishers but only a few from self-publishers. Barnes & Noble is looking to institute specific keyword rules as well. Eventually, keywords will become so systematized that something else will have to be used to point big neon arrows of attention at particular titles.

But for now, keywords are the tool for differentiation. So, of course, businesses are being built around applying them. Kadaxis, a company founded by Chris Sim (formerly of Bookish) that uses machine learning to generate book metadata, has a great algorithm for text mining, applied not to the book but to its customer reviews. For Sim, the question is: How are people describing this book? By using those terms as keywords, he steers searches toward specific titles. His results are impressive; backlist books are probably the main beneficiaries.

But keywords alone don’t raise a book through the rankings. I’ve learned, by collaborating with self-published authors on keyword optimization, that sales rankings are the ultimate arbiter. If your book doesn’t sell more copies within a day or so after keyword optimization, it’s relegated to the last pages of search results.

I’ve learned to live with keywords. As the Web has gotten smarter, it has also gotten bigger: Bowker says its database held 38 million books as of 2016, so any given BISAC category is going to have thousands, if not millions, of books competing within it. Keywords are necessary to fine-tune the signal against all that noise, provided a book’s sales have met the retailer’s threshold. Like any other tool, they do good in the hands of people with good intentions.

Laura Dawson, CEO of Numerical Gurus, is a book supply chain consultant. She also facilitates Metadata Boot Camp, a webinar series tackling metadata issues in publishing.

Keywords are unruly, unpredictable, and—for now—necessary