I’ve been a standards advocate my whole career. I base virtually all my consulting work on standards. But in that work I often see the results of thinking that just conforming to a standard means getting things right.
Some real-life examples:
• A publisher who has been getting XML from vendors for a decade and now realizes that it’s virtually useless because it’s so inconsistent.
• A publisher who, realizing that the content in ePub is XML, expects to get repurposable content as a by-product of getting ePubs but finds that it’s nowhere close to useful—and winds up redigitizing thousands of books.
• A publisher who divides the house’s enormous backlist of books among five vendors to get ePubs and winds up with a mess because no two of them did the work the same way.
• A publisher who has a well-regarded conversion vendor produce all of its ePubs, unaware that the inherent accessibility in the ePub model has been undermined because the vendor didn’t use any of the semantic structure provided by HTML; it didn’t even use <p> for paragraphs. Everything was the semantically neutral <div>.
Those are all true stories. Were those publishers cheated by their vendors? Absolutely not. The vendors that the first publisher used made sure their XML was valid to the NLM model, which was then the standard for scholarly content (now it’s JATS for journals and BITS for books). The vendors for the other ones all created ePubs that pass ePubCheck, the system that ensures an ePub is valid—in fact, the fourth one even conformed to the built-for-accessibility ePub 3, which specifies that content needs to use HTML markup.
Those vendors each did the work according to their standard procedures. Why? Because the publishers didn’t provide specs. They just said “give me NLM XML” or “give me ePub 3.”
This issue isn’t just about content markup; it’s a big problem for metadata, too. The most prominent example is ONIX. ONIX is a fantastic (but very rich and complex) way to provide metadata for the book supply chain. But recipients of ONIX metadata often complain that no two publishers use ONIX in exactly the same way.
NLM and JATS and ePub and HTML and ONIX (and many others) are important standards providing essential markup models. The trap? A model is not a specification. You need to say how to apply the model to your content, to address your needs and your business rules.
The model is necessary but not sufficient. Virtually nobody today creates purely proprietary content markup specs (though proprietary metadata schemes are still common). Almost all good specs are based on standard models, as they should be. Models enable interoperability. That’s why JATS is so universal in the scholarly world, why ePub is used for virtually all e-books, and why book metadata should be ONIX.
But those models are designed to accommodate a broad range of content, and publications with a broad range of users and uses, within the domain they’re created for. They provide far more features than any one publisher would ever use. They also provide ways to express publisher-specific things, as well as alternative ways to express a given thing. That’s why the <div>s in those inaccessible ePubs were perfectly adequate HTML, even though they resulted in terrible ePubs: assistive technology, such as screen readers, looks for <section>s with <p>s and properly nested <h1>–<h6> headings (and a lot more).
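To make the gap between model and spec concrete, here is a minimal sketch in Python of the kind of check a publisher's own spec might add on top of validity. The two house rules (paragraphs must use <p>, headings must not skip levels) and the names SpecChecker and check_against_spec are assumptions invented for illustration; they are not part of ePubCheck or any standard, and a real spec would cover far more.

```python
from html.parser import HTMLParser


class SpecChecker(HTMLParser):
    """Records how often each tag occurs and the order of heading levels."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.heading_levels = []

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1
        # h1 through h6 only; this skips tags such as <hr> or <head>
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))


def check_against_spec(markup):
    """Return a list of violations of the hypothetical house rules."""
    checker = SpecChecker()
    checker.feed(markup)
    problems = []
    # Hypothetical rule 1: paragraphs must be marked up with <p>
    if checker.counts.get("p", 0) == 0:
        problems.append("no <p> elements: paragraphs are not marked up")
    # Hypothetical rule 2: a heading may be at most one level deeper
    # than the heading before it
    previous = 0
    for level in checker.heading_levels:
        if level > previous + 1:
            problems.append(
                f"heading h{level} skips a level (previous level: {previous})"
            )
        previous = level
    return problems


# Both samples are acceptable HTML; only one meets the house rules.
div_only = "<div>Chapter 1</div><div>It was a dark night.</div>"
structured = "<section><h1>Chapter 1</h1><p>It was a dark night.</p></section>"

print(check_against_spec(div_only))
print(check_against_spec(structured))
```

The point of the sketch is that both samples would be accepted by a validator for the model, while only the second satisfies the rules the publisher actually cares about. Those rules have to be written down somewhere before anyone can check them.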
A good specification helps put an end to any misunderstandings. Creating a spec makes publishers think carefully about their content: What are its component parts? Which things are important? Which things go with which other things? How should they be organized? What markup will add value to your content and make it more useful and future-proof?
A specification documents all those decisions. Now the publisher, suppliers, distributors, and other partners can all speak the same language. This helps make publications work and play well together. And it helps the services and systems that receive that content understand what they’re getting.
One more thing: pay attention to industry best practices. In the past, these have just been considered to be generally known, but increasingly they’re being documented—often by the very organizations that govern the models. They recognize that there are many ways to implement their models, but they also recognize when consensus has developed between creators and consumers of the content. They want to help folks get things right. They want things to work.
Doing things in proprietary or inconsistent ways creates friction in the supply chain—from editorial and production workflows to delivery to clients and customers. A good standards-based specification makes things easier for everybody. A model is not a spec!
Bill Kasdorf is principal at Kasdorf & Associates, a consultancy focusing on accessibility, information architecture, and editorial and production workflows.