Like many people across the business world, publishers have a skeptical view of so-called Big Data, seeing it as “the next big hype.” This is especially understandable in that book publishing has always been a “gut business,” as Ken Brooks, senior v-p, global supply chain management, at McGraw-Hill Education, puts it. Data-driven decision making is simply 180 degrees away from a business model based on intuition.

Business has always run on data; in publishing, sales data has always been important to decision making. But beneath the hype about new data, is there new value? Just in the last six months, Big Data has been the cover story in publications as diverse as Science, Foreign Affairs, and Harvard Magazine (not to mention scores of IT trade magazines). Has the time come for publishers to focus on data and analytics, which are proving to be powerful business tools in this era of digital transformation?

Chantal Restivo-Alessi, chief digital officer, HarperCollins, has an answer. “Of course we are having to make investments in a new area. It is difficult convincing the business to do this so early on. So we put financial skin in the game from center down, to make the importance concrete. And, fortunately, our CEO sees the value.”

As the global COO of Macmillan Science and Education, Ken Michaels, puts it, the goal of collecting and analyzing data is to be able to “chart better strategic business objectives, improve the effectiveness and efficiency in all parts of the business, including developing better products and audience outreach, enhancing how we market, even one to one [marketing].”

Perhaps the best way to understand Big Data is through the lens of social media. At first, Facebook seemed like a cool way for college kids to connect online, and Twitter… only 140 characters? Really! It took a while for publishers to take social media seriously, and the arrival of Big Data has only continued to validate its utility. In the aggregate, Facebook and Twitter (and hundreds of other social network systems) now provide an enormous amount of information about what customers are thinking, trends in buying patterns, and problems with products. As the WordPress Codex, the online manual for the WordPress blogging system, contends, “loyalty” as expressed by social media support for known authors appears to be a far better predictor of new book sales than even the fame of the author.

Similarly, Restivo-Alessi notes that the “beauty” of publishing over other industries is that publishing “has a lot of product. This can actually help in the day-to-day work of making convincing little case studies that grow the number of internal advocates.” Such is the language of the data miners, but the numbers are welcome.

Just as consumer product companies are now monitoring these streams of data, the same is starting to happen in publishing. “For the first time, we have a chance of understanding what the ultimate consumers are doing with our content,” Restivo-Alessi says. “Not only primary data about heavy users, but now secondary data, that big universe of readers who are not engaged, or are lightly engaged.”

With the increasing ability to store and analyze huge flows of data (not just from social media but also from sensors, live video feeds, YouTube, Google searches, email, and the rest of our ever-growing connectivity), the commercial potential of Big Data rises exponentially. Benefits include supporting in-store sales of physical books, improving discovery, managing shelf space and inventory, and marketing directly to customers in real time. All of this would have seemed inconceivable a decade ago.

Scientific publishers are actually publishing Big Data itself, making available huge data sets that come from anywhere: Mars, the Hubble telescope, or virtually any scientific research. Timo Hannay, managing director of digital science businesses at Macmillan (whose online data “imprint” is Figshare.com, where scientists can post their research data for free), notes that this not only allows for greater scientific transparency but also provides a treasure trove of experimental data for other scientists.

Similarly, Big Data provides a huge upside in discovery, sales, and delivery of digital content, as U.S. publishers expand the market for English-language materials in emerging economies. Nielsen states that there are two billion people around the world for whom English has become a second language. One important caveat that commercial companies of all stripes are learning: those populations need and use products in ways that are very different from the ways U.S. customers do.

The key questions about data and analytics, Restivo-Alessi says, involve what kinds of data are relevant, and what kinds of insights can be derived from them in order to take action. An executive with a household-name energy company asserts that Big Data can be understood as a series of Vs:

• volume—how much data are you collecting: megabytes, terabytes, petabytes?

• velocity—how fast is it coming: hours, minutes, seconds, nanoseconds?

• variety—what sources are you using: social media, email, searches, sensors?

• variability—are the important flows continuous, or does it depend on time of day, week, month? And perhaps the most important,

• visualization—how are you analyzing, displaying, and sharing the data across the company to best tell the “story,” break down corporate silos, and thus derive maximum insights and business usefulness?

In higher education, data and analytics have become a mission-critical reality for major publishers. They underpin platforms that combine subject matter, digital learning enhancements, and metrics on how students are interacting with the subject and how well they are learning the material, and that then connect all these results to the universities’ information systems. In fact, companies across the business spectrum are realizing that the successful use of data and analytics requires “a different breed of cat” in terms of employee skills. Brooks reflects that, for some time now, “Editors have had to understand the technology perspective just as technologists have had to understand that of subject matter and pedagogy.” Add the ability to incorporate data and analytics to the mix, and you can understand why the “upheaval in higher-ed publishing” continues.

But benefits appear to outweigh “upheavals.” Restivo-Alessi points to the importance of using data and analytics on authors’ behalf, to build a brand and “take authors into a journey beyond ‘title.’ Of course, as with building any brand, this takes time.”

Big Data may seem to be another big hype, but so did the Internet in the late 1990s and social media a few years later. To be sure, this represents yet another challenge and expense for publishers. “It has its bumps, its ups and downs,” Restivo-Alessi admits, “but we are committed. This is an on-going process and we are going to stick with it.”

As Ken Brooks exclaimed at the Digital Book World Conference in January, when asked for a final comment after the Big Data panel discussion: “Just do it!”