Sometimes it feels like we’re swimming in an endless sea of digital content, and in a sense we are: I have more than a terabyte of media on my own local network (currently 16 IP devices!). But once you dive in, you realize that a lot of it is digital only on the surface and analog underneath. Examples range from mp3 files that are just a compressed sound wave rather than the original digital tracks, to scanned PDF files that you can’t search or index. Consider Wal-Mart’s annual reports from the mid-nineties: the PDFs are images, not text. Similarly, many government documents released after FOIA requests are just scans. What you see happening is digital sources dumbed down into digitized analog content. A lot of structure, metadata and meaning is lost in the process, and “print to PDF” as opposed to “save as PDF” makes a world of difference. I’m impressed by a number of ongoing efforts, from the Show Us a Better Way initiative in the UK to what sites such as Freebase are trying to accomplish, but we’re still a long way away from the universal data cube!
As a publisher, we’re starting to work on our own very modest contribution towards that elusive vision. An enormous amount of time is wasted within companies just gathering and aggregating market or industry data. You haven’t even started analyzing the data and you’re already exhausted by all the scrape-copy-paste-clean-massage-normalize work involved. This makes it hard to reach conclusive insights because the waters often remain muddied by apples mixed with oranges, and it’s not any better when companies are dealing with their own internal data.
We’re not going to go after this pain by trying to boil the semantic ocean (good luck with that to the start-ups in that field). Rather, we intend to put together tight data packages in our selected verticals. Trade publishing is stuck in the 80s for the most part, which helps explain the turmoil currently seen in companies such as Reed Elsevier, Penton or Cygnus. The print and events legacy is really hard for these guys to shake off. There’s a lot of value locked in there that’s just not delivered to business audiences in convenient ways. Data products tend to be published behind the firewall through expensive and complicated offerings. I’m not saying we have the answer, but I do think we “have the question” better than most.
Hopefully we’ll start fleshing out these ideas into actual products within the next 12 months. They’ve been baking for a while, from our Focus Article format at Defense Industry Daily to pretty much everything MarketingCharts.com is about. Now we intend to turn our sites into application/news hybrids (let’s face it, publishing charts in GIF format is just a stopgap), and that’s going to be a tough but fun ride. Now let me go back to shutting up and working on execution!