Plowing through Unstructured Data

Jon Udell just ran into some of the same practical limitations on sharing and representing structured data that I’ve been running into myself. To produce an entry about Circuit City’s store closures, I had to spend a lot of time massaging the source data (which came as a PDF) in Excel so that it could be properly mapped and charted.

Tasks that add little value and should take 5 minutes easily balloon into hours of menial work spent renormalizing and restructuring data that should have been published as CSV or XML in the first place. “Fake” digital content, meaning data that is distributed electronically but locked in formats like PDF that software can't readily parse, is going to get in the way of publishers for the foreseeable future. The challenge is to optimize the workflow so that enhanced news coverage can be produced at a reasonable cost and in a reasonable amount of time. It’s all about making things replicable.
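As one illustration of what "replicable" could mean here, the manual Excel cleanup might be captured in a short script that can be rerun whenever the source document changes. The sketch below is in Python and assumes a hypothetical layout in which text copied out of a PDF puts one store per line, with columns separated by runs of two or more spaces. The column names, the `pdf_text_to_csv` function, and the sample rows are all invented for illustration and do not reflect the actual Circuit City document.

```python
import csv
import io
import re

# Hypothetical layout: one store per line, columns separated by runs of two
# or more spaces (a common artifact of copying text out of a PDF). Single
# spaces are kept, so multi-word city names survive the split.
COLUMN_GAP = re.compile(r"\s{2,}")
HEADER = ["store_id", "city", "state", "zip"]


def pdf_text_to_csv(raw_text: str) -> str:
    """Convert whitespace-aligned text into CSV, skipping non-data lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines left over from page breaks
        fields = COLUMN_GAP.split(line)
        # Keep only rows with the expected column count and a numeric ID;
        # this drops repeated page headers and "Page N of M" footers.
        if len(fields) != len(HEADER) or not fields[0].isdigit():
            continue
        writer.writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample text, not real store data.
    sample = """
    Store   City           State  ZIP
    0001    Springfield    IL     62701
    0002    Los Angeles    CA     90001
    Page 1 of 12
    """
    print(pdf_text_to_csv(sample))
```

The filtering rules (column count plus a numeric first field) are a guess at what a typical store list needs; a real source would almost certainly call for adjustments, but once written down they can be rerun in seconds instead of being redone by hand.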

