20 Questions with Gary Flake, Head of Yahoo Research Labs

In an interview with Gary Price:

"The beautiful thing about a relational database is that its structure tells you a lot about what is important. Database designers have been brilliant at optimizing databases (both the organization of the information as well as the algorithms) to best exploit this regularity. When you flatten out a database, those paths towards optimization often aren’t available.
A middle ground, which is not perfect but adds a lot of utility, is to convert structured into a semi-structured form. Today, we treat documents as a big bag of words and index those words. In this semi-structured approach, we take structured information (say, the value of specific fields) and synthesize fake words that represent the fact that “document X has field Y with value Z.” Now, clearly I can’t run a SQL query on this representation; but at least I can search for documents with specific field:value pairs.
I’d like to tell you that we will be able to make an unstructured database as powerful as a structured database; but that simply is not the case. Nonetheless, the fusion of structured and unstructured data and approaches will add a lot of utility to the lives of most users."

Emphasis mine. On the one hand, it’s still frustratingly inefficient to look for information on the open, unstructured web. On the other hand, perfectly structured, metatagged content is a dream that’s not going to happen, so I fully concur with Gary Flake’s statement. Structure and tag what you can, know where to stop, and let the rest self-organize.
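
To make the "fake words" idea from the quote concrete, here is a minimal sketch of how it might look in a toy bag-of-words index. Everything in it is my own illustration, not anything Yahoo has described: the names `TinyIndex`, `synthetic_tokens` and `tokenize` are hypothetical, the `field:value` token format is an assumption, and a real search engine would do far more (ranking, stemming, proper tokenization).

```python
from collections import defaultdict


def tokenize(text):
    # Naive bag-of-words tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def synthetic_tokens(fields):
    # Turn "document X has field Y with value Z" into one fake word "y:z".
    # Whitespace inside a value is replaced with "_" so the fake word
    # survives whitespace tokenization on the query side (an assumption).
    tokens = []
    for name, value in fields.items():
        clean_value = "_".join(str(value).lower().split())
        tokens.append(f"{name.lower()}:{clean_value}")
    return tokens


class TinyIndex:
    """Hypothetical inverted index mapping each token to a set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, fields=None):
        # Real words and fake field:value words share the same index.
        for token in tokenize(text) + synthetic_tokens(fields or {}):
            self.postings[token].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add(1, "Cheap flights to Lisbon", {"type": "ad", "year": 2005})
index.add(2, "Lisbon travel diary", {"type": "blog", "year": 2005})
index.add(3, "Search engine notes", {"type": "blog", "year": 2004})

print(index.search("lisbon type:blog"))  # documents about Lisbon that are blogs
print(index.search("year:2005"))         # every document tagged with year 2005
```

This also shows the limit Flake points out. The index can answer "which documents have type:blog", but it cannot do a range query such as "year greater than 2004" or a join, because `year:2005` is just an opaque word to it.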
