Software, Digital Content, Geopolitics, Economics & More from a Libertarian Serial Expat and Entrepreneur
In: search engines
3 Jun 2004
Gary Flake, in an interview with Gary Price:
"The beautiful thing about a relational database is that its structure tells you a lot about what is important. Database designers have been brilliant at optimizing databases (both the organization of the information as well as the algorithms) to best exploit this regularity. When you flatten out a database, those paths towards optimization often aren’t available.
A middle ground — which is not perfect, but adds a lot of utility — is to convert structured into a semi-structured form. Today, we treat documents as a big bag of words and index those words. In this semi-structured approach, we take structured information (say, the value of specific fields) and synthesize fake words that represent the fact that “document X has field Y with value Z.” Now, clearly I can’t run a SQL query on this representation; but at least I can search for documents with specific field:value pairs.
I’d like to tell you that we will be able to make an unstructured database as powerful as a structured database; but that simply is not the case. Nonetheless, the fusion of structured and unstructured data and approaches will add a lot of utility to the lives of most users."
Emphasis mine. On the one hand, it’s still frustratingly inefficient to look for information on the open, unstructured web. On the other hand, perfectly structured, metatagged content is a dream that’s not going to happen, so I fully concur with Gary Flake’s statement. Structure and tag what you can, know where to stop, and let the rest self-organize.
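To make the "fake words" idea from the quote concrete, here is a minimal Python sketch of how it might work. It is only an illustration of the principle, not how any particular search engine does it: the `__field__value` token scheme, the function names and the three sample documents are all made up for this example.

```python
from collections import defaultdict


def field_token(field, value):
    """Synthesize a 'fake word' meaning 'this document has field=value'.

    The __field__value naming scheme is an assumption for this sketch.
    """
    return f"__{field.lower()}__{str(value).lower().replace(' ', '_')}"


def tokenize(text):
    # Plain bag of words: lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        tokens = tokenize(doc["text"])
        # Structured fields are indexed as extra synthetic words.
        tokens += [field_token(f, v) for f, v in doc.get("fields", {}).items()]
        for token in tokens:
            index[token].add(doc_id)
    return index


def search(index, words=(), fields=None):
    """Return ids of documents matching ALL given words and field:value pairs."""
    terms = [w.lower() for w in words]
    terms += [field_token(f, v) for f, v in (fields or {}).items()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, for illustration only.
DOCS = {
    1: {"text": "quarterly report on search engines",
        "fields": {"author": "Smith", "year": 2004}},
    2: {"text": "search engines and databases",
        "fields": {"author": "Jones", "year": 2004}},
    3: {"text": "database optimization notes",
        "fields": {"author": "Smith", "year": 2003}},
}
INDEX = build_index(DOCS)
```

With this, `search(INDEX, words=["search"], fields={"author": "Smith"})` mixes a free-text word with a field:value pair in a single lookup. What you cannot do is exactly what the quote says: there are no joins, ranges or aggregates, so a query like "all documents with year greater than 2003" is out of reach without further machinery.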
I'm CEO of an online trade publishing firm in the marketing and defense verticals. We try to make news and data digestible and useful in an environment that gets noisier each day. This personal blog mixes my thoughts and interests on politics, business, software, and more, based on my business and personal experiences. Over the years I have posted items that turned out spectacularly wrong, and a few posts that stood the test of time better. Personal views only.