The amount of information on the World Wide Web is enormous beyond belief. Recent estimates place the surface web (i.e., web pages accessible by search engines) at well over eleven billion pages, and the deep web (i.e., those that are not) at over half a trillion, with more content being added all the time.
One of the drivers of this growth is the recent phenomenon known colloquially as “Web 2.0”: interactive web content under the control of users rather than webmasters, such as the videos of YouTube and the social networking of MySpace, in contrast to the static pages that characterize the traditional “Web 1.0” Internet.
Regardless of which web you use, one glaring obstacle is readily apparent to anyone who has suffered through countless hours searching for a particular piece of information, namely: how does one effectively manage all that data? To date, two complementary tactics have been pursued: Web 1.0 takes a process-oriented approach through the use of search engines, while Web 2.0 takes more of a data point of view by using data tags.
Data tags consist of one or more keywords or phrases that are associated with some particular content to aid in locating it. For example, this article could be “tagged” with several keywords, such as “business architecture”, “Web 2.0”, “Krawchuk”, and so on. By their nature, tags are pure data, essentially a flat list of words and phrases.
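As a rough sketch of just how little structure that implies, a tagged piece of content can be represented as nothing more than a list of strings; the identifiers below are illustrative only, not taken from any particular tagging system:

```python
# A minimal sketch of Web 2.0-style tagging: the "data model" is just a
# flat list of strings attached to a piece of content. Names here are
# illustrative only.
article_tags = {
    "content_id": "business-architecture-article",
    "tags": ["business architecture", "Web 2.0", "Krawchuk"],
}

# The only question a flat tag list can answer is membership:
# a tag is either present or it is not.
print("Web 2.0" in article_tags["tags"])  # True
```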
Search engines, on the other hand, are pure process. Their role is to sift through mountains of data and, in an ideal world, directly locate the exact item being sought. Clever algorithms explore tags, content, and other web artifacts to improve the accuracy of the search. But regardless of their cleverness, it is ultimately the responsibility of the user to manually align the process of the search engine with the data of the tags to effect the match – and not always successfully.
This lack of an effective search capability is only one aspect of a much more serious shortcoming, namely, the lack of an effective data modeling capability to empower more-sophisticated tagging. As the web continues to grow, the problem of effective data management will become more and more pronounced until at some point it finally becomes crippling, essentially cutting off entire swaths of web content. The enormous quantity of data will simply become unmanageable; and, as can be seen in far too many cases, it already has.
To save the day, enter, stage right, our hero: business architecture. At its highest level, business architecture is defined as the set of independent processes of a system or enterprise and the relationships between them. It is modeled in two orthogonal dimensions, process and data, and while orthogonal, the two models form a tightly integrated whole. From a business architecture viewpoint, the root causes of the coming meltdown of Web 2.0 become clear, as does the solution.
Looking first at process, it is undeniable that search engines are drastically limited in scope; they are essentially ad hoc SQL SELECTs or MS-Access queries that lack even the most primitive record-matching capabilities those tools possess. Until search engines embrace a comprehensive process modeling capability, one of the hallmarks of business architecture, they will continue to return millions of unrelated hits; and until they allow for stored processes, they will remain tools for the web dilettante, not for the serious business user or manager of the burgeoning content of Web 2.0.
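To make the comparison concrete, the following is a minimal sketch, in Python and purely for illustration, of the kind of flat, ad hoc keyword matching being described: every query is an on-the-spot filter, and “matching” never rises above asking whether a word appears somewhere in the text.

```python
# Illustrative only: a "search engine" reduced to its essence, the
# equivalent of an ad hoc SELECT with a crude OR filter over keywords.
pages = {
    "page1": "business architecture aligns process and data models",
    "page2": "watch funny videos and tag your friends",
}

def naive_search(query: str) -> list[str]:
    """Return every page containing any word of the query.

    There is no stored process, no record matching, and no notion of
    whether the returned pages are actually related to the question.
    """
    words = query.lower().split()
    return [
        page_id
        for page_id, text in pages.items()
        if any(word in text.lower() for word in words)
    ]

# Both pages match, even though only one has anything to do with data:
print(naive_search("data videos"))  # ['page1', 'page2']
```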
But inflexible search engines do not bear all of the blame; data tags, too, claim their fair share. Chief among their deficiencies is that tags make no provision for any data model more complex than a simple list of keywords joined by Boolean operators. Also blatantly missing is the higher-level structure that captures inherent relationships: connecting first names to last names, street addresses to cities and states, and then connecting those structures to each other and to the content in question. Simple tagging provides no structure at all for the data, so it should come as no surprise that the unfortunate search engine indecorously produces millions of hits. It was designed that way.
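To illustrate the kind of higher-level structure being called for, here is a hedged sketch in Python, with hypothetical class and field names, of a tag that captures relationships (first name to last name, street address to city and state, and those structures to the content itself) instead of a bare keyword list:

```python
from dataclasses import dataclass

# Hypothetical structured "tags" that preserve the relationships a flat
# keyword list throws away. All class and field names are illustrative.

@dataclass
class PersonTag:
    first_name: str
    last_name: str

@dataclass
class AddressTag:
    street: str
    city: str
    state: str

@dataclass
class ContentRecord:
    content_id: str
    author: PersonTag      # structures connected to each other...
    location: AddressTag   # ...and to the content in question

record = ContentRecord(
    content_id="example-article",
    author=PersonTag(first_name="Jane", last_name="Doe"),
    location=AddressTag(street="123 Main St", city="Springfield", state="IL"),
)

# A search can now ask a structured question rather than matching keywords,
# e.g. "content whose author's last name is Doe and whose state is IL":
print(record.author.last_name == "Doe" and record.location.state == "IL")  # True
```

A flat tag list could tell a search engine only that the words “Doe” and “IL” appear somewhere; a structure like this can also tell it how they relate.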
Web aficionados are quick to point out that such higher-level process and data capabilities already exist in the form of Perl scripts, Java applets, objects, and similar arcane constructs. But these solutions are themselves a major part of the problem, for who but another geek could possibly understand their intricacies? To make the capabilities of these technical tools available to the masses, they must be replaced by data and process modeling techniques that are universally understandable, such as the easy-to-use graphical Business Architecture System; until they are, the situation will continue to worsen.
But business architecture holds the promise to avert the coming meltdown. By providing a clear, comprehensible, and concise method for non-technical users to model complex data tags and the processes that integrate with them, much richer, more robust catalogs, web searches, indexes, and more can be defined, designed, and developed – call it “Web 3.0”. But until business architecture rides to the rescue, successful and reliable data management will remain elusive at best, condemning the burgeoning Web 2.0 to eventually collapse under its own unimaginable, unmanageable weight.