Richard T. Snodgrass and M. Tamer Özsu.
A topic that has been debated continuously for several decades is whether Computer Science, or more specifically, database research, is engineering or science. Some wags have noted that fields with inferiority complexes try to enhance their image by including "science" in their names (cf. geoscience, environmental science, political science, social science, and yes, computer science), whereas real sciences (such as astronomy, biology, chemistry, physics, psychology) have no need to do so. We personally prefer the term "informatics", oft used in Europe, in part for this reason.
Our feeling is that database research has elements of both engineering and science, and that the best papers often include both in close proximity. "Soft" database papers on topics like data modeling and query languages could often be strengthened by appropriate user studies employing specific hypotheses to be tested, and "hard" database papers on topics like optimization techniques, novel indexes, and evaluation algorithms, which often are replete with measurements and graphs (more on that in a minute), could be enhanced with more discussion of their impact, deleterious or otherwise, on the complexity of the system.
David B. Lomet got to the core of this issue in an ICDE'94 panel on "The Impact of Database Research on Industrial Products," whose summary appeared in the September 1994 issue of SIGMOD Record.
"...Many researchers produce techno-nibbles. They begin with a small idea, which is then diced into several papers. Referees frequently fail to weed them out, giving excessive weight to novelty and having too much tolerance of complexity."
In other words, such papers present perhaps reasonable science, but ineffective engineering. (As a slight diversion, Rick has also discussed this phenomenon of techno-nibbles; his term is LPU: least publishable unit, SIGMOD Record, March 2001, Chair's Message.)
Another popular topic is noting the rate of technological advance, which everyone agrees is ever increasing, and observing when a quantitative increase becomes a qualitative advance. As but one example, Peter Mark Roget, in gathering what is considered a definitive list of synonyms in his Roget's Thesaurus (a copy of which we imagine that you, dear reader, have on your bookshelf), created something new and highly useful in the process. His thesaurus has been coupled with many word processors (Microsoft Word has its own thesaurus, developed by Bloomsbury Publishing); highlight a word, and you can instantly get a list of synonyms, or access Roget's Thesaurus directly.
An intriguing article in the May 2001 issue of the Atlantic Monthly by Simon Winchester baldly states,
"Roget's Thesaurus no longer merits the unvarnished adoration it has over the years almost invariably received. It should be roundly condemned as a crucial part of the engine work that has transported us to our current state of linguistic and intellectual mediocrity."
Winchester's argument is that properly educated writers already know many words and can distinguish the subtle shades of meaning among synonyms, which are often only roughly synonymous; such writers thus have little need of a thesaurus, whereas increasingly lazy wordsmiths turn to thesauri to find a fancy-sounding synonym that, as they would realize had they actually looked the word up in a dictionary, doesn't quite fit.
So, what does this have to do with the ACM SIGMOD Anthology? It hints at the several viewpoints from which this achievement can be approached.
When we started the Anthology four years ago, our goal was to make SIGMOD material (the SIGMOD and PODS conferences and the SIGMOD Record) available in digital form. We also hoped to convince a few other conferences to join this effort. We figured that, if we were lucky, the effort would result in a half-dozen CDROMs. It has been a gratifying and somewhat astonishing surprise that the Anthology now comprises 21 CDROMs (with more on the way!), with the present edition, which covers the last quarter-century, through 1999, containing more than 150,000 pages of scholarly papers.
Before the Anthology, that is, but a few years ago, most of this material was available only in print in research libraries. In fact, we doubt that any individual library has all of this material in its collection. The quantitative shrinking of 1500 pounds (almost 700 kg) and 50 linear feet (15 meters) of volumes to a binder half the size of a single volume, weighing less than 2 pounds (1 kg), is certainly a qualitative change. It is now possible to carry around the Anthology when one travels, or back and forth between office and home.
Another qualitative change is the ability to do full-text searches on this material, a task literally impossible with the printed version. Even the digital libraries now being constructed by all of the major publishers don't meet this need, because each library must be searched individually.
There are still some problems. Let's return to Lomet's critique of the scholarly process practiced by some database researchers.
"Referees are overly impressed with syntactically correct papers. These follow something close to the format: (i) introduction, (ii) background, (iii) main idea, (iv) analysis - sprinkled with equations, (v) performance results - sprinkled with graphs and tables, and (vi) a discussion explaining why the new technique crushes the prior methods. A bibliography cites the work of all likely referees."
Given this search capability, an author could do several keyword searches over the Anthology, and within a few minutes gather an impressive bibliography on the topic at issue. The Anthology is to a passive author's bibliography what Roget's is to a sloppy writer's imprecise and diffuse word usage. Neither serves scholarship when used inappropriately. An author should not cite a paper he or she has never read.
Another problem is that searching 21 CDROMs sequentially is an inconvenient bother. One solution is to copy the entire Anthology to hard disk, so that the Acrobat search tool can use all the index files at one time. However, that requires approximately 13GB of free disk space. Most notebooks have 10GB or less of total disk space, so that is not an option when traveling.
Enter the next quantitative/qualitative change: the Silver Anniversary Edition, Michael Ley's crowning achievement. (Note the reference to the twenty-fifth anniversary of a wedding, when silver is the proper gift. This anthology covers roughly the 25 years of database research spanning 1975-1999, with a few papers from before that period.) This edition puts everything on two DVD disks. Many laptops have DVD readers, and flipping between two disks is an order of magnitude less onerous than 20+ CDROMs. Plus the whole package weighs in at only five ounces (bits are getting lighter all the time), less than a fifth of the weight of the binder. So this version of the Anthology truly is portable.
This Anthology, properly used, can definitely aid writers. We searched for "syntactically correct papers" (a catchy phrase that Rick remembered from the panel, seven years ago) using Acrobat's search tool over the Anthology, as well as Google and CiteSeer. Only the Anthology was able to locate Lomet's paper, even though all three had access to the paper, since SIGMOD Record is freely available on the web. We had the same experience with the evocative term "techno-nibbles". The Anthology is qualitatively better at searching the core database corpus than several prominent alternatives.
We previously listed over 60 people who helped with the Anthology. There are many others who also helped out, including hundreds of authors who gave permission for material they wrote to be included. We again thank all of these people, who directly contributed to the Anthology. We also thank everyone who attended SIGMOD conferences: this project has cost SIGMOD about a quarter of a million dollars, raised from a decade of conferences. Additionally, we thank the many societies that paid over $100,000 to digitize this material, for use by the scientific community at large.
And finally, we once again express our deep gratitude to the Founding Editors of the Anthology and DiSC, Michael Ley and Isabel Cruz, for seeing this vision to reality, through literally thousands of hours of work by them and the Editorial Boards they managed. The qualitative change in scholarship that these two publications have enabled is their enduring legacy.
August, 2001