At long last, after 15 months of hard work by over 100 people (see below), volumes 2, 3, and 4 of the SIGMOD Anthology are now complete. As such, they are a little overwhelming: fourteen CDROMs chock full of books and papers from conferences, journals, and newsletters.
The SIGMOD Anthology was presaged by a much more ambitious undertaking: the collection of virtually all of Western literature at the Library at Alexandria. The Library was founded around 300 B.C.E. by Ptolemy I, a Greek king who inherited the Egyptian portion of Alexander the Great's empire. The Ptolemies devoted much of their wealth to acquiring every single Greek book, as well as works from Africa, Israel, and other parts of the world. These books, some 500,000 scrolls of papyrus and, later, parchment skins, included poetry, drama, criticism, philosophy, history, science, and medicine.
Just as the Library at Alexandria contained the classics of Archimedes, Aristotle, Euclid, Galen, Homer, Plato and Thucydides, the SIGMOD Anthology contains the classics of Bernstein, Chen, Codd, Gray, Maier, Selinger, Ullman, and many others. The original papers proposing the relational model, the ER model, transactions, query optimization, and B-trees are all here. There are also many obscure papers previously found only on a few dusty shelves.
There is another interesting connection between the Library at Alexandria and the Anthology. Ptolemy III requested from Athens the original manuscripts of the great tragedies of Sophocles, Aeschylus, and Euripides, to be copied and returned. The Athenians valued these manuscripts very highly, and parted with them only after Ptolemy insured them with an enormous cash deposit. However, Ptolemy gladly forfeited the deposit and returned only copies, retaining the originals for his library and infuriating the lenders.
Similarly, for the 100,000 pages that were digitized for this Anthology, we requested printed originals, to ensure a quality scan and accurate OCR. These printed copies were unbound before scanning, and then destroyed. In return, we provided a digital copy. But at least we made this exchange explicit from the outset. And so I am particularly grateful to those who donated their originals.
The Library was housed in two separate centers: the Royal Library near the harbor, and the Daughter Library, located south of the city. The Royal Library, with 40,000 volumes, burned in 48 B.C.E. when Caesar, finding himself involved in a civil war between Cleopatra and her brother Ptolemy XIII, set fire to the enemy fleet; this fire spread to the dockyards and then to the library. The Daughter Library flourished under the protection of the Sarapeum, which lost its sanctity as Christianity supplanted paganism. In 391 C.E. the Emperor Theodosius ordained the destruction of all pagan temples, and the Sarapeum, along with the library, was totally destroyed. Tragically, it is estimated that only about 10% of its holdings have survived to this day. As an example, of the 123 plays of Sophocles in the Library, only seven survived. All are copies; not a single physical scroll from the Library remains.
As Carl Sagan notes in his book Cosmos, near the site of the Alexandrian Library is a microwave relay tower, exemplifying the technology that will ensure that a similar fate does not befall the Anthology. SIGMOD has made some 5,000 copies, and has sent these copies all over the world. Replication and distribution are powerful mechanisms for fault tolerance and data integrity.
The first four volumes of the SIGMOD Anthology contain 123,500 pages in some 12,000 articles. What portion of all database papers does this represent? We can look at this question from several viewpoints.
DBLP currently contains bibliographic information on 180K papers, split between databases and logic programming. If we assume that half of these papers are database papers, then the Anthology contains about 13% of this total.
The DBLP conference index lists various computer science conferences and workshops (some no longer held; some one-time events). Of these, 70 are related to databases; the SIGMOD Anthology has the proceedings of 23, or a third.
The DBLP journal index lists computer science journals. Of these, 22 are broadly related to databases; the SIGMOD Anthology contains 3, or 14%.
The DBLP 'most frequently cited database publications' page lists the 100 most referenced conference and journal papers, from an analysis of 100K citations. The SIGMOD Anthology contains fully 80% of these, a surprisingly large portion.
I conclude that the Anthology contains perhaps 10-15% of all database papers ever published. But the distribution is skewed towards those that are cited heavily. My guess is that there are better than even odds that the next citation you encounter in your reading will refer to a paper in the Anthology.
This implies a corpus of about 100,000 database papers. It has been estimated using different data that about 1 million computer science papers have been written since the discipline came into being around 1940. That means that the database community has contributed roughly 1 in 10 computer science papers. The scrolls in the Library at Alexandria correspond to very approximately 4 million typeset pages, perhaps three times the size of the computer science corpus and about two orders of magnitude larger than the Anthology.
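As a rough check of the arithmetic in the preceding paragraphs, the ratios can be recomputed in a few lines of Python. This is only a sketch: every figure is taken from the text above, the 50% database share of DBLP is the assumption stated there, and the variable names are illustrative.

```python
# Figures quoted in the text above.
ANTHOLOGY_ARTICLES = 12_000
ANTHOLOGY_PAGES = 123_500
DBLP_PAPERS = 180_000
DATABASE_SHARE = 0.5          # assumption from the text: half of DBLP is database work
CS_PAPERS = 1_000_000         # estimated computer science papers since about 1940
ALEXANDRIA_PAGES = 4_000_000  # very approximate typeset-page equivalent of the scrolls

# The four viewpoints on how much of the database literature the Anthology covers.
coverage = {
    "DBLP database papers": ANTHOLOGY_ARTICLES / (DBLP_PAPERS * DATABASE_SHARE),
    "database conferences": 23 / 70,
    "database journals": 3 / 22,
    "100 most cited papers": 80 / 100,
}
for viewpoint, share in coverage.items():
    print(f"{viewpoint}: {share:.0%}")

# If the Anthology's articles are 10-15% of all database papers, the implied
# size of the whole corpus lies between these two bounds.
corpus_low = round(ANTHOLOGY_ARTICLES / 0.15)
corpus_high = round(ANTHOLOGY_ARTICLES / 0.10)
print(f"implied database corpus: {corpus_low:,} to {corpus_high:,} papers")

# Database share of all computer science papers, using the 100,000 estimate.
database_share_of_cs = 100_000 / CS_PAPERS
print(f"database share of CS papers: {database_share_of_cs:.0%}")

# Page-count ratio between the Library at Alexandria and the Anthology.
alexandria_vs_anthology = ALEXANDRIA_PAGES / ANTHOLOGY_PAGES
print(f"Alexandria / Anthology pages: about {alexandria_vs_anthology:.0f}x")
```

The 100,000-paper estimate sits in the middle of the implied range. The final ratio comes out at roughly 32, so "two orders of magnitude" should be read as a generous rounding of an already very approximate page estimate.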
The CDROMs you are viewing have occupied a good portion of Michael's life over the last three years. Without Michael, this project would have been inconceivable. It is primarily through his hard work that these documents, some of which were on the verge of being lost, are now available to us and to future generations of scholars. He has my heartfelt thanks for a job superbly done.
Richard T. Snodgrass
Tucson, November, 2000
The Anthology is a hybrid HTML/PDF publication. The bibliographic meta information is presented in HTML and available on the Web (http://dblp.uni-trier.de or http://www.acm.org/sigmod/dblp/db/welcome.html, ...) or on CDROM 4-4. All full-text documents are PDF files. To read or print them you should install the Acrobat Reader on your computer. For most files Acrobat Reader Version 3 should be sufficient; some documents on the 2000 volumes require Acrobat Reader Version 4. On CDROM 4-3 (directory AcrobatReader) you may find this software for some popular platforms.
Each of the CDROMs (except 4-4) comes with a full-text index of all PDF files of the issue. To use this index you have to start the Acrobat Reader as an application and NOT as a plugin of your Web browser. Unfortunately, the Acrobat Reader with the search option is still not available for the Linux operating system.
Statistics derived from the Acrobat Catalog log files give the exact document and page counts for the Anthology CDROMs:
| Vol/No | 1/1 | 1/2 | 1/3 | 1/4 | 1/5 | 2/1 | 2/2 | 2/3 | 2/4 | 2/5 | 2/6 | 2/7 | 3/1 | 3/2 | 3/3 | 4/1 | 4/2 | 4/3 | total | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PDF Pages | 2705 | 7175 | 5653 | 6764 | 5952 | 9391 | 8403 | 6587 | 6845 | 9678 | 6905 | 6731 | 8734 | 5725 | 3183 | 10550 | 7805 | 4714 | 123500 | 
| PDF Files | 376 | 684 | 553 | 910 | 689 | 672 | 679 | 765 | 816 | 968 | 842 | 858 | 363 | 302 | 272 | 465 | 1073 | 851 | 12138 | 
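The totals in the last column can be cross-checked against the per-CDROM figures. The following Python sketch copies the numbers from the table; the dictionary layout and variable names are illustrative only.

```python
# Per-CDROM counts copied from the table above: label -> (PDF pages, PDF files).
counts = {
    "1/1": (2705, 376),   "1/2": (7175, 684),  "1/3": (5653, 553),
    "1/4": (6764, 910),   "1/5": (5952, 689),  "2/1": (9391, 672),
    "2/2": (8403, 679),   "2/3": (6587, 765),  "2/4": (6845, 816),
    "2/5": (9678, 968),   "2/6": (6905, 842),  "2/7": (6731, 858),
    "3/1": (8734, 363),   "3/2": (5725, 302),  "3/3": (3183, 272),
    "4/1": (10550, 465),  "4/2": (7805, 1073), "4/3": (4714, 851),
}

# Sum each column across all 18 CDROMs.
total_pages = sum(pages for pages, _ in counts.values())
total_files = sum(files for _, files in counts.values())

# Average document length, and the CDROM holding the most pages.
pages_per_file = total_pages / total_files
largest_cdrom = max(counts, key=lambda label: counts[label][0])

print(f"{len(counts)} CDROMs, {total_pages} pages, {total_files} files")
print(f"average of {pages_per_file:.1f} pages per file; largest CDROM is {largest_cdrom}")
```

The sums agree with the totals printed in the table: 123500 pages in 12138 files, or roughly ten pages per document.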
During the ACM SIGMOD Conference 2000 I announced the availability of the XML-style records which are behind DBLP. A short article in the September 2000 issue of SIGMOD Record gives more details. A mid-November 2000 snapshot of the DBLP records is stored on CDROM 4-2 of the Anthology in the directory "dblpRecords".
The ACM SIGMOD Anthology and the joint volume of the Anthology with the IEEE Computer Society were the idea of Rick Snodgrass. Without his highly effective organizational skills and his diplomatic style the Anthology would not exist. The e-mails written and received by Rick for the Anthology nearly fill another CDROM. It has been a great experience to cooperate with him to make this project happen.
Michael Ley
Trier, November, 2000