
Review - EXPRESS: A Data EXtraction, Processing, and REStructuring System.

Philip A. Bernstein: Review - EXPRESS: A Data EXtraction, Processing, and REStructuring System. ACM SIGMOD Digital Review 1: (1999)

Review

One of the first hot topics in the database research field, in the early 1970s, was data translation. In fact, before SIGMOD gained its name in 1975, it was called SIGFIDET, for "FIle DEscription and Translation". Data translation is the problem of cleaning and reformatting data when moving it from one application suite to another. In the late 1990s, this problem has seen a resurgence of interest, for example in loading a data warehouse from data sources and in mapping data between formats offered by different web sources. However, judging from the references in papers on the latter topics, the original work on data translation seems largely forgotten. This is a shame. Many of the best database researchers of the period worked on this problem, and there is still much to be learned from their papers.

The EXPRESS project at IBM Research was one of the foremost data translation projects of its day, and its results are still worth studying today. This paper is the best overview of EXPRESS. It summarizes the format definition language, called Define, and the data transformations, called Convert (see also [2,3]). Then it describes the EXPRESS execution engine in some detail.

The first phase of execution is a PL/I reader program that the Define compiler generates from a Define description. The reader parses the input stream into the described structures and checks them for consistency with the definitions. The second phase is a set of Convert operation procedures that the Convert compiler generates from a Convert program. Among other things, the run-time works hard to pipeline the operations as much as possible: it analyzes the program and generates a schedule for processing the operators and for moving data sections between them with a minimum of data copying. The paper gives a lot of implementation detail here, focusing particularly on how to make the translation run fast.
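For readers who have not seen this style of system, the two-phase shape can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EXPRESS itself: the field description format, the names make_reader, select, rename, and demo, and the sample records are all hypothetical, and nothing here is Define or Convert syntax. Generators stand in for pipelining, since each record flows through every operator without an intermediate copy of the whole data set; the schedule analysis that EXPRESS performs is not modeled.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical description of a fixed-width record: (field name, width, type).
# This is an illustrative stand-in, not Define syntax.
FieldSpec = Tuple[str, int, type]
Record = Dict[str, object]


def make_reader(spec: List[FieldSpec]) -> Callable[[Iterable[str]], Iterator[Record]]:
    """Phase 1: build a reader from a description.

    The returned reader parses each input line into the described fields
    and checks the line against the description as it goes.
    """
    total = sum(width for _, width, _ in spec)

    def reader(lines: Iterable[str]) -> Iterator[Record]:
        for lineno, line in enumerate(lines, start=1):
            line = line.rstrip("\n")
            if len(line) != total:
                raise ValueError(
                    f"line {lineno}: expected {total} characters, got {len(line)}"
                )
            record: Record = {}
            pos = 0
            for name, width, typ in spec:
                raw = line[pos:pos + width].strip()
                record[name] = typ(raw)  # raises ValueError on a type mismatch
                pos += width
            yield record

    return reader


def select(pred: Callable[[Record], bool], records: Iterable[Record]) -> Iterator[Record]:
    """Phase 2 operator: pass through only the records that satisfy pred."""
    for record in records:
        if pred(record):
            yield record


def rename(mapping: Dict[str, str], records: Iterable[Record]) -> Iterator[Record]:
    """Phase 2 operator: rename fields according to mapping."""
    for record in records:
        yield {mapping.get(key, key): value for key, value in record.items()}


def demo() -> List[Record]:
    """Run the reader and two operators as one pipeline over sample data."""
    spec: List[FieldSpec] = [("id", 3, int), ("name", 5, str), ("dept", 2, str)]
    lines = ["001Alice10", "002Bob  20", "003Carol10"]
    reader = make_reader(spec)
    pipeline = rename(
        {"dept": "department"},
        select(lambda r: r["dept"] == "10", reader(lines)),
    )
    return list(pipeline)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because every stage is a generator, a record is parsed, filtered, and renamed before the next line is read. That is the simplest form of the copy-avoiding behavior described above; a real scheduler would also have to handle operators that need to see more than one record at a time.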

The data translation research field largely died down after EXPRESS, so the paper is effectively a checkpoint on the entire line of 1970s data translation research. If you're interested in doing a more thorough study of that work, the paper's bibliography points to other major data translation efforts of the period.

Copyright © 1999 by the author(s). Review published with permission.


References

[1] Nan C. Shu, Barron C. Housel, Robert W. Taylor, Sakti P. Ghosh, Vincent Y. Lum: EXPRESS: A Data EXtraction, Processing, and REStructuring System. ACM Trans. Database Syst. 2(2): 134-174 (1977)
[2] Barron C. Housel, Diane C. P. Smith, Nan C. Shu, Vincent Y. Lum: DEFINE: A Non-Procedural Data Description Language for Defining Information Easily. ACM Pacific 1975: 62-70
[3] Nan C. Shu, Barron C. Housel, Vincent Y. Lum: CONVERT: A High Level Translation Definition Language for Data Conversion. Commun. ACM 18(10): 557-567 (1975)
