@inproceedings{DBLP:conf/vldb/Tompa76, author = {Frank Wm. Tompa}, editor = {Peter C. Lockemann and Erich J. Neuhold}, title = {Choosing an Efficient Internal Schema}, booktitle = {Systems for Large Data Bases, September 8-10, 1976, Brussels, Belgium}, publisher = {North Holland {\&} IFIP}, year = {1976}, isbn = {0-7204-0546-7}, pages = {65-77}, ee = {db/conf/vldb/Tompa76.html}, crossref = {DBLP:conf/vldb/76}, bibsource = {DBLP, http://dblp.uni-trier.de} }
The internal schema for a data base should not be concocted in an ad hoc manner, but rather result from an algorithmic design. This paper describes a methodology for choosing an efficient internal schema for a data base, given the conceptual schema and a target machine. It is assumed that the conceptual schema is composed of abstract data types which have associated "activity profiles" reflecting expected usage (e.g., expected set sizes and relative frequencies for insertions and deletions).
The methodology requires the construction of a library of permissible realizations for each abstract data type which may be used in any conceptual schema. Each member of the library is a cluster of code implementing the defined operations for one particular representation of a data type. (These clusters, which are invoked when a program accesses a data base, may be regarded as either open or closed subroutines.) Using the target machine's functional characteristics, a translator compiling the code can automatically generate parametric formulas which represent the expected run time and storage space used by the data that is to be manipulated by each cluster.
The application of the methodology itself starts with the construction of an evaluation matrix, for which the entries are derived from the library's parametric formulas by substituting values which reflect each component of the conceptual schema. Thus, for every occurrence of a data type in the conceptual schema, this matrix incorporates the expected run time and storage space required for each cluster which may be used to realize that type. The evaluation matrix is then searched using branch and bound to choose, for each data type occurrence, that cluster which minimizes the internal schema's overall cost.
The major advantage of this approach is that it includes a comparison of all internal representations which can be realized by superimposing members from a library. Therefore, a cluster which is traditionally used in one specific application, and which is therefore present in the library, will be considered for every occurrence of the corresponding type in the conceptual schema. Automation of the algorithm results in a thorough, yet efficient, evaluation of all alternatives, thus ensuring an efficient data base representation.
Copyright © 1976 by International Federation for Information Processing (IFIP).