ICS 1999: Rhodes, Greece

ICS '99, Proceedings of the 1999 International Conference on Supercomputing, June 20-25, 1999, Rhodes, Greece. ACM, 1999

Francisca Quintana, Jesús Corbal, Roger Espasa, Mateo Valero:
Adding a vector unit to a superscalar processor. 1-10
Huy Nguyen, Lizy Kurian John:
Exploiting SIMD parallelism in DSP and multimedia algorithms using the AltiVec technology. 11-20
Kunle Olukotun, Lance Hammond, Mark Willey:
Improving the performance of speculatively parallel applications on the Hydra CMP. 21-30
Jeffrey B. Rothman, Alan Jay Smith:
The pool of subsectors cache design. 31-42
Peter J. Keleher:
Symmetry and performance in consistency protocols. 43-50
F. Jesús Sánchez, Antonio González:
A locality sensitive multi-module cache with explicit management. 51-59
Jeeraporn Srisawat, Nikitas A. Alexandridis:
A new "quad-tree-based" sub-system allocation technique for mesh-connected parallel machines. 60-67
Andrei Radulescu, Arjan J. C. van Gemund:
On the complexity of list scheduling algorithms for distributed-memory systems. 68-75
Daniel Jiménez-González, Josep-Lluis Larriba-Pey, Juan J. Navarro:
Communication conscious radix sort. 76-82
Martin C. Rinard, Pedro C. Diniz:
Eliminating synchronization bottlenecks in object-based programs using adaptive replication. 83-92
Kyung Dong Ryu, Jeffrey K. Hollingsworth, Peter J. Keleher:
Mechanisms and policies for supporting fine-grained cycle stealing. 93-100
Dejan Perkovic, Peter J. Keleher:
Responsiveness without interrupts. 101-108
Yuan C. Chou, Jason Fung, John Paul Shen:
Reducing branch misprediction penalties via dynamic control independence detection. 109-118
Alex Ramírez, Josep-Lluis Larriba-Pey, Carlos Navarro, Josep Torrellas, Mateo Valero:
Software trace cache. 119-126
Chi-Hung Chi, Jun-Li Yuan, Chin-Ming Cheung:
Cyclic dependence based data reference prediction. 127-134
Xiaowei Shen, Arvind, Larry Rudolph:
CACHET: an adaptive cache coherence protocol for distributed shared-memory systems. 135-144
Alexander V. Veidenbaum, Weiyu Tang, Rajesh K. Gupta, Alexandru Nicolau, Xiaomei Ji:
Adapting cache line size to application behavior. 145-154
Timothy Sherwood, Brad Calder, Joel S. Emer:
Reducing cache misses using hardware and software page placement. 155-164
Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev Kumar, Angelos Bilas, Jaswinder Pal Singh:
Application scaling under shared virtual memory on a cluster of SMPs. 165-174
Liviu Iftode, Matthias A. Blumrich, Cezary Dubnicki, David L. Oppenheimer, Jaswinder Pal Singh, Kai Li:
Shared virtual memory with automatic update support. 175-183
Evan Speight, Hazim Abdel-Shafi, John K. Bennett:
Realizing the performance potential of the virtual interface architecture. 184-192
Valentin Puente, José A. Gregorio, Cruz Izu, Ramón Beivide, Fernando Vallejo:
Low-level router design and its impact on supercomputer system performance. 193-201
José F. Martínez, Josep Torrellas, José Duato:
Improving the performance of bristled CC-NUMA systems using virtual channels and adaptivity. 202-209
Daniel Franco, I. Garcés, Emilio Luque:
A new method to make communication latency uniform: distributed routing balancing. 210-219
Francisco Corbera, Rafael Asenjo, Emilio L. Zapata:
New shape analysis techniques for automatic parallelization of C codes. 220-227
Amy W. Lim, Gerald I. Cheong, Monica S. Lam:
An affine partitioning algorithm to maximize parallelism and minimize communication. 228-237
Claudia Roberta Calidonna, Maurizio Giordano, Mario Mango Furnari:
A graphic parallelizing environment for user-compiler interaction. 238-245
Masato Oguchi, Masaru Kitsuregawa:
Dynamic remote memory acquisition for parallel data mining on ATM-connected PC cluster. 246-252
Yong E. Cho, Marianne Winslett, Szu-Wen Kuo, Jonghyun Lee, Ying Chen:
Parallel I/O for scientific applications on heterogeneous clusters: a resource-utilization approach. 253-259
Shinji Sumimoto, Hiroshi Tezuka, Atsushi Hori, Hiroshi Harada, Toshiyuki Takahashi, Yutaka Ishikawa:
The design and evaluation of high performance communication using a Gigabit Ethernet. 260-267
Donald Yeung:
The scalability of multigrain systems. 268-277
Nandini Mukherjee, John R. Gurd:
A comparative analysis of four parallelisation schemes. 278-285
Thomas L. Sterling, Larry A. Bergman:
A design analysis of a hybrid technology multithreaded architecture for petaflops scale computation. 286-293
Xavier Martorell, Eduard Ayguadé, Nacho Navarro, Julita Corbalán, Marc González, Jesús Labarta:
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. 294-301
Suvas Vajracharya, Steve Karmesin, Peter H. Beckman, James Crotinger, Allen D. Malony, Sameer Shende, R. R. Oldehoeft, Stephen Smith:
SMARTS: exploiting temporal locality and parallelism through vertical execution. 302-310
Bradford L. Chamberlain, E. Christopher Lewis, Lawrence Snyder:
Problem space promotion and its evaluation as a technique for efficient parallel computation. 311-318
Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou:
A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: the case of the SGI Origin2000. 319-328
Hongzhang Shan, Jaswinder Pal Singh:
A comparison of MPI, SHMEM and cache-coherent shared address space programming models on the SGI Origin2000. 329-338
Ravi R. Iyer, Nancy M. Amato, Lawrence Rauchwerger, Laxmi N. Bhuyan:
Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications. 339-347
Ivan Martel, Daniel Ortega, Eduard Ayguadé, Mateo Valero:
Increasing effective IPC by exploiting distant parallelism. 348-355
Amir Roth, Andreas Moshovos, Gurindar S. Sohi:
Improving virtual function call target prediction via dependence-based pre-computation. 356-364
Pedro Marcuello, Antonio González:
Clustered speculative multithreaded processors. 365-372
Yuanyuan Zhou, Peter M. Chen, Kai Li:
Fast cluster failover using virtual memory-mapped communication. 373-382
Michael D. Beynon, Alan Sussman, Joel H. Saltz:
Performance impact of proxies in data intensive client-server applications. 383-390
A. Ferre-Vilaplana, José M. Bernabéu-Aubán:
A comparison of two approaches for independent scaling up of processing and communication capacities in multicomputer networks. 391-398
Glenn Reinman, Brad Calder, Dean M. Tullsen, Gary S. Tyson, Todd M. Austin:
Classifying load and store instructions for memory renaming. 399-407
Gang Chen, Michael D. Smith:
Reorganizing global schedules for register allocation. 408-416
V. Janaki Ramanan, Ramaswamy Govindarajan:
Resource usage models for instruction scheduling: two new models and a classification. 417-424
John M. Mellor-Crummey, David B. Whalley, Ken Kennedy:
Improving memory hierarchy performance for irregular applications. 425-433
Vijay Menon, Keshav Pingali:
High-level semantic optimization of numerical codes. 434-443
Siddhartha Chatterjee, Vibhor V. Jain, Alvin R. Lebeck, Shyam Mundhra, Mithuna Thottethodi:
Nonlinear array layouts for hierarchical memory systems. 444-453
Jay B. Brockman, Peter M. Kogge, Thomas L. Sterling, Vincent W. Freeh, Shannon K. Kuntz:
Microservers: a new memory semantics for massively parallel computing. 454-463
Ashley Saulsbury, Su-Jaen Huang, Fredrik Dahlgren:
Efficient management of memory hierarchies in embedded DRAM systems. 464-473
Carlos Molina, Antonio González, Jordi Tubella:
Dynamic removal of redundant computations. 474-481
Induprakas Kodukula, Keshav Pingali, Robert Cox, Dror E. Maydan:
An experimental evaluation of tiling and shackling for memory hierarchy management. 482-491
Jacqueline Chame, Sungdo Moon:
A tile selection algorithm for data locality and cache interference. 492-499
Mahmut T. Kandemir, Prithviraj Banerjee, Alok N. Choudhary, J. Ramanujam, Eduard Ayguadé:
An integer linear programming approach for optimizing cache locality. 500-509