High performance switch architectures for CC-NUMA multiprocessors
| Main Author: | |
|---|---|
| Format: | Thesis Book |
| Language: | English |
| Published: | [Place of publication not identified] : [publisher not identified] ; 1999. |
| Subjects: | |
| Online Access: | http://proxy.library.tamu.edu/login?url=http://proquest.umi.com/pqdweb?did=730298001&sid=1&Fmt=2&clientId=2945&RQT=309&VName=PQD |
| Summary: | Shared-memory multiprocessors can provide significant performance benefits for scientific and commercial applications. Most recent multiprocessors employ the cache-coherent non-uniform memory access (CC-NUMA) architecture. This dissertation focuses on various issues in CC-NUMA multiprocessor design and evaluation. Crossbar switches are excellent building blocks for designing high-performance interconnection networks for CC-NUMA multiprocessors. In this dissertation, four switch design alternatives are presented for multistage interconnection networks. By modeling these switches in an execution-driven simulator, performance metrics such as average message latency, stall time and execution time are measured. Performance bottlenecks such as waiting delays and network interference are also identified. Memory management plays an important role in CC-NUMA multiprocessors because it governs the placement of shared data. The effect of three static memory management policies on application performance is measured through extensive execution-driven simulations. It is shown that memory access patterns and network performance depend heavily on the memory management policy employed. The dominant sharing patterns for widely shared data and for communication-intensive data in scientific and commercial workloads have been identified as performance bottlenecks for CC-NUMA multiprocessors. We propose two latency-reduction frameworks to address these bottlenecks. Targeted towards widely shared data, a hardware caching technique called the switch cache framework is proposed: each crossbar switch embeds an SRAM cache, called a switch cache, that captures shared data blocks as they flow through the network. The implementation details of a crossbar switch cache called the CAche Embedded Switch ARchitecture (CAESAR) are presented. The performance benefits of CAESAR are measured for several scientific applications. Applications with frequent accesses to communication-intensive data suffer in performance because of cache-to-cache transfers, which require a slow directory lookup and several message transfers over the network. To reduce this cache-to-cache transfer latency, directory caches are embedded in the crossbar switches of the interconnect. These switch directories store ownership information for recently modified blocks and re-route subsequent requests directly to the owner cache. The implementation details of a crossbar switch directory called the DiRectory Embedded Switch ARchitecture (DRESAR) are presented. The performance benefits of DRESAR are measured for scientific and commercial workloads. |
| Item Description: | Vita. "Major Subject: Computer Science". |
| Physical Description: | xiv, 157 leaves : illustrations ; 28 cm. Issued also on microfiche from University Microfilm Inc. |
| Bibliography: | Includes bibliographical references (leaves 145-156). |
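The switch cache idea in the summary can be illustrated with a small software model. This is a minimal Python sketch, not CAESAR's hardware design: the class `SwitchCache`, its method names, the LRU replacement policy, and the assumption that invalidations for a block pass through the same switch are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class SwitchCache:
    """Hypothetical model of a small cache embedded in a crossbar switch.

    Read replies passing through the switch deposit a copy of the block;
    later read requests for the same block are answered at the switch
    instead of travelling on to the home memory. Invalidations passing
    through (assumed to traverse this switch) remove the stale copy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def on_read_reply(self, addr, data):
        # Capture the block as it flows toward the requester.
        self.blocks[addr] = data
        self.blocks.move_to_end(addr)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def on_read_request(self, addr):
        # Return the data if the switch can serve the read; None means the
        # request is forwarded onward toward the home node.
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            self.hits += 1
            return self.blocks[addr]
        self.misses += 1
        return None

    def on_invalidate(self, addr):
        # A write elsewhere makes the captured copy stale, so drop it.
        self.blocks.pop(addr, None)


# Usage: a reply for a widely shared block passes through the switch,
# so a later read of the same block is served without reaching memory.
sc = SwitchCache(capacity=2)
sc.on_read_reply(0x40, "blockA")
print(sc.on_read_request(0x40))  # prints blockA
```

Serving the second read at the switch is what shortens the path for widely shared data; the price, which a real design must handle, is keeping these copies coherent when the block is written.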
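The switch directory mechanism described in the summary can be sketched the same way. This is a hedged Python illustration, not DRESAR's documented protocol: the class `SwitchDirectory`, its methods, the LRU replacement, and the traversal counts (a baseline of requester to home, home to owner, owner to requester versus requester to owner to requester when re-routed) are simplifying assumptions.

```python
from collections import OrderedDict


class SwitchDirectory:
    """Hypothetical model of a directory cache embedded in a crossbar switch.

    It remembers which processor cache owns a recently modified block.
    A read request for such a block is re-routed straight to the owner
    instead of first visiting the home node's directory.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.owner = OrderedDict()  # block address -> owner node id, least recently used first

    def on_write_grant(self, addr, node):
        # An ownership grant seen passing through the switch records the new owner.
        self.owner[addr] = node
        self.owner.move_to_end(addr)
        if len(self.owner) > self.capacity:
            self.owner.popitem(last=False)  # evict the least recently used entry

    def route_read(self, addr, home):
        # For a read to a block that is dirty in a remote cache, return
        # (destination node, assumed number of network traversals).
        if addr in self.owner:
            self.owner.move_to_end(addr)
            return self.owner[addr], 2  # requester -> owner -> requester
        return home, 3  # requester -> home -> owner -> requester

    def on_ownership_change(self, addr):
        # Ownership released or changed without this switch seeing the new
        # owner: drop the entry so requests fall back to the home directory.
        self.owner.pop(addr, None)


# Usage: after node 5 gains ownership of a block, a read is sent to node 5 directly.
sd = SwitchDirectory(capacity=4)
sd.on_write_grant(0x80, node=5)
print(sd.route_read(0x80, home=3))  # prints (5, 2)
```

The sketch does not model the state changes a real coherence protocol performs after the transfer (for example, the block becoming shared); it only shows why skipping the home directory lookup removes one network traversal from a cache-to-cache transfer.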