Ruprecht-Karls-Universität Heidelberg

Multilevel Optimization of Parallel Applications Utilizing a System Area Network

Diploma Thesis by Tobias Jakob


Clusters are a very cost-efficient parallel architecture. They consist exclusively of commodity off-the-shelf components, which enables them to offer high performance at very low cost compared to commercial parallel supercomputers. A further advantage is their versatility, as shown by the large variety of scientific and commercial applications running on clusters. However, the weak point of a cluster is its interconnection network: performance can drop considerably if the network cannot meet the application's demands. Recent research has addressed this problem, which eventually led to today's high-performance system area networks. The ATOLL system area network is a development of the Chair of Computer Architecture, Mannheim. It supports user-level communication via descriptor-based DMA transfers. Nonetheless, these DMA transfers still need a temporary copy, which adds overhead. Therefore, a new zero-copy mechanism, including memory pinning and virtual-to-physical address translation, is implemented, along with new API functions that provide zero-copy transfers for user-space applications. This thesis describes the details of the implementation. By parallelizing a sample application, the thesis also shows that the network architecture is a main factor in cluster performance. The application is parallelized and optimized in several variants; the major versions use slice partitioning and mesh partitioning. The results show that, on the one hand, parallelization can yield very large speedups, but, on the other hand, the speedup of the same parallel application varies widely when the underlying network is exchanged.
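
The abstract does not say what the sample application computes, so the following is only an illustrative sketch of why slice and mesh partitioning can place different demands on the network. It assumes a hypothetical square grid of n x n cells with nearest-neighbour boundary exchange, which is not taken from the thesis. The function names are invented for this example.

```python
import math


def slice_halo_cells(n: int, p: int) -> int:
    """Upper bound on boundary cells one process sends per step when an
    n x n grid (assumed workload, not from the thesis) is cut into p
    horizontal slices. An interior slice has two neighbours and sends
    one full row of n cells to each."""
    if p < 2:
        return 0  # a single process has nothing to exchange
    return 2 * n


def mesh_halo_cells(n: int, p: int) -> int:
    """Upper bound on boundary cells one process sends per step when the
    same grid is cut into a q x q mesh of square blocks, with q*q == p.
    An interior block has four neighbours and sends one edge of n/q
    cells to each."""
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square for a q x q mesh")
    if n % q != 0:
        raise ValueError("n must be divisible by q")
    if q < 2:
        return 0
    return 4 * (n // q)


if __name__ == "__main__":
    n = 1024
    for p in (4, 16, 64):
        print(p, slice_halo_cells(n, p), mesh_halo_cells(n, p))
```

Under these assumptions the slice variant sends fewer but larger messages whose size does not shrink with the process count, while the mesh variant sends more but smaller messages. Which one runs faster then depends on the latency and bandwidth of the interconnect, which is in line with the abstract's observation that speedups change when the network is exchanged.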

