Description
The goal of the EXTOLL project is to develop an interconnection architecture specifically designed to satisfy the needs of low-latency inter-process communication in parallel machines. The project aims to develop a system architecture that includes advances in the network layer, in the attachment of the individual hosts to the interconnection network (the network interface controller) and in the software layer that enables applications to exploit the EXTOLL hardware and reach higher performance and efficiency. EXTOLL is an ongoing project, and this website will be updated as new results become available.
Hardware Architecture Overview
The complete architecture is formed from three major blocks: the host interface block, the network interface controller (NIC) block and the network block (green, blue and red in the block diagram to the left). The host interface in turn consists of the HyperTransport IP Core and the on-chip network (HyperTransport Advanced Crossbar, HTAX). The NIC block consists of the two communication engines, VELO and RMA, and the supporting units, the ATU and the register file. Finally, the network block consists of network ports, the EXTOLL crossbar and link ports. Network ports convert between the communication engines and the EXTOLL network protocol. Link ports implement the link-layer protocol of EXTOLL; in particular, retransmission is handled here. The EXTOLL crossbar is the network switching element and forwards incoming packets from any port, either link port or network port, to any outgoing port, again either network port or link port. Both crossbar components of EXTOLL are parametrizable in width and in number of ports. A scripted environment allows easy adaptation to a different number of ports. This feature allows for relatively simple exchange or addition of different engines in the NIC block. A non-comprehensive list of features is given below:
The block diagram on the right illustrates an incarnation of the EXTOLL architecture as it was implemented for an FPGA-based prototype. EXTOLL consists of a number of building-block components. It is possible to adapt the architecture depending on the requirements of a given system environment. The prototype implementation chooses components and a configuration that optimize communication latency, especially for small communication operations in AMD64 host systems. At the same time, the implementation is able to support a general set of communication operations and fits into an FPGA-based hardware platform.
FPGA Prototype Implementation
The EXTOLL prototype is based on the HTX-Board:
Below is a floorplan of the VP4FX100 device with an EXTOLL design after place and route. The HyperTransport interface is colored green, the NIC layer blue and the network layer red. The grey areas are used for SERDES management, the I2C interface, etc.
Software Stack
A complete Software Stack is being developed with the following goals:
The diagram shows the different software components for EXTOLL. These are:
Development Cluster
To develop the EXTOLL software, a (small) development cluster was set up at the CAG lab. The cluster consists of 10 machines, each equipped as follows:
There will soon be a larger cluster available that can also run the EXTOLL firmware.
Performance Results
The following performance numbers were all measured on a two-node configuration, each node equipped with two Opteron 870 processors (2 GHz, dual-core, K8 generation) and an HTX-Board using HT400. EXTOLL runs at 180 MHz, and the serial links between the nodes run at 3.6 Gb/s. The achieved unidirectional payload bandwidth is 316 MB/s in this setup, while the half round-trip latency starts at about 1 µs. OpenMPI adds about 300 ns of latency. Note that these numbers were measured on FPGA hardware and include all hardware and software latencies. The hardware latency includes the passing of two switch stages (one EXTOLL crossbar in each node) and an optical serial link, which contributes significantly to the overall latency because of the sub-optimal latency behavior of the FPGA SERDES. With respect to software latency, the CPUs used are relatively old and slow, and lower latencies can be reached with higher-clocked devices.
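To put the measured numbers into perspective, the following minimal sketch applies a simple linear cost model (transfer time = latency + size / bandwidth). This model is an assumption made for illustration and is not taken from the EXTOLL measurements themselves; the function names are hypothetical, and 1 MB is assumed to mean 10^6 bytes.

```python
# Illustrative sketch only: a linear latency/bandwidth model applied to the
# figures quoted above. The model and all function names are assumptions.

LATENCY_S = 1.0e-6        # half round-trip latency, about 1 µs (from the text)
MPI_OVERHEAD_S = 300e-9   # additional OpenMPI latency, about 300 ns (from the text)
BANDWIDTH_BPS = 316e6     # payload bandwidth, 316 MB/s (assuming 1 MB = 10^6 bytes)


def transfer_time(size_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Estimated one-way time in seconds for a message of size_bytes."""
    return latency_s + size_bytes / bandwidth


def effective_bandwidth(size_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Estimated achieved bandwidth in bytes/s for a message of size_bytes."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth)


def half_performance_size(latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Message size at which half of the peak bandwidth is reached."""
    return latency_s * bandwidth


if __name__ == "__main__":
    for size in (8, 64, 512, 4096, 65536):
        raw = effective_bandwidth(size) / 1e6
        mpi = effective_bandwidth(size, LATENCY_S + MPI_OVERHEAD_S) / 1e6
        print(f"{size:6d} B: {raw:7.1f} MB/s raw, {mpi:7.1f} MB/s with OpenMPI")
```

Under this model, the message size at which half of the peak bandwidth is reached equals latency times bandwidth: about 316 bytes for the raw figures and about 411 bytes when the OpenMPI overhead is included. This suggests why low latency matters most for small messages.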
Outlook
The results so far are very promising. The latency measured on the prototype is in the range of the best available commercial networking
hardware. It is important to remember that EXTOLL was benchmarked on an FPGA and is therefore at a technology disadvantage. The plot on the right side
illustrates the estimated latency and bandwidth that could be reached by an ASIC implementation of the EXTOLL system.
Publications
Contact
Mondrian Nuessle, email: Mondrian.Nuessle {at} ziti.uni-heidelberg.de
Last modified: 02.05.2011