Ruprecht-Karls-Universität Heidelberg

APU – ATOLL Programmable Unit


So-called System Area Networks (SANs) address the special characteristics of parallel computing. Their major improvements over standard LAN implementations are higher bandwidth, more sophisticated mechanisms to control data transfer, and a better interface that provides user-level communication or even a global address space. As demands on computing power grow, the number of nodes per cluster increases, and with it the amount of communication between nodes and the impact of that communication.

ATOLL specifically targets low-latency communication between nodes inside a cluster system and achieves high communication bandwidth even with small packet sizes.
Research on the ATOLL network is ongoing, in particular on reducing the overhead of user-level communication.
The ATOLL Processing Unit (APU) is the prototype of a full 64-bit embedded RISC processor core. It was designed using LISATek's Language for Instruction Set Architectures (LISA), originally developed at RWTH Aachen and now owned by CoWare.

APU aims at the future extension of the ATOLL "Network on a chip" into a programmable network interface.

The APU core is the first implementation of an ATOLL-Network Processing Unit (APU) as a reduced instruction set computer (RISC) microprocessor core. APU implements a full 64-bit architecture.
APU is a simple RISC processor core capable of issuing one instruction per clock. All instructions except for memory load instructions execute in one clock cycle.
The instruction set architecture of APU was kept simple and small to allow an efficient and fast implementation of the processor core. For example, multiplication and floating-point calculations were left out. Nevertheless, the APU architecture is highly scalable and flexible, and future implementations may target further tasks for a processor core on a network chip.
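
Since the instruction set has no multiply instruction, software that needs a product has to build it from simpler operations. The following Python sketch shows the classic shift-and-add technique on 64-bit values. It is an illustration only: it assumes that shift, add, and bitwise AND operations are available, which is typical for a RISC core but is not confirmed here, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers by masking every result


def mul64_shift_add(a: int, b: int) -> int:
    """Return the low 64 bits of a * b using only shift, add, and AND."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:                          # lowest multiplier bit set?
            result = (result + a) & MASK64
        a = (a << 1) & MASK64              # multiplicand times two
        b >>= 1                            # move to the next multiplier bit
    return result


if __name__ == "__main__":
    print(mul64_shift_add(6, 7))
```

The loop runs once per significant bit of the multiplier, so at most 64 iterations for 64-bit operands.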

APU Features

APU Pipeline Structure

Picture of APU Pipeline Structure                             

APU Core Interface

Picture of APU Core Interface

RISC microprocessor core

  • 64-bit architecture
  • five-stage pipeline with load-use interlock
  • one instruction issue per clock cycle
  • single-clock-cycle execution for all instructions except memory loads
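
The timing effect of a load-use interlock can be sketched with a small cycle-count model in Python. This is a simplified illustration under stated assumptions: the `Instr` tuple, the mnemonics, and the one-cycle stall length are hypothetical and not taken from the APU documentation, which states only that memory loads are the exception to single-cycle execution.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    op: str                   # hypothetical mnemonic; "load" marks a memory load
    dest: Optional[int]       # destination register number, or None
    srcs: Tuple[int, ...]     # source register numbers


LOAD_USE_STALL = 1  # ASSUMPTION: one bubble when a load result is used at once


def issue_cycles(program: Sequence[Instr]) -> int:
    """Count issue cycles: one per instruction plus load-use stalls."""
    cycles = 0
    prev: Optional[Instr] = None
    for ins in program:
        # Interlock: stall if the previous instruction was a load whose
        # destination register is read by the current instruction.
        if prev is not None and prev.op == "load" and prev.dest in ins.srcs:
            cycles += LOAD_USE_STALL
        cycles += 1
        prev = ins
    return cycles


dependent = [Instr("load", 1, (2,)), Instr("add", 3, (1, 4))]
scheduled = [Instr("load", 1, (2,)), Instr("add", 5, (6, 7)), Instr("add", 3, (1, 4))]
```

In this model `dependent` needs a stall cycle, while `scheduled` fills the gap with an independent instruction and issues three instructions in three cycles.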

large register file

  • 32 x 64-bit general purpose registers
  • two read, two write port register file
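
A register file of this shape can be modelled in a few lines of Python. The sketch below is a behavioural illustration, not the APU implementation: the class and method names are hypothetical, and rejecting two writes to the same register in one cycle is an assumption, since the conflict behaviour is not described here.

```python
class RegisterFile:
    """32 x 64-bit registers, two reads and two writes per cycle."""

    NUM_REGS = 32
    MASK64 = (1 << 64) - 1

    def __init__(self):
        self.regs = [0] * self.NUM_REGS

    def read2(self, ra, rb):
        """Read two registers in the same cycle (two read ports)."""
        return self.regs[ra], self.regs[rb]

    def write2(self, writes):
        """Apply up to two (register, value) writes in the same cycle."""
        if len(writes) > 2:
            raise ValueError("only two write ports available")
        targets = [reg for reg, _ in writes]
        if len(set(targets)) != len(targets):
            # ASSUMPTION: simultaneous writes to one register are rejected.
            raise ValueError("write conflict on the same register")
        for reg, value in writes:
            self.regs[reg] = value & self.MASK64  # truncate to 64 bits


rf = RegisterFile()
rf.write2([(1, 5), (2, -1)])
```

A second write port allows, for example, a returning load and an ALU result to be written in the same cycle, although whether APU uses the port this way is not stated here.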

high pipeline throughput

  • 64-bit integer arithmetic logic unit
  • completely decoupled compare and branch
  • non-interlocked pipeline with forwarding paths to resolve data dependencies
  • one delay slot for control flow instructions
  • outstanding loads supported
  • out-of-order load completion supported
  • support for touch-loads
  • hardware interrupt and software signal servicing
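
A single delay slot means that the instruction directly after a control flow instruction is still executed before control transfers to the target. The Python sketch below illustrates these semantics with a toy interpreter. The instruction names (`addi`, `jump`, `halt`) and their encoding as tuples are hypothetical and do not represent the APU instruction set.

```python
MASK64 = (1 << 64) - 1


def run(program, max_steps=100):
    """Execute a toy program with one branch delay slot; return (regs, trace)."""
    regs = [0] * 32
    pc, next_pc = 0, 1        # current instruction and the one that follows it
    trace = []                # addresses of executed instructions
    for _ in range(max_steps):
        if pc >= len(program):
            break
        ins = program[pc]
        trace.append(pc)
        after_next = next_pc + 1          # default: sequential flow
        if ins[0] == "addi":              # ("addi", rd, rs, imm)
            _, rd, rs, imm = ins
            regs[rd] = (regs[rs] + imm) & MASK64
        elif ins[0] == "jump":            # ("jump", target)
            after_next = ins[1]           # takes effect after the delay slot
        elif ins[0] == "halt":
            break
        pc, next_pc = next_pc, after_next
    return regs, trace


program = [
    ("addi", 1, 0, 1),     # 0: r1 = 1
    ("jump", 4),           # 1: jump to address 4
    ("addi", 2, 0, 10),    # 2: delay slot, still executed
    ("addi", 3, 0, 100),   # 3: skipped by the jump
    ("halt",),             # 4: stop
]
regs, trace = run(program)
```

The instruction at address 2 runs although it follows the jump, while the instruction at address 3 is skipped. A compiler or assembly programmer would typically place useful work in the delay slot or fill it with a no-op.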

PDF Documents

APU Core Interface Signal Description
APU Core Interface Signal Timing
Registers and Programming Model
APU Instruction Set Summary

