Ruprecht-Karls-Universität Heidelberg

Traversal Algorithms for Ray Tracing - An Architectural Evaluation

Master Thesis by Martin Lingnau


Ray tracing is an inherent parallel process, which is used to render realistic computer graphic by simulating rays of light. It should work very well on parallel architectures like GPUs. However, the ray tracing environment is still dominated by CPUs. This thesis explores the requirements for a well performing ray tracing algorithm and analyzes how they can be fulfilled by the CPU and GPU architecture, respectively. It is shown that ray traversal is the most time consuming portion of ray tracing. Thus, ray traversal is the prime target for optimization. We discuss how this optimization can be implemented on both CPU and GPU. We also discuss how optimization strategies evolved over time together with the change in architectural design. For this purpose, we look at the design of CPUs and GPUs and find out which programming methods lead to well performing code on each architecture. We then analyze past and present ray traversal algorithms to find their architecture dependent differences. Because traversal relies on its underlying data structure, which stores all information about the rendered image, we present different approaches to store the data. We show that the design of all analyzed traversal algorithms can be described as a trade-off between the number of operations and the amount of memory occupancy necessary to trace a single ray. This trade-off is closely related to the different memory hierarchies of CPU and GPU. They favor ray tracing on CPUs. However, the design focus of GPUs has shifted to general purpose computing. While past day GPUs benefited from algorithms that had small cache requirements, nowadays, the best performing ray traversal algorithm is the same for both CPU and GPU: stack-based traversal. Nonetheless, the algorithm still needs to be adapted to use architecture specific intrinsic operations. These allow the programmer to influence how data is stored, how data is retrieved, and how much parallelization can be achieved by the computing cores.


« back

back to top