Contact

Tel	_	+49 681 302-2034
Mail	_
Room	_	426
Address	_	Saarland Informatics Campus Building E1 3 66123 Saarbrücken Germany

Ralf Karrenberg

Research Interests

Compiler Construction (Code Generation, SIMD Vectorization, Data-Parallel Languages)
Computer Graphics (Ray Tracing, Shading Languages, Acceleration Structures)
SIMD Programming

Projects

Whole-Function Vectorization
AnySL: Efficient and Portable Multi-Language Shading.
Noise: User-Defined Optimization Strategies.

Publications

Conferences

Presburger Arithmetic in Memory Access Optimization for Data-Parallel Languages - FroCoS 2013 2013
Karrenberg, R., Kosta, M. and Sturm, T.
Frontiers of Combining Systems, 2013. [bib]

@CONFERENCE{KKS:2013:frocos,
 	author = {Ralf Karrenberg and Marek Kosta and Thomas Sturm},
 	title = {Presburger Arithmetic in Memory Access Optimization for Data-Parallel Languages},
 	booktitle = {Frontiers of Combining Systems},
 	booktitle_short = {FroCoS 2013},
 	year = {2013},
 }

Improving Performance of OpenCL on CPUs - CC 2012
Karrenberg, R. and Hack, S.
Compiler Construction, 2012. [url] [bib]

@CONFERENCE{KH:2012:opencl,
 	author = {Ralf Karrenberg and Sebastian Hack},
 	title = {Improving Performance of OpenCL on CPUs},
 	booktitle = {Compiler Construction},
 	booktitle_short = {CC},
 	year = {2012},
 	url = {http://www.cdl.uni-saarland.de/papers/karrenberg_opencl.pdf}
 }

Whole Function Vectorization
Karrenberg, R. and Hack, S.
International Symposium on Code Generation and Optimization, 2011. [doi] [url] [slides] [bib]

@CONFERENCE{KH:2011:cgo,
 	author = {Ralf Karrenberg and Sebastian Hack},
 	title = {{W}hole {F}unction {V}ectorization},
 	booktitle = {International Symposium on Code Generation and Optimization},
 	series = {CGO},
 	year = {2011},
 	doi = {10.1109/CGO.2011.5764682},
 	abstract = {
 		Abstract—Data-parallel programming languages are an important component
 		in today's parallel computing landscape. Among those are domain-
 		specific languages like shading languages in graphics (HLSL, GLSL,
 		RenderMan, etc.) and "general-purpose" languages like CUDA or OpenCL.
 		Current implementations of those languages on CPUs solely rely on multi-
 		threading to implement parallelism and ignore the additional intra-core
 		parallelism provided by the SIMD instruction set of those processors
 		(like Intel's SSE and the upcoming AVX or Larrabee instruction sets).
 		In this paper, we discuss several aspects of implementing data-parallel
 		languages on machines with SIMD instruction sets. Our main contribution
 		is a language- and platform-independent code transformation that
 		performs whole-function vectorization on low-level intermediate code
 		given by a control flow graph in SSA form.
 		We evaluate our technique in two scenarios: First, incorporated in a
 		compiler for a domain-specific language used in real-time ray tracing.
 		Second, in a stand-alone OpenCL driver. We observe average speedup
 		factors of 3.9 for the ray tracer and factors between 0.6 and 5.2 for
 		different OpenCL kernels.
 	},
 	webslides = {http://www.cdl.uni-saarland.de/projects/wfv/wfv_cgo11_slides.pdf},
 	url = {http://www.cdl.uni-saarland.de/papers/karrenberg_wfv.pdf},
 	acc_rate = {26.7},
 	accepted = {28},
 	submitted = {105},
 }

AnySL: Efficient and Portable Shading for Ray Tracing - HPG 2010
Karrenberg, R., Rubinstein, D., Slusallek, P. and Hack, S.
Proceedings of the Conference on High Performance Graphics, pages 97–105, Eurographics Association, 2010. [url] [slides] [bib]

@CONFERENCE{KRSH:2010:hpg,
 	author = {Ralf Karrenberg and Dmitri Rubinstein and Philipp Slusallek and Sebastian Hack},
 	title = {{AnySL: Efficient and Portable Shading for Ray Tracing}},
 	booktitle = {Proceedings of the Conference on High Performance Graphics},
 	series = {HPG '10},
 	year = {2010},
 	location = {Saarbrucken, Germany},
 	pages = {97--105},
 	numpages = {9},
 	url = {http://portal.acm.org/citation.cfm?id=1921479.1921495},
 	acmid = {1921495},
 	publisher = {Eurographics Association},
 	address = {Aire-la-Ville, Switzerland, Switzerland},
 	booktitle_short = {HPG},
 	abstract = {
 		While a number of different shading languages have been developed,
 		their efficient integration into an existing renderer is notoriously
 		difficult, often boiling down to implementing an entire compiler
 		toolchain for each language. Furthermore, no shading language is
 		broadly supported across the variety of rendering systems.
 		AnySL attacks this issue from multiple directions: We compile shaders
 		from different languages into a common, portable representation, which
 		uses subroutine threaded code: Every language operator is translated to
 		a function call. Thus, the compiled shader is generic with respect to
 		the used types and operators.
 		The key component of our system is an embedded compiler that
 		instantiates this generic code in terms of the renderer's native types
 		and operations. It allows for flexible code transformations to match
 		the internal structure of the renderer and eliminates all overhead due
 		to the subroutine threaded code. For SIMD architectures we
 		automatically perform vectorization of scalar shaders which speeds up
 		rendering by a factor of 3.9 on average on SSE. The results are highly
 		optimized, parallel shaders that operate directly on the internal data
 		structures of a renderer. We show that both traditional shading
 		languages such as RenderMan, but also C/C++-based shading languages,
 		can be fully supported and deliver high performance across different
 		CPU renderers.
 	},
 	webslides = {http://www.cdl.uni-saarland.de/projects/anysl/anysl_hpg10_slides.pdf}
 }

MSc Thesis

Automatic Packetization
Karrenberg, R.
M.Sc. Thesis, Saarland University, 2009. [pdf] [bib]

@MASTERSTHESIS{Karrenberg:2009:MSc,
     author  = {Ralf Karrenberg},
     title   = {{Automatic Packetization}},
     school  = {Saarland University},
     year    = {2009},
     month   = {July},
     webpdf  = {http://www.cdl.uni-saarland.de/publications/theses/karrenberg_msc.pdf},
 	abstract = {
 		Modern processor architectures provide the possibility to execute an
 		instruction on multiple values at once. So-called SIMD (Single
 		Instruction, Multiple Data) instructions work on packets (or vectors)
 		of data instead of scalar values. They offer a significant performance
 		boost for data-parallel algorithms that perform the same operations on
 		large amounts of data, e.g. data encoding and decoding, image
 		processing, or ray tracing.
 		However, the performance gain comes at a price: programming languages
 		provide no elegant means to exploit SIMD instruction sets. Packet
 		operations have to be coded by hand, which is complicated, unintuitive,
 		and error prone.  Thus, packetization - the transformation of scalar
 		code to packet form - is mostly applied automatically by local compiler
 		optimizations (e.g. during loop vectorization) or with a lot of manual
 		effort at performance-critical parts of a system.
 		This thesis describes an algorithm for automatic packetization that
 		allows a programmer to write scalar functions but use them on packets
 		of data. A compiler pass automatically transforms those functions to
 		work on packets of the target-architecture's SIMD width. The resulting
 		packetized function computes the same results as multiple executions of
 		the scalar code.
 		The algorithm is implemented in a source-language and target-
 		architecture independent intermediate representation (the Low Level
 		Virtual Machine (LLVM)), which enables its use in many different
 		environments. The performance of the generated code is shown in a real-
 		world case study in the context of real-time ray tracing: serial shader
 		code written in C++ is automatically specialized, optimized, and
 		packetized at runtime. The packetized shaders outperform their scalar
 		counterparts by an average factor of 3.6 on a standard SSE architecture
 		of SIMD width 4.
 	}
 }

BSc Thesis

Memory Aware Realtime Ray Tracing: The Bounding Plane Hierarchy
Karrenberg, R.
B.Sc. Thesis, Saarland University, 2007. [pdf] [bib]

@BACHELORSTHESIS{Karrenberg:2007:BSc,
     author = { Ralf Karrenberg },
     title = { {M}emory {A}ware {R}ealtime {R}ay {T}racing: {T}he {B}ounding {P}lane {H}ierarchy },
     school = {Saarland University},
     year = { 2007 },
     month = { August },
     webpdf  = {http://www.cdl.uni-saarland.de/publications/theses/karrenberg_bsc.pdf},
 	abstract = {
 		Realtime Ray Tracing has become available on single desktop environments
 		over the last few years, recently moving on to supporting dynamic
 		scenes. Approaches either use general-purpose CPUs, high-end
 		programmable graphics cards or specialised custom hardware. One of the
 		most important factors influencing the performance of all
 		implementations is the algorithm’s dependence on a spatial index
 		structure. Its random memory access is very slow compared to the
 		computational power of current computer hardware, often being the
 		bottleneck of the system.
 		The goal of this thesis was the design and implementation of a spatial
 		index structure that focuses entirely on improving memory efficiency in
 		trade for computational complexity. Two main starting points were
 		followed: The memory requirement of the whole structure and of each of
 		its parts on one hand and the caching efficiency on the other. The
 		resulting contributions are the following:
 		• The Bounding Plane Hierarchy (BPH) is a complete acceleration
 		structure equivalent to a Bounding Volume Hierarchy (BVH) with axis-
 		aligned bounding boxes (AABBs) that needs less than half the size of
 		current BVH-implementations.
 		• Treelets are groups of interconnected nodes that are traversed
 		independently from the rest of the acceleration structure by using a new
 		two-stage algorithm. They are employed to enhance cache efficiency -
 		especially on parallel and/or multi-threaded systems like the CELL -
 		and can be applied to any spatial index structure.
 		A first implementation of the BPH with Treelets using Packet Tracing
 		and SIMD-instructions renders animations in real-time while still
 		offering many possibilities for optimization.
 	}
 }

Teaching

Jobs

Please consider the project pages of AnySL and Whole-Function Vectorization for thesis topics and "HiWi" job offers.