GPUGI: Global Illumination Effects on the GPU

László Szirmay-Kalos

Budapest University of Technology and Economics, Budapest, Magyar Tudósok krt. 2., H-1117, HUNGARY
Email: szirmay@iit.bme.hu
URL: http://www.iit.bme.hu/~szirmay

László Szécsi

Budapest University of Technology and Economics, Budapest, Magyar Tudósok krt. 2., H-1117, HUNGARY
Email: szecsi@iit.bme.hu
URL: http://www.iit.bme.hu/~szecsi

Mateu Sbert

University of Girona, Campus Montilivi, Edifici PIV, 17071 Girona, Spain
Email: mateu@ima.udg.es
URL: http://ima.udg.es/~mateu

Abstract

In this tutorial we explain how global illumination rendering methods can be implemented on Shader Model 3.0 GPUs. These algorithms do not follow the conventional local illumination model of DirectX/OpenGL pipelines, but require global geometric or illumination information when shading a point. In addition to the theory and state of the art of these approaches, we go into the details of a few algorithms, including mirror reflections, refractions, caustics, diffuse/glossy indirect illumination, precomputation aided global illumination for surface and volumetric models, obscurances and tone mapping, also giving their GPU implementation in HLSL or Cg language.

Keywords: Global illumination, GPU programming, HLSL, Radiosity, Soft shadow algorithms, Environment mapping, Diffuse/Glossy indirect illumination, Mirror Reflection/Refraction, Caustics generation, Monte Carlo methods, Pre-computation aided global illumination, PRT, Ambient Occlusion, Obscurances, Participating Media, Multiple scattering.

Contents of the tutorial

This tutorial presents techniques to solve various subproblems of global illumination rendering on the Graphics Processing Unit (GPU). The state of the art is discussed briefly and we also go into the details of a few example methods. Having reviewed the global illumination rendering problem and the operation of the rendering pipeline of the GPUs, we discuss six categories of such approaches.

1. Simple improvements of the local illumination lighting model. First, to warm up, we examine two relatively simple extensions of local illumination rendering: shadow mapping and image based lighting. Although these are not considered global illumination methods, they definitely represent the first steps from pure local illumination rendering toward more sophisticated global illumination approaches. These techniques already provide some insight on how the basic functionality of the local illumination pipeline can be extended with the programmable features of the GPU.

2. Ray-tracing. Here we present the implementation of the classic ray-tracing algorithm on the GPU. Since GPUs were designed to execute rasterization based rendering, this approach fundamentally re-interprets the operation of the rendering pipeline.

3. Specular effects with rasterization. In this section we return to rasterization and consider the generation of specular effects, including mirror reflections, refractions, and caustics. Note that these effects are traditionally rendered by ray-tracing, but for the sake of efficient GPU implementation, we need to generate them with rasterization. Having surveyed the proposed possibilities, we concentrate here on the method called Approximate ray tracing with distance impostors.

4. Diffuse/glossy indirect illumination. This section deals with non-specular effects, which require special data structures stored in the texture memory, from which the total diffuse/glossy irradiance for an arbitrary point may be efficiently retrieved. Of course, these representations always make compromises between accuracy, storage requirements, and final gathering computation time. We present two algorithms in detail. The first is the implementation of the stochastic radiosity algorithm on the GPU, which stores the radiance in a color texture. The second considers final gathering of diffuse and glossy indirect illumination using localized cube maps.

5. Pre-computation aided global illumination. These algorithms pre-compute the effects of light paths and store these data compactly in the texture memory for later reuse. Of course, pre-computation is possible only if the scene is static. Then, during the real-time part of the process, the actual lighting is combined with the prepared data and real-time global illumination results are provided. Having presented the theory of finite element methods and sampling, we discuss three methods in detail: Pre-computed radiance transfer (PRT) using finite-element representation, Light path maps that are based on sampling, and Participating media illumination networks, which again use sampling.

6. Fake global illumination. There are methods that achieve high frame rates by simplifying the underlying problem. These approaches are based on the recognition that global illumination is inherently complex because the illumination of every point may influence the illumination of every other point in the scene. However, the influence diminishes with distance, thus it is worth considering only the local neighborhood of each point during shading. Methods using this simplification include obscurances and ambient occlusion, of which the first is presented in detail. Note that these methods are not physically plausible, but provide satisfying results in many applications.

When the particular methods are discussed, images and rendering times are also provided. Unless stated explicitly otherwise, the performance values (e.g. frames per second) have been measured on an Intel P4 3 GHz PC with 1 GB RAM and an NVIDIA GeForce 6800 GT graphics card in full screen mode (1280×1024 resolution).

Notations

In this tutorial we have tried to use unified notation when discussing the different approaches. The most general notations are listed here.

$L(\vec{x}, \vec{\omega})$: the radiance of point $\vec{x}$ in direction $\vec{\omega}$.

$L^r(\vec{x}, \vec{\omega})$: the reflected radiance of point $\vec{x}$ in direction $\vec{\omega}$.

$L^e(\vec{x}, \vec{\omega})$: the emission radiance of point $\vec{x}$ in direction $\vec{\omega}$.

$L^{env}(\vec{\omega})$: the radiance of the environment illumination from direction $\vec{\omega}$.

$f_r(\vec{\omega}', \vec{x}, \vec{\omega})$: the BRDF at point $\vec{x}$ for illumination direction $\vec{\omega}'$ and viewing direction $\vec{\omega}$. If the surface is diffuse, the BRDF is denoted by $f_r(\vec{x})$.

$\theta'$: the angle between the illumination direction and the surface normal.

$\vec{x}$: the point to be shaded, which is the receiver of the illumination.

$\vec{y}$: the point that is the source of the illumination.

$v(\vec{x}, \vec{y})$: visibility indicator, which is 1 if points $\vec{x}$ and $\vec{y}$ are visible from each other and zero otherwise.

World: a uniform parameter of the shader program of type float4x4, which transforms from modeling space to world space.

WorldIT: a uniform parameter of the shader program of type float4x4, the inverse-transpose of World, used to transform normal vectors from modeling space to world space. Transposed in the shader, this matrix can also be used to transform rays from world space to modeling space.

WorldView: a uniform parameter of the shader program of type float4x4, which defines the transformation of position vectors (points) from modeling space to camera space.

WorldViewIT: a uniform parameter of the shader program of type float4x4, the inverse-transpose of WorldView, used to transform normal vectors from modeling space to camera space.

WorldViewProj: a uniform parameter of the shader program of type float4x4, which defines the transformation matrix from modeling space to clipping space.

DepthWorldViewProj: a uniform parameter of the shader program of type float4x4, which defines the transformation matrix from modeling space to clipping space used when rendering the depth map.

DepthWorldViewProjTex: a uniform parameter of the shader program of type float4x4, which defines the transformation matrix from modeling space to the texture space of the depth map.

EyePos: a uniform parameter of the shader program of type float3, which defines the camera position in world space.

Pos, wPos, cPos, hPos: vertex or fragment positions in modeling, world, camera and clipping spaces, respectively.

Norm, wNorm, cNorm: vertex or fragment normals in modeling, world and camera spaces, respectively.

oColor, oTex: vertex shader output color and texture coordinates.

1. Global illumination rendering

Global illumination algorithms should identify all light paths connecting the eye and the light sources via one or more scattering points, and should add up their contributions to obtain the power arriving at the eye through the pixels of the screen [Kaj86]. The scattering points of the light paths are on the surfaces in case of opaque objects, or can even be inside translucent objects (subsurface scattering [JMLH01]).

Figure 1: Notations of the computation of the reflected radiance

Let us first consider just a single scattering or one-bounce light transport (figure 1). Denoting the radiance of point $\vec{y}$ in direction $\vec{\omega}'$ by $L(\vec{y}, \vec{\omega}')$, the reflected radiance $L^r(\vec{x}, \vec{\omega})$ at scattering point $\vec{x}$ is the sum of contributions from all incoming directions $\vec{\omega}'$:

$$L^r(\vec{x}, \vec{\omega}) = \int_{\Omega'} L(\vec{y}, \vec{\omega}') \cdot f_r(\vec{\omega}', \vec{x}, \vec{\omega}) \cdot \cos^+\theta'_{\vec{x}} \, d\omega', \qquad (1)$$

where $\vec{y}$ is the point visible from $\vec{x}$ at direction $-\vec{\omega}'$, $\Omega'$ is the directional sphere, $f_r(\vec{\omega}', \vec{x}, \vec{\omega})$ is the bi-directional reflection/refraction function (BRDF), and $\theta'_{\vec{x}}$ is the angle between the surface normal and direction $-\vec{\omega}'$ at $\vec{x}$. If $\theta'_{\vec{x}}$ is greater than 90 degrees, then the negative cosine value should be replaced by zero, which is indicated by superscript $+$.

In order to consider not only single bounce but also multiple bounce light paths, the same integral should also be recursively evaluated at the visible points $\vec{y}$, which leads to a sequence of high dimensional integrals:

$$L^r = \int_{\Omega'_1} f_1 \cos^+\theta'_1 \cdot \left( L^e + \int_{\Omega'_2} f_2 \cos^+\theta'_2 \cdot (L^e + \ldots) \, d\omega'_2 \right) d\omega'_1, \qquad (2)$$

where $L^e$ is the emission radiance.

A straightforward technique to compute high-dimensional integrals is the Monte Carlo (or quasi-Monte Carlo) method [Sob91, SK99a, DBB03], which generates a finite number of random light paths and approximates the integral as the sum of the individual path contributions divided by the probabilities of generating these sample paths.
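Spelled out, such an estimator has the usual Monte Carlo form (a sketch; $C$ and $p$ are generic symbols for the contribution and the sampling probability density of a path, not notation used elsewhere in this tutorial):

$$L^r \approx \frac{1}{N} \sum_{i=1}^{N} \frac{C(\text{path}_i)}{p(\text{path}_i)},$$

where $C(\text{path}_i)$ is the contribution carried by the $i$-th sampled light path and $p(\text{path}_i)$ is the probability (density) of generating it.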

Accurate results need a huge number of light paths. For comparison, in reality a 100 W electric bulb emits about $10^{42}$ photons in each second, and nature “computes” the paths of these photons in parallel, at the speed of light, independently of the scene complexity. Unfortunately, when it comes to computer simulation, we shall never have $10^{42}$ parallel processors running at the speed of light. This means that the number of simulated light paths must be significantly reduced and we should accept longer rendering times. For real-time applications, the upper limit for rendering times comes from the requirement that, to maintain interactivity and to provide smooth animations, computers must generate at least 20 images per second. Note that on a 1000×1000 resolution display this allows 50 nsec to compute the light paths going through a single pixel.
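The 50 nsec budget follows directly from the frame rate and resolution requirements:

$$\frac{1}{20\ \mathrm{frames/s} \times 10^6\ \mathrm{pixels/frame}} = \frac{1}{2\times 10^{7}\ \mathrm{pixels/s}} = 50\ \mathrm{ns\ per\ pixel}.$$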

To meet this performance requirement, the problem to be solved is often simplified. One popular simplification approach is the local illumination model that ignores indirect illumination (figure 2). The local illumination model examines only one-bounce light paths having a single scattering point and thus can use only local surface properties when the reflection of the illumination of a light source toward the camera is computed. In local illumination shading, having obtained the point visible from the camera through a pixel, the reflected color can be determined without additional geometric queries. In the simplest case when even shadows are ignored, visibility is needed only from the point of the camera. To solve such visibility problems, the GPU rasterizes the scene and finds the visible points using the z-buffer hardware.

Global illumination algorithms also compute indirect illumination. It means that we need visibility information not only from the camera but from every shaded point. This is a requirement GPUs are not built for, which makes GPU based global illumination algorithms hard and challenging.

Figure 2: Comparison of local illumination rendering (left), local illumination with shadows (middle), and global illumination rendering (right)

One option is to follow the research directions of the era when global illumination was fully separated from the graphics hardware and its algorithms were running on the CPU, and try to port those algorithms onto the GPU. For example, it is time to revisit the research on efficient ray-shooting and to consider what kind of space partitioning schemes and algorithms can be implemented on the GPU [PBMH02b, PDC03, OLG05, FS05]. Another family of techniques that has proven to be successful in CPU implementations recognizes that it is not worth generating paths completely independently from scratch; instead, the visibility and illumination information gained when generating a path should be reused for other paths as well. Photon mapping [Jen96], instant radiosity [Kel97], and deterministic or stochastic iterative radiosity [SK99b] all reuse parts of the previously generated paths to speed up the computation. Furthermore, if the scene is static, then paths can be pre-computed only once and reused during rendering without repeating the expensive computation steps. Reuse and pre-computation are promising techniques in GPU based real-time global illumination algorithms as well.

On the other hand, GPUGI is not just porting already known global illumination algorithms to the GPU. The GPU is a special purpose hardware, so to work with it efficiently, its special features and limitations should also be taken into account. This consideration may result in solutions that are completely different from the CPU based methods.

2. Local illumination rendering pipeline of current GPUs

2.1. Evolution of the fixed function rendering pipeline

Current, highly programmable graphics hardware evolved from simple monitor adapters. Their task was barely more than to store an image in memory and channel the data to control the electron beam lighting the monitor pixels. Even this had to be done at a speed that was incredible at the time, making it a feat that could only be achieved through parallelization. However, raster adapters began their real revolution when they started supporting incremental 3D graphics, earning them the name graphics accelerators. Indeed, the first achievement was to implement linear interpolation in hardware very effectively, by obtaining values in consecutive pixels using a single addition [SKe95].

The coordinates of the internal points of a triangle can be obtained by linearly interpolating the vertex coordinates. Other attributes, such as color, texture coordinates, etc., can be approximated by linear interpolation. Thus, when a triangle is rasterized, i.e. its corresponding pixels are found, linear interpolation can be used for all data.
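The reason a single addition per pixel suffices is that a linearly varying attribute has a constant increment along a scanline. As a sketch, for an attribute $a$ interpolated between pixel columns $x_{\text{start}}$ and $x_{\text{end}}$:

$$a(x+1) = a(x) + \Delta a, \qquad \Delta a = \frac{a_{\text{end}} - a_{\text{start}}}{x_{\text{end}} - x_{\text{start}}},$$

so after computing $\Delta a$ once per scanline, each new pixel requires only one addition.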

Any virtual world description can be translated to a set of triangle mesh surface models with some level of fidelity. The basic assumption of incremental 3D rasterization is that we render triangles, defined by triplets of vertices, the position of which is given as 3D vectors, in coordinates relative to the screen (later this space will be referred to as normalized device space or clipping space depending on whether Cartesian or homogeneous coordinates are used). Thus, the third Cartesian coordinate, z, denotes depth. For every vertex, a color is given, which is obtained from shading computations. The array in graphics memory containing records of vertex data (position and color) is called the vertex buffer.

When rendering, every triangle is clipped to the viewport and rasterized to a set of pixels, in which the color and the depth value are computed via incremental linear interpolation. Besides the color buffer memory (also called frame buffer), we maintain a depth buffer, or z-buffer, containing the depth related to the current color value in the color buffer. Whenever a triangle is rasterized to a pixel, the color and depth are only overwritten if the new depth value is less, meaning the new triangle fragment is closer to the viewer. As a result, we get a rendering of triangles correctly occluding each other in 3D. The quality of shading depends on how we computed the colors for the vertices. But even if those are highly accurate, colors will be smeared over triangles, meaning that image quality depends on the level of tessellation. Furthermore, every pixel will be filled with a uniform color, computed for a single surface point. Therefore, aliasing artifacts, i.e. jagged edges, will appear, making image quality highly dependent on image resolution, too. The rendering time depends linearly on two factors: the number of triangles to be rendered and the number of pixels to be colored, whichever is the bottleneck.
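Conceptually, the per-fragment depth test described above can be sketched as the following HLSL-style function (not an actual shader; this fixed-function test is performed by the hardware after the fragment color has been computed):

void DepthTest(inout float storedDepth, inout float4 storedColor,
               float newDepth, float4 newColor)
{
    if (newDepth < storedDepth) { // the new fragment is closer to the viewer
        storedDepth = newDepth;   // update the depth buffer entry
        storedColor = newColor;   // update the color buffer entry
    }
}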

Triangle mesh models have to be very detailed to offer a realistic appearance. An ancient and essential tool to provide the missing details is texture mapping. Texture images are stored in graphics card memory as 2D arrays of color records. How the texture should be mapped onto triangle surfaces is specified by texture coordinates assigned to every vertex. Thus, the vertex buffer does not only contain position and color data, but also texture coordinates. These are linearly interpolated within triangles just like colors, and for every pixel, the interpolated value is used to fetch the appropriate color from the texture memory. A number of filtering techniques combining multiple texel values may also be applied. Then the texture color is used to modulate the original color received from the vertices.

Textures already allow for a great degree of realism in incremental 3D graphics. Not only do they provide detail, but missing shading effects, shadows, and indirect lighting may be painted or precomputed into textures. Clearly, these static substitutes do not respect dynamic scenes or changing lighting or viewing conditions.

The architecture described above requires the vertex positions and colors to be computed on the CPU, involving transformations and local illumination lighting. However, these are well-established procedures integrated into the pipeline of graphics libraries. Supporting them on the graphics card was a straightforward advancement.

The vertex records in the vertex buffer store the raw data. Vertex positions are given in modeling coordinates. The transformation to camera space and then from camera to screen space (or clipping space) are given as 4×4 homogeneous linear transformation matrices, the world-view and the perspective one, respectively. They are of course uniform for all vertices, not stored in the vertex buffer, but in a few registers of the graphics card. Whenever these matrices change due to object or camera animation, the vertex buffer does not need to be altered.

Instead of including already computed color values in the buffer, the data required to evaluate the local shading formula are stored. This includes the surface normal and the diffuse and Phong-Blinn reflection coefficients. Light positions, directions and intensities are specified as uniform parameters over all the vertices.

For every vertex, the coordinates are transformed and shading is evaluated. Thus, the screen position and vertex color are obtained. Then the rasterization and linear interpolation are performed to color the pixels, just like without transformation and lighting.

Figure 3: Shader Model 3.0 GPU architecture (command processor, vertex shader, clipping and homogeneous division, rasterization and interpolation, fragment shader, stencil and depth tests, blending and raster operations, connected to the vertex, texture, depth, stencil and color buffers, with render-to-texture, texture copy and texture upload paths)

When a final fragment color is computed, it is not directly written to the color buffer. First of all, as discussed above, the depth test against the depth buffer is performed to account for occlusions. However, some more computations are also supported in the hardware. A third buffer called the stencil buffer is also provided. Most of the time, stencil buffer bits are used as flags set when a pixel is rendered to. While drawing other objects, the stencil test may be enabled, discarding pixels previously not flagged. This way, reflections in a planar mirror, or shadows, might be rendered. The functionality called blending allows for combining the computed fragment color with the color already written to the color buffer. This is the technique commonly used to achieve transparency. Colors are typically given as quadruplets of values, containing a so-called alpha channel besides the red, green and blue ones. The alpha value generally represents some opacity measure, and the alpha values already in the color buffer and that of the computed fragment color are used to weight the colors when combining them. Multiple blending formulae are usually supported.
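For instance, the blending combination most commonly used for transparency weights the incoming fragment color by its alpha value and the color already in the buffer by one minus that alpha (only one of the several supported formulae):

$$C_{\text{new buffer}} = \alpha_{\text{fragment}} \cdot C_{\text{fragment}} + (1 - \alpha_{\text{fragment}}) \cdot C_{\text{buffer}}.$$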

With transformation and lighting modules, the hardware is able to render images using local illumination with per-vertex lighting. The computed color can be replaced or modulated using texture maps. Every improvement towards global illumination must make use of this architecture.

2.2. Architecture of programmable GPUs

Current, Shader Model 3.0 — also called DirectX 9 compatible — GPUs implement the complete process of rendering triangle meshes. Figure 3 shows a typical GPU architecture. Figure 4 depicts the dataflow of such systems. Note that in this tutorial we do not cover hardware having DirectX 10 features [Bly05].

The commands to the graphics API (e.g. DirectX or OpenGL) are passed to the command processor, which fills up the vertex buffer with modeling space vertices and their attributes, and also controls the operation of the whole pipeline. Whenever the data belonging to a vertex is ready, the vertex shader module starts working. It gets all attributes belonging to a vertex in its input registers. In the fixed-function pipeline, this module is responsible for transforming the vertex to homogeneous clipping space and may modify the color properties if lighting computations are also requested.

Figure 4: Dataflow in a GPU assuming standard local illumination rendering (vertex shader with transformations and illumination, clipping and homogeneous division, triangle setup and rasterization with attribute interpolation and viewport transformation, fragment shader with texturing, then scissor, alpha, stencil and depth tests and blending into the color buffer, optionally rendering to a texture)

Then the fixed pipeline waits for a complete triangle and the clipping hardware keeps only those parts where the $[x, y, z, w]$ homogeneous coordinates meet the following requirements, defining an axis aligned, origin centered cube with corners $(-1,-1,-1)$ and $(1,1,1)$ in normalized screen space:

$$-w \le x \le w, \qquad -w \le y \le w, \qquad -w \le z \le w.$$


These equations are valid in OpenGL. In DirectX, however, the normalized device space contains points of positive $z$ coordinates, which consequently modifies the last pair of inequalities to $0 \le z \le w$.

Clipping may introduce new vertices for which all properties (e.g. texture coordinates or color) are linearly interpolated from the original vertex properties.

After clipping, the pipeline executes homogeneous division, that is, it converts homogeneous coordinates to Cartesian ones by dividing the first three homogeneous coordinates by the fourth (w). The points are then transformed to viewport space where the first two Cartesian coordinates select that pixel in which this point is visible. If triangle primitives are processed, the rasterization module waits for three vertices, forms a triangle from them, and fills its projection on the x, y plane, visiting each pixel that is inside the projection. During filling, the hardware interpolates all vertex properties to obtain the attributes of a particular fragment. The fragment shader hardware takes the attributes of the particular fragment and computes the fragment color. This computation may involve texture lookups if texturing is enabled, and its multiplication with the color attribute in case of modulative texturing. The computed color may participate in raster operations, such as alpha blending, and its z coordinate goes to the z-buffer to detect visibility. If the fragment is visible, the result is written into the color buffer, or alternatively to the texture memory.

Programmable GPUs allow the modification of the fixed-function pipeline at two stages. We can customize the vertex shading and the fragment shading steps, using assembly language or high level shading languages, such as HLSL, Cg, etc.

2.2.1. Vertex shader

A vertex shader is connected to its input and output register sets defining the attributes of the vertex before and after the operation. All registers are of type float4, i.e. are capable of storing a four element vector. The input register set describes the vertex position (POSITION), colors (COLOR0, COLOR1), normal vector (NORMAL), texture coordinates (TEXCOORD0,..., TEXCOORD8), etc. The vertex shader unit computes the values of the output registers from the content of the input registers. During this computation it may also use global, also called uniform, variables.

The following example shader realizes the vertex processing of the fixed-function pipeline when the lighting is disabled. It applies the World, View, and Projection transformations to transform the vertex from modeling to world, from world to camera, and from camera to homogeneous clipping space, respectively:

// homogeneous linear transformation
// from modeling to homogeneous clipping space
float4x4 WorldViewProj;

void StandardNoLightingVS(
    in  float4 Pos    : POSITION,  // modeling space
    in  float3 Color  : COLOR0,    // vertex color
    in  float2 Tex    : TEXCOORD0, // texture uv
    out float4 hPos   : POSITION,  // clipping space
    out float3 oColor : COLOR0,    // vertex color
    out float2 oTex   : TEXCOORD0  // texture uv
) {
    // transform to clipping space
    hPos = mul(Pos, WorldViewProj);
    oColor = Color; // copy input color
    oTex = Tex;     // copy texture coordinates
}

The second example executes local illumination computation for the vertices, and replaces the color attribute by the result. The illumination is evaluated in camera space, where the eye is in the origin and looks in the −z direction assuming OpenGL, and in the z direction in DirectX. In order to evaluate the Phong-Blinn illumination formula, the normal, lighting, and viewing directions should be obtained in camera space. Note that if the shaded point is transformed to camera space by the WorldView matrix, its associated normal vector should be transformed by the inverse-transpose of the same matrix (WorldViewIT). We consider just a single point light source in the example.

// from modeling to homogeneous clipping space
float4x4 WorldViewProj;
// from modeling to camera space
float4x4 WorldView;
// inverse-transpose of WorldView to transform normals
float4x4 WorldViewIT;

// Light source properties
float3 LightPos;            // position in camera space
float4 Iamb, Idiff, Ispec;  // intensities
// Material properties
float4 ka, kd, ks;          // reflectances
float shininess;

void StandardLightingVS(
    in  float4 Pos    : POSITION,  // modeling space
    in  float3 Norm   : NORMAL,    // normal vector
    in  float2 Tex    : TEXCOORD0, // texture uv
    out float4 hPos   : POSITION,  // clipping space
    out float3 oColor : COLOR0,    // vertex color
    out float2 oTex   : TEXCOORD0  // texture uv
) {
    hPos = mul(Pos, WorldViewProj);

    // transform the normal to camera space
    float3 N = mul(float4(Norm, 0), WorldViewIT).xyz;
    N = normalize(N);

    // transform the vertex to camera space and
    // obtain the lighting direction
    float3 cPos = mul(Pos, WorldView).xyz;
    float3 L = normalize(LightPos - cPos);

    // evaluate the Phong-Blinn reflection
    float costheta = saturate(dot(N, L));

    // obtain the view direction using that
    // the eye is the origin in camera space
    float3 V = normalize(-cPos); // viewing direction
    float3 H = normalize(L + V); // halfway vector
    float cosdelta = saturate(dot(N, H));

    oColor = (Iamb * ka + Idiff * kd * costheta
           + Ispec * ks * pow(cosdelta, shininess)).rgb;
    oTex = Tex; // copy texture coordinates
}

2.2.2. Fragment shader

The fragment shader (also called pixel shader) receives the fragment properties of those pixels which are inside the clipped and projected triangles, and also uniform parameters. The main goal of the fragment shader is the computation of the fragment color.

The following example program executes modulative texturing. Taking the fragment input color and texture coordinates interpolated from vertex colors and texture coordinates, respectively, the fragment shader looks up the texture memory with the texture coordinates, and the read texture data is multiplied with the fragment input color:

sampler2D texture; // 2D texture sampler

float4 TexModPS(
    in float2 Tex   : TEXCOORD0,
    in float3 Color : COLOR0
) : COLOR // output color
{
    return tex2D(texture, Tex) * Color;
}

2.3. Modification of the standard pipeline operation

Programmable vertex and fragment shaders offer a higher level of flexibility on how the data from the vertex buffer is processed, and how shading is performed. However, the basic pipeline model remains the same: a vertex is processed, the results are linearly interpolated, and they are used to find the color of a fragment. The flexibility of the programmable stages allows us to change the shading model, implement per-fragment lighting, or render unfolded triangle charts instead of the models themselves, among countless other possibilities.

What programmable vertex and pixel shaders alone do not help us with is non-local illumination. All the data passed to the shaders still describes only local geometry and materials, or global constants, but nothing about other pieces of geometry. When a point is shaded with a global illumination algorithm, its radiance will be a function of all other points in the scene. From a programming point of view this means that we need to access the complete scene description when shading a point. While this is granted in CPU based ray tracing systems [WKB02, WBS03], the stream processing architecture of current GPUs fundamentally contradicts this requirement. When a point is shaded on the GPU, we have just a limited amount of its local properties stored in registers, and may access texture data. Thus the required global properties of the scene must be stored in textures.

If the textures must be static, or they must be computed on the CPU, then the lion's share of the illumination work does not make use of the processing power of the parallel hardware, and the graphics card merely presents CPU results. Textures themselves have to be computed on the GPU. The render-to-texture feature allows this: anything that can be rendered to the screen may be stored in a texture. Such texture render targets may also require depth and stencil buffers. Along with programmability, various kinds of data may be computed to textures. These data may also be stored in floating point format in the texture memory, unlike in the color buffer, which usually stores data of 8 bit precision.

To use textures generated by the GPU, the rendering process must be decomposed to passes, where one pass may render into a texture and may use the textures generated by the previous passes. Since the reflected radiance also depends on geometric properties, these textures usually contain not only conventional color data, but they also encode geometry and prepared, reusable illumination information as well.

Straightforwardly, we may render the surroundings of a particular scene entity. Then, when drawing the entity, the fragment shader may be written so that it retrieves colors from this texture based on the reflected eye vector (computed from the local position and normal, plus the global eye position). This is the technique known as environment mapping (figure 5).

It is also possible to write a vertex shader which exchanges the texture and position coordinates. When drawing a model mesh, the result is that the triangles are rendered to their positions in texture space. Anything computed for the texels may later be mapped onto the mesh by conventional texture mapping. This technique assumes that the mapping is unique, and such a render texture resource is usually called a texture atlas (figure 6).
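A minimal vertex shader sketch of this atlas-rendering idea is given below. It is only an illustration, not code from this tutorial: the parameter names are hypothetical, the atlas texture coordinates are assumed to lie in the unit square, and the world space position is passed on so that a pixel shader could compute and store illumination for the corresponding surface point.

// Sketch: render the mesh into its texture atlas.
// Assumes a unique [0,1]x[0,1] uv parameterization (Tex) per vertex.
float4x4 World; // modeling to world space transformation

void AtlasVS(
    in  float4 Pos  : POSITION,  // modeling space position
    in  float2 Tex  : TEXCOORD0, // atlas texture coordinates
    out float4 hPos : POSITION,  // clipping space = atlas position
    out float3 wPos : TEXCOORD0  // world space position for shading
) {
    // place the vertex at its texel: [0,1]x[0,1] -> [-1,1]x[-1,1]
    hPos = float4(2 * Tex.x - 1, 1 - 2 * Tex.y, 0, 1);
    wPos = mul(Pos, World).xyz; // pass the world position to the pixel shader
}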

Figure 5: Environment mapping using a metal shader

Figure 6: A texture atlas of a rocking horse (left), and a texture atlas of a staircase storing radiance values (right)

When textures are used to achieve various effects, it becomes necessary to be able to access multiple textures when computing a fragment color. This is called multi-texturing. Today, fragment shaders are able to access as many as 16 textures, any one of them multiple times. They can all be addressed using different modes or sets of texture coordinates, and their results can be combined freely in the fragment program. This allows the simultaneous use of different techniques like environment mapping, bump mapping, light mapping, shadow mapping, etc.

It is also a likely scenario that we need to compute multiple values for a rendering setup. This is accomplished using multiple render targets: a fragment shader may output several pixel colors or values, which will be written to the corresponding pixels of the respective render targets. With this feature, computing data that would not fit into a single texture is feasible. For instance, deferred shading [HH04] renders all visible geometry and material properties into screen-sized textures, and then uses these textures to render the shaded scene without actually rendering the geometry again.

Programmability and render-to-texture together make it possible to create some kind of processed representation of geometry and illumination as textures, and then access these data when rendering and shading other parts of the scene. This is the key to addressing the self-dependency of the global illumination rendering problem. In all GPUGI algorithms, we use multiple passes to different render targets to capture some aspects of the scene, like the surrounding environment, the shadowing or refracting geometry, illumination due to light samples, etc. These passes belong to the illumination information generation part of rendering (figure 7). In a final pass, also called final gathering, scene objects are rendered to the frame buffer making use of the previously computed information to achieve non-local shading effects like shadows, reflections, caustics, or indirect illumination.

Figure 7: Structure of real-time global illumination shaders (off-line preprocessing of illumination information on the CPU+GPU, on-the-fly generation of reusable illumination information into the texture memory on the GPU at less than 20 FPS, and final gathering into the frame buffer on the GPU at a rate of at least 20 FPS)

Passes of the illumination information generation part are responsible for preparing the reusable illumination information and storing it in the texture memory, from which the final gathering part produces the image for the particular camera. To produce continuous animation, the final gathering part should run at high frame rates. Since the scene may change, the illumination information generation should also be repeated. However, if the illumination information is appropriately defined, then its elements can be reused for many points and many frames. Thus the illumination information data structure is compact and might be regenerated at a significantly lower frequency than the final gathering frame rate.

As we shall discuss in this tutorial, these features give us enough freedom to implement global illumination algorithms and even ray-tracing approaches. However, we must be aware that it is not worth going very far from the original concepts of the pipeline, namely rasterization and texturing, because doing so might have serious performance penalties. This is why GPU implementation often means the invention of brand new approaches and not just adapting or porting existing ones.


3. Simple improvements of the local illumination lighting model

Local illumination models simplify the rendering problem to shading a surface fragment according to a given point-like light source. This is based on several false assumptions, which results in less realistic images:

The light source is always visible. In reality, incoming lighting depends on the materials and geometry found between the light source and the shaded point. Most prominently, solid objects may occlude the light. Neglecting this effect, we will render images without shadows. In order to eliminate this shortcoming, we may capture occluding geometry in texture maps, and test the light source for visibility when shading. This technique is called shadow mapping.

The light illuminates from a single direction. In reality, light emitters occupy some volume. While the point-like or directional model is suitable for small artificial lights or the sun, in most environments we encounter extended light sources. Most prominently, the sky itself is a huge light source. In image based lighting, we place virtual objects in a computed or captured environment, which is also a lighting problem where the environment image is an extended light source. Volumetric or area lights generate more elaborate shadows, as they might be only partly visible from a given point. These shadows are often called soft shadows, as they do not feature a sharp boundary between shadowed and lighted surfaces. Generally, point-sampling of extended light sources is required to render accurate shadows. However, with some simplifying assumptions for the light source, faster approximate methods may be obtained, generating perceptually plausible soft shadows.

No indirect lighting. In reality, all illuminated objects reflect light, lighting other objects. While this indirect lighting effect constitutes a huge fraction of the light we perceive, it tends to be low-frequency and less obvious, as most surfaces scatter the light diffusely. However, for highly specular, metallic, mirror-like or refractive materials, this does not apply. Indirect illumination may exhibit elaborate high frequency patterns called caustics, and of course the color we see on a mirror's surface depends on the surrounding geometry. These issues require more sophisticated methods, based on the approach of environment mapping, capturing incoming environment radiance in textures.

3.1. Shadow mapping

Shadows are important not only to make the image realistic, but also to allow humans to perceive depth and distances. Shadows occur when an object, called the shadow caster, occludes the light source from another object, called the shadow receiver, thus preventing the light source from illuminating the shadow receiver. In real-time applications shadow casters and receivers are often distinguished, which excludes self shadowing effects. However, in real life all objects may act as both shadow receiver and shadow caster.

Point and directional light sources generate hard shadows having well defined boundaries of illumination discontinuities. However, realistic light sources have non-zero area, resulting in soft shadows having a continuous transition between the fully illuminated region and the occluded region, called the umbra. The transition is called the penumbra region. With hard shadows only the depth order can be perceived, but not the distance relations. Shadows should have real penumbra regions with physically accurate size and density to allow the observer to reconstruct the 3D scene.

The width and the density of the penumbra regions depend on the size of the area light source, on the distance between the light source and shadow caster object, and on the distance between the shadow caster and shadow receiver object.

Real-time shadow algorithms can be roughly categorized as image space shadow map or object space shadow volume techniques. Shadow volumes construct invisible faces to find out which points are in shadow and require geometric processing. Exact geometric representation allows exact shadow boundaries, but the computation time grows with the geometric complexity, and partially transparent objects, such as billboards, become problematic. Furthermore, these methods cannot cope with geometries modified during rendering, as happens when displacement mapping is applied.

Shadow maps, on the other hand, work with a depth image, which is a sampled version of the shadow casters. Since shadow map methods use only a captured image of the scene as seen from the light, they are independent of the geometric complexity and can conveniently handle displacement mapped and transparent surfaces as well. However, their major drawback is that the shadowing information is in a discretized form, as a collection of shadow map pixels called lexels, thus sampling or aliasing artifacts are likely to occur.

For the sake of simplicity, we assume that the light sources are either directional or spot lights having a main illumination direction. Omnidirectional lights are not considered. Note that this is not a limitation, since an omnidirectional light can be replaced by 6 spot lights radiating towards the six sides of a cube placed around the omnidirectional light source.


Shadow map algorithms use several coordinate systems, which are briefly reviewed here:

World space: This is the arbitrary global frame of reference for specifying positions and orientations of virtual world objects, light sources and cameras. Object models are specified using modeling coordinates. For an actual instance of a model, there is a transformation that moves object points from modeling space to world space. This is typically a homogeneous linear transformation, called the modeling transformation or World.

Eye’s camera space: In this space the eye position of the camera is in the origin, the viewing direction is the z axis in DirectX and the −z direction in OpenGL, and the vertical direction of the camera is the y axis. Distances and angles are not distorted, and lighting computations can be carried out identically to the world space. The transformation from modeling space to this space is WorldView.

Light’s camera space: In this space the light is in the origin, and the main light direction is the z axis. This space is similar to the eye’s camera space with the roles of the light and the eye exchanged. The transformation from modeling space to this space is DepthWorldView, which must be set according to the light’s position.

Eye’s normalized device space: Here the eye is at an ideal point [0,0,1,0], thus the viewing rays become parallel. The visible part of the space is an axis aligned box with corners [−1,−1,0] and [1,1,1] in Cartesian coordinates. The transformation to this space is not an affine transformation, thus the fourth homogeneous coordinate is usually not equal to 1. The transformation from modeling space to this space is WorldViewProj.

Light’s normalized device space: Here the light is at an ideal point [0,0,1,0], thus the lighting rays become parallel. The illuminated part of the space is an axis aligned box with corners [−1,−1,0] and [1,1,1] in Cartesian coordinates. The transformation to this space is not an affine transformation, thus the fourth homogeneous coordinate is usually not equal to 1. The transformation from modeling space to this space is DepthWorldViewProj. This matrix should be set according to the light characteristics. For a directional light, the light rays are parallel without any non-affine transformation, so we only need an orthographic projection to scale the interesting, illuminated objects into the unit box. For point lights, a perspective projection matrix is needed, identical to that of perspective cameras. The field of view should be large enough to accommodate the spread of the spotlight. Omnidirectional lights need to be substituted by six 90° FOV angle lights.

Shadow mapping has two stages: shadow map generation, when the camera is placed at the light, and image generation, when the camera is at the eye position.

3.1.1. Shadow map generation

Shadow map generation is a regular rendering pass where the z-buffer should be enabled. The actual output is the z-buffer texture with the depth values. Although a color target buffer is generally required, color writes should be disabled to increase performance.

Transformation DepthWorldViewProj is set to transform points to world space, then to the light-camera space, and finally to the light's normalized device space. The shader executes a regular rendering phase. Note that the pixel shader color is meaningless since it is ignored by the hardware anyway.

// model to the depth map's screen space
float4x4 DepthWorldViewProj;

void DepthVS(in float4 Pos : POSITION, out float4 hPos : POSITION) {
    hPos = mul(Pos, DepthWorldViewProj);
}

float4 DepthPS() : COLOR0 {
    return 0;
}

3.1.2. Rendering with the shadow map

In the second phase, the scene is rendered from the eye camera. Each visible point is transformed to the light space, then to texture space, and its depth value is compared to the stored depth value. The texture space transformation is responsible for mapping the spatial coordinate range [−1,1] to the [0,1] texture range, inverting the y coordinate, since the spatial y coordinates increase from bottom to top, while the texture coordinates increase from top to bottom. Furthermore, the transformation shifts the u, v texture address by half a texel, and possibly also adds a small bias to the z coordinate to avoid self-shadowing.

The necessary transformation steps convert point p in the light's normalized device space to projective texture space t:

// center is 0.5 and add a half texel
offset = 0.5 + 0.5 / SHADOWMAP_SIZE;

t.x =  0.5 * p.x + offset; // [-1,1] -> [0,1] + half texel
t.y = -0.5 * p.y + offset; // [1,-1] -> [0,1] + half texel
t.z = p.z - bias;
t.w = p.w;

Here SHADOWMAP_SIZE denotes the resolution of the shadow map and bias the z bias.

It is often a delicate issue to choose an appropriate bias for a given scene. Values exceeding the dimensions of geometric details will cause light leaks: non-shadowed surfaces closely behind shadow casters. With a bias not large enough, z-fighting will cause interference-like shadow stripes on lighted surfaces. Both artifacts are extremely disturbing and unrealistic. The issue might be more severe when the depth map is rendered with a large FOV angle, as depth distortion can be extreme. A convenient solution is the second depth value technique. Assuming all our shadow casters are non-intersecting, opaque manifold objects, the backfaces cast the same shadow as the object itself. By reversing the backface culling mechanism, we can render depth values into the depth map that do not coincide with any front face depth. The bias in the above code can then be set to zero.

The texture space transformation can also be implemented as a matrix multiplication. The following TT, or TexScaleBias, matrix must be appended to the transformation to the light's normalized device space:

$$T_{\text{tex}} = \begin{pmatrix} 0.5 & 0 & 0 & 0 \\ 0 & -0.5 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \text{offset} & \text{offset} & -\text{bias} & 1 \end{pmatrix},$$

producing DepthWorldViewProjTex. The vertex shader executes this transformation:

// To the normalized screen space of the eye camera
float4x4 WorldViewProj;
// To the texture space of the depth map
float4x4 DepthWorldViewProjTex;

void ShadowVS(
    in  float4 Pos      : POSITION,  // model space
    in  float4 Color    : COLOR0,    // input color
    out float4 hPos     : POSITION,  // clip space
    out float4 depthPos : TEXCOORD0, // depth map texture space
    out float4 oColor   : COLOR0)    // output color
{
    oColor = Color; // copy color

    // transform the model-space vertex position
    // to the light's (depth map's) texture space
    depthPos = mul(Pos, DepthWorldViewProjTex);

    // transform the model-space vertex position
    // to the eye's normalized device space
    hPos = mul(Pos, WorldViewProj);
}

In the pixel shader we check whether the stored depth value is smaller than the given point's depth; if it is, the point is in shadow:

sampler2D ShadowMap; // depth map in a texture

float4 ShadowPS(
    float4 depthPos : TEXCOORD0, // depth map texture space
    float4 Color    : COLOR0     // input color
) : COLOR
{
    // returns 0 or 1 as the comparison result
    // (if the shadow map sampler uses linear
    // interpolation, these 0/1 values are
    // interpolated, not the depths)
    float vis = tex2Dproj(ShadowMap, depthPos).r;
    return vis * Color;
}

The code for the shadow map query is virtually identical to the code for projective textures. Projective texturing tex2Dproj(sampler, p) divides p.x, p.y, p.z by p.w and looks up the texel addressed by (p.x/p.w, p.y/p.w). Shadow maps are like other projective textures, except that in projective texture lookups, instead of returning a texture color, tex2Dproj returns the boolean result of the comparison of p.z/p.w and the value stored in the texel. To force the hardware to do this, the associated texture unit should be configured by the application for depth compare texturing; otherwise, no depth comparison is actually performed. In DirectX, this is done by creating the texture resource with the usage flag D3DUSAGE_DEPTHSTENCIL. Note that tex2D will not work on this texture, only tex2Dproj will.
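Conceptually, the comparison carried out by the depth compare texturing hardware is equivalent to the following sketch, written as if the light-space depths were stored in an ordinary readable texture (StoredDepth is a hypothetical sampler name used only for illustration):

sampler2D StoredDepth; // hypothetical texture holding the light-space depth values

float ShadowCompare(float4 depthPos)
{
    float2 uv = depthPos.xy / depthPos.w;         // projective texture address
    float casterDepth = tex2D(StoredDepth, uv).r; // depth of the closest shadow caster
    // the point is lit if it is not farther from the light than the stored caster depth
    return (depthPos.z / depthPos.w <= casterDepth) ? 1.0 : 0.0;
}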

Figure 8: Hardware shadow mapping with 512×512 shadow map resolution at 210 FPS.

Classical shadow mapping requires just a single texture lookup in the pixel shader. This naive implementation has many well known problems, caused by storing only sampled information in the depth map. These problems include shadow acne and aliasing.


3.2. Image based lighting

In many computer graphics applications it is desirable to augment the virtual objects with high dynamic range images representing a real environment (sky, city, wood, etc.). In order to provide the illusion that the virtual objects are parts of the real scene, the illumination of the environment should be taken into account when the virtual objects are rendered.

The very same approach can also be used for purely virtual scenes if they can be decomposed to smaller dynamic objects and to a larger static or slowly changing part (such distinction is typical in games and virtual reality systems). In this case, the illumination reflected off the static part of the scene is computed from a reference point placed in the vicinity of the dynamic objects, and is stored in images. Then the static part of the scene is replaced by these images when the dynamic objects are rendered.

In both cases, the illumination of the virtual objects is defined by these images, also called environment maps. The illumination computation process is called environment mapping [BN76].

The radiance values representing environment illumination may differ by orders of magnitude, thus they cannot be mapped to the usual [0,255] range. Instead, the red, green, and blue colors of the pixels in these images should be stored as floating point values to cope with the high range. Floating point images are called high dynamic range images.

Environment mapping assumes that the illumination stored in the images comes from very (infinitely) far surfaces. It means that a ray hitting the environment becomes independent of the ray origin. In this case rays can be translated to the same reference point, and environment maps can be queried using only the direction of the ray.

Environment mapping was originally proposed to render ideal mirrors in local illumination frameworks, then extended to approximate general secondary rays without expensive ray-tracing [Gre84, RTJ94, Wil01]. Environment mapping has also become a standard technique of image based lighting [MH84, Deb98].

In order to compute the image of a virtual object under infinitely far environment illumination, we should evaluate the reflected radiance $L^r$ due to the environment illumination at every visible point $\vec{x}$ at view direction $\vec{\omega}$ (figure 9):

$$L^r(\vec{x}, \vec{\omega}) = \int_{\Omega'} L^{env}(\vec{\omega}') \cdot f_r(\vec{\omega}', \vec{x}, \vec{\omega}) \cdot \cos^+\theta'_{\vec{x}} \cdot v(\vec{x}, \vec{\omega}') \, d\omega', \qquad (3)$$

Figure 9: The concept of environment mapping and an environment map stored in a cube map.

where $L^{env}(\vec{\omega}')$ is the radiance of the environment map at direction $\vec{\omega}'$, and $v(\vec{x}, \vec{\omega}')$ is the visibility factor checking whether no virtual object is seen from $\vec{x}$ at direction $\vec{\omega}'$ (that is, the environment map can illuminate this point from the given direction). Note that the assumption that the illumination arrives from very distant sources allowed the elimination of the positional dependence of the incoming radiance and its replacement by the direction dependent environment radiance $L^{env}(\vec{\omega}')$.

The illumination of the environment map on the virtual objects can be obtained by tracing rays from the points of the virtual object in the directions of the environment map, and checking whether or not occlusions occur [Deb98, KK03]. The computation of the visibility factor, that is the shadowing of objects, is rather time consuming. Thus most of the environment mapping algorithms simply ignore this factor and take into account the environment illumination everywhere and in all possible illumination directions.

A natural way of storing the direction dependent environment map $L^{env}(\vec{\omega}')$ is an angular mapped floating point texture. Direction $\vec{\omega}'$ is expressed by spherical angles $\theta', \phi'$, where $\phi' \in [0, 2\pi]$ and $\theta' \in [0, \pi/2]$ in case of hemispherical lighting, and $\theta' \in [0, \pi]$ in case of spherical lighting. Then texture coordinates $[u, v]$ are scaled from the unit interval to these ranges. For example, in case of spherical lighting

$$\vec{\omega}' = (\cos 2\pi u \cdot \sin \pi v,\ \sin 2\pi u \cdot \sin \pi v,\ \cos \pi v), \qquad u, v \in [0,1].$$
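For lookups the shader needs the inverse mapping, from a unit direction to the $[u, v]$ texture address. A minimal HLSL sketch, assuming the spherical parameterization above and a hypothetical angular map sampler called AngularEnvMap:

sampler2D AngularEnvMap; // hypothetical angular environment map sampler

float4 LookupAngularMap(float3 dir) // dir is a unit length direction
{
    const float PI = 3.14159265;
    // invert the parameterization: u from the azimuth, v from the polar angle
    float u = atan2(dir.y, dir.x) / (2 * PI);
    if (u < 0) u += 1;              // map [-0.5, 0.5] to [0, 1]
    float v = acos(dir.z) / PI;
    return tex2D(AngularEnvMap, float2(u, v));
}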

Another, more GPU friendly possibility is to parameterize the directional space as the sides of a cube centered at the origin and having edge size 2. A point $(x, y, z)$ on the cube corresponds to direction

$$\vec{\omega}' = \frac{(x, y, z)}{\sqrt{x^2 + y^2 + z^2}}.$$

One of the three coordinates is either 1 or −1.


For example, the directions corresponding to the right ($z = 1$) face of the cube are

$$\vec{\omega}' = \frac{(x, y, 1)}{\sqrt{x^2 + y^2 + 1}}, \qquad x, y \in [-1, 1].$$

Current GPUs have built-in support to compute this formula and to obtain the stored value from one of the six textures of the six cube map faces (texCUBE in HLSL).

3.2.1. Mirrored reflections and refractions

Let us assume that there are no self occlusions, so $v(\vec{x}, \vec{\omega}') = 1$. If the surface is an ideal mirror, then its BRDF allows reflection just from a single direction, thus the rendering equation simplifies to:

$$L^r(\vec{x}, \vec{\omega}) = \int_{\Omega'} L^{env}(\vec{\omega}') \cdot f_r(\vec{\omega}', \vec{x}, \vec{\omega}) \cdot \cos^+\theta'_{\vec{x}} \cdot v(\vec{x}, \vec{\omega}') \, d\omega' = L^{env}(\vec{R}) \cdot F(\vec{N}, \vec{R}),$$

where $\vec{N}$ is the unit surface normal at $\vec{x}$, $\vec{R}$ is the unit reflection of viewing direction $\vec{\omega}$ about the surface normal, and $F$ is the Fresnel function. We can apply an approximation of the Fresnel function which is similar to Schlick's approximation [Sch93] in terms of computational cost, but can take into account not only the refraction index $n$ but also the extinction coefficient $k$, which is essential for realistic metals [LSK05]:

$$F(\vec{N}, \vec{R}) = F_p + (1 - F_p) \cdot (1 - \vec{N} \cdot \vec{R})^5,$$

where

$$F_p = \frac{(n-1)^2 + k^2}{(n+1)^2 + k^2} \qquad (4)$$

is the Fresnel function (i.e. the probability that the photon is reflected) at perpendicular illumination.

Note that $F_p$ is constant for a given material, thus this value can be computed on the CPU from the refraction index and extinction coefficient and passed to the GPU as a global variable.
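For illustration, evaluating equation (4) is a one-liner; a sketch in HLSL syntax is given below, although, as noted above, this value is typically computed once on the CPU (the function name is hypothetical):

float FresnelPerpendicular(float n, float k)
{
    // F_p = ((n-1)^2 + k^2) / ((n+1)^2 + k^2), equation (4)
    return ((n - 1) * (n - 1) + k * k) / ((n + 1) * (n + 1) + k * k);
}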

Environment mapping approaches can be used to simulate not only reflected but also refracted rays; only the direction computation should be changed from the law of reflection to the Snellius-Descartes law of refraction, that is, the reflect operation should be replaced by the refract operation in the pixel shader. The intrinsic function refract will return a zero vector when total reflection occurs. We should note that tracing a refraction ray on a single level is a simplification, since the light is refracted at least twice to go through a refractor. Here we discuss only this simplified case (a method addressing multiple refractions is presented in [SKALP05]).

The amount of refracted light can be computed using the weighting factor 1−F, where F is the Fresnel function. However, this is true just at the point of refraction. While the light traverses inside the object, its intensity decreases exponentially according to the extinction coefficient. For metals, where the extinction coefficient is not negligible, the refracted component is completely eliminated (metals can never be transparent). For dielectric materials, on the other hand, we usually assume that the extinction coefficient is zero, thus that the light intensity remains constant inside the object. The following shader uses this assumption and computes both the reflected and refracted illumination of an infinitely distant environment map:

float Fp; // Fresnel at perpendicular direction
float n;  // index of refraction

void EnvMapVS(
    in  float4 Pos   : POSITION,  // modeling space
    in  float3 Norm  : NORMAL,    // modeling space
    out float4 hPos  : POSITION,  // clipping space
    out float3 cNorm : TEXCOORD0, // camera space
    out float3 cView : TEXCOORD1  // camera space
) {
    hPos  = mul(Pos, WorldViewProj);
    cNorm = mul(float4(Norm, 0), WorldViewIT).xyz;
    // viewing ray direction: from the eye (the origin of camera space) to the vertex
    cView = mul(Pos, WorldView).xyz;
}

samplerCUBE EnvMap; // environment map

float4 EnvMapPS(
    float3 Norm : TEXCOORD0, // camera space
    float3 View : TEXCOORD1  // camera space
) : COLOR {
    Norm = normalize(Norm);
    View = normalize(View);
    float3 R = reflect(View, Norm);      // reflection direction
    float3 T = refract(View, Norm, 1/n); // refraction direction

    // sampling from the cube map
    float4 refl = texCUBE(EnvMap, R);
    float4 refr = texCUBE(EnvMap, T);

    // approximation of the Fresnel function
    float cos_theta = -dot(View, Norm);
    float F = Fp + pow(1 - cos_theta, 5.0f) * (1 - Fp);

    return F * refl + (1 - F) * refr;
}

3.2.2. Diffuse and glossy reflections without self-shadowing

Classical environment mapping can also be applied for both glossy and diffuse reflections. If we ignore self occlusions ($v(\vec{x}, \vec{\omega}') = 1$), the usual trick is the convolution of the angular variation of the BRDF with the
