Tuesday, May 7, 2013

Covering the Sun with a finger

The oldest optimization in real-time graphics is to avoid rendering what you don't see. When you explain this to people who are not in the field, they usually shrug and say something along the lines of "Duh, Sherlock".

It is easier said than done. Well, actually a big part of it is quite easy. The first trick you see in all graphics books is to render only what's inside the field of view. While the scene entirely surrounds the camera, the camera only captures a narrower slice of it. Anything outside this slice, which is usually 90 degrees along the horizontal, does not need to be rendered. For a mostly horizontal scene, only 90 degrees out of 360 need to be rendered. This simple optimization cuts scene complexity to a quarter. Another way to put it is, now you can have four times more detail without a performance drop. This technique is called Frustum Culling.
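
The test itself is cheap. Here is a minimal sketch of the box-versus-frustum check, assuming each batch carries an axis-aligned bounding box and the six frustum planes have already been extracted from the camera; the types are illustrative:

```cpp
#include <array>

// Illustrative minimal types; any vector math library would do.
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };  // dot(n, p) + d >= 0 means "inside"
struct AABB  { Vec3 min, max; };

// Classic positive-vertex test. Returns false only when the box lies
// entirely outside one of the six planes; boxes that straddle a plane
// are conservatively kept.
bool intersectsFrustum(const AABB& box, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum)
    {
        // Pick the box corner farthest along the plane normal.
        const Vec3 corner = {
            p.n.x >= 0.0f ? box.max.x : box.min.x,
            p.n.y >= 0.0f ? box.max.y : box.min.y,
            p.n.z >= 0.0f ? box.max.z : box.min.z
        };
        // If even that corner is behind the plane, the whole box is out.
        if (p.n.x * corner.x + p.n.y * corner.y + p.n.z * corner.z + p.d < 0.0f)
            return false;
    }
    return true; // inside or intersecting: render the batch
}
```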

Frustum Culling is a no-brainer for small objects that are scattered around the scene. As scene complexity rises you must batch as many objects together as possible. The need for aggressive batching apparently has relaxed a bit recently, but there is no question that batching is still necessary. This works against frustum culling. What if an entire batch is only partially in the field of view? You would still need to render all of it. So, the more you batch, the more you can lose from the frustum culling optimization... unless your batches are somehow compatible with the scene slices you need to render. More on that later.

Even if you were able to perfectly cull all the information outside the field of view, there are usually a lot of polygons rendered in a scene that never make it to the screen as pixels. This is because they end up hidden behind a closer polygon.

Imagine a huge mountain with a valley behind it. If the mountain was not there you would see the valley. With the mountain in front of you, all the effort spent rendering the valley goes to waste. If we could somehow detect that the valley can be skipped, we would save a lot of rendering. We could have a much nicer mountain.

This technique is called Occlusion Culling. It is in principle a difficult problem, as the final rendering is the ultimate test of what is really visible and what is not. Obviously some sort of approximation or model has to be used. A simpler model of the scene allows us to estimate which portions of the final rendering will end up hidden, so it is safe to skip them.

And then again, if you had the occlusion problem perfectly solved, you would still have the issue with batching. It is not that different from frustum culling. Maybe just a small sliver of a large batch is visible; still, the entire batch would have to be rendered... unless your batches are somehow compatible with the scene volumes being occluded.

I wondered whether there was a single approach that would help with all these issues at once. Yes, some sort of silver bullet. I set out to look for one, and did find something. Well, maybe it is not a silver bullet, but it is quite shiny.

It is the geometry clipmap. I have covered them many times in the past. The idea is somewhat simple: if your world can be represented as an octree, you can compute any scene from this world as a series of concentric square rings. Each ring is made of cubic cells. The size of these cells grows exponentially as the rings get farther from the viewer.


The image above shows a projection of a clipmap in 2D.
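
Here is a minimal sketch of how a cell's ring could be derived from its distance to the viewer, assuming cell sizes double from one ring to the next; baseCellSize and ringWidth are illustrative parameters, not the engine's actual values:

```cpp
#include <cmath>
#include <cstdio>

// Given the distance from the viewer to a cell, pick the clipmap ring it
// belongs to. Each ring is 'ringWidth' cells wide and its cells are twice
// the size of the previous ring's, so the ring index grows with the
// logarithm of the distance.
int ringForDistance(float distance, float baseCellSize, int ringWidth)
{
    float range = ringWidth * baseCellSize; // extent covered by ring 0
    int ring = 0;
    while (distance >= range)
    {
        distance -= range;
        range *= 2.0f; // cells double in size with each ring
        ++ring;
    }
    return ring;
}

int main()
{
    for (float d : { 10.0f, 100.0f, 1000.0f, 10000.0f })
    {
        const int ring = ringForDistance(d, 16.0f, 8);
        std::printf("distance %7.0f -> ring %d, cell size %g\n",
                    d, ring, 16.0f * std::exp2(static_cast<float>(ring)));
    }
    return 0;
}
```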

You can see right away how this helps with batching and frustum culling. Each cell is an individual batch, which can contain a few thousand polygons. It is quite simple to determine whether a cell is inside the field of view. Also, cells drop out of the field of view quite efficiently, as their size and placement are constrained by their very definition.

The clipmap turned out to be very friendly for occlusion testing as well. Imagine you could identify some cells as occluders in one specific direction of the clipmap. It becomes fairly simple to test whether more distant cells are occluded.

The following image shows how this principle works:


Here four cells have been identified as occluders. They show as vertical red lines. Thanks to them, we can safely assume all the cells painted in dark red can be discarded. These batches are never sent to the graphics card.

In my case I am performing the tests using software rasterization. It is very fast because the actual cell geometry is not rendered, only cell-aligned planes. So far a depth buffer of 64x64 provides sufficient resolution.
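
Here is a minimal sketch of what such a coarse software depth buffer could look like. The 64x64 resolution matches the post, but the interfaces and the rectangle-based write are simplifying assumptions:

```cpp
#include <algorithm>
#include <array>

constexpr int kRes = 64; // coarse buffer is enough for cell-sized tests

struct DepthBuffer
{
    std::array<float, kRes * kRes> depth;

    void clear() { depth.fill(1.0f); } // 1.0 = far plane

    // Rasterize an occluder as a screen-space rectangle at constant depth.
    // Real cell-aligned planes would be rasterized with per-pixel depth;
    // a flat rectangle keeps the sketch short.
    void writeRect(int x0, int y0, int x1, int y1, float z)
    {
        for (int y = std::max(0, y0); y <= std::min(kRes - 1, y1); ++y)
            for (int x = std::max(0, x0); x <= std::min(kRes - 1, x1); ++x)
                depth[y * kRes + x] = std::min(depth[y * kRes + x], z);
    }

    // Conservative visibility test: the cell is occluded only if every
    // pixel it may touch already holds a nearer depth than the cell's
    // nearest point. Any doubt counts as visible.
    bool isRectVisible(int x0, int y0, int x1, int y1, float nearestZ) const
    {
        for (int y = std::max(0, y0); y <= std::min(kRes - 1, y1); ++y)
            for (int x = std::max(0, x0); x <= std::min(kRes - 1, x1); ++x)
                if (depth[y * kRes + x] > nearestZ)
                    return true; // some pixel could still show the cell
        return false;            // fully hidden: never send the batch
    }
};
```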

Not bad!

17 comments:

  1. Interesting, it's for sure a principle that's simple as stated, but I am a lot more curious about the implementation...

    ...this should give away that I am a little behind on the programming side, since I'm not entirely sure I understand how you are determining the occlusion and thus traversing the octree.

    I assume that you have some sort of "visible" flag that gets set/unset by the test, right?

  2. Sooo... does this work in 3D? Without infinitely high walls occluding the terrain behind it?

    Replies
    1. It works in 3D, but it is a bit harder to explain and implement. You need to reorder all the geometry into a 3D version of the above 2D grid, then identify which of these 3D batches are visible.

      I know that in OpenGL you can query whether or not a chunk of geometry is visible in the scene: first render the scene to the depth buffer, then query each batch against this depth buffer.
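
      A minimal sketch of that query mechanism (OpenGL 1.5+ occlusion queries, with function loading assumed to be handled, e.g. via GLEW; drawCellBoundingBox() is a hypothetical helper):

      ```cpp
      #include <GL/glew.h>

      void drawCellBoundingBox(); // hypothetical: issues the cell's proxy box only

      // Sketch of a hardware occlusion query. Assumes the scene's occluders
      // are already in the depth buffer and 'query' came from glGenQueries().
      bool isCellVisible(GLuint query)
      {
          glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // no color writes
          glDepthMask(GL_FALSE);                               // no depth writes

          glBeginQuery(GL_SAMPLES_PASSED, query);
          drawCellBoundingBox(); // test geometry, not the whole batch
          glEndQuery(GL_SAMPLES_PASSED);

          glDepthMask(GL_TRUE);
          glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

          GLuint samples = 0;
          // Blocks until the result is ready; real code would issue many
          // queries and collect the results a frame later.
          glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
          return samples > 0; // any fragment passed the depth test
      }
      ```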

    2. You can make it with an Octree, which is a 3D version of the quadtree.

    3. Oh, right! Didn't think of that... So this is another bit where his voxel structure helps a lot I take it =P.

  3. The figure does not look like a clipmap; it seems to me that it is a quadtree.
    Which one do you use? Because a clipmap means updating the structure on every move (or computing the occlusion from an implicit definition of it), whereas a quadtree can be static (and just descended to different depths depending on the distance).

    Replies
    1. It is a clipmap extracted from an octree. The picture shows a quadtree because they are best for showing these ideas in 2D.

  4. Thanks for a more technical post.

    Do you have some smart way to determine that a certain cell is an occluder? Since each cell can contain loads of triangles, as long as you don't find an early occlusion cell you would have to test loads of triangles.

    Do you precompute some simple occlusion mesh for each cell? Do you check at the triangle level for the closest LOD and use a simplified occlusion mesh for those farther away, or do you only care about cells that are completely solid from an octree standpoint?

    I have been thinking about pre-computing and saving information about whether a cell can be considered solid in the x, y, and z directions, and saving these occlusion planes for each cell.

    Any hints as to how you determine if a cell is an occluder?

    Replies
    1. The idea is to have a very simplified occlusion model for the cell. I find the largest axis-aligned box that is completely inscribed in the solid geometry and use it as an occluder. The planes you describe also do well.

      From there you just render this occluding geometry into a depth buffer. Then you can test each cell against the buffer and see if any point of it is likely to show.
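
      Finding the exact largest inscribed box can get expensive; a greedy sketch like the one below captures the idea. Here solid(x, y, z) is a hypothetical predicate over the cell's voxels, and a real implementation would try several seeds and keep the best box:

      ```cpp
      #include <functional>

      struct Box { int x0, y0, z0, x1, y1, z1; };

      // True when every voxel in the box is solid.
      bool slabSolid(const std::function<bool(int, int, int)>& solid, Box b)
      {
          for (int z = b.z0; z <= b.z1; ++z)
              for (int y = b.y0; y <= b.y1; ++y)
                  for (int x = b.x0; x <= b.x1; ++x)
                      if (!solid(x, y, z)) return false;
          return true;
      }

      // Grow a box from a solid seed voxel, one layer at a time, keeping
      // each extension only if the new layer is completely solid.
      Box growOccluderBox(const std::function<bool(int, int, int)>& solid,
                          int sx, int sy, int sz, int cellSize)
      {
          Box b = { sx, sy, sz, sx, sy, sz };
          bool grew = true;
          while (grew)
          {
              grew = false;
              if (b.x1 + 1 < cellSize && slabSolid(solid, {b.x1 + 1, b.y0, b.z0, b.x1 + 1, b.y1, b.z1})) { ++b.x1; grew = true; }
              if (b.x0 - 1 >= 0       && slabSolid(solid, {b.x0 - 1, b.y0, b.z0, b.x0 - 1, b.y1, b.z1})) { --b.x0; grew = true; }
              if (b.y1 + 1 < cellSize && slabSolid(solid, {b.x0, b.y1 + 1, b.z0, b.x1, b.y1 + 1, b.z1})) { ++b.y1; grew = true; }
              if (b.y0 - 1 >= 0       && slabSolid(solid, {b.x0, b.y0 - 1, b.z0, b.x1, b.y0 - 1, b.z1})) { --b.y0; grew = true; }
              if (b.z1 + 1 < cellSize && slabSolid(solid, {b.x0, b.y0, b.z1 + 1, b.x1, b.y1, b.z1 + 1})) { ++b.z1; grew = true; }
              if (b.z0 - 1 >= 0       && slabSolid(solid, {b.x0, b.y0, b.z0 - 1, b.x1, b.y1, b.z0 - 1})) { --b.z0; grew = true; }
          }
          return b;
      }
      ```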

  5. This was that occlusion tease we got to see in the last video, right? It's fun to read a detailed description of your approach. It sounds like you have found a good balance. It's fascinating to see how many 2D/3D problems can be addressed with quad/octree type storage & addressing. It was a good visual aid during your explanation too, thanks!

    It's got to be fun to refine things like this, since every enhancement means more polys for use elsewhere... as long as it doesn't tax the engine performance in other computational ways.

    Good stuff and thanks for sharing!

  6. I think your clipmap methodology + mega textures would probably be a match made in heaven. I'd love to hear your opinion on the feasibility of it.

    Replies
    1. I tried a simple system where each batch would get texture space assigned to it based on its polygon area. I ran some rather complex UV unwrapping and rendered the batch into a smaller page of a large texture. The paging was managed using a quadtree.

      I noticed this was taking a lot of texture memory and the results were not as good as shading the geometry after it is projected.

      When you use artists, virtual texturing makes a lot of sense. You can have beautiful hand-painted detail everywhere. If you are synthesizing textures, on the other hand, it does not make much sense. It is better to synthesize only the pixels you see.

  7. If I understand this correctly, due to the perspective of the camera, each succeeding cell will cover the same amount of rendered pixels/screen real estate, correct? What sort of density are you using? On average how many pixels per cell? How long will it take before our hardware will be able to split our viewing screen into the two million odd cells per distance layer necessary for a one cell per pixel ratio? (at this point, I'm guessing that finer cell resolution would simply be wasted processing power?)

  8. The big problem I see with this is that indoor scenes tend to not fit into grid cells very effectively. Say you were inside a room with thin walls - unless the grid was especially fine-grained, the walls wouldn't be thick enough to fill an occlusion cell, even though they probably occlude most of the surrounding area. You could throw in a hack so that the walls were always 1 grid cell thick, but that would still allow a lot of overdraw as both sides of the walls would be rendered.

    I remember quite a few games in the DX7/DX8 era, such as Morrowind, would separate indoor/outdoor regions so that they could change their occlusion strategy. Quad/Octree-based for outdoor scenes and a BSP/portal based strategy for indoors.

    Replies
    1. Right, but you do not need to fill a cell entirely for it to become an occluder. For instance, imagine you have a thin wall that crosses two opposite sides of a cell. Even if the thin wall is crossing at an angle, you can be certain the cell occludes anything behind it.

      A typical case is floors. Even if the floor is thin, this method identifies that the cell hides any other cells below it.
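
      A much simplified sketch of a per-direction test in that spirit, which only checks straight-down sightlines; solid(x, y, z) is a hypothetical predicate over the cell's voxels, and a real test must also account for rays entering at an angle:

      ```cpp
      #include <functional>

      // Does the cell block all straight-down sightlines? True when every
      // (x, z) column contains at least one solid voxel, e.g. a thin floor.
      bool occludesAlongY(const std::function<bool(int, int, int)>& solid,
                          int cellSize)
      {
          for (int z = 0; z < cellSize; ++z)
              for (int x = 0; x < cellSize; ++x)
              {
                  bool blocked = false;
                  for (int y = 0; y < cellSize && !blocked; ++y)
                      blocked = solid(x, y, z);
                  if (!blocked)
                      return false; // a vertical sightline passes through here
              }
          return true; // everything directly below the cell is hidden
      }
      ```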

    2. What if you viewed that floor at an angle? Wouldn't it then not be an occluder? Or do you store occlusion information from multiple viewing angles?

  9. I've been wondering for a while now, would it be possible in your engine, or the Unity addon, to create road systems procedurally?
