One of the drawbacks of video-based augmented reality is that elements of the real world can never pass in front of virtual objects. This is due to the fundamental way video-based AR works: 3D graphics are drawn on top of a live video background. In most cases, the resulting illusion is sufficient to provide a compelling experience, but there are times when real and virtual objects must interact in a truly realistic fashion.

To make it appear that a real object is in front of a virtual object, we require depth information about the real object. If this depth information can be integrated into the renderer’s depth buffer prior to rendering the virtual object, then correct occlusion will occur. Depending on the nature of the real objects, it can be quite difficult to acquire the necessary depth information. For example, gathering real-time depth information for hands would likely require a stereo-camera setup (e.g. Bumblebee) or a special depth-sensing camera (e.g. Z-Cam).
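As an aside, if such a camera does provide a per-pixel depth image, one way to integrate it is to copy the values straight into the depth buffer before any virtual geometry is drawn. The following is only an illustrative sketch against the legacy fixed-function OpenGL API; the `depthImage` pointer and the image dimensions are assumptions of the sketch, not part of any particular camera’s API:

```cpp
#include <GL/gl.h>

// Hypothetical per-pixel depth image from a depth camera, already
// converted to normalised window-space depth in [0, 1] (assumption).
extern const GLfloat* depthImage;
extern int imageWidth, imageHeight; // camera image size (assumption)

void uploadRealWorldDepth()
{
    // Write to the depth buffer only, leaving the video background
    // in the colour buffer untouched.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

    // Depth writes only occur while the depth test is enabled, so
    // enable it but let every fragment through.
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_ALWAYS);

    // Copy the captured depth image directly into the depth buffer,
    // starting at the lower-left corner of the viewport.
    glWindowPos2i(0, 0);
    glDrawPixels(imageWidth, imageHeight,
                 GL_DEPTH_COMPONENT, GL_FLOAT, depthImage);

    // Restore normal rendering state for the virtual objects.
    glDepthFunc(GL_LESS);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
}
```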

For simpler objects, there is a straightforward solution. If we generate an accurate 3D model of the real object, and place it so that it perfectly aligns with the real object, we can render its depth information into the depth buffer. As we are only interested in the depth, we can ignore the colour information, resulting in a transparent structure, or “phantom object”.
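A minimal sketch of this two-pass idea in fixed-function OpenGL follows. Here `drawPhantomModel()` and `drawVirtualObjects()` are hypothetical helpers standing in for the application’s own drawing code, and the phantom model is assumed to be transformed by the pose reported by the tracker so that it lines up with the real object:

```cpp
#include <GL/gl.h>

// Hypothetical drawing helpers (assumptions for this sketch):
void drawPhantomModel();   // 3D model aligned with the real object
void drawVirtualObjects(); // the virtual content to be occluded

void renderFrameWithPhantom()
{
    // The live video background is assumed to have been drawn already.
    glClear(GL_DEPTH_BUFFER_BIT);
    glEnable(GL_DEPTH_TEST);

    // Pass 1: render the phantom object into the depth buffer only.
    // With colour writes disabled, the video remains visible where the
    // real object is, but its depth is now recorded.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    drawPhantomModel();
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

    // Pass 2: render the virtual objects normally. Any fragment that
    // falls behind the phantom fails the depth test, so the real
    // object appears to occlude the virtual one.
    drawVirtualObjects();
}
```

Note that the phantom must be redrawn every frame, before the virtual content, so that the occlusion tracks the real object as the camera or the object moves.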

In this tutorial, we will use a simple box to demonstrate this technique.