Real-Time Massive Model Rendering

 

Jean-Daniel Nahmias

Supervised by Anthony Steed

 

Computer Science Department, University College London

Gower Street, London WC1E 6BT, UK

j.nahmias@cs.ucl.ac.uk

 

 

Master’s Thesis

MSc Vision, Imaging, & Virtual Environments

September 2002

 

 


Abstract

     This report discusses various techniques for speeding up the rendering of large and complex virtual environments. It explores algorithms such as Hierarchical Spatial Subdivision and Occlusion Culling, analyses these algorithms, and investigates ways of implementing them using recent consumer-level Graphics Processing Units.

 

 

Disclaimer

 

This report is submitted as part requirement for the MSc Degree in Vision Imaging and Virtual Environments at University College London. It is substantially the result of my own work except where explicitly indicated in the text. The report may be freely copied and distributed provided the source is explicitly acknowledged.

 

 

 


 

1 Introduction

1.1 Organisation

2 Background Work

2.1 Visibility Determination

2.1.1 View Frustum Culling

2.1.2 Spatial Coherence

2.1.3 Temporal Coherence

2.1.4 Occlusion Culling

2.2 Level of Detail

2.2.1 Progressive Meshes [8]

2.3 Impostors

2.3.1 Per-Object Image Warping with Layered Impostors [5]

3 Development Life Cycles

3.1 Requirements Analysis

3.1.1 Application Feature Set

3.1.2 Rendering Feature Set

4 Design

4.1 Code Base Design

4.1.1 Scene Graph

4.1.2 3DS Max Plugin

4.1.3 Application Code

4.2 Algorithmic Design

4.2.1 1st Pipeline

4.2.2 2nd Pipeline

4.2.3 3rd Pipeline

4.2.4 4th Pipeline

4.2.5 5th Pipeline

5 Implementation

5.1 Choice of Language

5.2 OpenGL Issues & Getting the most out of the GPU

5.3 Oct-Tree Implementation Details

5.4 Other details addressed after testing

6 Testing and Evaluation

6.1 Testing

6.2 Evaluation

6.2.1 Windows Timing Functions

6.2.2 Evaluating the Rendering Pipeline

6.2.3 Finding Bottlenecks

7 Results

7.1 Benchmark System

7.2 Tuning Pipeline 3-5 by Balancing Oct-Tree

7.3 Tuning Pipeline 5 with Temporal Coherence

7.4 Rendering Pipeline Comparisons

8 Conclusion and Further Work

References

Appendix

User Requirements

User Manual

Code


1 Introduction

 

There are various fields, including Computer Aided Design and Scientific Visualization, that require the interactive display of large and complex 3D virtual environments. There is a very strong motivation for the display performance of these systems to be high (i.e. real-time). The performance of display systems is usually measured in frames per second: the total number of images sent to the display in one second, which is a good indicator of the performance of a virtual reality system. This is not the same as the refresh rate of the display. Various studies have been carried out to find the lowest acceptable performance level at which users are not impeded. Different studies come to different conclusions; performance levels considered reasonable for certain applications are considered insufficient for others. However, most developers of such systems would agree that in general 30 fps per eye is adequate and 60 fps is excellent [45]. When these frame rates are not achieved the results can vary from a user wearing a Head Mounted Display feeling motion sickness to the user failing to achieve the desired productivity from the system and experiencing frustration. Another motivation for these frame rates comes from interaction. The simplest form of interaction is an environment that the user can navigate (i.e. a walkthrough or fly-through). For this interaction to feel continuous, the system lag must be below a certain threshold [10]. This allows the user to interact with the system without having to wait for confirmation of his/her actions. One factor determining lag is frame rate; the others are input sampling rate, output sampling rate, and software overhead. Unfortunately, computer systems cannot display infinitely complex environments infinitely quickly. Therefore a compromise has to be made between visual quality and rendering speed.
A multitude of solutions have been employed to address this problem. One example of a very naïve solution, used time and time again in computer games, is for the renderer to simply draw as much as it can in its allocated time slot, starting from the objects closest to the user and ignoring the rest. Developers using this technique try to mask it with an effect called fog. This gives the user a very short viewing distance and is very noticeable. If a user were to navigate a vast virtual environment, it would be very difficult to make any kind of navigational decision without being able to see the target. This shortfall is simply unacceptable today considering the current state of technology in this field.

The last few years have seen incredible development of high-performance consumer-level Graphics Processing Units, a development that has exceeded Moore's Law. These GPUs, as they are now commonly called (also known as graphics cards), are now theoretically capable of drawing millions of triangles per second. In reality, however, other factors come into play and prevent one from achieving a GPU's theoretical limit. Nevertheless they are very capable ASICs that have turned what seemed impossible just a few years ago into reality today. Even though these cards are continuously being updated and improved, there is still a need to optimize rendering algorithms to achieve even greater performance. Users wish to visualize 3D environments that are ever more complex and detailed, and there remains a lot of room to improve rendering performance, thereby enabling more detailed and more complex 3D environments to be displayed in real-time.


1.1 Organisation

 

This report examines ways of improving the rendering performance of general 3D environments, looking at techniques for high-level as well as low-level optimization. The former is achieved through carefully chosen algorithms, while the latter is obtained through careful use of the hardware and attention to implementation details. Section two discusses high-level optimization, covering Visibility Determination, Level of Detail, and Impostors. Optimizing is an iterative procedure, and the development cycle of the software produced for this report was therefore also iterative. Section three describes the various cycles of development. The design decisions made along the way are discussed in section four; the design section is partitioned in two, first describing the Code Base Design and then the Algorithmic Design. Section five discusses various implementation details. This section is followed by an overview of the testing procedures as well as the techniques used to evaluate performance. Section seven presents the results of the different rendering algorithms as well as some of the parameter tuning. The report concludes with suggestions for further work in section eight.


2 Background Work

    

This section of the report focuses on some of the current state-of-the-art algorithms used to optimize visibility determination, level of detail, and impostors.

    

Visibility Determination: The fastest polygons are the ones that are not displayed, i.e. the polygons that the user never sees. These are the polygons that lie outside the viewing frustum or are occluded by other objects. Extensive research has been carried out to find algorithms that can quickly discard these polygons and thereby speed up rendering, since fewer triangles or polygons need to be drawn.

 

     Level of Detail: Another tool that can be used to improve rendering performance is level of detail, which refers to simplifying complex 3D geometric objects. There are various algorithms that can achieve this. However, once an object has been simplified some visual quality is lost. This is why these algorithms are generally applied to objects positioned far away from the user. When objects are at a great distance from the viewer, the difference between the original object and its simplified counterpart is greatly diminished: when actually displayed they only occupy a small number of pixels, so the lower-detail version of the object is indistinguishable from its high-detail counterpart.

 

     Impostors: Impostors are used to fake geometric detail. There are various forms of impostors. One common impostor is a billboard: a 2D representation of a 3D object that is placed into the scene as a textured quad. This quad is continuously rotated as the viewer navigates so that it always faces the viewer. This is one type of impostor in the general category of textured impostors. There are also other forms, such as point-based impostors. Triangles located far from the viewer may have a projection onto the screen smaller than a pixel. In this case it is redundant to represent the geometric surface with a triangle; one can simply use a point-based representation of the surface. This may not only speed up rendering but may also improve visual quality, since it avoids triangle aliasing artefacts.

2.1 Visibility Determination

 

     Visibility determination has been an active research topic that has produced many algorithms over the past decades [30]. It is the process of determining which objects are visible and which are not. The 3D virtual environments presented in this report are represented with triangles; this report therefore focuses on visibility algorithms that determine which sets of triangles are visible at any given point in time. Triangles are visible when they are inside the camera's View Frustum and are not occluded by other objects. One very simple solution to the visibility problem is hidden surface removal with the Z-buffer [23]. The Z-buffer stores the depth values of the currently projected triangles. When a triangle is rasterized, each of its projected pixels is tested against the Z-buffer; if a pixel has a greater depth value than the stored value, that pixel is not written and consequently not drawn. The Z-buffer offers a simple way of determining which pixels of which triangles should be drawn at the cost of a small amount of memory. However, this operation is performed almost at the very end of the rendering pipeline, so a triangle that is not visible still has to be sent down most of the pipeline, wasting resources and impacting performance. This is why the Z-buffer is often used in conjunction with various other algorithms. Computing the exact visibility can be very expensive, more so than sending the extra triangles through the rendering pipeline; therefore most algorithms compute a potentially visible set of triangles (PVS). These algorithms usually fall into three categories.
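The per-pixel depth test performed by the Z-buffer can be illustrated with a minimal Python sketch. The buffer sizes, the smaller-is-nearer depth convention, and the colour values are illustrative only, not taken from any particular implementation:

```python
def zbuffer_write(zbuffer, framebuffer, x, y, depth, colour):
    """Plot a fragment only if it is nearer than the value already
    stored at (x, y). Smaller depth values are taken to be nearer
    the viewer (a common convention)."""
    if depth < zbuffer[y][x]:
        zbuffer[y][x] = depth
        framebuffer[y][x] = colour
        return True   # fragment drawn
    return False      # fragment occluded and discarded

# A tiny 2x2 buffer initialised to the far plane (depth 1.0).
zbuf = [[1.0, 1.0], [1.0, 1.0]]
fbuf = [[None, None], [None, None]]

zbuffer_write(zbuf, fbuf, 0, 0, 0.5, "red")   # drawn: 0.5 < 1.0
zbuffer_write(zbuf, fbuf, 0, 0, 0.8, "blue")  # discarded: 0.8 > 0.5
```

Note that the second fragment is rejected only after it reaches the depth test; this late rejection is precisely why the higher-level culling algorithms discussed below are needed.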

    


     Exact Visibility: This is the most precise form of visibility computation; exact visibility means that all visible triangles are sent down the pipeline, clipped if necessary, and all non-visible triangles are culled. This type of visibility algorithm is usually very slow and requires a vast amount of pre-processing if used for real-time applications.

     Conservative Visibility: These algorithms determine a superset of the visible triangles. They remove most but not all of the non-visible triangles; the Z-buffer can then be used to remove the remainder. These algorithms are not as precise as exact visibility algorithms but usually perform better in terms of speed.

     Approximate Visibility: Approximate visibility algorithms are neither exact nor conservative and can therefore, in certain situations, remove triangles that would in fact be visible. If these algorithms are well designed, the errors they produce are barely noticeable, and their performance gains can sometimes more than justify those errors.

 

Classification of Visibility Culling Techniques

Cohen-Or, Chrysanthou, and Silva [30] use a set of features to classify various visibility algorithms; these include:

 

·        Conservative vs. Approximate:

Most visibility algorithms are conservative and are used in combination to produce exact visibility. Techniques such as the Z-buffer are often referred to as hidden surface removal algorithms and are used as a back end for other conservative or approximate culling algorithms.

 

·        Point vs. Region:

Certain techniques calculate visibility for a single viewpoint while others perform visibility calculations for regions of space. Point-based algorithms tend to be more flexible than region-based algorithms at the cost of performance or pre-processing.

 

·        Pre-computed vs. Online:

Online algorithms perform the visibility calculations during each rendering loop, although they may use certain pre-computed data structures, while pre-computed algorithms compute all the visibility offline. One example of pre-computed visibility is the algorithm used in the Quake engine [46]. These algorithms usually require a great amount of pre-processing and usually impose severe constraints on the user's navigation and on the 3D environment.

 

·        Image space vs. Object space:

This refers to where the visibility calculations are performed. Certain algorithms perform their calculation on the 2D projections of a 3D environment while others perform their calculations on the 3D objects before they are projected into 2D.

 

·        Software vs. Hardware:

Certain techniques are implemented on specialized hardware, giving them a considerable speed increase. Usually a less efficient algorithm implemented in hardware will outperform a more efficient algorithm implemented on a general-purpose processing unit.

 

·        Dynamic vs. Static scenes:

Static scenes are environments that can be navigated by the user but in which objects cannot move relative to each other. This constraint allows spatial coherence to be exploited with a certain amount of pre-processing. Dynamic scenes have no such restrictions.

 

·        Individual vs. Fused occluders:

Both these terms are used when discussing occlusion culling: the process of determining which objects (occludees) are obscured by which other objects (occluders). Certain objects that are not occluders individually can become occluders when combined; occluder fusion is a technique that takes advantage of this fact.

 

2.1.1 View Frustum Culling

    

     Visibility algorithms are usually designed around one fundamental principle: remove as much as possible, as soon as possible. This reduces the amount of unnecessary processing of triangles along the rendering pipeline. One good method of achieving this is View Frustum Culling. The view frustum is defined by the virtual camera and represents the volume or region of space that the camera can see. View Frustum Culling is therefore a region-based visibility algorithm.

 

 

Figure 1 View Frustum

 

Sutherland and Hodgman [23] developed a 2D algorithm that is very efficient at clipping polygons against clipping boundaries. The algorithm extends trivially to 3D; for more detailed descriptions please refer to [23]. Even so, with scenes containing millions of triangles it is unfeasible to clip every triangle against the view frustum, no matter how efficient the clipping algorithm. This limitation can be overcome using spatial coherence. The following section describes various hierarchical data structures that exploit spatial coherence to speed up view frustum culling. All of these techniques are based more or less on the same principle: they test regions or volumes, and if a volume is discarded by the clipping algorithm then all objects contained in that volume can also be discarded. If the volume being tested is fully contained by the view frustum then all of its contents can be included; if the volume intersects the view frustum then further testing is required to determine exactly which objects or triangles are to be clipped. This last case is the worst case, but since most of the scene is usually either clipped or fully visible, the penalty for the worst case is greatly outweighed by the benefits. Spatial coherence thus usually reduces the number of clipping tests. Most of the data structures presented in the following section were developed to speed up ray-tracing but have been adapted for view frustum culling. Good reviews of these data structures can be found in the ray-tracing literature, such as [41].
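One stage of the Sutherland-Hodgman algorithm can be sketched in Python as clipping a polygon against a single boundary; the full algorithm simply applies this stage once per clipping boundary. The half-plane x >= 0 and the test triangle below are illustrative choices:

```python
def clip_to_boundary(polygon, inside, intersect):
    """One Sutherland-Hodgman stage: walk the polygon's edges and keep
    the part inside the boundary. `inside(p)` tests the half-space;
    `intersect(a, b)` returns where edge a-b crosses the boundary."""
    out = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]          # wraps around to the last vertex
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))  # entering the region
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))      # leaving the region
    return out

# Clip a 2D triangle against the half-plane x >= 0.
def inside(p):
    return p[0] >= 0.0

def intersect(a, b):
    t = a[0] / (a[0] - b[0])           # parameter where x crosses zero
    return (0.0, a[1] + t * (b[1] - a[1]))

tri = [(-1.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
clipped = clip_to_halfplane = clip_to_boundary(tri, inside, intersect)
```

Clipping the triangle produces a quadrilateral lying entirely in x >= 0, illustrating why clipped polygons can gain vertices.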

2.1.2 Spatial Coherence

    

     The following data structures fall into one of two categories: top-down and bottom-up. The former is based on selecting a region of space and recursively subdividing it, while the latter works by starting at the bottom of the scene hierarchy and recursively grouping objects. This section presents some of the techniques and data structures used to exploit spatial coherence, starting with the simplest.

 

 

·        Bounding Volumes

These are the smallest geometric volumes encapsulating an object, usually spheres or boxes. Because these are simple geometric shapes they are very easy to test against the View Frustum; one can therefore quickly discard objects that are outside the View Frustum without testing each triangle of the contained object. Spheres have the advantage of a quick intersection test while boxes have the advantage of fitting the geometric objects better (i.e. a tighter fit), thereby reducing the number of redundant intersection tests. Boxes come in two main varieties: the Axis Aligned Bounding Box (AABB) and the Object Oriented Bounding Box (OOBB) [22]. Again there is a trade-off between the cost of intersection and the tightness of fit. Generally OOBBs fit objects more tightly than AABBs. An OOBB can be calculated using Principal Component Analysis (PCA), a statistical tool that involves building a covariance matrix and finding its eigen-solution. The eigenvectors can then be used to find the orientation of the OOBB, and the size of the box can be inferred by finding the largest distance from the centre of the object to any vertex along each of the eigenvectors. For more information on PCA see [18].

 

Figure 2 Bounding Boxes
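A typical conservative AABB-versus-frustum test can be sketched as follows. The plane representation (inward-pointing normals, so that points inside the frustum have positive signed distance) and the example boxes are assumptions made for illustration:

```python
def aabb_vs_plane(centre, half_extents, normal, d):
    """Classify an axis-aligned box against the plane n.x + d = 0,
    whose normal is assumed to point towards the inside of the
    frustum. Returns 'outside', 'inside' or 'intersects'."""
    # Projected "radius" of the box onto the plane normal.
    r = sum(abs(n) * h for n, h in zip(normal, half_extents))
    s = sum(n * c for n, c in zip(normal, centre)) + d  # signed distance
    if s < -r:
        return "outside"
    if s > r:
        return "inside"
    return "intersects"

def aabb_in_frustum(centre, half_extents, planes):
    """Conservative test: cull only when the box is fully outside some
    plane; otherwise keep it (it may merely intersect the frustum)."""
    for normal, d in planes:
        if aabb_vs_plane(centre, half_extents, normal, d) == "outside":
            return False
    return True

# A one-plane "frustum" keeping the half-space x >= 0, for illustration.
planes = [((1.0, 0.0, 0.0), 0.0)]
kept   = aabb_in_frustum((5.0, 0.0, 0.0), (1.0, 1.0, 1.0), planes)
culled = aabb_in_frustum((-5.0, 0.0, 0.0), (1.0, 1.0, 1.0), planes)
```

The same projected-radius idea extends to OOBBs by projecting the box's own axes onto the plane normal instead of the world axes.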

 

·        Hierarchical Bounding Volumes

This is an extension of bounding volumes: a bottom-up approach where groups of objects are formed recursively. Objects close together can be grouped and encapsulated by another bounding volume, and so on. This can yield advantages in the sense that whole groups of objects can quickly be discarded. However, this comes at a cost: if a group intersects one of the view frustum's clipping planes, more intersection tests are required. The deeper the hierarchy, the greater the cost of the worst case. Balancing these data structures can be difficult [24].

 

Figure 3 Bounding Sphere Hierarchy

 

· BSP Trees

Binary Space Partitioning (BSP) trees were introduced by Fuchs et al. [47]. The BSP tree for a scene is constructed recursively by selecting a partitioning plane against which the scene's polygons are tested. This splits the polygons into two groups: those in front of the plane and those behind it. If a polygon intersects the plane it is split in two and each part is allocated to its correct side of the tree. This process is then repeated for each subsection recursively. Usually each partitioning plane is chosen to contain one of the scene's polygons. The end result is a tree where each node represents a partitioning plane and the leaves represent polygons. This data structure can be traversed in front-to-back order: starting at the root node, the tree is traversed recursively, visiting first the subtree on the same side of the partitioning plane as the centre of projection, then the node's own polygons, then the far side. This is based on the idea that polygons on the same side of the plane as the centre of projection (the viewpoint) cannot be obscured by polygons on the far side. BSP trees can yield significant speedups for static geometry such as architectural walkthroughs [13].
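The front-to-back traversal described above can be sketched in Python as follows; the node layout, the plane encoding (normal, d) and the polygon labels are illustrative:

```python
class BSPNode:
    def __init__(self, plane, front=None, back=None, polygons=None):
        self.plane = plane            # (normal, d): n.p + d > 0 is "front"
        self.front = front            # subtree in front of the plane
        self.back = back              # subtree behind the plane
        self.polygons = polygons or []  # polygons lying on the plane

def side_of(plane, point):
    """Signed distance of a point from the partitioning plane."""
    normal, d = plane
    return sum(n * p for n, p in zip(normal, point)) + d

def front_to_back(node, eye, visit):
    """Visit polygons nearest the eye first: recurse into the half-space
    containing the eye, then the node's own polygons, then the far side."""
    if node is None:
        return
    if side_of(node.plane, eye) >= 0:
        near, far = node.front, node.back
    else:
        near, far = node.back, node.front
    front_to_back(near, eye, visit)
    for poly in node.polygons:
        visit(poly)
    front_to_back(far, eye, visit)

# Three walls partitioned by planes perpendicular to the x axis.
order = []
root = BSPNode(((1.0, 0.0, 0.0), 0.0),
               front=BSPNode(((1.0, 0.0, 0.0), -2.0), polygons=["far wall"]),
               back=BSPNode(((1.0, 0.0, 0.0), 1.0), polygons=["near wall"]),
               polygons=["middle wall"])
front_to_back(root, (-5.0, 0.0, 0.0), order.append)
```

Reversing the near/far choice yields the back-to-front (painter's algorithm) traversal of the same tree.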

 

 

 

·        Oct-Trees

The Oct-tree is the 3D equivalent of the Quad-tree [24]. It is constructed by taking an axis-aligned cubic volume that encompasses the whole virtual environment. This cube is recursively subdivided into eight identically sized axis-aligned cubes. If any of the child cubes satisfies certain conditions it is in turn subdivided, and so on. These conditions are met if a cube contains more than a certain number of polygons, the cube itself has not reached a minimum size threshold, and the tree has not reached a maximum depth. If a polygon intersects the boundary between cubes it can either be split or allocated to both children. Once the tree is constructed, view frustum culling is performed recursively starting with the root node: if a node intersects the view frustum then each of its children is tested.

Figure 5 Oct-Tree
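The construction just described can be sketched as follows. The thresholds, the vertex-based containment test, and the policy of allocating straddling triangles to every child they touch (rather than splitting them) are illustrative choices for this sketch, not the settings used later in this report:

```python
class OctreeNode:
    MAX_TRIANGLES = 2    # subdivide when a cube holds more than this
    MIN_SIZE = 1.0       # ... and is still larger than this

    def __init__(self, centre, size, triangles):
        self.centre, self.size = centre, size
        self.children = []
        self.triangles = triangles
        if len(triangles) > self.MAX_TRIANGLES and size > self.MIN_SIZE:
            self.subdivide()

    def subdivide(self):
        half = self.size / 2.0
        q = half / 2.0                      # offset of each child centre
        for dx in (-q, q):
            for dy in (-q, q):
                for dz in (-q, q):
                    c = (self.centre[0] + dx,
                         self.centre[1] + dy,
                         self.centre[2] + dz)
                    # A straddling triangle lands in every child it touches.
                    tris = [t for t in self.triangles
                            if self.cube_contains(c, half, t)]
                    if tris:
                        self.children.append(OctreeNode(c, half, tris))
        self.triangles = []                 # interior node: leaves hold geometry

    @staticmethod
    def cube_contains(centre, size, triangle):
        """True if any vertex of the triangle lies inside the cube."""
        h = size / 2.0
        return any(all(abs(v[i] - centre[i]) <= h for i in range(3))
                   for v in triangle)

# Three small triangles clustered in one octant of an 8-unit cube.
tris = [((0.9, 0.9, 0.9), (1.1, 0.9, 0.9), (0.9, 1.1, 0.9)),
        ((1.2, 1.2, 1.2), (1.3, 1.2, 1.2), (1.2, 1.3, 1.2)),
        ((0.5, 0.5, 0.5), (0.6, 0.5, 0.5), (0.5, 0.6, 0.5))]
root = OctreeNode((0.0, 0.0, 0.0), 8.0, tris)
```

Because all three triangles sit in the same octant, only one child is created at each level until the minimum cube size stops the recursion.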

·        Kd-Trees

Kd-trees, Oct-trees and BSP-trees can be seen as generalizations or specializations of one another. A Kd-tree can be viewed as a BSP-tree in which every partitioning plane is axis-aligned. It too is constructed recursively, by selecting each axis in turn and then choosing a partitioning plane along that axis to subdivide the space. In practice Kd-trees look very similar to Oct-trees but with much better fitting bounding volumes.


2.1.3 Temporal Coherence

 

     As a user navigates a virtual environment in real-time, the difference between one frame and the next is usually marginal. One can exploit this temporal coherence to optimize culling, by assuming that if an object is visible in one frame it is likely to be visible in the next. Each object that passes the view frustum test can therefore be assumed visible for the next n frames, where n is a threshold. However, we cannot make the same assumption about objects that were culled: all objects culled in the previous frame must be tested again in the current frame. The worst case for this assumption is the user rotating 180 degrees; in that case all of the previously visible objects are no longer visible but would still be sent down the rendering pipeline. One could, however, vary the n threshold depending on the user's movement, cancelling the assumption if the user moves or rotates beyond a certain threshold.
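This scheme can be sketched as a small visibility cache. The frame threshold, the rotation threshold, and the object identifiers below are illustrative assumptions:

```python
class TemporalVisibilityCache:
    """Assume an object that passed the frustum test stays visible for
    the next n frames; culled objects are re-tested every frame. Large
    camera rotations invalidate the cache, as suggested in the text."""

    def __init__(self, n_frames=5, rotation_threshold=0.5):
        self.n = n_frames
        self.rotation_threshold = rotation_threshold  # radians, illustrative
        self.visible_until = {}   # object id -> last frame assumed visible
        self.frame = 0

    def is_visible(self, obj_id, frustum_test, rotation_delta):
        if abs(rotation_delta) > self.rotation_threshold:
            self.visible_until.clear()       # assumption no longer safe
        if self.visible_until.get(obj_id, -1) >= self.frame:
            return True                      # reuse cached result, skip test
        if frustum_test(obj_id):
            self.visible_until[obj_id] = self.frame + self.n
            return True
        return False

    def next_frame(self):
        self.frame += 1

# The second query is answered from the cache without re-testing.
cache = TemporalVisibilityCache(n_frames=2)
test_calls = []
def frustum_test(obj_id):
    test_calls.append(obj_id)
    return True

first = cache.is_visible("statue", frustum_test, 0.0)
cache.next_frame()
second = cache.is_visible("statue", frustum_test, 0.0)
```

Note the asymmetry: a cached "visible" answer skips the test, but a culled object is never cached, exactly as the text requires.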

 

2.1.4 Occlusion Culling

 

     In the previous section we examined View Frustum Culling and how to improve it by taking advantage of spatial and temporal coherence. This section examines another area of visibility determination, occlusion culling, focusing on one of the more recent occlusion culling algorithms, developed by Zhang et al. [4] and known as HOM (Hierarchical Occlusion Maps). This algorithm makes no assumptions about the scene and works very well for densely occluded scenes. It can be broken down into two stages:

 

     Construction of the HOMs: this part of the algorithm involves View Frustum Culling of the bounding volume hierarchy of the occluder database, followed by occluder selection. The selected occluders are rendered into an image buffer; this forms the highest-resolution occlusion map. From the image buffer a depth estimation map is constructed. The occluders are rendered without any lighting or texturing to speed up the process. The occlusion map is then recursively filtered down to produce an image hierarchy; this process can be hardware accelerated with the use of mipmap filtering.

 

Figure 6 Rendering Pipeline of HOM algorithm
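The recursive filtering step can be sketched in Python as repeated 2x2 box filtering of the opacity image, which is what hardware mipmap generation computes; the map contents are illustrative:

```python
def build_occlusion_pyramid(opacity):
    """Build the hierarchy of occlusion maps by repeatedly averaging
    2x2 blocks of the opacity image (1.0 = fully opaque occluder
    coverage). `opacity` is a square 2^k x 2^k list of floats."""
    levels = [opacity]
    while len(levels[-1]) > 1:
        src = levels[-1]
        n = len(src) // 2
        dst = [[(src[2 * y][2 * x] + src[2 * y][2 * x + 1] +
                 src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
                for x in range(n)] for y in range(n)]
        levels.append(dst)
    return levels

# A 4x4 map whose left half is fully opaque (rendered occluders).
omap = [[1.0, 1.0, 0.0, 0.0]] * 4
pyramid = build_occlusion_pyramid(omap)
```

A coarse-level pixel of 1.0 proves the whole region beneath it is opaque, allowing the overlap test to terminate early; intermediate values force descent to finer levels.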

 

     Visibility Culling with the use of the HOMs: once the HOMs are constructed, the algorithm starts by View Frustum Culling the scene database. It then takes the resulting objects and projects their bounding volumes. For each potential occludee the algorithm performs an overlap test between its projected bounding box and the hierarchical occlusion maps. If the projection falls completely within an opaque region of the HOMs, a depth comparison is then performed to determine whether the potential occludee is actually occluded.

Figure 7 Example of Hierarchical Occlusion Map

 

For a more detailed explanation of this algorithm please refer to [4]. The algorithm has some interesting properties: it is an image-space algorithm that implicitly performs occluder fusion through the HOMs, and the overlap test is optimized by the hierarchy, which also supports conservative early termination. Many of the previously mentioned techniques for speeding up View Frustum Culling apply here too, and temporal coherence can also be applied to occluder selection. HOM was developed as an alternative to hierarchical Z-buffers; for a detailed comparison between the two algorithms please refer to [4], and for an overview of hierarchical Z-buffers see [12].

 

Figure 8 Occluder Fusion

2.2 Level of Detail

 

     When rendering very large datasets, even the best visibility determination algorithms may not suffice. In these situations one must look for alternatives, one of which is Level of Detail algorithms. The purpose of these algorithms is to simplify 3D meshes (Oliveira et al. [44]). Level of Detail algorithms work by taking the highest-detail mesh and reducing the number of polygons used to represent it in such a way as to preserve the topology of the mesh, which defines its general appearance. Good algorithms achieve this by using a combination of error metrics. If the topology of the mesh is preserved, the lower-detail mesh appears almost indistinguishable from its higher-detail equivalent when viewed from a distance. Like visibility algorithms, Level of Detail algorithms fall into various categories. Erikson et al. [35] classified these algorithms into three basic categories.

 

·        Geometry removal

This refers to simplifying a mesh by removing vertices or collapsing edges in its representation.

·        Sampling

Here a lower level of detail mesh is reconstructed from the sampled data of the higher detail mesh.

·        Adaptive subdivision

The different level of detail meshes are constructed by starting with a rough model that is recursively refined by adding detail locally where needed.

 

When examining various Level of Detail algorithms one can deduce that there are also other categories into which these algorithms fall; these include:

·        Continuous vs. Discrete

The most noticeable artefacts created by level of detail algorithms are usually exhibited when swapping the representation of an object from one level of detail to another; this can create popping or gaps. One approach to minimizing this effect involves blending from one level of detail to another; continuous algorithms such as progressive meshes [8] avoid the problem altogether. When we refer to continuous Level of Detail with regard to a polygonal representation, we generally mean that any level of detail can be obtained by sampling at any point in the space domain of the algorithm; an infinite number of meshes can therefore be created between any two distinct levels of detail. In practice, however, we are limited to displays that are discrete approximations.

·        View Dependent vs. View Independent

Certain level of detail algorithms produce different results based on the viewing position while others do not. View-dependent algorithms can produce more pleasing results at the cost of extra resources.

·        Pre-processed vs. Online

Some algorithms such as progressive meshes [8] pre-calculate all the possible mesh simplifications while others perform their calculations online.

·        Screen-Error Bounded vs. Screen-Error Unbounded

Screen errors are projected errors: the difference between the projected simplified mesh and the projected high-detail mesh. This error metric is much more accurate with respect to the perceived result. It also has the advantage of allowing users to select their own preference in the visual quality vs. performance trade-off, and it can be used to reduce temporal aliasing artefacts.

·        Global vs. Local

Global algorithms apply a change globally across the whole mesh while local algorithms might apply changes to local features.

 

One of the more elegant solutions to the Level of Detail problem is that of Hoppe [8]. His algorithm, referred to as progressive meshes, is continuous. This report gives a brief overview of the algorithm; for a more detailed explanation, please refer to his paper [8].

 

2.2.1 Progressive Meshes [8]

 

     Progressive Meshes address many problems; however, this report is only concerned with the aspects that speed up rendering and therefore ignores certain details of Hoppe's paper [8]. Progressive meshes work on the premise that one can define a sequence of meshes differing in level of detail through a careful sequence of edge collapses. The sequence of edge collapses must be chosen carefully, since it determines the quality of the approximating meshes. The edge collapse transformation is invertible by a vertex split transformation; one can therefore represent an arbitrary mesh at any given level of detail by a coarse mesh and a sequence of vertex split transformations.

 

Figure 9 Example of edge collapse

 

The progressive mesh algorithm has two very useful features. The first is geomorphs: this enables a smooth transition between two meshes of arbitrary level of detail using linear interpolation, see [8]. The other is selective refinement, which enables particular local features of the mesh to be refined. Progressive meshes can be constructed using various algorithms that trade off speed against accuracy. Different metrics can be used to determine accuracy: Schroeder et al. [48] use a "distance to plane" metric while Hoppe et al. [8] use the Edist metric. Different metrics can be used to preserve different aspects of the mesh, such as its geometry, scalar attributes, and discontinuity curves.

 

Figure 10 Mesh at different levels of detail
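The geomorph interpolation mentioned above can be sketched as a per-vertex linear blend. The vertex correspondence is assumed: coarse[i] is taken to be the position that the fine mesh's vertex i collapses onto, which a real progressive mesh derives from the recorded collapse sequence [8]:

```python
def geomorph(coarse, fine, t):
    """Linearly interpolate vertex positions between a coarse mesh and
    its refined version (Hoppe's geomorphs [8]). t = 0 gives the coarse
    vertex positions, t = 1 the fine ones; intermediate values give a
    smooth transition that avoids popping."""
    return [tuple(c + t * (f - c) for c, f in zip(cv, fv))
            for cv, fv in zip(coarse, fine)]

# Two vertices: after the collapse both sit at the origin; the refined
# mesh splits the second one out to (2, 0, 0).
coarse = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
fine   = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
halfway = geomorph(coarse, fine, 0.5)
```

Connectivity stays fixed at the fine mesh's during the blend; only the positions move, which is what makes the transition render-cheap.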


2.3 Impostors

 

     Impostor algorithms exploit image data from recently rendered frames; this data is reused in subsequent frames to avoid the expensive re-rendering of the complete 3D scene, effectively faking geometric surfaces with an approximate representation. One of the most common representations used to achieve this is an opaque image textured onto a transparent polygon; such impostors are known as textured impostors. There are also other types, such as point-based impostors [7] and line-based impostors. There are many different impostor algorithms and, like most algorithms examined in this report, they can be classified into various categories.

 

·        View dependent vs. View independent

View independent impostors can be used from any given view point while view dependent impostors need to be updated for different view points.

·        Dynamically generated vs. Pre-calculated

Dynamically generated impostors are calculated on the fly (as they are needed) while pre-calculated impostors, as their name implies, can be computed offline.

 

2.3.1 Per-Object Image Warping with Layered Impostors [5]

 

     Single-layer view-dependent impostors have the disadvantage of not containing any depth information, which makes them very vulnerable to errors. They therefore cannot be used for points of view that differ significantly from where the image was taken. To avoid serious visual degradation one must either dynamically update them very often or pre-calculate a great number of impostors from various view points and swap between them frequently. An alternative is to use multi-layered impostors. As opposed to a single-layer impostor, which consists of a single transparent polygon onto which an image is mapped, a multi-layered impostor consists of many such polygons. Every layer represents a slice of the object at a different distance from the user. Using a greater number of layers produces a better approximation.

 

Figure 11 Example of Multi-Layered Impostors

As opposed to single-layered impostors, multi-layered impostors are stored using RGBz, where z is the depth information. Most of today’s graphics hardware supports RGBA, where A is the alpha component; this alpha component can be used to store the depth information. When these impostors are rendered, multiple semi-transparent polygons are drawn as a pyramidal stack with the view point at the apex. For each layer, only the areas where the stored z component closely matches the distance from the view point to the layer are drawn. To avoid cracks, slightly overlapping depth intervals are drawn onto every layer. See figure 12.

Figure 12 Example of Overlap in Multi-Layered Impostors

The advantage that multi-layered impostors have over single-layered ones is accuracy. Multi-layered impostors have a much longer life span because the view point can move greater distances without producing as much error as with single-layered impostors. They do, however, have the drawback of taking longer to generate. Another disadvantage of this algorithm is that it produces aliasing artefacts in the form of gaps on occluding objects. Please refer to [5] for a more detailed description of this algorithm.
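The per-layer depth test with overlapping intervals can be sketched as follows. This is an illustrative CPU-side helper, not the implementation from [5]: it assumes the impostor's depth range is cut into equal slices, and returns which layers a texel of stored depth z is drawn onto. The `overlap` margin widens each slice so that neighbouring layers overlap slightly, which is what avoids cracks.

```cpp
#include <vector>

// Multi-layered impostor sketch: map a stored depth value z to the layer
// slices it should be drawn on. The range [zNear, zFar] is divided into
// `layers` equal slices; each slice is widened by `overlap` on both sides,
// so a texel near a slice boundary lands on two adjacent layers.
std::vector<int> layersForDepth(float z, float zNear, float zFar,
                                int layers, float overlap)
{
    std::vector<int> hit;
    float step = (zFar - zNear) / layers;
    for (int i = 0; i < layers; ++i) {
        float lo = zNear + i * step - overlap;        // widened slice start
        float hi = zNear + (i + 1) * step + overlap;  // widened slice end
        if (z >= lo && z < hi)
            hit.push_back(i);
    }
    return hit;
}
```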

 

Figure 13 Example of remaining gap artefacts


3 Development Life Cycles

 

Figure 14

 

     This project focuses on optimizing real-time rendering with the use of various techniques. However, optimization is an iterative process: one must optimize the appropriate areas of the rendering pipeline in order to achieve the maximum possible benefit. Please refer to section 4.2 for the different rendering pipelines that were created during the development cycles. The optimization was initially performed at a high level and then progressively refined at lower levels. It would be very costly in terms of time to focus on optimizing a certain algorithm’s implementation only to conclude that another algorithm altogether would achieve far superior results. Due to the nature of this problem the development of this software went through several cycles, or iterations. Each cycle can be broken down into the same steps.

 

·        Requirements Analysis

·        Design

·        Implementation

·        Testing

·        Evaluation

 

The following sections of this report will focus on each one of these steps and their various iterations. Each step produces information that is passed on to the following step. Once the last step was reached one iteration was complete, and the results were then used to start the next iteration, until satisfactory results were achieved. Note that the first iterations had a tendency to focus more on design (these were usually high-level optimizations), while the last iterations, which were low-level in nature, focused more on implementation. This type of life cycle is often referred to by software engineers as the prototyping approach to developing software. It is often used when the initial requirements are not clearly defined. In this case the high-level requirement is simple: “to render large and complex 3D environments in real time”. The lower-level requirements, however, were not initially defined. Prototyping approaches are often employed for research-oriented projects, and this suited the needs of this project perfectly.

 

3.1 Requirements Analysis

 

     Before embarking on any form of design, a feature set for the application had to be established. One subset of the application feature set is the rendering feature set. Because the goal of this project was to explore how to improve rendering performance for geometrically complex environments, the rendering feature set was kept to an adequate minimum.

 

3.1.1 Application Feature Set

 

·        Rendering Large and Complex 3D Environments

·        Navigation of these Environments

·        Benchmarking Capabilities

 

3.1.2 Rendering Feature Set

 

·        Simple Lighting (Point & Directional Lights)

·        Material Properties

·        Texturing

·        Geometric Objects

 

This was the initial rendering feature set; as development progressed it was expanded to encompass all the acceleration techniques.


4 Design

 

     This section of the report will not only look at the design of the software but will also study some of the design decisions affecting implementation. Due to the nature of this work not all the design could be completed at the outset, and some of it changed during development. Good design at the beginning of a project can save a lot of unnecessary work later on. The design can be broken down into two sections: algorithmic design and code base design. Most of the code base design could be done at the outset, while the algorithmic design went through a few iterations and changed along the way. Certain considerations that could potentially influence the design had to be taken into account during this phase of development, and they had a strong impact on the choices made during this project. One of these was the target platform: consumer-level hardware, not a high-end special-purpose SGI Onyx. To be more precise, the target platform was a PC with a good consumer-level OpenGL-capable 3D GPU (GeForce3 or higher), a Pentium IV or equivalent CPU and 512MB of RAM. It would be unrealistic to expect anything near real-time performance when rendering models containing a million or more polygons without the use of hardware acceleration. For this project that hardware acceleration came in the form of a GeForce3. However, fully programmable GPUs are not yet available, so the project had to rely on the features provided by the current crop of GPUs. This imposes certain limitations on which algorithms can be hardware accelerated. With this in mind, certain algorithms were chosen over others on the merit that they could be hardware accelerated or could leverage the hardware acceleration. Another consideration that influenced certain design decisions was portability. This software is not portable to other platforms; however, it was designed and implemented with care in order to ease a potential transition to another platform.

 

 

4.1 Code Base Design

 

     It was important to develop a strong code base that could be built upon and easily extended. The code base was also designed with simplicity in mind. It included the data structures of the application, certain basic tools, and the main application functionality, and was designed using an object-oriented approach. The core of the code base was the scene graph: the set of objects that represent the 3D environment. Scene graphs are common to all 3D rendering applications and usually have a great deal in common. Rather than using one of the commonly available open-source scene graphs such as Open Scene Graph, a simple one was developed specifically for this project, which added the benefits of flexibility as well as simplicity. Benchmarking was also a fundamental part of the code base: to improve the application, appropriate benchmarks had to be performed. The benchmarks were performed by recording camera paths and then playing them back while recording the time spent on each frame; see the testing and evaluation section (Section 6). Another important tool developed as part of the code base was a 3D Studio Max plugin. There are many commonly used 3D file formats (VRML, DXF, 3DS, OBJ, etc.) and supporting most of them directly would be a great undertaking. 3D Studio Max is a very successful 3D authoring package that supports most 3D file formats. Therefore, by developing a 3DS Max export plugin that can export any scene, one implicitly supports all the formats supported by that package.

 


4.1.1 Scene Graph

 

     The scene graph was kept to a minimum and extended as needed. It supports lights, cameras, geometric objects, and materials. The following describes all the classes used to represent the scene graph through their design iterations.

 

Figure 15 Scene Graph Hierarchies

Camera

Vector3f    View Reference Point

Vector3f    View Up Vector

Vector3f    View Plane Normal

VF         View Frustum

     This class represents the view of the user. It also supports writing its state to a file; this was used to record camera paths that were later used for benchmarking. User navigation was supported by this class’s methods. This class also contains the View Frustum class.

 

View Frustum

Plane         TOP, BOTTOM, FRONT, BACK, LEFT, RIGHT

     This class represents the view frustum used for certain visibility calculations. It contains six planes that define the viewing volume.
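The six planes are what the visibility tests operate on. As a minimal sketch (illustrative, not the thesis code), assuming each plane is stored with an inward-facing normal n and offset d so that points with n·p + d ≥ 0 are inside, a point or bounding-sphere test against the frustum is just six signed-distance checks:

```cpp
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };  // plane n.p + d = 0, n points inward

// Signed distance of point p from plane pl (positive on the inside).
inline float signedDist(const Plane& pl, const Vec3& p)
{
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// A point is inside the frustum if it is on the inner side of all six planes.
bool pointInFrustum(const Plane planes[6], const Vec3& p)
{
    for (int i = 0; i < 6; ++i)
        if (signedDist(planes[i], p) < 0.0f)
            return false;
    return true;
}

// Bounding-sphere variant: only the threshold changes from 0 to -radius,
// giving a cheap conservative test for whole objects.
bool sphereInFrustum(const Plane planes[6], const Vec3& c, float r)
{
    for (int i = 0; i < 6; ++i)
        if (signedDist(planes[i], c) < -r)
            return false;  // sphere entirely outside this plane
    return true;
}
```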

 

 

 

GeoObject

Vector3f*   Vertices

Vector3f*   Vertex Normals

Vector3f*   Bounding Box Vertices

Vector2f*   UVs

Integer* Triangle Indices

Material*    Material

Matrix4f Current Transformation Matrix

 

     This class is used to represent the geometric objects occupying the virtual world. The geometric information is represented by indexed triangles. There are many other data structures that can be used for this purpose, such as the winged-edge data structure [23]; for the purpose of this report indexed triangles sufficed. The object’s location can be determined with the current transformation matrix [23]. This data structure also contains bounding boxes; see section 4.2.2.
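To illustrate how the indexed-triangle representation is used, the following hypothetical helper (not from the thesis) derives per-vertex normals, of the kind stored in the Vertex Normals array, by accumulating each face normal onto the three vertices it references and normalizing:

```cpp
#include <vector>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y,
                                            a.z*b.x - a.x*b.z,
                                            a.x*b.y - a.y*b.x}; }
static Vec3 normalize(Vec3 v)
{
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return len > 0.0f ? Vec3{v.x/len, v.y/len, v.z/len} : v;
}

// Per-vertex normals for an indexed triangle list: every consecutive
// triple in `tris` indexes one triangle in `verts`.
std::vector<Vec3> vertexNormals(const std::vector<Vec3>& verts,
                                const std::vector<int>& tris)
{
    std::vector<Vec3> n(verts.size(), Vec3{0, 0, 0});
    for (size_t t = 0; t + 2 < tris.size(); t += 3) {
        int a = tris[t], b = tris[t + 1], c = tris[t + 2];
        // Unnormalized face normal; its length weights large faces more.
        Vec3 fn = cross(sub(verts[b], verts[a]), sub(verts[c], verts[a]));
        auto acc = [&](int idx) { n[idx].x += fn.x; n[idx].y += fn.y; n[idx].z += fn.z; };
        acc(a); acc(b); acc(c);
    }
    for (Vec3& v : n) v = normalize(v);
    return n;
}
```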

 

Material

Colour        Specular

Colour   Diffuse

Colour   Ambient

Float          Transparency

Float      Shininess

Char*         Map Path

Integer   Id

Char*         Material Name

 

     This class is used to describe the material properties of each object. It contains all the data necessary to implement Phong shading [23]. The map path is used if the material has a texture associated with it.

 

 

 

 

Light

Vector3f    Light Position

Vector3f    Light Direction (used if light is directional)

Colour        Light Colour

Integer   Type (i.e. Directional or Point)

Float          Intensity

 

     One of the rendering features was to have simple lighting. This class provides support for this.

 

Oct-Tree Node

Vector3f*   Boundary

Octree Node Children Nodes (Pointer to eight children nodes)

Integer   Intersecting Object Index

Integer**    Intersecting Triangle Index

Integer   Depth of tree

 

     This class was used to represent the Oct-Tree. The boundary defines the cube for that node. Each node contains a pointer to its children, should it have any. Nodes also contain pointers to all intersecting geometric objects, and for each of these objects an index of all its intersecting faces. The depth of each node is also stored; see section 2.1.2.
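The subdivision step behind this structure is simple to sketch: a node's axis-aligned cube is cut at its centre into eight equal octants, one per child. The following is an illustrative helper (names are mine, not the thesis's), where bit i of the child index selects the low or high half along axis i:

```cpp
#include <array>

struct AABB { float min[3]; float max[3]; };

// Split an Oct-Tree node's cube into its eight child octants.
std::array<AABB, 8> splitNode(const AABB& box)
{
    float mid[3];
    for (int a = 0; a < 3; ++a)
        mid[a] = 0.5f * (box.min[a] + box.max[a]);

    std::array<AABB, 8> children;
    for (int c = 0; c < 8; ++c) {
        for (int a = 0; a < 3; ++a) {
            bool high = (c >> a) & 1;         // bit a picks the half on axis a
            children[c].min[a] = high ? mid[a] : box.min[a];
            children[c].max[a] = high ? box.max[a] : mid[a];
        }
    }
    return children;
}
```

Construction then recurses into each child with the triangles it intersects, stopping at the depth limit described in section 2.1.2.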

 

Scene Graph

Light           Lights

Material          Materials

GeoObject          Geometric Objects

Oct-tree Node    Oct-Tree

Camera          Cameras

 

     The scene graph contains the lights, cameras, materials, geometric objects and the Oct-Tree. This class is used to manage all these objects and provides a few necessary utilities. It can read and write all its data to a file store.

 

For a much greater detailed description of all these classes please refer to the Appendix.

 


4.1.2 3DS Max Plugin

    

In order to view various 3D models stored in a multitude of different file formats, a 3DS Max plugin was developed. 3DS Max is a professional 3D authoring package that supports a very large feature set. It also comes with an SDK that allows developers to extend its capabilities by developing plugins, which range from advanced renderers to animation tools. The plugin developed for this report is an exporter that allows any scene currently loaded in 3DS Max to be saved to a file directly supported by this renderer’s scene graph. The plugin supports a subset of the Max scene graph: materials with one diffuse texture map, basic lights and cameras, and geometric objects. 3DS Max supports a wide variety of representations for geometric objects, including indexed face sets and curved surfaces in the form of NURBS, Bezier curves and various others. The plugin takes any one of these representations and converts it to an indexed triangle list before saving it to disk. The plugin is very simple to use: the user enters the file menu, selects export, and then chooses the format, filename and location for the file. Having selected the format designed for this renderer, the user is presented with a simple options menu. This menu allows the user to select which parts of the scene to export; it also has a debug option that was implemented for testing purposes.

 

Figure 16 Screen Shot from 3DS Max Plugin

 

4.1.2.1 3DS Max Plugin File Format

 

     The file format used to store the scene graph was designed along with the 3DS Max plugin. It is an ASCII format. This was chosen because such files can easily be modified in a standard text editor, no care has to be taken with regard to the endianness of the platform, and they can very easily be debugged. The drawback of ASCII is that these files require more space than a binary equivalent and take longer to read and write. The structure of the file is, however, very simple and is as follows.
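Each line of the format is a "Key = value value ..." pair, so parsing is straightforward. A minimal sketch of reading one such line (illustrative, not the thesis's loader):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse one "Key = value value ..." line of the ASCII scene format.
// Everything before '=' is the key (whitespace-trimmed); the
// whitespace-separated tokens after it are the values.
bool parseLine(const std::string& line,
               std::string& key, std::vector<std::string>& values)
{
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) return false;

    key = line.substr(0, eq);
    key.erase(0, key.find_first_not_of(" \t"));   // trim left
    key.erase(key.find_last_not_of(" \t") + 1);   // trim right

    std::istringstream rest(line.substr(eq + 1));
    values.clear();
    for (std::string tok; rest >> tok; )
        values.push_back(tok);
    return true;
}
```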

 

Example File *.JDN

Number of Materials      = 1

Material Name         = Sky

Ambient                   = 0 0    0

Diffuse                    = 0.2     0.552941    0.176471

Specular                 = 0.9     0.9 0.9

Shine                = 0.25

Shine Strength        = 0.05

Transparency         = 0

Material ID         = 0

Map Path                 = C:\3dsmax4\maps\Skies\Duskcld1.jpg

      :

      :

      :

Number of Lights           = 1

Pos                   = -53.9952  455.163      0

Direction                  = 0 0    0

Colour               = 1 1    1

Intensity                  = 1

Fall                    = 45

Type                 = 2

      :

      :

      :

Number of Objects        = 1

Material ID         = 0

Num Vertices          = 8

Num Faces              = 12

Num textured vertices   = 12

Vert                  = 44.89 92.909  0

      :

      :

      :

Vert                  = 78.8883   171.119      60

Face                 = 0 2    3

      :

      :

      :

Face                 = 4 6    2

Textured Vert         = 0 0    0

      :

      :

      :

Textured Vert         = 1 1    0

Textured Face        = 9 11  10

      :

      :

      :

Textured Face        = 3 2    0

Face Normal            = 0 0    -1

      :

      :

      :

Face Normal            = -1      0    0

Vert Normal             = -1      0    0

      :

      :

      :

Vert Normal             = 0 1    0

 

 

 

 


4.1.3 Application Code

 

     The application code handles the application initialisation, the user interface and also contains the main rendering loop.

Application Initialization

This part of the application initializes the hardware and the scene graph. Once the scene graph has been instantiated it reads the file containing the 3D scene data. Once this has completed, the necessary data structure pre-processing occurs.

User Interface

The user interface is simple. Navigation of the environment is performed with fly-through interaction: the mouse controls the viewing direction, while the keyboard allows the user to move forwards or backwards along the viewing direction. Other keys allow a strafing action. The user can also enable certain viewing cues, including the visualization of certain data structures used to perform visibility operations; these structures are not actually part of the 3D environment. Lastly there are two more keys. One allows the user to record a camera path that is subsequently stored to the file system. The other plays back a recorded path while engaging a benchmarking feature that stores the results in a benchmark file.

 

The rendering loop and the pre-processing are discussed in the algorithmic design section (4.2) of this report, while more detail with regard to benchmarking is presented in the evaluation and results sections (6 and 7).

 

Figure 17 Initialization

 

 

 

4.2 Algorithmic Design

 

Figure 18 Rendering Pipeline

    

 

The above diagram shows the various iterations of the rendering pipeline. Please note that this is a very simplified representation of each pipeline and is by no means complete. However, it does illustrate the differences between the pipelines as well as the evolution the pipeline went through in order to achieve real-time performance. Each rendering pipeline is split in two: the first part represents the calculations performed by the CPU, while the second represents the calculations performed by the GPU. As GPUs evolve they are taking more of the burden off the CPU. This section will review every iteration of the rendering pipeline, discuss the different algorithms selected to perform different pipeline tasks, and justify the decisions that were made. These decisions were made with the careful use of evaluation tools; the evaluation techniques will be covered in greater depth in section 6.

 

4.2.1 1st Pipeline

    

The GPU can perform visibility determination with view frustum culling. The GeForce3 also supports a certain level of occlusion culling, hardwired into its architecture. Finally, hidden surface removal is performed with the Z-buffer. The first step was simply to read the scene graph, throw all the triangles across the AGP bus and see what happened. As expected this yielded very poor results: it was quite obvious from the benchmarks (see Appendix) that too many triangles were being passed across the AGP bus and into the GPU. Care was taken to minimise the state changes that would have to occur in the GPU in order to set up materials; therefore all the objects were sorted by material.
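The material sort mentioned above amounts to ordering the render list by material id so that all objects sharing a material are drawn consecutively, costing one state change (material setup, texture bind) per material rather than potentially one per object. A minimal sketch, with illustrative names:

```cpp
#include <algorithm>
#include <vector>

// One entry per drawable object in the render list.
struct RenderItem {
    int materialId;   // index into the scene graph's material array
    int objectIndex;  // index of the geometric object
};

// Group objects by material so GPU state changes are minimised.
// stable_sort keeps the original order within each material group.
void sortByMaterial(std::vector<RenderItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const RenderItem& a, const RenderItem& b) {
                         return a.materialId < b.materialId;
                     });
}
```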

 


4.2.2 2nd Pipeline

    

One very simple way to reduce this burden was to remove as much as possible as soon as possible. However, there was no point in implementing an exact view frustum clipping algorithm when the GPU could perform this task much more efficiently than the CPU. The easiest solution was to implement conservative view frustum culling (see Section 2.1.1) with the use of bounding volumes. For this task an axis-aligned bounding box [Section 2.1.2] was created for every geometric object in the scene graph. These bounding volumes were then tested against the view frustum; if the test against an AABB succeeded, the whole geometric object was passed on to the GPU. This produced much better results than the previous version of the rendering pipeline (see Section 7). It was an improvement, but nowhere close to adequate real-time performance. The target scene contained in excess of six thousand objects, and testing each object’s bounding volume was too costly. The evaluation of this pipeline led to the conclusion that the rendering loop was CPU limited; see Section 6.
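A common way to implement the conservative AABB-against-frustum test is the "positive vertex" trick: for each frustum plane, only the box corner furthest along the plane normal is checked, and if even that corner is outside one plane the whole box is culled. Boxes straddling a plane are kept, which is the conservative behaviour the text describes. A sketch under the same inward-facing plane convention as before (illustrative, not the thesis code):

```cpp
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };   // n.p + d >= 0 on the inside
struct AABB  { Vec3 min, max; };

// Conservative AABB/frustum test: returns true only when the box is
// provably outside the frustum; straddling boxes are never rejected.
bool aabbOutsideFrustum(const Plane planes[6], const AABB& box)
{
    for (int i = 0; i < 6; ++i) {
        const Plane& pl = planes[i];
        Vec3 p;  // "positive vertex": the corner furthest along pl.n
        p.x = pl.n.x >= 0 ? box.max.x : box.min.x;
        p.y = pl.n.y >= 0 ? box.max.y : box.min.y;
        p.z = pl.n.z >= 0 ? box.max.z : box.min.z;
        if (pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d < 0.0f)
            return true;   // entire box is outside this plane
    }
    return false;          // inside or intersecting: keep the object
}
```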

 

4.2.3 3rd Pipeline

    

Having come to the conclusion that the pipeline was CPU limited, the conservative view frustum culling on the CPU had to be optimized. It was time to start taking advantage of spatial coherence, which could potentially reduce the CPU burden greatly. A decision had to be made over which spatial subdivision structure to use. With a little foresight it was clear that spatial subdivision alone would not suffice and that occlusion culling would eventually have to be used, so the structure would have to be useful to both VFC and occlusion culling. Two structures satisfied both those needs: Oct-Trees and k-d trees. The Oct-Tree data structure was selected because of its simplicity and elegance. The construction of an Oct-Tree [Section 2.1.2] is very simple, and the only parameter that has to be set is the depth at which the tree stops. This depth can depend on two factors: the minimum number of triangles contained by any one node, and the minimum size that a node can reach. These two factors can affect the performance of both the VFC and the occlusion culling; more on this in Section 7. There are also small implementation decisions that can be made when implementing Oct-Trees [Section 5]. The Oct-Tree creation was performed as a pre-processing step, straight after the scene graph had been read. A conservative hierarchical view frustum culling algorithm would then walk the tree and remove all the non-visible nodes. This produced very good results, although the frame rate was variable: in certain situations it yielded very adequate frame rates, while in others the frame rate dropped considerably. At this point in the development most of the triangles that were not contained by the view frustum were culled very early in the pipeline. However, there were still triangles that were not visible due to occlusion, and these were still being sent down most of the pipeline until they were removed by the Z-buffer.
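The hierarchical walk itself can be sketched abstractly. A node classified as entirely outside the frustum prunes its whole subtree; a fully-inside node is accepted without testing its children; only straddling nodes recurse. In this illustrative sketch (not the thesis code) the classification stands in for a real box/frustum test:

```cpp
#include <vector>

// Hierarchical View Frustum Culling over an Oct-Tree (sketch).
enum Cull { OUTSIDE, INTERSECT, INSIDE };

struct Node {
    Cull classification;          // result of the node's box/frustum test
    std::vector<Node*> children;  // empty for leaf nodes
};

void collectVisible(Node* n, std::vector<Node*>& visible)
{
    if (n->classification == OUTSIDE)
        return;                           // prune the whole subtree
    if (n->classification == INSIDE || n->children.empty()) {
        visible.push_back(n);             // accept without refining further
        return;
    }
    for (Node* c : n->children)           // INTERSECT: refine into children
        collectVisible(c, visible);
}
```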

 

4.2.4 4th Pipeline

    

It was clear from the previous evaluation that the geometric load was still too great; however, the previous rendering pipeline was not fill-rate limited [see Section 6]. This indicated that some of the fill rate could be sacrificed in order to reduce the geometry throughput. This is referred to as balancing the rendering pipeline between fill rate and geometry throughput [see Section 7]. The goal of this iteration was to achieve occlusion culling [see Section 2.1.4] and thereby further reduce the geometric load on the GPU. The first choice was which type of occlusion algorithm to select. Because of the spare fill rate, an image-space occlusion algorithm was selected over an object-space one. This not only enables the rendering pipeline to be balanced; image-based algorithms also tend to be much more flexible in terms of the information they can be given, and they implicitly support occluder fusion. There was still a limiting factor imposed by the architecture of the target platform: the AGP bus. Unfortunately the AGP bus can only achieve high throughput in one direction. This means that one can feed a lot of data to the GPU, but retrieving data from the GPU is incredibly slow. The problem with some of the previously discussed image-based occlusion culling algorithms is that they require reading the Z-buffer back from the GPU; doing this every frame would have a very serious impact on performance. An alternative had to be found. It came in the form of an OpenGL extension, supported by NVidia and various other manufacturers, that may soon make it into the OpenGL 2 standard. This extension allows queries to be sent to the graphics card; each query returns the number of Z-buffer pixels that were written to by a set of primitives. That is, one can create a query, proceed to draw a triangle, and subsequently obtain the number of pixels that were projected from that triangle onto the depth buffer. For a pixel to be written to the depth buffer it must have a lower depth value (i.e. be in front). With this extension in mind the following algorithm was implemented. It consists of first performing hierarchical view frustum culling of the Oct-Tree, keeping the Z-buffer from the previous frame. Occlusion queries are then generated and used in conjunction with the rendering of every Oct-Tree node that was not culled previously. Once these nodes have been rendered, the queries return exactly which nodes are visible (the Z-buffer is then cleared), and the triangles contained in this subset of nodes are selected and rendered. It is important to note that this is an approximate visibility algorithm: in certain situations it will not render objects that should in fact be visible. This sounds worse than it actually is. The worst possible case is for the camera to rotate 180 degrees about its view up vector, with all the objects contained in the newly placed viewing frustum further away than all of the previously rendered objects, and with the previously rendered objects having covered most of the Z-buffer. This would create an almost empty frame in which very few objects would be visible. However, it would only last for one frame; the following frame would clear the previous Z-buffer and all the visible objects would reappear. In practice this very rarely occurs, and the faster the frame rate, the less time such a worst-case frame stays on the screen and the less noticeable the aliasing is. This potential error could also be capped by introducing a pre-emptive mechanism that detects very extreme movement from one frame to the next and clears the Z-buffer should such an event occur. The advantage this approach has over a more conservative equivalent is that it does not need to perform any kind of occluder selection: no occluder pre-processing steps are required and no resources are spent determining which objects should be selected as occluders. This algorithm was originally implemented with some scepticism but was found to produce very good results.

4.2.5 5th Pipeline

    

The evaluation of the 4th pipeline yielded very good results: real-time performance with the test scenes was achieved. However, it did introduce a fill-rate limitation. This was because all the Oct-Tree nodes not removed by the hierarchical view frustum culling were being rendered for each and every frame. One way to alleviate this was to use temporal coherence [see Section 2.1.3]: an Oct-Tree node can be considered visible for a certain number of frames, so it does not need to be rendered for occlusion queries during those frames. If its contents are in fact not visible, it is the responsibility of the view frustum culling on the GPU to discard them. This allows greater tuning of the graphics pipeline and enables much finer-grained control when balancing fill rate against geometry throughput [see Section 7]. Another very simple way to double the fill rate was to introduce back-face culling, the process of removing all triangles that face away from the camera. This is very easily implemented with one OpenGL function call, though it imposes the constraint that double-sided surfaces are no longer supported. Unfortunately the scenes used to test this pipeline are architectural in origin, so the geometry describes the topology of the objects much more than surface detail. This tends to make level of detail less useful, because further reductions of triangles alter the topology. A little experimentation was carried out and produced very poor results, so an alternative had to be found. The only alternative left was impostors. Textured impostors are very prone to serious aliasing artefacts, and for that reason a point-based approach was selected. This was also very well suited to the previous pipeline: as one may recall, extensive use was made of an OpenGL extension that retrieves the number of Z-buffer pixels that were touched. This could also be used to determine which object representation to use: if a set threshold was exceeded, the original object representation would be used; otherwise a point impostor would replace it. However, due to time constraints the point-based impostor algorithm was not completed and therefore could not be tested. The only difference separating the 4th and 5th rendering pipelines is therefore the temporal coherence technique.
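The temporal coherence bookkeeping can be sketched with a simple per-node frame stamp: a node whose occlusion query found it visible is assumed to stay visible for the next few frames and is not re-queried during that window. This is an illustrative sketch (names and the 3-frame constant are my assumptions, chosen to match the tuning described above):

```cpp
// Temporal-coherence sketch for occlusion queries.
struct NodeVisibility {
    int lastVisibleFrame = -1000;   // frame when a query last saw this node
};

const int kVisibleFrames = 3;       // assume visibility persists this long

// A node only needs a fresh occlusion query once its assumed-visible
// window has expired; inside the window it is drawn without a query.
bool needsOcclusionQuery(const NodeVisibility& n, int currentFrame)
{
    return currentFrame - n.lastVisibleFrame >= kVisibleFrames;
}

// Called when a query reports the node's bounding box touched pixels.
void onQueryPassed(NodeVisibility& n, int currentFrame)
{
    n.lastVisibleFrame = currentFrame;
}
```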

 

Pseudo Code for Final Rendering Loop

ClearColourBuffer();
pCamera->look();                                        // Update camera location
pCamera->m_pFrustum->calculateFrustum();                // Calculate new view frustum
pOctree->CullOctree(pptrNodes, *(pCamera->m_pFrustum)); // Hierarchical View Frustum Culling
glGenOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries); // Generate occlusion queries
glDisable(GL_LIGHTING);                                 // Disable lighting
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);    // Disable colour buffer writes
glDepthMask(GL_FALSE);                                  // Disable writing to Z-buffer
for ( int i = 0; i < g_NumVisibleNodes; i++ ) {         // For each node surviving the VFC
    glBeginOcclusionQueryNV(ptrOcclusionQueries[i]);    // Begin occlusion query
    pptrNodes[i]->DrawAABB();    // Render bounding box (skipped if the node was seen visible within the last 3 frames)
    glEndOcclusionQueryNV();                            // End occlusion query
}
glEnable(GL_LIGHTING);                                  // Enable lighting
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);        // Enable colour buffer writes
glDepthMask(GL_TRUE);                                   // Enable write access to Z-buffer
glClear(GL_DEPTH_BUFFER_BIT);                           // Clear Z-buffer
for ( int j = 0; j < g_NumVisibleNodes; j++ ) {
    glGetOcclusionQueryuivNV(ptrOcclusionQueries[j], GL_PIXEL_COUNT_NV,
                             &pixelCount);              // Retrieve occlusion query result
    if ( pixelCount == 0 ) {
        // Node not visible: draw nothing
    } else if ( pixelCount < 20 ) {
        pptrNodes[j]->DrawImpostor();                   // Fewer than 20 pixels touched: draw impostor
    } else {
        pptrNodes[j]->DrawNode();                       // Otherwise draw true representation
    }
}
glDeleteOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries); // Delete queries

     

 

 

5 Implementation

 

     This section will focus on implementation details. It is very important to take extreme care while implementing any kind of system that requires high levels of performance. The key to the success of this project was selecting good algorithms that could truly harness the power of the GPU. It is therefore necessary to fully understand the underlying architecture of the platform. With this understanding one can specifically tailor certain aspects of the implementation to take advantage of the hardware or in other cases avoid some of its pitfalls.

 

5.1 Choice of Language

 

     Gone are the days when highly optimized rendering loops were implemented entirely in assembly. Developing in assembly gives the developer full, unrestricted control over the hardware. However, as compilers have evolved and can produce optimized assembly, there is no longer the need to develop at such a low level most of the time; the average developer would not be capable of matching a compiler's performance when developing a full-blown application in assembly, although there are some exceptions to the rule. This renderer was fully developed in C++. This language was selected because it is object oriented, which matches the design, it is easier to port, it has now reached a certain level of maturity, and there are many good compilers for it that produce optimized code. Some may argue that standard C can yield better performance, but it is my opinion that, if care is taken when implementing software in C++, this is generally not the case. The advantages that C++ offers in terms of productivity greatly outweigh its disadvantages.

 


5.2 OpenGL Issues & Getting the most out of the GPU

    

To harness the power of the GPU one must be able to interface with it. This functionality is provided by an API, and there are really only two to choose from: OpenGL and DirectX. The distinction between them is gradually fading away; DirectX has finally reached a level of maturity that OpenGL obtained a while ago. The choice was simple; once again portability comes to mind, and OpenGL has much wider support across a variety of platforms. However, with the very rapid evolution of GPUs the OpenGL standard has not had time to catch up. This has led to a multitude of different OpenGL extensions, each supporting different GPUs, and unfortunately using these extensions has become unavoidable. There is hope that the new OpenGL 2 standard will unify most of these conflicts. To achieve good performance one must also be very careful in choosing which OpenGL functions to use when drawing triangles. OpenGL supports a whole variety of ways of drawing triangles, including:

 

  • Immediate Mode
  • Display Lists
  • Vertex Buffers
  • Compiled Vertex Buffers
  • Vertex Array Range (NVidia Extension)

 

All of these methods, with the exception of compiled vertex buffers, were implemented, with very different results. Immediate mode is the easiest to use and probably the most flexible, however this comes at a very high price: its performance is slow (see Section 7). The reason for this is that too much information has to be sent across the AGP bus, including many OpenGL commands per triangle, at least three and usually four: one for each vertex plus one for the vertex normal. This can be greatly reduced by wrapping these commands into a display list that can then be cached by the GPU. Display lists improve performance to a small extent, at the cost of only being usable for static geometry. Vertex buffers improved performance very substantially, because much less information has to be sent across the AGP bus and shared vertices only have to be sent once. However, the fastest way to draw triangles on NVidia's GPUs is to use one of their extensions, VAR (Vertex Array Range). This allows one to cache all the vertices either on the GPU itself or in AGP memory. Because this project's focus was on rendering large datasets it was not possible to cache everything on the GPU, so AGP memory was used. Once all the vertices were cached, the indices of the visible triangles were sent to the GPU. This allowed the GPU to DMA what it needed directly from AGP memory and yielded substantial performance boosts (see Section 7).
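As a rough, back-of-the-envelope illustration of the bus-traffic argument above, the following C++ sketch compares the bytes that immediate mode and an indexed vertex array would push across the AGP bus per frame. The 24-byte vertex (three floats of position plus three of normal) and 32-bit indices are assumptions chosen for illustration, not figures from this project.

```cpp
#include <cstddef>

// Immediate mode resends position and normal for every triangle corner,
// while an indexed array sends each shared vertex once plus small indices.
// 24 bytes per vertex (3 floats position + 3 floats normal) and 4-byte
// indices are illustrative assumptions.
std::size_t immediateModeBytes(std::size_t numTriangles) {
    return numTriangles * 3 * 24;            // every corner resent each frame
}

std::size_t indexedArrayBytes(std::size_t numVertices, std::size_t numTriangles) {
    return numVertices * 24 + numTriangles * 3 * 4; // unique vertices + indices
}
```

For a typical closed mesh, where each vertex is shared by several triangles, the indexed figure is a fraction of the immediate-mode one, consistent with the performance gap described above.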

    

5.3 Oct-Tree Implementation Details

 

     There were certain details that had to be addressed when implementing the hierarchical Oct-Tree culling. Firstly, during the pre-processing stage when the Oct-Tree is constructed, certain triangles intersect two or more nodes, and a decision had to be made on how to deal with them. They could be split into two or more triangles, each associated with its appropriate node. This would however greatly increase the number of scene triangles, which is not desirable. The alternative is to associate the triangle with more than one node. To avoid drawing the same triangle multiple times, such triangles can be flagged each time they are rendered in a frame, thereby avoiding overdraw. When building the Oct-Tree there is one crucial decision that has to be made: determining when to stop subdividing the tree. One trivial case is to stop subdividing if a node does not contain any triangles. However, it would be very costly to build a tree where each node contained one triangle; this would also defeat the purpose of the tree. Oct-Trees are usually subdivided until each node contains no more than a certain number of triangles or has reached a minimum size. The thresholds used determine how the final tree is balanced. Obtaining a good balance is also crucial: it is a trade-off between VFC and Occlusion Culling efficiency as well as GPU cache hits. This balance is quite tricky to achieve and also very dependent on the type of scene being rendered (see Section 7). Sparse scenes where the objects are quite spread out favour a coarser Oct-Tree, while very dense scenes favour a finer grained Oct-Tree. Then again, if the Oct-Tree is too fine grained this will slow down the VFC and minimize vertex cache hits on the GPU, while obtaining more accurate Occlusion Culling.
Since building the Oct-Tree is an offline process, the best way to balance the tree would be to perform some kind of cost-benefit analysis for various parameters and a particular scene. There are also small implementation details with regard to the hierarchical View Frustum Culling of the Oct-Tree that can yield minor performance advantages. Obtaining a front-to-back ordering of the tree will improve rendering performance, because modern GPUs also have some kind of occlusion culling in their pipeline. Some other small advantages can be gained by not testing the children of a node that falls completely within the View Frustum, and by testing only a subset of a node's children depending on which vertices of the parent node were contained within the Frustum. All of these are small examples of simple implementation details that help squeeze that little bit of extra performance out of the culling algorithms.
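The flagging scheme mentioned above, for triangles shared between nodes, can be sketched with a per-triangle frame stamp. The Triangle record here is a hypothetical minimal stand-in for the real scene data:

```cpp
#include <cstdint>

// A triangle shared between Oct-Tree nodes carries the last frame number
// it was drawn in, so a triangle referenced by two visible nodes is still
// emitted only once per frame.
struct Triangle { std::uint32_t lastDrawnFrame = 0; };

// Returns true if the triangle should be emitted this frame,
// and stamps it so subsequent nodes skip it.
bool emitOnce(Triangle& tri, std::uint32_t currentFrame) {
    if (tri.lastDrawnFrame == currentFrame) return false; // already drawn
    tri.lastDrawnFrame = currentFrame;
    return true;
}
```

Stamping with the frame number avoids having to clear a boolean flag on every triangle at the start of each frame.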

 

5.4 Other details addressed after testing

 

     This part of the report will focus on the little details that had to be addressed during implementation; most of these only became apparent after testing. To test the system a model of the Millennium Dome was used. This model was more or less polygon soup, which presented its own challenges. Firstly, the ordering of the triangles was not consistent, which meant that consistent back-face culling could not be achieved. Given the size of the model (1 million+ triangles) it was not feasible to fix it by hand. Fortunately, [26] developed an algorithm that could automate this process. This could also be used to fix normals that were pointing in opposite directions and therefore producing certain lighting artefacts. Another issue that quickly came to attention was numerical accuracy: careful consideration had to be taken when constructing bounding boxes. The low precision of Z-buffers on consumer-level graphics cards also created tearing artefacts. This could be remedied in various ways. Firstly, the viewing volume should be as small as possible while still containing all the visible objects, which can be achieved by dynamically moving the front and back clipping planes. It could also be achieved by first rendering the front half of the scene and subsequently rendering the second half after having appropriately changed the clipping planes; this would however turn the rendering pipeline into a two-pass pipeline. It was found that moving the front clipping plane as far as possible, combined with moving the back clipping plane as close as possible, used in conjunction with the NVidia VAR extension and back-face culling, eliminated most if not all of the tearing artefacts.
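The first remedy, dynamically tightening the clipping planes, can be sketched as follows. The function fits the near and far planes to the view-space depth extents of the visible bounding boxes, clamped to hard limits; the DepthExtent record and all names are illustrative, not the project's actual interface:

```cpp
#include <algorithm>
#include <vector>

// View-space depth extents of one visible object's bounding box.
struct DepthExtent { float nearZ, farZ; };  // nearZ <= farZ

// Shrink the clipping interval to the smallest one that still contains
// every visible object, clamped to hard limits. A tighter interval
// improves Z-buffer precision and reduces tearing artefacts.
void fitClipPlanes(const std::vector<DepthExtent>& visible,
                   float minNear, float maxFar,
                   float& outNear, float& outFar) {
    outNear = maxFar;   // start inverted, grow towards the data
    outFar  = minNear;
    for (const DepthExtent& e : visible) {
        outNear = std::min(outNear, e.nearZ);
        outFar  = std::max(outFar,  e.farZ);
    }
    outNear = std::max(outNear, minNear); // never closer than the hard limit
    outFar  = std::min(outFar,  maxFar);  // never beyond the hard limit
}
```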

Figure 19 Above: example of tearing artefacts; below: example of triangle ordering artefacts


6 Testing and Evaluation

 

6.1 Testing

    

Various tests were performed to verify the algorithms and their implementations. Two test scenes were utilised for this project. The first was a relatively simple scene of islands and a sailing boat. This was a small scene that could be quickly loaded, and it was used mainly to test all the small changes that were progressively made to the code base. If everything worked with this scene, the large Millennium Dome scene was then used. This scene was stored in a large file that would take a few minutes to load and pre-process. Some of the testing was performed with visual cues; one example of this was that generated AABBs and Oct-Trees would be displayed on the screen. Other tests involved rendering various camera positions with different algorithms and comparing their results. Various debugging tools were also employed when necessary, as well as output of various data to the console during the rendering loop. Because this project was of a prototyping and research nature it was not tested as rigorously as a commercial product, but the necessary care was taken to ensure that the results are accurate.

 

Figure 20 Simple Terrain Test Scene

 

 

 

 

 

Figure 21 Screenshot of the development environment displaying the rendering window containing AABBs and the console window indicating the number of objects passing the VFC

 

Figure 22 Screen shot showing the Oct-Tree partitions

 

 

6.2 Evaluation

 

     Evaluation is absolutely crucial in order to optimize any type of algorithm, and the techniques used to evaluate have to be relevant to the type of problem. Benchmarks therefore have to be carried out on the various algorithms; these benchmarks involve consistently performing timing analysis on the same scene with the same set of camera locations. The timing functions used also have to be very accurate: sub-millisecond accuracy is needed. With the correct evaluation techniques, such as profiling, one can pinpoint which algorithms are the limiting factor.

 

6.2.1 Windows Timing Functions

 

     The appropriate timing functions have to be used. These have to be very accurate, have low latency and must not require too many system resources. Various timing functions were tested so that the most appropriate one could be selected; when timing, it is very important to know how much time the timing function itself takes. The following are the results of the timing function benchmarks. Please note that all these timing functions were benchmarked consistently. The output below measures the performance of various Windows timing functions on the PC platform over different frequencies of use. For each method it lists the number of times the function was called, the total time taken for all the iterations, and the average time each call takes.

 

QueryPerformanceFrequency() freq  = 0  1193182

method 0:

  QueryPerfCntr..()  100 times

  tot:   0 760

  avg time:               6.36952e-006

method 0:

  QueryPerfCntr..()  500 times

  tot:   0 3842

  avg time:               6.43992e-006

method 0:

  QueryPerfCntr..()  1000 times

  tot:   0 11492

  avg time:                9.63139e-006

method 0:

  QueryPerfCntr..()  10000 times

  tot:   0 98118

  avg time:               8.22322e-006

method 1:

  GetTickCount()  100 times

  tot:   0 10

  avg time:               8.38095e-008

method 1:

  GetTickCount()  500 times

  tot:   0 20

  avg time:               3.35238e-008

method 1:

  GetTickCount()  1000 times

  tot:   0 72

  avg time:               6.03428e-008

method 1:

  GetTickCount()  10000 times

  tot:   0 233

  avg time:               1.95276e-008

 

method 2:

  TimeGetTime()  100 times

  tot:   0 30

  avg time:               2.51429e-007

method 2:

  TimeGetTime()  500 times

  tot:   0 111

  avg time:               1.86057e-007

method 2:

  TimeGetTime()  1000 times

  tot:   0 214

  avg time:               1.79352e-007

method 2:

  TimeGetTime()  10000 times

  tot:   0 2634

  avg time:               2.20754e-007

 

method 3:

  Pentium internal high-freq cntr()  100 times

  tot:   0 11

  avg time:               9.21905e-008

method 3:

  Pentium internal high-freq cntr()  500 times

  tot:   0 20

  avg time:               3.35238e-008

method 3:

  Pentium internal high-freq cntr()  1000 times

  tot:   0 38

  avg time:               3.18476e-008

method 3:

  Pentium internal high-freq cntr()  10000 times

  tot:   0 320

  avg time:               2.6819e-008
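The measurement loop behind output like the above can be sketched portably. This minimal version uses std::chrono::steady_clock as a stand-in for the Win32 counters benchmarked here (an assumption, not the original implementation): it calls the clock N times and derives the average cost of a single call.

```cpp
#include <chrono>

// Call the clock 'iterations' times, record the total elapsed time,
// and derive the average cost of one call in seconds.
double averageTimerCostSeconds(int iterations) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        (void)clock::now();                     // the call being measured
    std::chrono::duration<double> total = clock::now() - start;
    return total.count() / iterations;
}
```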

 


6.2.2 Evaluating the Rendering Pipeline

 

     Again, various techniques were used to evaluate the rendering pipeline. Evaluating the pipeline as a whole was achieved with camera paths that were recorded and played back. During playback the exact time spent on each frame was recorded and written to a file. This data was used to tweak certain parameters such as the Oct-Tree balancing [see Section 7]. Timing the frame rate only gives a certain idea of the performance levels: it allows the rendering pipeline to be evaluated as a combination of algorithms, but it does not give a good indication of each individual algorithm's performance, so other techniques had to be employed for this. The camera paths were also carefully created. Different algorithms work more or less well in a variety of situations, so different camera paths were created to explore the strengths and weaknesses of each algorithm or a particular rendering pipeline. The test model of the Millennium Dome had the advantage of containing a whole variety of different situations that could challenge the pipeline. It contained dense as well as sparse separation of objects, and small as well as very large objects, from the thin wire structures used to hold up the roof to the large towers used to hold the lighting equipment.

 

 

 

 

 

 

 

 


6.2.3 Finding Bottlenecks

 

     The key to refining the rendering pipeline is to find bottlenecks: the specific stages that limit the overall pipeline. When discussing rendering pipeline bottlenecks, the terms fill rate limited, CPU limited and geometry limited are usually used.

 

Fill rate limited: This can be thought of as the pixel rate. A GPU can write a maximum number of pixels to its buffers, and various factors influence this, such as the GPU's memory architecture. This limitation occurs at the GPU. Detecting whether a rendering pipeline is fill rate limited is very simple: increase the display resolution and monitor the frame rate. If the rate drops at higher resolutions then the pipeline is fill rate limited. The causes are rasterisation and resolution; for example, textures use up fill rate, and the higher the resolution of the texture the more fill rate is used. Image based occlusion culling also utilises fill rate, because more objects, such as the Oct-Tree nodes, need to be rasterised in order to test them for occlusion. Fill rate limitation usually occurs towards the end of the pipeline.

 

CPU limited: This is a very general term that refers to any bottleneck occurring at the CPU stage of the rendering pipeline. One example would be using a conservative View Frustum Culling algorithm on AABBs with a scene containing thousands of objects: the time taken by the CPU to perform the culling stalls the rest of the pipeline. Detecting CPU limitation is usually straightforward; profiling the application can pinpoint which part of the rendering pipeline calculated by the CPU is using the most resources. CPU limitation usually occurs at the beginning of the pipeline.

 

Geometry limited: This refers to the maximum amount of geometry, or triangles, that can be sent down a rendering pipeline for a given frame rate. The causes of this limitation are much more difficult to pinpoint: they can lie with the CPU, the GPU or the bus that links them. Because of its nature, this type of bottleneck can occur more or less anywhere in the rendering pipeline.

 

During every iteration of development the appropriate tools were employed to find the bottlenecks and remove them. There are various tools on the market that can help this process. One example is VTune, a tool designed by Intel that specifically analyses code for their range of CPUs. It produces information indicating which portions of code the CPU spends most of its time executing, and it comes with a very specialized compiler that optimizes code to maximise its performance on the Pentium processor. AMD also has an equivalent tool. One can use such tools to help speed up the process of finding bottlenecks. An alternative is to time very specific portions of the rendering pipeline, which is what was done for this project. By doing so, one can determine the exact percentage of the frame rendering time that is spent at each stage of the pipeline. This can be a very powerful tool in establishing where to focus attention. For example, the second rendering pipeline [see Section 4.2] used conservative view frustum culling on bounding boxes; with the use of specific timing one could clearly see that a disproportionate amount of the pipeline's time was spent at this stage. With this in mind, the next pipeline concentrated on optimising this culling procedure with the use of spatial subdivision. All of these tools were used and reused with each generation of the rendering pipeline. They were also used to optimize certain parameters such as the Oct-Tree balancing. The following section of this report will present in much greater detail some of the results that were achieved, all of which were obtained with such tools.
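The per-stage timing described above can be sketched as a small accumulator that reports each stage as a percentage of the whole frame; the class, stage names and seconds-based unit are illustrative, not the project's actual profiler.

```cpp
#include <map>
#include <string>

// Accumulates time spent in each named pipeline stage over a frame
// and reports each stage as a percentage of the frame total.
class StageProfiler {
    std::map<std::string, double> stageTime_;
    double frameTotal_ = 0.0;
public:
    void record(const std::string& stage, double seconds) {
        stageTime_[stage] += seconds;
        frameTotal_ += seconds;
    }
    double percentage(const std::string& stage) const {
        auto it = stageTime_.find(stage);
        if (it == stageTime_.end() || frameTotal_ <= 0.0) return 0.0;
        return 100.0 * it->second / frameTotal_;
    }
};
```

A disproportionate percentage for one stage, such as view frustum culling, is exactly the signal that directed the optimisation effort between pipeline generations.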

 

 

 

 

 

 

 

 


7 Results

    

     This section of the report will present some of the benchmark results that were obtained when exploring different algorithms and their implementations, as well as some of the effects different parameters had on performance. Please note that during the development cycle [see Section 3] not only the algorithms changed but also the implementation: different OpenGL routines were used and various GPU-specific extensions were utilized. These extensions added substantial performance to the application. Before taking the final benchmarks, some of the previous rendering pipelines were re-implemented using some of the more advanced extensions so that they could be compared more objectively. This did change some of the results that were obtained during earlier development; these effects will be described in more detail where appropriate. This section will explore the various pipelines, their performance and their properties. Please note that camera 2 represents a walkthrough while camera 3 is a worst case: camera 3 exploits the rendering pipeline's weaknesses. Camera 1 is a simple path that consists of a 360 degree rotation about the centre of the scene.

 

7.1 Benchmark System

    

     The following is the technical specification of the system these various benchmarks were run on.

CPU

AMD Athlon 650Mhz

GPU

NVidia GeForce3

Driver Revision

Detonator 28.32

RAM

768 MB SDRAM 100Mhz

OS

Windows XP

HD

Ultra Wide SCSI AtlasIV

BUS

AGP 2X

MotherBoard

ASUS K7M revision 2

Monitor Refresh

100 HZ

 

The PC system used to conduct the benchmarking was outdated. It did have a good GPU, but one major bottleneck in this system was the AGP 2x bus, which did not permit the GPU to be fully exploited. Other benchmarks were performed on more recent and up-to-date systems, and the results obtained on these machines were substantially better. Again, this was most probably because these machines had a much faster bus from the CPU to the GPU.

    

7.2 Tuning Pipeline 3-5 by Balancing Oct-Tree

 

     As mentioned previously [see Section 4.2.3] it is very important to balance the Oct-Tree. In this case it was achieved by choosing two thresholds: the first is the maximum recursion depth and the second is the maximum number of triangles a leaf node can contain. While constructing the Oct-Tree, a node is subdivided if it contains more triangles than the maximum threshold and it has not reached the maximum depth. These two thresholds balance the tree. Benchmarks were taken with four different threshold values for three given camera paths.
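The stopping rule just described reduces to a simple predicate; a minimal sketch, with hypothetical parameter names:

```cpp
// A node is split only while it holds more triangles than the threshold
// and has not yet reached the maximum recursion depth. Together these
// two thresholds balance the Oct-Tree.
bool shouldSubdivide(int triangleCount, int depth,
                     int maxTriangles, int maxDepth) {
    return triangleCount > maxTriangles && depth < maxDepth;
}
```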

 

Series 1: Max Depth = 12, Max Triangles = 2000, Pipeline 5

Series 2: Max Depth = 32, Max Triangles = 2000, Pipeline 5

Series 3: Max Depth = 46, Max Triangles = 1000, Pipeline 5

Series 4: Max Depth = 32, Max Triangles = 4000, Pipeline 5

 

 

As one can see from the above graphs, these values can affect the frame rate quite substantially. The best threshold values also vary from scene to scene; therefore one must select the most appropriate values for each particular scene individually. These thresholds set the trade-off between culling optimization and hardware cache hits. Greater depth will improve culling, both for the View Frustum Culling and for the Occlusion Culling (by using up less fill rate). However, greater depth will also decrease the number of cache hits in the GPU when drawing triangles. Both these factors can impact performance greatly, and the Oct-Tree balance controls this trade-off. One should consider both the hardware platform and the type of scene to determine which values are best suited. It would also be possible to implement a rendering application that performs certain pre-processing tests to determine how to balance the Oct-Tree for a given GPU and a given scene. This could be achieved simply by running various sets of benchmarks to find the best performance and balancing the tree appropriately.


7.3 Tuning Pipeline 5 with Temporal Coherence

 

     Another threshold that can have an impact on the frame rate is the temporal coherence value used for the occlusion culling. This value indicates how long a node is considered visible for after it has passed the occlusion test. Please note that if the node is culled by the view frustum this assumption is cancelled.
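A minimal sketch of this rule, using a hypothetical per-node record: a node that passes the occlusion test is assumed visible for the next N frames, and frustum-culling the node cancels the assumption.

```cpp
// Hypothetical minimal node record: the frame up to which the node
// is assumed visible (-1 means no assumption).
struct Node { int visibleUntilFrame = -1; };

// Passing the occlusion test grants visibility for 'coherenceFrames' frames.
void passedOcclusionTest(Node& n, int frame, int coherenceFrames) {
    n.visibleUntilFrame = frame + coherenceFrames;
}

// Frustum culling cancels the temporal coherence assumption.
void culledByFrustum(Node& n) { n.visibleUntilFrame = -1; }

// While assumed visible, the node skips its occlusion query.
bool assumedVisible(const Node& n, int frame) {
    return frame <= n.visibleUntilFrame;
}
```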

Series 1: No temporal coherence Pipeline 4

Series 2: 3 Frames Pipeline 5

Series 3: 5 Frames Pipeline 5

 

 

From these graphs we can see that using temporal coherence does improve rendering performance. One must however be very careful: if the user does not navigate the scene very erratically this value can be relatively large, but if there is a lot of user navigation then temporal coherence may actually slow down the rendering pipeline. This is the trade-off between small and large thresholds for temporal coherence. One possible improvement would be to make this value dynamic, based on the user's navigation: should navigational movement increase, this value would decrease, and conversely, should navigational movement decrease, this value would increase. This would add a certain flexibility to the pipeline.

 


7.4 Rendering Pipeline Comparisons

 

     Finally we can compare the differences between different rendering pipelines.

 

Series 1: View Frustum Culling on Bounding Boxes Pipeline 2

Series 2: Hierarchical View Frustum Culling Pipeline 3

Series 3: Hierarchical View Frustum Culling Occlusion Culling and the use of Temporal Coherence of 5 frames Pipeline 5

 

 

There is a problem with these graphs. They seem to indicate that pipeline 3 performs better than pipeline 5, which should not be the case. The reason for this contradiction is that to perform these benchmarks pipeline 3 was re-implemented; in doing so, asynchronicity was added between the hierarchical View Frustum Culling performed on the CPU and the rest of the stages performed on the GPU [see Section 4.2]. This enabled the GPU's pipe to be kept fuller than in pipeline 5, so the GPU was busy most of the time and its resources were maximized. This produced superior results to the pipeline 5 implementation, where unfortunately the GPU is stalled waiting for the occlusion queries [see Section 4.2.4] to return. However, during the development of pipeline 5 the initial implementation of pipeline 3 did not contain this asynchronicity, and its performance was significantly inferior to pipeline 5 most of the time. These results suggest that a new occlusion culling algorithm should be designed, or that the current algorithm should be re-implemented to take advantage of asynchronicity between the CPU and the GPU [see Section 8]. This would probably yield much better performance than the current implementation.

 

 

 

 


8 Conclusion and Further Work

 

     This report has examined various techniques to speed up the rendering of large and complex environments. It has emphasised some of the more subtle parameter tuning that is required to get the most out of a particular algorithm, and the importance of implementation has also been covered: what is the best way to get the most out of the target platform? Optimising a rendering pipeline to gain substantial speed improvements is not a black art; this report has shown that with careful testing, benchmarking and experimentation it can be achieved. The end pipeline is by no means optimal, and there is still a lot of potential to improve performance further. One great way to achieve substantial gains is to keep the GPU pipeline full of data. This would involve introducing asynchronicity between the CPU and the GPU [see Section 7.4]; one example would be to start sending culled triangles to the GPU before the CPU has finished [see Section 7.4]. This would effectively overlap some of the rendering pipeline stages and keep the GPU busy for more of the rendering time. The same principle could also be applied to the occlusion stages of the pipeline. This kind of technique should yield some very substantial improvements in performance. Another way to further optimize the rendering pipeline would be to use k-d trees as a replacement for Oct-Trees. The k-d tree has a much more optimal bounding structure than the Oct-Tree. This would introduce certain questions that would have to be addressed, such as how to balance the k-d tree, but it would also improve the View Frustum Culling as well as the Occlusion Culling. The Occlusion Culling algorithm introduced in this report is far from optimal, and looking at alternatives would provide greater insight. As the report has shown, in certain circumstances the Occlusion Culling can reduce performance.
Developing a rendering pipeline that could dynamically modify itself depending on the rendering circumstances could potentially achieve greater overall performance while overcoming this Occlusion Culling weakness. For example, such a pipeline would be able to disable Occlusion Culling if there were no good occluders. Another technique that could help speed up the pipeline would be to build an adjacency graph with the spatial subdivision data structure. This would enable the data structure to be traversed in a specified order, which could be used to speed up the occlusion algorithm as well as maximise performance on the GPU, as GPUs perform better when they are given triangles in front-to-back order. These are just some of the ways to further improve on the work performed for this project. It would also have been beneficial to fully explore point-based impostors. This would have improved performance and possibly removed some aliasing artefacts; point-based impostor algorithms would be ideal for this type of project.

 

 

Figure 23 Screen shot of Millennium Dome

Figure 24 Screen shot of Millennium Dome

Figure 25 Screen shot of Millennium Dome

 

Figure 26 Screen shot of Millennium Dome

Figure 27 Screen shot of Millennium Dome

Figure 28 Screen shot of Millennium Dome


References

 

[1] William V.Baxter III, Avneesh Sud, Naga K.Govindaraju, Dinesh Manocha, “GigaWalk: Interactive Walkthrough of Complex Environments”, UNC-CH Technical Report TR02-013

[2] D.Aliaga, J.Cohen, A.Wilson. H.Zang, C.Erikson, K.Hoff, T.Hudson, W.Stuerzlinger, E.Baker, R.Bastos, M.Whitton, F.Brooks, D.Manocha, “A Framework for the Real-Time Walkthrough of Massive Models”, UNC-CH Technical Report TR# 98-013

[3] Thomas A.Funkhouser and Carlo H.Sequin, “Adaptive Display Algorithm for Interactive Frame Rates During Visualization of Complex Visual Environments”, University of California Berkeley

[4] H.Zang, D.Manocha, T.Hudson, K E.Hoff III, “Visibility Culling using Hierarchical Occlusion Maps”, University of North Carolina, Chapel Hill

[5] Gernot Schaufler, “Per-Object Image Warping with Layered Impostors”, EUROGRAPHICS Workshop ‘99

[6] Marc Stamminger, George Drettakis, “Interactive Sampling and Rendering for Complex and Procedural Geometry”

[7] Michael Wimmer, Peter Wonka, Francois Sillion, “Point-Based Impostors for Real-Time Visualization”

[8] H.Hoppe, “Progressive Meshes”, Computer Graphics SIGGRAPH ’96 Proceedings

[9] Xavier Decoret, Gernot Schaufler, Francois Sillion, Julie Dorsey, “Multi-Layered impostors for accelerated rendering”, EUROGRAPHICS ‘99

[10] Scott MacKenzie and Colin Ware, “Lag as a Determinant of Human Performance in Interactive Systems”, Proceedings of the ACM Conference on Human Factors in Computing Systems INTERCHI ’93, 488-493

[11] Seth J.Teller, Carlo H.Sequin, “Visibility Preprocessing For Interactive Walkthroughs”, ACM Computer Graphics, Volume 25, Number 4, July 1991

[12] N.Greene, M.Kass and G.Miller, “Hierarchical Z-buffer Visibility”, Proc of ACM Siggraph, pages 231-238, 1993

[13] J.Airey, J.Rohlf, and F.Brooks, “Towards image realism with interactive update rates in complex virtual building environments”, Symposium on Interactive 3D Graphics, pages 41-50, 1990

[14] D.Blythe and T.McReynolds, “Programming with OpenGL: Advanced course”, SIGGRAPH ’96 Course Notes, 1996

[15] S R.Ellis, “Nature and Origins of Virtual Environments: A Bibliography Essay”, Computing Systems in Engineering Vol. 2, No.4, pages 321-347, 1991

[16] Paeth, “Graphics Gems 5”, AP Academic Press

[17] Weiss, “Data Structures & Algorithm Analysis”, Addison Wesley

[18] Press, Teukolsky, Vetterling, Flannery, “Numerical Recipes in C”, Cambridge

[19] Foley, van Dam, Feiner, Hughes, “Computer Graphics Principles and Practice”, Addison Wesley

[20] Angel, “Interactive Computer Graphics with OpenGL”, Addison Wesley

[21] Woo, Neider, Davis, “OpenGL Programming Guide”, Addison Wesley

[22] Alan Watt, “3D Computer Graphics 3rd Edition”, Addison Wesley

[23] Mel Slater, Anthony Steed, Yiorgos Chrysanthou, “Computer Graphics and Virtual Environments”, Addison Wesley

[24] Alan Watt, Mark Watt, “Advanced Animation and Rendering Techniques: Theory and Practice”, Addison Wesley

[25] Tomas Akenine-Moller, Eric Haines, “Real-Time Rendering”, A.K. Peters Ltd

[26] J. Oliveira, A.Steed, “Determining Orientation of Laser Scanned Surfaces”, SIACG 2002, 2-5 Julho 2002

[27] Florent Duguet and George Drettakis, “Robust Epsilon Visibility”, Proceedings of ACM SIGGRAPH 2002

[28] Seth Jared Teller, “Visibility Computations in Densely Occluded Polyhedral Environments”, PhD Thesis, University of California at Berkeley

[29] Szymon Rusinkiewicz, Marc Levoy, “QSplat: A Multiresolution Point Rendering System for Large Meshes”, SIGGRAPH 2000

[30] D.Cohen-Or, Y.Chrysanthou, and C.Silva, “A survey of visibility for walkthrough applications.” SIGGRAPH Course Notes #30, 2001

[31] S.Coorg and S.Teller, “Real-time occlusion culling for models with large occluders.” Proc of ACM Symposium on Interactive 3D Graphics, 1997

[32] D.Bartz, M.Meissner and T.Huttner, “OpenGL assisted occlusion culling for large polygonal models”, Computers &amp; Graphics, 23(3):667-679, 1999

[33] C.Andujar, C.Saona-Vazquez, I.Navazo and P.Brunet, “Integrating occlusion culling and level of detail through hardly-visible sets”, Proceedings of Eurographics, 2000

[34] F.Durand, G.Drettakis, J.Thollot and C.Puech, “Conservative visibility preprocessing using extended projections”, Proc. Of ACM SIGGRAPH, pages 239-248, 2000

[35] C.Erikson and D.Manocha, “Gaps: General and Automatic Polygon Simplification”, Proc. Of ACM Symposium on Interactive 3D Graphics, 1999

[36] C.Erikson, D.Manocha and W.Baxter, “HLODS for fast display of large static and dynamic environments”, Proc of ACM Symposium on Interactive 3D Graphics, 2001

[37] J.El-Sana, N.Sokolovsky, and C.Silva, “Integrating occlusion culling with view dependent rendering”, Proc. Of IEEE Visualization, 2001

[38] T.A Funkhouser, D.Khorramabadi, C.H.Sequin, and S.Teller, “The ucb system for interactive visualization of large architectural modes”, Presence, 5(1):13-44, 1996

[39] B.Garlick, D.Baum, and J.Winget, “Interactive Viewing of Large Geometric Databases Using Multiprocessor Graphics Workstations”, SIGGRAPH ’90 Course Notes (Parallel Algorithms and Architectures for 3D Image Generation), Volume 28, 1990

[40] J.Klosowski and C.Silva, “The prioritized-layered projection algorithm for visible set estimation”, IEEE Trans. on Visualization and Computer Graphics, 6(2):108-123, 2000

[41] I.Wald, P.Slusallek, and C.Benthin, “Interactive distributed ray-tracing of highly complex models”, Rendering Techniques, p274-285, 2001

[42] P.Wonka, M.Wimmer, and D.Schmalstieg, “Visibility preprocessing with occluder fusion for urban walkthroughs”, Rendering Techniques, p71-82, 2000

[43] P.Wonka, M.Wimmer, and F.Sillion, “Instant visibility”, Proc. of EUROGRAPHICS, 2001

[44] J.Oliveira, B.Buxton, “Non-linear simplification of scanned models”, Numérisation 3D - SCANNING 2002, Paris, France (invited paper)

[45] K.M.Stanney, R.R.Mourant, and R.s.Kennedy, “Human Factors Issues in Virtual Environments: A Review of the Literature”

[46] Alan Watt, Fabio Policarpo, “3D Games Real-time Rendering and Software Technology”, Addison Wesley

[47] Fuchs, H., Kedem, Z.M., Naylor, B., “On visible surface generation by a priori tree structures”, Proc. SIGGRAPH ’80, Comput. Graph., 14 (1980), 124-133

[48] Schroeder, Zarge, and Lorensen, “Decimation of triangle meshes”, Computer Graphics (SIGGRAPH ’92 Proceedings), 26, 2 (1992), 65-70

Appendix

 

User Requirements

 

CPU      Intel Pentium 3 or higher

GPU      NVidia GeForce 3 or higher

RAM      512 MB

HD         100 MB

Drivers  OpenGL 1.2 Drivers

OS         Windows 2000, XP

 

User Manual

 

Navigation: Hold the left mouse button and move the mouse to change the camera orientation. The keys A, W, S and D move the camera left, forwards, backwards and right respectively.

 

Keys

Z:      Start/Stop camera path recording

X:      Play/Stop camera path

B, N, M:    Play camera paths 1, 2, 3 respectively

O:      Toggle Oct-Tree drawing

P:      Toggle scene drawing

ESC:          Close application

 

Code

MassiveMain.cpp

 

#include "MassiveMain.h"

 

 

 

#ifdef _WIN32

#include <windows.h>        // Header File For Windows

#endif

 

#include <GL\gl.h>               // Header File For The OpenGL32 Library

#include <GL\glu.h>                   // Header File For The GLu32 Library

#include <GL\glaux.h>         // Header File For The Glaux Library

#include <GL\glut.h>

 

 

 

#include <iostream.h>

#include <ostream.h>

#include <istream.h>

#include <fstream.h>

 

#include <string.h>

 

#include "WGLEXT.h"

#include "glext.h"

#include "SceneGraph.h"

#include "Timer.h"

#include "JDNCamera.h"

#include "Octree.h"

 

//

 

 

PFNGLGENOCCLUSIONQUERIESNVPROC    glGenOcclusionQueriesNV    = NULL;

PFNGLDELETEOCCLUSIONQUERIESNVPROC glDeleteOcclusionQueriesNV = NULL;

PFNGLISOCCLUSIONQUERYNVPROC       glIsOcclusionQueryNV       = NULL;

PFNGLBEGINOCCLUSIONQUERYNVPROC    glBeginOcclusionQueryNV    = NULL;

PFNGLENDOCCLUSIONQUERYNVPROC      glEndOcclusionQueryNV      = NULL;

PFNGLGETOCCLUSIONQUERYIVNVPROC    glGetOcclusionQueryivNV    = NULL;

PFNGLGETOCCLUSIONQUERYUIVNVPROC   glGetOcclusionQueryuivNV   = NULL;

 

PFNGLGENFENCESNVPROC              glGenFencesNV              = NULL;

PFNGLDELETEFENCESNVPROC           glDeleteFencesNV           = NULL;

PFNGLSETFENCENVPROC               glSetFenceNV               = NULL;

PFNGLTESTFENCENVPROC              glTestFenceNV              = NULL;

PFNGLFINISHFENCENVPROC            glFinishFenceNV            = NULL;

PFNGLISFENCENVPROC                glIsFenceNV                = NULL;

PFNGLGETFENCEIVNVPROC             glGetFenceivNV             = NULL;

 

PFNGLVERTEXARRAYRANGENVPROC       glVertexArrayRangeNV       = NULL;

PFNGLFLUSHVERTEXARRAYRANGENVPROC  glFlushVertexArrayRangeNV  = NULL;

 

PFNWGLALLOCATEMEMORYNVPROC        wglAllocateMemoryNV        = NULL;

 

 

ofstream    bfile;

ofstream    ofile;

ifstream    ifile;

 

int g_DisplayListCtr;

int g_NumNodes;

int g_NumVisibleNodes;

 

Octree**    pptrNodes;

 

SceneGraph* pScene;

JDNCamera*  pCamera;

bool        bLighting;

int         iWidth, iHeight;

TimingInfo  timings;

DWORD       tmp_tick, last_tick;

int         lastX, lastY;

LARGE_INTEGER freq;

char        buffer[50];

Vector4f*   pBBox;

int         totalVertices;

Octree*     pOctree;

bool        bDrawOctree;

bool        bDrawScene;

GLuint*     ptrOcclusionQueries;

GLuint      pixelCount;

 

bool                         isRecording;

bool                         isPlayingback;

 

void setupExt(void){

 

      // Occlusion query extensions

      glGenOcclusionQueriesNV    = (PFNGLGENOCCLUSIONQUERIESNVPROC)    wglGetProcAddress("glGenOcclusionQueriesNV");

      glDeleteOcclusionQueriesNV = (PFNGLDELETEOCCLUSIONQUERIESNVPROC) wglGetProcAddress("glDeleteOcclusionQueriesNV");

      glIsOcclusionQueryNV       = (PFNGLISOCCLUSIONQUERYNVPROC)       wglGetProcAddress("glIsOcclusionQueryNV");

      glBeginOcclusionQueryNV    = (PFNGLBEGINOCCLUSIONQUERYNVPROC)    wglGetProcAddress("glBeginOcclusionQueryNV");

      glEndOcclusionQueryNV      = (PFNGLENDOCCLUSIONQUERYNVPROC)      wglGetProcAddress("glEndOcclusionQueryNV");

      glGetOcclusionQueryivNV    = (PFNGLGETOCCLUSIONQUERYIVNVPROC)    wglGetProcAddress("glGetOcclusionQueryivNV");

      glGetOcclusionQueryuivNV   = (PFNGLGETOCCLUSIONQUERYUIVNVPROC)   wglGetProcAddress("glGetOcclusionQueryuivNV");

 

      // Fence extensions
      glGenFencesNV    = (PFNGLGENFENCESNVPROC)    wglGetProcAddress("glGenFencesNV");

      glDeleteFencesNV = (PFNGLDELETEFENCESNVPROC) wglGetProcAddress("glDeleteFencesNV");

      glSetFenceNV     = (PFNGLSETFENCENVPROC)     wglGetProcAddress("glSetFenceNV");

      glTestFenceNV    = (PFNGLTESTFENCENVPROC)    wglGetProcAddress("glTestFenceNV");

      glFinishFenceNV  = (PFNGLFINISHFENCENVPROC)  wglGetProcAddress("glFinishFenceNV");

      glIsFenceNV      = (PFNGLISFENCENVPROC)      wglGetProcAddress("glIsFenceNV");

      glGetFenceivNV   = (PFNGLGETFENCEIVNVPROC)   wglGetProcAddress("glGetFenceivNV");

      // VAR extensions
      glVertexArrayRangeNV      = (PFNGLVERTEXARRAYRANGENVPROC)      wglGetProcAddress("glVertexArrayRangeNV");

      glFlushVertexArrayRangeNV = (PFNGLFLUSHVERTEXARRAYRANGENVPROC) wglGetProcAddress("glFlushVertexArrayRangeNV");

 

      wglAllocateMemoryNV = (PFNWGLALLOCATEMEMORYNVPROC) wglGetProcAddress("wglAllocateMemoryNV");

 

}

 

 

static void

output( float x, float y, char *string )

{

      int len, i;

 

      glRasterPos2f(x, y);

      len = (int) strlen(string);

      for (i = 0; i < len; i++)

      {

             glutBitmapCharacter(GLUT_BITMAP_8_BY_13, string[i]);

      }

}

 

void recordfunc(void){

 

      if ( isRecording ){

             //ofile << '\0';

             ofile.close();

             isRecording = false;

             cout << "Stop recording" << endl;

             return;

      }

 

      if ( isPlayingback ){

             cout << "Sorry cannot record camera path while playing back" << endl;

             return;

      }

 

      ofile.open("CameraPath.txt");

      if ( ofile.is_open() ){

             isRecording = true;

             cout << "Start recording" << endl;

      }

      else{

             cout << "Sorry could not create camera path file" << endl;

             isRecording = false;

      }

      return;

}

 

 

void playbackfunc(int cam){

 

      if ( isPlayingback ){

             isPlayingback = false;

             ifile.close();

             bfile.close();

             cout << "Stop playback" << endl;

             return;

      }

 

      if ( isRecording ){

             cout << "Sorry, cannot play back camera path while recording" << endl;

             return;

      }

 

      switch ( cam ){

             case 1:

                   ifile.open("CameraPath1.txt");

                   bfile.open("Benchmarks.txt");

                   break;

             case 2:

                   ifile.open("CameraPath2.txt");

                   bfile.open("Benchmarks.txt");

                   break;

             case 3:

                   ifile.open("CameraPath3.txt");

                   bfile.open("Benchmarks.txt");

                   break;

      }

 

      if ( ifile.is_open() && bfile.is_open()){

             isPlayingback = true;

             cout << "Start playing back " << endl;

      }

      else{

             cout << "Could not open camera path file" << endl;

             isPlayingback = false;

      }

      return;

 

}

 

 

void drawGL(void)

{

     

  glClearColor(0.0, 0.0, 0.0, 0.0);

      glClear(GL_COLOR_BUFFER_BIT );//| GL_DEPTH_BUFFER_BIT);

 

      glMatrixMode(GL_MODELVIEW);

      glLoadIdentity();

 

      int numObjects=0;

      pCamera->update();

 

      if ( isPlayingback ){

             if ( pCamera->read(ifile) ){

            

             }else{

                   ifile.close();

                   bfile.close();

                   isPlayingback = false;

                   cout << "Stop playback" << endl;

             }    

      }

 

      pCamera->look();

      pCamera->m_pFrustum->calculateFrustum();

     

      if ( isRecording ){

             pCamera->write(ofile);

      }

     

      //if ( bDrawScene ){

      //    pOctree->NaiveDraw(*(pCamera->m_pFrustum));

      //}

 

     

      g_NumVisibleNodes = 0;

 

     

      pOctree->CullOctree(pptrNodes,*(pCamera->m_pFrustum));

     

 

      //QueryPerformanceCounter( &( timings.m_start_clk ) );

      glGenOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries);

     

      glDisable(GL_LIGHTING);

      glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

  glDepthMask(GL_FALSE);

 

      if ( bDrawScene ){

            

             for ( int i=0 ; i<g_NumVisibleNodes ; i++ ){

                   glBeginOcclusionQueryNV(ptrOcclusionQueries[i]);

            // render bounding box for object i

                   if ( pptrNodes[i]->m_Life == 0 ) pptrNodes[i]->DrawAABB();

                   glEndOcclusionQueryNV();

             }

 

 

             glEnable(GL_LIGHTING);

             glEnable(GL_LIGHT0);

 

             glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

    glDepthMask(GL_TRUE);

             glClear(GL_DEPTH_BUFFER_BIT);

 

             for ( int j=0 ; j<g_NumVisibleNodes ; j++ ){

                  

                   glGetOcclusionQueryuivNV(ptrOcclusionQueries[j], GL_PIXEL_COUNT_NV,

                                     &pixelCount);

      if (pixelCount > 50 || pptrNodes[j]->m_Life > 0) {

                // render object j

                         pptrNodes[j]->m_Life--;

                         pptrNodes[j]->DrawNode();

                   }

 

                  

             }

      }

 

      //glFlushVertexArrayRangeNV();

      glDeleteOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries);

      //QueryPerformanceCounter( &( timings.m_end_clk ) ); 

     

      /*   

      glPushMatrix();

 

      for ( int i=0 ; i<pScene->numMaterials ; i++ ){

            

             JDNMaterial mat = pScene->pMaterials[i];

             if ( mat.numGeoObjects > 0 ){

                  

                  

                  

                  

                   GLfloat no_mat[] = { 0.0,  0.0,  0.0,  1.0};

 

                   GLfloat mat_a[4];

                   GLfloat mat_d[4];

                   GLfloat mat_s[4];

                   GLfloat mat_shine[1];

 

                   mat_a[0] = (GLfloat)mat.ambient.x;

                   mat_a[1] = (GLfloat)mat.ambient.y;

                   mat_a[2] = (GLfloat)mat.ambient.z;

                   mat_a[3] = 1.0;

 

                   mat_d[0] = (GLfloat)mat.diffuse.x;

                   mat_d[1] = (GLfloat)mat.diffuse.y;

                   mat_d[2] = (GLfloat)mat.diffuse.z;

                   mat_d[3] = 1.0;

 

                   mat_s[0] = (GLfloat)mat.specular.x;

                   mat_s[1] = (GLfloat)mat.specular.y;

                   mat_s[2] = (GLfloat)mat.specular.z;

                   mat_s[3] = 1.0;

 

                   mat_shine[1] = 0.0;

 

                   glMaterialfv( GL_FRONT_AND_BACK, GL_AMBIENT, mat_a );

                   glMaterialfv( GL_FRONT_AND_BACK,  GL_DIFFUSE, mat_d );

                   glMaterialfv( GL_FRONT_AND_BACK, GL_SPECULAR, mat_s );

                   glMaterialfv( GL_FRONT_AND_BACK,  GL_SHININESS, mat_shine );

                   glMaterialfv( GL_FRONT_AND_BACK,  GL_EMISSION, no_mat);

 

                   if ( true ){

                         //glEnable(GL_TEXTURE_2D);

                         for ( int x=0 ; x<mat.numGeoObjects ; x++ ){

                               

                                JDNGeoObject* obj = mat.pGeoObjects[x];

 

                                float cm[16] = { obj->currentTM.m[0][0],obj->currentTM.m[0][1],obj->currentTM.m[0][2],obj->currentTM.m[0][3],

                                obj->currentTM.m[1][0],obj->currentTM.m[1][1],obj->currentTM.m[1][2],obj->currentTM.m[1][3],

                                obj->currentTM.m[2][0],obj->currentTM.m[2][1],obj->currentTM.m[2][2],obj->currentTM.m[2][3],

                                obj->currentTM.m[3][0],obj->currentTM.m[3][1],obj->currentTM.m[3][2],obj->currentTM.m[3][3], };

 

                                pBBox = new Vector4f[8];

                                for ( int q=0 ; q<8 ; q++){

                                      pBBox[q] = obj->currentTM*obj->pBBVertices[q];

                                }

                               

                               

                                if ( pCamera->m_pFrustum->isCubeInFrustum(pBBox) ){

                                      numObjects++;             

                                      glPushMatrix();

                               

                                      glMultMatrixf(cm);

 

 

 

                                      // -- Draw BBox --- If selected

                                      /*

                                      if ( obj->drawBB ){

                                            glColor3f(1.0,1.0,1.0);

                                            glBegin(GL_LINES);

                                                   glVertex3f(obj->pBBVertices[0].x,obj->pBBVertices[0].y, obj->pBBVertices[0].z);

                                                   glVertex3f(obj->pBBVertices[1].x,obj->pBBVertices[1].y, obj->pBBVertices[1].z);

                                                   glVertex3f(obj->pBBVertices[1].x,obj->pBBVertices[1].y, obj->pBBVertices[1].z);

                                                   glVertex3f(obj->pBBVertices[2].x,obj->pBBVertices[2].y, obj->pBBVertices[2].z);

                                                   glVertex3f(obj->pBBVertices[2].x,obj->pBBVertices[2].y, obj->pBBVertices[2].z);

                                                   glVertex3f(obj->pBBVertices[3].x,obj->pBBVertices[3].y, obj->pBBVertices[3].z);

                                                   glVertex3f(obj->pBBVertices[3].x,obj->pBBVertices[3].y, obj->pBBVertices[3].z);

                                                   glVertex3f(obj->pBBVertices[0].x,obj->pBBVertices[0].y, obj->pBBVertices[0].z);

                                                   glVertex3f(obj->pBBVertices[0].x,obj->pBBVertices[0].y, obj->pBBVertices[0].z);

                                                   glVertex3f(obj->pBBVertices[4].x,obj->pBBVertices[4].y, obj->pBBVertices[4].z);

                                                   glVertex3f(obj->pBBVertices[4].x,obj->pBBVertices[4].y, obj->pBBVertices[4].z);

                                                   glVertex3f(obj->pBBVertices[5].x,obj->pBBVertices[5].y, obj->pBBVertices[5].z);

                                                   glVertex3f(obj->pBBVertices[5].x,obj->pBBVertices[5].y, obj->pBBVertices[5].z);

                                                   glVertex3f(obj->pBBVertices[6].x,obj->pBBVertices[6].y, obj->pBBVertices[6].z);

                                                   glVertex3f(obj->pBBVertices[6].x,obj->pBBVertices[6].y, obj->pBBVertices[6].z);

                                                   glVertex3f(obj->pBBVertices[7].x,obj->pBBVertices[7].y, obj->pBBVertices[7].z);

                                                   glVertex3f(obj->pBBVertices[7].x,obj->pBBVertices[7].y, obj->pBBVertices[7].z);

                                                   glVertex3f(obj->pBBVertices[4].x,obj->pBBVertices[4].y, obj->pBBVertices[4].z);

                                                   glVertex3f(obj->pBBVertices[1].x,obj->pBBVertices[1].y, obj->pBBVertices[1].z);

                                                   glVertex3f(obj->pBBVertices[5].x,obj->pBBVertices[5].y, obj->pBBVertices[5].z);

                                                   glVertex3f(obj->pBBVertices[2].x,obj->pBBVertices[2].y, obj->pBBVertices[2].z);

                                                   glVertex3f(obj->pBBVertices[6].x,obj->pBBVertices[6].y, obj->pBBVertices[6].z);

                                                   glVertex3f(obj->pBBVertices[3].x,obj->pBBVertices[3].y, obj->pBBVertices[3].z);

                                                   glVertex3f(obj->pBBVertices[7].x,obj->pBBVertices[7].y, obj->pBBVertices[7].z);

                                            glEnd();

                                      }

                               

                                                              

                                      // --------------------------------

                                      // ---- Draw Object ---------------

                                      //glBindTexture(GL_TEXTURE_2D, pTextures[i]);

 

                                      //for ( int j=0 ; j<obj->numFaces ; j++){

 

                                     

                                      glBegin(GL_TRIANGLES);

                                            //glColor3f(mat_d[0],mat_d[1],mat_d[2]);                                     

                                            glNormal3f( obj->pVNormals[obj->pFaces[j*3]].x, obj->pVNormals[obj->pFaces[j*3]].y, obj->pVNormals[obj->pFaces[j*3]].z);

                                            //glTexCoord2f(obj->pTVertices[obj->pTFaces[j*3]].x, obj->pTVertices[obj->pTFaces[j*3]].y);

                                            glVertex3f(obj->pLVertices[obj->pFaces[j*3]].x, obj->pLVertices[obj->pFaces[j*3]].y, obj->pLVertices[obj->pFaces[j*3]].z);

                                     

                                            glNormal3f( obj->pVNormals[obj->pFaces[j*3+1]].x, obj->pVNormals[obj->pFaces[j*3+1]].y, obj->pVNormals[obj->pFaces[j*3+1]].z);

                                            //glTexCoord2f(obj->pTVertices[obj->pTFaces[j*3+1]].x, obj->pTVertices[obj->pTFaces[j*3+1]].y);

                                            glVertex3f(obj->pLVertices[obj->pFaces[j*3+1]].x, obj->pLVertices[obj->pFaces[j*3+1]].y, obj->pLVertices[obj->pFaces[j*3+1]].z);

                                     

                                            glNormal3f( obj->pVNormals[obj->pFaces[j*3+2]].x, obj->pVNormals[obj->pFaces[j*3+2]].y, obj->pVNormals[obj->pFaces[j*3+2]].z);

                                            //glTexCoord2f(obj->pTVertices[obj->pTFaces[j*3+2]].x, obj->pTVertices[obj->pTFaces[j*3+2]].y);

                                            glVertex3f(obj->pLVertices[obj->pFaces[j*3+2]].x, obj->pLVertices[obj->pFaces[j*3+2]].y, obj->pLVertices[obj->pFaces[j*3+2]].z);

                                      glEnd();

                               

                                     

                                            glVertexPointer(3,GL_FLOAT,4*sizeof(GLfloat),obj->pLVertices);

                                            glNormalPointer(GL_FLOAT,4*sizeof(GLfloat),obj->pVNormals );

                        

                        

                                            glDrawElements(GL_TRIANGLES,obj->numFaces*3, GL_UNSIGNED_INT, obj->pFaces);

 

 

                                     

                                      glPopMatrix();

                        

                        

                                }

                                delete [] pBBox;

                        

                         }

                         //glDisable(GL_TEXTURE_2D);

                         //cout << "\nHas Texture";

                   }

             }

      }

 

      glPopMatrix();

     

      */

 

      glDisable(GL_LIGHTING);

 

      if ( bDrawOctree ){

             glColor3f(1.0f,1.0f,1.0f);

             pOctree->DrawDebug();

      }

 

      glLoadIdentity();

      gluLookAt(0.0f,0.0f,-11.0f, 0.0f,0.0f,1.0f, 0.0f,1.0f,0.0f);

      QueryPerformanceCounter( &( timings.m_end_clk ) );

 

      timings.m_interval_sum.HighPart = timings.m_end_clk.HighPart - timings.m_start_clk.HighPart;

      timings.m_interval_sum.LowPart  = timings.m_end_clk.LowPart  - timings.m_start_clk.LowPart;

 

      timings.m_start_clk = timings.m_end_clk;

     

       _gcvt( ( 1.0f/(double)timings.m_interval_sum.LowPart) *(double)freq.LowPart, 7, buffer );

       //_gcvt( (double)totalVertices, 7, buffer );

     

     

      output(0.0f,0.0f,buffer);

     

      if ( isPlayingback ){

             bfile << buffer << endl;

      }

      //cout << "\nNumber of objects drawn = " << numObjects;

     

  glutSwapBuffers();

     

}

 

void idle(void){

 

      glutPostRedisplay();

     

}

 

 

void initGL(void)

{

      g_NumVisibleNodes=0;

      g_NumNodes=0;

      g_DisplayListCtr=0;

      totalVertices = 0;   

      pScene = new SceneGraph();

 

      cout << "Reading Scene Graph file \n" << endl;

  pScene->read("new.JDN");

      cout << "Setting Materials \n" << endl;

  pScene->SetupMaterials();

      cout <<"Checking for textures\n"<< endl;

  pScene->CheckMatForTextures();

     

 

     

     

      cout << "\nScene Statistics\n"<< endl;

      cout << "Num of geometric objects = " << pScene->numGeoObjects << "\n"<< endl;

      cout << "Num of lights            = " << pScene->numLights << "\n"<< endl;

      cout << "Num of materials         = " << pScene->numMaterials << "\n"<< endl;

     

      for ( int i=0 ; i<pScene->numGeoObjects ; i++){

             totalVertices += pScene->pGeoObjects[i].numVertices;

      }

 

      pCamera = new JDNCamera();

      pBBox = new Vector4f[8];

 

      Vector4f centre = pScene->CalculateSceneCentre();

      float width = pScene->CalculateSceneWidth();

 

      cout << "Centre = " << centre.x << " " << centre.y << " " << centre.z << endl;

      cout << "Width = " << width;

      cout <<"\n\n" << endl;

     

     

 

      pOctree = new Octree(pScene,0,centre,width);

      int* sceneIndex = new int[pScene->numGeoObjects];

      for ( int z=0; z<pScene->numGeoObjects ; z++){

             sceneIndex[z]=z;

      };

      pOctree->CreateNode(pScene->numGeoObjects , sceneIndex);

      cout << "Number of Octree Nodes = " << g_NumNodes << endl;

     

      ptrOcclusionQueries = new GLuint[g_NumNodes];

      pptrNodes = new Octree*[g_NumNodes];

      pOctree->CreateAABBDisplayList();

 

     

 

 

      bDrawOctree = true;

      bDrawScene    = true;

 

      pScene->SetupVAR();

 

      glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

      glEnableClientState(GL_VERTEX_ARRAY);

      glEnableClientState(GL_NORMAL_ARRAY);

      glEnableClientState(GL_INDEX_ARRAY);

 

  /* Enable a single OpenGL light. */

  GLfloat light0_ambient[] = { 0.2f,  0.2f,  0.2f,  1.0f};

  GLfloat light0_diffuse[] = { 1.0f,  1.0f,  1.0f,  1.0f};

  GLfloat light0_specular[] = { 1.0f,  1.0f,  1.0f,  1.0f};

  GLfloat light0_position[] = {-120000.0f, 40000.0f, -28000.0f, 1.0f};

  GLfloat light0_direction[] = {-100, -100.0, -100.0};

 

  GLfloat mat_ambient[] = { 0.7f, 0.7f,  0.7f,  1.0f};

  GLfloat mat_diffuse[] = { 1.0f,  0.5f,  0.8f,  1.0f};

  GLfloat mat_specular[] = { 1.0f,  1.0f,  1.0f,  1.0f};

      GLfloat mat_shininess[] = { 50.0f};

      GLfloat no_mat[] = { 0.0f,  0.0f,  0.0f,  1.0f};

 

      glLightfv(GL_LIGHT0, GL_AMBIENT,  light0_ambient);

      glLightfv(GL_LIGHT0, GL_DIFFUSE,  light0_diffuse);

      glLightfv(GL_LIGHT0, GL_SPECULAR,  light0_specular);

      glLightfv(GL_LIGHT0, GL_POSITION,  light0_position);

      //glLightfv(GL_LIGHT0, GL_SPOT_DIRECTION, light0_direction);

 

      glMaterialfv( GL_FRONT, GL_AMBIENT, mat_ambient );

      glMaterialfv( GL_FRONT,  GL_DIFFUSE, no_mat );

      glMaterialfv( GL_FRONT, GL_SPECULAR, mat_specular );

      glMaterialfv( GL_FRONT,  GL_SHININESS, mat_shininess );

      glMaterialfv( GL_FRONT,  GL_EMISSION, no_mat);

     

 

      glEnable(GL_LIGHTING);

      glEnable(GL_LIGHT0);

 

 

  /* Use depth buffering for hidden surface elimination. */

  glEnable(GL_DEPTH_TEST);

      glEnable(GL_NORMALIZE);

      glEnable(GL_CULL_FACE);

 

      glCullFace(GL_BACK);

 

  /* Setup the view of the cube. */

  glMatrixMode(GL_PROJECTION);

  gluPerspective( /* field of view in degree */ 45.0,

    /* aspect ratio */ 1.33,

    /* Z near */ 10.0, /* Z far */ 250000.0);

 

  QueryPerformanceFrequency( & freq );

 

 

     

 

}

 

void reshape(int w, int h){

     

  glMatrixMode(GL_PROJECTION);

  glLoadIdentity();

      gluPerspective( /* field of view in degree */ 45.0,

    /* aspect ratio */ (GLdouble)w/(GLdouble)h,   // cast to avoid integer division

    /* Z near */ 10.0, /* Z far */ 250000.0);

  glMatrixMode(GL_MODELVIEW);

      glViewport(0, 0, w, h);

      iWidth = w; iHeight = h;

}

 

void cleanUp(){

      //if ( pScene ) delete pScene;

      if ( bfile.is_open() ) bfile.close();

      if ( ofile.is_open() ) ofile.close();

      if ( ifile.is_open() ) ifile.close();

}

 

static void

key(unsigned char c, int x, int y)

{

      switch (c){

 

             case 27:

                   cleanUp();

                  

                   exit(0);  /* IRIS GLism, Escape quits. */

                   break;

             case 'w':

                   pCamera->move(1000.5);

                   break;

             case 's':

                   pCamera->move(-1000.5);

                   break;

             case 'a':

                   pCamera->strafe(-1000.5);

                   break;

             case 'd':

                   pCamera->strafe(1000.5);

                   break;

             case 'o':

                   bDrawOctree = !bDrawOctree;

                   break;

             case 'p':

                   bDrawScene = !bDrawScene;

                   break;

             case 'z':

                   recordfunc();

                   break;

             case 'x':

                   playbackfunc(1);

                   break;

             case 'b':

                   playbackfunc(1);

                   break;

             case 'n':

                   playbackfunc(2);

                   break;

             case 'm':

                   playbackfunc(3);

                   break;

      }

  glutPostRedisplay();

}

 

static void leftMotion (int x, int y){

     

      float xRot, yRot;

      yRot = (float)(x-lastX)/500.0f;

      xRot = (float)(y-lastY)/500.0f;

 

      Vector4f axis, vuv, vpn;

      vuv = pCamera->VUV();

      vpn = pCamera->VPN();

      axis = Vector4f::cross(vuv,vpn);

     

      pCamera->rotate(-yRot, Vector4f(0.0f,1.0f,0.0f));

      pCamera->update();

      pCamera->rotate(xRot, axis);

 

      lastX = x;

      lastY = y;

}

 

static void rightMotion (int x, int y){

 

 

 

}

 

static void mouseCallback (int button, int state, int x, int y){

 

  if (state==GLUT_DOWN) {

 

  /* A button is being pressed. Set the correct motion function */

 

    if (button==GLUT_LEFT){

 

                   lastX = x;

                   lastY = y;

                   glutMotionFunc (leftMotion);

 

             }

      //else if (button==GLUT_RIGHT)

 

        // glutMotionFunc (rightMotion);

 

  }

 

}

 

 

 

static GLboolean QueryExtension(char* extName){

 

      GLubyte* ext = (GLubyte* )glGetString(GL_EXTENSIONS);

      char *p = (char *)ext;

      char *end = p + strlen(p);

      while (p < end){

             int n = strcspn(p, " ");

             if ((strlen(extName)==n) && (strncmp(extName,p,n)==0)){

                   return GL_TRUE;

             }

             p += (n+1);

      }

      return GL_FALSE;

}

 

 

 

int main(int argc, char **argv)

{

 

  isRecording = false;

      isPlayingback = false;

 

      glutInit(&argc, argv);

  //glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);

      glutInitDisplayString("rgb double depth=32 ");

  glutInitWindowSize(640,480);

  glutCreateWindow("Jean-Daniel Nahmias Msc VIVE Massive Model Rendering");

     

 

      if ( !QueryExtension("GL_NV_occlusion_query") ){

             cout << "Your Graphics card does not support occlusion extension" << endl;

             return 0;

      }

      if ( !QueryExtension("GL_NV_fence") ){

             cout << "Your Graphics card does not support fence extension" << endl;

             return 0;

      }

      if ( !QueryExtension("GL_NV_vertex_array_range") ){

             cout << "Your Graphics card does not support VAR extension" << endl;

             return 0;

      }

     

      //ofstream bfile = ofstream("b.txt");

      //bfile = ofstream("b.txt");

      //bfile.open("b.txt");

     

     

 

 

 

      setupExt();

  initGL();

 

 

      glutDisplayFunc(drawGL);

  glutReshapeFunc(reshape);

      glutKeyboardFunc(key);

      glutMouseFunc(mouseCallback);

      glutMotionFunc(leftMotion);

      glutIdleFunc(idle);

     

      int ctr=0;
      bool add;
      cout << "Number of Materials = " << pScene->numMaterials << endl;
      // Count unique materials by counting last occurrences: a material is
      // counted iff no equal material appears later in the array. The outer
      // loop must cover every index, including the last.
      for ( int i=0 ; i<pScene->numMaterials ; i++){
             add = true;
             for ( int j=(i+1) ; j<pScene->numMaterials ; j++){
                   if ( pScene->pMaterials[i]==pScene->pMaterials[j] ){
                         add = false;
                         break;
                   }
             }
             if ( add ) ctr++;
      }
      cout << "Number of unique Materials = " << ctr << endl;

 

      /*

  GLubyte* ext = (GLubyte* )glGetString(GL_EXTENSIONS);

      while ( *ext != '\0' ){

             char t = (char)*ext;

             ext++;

             if ( t != ' ' ) cout << t; else cout << endl;

            

      }

      */

     

  glutMainLoop();

      return 0;             /* ANSI C requires main to return int. */

}

 

Octree.h

 

// Octree.h: interface for the Octree class.

//

//////////////////////////////////////////////////////////////////////

 

#ifndef _OCTREE_H

#define _OCTREE_H

 

#include "MassiveMain.h"

 

#ifdef _WIN32

#include <windows.h>        // Header File For Windows

#endif

 

#include <GL/gl.h>               // Header File For The OpenGL32 Library

#include <GL/glu.h>              // Header File For The GLu32 Library

#include <GL/glaux.h>            // Header File For The Glaux Library

#include <GL/glut.h>

#include <iostream.h>

#include "WGLEXT.h"

#include "glext.h"

 

#include "SceneGraph.h"

#include "JDNGeoObject.h"

#include "Vector4.h"

#include "Matrix4.h"

#include "ViewFrustum.h"

 

#define MAXRECURSION 32

#define MAXTRIANGLES 2000

 

extern PFNGLGENOCCLUSIONQUERIESNVPROC                 glGenOcclusionQueriesNV;

extern PFNGLDELETEOCCLUSIONQUERIESNVPROC      glDeleteOcclusionQueriesNV;

extern PFNGLISOCCLUSIONQUERYNVPROC                        glIsOcclusionQueryNV;

extern PFNGLBEGINOCCLUSIONQUERYNVPROC                 glBeginOcclusionQueryNV;

extern PFNGLENDOCCLUSIONQUERYNVPROC                    glEndOcclusionQueryNV;

extern PFNGLGETOCCLUSIONQUERYIVNVPROC                 glGetOcclusionQueryivNV;

extern PFNGLGETOCCLUSIONQUERYUIVNVPROC        glGetOcclusionQueryuivNV;

 

extern PFNGLGENFENCESNVPROC                                                   glGenFencesNV;

extern PFNGLDELETEFENCESNVPROC                                 glDeleteFencesNV;

extern PFNGLSETFENCENVPROC                                                glSetFenceNV;

extern PFNGLTESTFENCENVPROC                                              glTestFenceNV;

extern PFNGLFINISHFENCENVPROC                                            glFinishFenceNV;

extern PFNGLISFENCENVPROC                                                         glIsFenceNV;

extern PFNGLGETFENCEIVNVPROC                                            glGetFenceivNV;

 

extern PFNGLVERTEXARRAYRANGENVPROC                    glVertexArrayRangeNV;

extern PFNGLFLUSHVERTEXARRAYRANGENVPROC         glFlushVertexArrayRangeNV;

 

extern PFNWGLALLOCATEMEMORYNVPROC                            wglAllocateMemoryNV;

 

extern int g_DisplayListCtr;

extern int g_NumNodes;

extern int g_NumVisibleNodes;

 

 

enum eOctreeNodes

{

      TOP_LEFT_FRONT,             // 0

      TOP_LEFT_BACK,               // 1

      TOP_RIGHT_BACK,                   // etc...

      TOP_RIGHT_FRONT,

      BOTTOM_LEFT_FRONT,

      BOTTOM_LEFT_BACK,

      BOTTOM_RIGHT_BACK,

      BOTTOM_RIGHT_FRONT

};

 

class Octree 

{

public:

      Octree(SceneGraph* pScene, const int depth, const Vector4f& centre, float width){
             m_pScene = pScene;
             m_depth  = depth;
             m_centre = centre;
             m_width  = width;                  // half-width: centre to boundary
             m_bSubDivided = false;
             m_numIntersectingObjects = 0;
             m_pObjectIndices     = NULL;
             m_pObjectNumTri      = NULL;
             m_pGeoObjectsFaceIdx = NULL;
             m_BBoxVertices[0] = Vector4f(centre.x - width, centre.y - width , centre.z - width);
             m_BBoxVertices[1] = Vector4f(centre.x + width, centre.y - width , centre.z - width);
             m_BBoxVertices[2] = Vector4f(centre.x + width, centre.y - width , centre.z + width);
             m_BBoxVertices[3] = Vector4f(centre.x - width, centre.y - width , centre.z + width);
             m_BBoxVertices[4] = Vector4f(centre.x - width, centre.y + width , centre.z - width);
             m_BBoxVertices[5] = Vector4f(centre.x + width, centre.y + width , centre.z - width);
             m_BBoxVertices[6] = Vector4f(centre.x + width, centre.y + width , centre.z + width);
             m_BBoxVertices[7] = Vector4f(centre.x - width, centre.y + width , centre.z + width);
             m_Life = 0;
      };

 

      virtual ~Octree(){
             // Recursively free child nodes; m_octreeNodes is only valid
             // once the node has been subdivided
             if ( m_bSubDivided ){
                   for ( int i=0 ; i<8 ; i++ ) delete m_octreeNodes[i];
             }
      };

 

      void CreateNode(int numParentObjects, int* parentObjectsIndices){

             int tmp, ctr, tri;

             tmp = ctr = tri = 0;

            

             m_NodeID = g_NumNodes+10000;

             g_NumNodes++;

 

             // Iterate through the parent's objects: count how many of them
             // intersect this child node, and total the number of triangles
             // that intersect it

             for ( int i=0 ; i<numParentObjects ; i++){

                   tmp = IntersectNode(m_pScene->pGeoObjects[parentObjectsIndices[i]]);

                   tri += tmp;

                   if ( tmp != 0 ){

                         ctr++;

                   }

             }

 

             m_numIntersectingObjects = ctr;

             m_pObjectIndices = new int[ctr];

             m_pObjectNumTri    = new int[ctr];

             m_numTriangles = tri;

 

            

             ctr = 0;

             for ( int j=0 ; j<numParentObjects ; j++){

                   tmp = IntersectNode(m_pScene->pGeoObjects[parentObjectsIndices[j]]);

                   if ( tmp != 0 ){

                         m_pObjectIndices[ctr] = parentObjectsIndices[j];

                         m_pObjectNumTri[ctr]  = tmp;

                         ctr++;

                   }

             }

            

            

             // If the maximum depth has not been reached and the number of
             // intersecting triangles exceeds the per-node limit, subdivide
             // this node; otherwise assign the intersecting faces to it

 

             if ( tri > MAXTRIANGLES && m_depth < MAXRECURSION ){

                   SubDivideNode();

             }else{

                   if ( tri > 0 ) AssignFaces();

             }

      };

 

      void DrawDebug(void){

                        

             glBegin(GL_LINES);

                   glVertex3f(m_BBoxVertices[0].x,m_BBoxVertices[0].y, m_BBoxVertices[0].z);

                   glVertex3f(m_BBoxVertices[1].x,m_BBoxVertices[1].y, m_BBoxVertices[1].z);

                   glVertex3f(m_BBoxVertices[1].x,m_BBoxVertices[1].y, m_BBoxVertices[1].z);

                   glVertex3f(m_BBoxVertices[2].x,m_BBoxVertices[2].y, m_BBoxVertices[2].z);

                   glVertex3f(m_BBoxVertices[2].x,m_BBoxVertices[2].y, m_BBoxVertices[2].z);

                   glVertex3f(m_BBoxVertices[3].x,m_BBoxVertices[3].y, m_BBoxVertices[3].z);

                   glVertex3f(m_BBoxVertices[3].x,m_BBoxVertices[3].y, m_BBoxVertices[3].z);

                   glVertex3f(m_BBoxVertices[0].x,m_BBoxVertices[0].y, m_BBoxVertices[0].z);

                   glVertex3f(m_BBoxVertices[0].x,m_BBoxVertices[0].y, m_BBoxVertices[0].z);

                   glVertex3f(m_BBoxVertices[4].x,m_BBoxVertices[4].y, m_BBoxVertices[4].z);

                   glVertex3f(m_BBoxVertices[4].x,m_BBoxVertices[4].y, m_BBoxVertices[4].z);

                   glVertex3f(m_BBoxVertices[5].x,m_BBoxVertices[5].y, m_BBoxVertices[5].z);

                   glVertex3f(m_BBoxVertices[5].x,m_BBoxVertices[5].y, m_BBoxVertices[5].z);

                   glVertex3f(m_BBoxVertices[6].x,m_BBoxVertices[6].y, m_BBoxVertices[6].z);

                   glVertex3f(m_BBoxVertices[6].x,m_BBoxVertices[6].y, m_BBoxVertices[6].z);

                   glVertex3f(m_BBoxVertices[7].x,m_BBoxVertices[7].y, m_BBoxVertices[7].z);

                   glVertex3f(m_BBoxVertices[7].x,m_BBoxVertices[7].y, m_BBoxVertices[7].z);

                   glVertex3f(m_BBoxVertices[4].x,m_BBoxVertices[4].y, m_BBoxVertices[4].z);

                   glVertex3f(m_BBoxVertices[1].x,m_BBoxVertices[1].y, m_BBoxVertices[1].z);

                   glVertex3f(m_BBoxVertices[5].x,m_BBoxVertices[5].y, m_BBoxVertices[5].z);

                   glVertex3f(m_BBoxVertices[2].x,m_BBoxVertices[2].y, m_BBoxVertices[2].z);

                   glVertex3f(m_BBoxVertices[6].x,m_BBoxVertices[6].y, m_BBoxVertices[6].z);

                   glVertex3f(m_BBoxVertices[3].x,m_BBoxVertices[3].y, m_BBoxVertices[3].z);

                   glVertex3f(m_BBoxVertices[7].x,m_BBoxVertices[7].y, m_BBoxVertices[7].z);

             glEnd();

     

             if ( m_bSubDivided ){

                   m_octreeNodes[TOP_LEFT_FRONT]->DrawDebug();          

                   m_octreeNodes[TOP_LEFT_BACK]->DrawDebug();            

                   m_octreeNodes[TOP_RIGHT_BACK]->DrawDebug();          

                   m_octreeNodes[TOP_RIGHT_FRONT]->DrawDebug();  

                   m_octreeNodes[BOTTOM_LEFT_FRONT]->DrawDebug();

                   m_octreeNodes[BOTTOM_LEFT_BACK]->DrawDebug();    

                   m_octreeNodes[BOTTOM_RIGHT_BACK]->DrawDebug();

                   m_octreeNodes[BOTTOM_RIGHT_FRONT]->DrawDebug();

             }

     

             return;

      };

     

      void CullOctree(Octree** pptrNodes, ViewFrustum& viewFrustum){

             // Check if the current node is in our frustum

                  

             if( !viewFrustum.isCubeInFrustum( m_BBoxVertices ) ){

                   m_Life = 0;

                   return;

             }

            

            

             // If this node is subdivided, recurse into its children

             if( m_bSubDivided )

             {

                   // Recurse through each of the 8 children; visible leaf
                   // nodes are appended to pptrNodes in the else branch below

                   m_octreeNodes[TOP_LEFT_FRONT]->CullOctree(pptrNodes, viewFrustum);           

                   m_octreeNodes[TOP_LEFT_BACK]->CullOctree(pptrNodes, viewFrustum);       

                   m_octreeNodes[TOP_RIGHT_BACK]->CullOctree(pptrNodes, viewFrustum);           

                   m_octreeNodes[TOP_RIGHT_FRONT]->CullOctree(pptrNodes, viewFrustum);   

                   m_octreeNodes[BOTTOM_LEFT_FRONT]->CullOctree(pptrNodes, viewFrustum);

                   m_octreeNodes[BOTTOM_LEFT_BACK]->CullOctree(pptrNodes, viewFrustum);     

                   m_octreeNodes[BOTTOM_RIGHT_BACK]->CullOctree(pptrNodes, viewFrustum);

                   m_octreeNodes[BOTTOM_RIGHT_FRONT]->CullOctree(pptrNodes, viewFrustum);

             }

             else

             {

                   pptrNodes[g_NumVisibleNodes] = this;

                   g_NumVisibleNodes++;

             }

      }

     

      void NaiveDraw(ViewFrustum& viewFrustum){

            

             // Check if the current node is in our frustum

                  

             if( !viewFrustum.isCubeInFrustum( m_BBoxVertices ) ){

                   return;

             }

            

            

             // If this node is subdivided, recurse into its children

             if( m_bSubDivided )

             {

                   // Recurse through each of the 8 children; leaf nodes are
                   // drawn in the else branch below

                   m_octreeNodes[TOP_LEFT_FRONT]->NaiveDraw(viewFrustum);          

                   m_octreeNodes[TOP_LEFT_BACK]->NaiveDraw(viewFrustum);           

                   m_octreeNodes[TOP_RIGHT_BACK]->NaiveDraw(viewFrustum);         

                   m_octreeNodes[TOP_RIGHT_FRONT]->NaiveDraw(viewFrustum); 

                   m_octreeNodes[BOTTOM_LEFT_FRONT]->NaiveDraw(viewFrustum);

                   m_octreeNodes[BOTTOM_LEFT_BACK]->NaiveDraw(viewFrustum);    

                   m_octreeNodes[BOTTOM_RIGHT_BACK]->NaiveDraw(viewFrustum);

                   m_octreeNodes[BOTTOM_RIGHT_FRONT]->NaiveDraw(viewFrustum);

             }

             else

             {

                   DrawNode();

             }

            

      };

 

      void DrawAABB(void){

             m_Life = 5;

             glCallList(m_NodeID);

 

      }

     

      void CreateAABBDisplayList(void){

 

             if( m_bSubDivided )

             {

                   // Recurse through each of the 8 children; leaf nodes compile
                   // their bounding box into a display list in the else branch

                   m_octreeNodes[TOP_LEFT_FRONT]->CreateAABBDisplayList();           

                   m_octreeNodes[TOP_LEFT_BACK]->CreateAABBDisplayList();      

                   m_octreeNodes[TOP_RIGHT_BACK]->CreateAABBDisplayList();          

                   m_octreeNodes[TOP_RIGHT_FRONT]->CreateAABBDisplayList();  

                   m_octreeNodes[BOTTOM_LEFT_FRONT]->CreateAABBDisplayList();

                   m_octreeNodes[BOTTOM_LEFT_BACK]->CreateAABBDisplayList();     

                   m_octreeNodes[BOTTOM_RIGHT_BACK]->CreateAABBDisplayList();

                   m_octreeNodes[BOTTOM_RIGHT_FRONT]->CreateAABBDisplayList();

             }

             else

             {

                   glNewList(m_NodeID,GL_COMPILE);

                         glBegin(GL_QUADS);

                                glVertex3f(m_BBoxVertices[0].x,m_BBoxVertices[0].y,m_BBoxVertices[0].z);

                               glVertex3f(m_BBoxVertices[3].x,m_BBoxVertices[3].y,m_BBoxVertices[3].z);

                                glVertex3f(m_BBoxVertices[2].x,m_BBoxVertices[2].y,m_BBoxVertices[2].z);

                                glVertex3f(m_BBoxVertices[1].x,m_BBoxVertices[1].y,m_BBoxVertices[1].z);

 

                                glVertex3f(m_BBoxVertices[0].x,m_BBoxVertices[0].y,m_BBoxVertices[0].z);

                                glVertex3f(m_BBoxVertices[1].x,m_BBoxVertices[1].y,m_BBoxVertices[1].z);

                                glVertex3f(m_BBoxVertices[5].x,m_BBoxVertices[5].y,m_BBoxVertices[5].z);

                                glVertex3f(m_BBoxVertices[4].x,m_BBoxVertices[4].y,m_BBoxVertices[4].z);

 

                                glVertex3f(m_BBoxVertices[1].x,m_BBoxVertices[1].y,m_BBoxVertices[1].z);

                                glVertex3f(m_BBoxVertices[2].x,m_BBoxVertices[2].y,m_BBoxVertices[2].z);

                                glVertex3f(m_BBoxVertices[6].x,m_BBoxVertices[6].y,m_BBoxVertices[6].z);

                                glVertex3f(m_BBoxVertices[5].x,m_BBoxVertices[5].y,m_BBoxVertices[5].z);

 

                                glVertex3f(m_BBoxVertices[2].x,m_BBoxVertices[2].y,m_BBoxVertices[2].z);

                                glVertex3f(m_BBoxVertices[3].x,m_BBoxVertices[3].y,m_BBoxVertices[3].z);

                                glVertex3f(m_BBoxVertices[7].x,m_BBoxVertices[7].y,m_BBoxVertices[7].z);

                                glVertex3f(m_BBoxVertices[6].x,m_BBoxVertices[6].y,m_BBoxVertices[6].z);

 

                                glVertex3f(m_BBoxVertices[3].x,m_BBoxVertices[3].y,m_BBoxVertices[3].z);

                                glVertex3f(m_BBoxVertices[0].x,m_BBoxVertices[0].y,m_BBoxVertices[0].z);

                                glVertex3f(m_BBoxVertices[4].x,m_BBoxVertices[4].y,m_BBoxVertices[4].z);

                                glVertex3f(m_BBoxVertices[7].x,m_BBoxVertices[7].y,m_BBoxVertices[7].z);

 

                                glVertex3f(m_BBoxVertices[4].x,m_BBoxVertices[4].y,m_BBoxVertices[4].z);

                                glVertex3f(m_BBoxVertices[5].x,m_BBoxVertices[5].y,m_BBoxVertices[5].z);

                                glVertex3f(m_BBoxVertices[6].x,m_BBoxVertices[6].y,m_BBoxVertices[6].z);

                                glVertex3f(m_BBoxVertices[7].x,m_BBoxVertices[7].y,m_BBoxVertices[7].z);

                        

 

 

                         glEnd();

                   glEndList();

             }

             return;

      }

 

 

      void DrawNode(void){

 

             if ( m_numIntersectingObjects > 0 ){

                  

                   for ( int i=0 ; i<m_numIntersectingObjects ; i++){

 

                         JDNGeoObject* obj = &(m_pScene->pGeoObjects[m_pObjectIndices[i]]);

                         JDNMaterial*     mat = &(m_pScene->pMaterials[obj->matID]);

 

                         int numFaces = m_pObjectNumTri[i];

                        

                         /*

                         GLfloat no_mat[] = { 0.0,  0.0,  0.0,  1.0};

 

                         GLfloat mat_a[4];

                         GLfloat mat_d[4];

                         GLfloat mat_s[4];

                         GLfloat mat_shine[1];

 

                         mat_a[0] = (GLfloat)mat->ambient.x;

                         mat_a[1] = (GLfloat)mat->ambient.y;

                         mat_a[2] = (GLfloat)mat->ambient.z;

                         mat_a[3] = 1.0;

 

                         mat_d[0] = (GLfloat)mat->diffuse.x;

                         mat_d[1] = (GLfloat)mat->diffuse.y;

                         mat_d[2] = (GLfloat)mat->diffuse.z;

                          mat_d[3] = 1.0;

 

                         mat_s[0] = (GLfloat)mat->specular.x;

                         mat_s[1] = (GLfloat)mat->specular.y;

                         mat_s[2] = (GLfloat)mat->specular.z;

                         mat_s[3] = 1.0;

 

                         mat_shine[0] = 0.0;   // index 0 -- shininess is a single value

 

                         glMaterialfv( GL_FRONT_AND_BACK, GL_AMBIENT, mat_a );

                         glMaterialfv( GL_FRONT_AND_BACK, GL_DIFFUSE, mat_d );

                         glMaterialfv( GL_FRONT_AND_BACK, GL_SPECULAR, mat_s );

                         glMaterialfv( GL_FRONT_AND_BACK, GL_SHININESS, mat_shine );

                         glMaterialfv( GL_FRONT_AND_BACK, GL_EMISSION, no_mat);

                         */

 

                         glCallList(mat->id);

 

                         glVertexPointer(3,GL_FLOAT,6*sizeof(GLfloat),m_pScene->m_ptrVertices+(obj->m_offset*6));

                         glNormalPointer(GL_FLOAT,6*sizeof(GLfloat),m_pScene->m_ptrVertices+(obj->m_offset*6)+3);

                         //glIndexPointer(GL_UNSIGNED_INT,0,m_pGeoObjectsFaceIdx[i]);

                        

                         glDrawElements(GL_TRIANGLES,numFaces*3, GL_UNSIGNED_INT, m_pGeoObjectsFaceIdx[i]);

                   }

             }

             return;

      };

 

      int                            m_Life;

 

 

private:

 

      void SubDivideNode(void){

            

             m_bSubDivided = true;

 

             // Create the eight children, each one level deeper and half the
             // width. Note: use m_depth+1, not ++m_depth, which would
             // increment the parent's depth once per child and create the
             // eighth child eight levels too deep.
             m_octreeNodes[TOP_LEFT_FRONT]     = new Octree(m_pScene, m_depth+1, GetCentre(TOP_LEFT_FRONT), m_width/2.0f);
             m_octreeNodes[TOP_LEFT_BACK]      = new Octree(m_pScene, m_depth+1, GetCentre(TOP_LEFT_BACK), m_width/2.0f);
             m_octreeNodes[TOP_RIGHT_BACK]     = new Octree(m_pScene, m_depth+1, GetCentre(TOP_RIGHT_BACK), m_width/2.0f);
             m_octreeNodes[TOP_RIGHT_FRONT]    = new Octree(m_pScene, m_depth+1, GetCentre(TOP_RIGHT_FRONT), m_width/2.0f);
             m_octreeNodes[BOTTOM_LEFT_FRONT]  = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_LEFT_FRONT), m_width/2.0f);
             m_octreeNodes[BOTTOM_LEFT_BACK]   = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_LEFT_BACK), m_width/2.0f);
             m_octreeNodes[BOTTOM_RIGHT_BACK]  = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_RIGHT_BACK), m_width/2.0f);
             m_octreeNodes[BOTTOM_RIGHT_FRONT] = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_RIGHT_FRONT), m_width/2.0f);

 

             m_octreeNodes[TOP_LEFT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[TOP_LEFT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[TOP_RIGHT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[TOP_RIGHT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[BOTTOM_LEFT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[BOTTOM_LEFT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[BOTTOM_RIGHT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

             m_octreeNodes[BOTTOM_RIGHT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);

 

             return;

      };

 

      Vector4f GetCentre(int node){

             switch ( node ){

            

                   case TOP_LEFT_FRONT:   

                         return Vector4f(m_centre.x-(m_width/2)     ,m_centre.y+(m_width/2), m_centre.z+(m_width/2) );

                         break;

                   case TOP_LEFT_BACK:                  

                         return Vector4f(m_centre.x-(m_width/2)     ,m_centre.y+(m_width/2), m_centre.z-(m_width/2) );

                         break;

                   case TOP_RIGHT_BACK:

                         return Vector4f(m_centre.x+(m_width/2)    ,m_centre.y+(m_width/2), m_centre.z-(m_width/2) );

                         break;

                   case TOP_RIGHT_FRONT:

                         return Vector4f(m_centre.x+(m_width/2)    ,m_centre.y+(m_width/2), m_centre.z+(m_width/2) );

                         break;

                   case BOTTOM_LEFT_FRONT:

                         return Vector4f(m_centre.x-(m_width/2)     ,m_centre.y-(m_width/2), m_centre.z+(m_width/2) );

                         break;

                   case BOTTOM_LEFT_BACK:

                         return Vector4f(m_centre.x-(m_width/2)     ,m_centre.y-(m_width/2), m_centre.z-(m_width/2) );

                         break;

                   case BOTTOM_RIGHT_BACK:

                         return Vector4f(m_centre.x+(m_width/2)    ,m_centre.y-(m_width/2), m_centre.z-(m_width/2) );

                         break;

                   case BOTTOM_RIGHT_FRONT:

                         return Vector4f(m_centre.x+(m_width/2)    ,m_centre.y-(m_width/2), m_centre.z+(m_width/2) );

                         break;

             }

             return Vector4f(6.666f,6.666f,6.666f);   // Sentinel -- unreachable for valid octant IDs

      };

 

      void AssignFaces(void){

             m_pGeoObjectsFaceIdx = new GLuint*[m_numIntersectingObjects];

            

             // Allocate memory for FaceSet of each intersecting object

             for ( int i=0 ; i<m_numIntersectingObjects ; i++){

                   m_pGeoObjectsFaceIdx[i] = new GLuint[m_pObjectNumTri[i]*3];

             }

            

             Vector4f* verts = new Vector4f[3];

             for ( int j=0 ; j<m_numIntersectingObjects ; j++){

                   JDNGeoObject* obj = &(m_pScene->pGeoObjects[m_pObjectIndices[j]]);

                  

                   int ctr=0;

                   for ( int k=0 ; k<obj->numFaces ; k++ ){

                        

                         // If the face intersects this node, add its vertex
                         // indices to the node's face set

                         /*

                         if ( isVertexInNode(obj->pVertices[obj->pFaces[k*3]])   ||

                                       isVertexInNode(obj->pVertices[obj->pFaces[k*3+1]]) ||

                                       isVertexInNode(obj->pVertices[obj->pFaces[k*3+2]]) ) {

                                m_pGeoObjectsFaceIdx[j][ctr] = obj->pFaces[k*3];

                                m_pGeoObjectsFaceIdx[j][ctr+1] = obj->pFaces[k*3+1];

                                m_pGeoObjectsFaceIdx[j][ctr+2] = obj->pFaces[k*3+2];

                                ctr += 3;

                         }

                         */

                         verts[0] = obj->pVertices[obj->pFaces[k*3]];

                         verts[1] = obj->pVertices[obj->pFaces[k*3+1]];

                         verts[2] = obj->pVertices[obj->pFaces[k*3+2]];

 

                         if ( faceIntersectNode(verts) ){

                                m_pGeoObjectsFaceIdx[j][ctr] =   obj->pFaces[k*3];

                                m_pGeoObjectsFaceIdx[j][ctr+1] = obj->pFaces[k*3+1];

                                m_pGeoObjectsFaceIdx[j][ctr+2] = obj->pFaces[k*3+2];

                                ctr += 3;

                         }

 

 

                   }

             }

             delete [] verts;

      };

 

 

      // Test object against node and return the number of intersecting triangles

      int IntersectNode(JDNGeoObject& obj){

             int num=0;

             Vector4f* verts = new Vector4f[3];

             for ( int j=0 ; j<obj.numFaces ; j++){

                   /*

                   if ( isVertexInNode(obj.pVertices[obj.pFaces[j*3]]) ){

                         num++;

                         continue;

                   }

                   if ( isVertexInNode(obj.pVertices[obj.pFaces[j*3+1]]) ){

                         num++;

                         continue;

                   }

                   if ( isVertexInNode(obj.pVertices[obj.pFaces[j*3+2]]) ){

                         num++;

                         continue;

                   }

                   */

                   verts[0] = obj.pVertices[obj.pFaces[j*3]];

                   verts[1] = obj.pVertices[obj.pFaces[j*3+1]];

                   verts[2] = obj.pVertices[obj.pFaces[j*3+2]];

                   if ( faceIntersectNode(verts) ) num++;

 

             }

             delete [] verts;

             return num;

      };

 

      // Conservative triangle/node test: only the triangle's vertices are
      // tested, so a triangle that crosses the node without having a vertex
      // inside it is missed (the commented-out edge/quad tests below were an
      // attempt at handling that case)
      bool faceIntersectNode(Vector4f* verts){
             if ( isVertexInNode(verts[0]) ) return true;
             if ( isVertexInNode(verts[1]) ) return true;
             if ( isVertexInNode(verts[2]) ) return true;

 

             /*                      

             // Test each edge of triangle

             Vector4f quad[3];

             Vector4f e1    = verts[1] - verts[0];

             Vector4f e2    = verts[2] - verts[0];

             Vector4f e3    = verts[2] - verts[1];

             float t        = 0.0f;

 

             // Test edges against 1st quad

             quad[0]                    = m_BBoxVertices[2];

             quad[1]                    = m_BBoxVertices[3];

             quad[2]                    = m_BBoxVertices[0];

            

             t = JDNGeoObject::quadRay(verts[0], e1, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[0], e2, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[1], e3, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

 

             // Test edges against 2nd quad

             quad[0]                    = m_BBoxVertices[6];

             quad[1]                    = m_BBoxVertices[5];

             quad[2]                    = m_BBoxVertices[4];

            

             t = JDNGeoObject::quadRay(verts[0], e1, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[0], e2, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[1], e3, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

 

             // Test edges against 3rd quad

             quad[0]                    = m_BBoxVertices[6];

             quad[1]                    = m_BBoxVertices[2];

             quad[2]                    = m_BBoxVertices[1];

            

             t = JDNGeoObject::quadRay(verts[0], e1, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[0], e2, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[1], e3, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

 

             // Test edges against 4th quad

             quad[0]                    = m_BBoxVertices[5];

             quad[1]                    = m_BBoxVertices[1];

             quad[2]                    = m_BBoxVertices[0];

            

             t = JDNGeoObject::quadRay(verts[0], e1, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[0], e2, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[1], e3, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

 

             // Test edges against 5th quad

             quad[0]                    = m_BBoxVertices[4];

             quad[1]                    = m_BBoxVertices[0];

             quad[2]                    = m_BBoxVertices[3];

            

             t = JDNGeoObject::quadRay(verts[0], e1, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[0], e2, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[1], e3, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

 

             // Test edges against 6th quad

             quad[0]                    = m_BBoxVertices[7];

             quad[1]                    = m_BBoxVertices[3];

             quad[2]                    = m_BBoxVertices[2];

            

             t = JDNGeoObject::quadRay(verts[0], e1, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[0], e2, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

             t = JDNGeoObject::quadRay(verts[1], e3, quad);

             if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ){

                   return true;

             }

 

 

             // Test each diagonal against face

             //delete [] quad;

             */

 

 

             return false;

      }

 

      // Test whether a vertex lies inside this node's axis-aligned bounding box

      bool isVertexInNode(Vector4f& vert){

            

             if (  (vert.x > (m_centre.x+m_width)) ||

                                (vert.y > (m_centre.y+m_width)) ||

                                (vert.z > (m_centre.z+m_width)) ||

                                (vert.x < (m_centre.x-m_width)) ||

                                (vert.y < (m_centre.y-m_width)) ||

                                (vert.z < (m_centre.z-m_width)) ) return false;

 

             else return true;

      };

 

      GLuint**    m_pGeoObjectsFaceIdx;       // Indices of intersecting faces, per object

      bool        m_bSubDivided;              // Has the node been subdivided, i.e. does it have children
      int         m_displayListID;            // Display list ID
      int         m_numTriangles;             // Total number of intersecting triangles
      int         m_numIntersectingObjects;   // Number of intersecting objects
      int         m_depth;                    // Depth of node
      Vector4f    m_centre;                   // Coordinates of the node's centre
      Vector4f    m_BBoxVertices[8];          // Vertices of the node's eight corners
      float       m_width;                    // Half-width from centre to the node's boundary
      Octree*     m_octreeNodes[8];           // Pointers to the node's children
      SceneGraph* m_pScene;                   // Pointer to scene graph

      int*        m_pObjectIndices;           // Indices of intersecting objects
      int*        m_pObjectNumTri;            // Number of intersecting triangles per intersecting object

      int*        m_pDisplayListID;
      GLuint**    m_Indices;
      GLfloat**   m_Vertices;
      int         m_NodeID;

     

};

 

#endif