S 3d Max

The 3D Studio MAX R2 Display Architecture

Don Brittain, Ph.D., for Yost Group, Inc.

With 3D Studio MAX, overall productivity is largely tied to a user's ability to manipulate 3D data quickly and smoothly. Thus, almost all users are interested in what system configuration will lead to the most interactive 'bang for the buck'.

This article will attempt to describe the MAX R2 display architecture in such a way that users can figure out what type and level of display acceleration will best meet their needs.

Please note that the MAX R2 display pipeline is a significantly enhanced superset of the MAX R1.x pipeline. Thus, if you are using MAX R1.x (or one of the 1.x derivative products, such as 3D Studio VIZ or 3D Studio Apprentice), then the information here will not help. Rather, you should consult my earlier article Shedding Light on MAX Benchmarks: An Analysis of Interactive Performance With 3D Studio MAX.

In order to get 3D models to appear on a 2D computer screen, there are two major computational steps.

  1. Convert the data from 3D 'world' space into 2D screen space
  2. Draw the resulting 2D projections on the screen.

In the case of a shaded, textured object, the first step involves lighting, transforming, and clipping each geometric primitive (triangles, in the case of MAX). The second step involves filling in the projected triangles (with color ramps), interpolating depth values, and performing texture lookup.

Each of these steps can be computationally intensive, and either step (or both!) could be a bottleneck slowing down your use of MAX.

For future reference, Step 1 is referred to as geometry acceleration (and it includes transforming vertices, clipping them to the 3D view volume, and calculating the illumination at each vertex). Step 2 is referred to as rasterization. It is really a 2-and-a-half dimensional process, in that interpolating and filling in color, depth, and texture values are operations that happen per 2D pixel (as opposed to the Step 1 calculations which happen per 3D vertex).

Display cards handle both, one, or neither of these steps.

The simplest case is where the display card doesn't accelerate either step. Such cards are called '2D' cards, or 'dumb' cards, or perhaps 'Windows accelerators', because they do not offload any of the 3D processing from the main CPU. In this case, the MAX software handles all of the interactive rendering tasks except the actual display of the resulting image. An off-screen bitmap is created as the output of the rendering pipeline, and when MAX has finished filling in all the pixels, the image is blitted (copied) to the screen.

Since all display cards (even fancy, high-end 3D cards!) can act as dumb 2D cards, we made it a priority to make this process as efficient as possible. To this end, we use multiple threads of execution within MAX to largely uncouple step 1 from step 2, thereby allowing two processors (within multi-processing NT systems) to work in parallel. In effect, what this does is allow the second CPU to act as a rasterization processor, thereby allowing a 2-processor / dumb display card system to emulate a single processor system with a rasterization accelerator.

For completeness, it should be noted that there are other ways that this part of the pipeline is optimized, including early rejection of out-of-view data, lazy evaluation of lighting calculations, caching of shared computation results, and multiple execution threads to handle the geometric transformations of the scene database.

This leads us naturally to the next level of display acceleration: rasterization accelerators.

These cards handle all of Step 2 above. Once the 3D data is converted into 2D device space (with colors, and possibly depth values and texture coordinates), it is handed over to the rasterization processor on the display card. This processor then computes all of the intermediate color and depth values, which are recorded right into display card memory.

Most 3D cards available today are actually rasterization-only accelerators, though some of these accelerators can handle natural window coordinate data directly, whereas other (slower) cards require the data to be massaged into a special (and card-specific) format before the fast rasterization hardware takes over. (This extra formatting is called 'triangle setup processing', and cards with hardware triangle setup naturally perform better than those without.)

In order to optimize MAX's use of all rasterization cards, we apply all the optimizations mentioned in the last paragraph of the 2D Display Cards section, and we also convert all possible data into triangle strips, which minimizes the communication and computational overhead of getting data to and through the rasterization processor. (Basically, if triangles are sent in strips, where each new vertex implicitly defines a whole new triangle by using it in combination with the previous 2 vertices, we can send data to the rasterization processor faster. This allows an optional second CPU to work more on the Step 1 part of the pipeline.)
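To make the strip idea concrete, here is a minimal sketch in C of how a strip of vertex indices unpacks into individual triangles. The function name `expand_strip` is hypothetical and is not taken from MAX. The swap on every other triangle follows the usual OpenGL convention for keeping a consistent winding order, which is an assumption about how a consumer of the strip would interpret it.

```c
#include <stddef.h>

/* Hypothetical helper: expand a triangle strip of n vertex indices into
 * independent triangles. Writes 3 indices per triangle into out, which
 * must have room for 3 * (n - 2) entries. Returns the triangle count. */
size_t expand_strip(const unsigned *strip, size_t n, unsigned *out)
{
    size_t i, tris;

    if (n < 3)
        return 0;               /* fewer than 3 vertices: no triangle */

    tris = n - 2;               /* each vertex after the first two adds one triangle */
    for (i = 0; i < tris; ++i) {
        if (i % 2 == 0) {
            out[3 * i + 0] = strip[i];
            out[3 * i + 1] = strip[i + 1];
        } else {
            /* swap the first two vertices on odd triangles so that
             * every triangle keeps the same winding order */
            out[3 * i + 0] = strip[i + 1];
            out[3 * i + 1] = strip[i];
        }
        out[3 * i + 2] = strip[i + 2];
    }
    return tris;
}
```

A 4-vertex strip therefore yields 2 triangles from 4 transmitted vertices, where independent triangles would need 6.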

It should be noted that, as with 2D display cards, Step 1 and Step 2 are decoupled by separate threads so that the system can perform automatic load balancing of all available processors (one or two CPUs plus the rasterization processor).

The most sophisticated (and, at this point, rarest) type of 3D display card is one with geometry acceleration built in. Cards in this category are true 3D cards, in that they can work with data in 3D space: they transform it, clip it, light it, and rasterize it. In other words, they handle all of Step 1 and all of Step 2.

In order to take full advantage of geometry acceleration, MAX converts all appropriate scene data into a form that is easy and efficient to feed to the accelerator. This involves turning the 3D scene database into 3D triangle strips with surface normal vectors for each vertex. (To do this efficiently, the process of constructing the normals is handled by multiple threads, and the strip data is preserved by the MAX dataflow pipeline.)

As with the other two scenarios, multiple execution threads are used to decouple the various computational stages, so that automatic load balancing can occur. This is especially effective when MAX is generating new (procedural) data on each frame update: then each host CPU can work on creating the data, and the geometry and rasterization processors can work on displaying the data.

The natural question now is, 'which of these setups provides the fastest display path for MAX?' And the truthful (and admittedly somewhat unhelpful) answer is 'it depends'.

There are two types of bottlenecks that can slow down the pipeline. One is computational overload, and the other is communication limitations.

For a very simple scene, such as a single big cube and one light, the overhead of step 1 is minimal, since it only involves transforming, clipping, and lighting a very small number of vertices or triangles. But if the scene is displayed in a large viewport, it could potentially involve filling in hundreds of thousands (or even millions) of pixels, so the rasterization process (step 2) is quite expensive.

Thus, in this case, a rasterization accelerator would result in much faster system throughput.

Also note that if that single cube was textured, the step 1 calculations would go up only slightly, whereas step 2 would be much more complex. (There would be additional texture vertices to interpolate, and the actual texel values would have to be read and/or computed and potentially masked against, or blended with, the interpolated color values.)

So, in this case, a rasterization accelerator that supports texturing would make an even bigger difference.

It should be noted that, despite the fact that a single cube is too simple to be considered a likely real-life scene, this case is very much representative of most 3D games: geometric complexity is kept to a minimum (i.e. there is a low vertex/triangle count) and the game is made more visually complex and interesting by extensive use of textures. Indeed, most 3D game cards are simply rasterization accelerators (as opposed to geometry accelerators), so they do, indeed, provide excellent acceleration for such scenes.

At the opposite end of the spectrum is a geometrically complex scene (say, about 1 million vertices and triangles), displayed in a quarter-screen viewport. In this case, step 1 is quite involved, whereas step 2 is quite simple. (Indeed, there is quite likely very little 2D interpolation to do, since a transformed triangle may be only one pixel in size! And besides, even if the viewport is about 500 x 400 pixels in size, that 'only' amounts to 200,000 pixels, which is significantly less than the amount of data step 1 must deal with.)

Thus, for such a scene, rasterization acceleration is not particularly helpful, since rasterization time is not limiting the throughput. And, indeed, a geometrically-large scene takes about the same amount of time to display using the HEIDI software z-buffer as with a hardware rasterization card.

Note that this illustrates, full-circle, the symbiosis between geometrically simple / texturally complex games and existing 3D game display cards: the fact is, game cards provide no benefit at all to displaying geometrically complex scenes (and hence it is somewhat a self-fulfilling prophecy that 3D games use simple geometry).

Finally, a quick comment on 3D APIs vs rasterization-only APIs with regard to efficiency of handling data. When dealing with a rasterization-only driver interface, MAX employs a very efficient transformation-and-lighting algorithm that results in each vertex of a mesh being transformed only once and lit at most once, independent of the structure of the mesh. (Vertices that are clipped or culled are never lit.) In contrast, 3D APIs have to transform each vertex passed to them, even if that vertex has been 'seen' before.

To illustrate the effect of this, suppose we have a 10,000 face / 5,000 vertex model that consists of 200 triangle strips containing 50 triangles each. (This would be considered very efficient stripping.) Then, in the rasterization-only case, MAX would transform a total of 5,000 vertices, whereas the 3D API would have to transform 200 (strips) x 52 (vertices per strip) = 10,400 vertices. Thus, a geometry accelerator would have to be more than twice as fast as the internal MAX computations just to break even!

Let's look at the extreme cases: With best-case 'perfect stripping' (1 strip with 10,000 triangles), the 3D API still has to deal with 10,002 vertices (slightly more than 2X the rasterization case), and in the worst case, where each 'strip' contains only one triangle, the 3D API must transform 10,000 (strips) x 3 (vertices per strip) = 30,000 vertices. This is 6 times as many transformations as MAX makes when using a rasterization-only interface!
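The vertex counts in this comparison all come from one rule: a strip of t triangles carries t + 2 vertices, and a 3D API transforms every vertex it is handed. The short C sketch below reproduces the arithmetic. The function name is hypothetical and exists only for illustration.

```c
/* Hypothetical helper: number of vertex transformations a strip-based
 * 3D API performs when a model is sent as `strips` triangle strips of
 * `tris_per_strip` triangles each. Every strip carries 2 lead-in
 * vertices plus 1 vertex per triangle. */
unsigned long strip_vertices_transformed(unsigned long strips,
                                         unsigned long tris_per_strip)
{
    return strips * (tris_per_strip + 2UL);
}
```

For the 10,000-face example, `strip_vertices_transformed(200, 50)` gives 10,400, one perfect strip gives 10,002, and 10,000 single-triangle strips give 30,000, against the 5,000 unique vertices that MAX transforms in the rasterization-only case.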

Since geometry accelerators are special-purpose devices, they are usually substantially more efficient at transformations than the CPU-implemented code inside MAX, but this does illustrate the uphill battle to attain high-speed 3D nirvana.

Communication with the display hardware is the next big performance-critical area, and that brings us naturally to the next section.

Let's again consider a million-vertex scene being displayed in a 500x400 pixel viewport. If we download the scene to a geometry accelerator, we need to transfer each vertex along with a surface normal. This amounts to 24 million bytes of information (4 bytes for each floating point x, y, and z vertex value, and 4 bytes for each normal vector component). Other data (such as light position and color, material descriptions, texture map images, world space locations, etc.) also have to be transferred to the accelerator card, but this additional information is, with the exception of texture images, relatively small. Except in the case of animating textures, we do not need to send texture maps to the accelerator for each viewport update, so they can also be factored out of this analysis.

OK, so we're about to send 24Meg of data to a geometry accelerator card. But how does it get there? Unfortunately, it usually has to be transferred across the system bus. Now, apart from mechanical devices (like floppy, CD, and hard disk drives), the bus is one of the slowest parts of the system. So this is bad news!

But how bad? Well, a 'high-speed' PCI bus runs at 66MHz. In practice, a bus running at this speed is doing well if it can actually sustain a transfer rate of 20Meg per second. Thus, assuming no overhead in MAX or the display card, it will take us over 1 second just to send the scene data to the card! So, getting a 30 frames-per-second update rate on this scene ain't gonna happen! (Note: some of the latest PCI chipsets have significantly better throughput, but if you have the resources available, it is best to actually measure the sustained transfer rate, rather than trust the almost-always-overstated specs.)

Note that if MAX computes all of the scene lighting, clipping, rasterization, etc., then only 500x400x3=600,000 bytes must be transferred across the system bus to produce a truecolor image on a dumb display card. This is 40 times less data hitting the slow part of the system!
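The bus figures can be checked with a few lines of C. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes 'Meg' means 10^6 bytes and that the transfer has no overhead beyond the sustained rate.

```c
/* Hypothetical helpers for the bus-traffic estimate. */

/* Bytes needed to send `vertices` vertices, each with a position and a
 * normal: 6 floats of 4 bytes each = 24 bytes per vertex. */
unsigned long geometry_bytes(unsigned long vertices)
{
    return vertices * 6UL * 4UL;
}

/* Bytes in a finished image of width x height pixels. */
unsigned long framebuffer_bytes(unsigned long width, unsigned long height,
                                unsigned long bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Seconds to move `bytes` at a sustained rate of `bytes_per_second`,
 * ignoring all other overhead (an idealizing assumption). */
double transfer_seconds(unsigned long bytes, double bytes_per_second)
{
    return (double)bytes / bytes_per_second;
}
```

With one million vertices, `geometry_bytes` gives 24,000,000 bytes, which takes about 1.2 seconds at 20,000,000 bytes per second. The 500x400 truecolor image is 600,000 bytes, a factor of 40 smaller.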

But the fact remains that geometry accelerator cards do the lighting, clipping, rasterization, etc. much faster than the main CPU can, and while the display card is working on those problems the CPU is freed up to work on other parts of the scene display process. So there is still an overall win from using a geometry accelerator, but this shows why the performance may be somewhat limited.

Note that the amount of data sent across the bus can also limit the overall display speed for rasterization-only cards, since it is possible for geometric data to exceed pixel data in size under these conditions too.

With an understanding of the communication overhead, some ways that MAX can (and does!) make the whole display process more efficient become apparent. Here are a few:

  • Object display culling: MAX can quickly determine whether an entire object is outside the viewport's display region, and if so, none of that object's data is processed or sent across the bus.
  • Primitive culling: For rasterization-only drivers, MAX does very efficient culling of backfacing or out-of-view triangles, thereby almost completely eliminating unnecessary data transfer.
  • Triangle strips: Rather than sending data as packets of 3 vertex/normal pairs (72 bytes per triangle), amenable data is sent in triangle strips. Even if only two triangles are combined into a 4 vertex strip, the savings are substantial. (96 bytes instead of 2x72=144 bytes, which is a 33% savings.) For very long strips, the savings approach 67%. This also applies (with slightly different numbers) to triangle data with UV texture coordinates.
  • For cards that support incremental scene updating, only data that has changed from one frame to the next needs to be sent. MAX supports this optimization at two levels, depending on the underlying hardware and driver implementations. At one level (dual planes) only changing objects are redrawn, but an extra copy of the viewport image and z-buffers must be stored in video memory (a precious commodity!) or must be moved across the system bus (slow!). At the second level, any objects that intersect those that are moving must also be redrawn, and the card must support partial viewport updates. This will be discussed in more detail in the driver section.
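The byte savings quoted for triangle strips in the list above follow from simple counting, sketched here in C. The names are hypothetical, and the 24-byte vertex/normal pair (six 4-byte floats) is the size used in the text.

```c
/* Hypothetical helpers for the strip-savings estimate. */

enum { BYTES_PER_VERTEX_NORMAL = 24 };  /* 3 position + 3 normal floats, 4 bytes each */

/* Bytes to send `tris` triangles as independent 3-vertex packets. */
unsigned long independent_bytes(unsigned long tris)
{
    return tris * 3UL * BYTES_PER_VERTEX_NORMAL;
}

/* Bytes to send `tris` triangles as a single strip of tris + 2 vertices. */
unsigned long strip_bytes(unsigned long tris)
{
    return (tris + 2UL) * BYTES_PER_VERTEX_NORMAL;
}

/* Fraction of bus traffic saved by stripping (0.0 when there is nothing to send). */
double strip_savings(unsigned long tris)
{
    if (tris == 0)
        return 0.0;
    return 1.0 - (double)strip_bytes(tris) / (double)independent_bytes(tris);
}
```

Two triangles give 96 bytes against 144, a saving of one third. As the strip grows, the saving tends toward two thirds, because each additional triangle costs 24 bytes instead of 72.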

To conclude this section, here's some good news: The Intel AGP (Accelerated Graphics Port) bus has been designed to help alleviate the PCI bus bottleneck inherent in all 3D graphics systems. This bus has been designed with the transfer of graphics-specific information in mind, and it should be 'standard equipment' in most new systems by the end of 1997. Virtually all new 3D graphics chipsets have been designed with the AGP bus in mind, so overall 3D performance should improve dramatically in the near future.

There are several factors that can affect how responsive a 3D program can feel to a user other than the overall rate at which triangles can be displayed on the screen. Although these factors rarely show up in benchmarks (whether by the card manufacturers, industry trade organizations, or magazines), they can have a great effect on how usable a 3D system is for a given task.

Here are a couple of examples to keep in mind.

  1. When a viewport needs to be redrawn because it has been 'damaged' by another window moving across it, some systems need to re-render the entire scene, whereas others allow MAX to just re-blit the viewport image from off-screen memory. Depending on the scene and viewport size, the time it takes to do these operations can vary considerably, with scene traversal generally taking much longer than blitting for complex scenes.
  2. After a scene changes and MAX re-renders it to a full-screen view, some systems need to blit all of the image data from main system memory (across the bus) or from extra display card memory, whereas others just need to have the display monitor's image get displayed from a different (newly-updated) region of the card's video memory (called 'page flipping'). The first operation (blitting) tends to be quite slow, whereas the second (page flipping) happens virtually instantly.

As you might guess by now, the systems that allow MAX to do 1 efficiently tend to force 2 to be inefficient, and vice versa. There are, however, some low-level drivers that are clever enough to change how they operate on the fly so that they always work optimally with MAX.

Since updating damaged viewports can be expensive, MAX makes sure that pop-up menus and viewport tooltips handle the repainting efficiently independent of the underlying 3D driver. Unfortunately, moving 'heavyweight' windows (like the material editor, trackview, and video post) will force MAX to do a full viewport repair via either a blit or scene re-traversal.

Another 'quality of life' driver issue is how much support is provided to allow MAX to perform incremental scene updates. Recall from the previous section that the fastest way for MAX to update a scene is to render only the smallest amount of scene data, generally only that part of the scene that has changed since the previous frame.

In order to accomplish this, the underlying driver and display hardware have to either provide an efficient way to store a partially-rendered scene (dual planes), or a way for MAX to only paint a rectangular subset of the 3D viewports (incremental viewport update). And just providing a functionally-correct way to do these things is not enough, in that moving any unnecessary data across the bus can cause unnerving 'jumps' in interactive performance, which end up making MAX harder to use, even if the average frame rate actually goes up.

This will be discussed in more depth in the next section.

MAX R2 supports dynamically-loadable display drivers. These drivers are linked into MAX at runtime and allow MAX to be efficiently optimized for the underlying display hardware.

In order to optimize the overall system throughput, the driver provides the high-level MAX code with information on what sort of operations it supports. This allows MAX to either perform some of the interactive display calculations itself, or to hand off the calculations directly to the driver.

Some driver-level decisions are speed/quality tradeoffs (e.g. point-sampled vs. mipmapped texel lookup), or affect qualitative factors regarding how the viewport images appear (e.g. anti-aliased lines). The driver has the option of letting the user make decisions about these issues through a driver-specific configuration dialog box.

The configuration dialog box can also allow a user to choose driver options which allow MAX to 'break some rules' in order to work optimally with the underlying hardware. This pertains, in particular, to the OpenGL driver, where not enough information about the underlying hardware is available through the API to have MAX make all the decisions itself.

MAX R2 comes with three built-in drivers: HEIDI, OpenGL, and Direct3D. Details about each of these drivers will be presented below. (Since the drivers are dynamically loaded, other drivers may be added over time.)

HEIDI is unique in the broad range of 3D support it provides: it supports primitives at the rasterization (device coordinate) level, at an abstract 2D coordinate level, and at the full 3D scene level. Moreover, it allows hardware display cards to accelerate the API at any of these levels.

The HEIDI driver in MAX R2 uses only the rasterization level API at this point.

Although HEIDI is, itself, customizable through dynamically-loaded, hardware-specific drivers, the only driver that ships with MAX R2 is the software z-buffer driver. This HEIDI driver is unique in that it is hardware-independent. It performs all rasterization operations using the main CPU and then the resulting image is blitted to the screen.

This driver has the following advantages:

  • It works with all Windows display hardware
  • Scene data culling is very efficient
  • Multiprocessor systems can do very efficient load balancing
  • Dual planes and incremental viewport updates both work efficiently
  • Both 8-bit and truecolor displays are supported natively, and high-color also works (by blitting a 24-bit virtual image to a high-color framebuffer)

But it also has the following disadvantages:

  • Large viewports update significantly slower than small viewports
  • Texture mapped objects render much slower than non-textured objects
  • Perspective correction of textured objects is very slow, and only low-quality texture lookup is available.
  • The main CPU(s) get overloaded by doing both the 3D transformations and the rasterization
  • The driver uses lots of main system memory.

The Microsoft Direct3D API supports both rasterization and 3D scene level calls, although in D3D Version 5 (the only version supported by MAX at this time), the 3D calls cannot be accelerated by any underlying display hardware. Thus, as with the HEIDI driver, we chose to use only the rasterization-level API calls.

This driver has the following advantages:

  • It is efficiently supported by many inexpensive 3D display cards
  • Scene data culling is very efficient
  • Texture display is usually accelerated (depends on display card), and perspective correction is usually free.
  • The driver works with high-color displays, which provide a good trade-off between display quality and memory overhead.
  • Incremental display update works efficiently
  • Overhead is very minimal, and MAX gets to 'talk' very efficiently to the underlying hardware

But it has these disadvantages:

  • Direct3D 5 currently only runs under Windows 95 (and hence there is no multi-processor support)
  • The MAX D3D driver only supports high-color displays.
  • Dual plane operations are slow (if available)
  • There can be a fairly wide variation in interactive feature set, depending on what the low-level D3D driver supports.
  • Most D3D cards are optimized for full screen, small polygon count, textured scenes (as is typical with games), so wireframe displays and other non-game-oriented features may be missing or not optimally accelerated.

The OpenGL API is quite large, but it works only at the 3D scene level. Thus, when running with this driver, MAX hands off all 3D primitives to the OpenGL driver, independent of the level of hardware acceleration actually provided by the display card.

Because of this higher level of abstraction, there are more variables that affect overall performance with this driver. A separate section will discuss how a display card's OpenGL display driver can provide the best support for MAX.

In general, the MAX OpenGL driver has the following advantages:

  • It supports geometry acceleration as well as rasterization acceleration.
  • It is tightly integrated into Windows NT, and many 3D display cards were specifically designed to accelerate OpenGL operations.
  • The API is abstract enough that MAX can easily benefit from general advances in OpenGL performance. And because of this abstraction, OpenGL implementations have all of the scene data necessary to optimize the entire 3D display process.

But it also has these disadvantages:

  • All potentially visible scene data must be transferred to the driver, and this can cause a communication bottleneck across the system bus. In particular, this slows down the display of individual primitives (as opposed to strips or polylines – e.g. MAX wireframe displays).
  • Because OpenGL was designed to support a wide variety of display systems, there is no guarantee that either incremental scene update method (partial window blits or dual planes) will work with a particular implementation of OpenGL.
  • Because lighting and texturing are restricted to OpenGL-specified semantics, there can be mismatches between MAX scene lighting and texturing and what appears in an OpenGL viewport. (This applies especially to attenuated lights and non-tiled texture display.)

Because the OpenGL API is both large and abstract, there are many ways that an OpenGL display card driver can be fully compliant with the OpenGL spec, yet not provide optimal support for MAX. This section discusses details regarding how such drivers can be optimized for best MAX performance.

Before going into the gory details, please note that MAX will run adequately with any OpenGL-compliant driver (version 1.1 or later). The details discussed here are ways to go above and beyond what the spec requires, so that MAX can provide the best possible throughput to the end user.

With that in mind, here are some implementation-specific points:

  • Since MAX is highly multi-threaded, it is absolutely imperative that the OpenGL driver be thread safe. In particular, MAX maintains one 'draw thread' per viewport (four total), and these threads create and hold on to their own OpenGL rendering contexts (OGLRC) for the entire run of MAX. In more detail, the contexts are created and made current at the beginning of the MAX session (one context per each of the four drawing threads), and the four contexts remain current in their respective threads until MAX is terminated.
  • OpenGL does not inform the application whether it is using blits (copies) or page-flipping (swaps) to update a viewport. MAX can work most efficiently if the blit model is used, since it can then repaint the viewports by reblitting the off-screen image any time another window damages the viewport region (as when the material editor, trackview, or the various floating dialogs are used throughout a typical MAX session). If MAX can't tell that the blit model is implemented, it has to assume the system is using a swap model, and this means that MAX must re-render the entire scene (four times for a four view layout!) each time the viewport display is damaged. For large scenes, this can be very slow and disruptive to the user's efficient work flow. (The swap model is more efficient for low complexity scenes, but display update gains are soon lost by the extra time required to re-render even medium-sized scenes.)
  • Based on the above, MAX requests a pixel format with the PFD_SWAP_COPY bit set, and looks to see if this bit is set in the format that is returned from OpenGL. This bit is optional in the OpenGL spec, and only serves as a hint, but if the driver implements it, MAX will start using the system optimally right from the start. (If the bit is not set, but a blit model is used by the driver, the MAX user will have to specifically change the default driver settings in the OpenGL Driver Configuration dialog box in order to see the benefits.)
  • OpenGL only requires support for 8 lights, with additional lights being optional. MAX users tend to use a lot of lights, so an optimal OpenGL driver will support more than the default of 8 lights. (Indeed, the viewport limit of 12 lights in MAX R1.2 was upped to 32 for both the HEIDI and D3D drivers in R2 because R1 users found 12 lights to be too limiting.)
  • MAX users tend to use a lot of spot lights (as opposed to omni or directional lights). Some OpenGL drivers have not optimized spot lights since they tend to be used seldom in the standard OpenGL benchmarks.
  • MAX uses standard Windows device-independent truecolor bitmaps as the source for textures and viewport backgrounds. These store the colors as BGRA (blue, green, red, alpha) packed ints, so if an OpenGL driver supports the optional, extended GL_BGRA_EXT pixel format, MAX will not have to repack the data before downloading to the driver. (The presence of this extension is indicated by the 'GL_EXT_bgra' keyword.)
  • Since MAX keeps track of screen bounding regions and can easily calculate the minimal set of objects that must be re-rendered to update the display, if a driver implements blit screen updates and the optional glAddSwapHintRectWIN extension, MAX can perform incremental scene updates through the OpenGL driver. This can be a big performance win. (The presence of this extension is indicated by the 'GL_WIN_swap_hint' keyword.)
  • The MAX dual planes update system provides an efficient way to perform absolutely minimal re-rendering for updating the viewport displays. But it cannot be adequately implemented in standard OpenGL, even if all of the common Windows NT extensions are present. Thus, MAX looks for and utilizes a custom extension (implemented through the standard OpenGL extension mechanism). In addition to the need for the display driver to implement the four extension functions, the display card (or system RAM) must have adequate space for an extra copy of both the image and z-buffers for each viewport. But, if the resources are available in fast RAM, this extension can provide an astounding performance gain to the end user under many common modeling scenarios. A precise description of this custom extension is provided below.
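Several of the points above depend on an extension keyword such as 'GL_EXT_bgra' or 'GL_WIN_swap_hint' being advertised by the driver. The sketch below shows one common way an application can test for a keyword in the space-separated extension string, which in a real program would come from glGetString(GL_EXTENSIONS). The function name `has_extension` is hypothetical, and this is not necessarily how MAX performs the check. It matches whole tokens only, so a longer keyword that merely starts with the requested name does not count.

```c
#include <string.h>

/* Hypothetical helper: return 1 if `name` appears as a complete,
 * space-delimited token in `extensions`, and 0 otherwise. */
int has_extension(const char *extensions, const char *name)
{
    size_t len;
    const char *p;

    if (extensions == NULL || name == NULL || *name == '\0')
        return 0;

    len = strlen(name);
    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* the match must start at the beginning or right after a space */
        int starts = (p == extensions) || (p[-1] == ' ');
        /* and must end at a space or at the end of the string */
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        p += len;               /* keep searching after this partial match */
    }
    return 0;
}
```

The whole-token comparison matters because a plain substring search would report a keyword as present whenever it is a prefix of some other extension name.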

The OpenGL extension describedbelow, if present, will be used by MAX to implement dual planesunder OpenGL. As with all OpenGL extensions under Windows NT, thefunctions are imported into MAX by calling wglGetProcAddress, andthe functions themselves are implemented with the __stdcallcalling convention. The presence of this extension is indicatedby the keyword 'GL_KTX_buffer_region' being present inthe string returned by glGetString(GL_EXTENSIONS).

In an optimal implementation ofthis extension, the buffer regions are stored in video RAM sothat buffer data transfers do not have to cross the system bus.Note that no data in the backing buffers is ever interpreted byMAX – it is just returned to the active image and/or Zbuffers later to restore a partially rendered scene withouthaving to actually perform any rendering. Thus, the buffered datashould be kept in the native display card format without anytranslation.

GLuint glNewBufferRegion(GLenum type)

This function creates a new buffer region and returns a handle to it. The type parameter can be one of GL_KTX_FRONT_REGION, GL_KTX_BACK_REGION, GL_KTX_Z_REGION or GL_KTX_STENCIL_REGION. These symbols are defined in the MAX gfx.h header file, but they are simply mapped to 0 through 3 in the order given above. Note that the storage of this region data is implementation specific and the pixel data is not available to the client.

void glDeleteBufferRegion(GLuint region)

This function deletes a buffer region and any associated buffer data.

void glReadBufferRegion(GLuint region, GLint x, GLint y, GLsizei width, GLsizei height)

This function reads buffer data into a region specified by the given region handle. The type of data read depends on the type of the region handle being used. All coordinates are window-based (with the origin at the lower-left, as is common with OpenGL) and attempts to read areas that are clipped by the window bounds fail silently. In MAX, x and y are always 0.

void glDrawBufferRegion(GLuint region, GLint x, GLint y, GLsizei width, GLsizei height, GLint xDest, GLint yDest)

This copies a rectangular region of data back to a display buffer. In other words, it moves previously saved data from the specified region back to its originating buffer. The type of data drawn depends on the type of the region handle being used. The rectangle specified by x, y, width, and height will always lie completely within the rectangle specified by previous calls to glReadBufferRegion. This rectangle is to be placed back into the display buffer at the location specified by xDest and yDest. Attempts to draw sub-regions outside the area of the last buffer region read will fail (silently). In MAX, xDest and yDest are always equal to x and y, respectively.
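The read/draw semantics described above can be modelled on the CPU. The following is only a sketch under stated assumptions: the names ModelRegion, model_new_region, model_read_region, model_draw_region and model_delete_region are hypothetical, the "display buffer" is a plain array of unsigned values, and the destination bounds check in the draw call is an assumption not stated in the text. A real driver would keep the data in video RAM in the card's native format.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical CPU-side model of one buffer region. */
typedef struct {
    int width, height;   /* size of the modelled window buffer, in pixels */
    unsigned *saved;     /* backing copy, laid out like the window buffer */
    int rx, ry, rw, rh;  /* rectangle captured by the last successful read */
} ModelRegion;

/* Allocates backing storage; returns 1 on success, 0 on failure. */
int model_new_region(ModelRegion *r, int width, int height)
{
    if (width <= 0 || height <= 0)
        return 0;
    r->saved = calloc((size_t)width * (size_t)height, sizeof *r->saved);
    if (r->saved == NULL)
        return 0;
    r->width = width;
    r->height = height;
    r->rx = r->ry = r->rw = r->rh = 0;
    return 1;
}

void model_delete_region(ModelRegion *r)
{
    free(r->saved);
    r->saved = NULL;
}

/* Models glReadBufferRegion: copy a rectangle from the buffer into storage. */
void model_read_region(ModelRegion *r, const unsigned *buffer,
                       int x, int y, int w, int h)
{
    int row;
    if (x < 0 || y < 0 || w <= 0 || h <= 0 ||
        x + w > r->width || y + h > r->height)
        return;  /* clipped by the window bounds: fail silently */
    for (row = 0; row < h; row++) {
        size_t off = (size_t)(y + row) * r->width + x;
        memcpy(r->saved + off, buffer + off, (size_t)w * sizeof *buffer);
    }
    r->rx = x; r->ry = y; r->rw = w; r->rh = h;
}

/* Models glDrawBufferRegion: copy a saved sub-rectangle back to the buffer. */
void model_draw_region(const ModelRegion *r, unsigned *buffer,
                       int x, int y, int w, int h, int xDest, int yDest)
{
    int row;
    if (w <= 0 || h <= 0 || x < r->rx || y < r->ry ||
        x + w > r->rx + r->rw || y + h > r->ry + r->rh)
        return;  /* outside the last read rectangle: fail silently */
    if (xDest < 0 || yDest < 0 ||
        xDest + w > r->width || yDest + h > r->height)
        return;  /* assumed: a destination outside the window also fails */
    for (row = 0; row < h; row++) {
        memcpy(buffer + (size_t)(yDest + row) * r->width + xDest,
               r->saved + (size_t)(y + row) * r->width + x,
               (size_t)w * sizeof *buffer);
    }
}
```

In the pattern the text attributes to MAX, the read covers the whole viewport (x and y equal to 0) and each later draw restores only the sub-rectangle that needs updating, with xDest and yDest equal to x and y.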

GLuint glBufferRegionEnabled(void)

This routine returns 1 (TRUE) if MAX should use the buffer region extension, and 0 (FALSE) if MAX shouldn't. This call is here so that if a single display driver supports a family of display cards with varying functionality and onboard memory, the extension can be implemented yet only used if a given display card could benefit from its use. In particular, if a given display card does not have enough memory to efficiently support the buffer region extension, then this call should return FALSE. (Even for cards with lots of memory, whether or not to enable the extension could be left up to the end-user through a configuration option available through a manufacturer's addition to the Windows tabbed Display Properties dialog. Then, those users who like to have as much video memory available for textures as possible could disable the option, or other users who work with large scene databases but not lots of textures could explicitly enable the extension.)
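To get a feel for the memory involved, the arithmetic can be sketched in C. This is an illustration only: the article does not say how a driver should make the decision, and the function names, the pixel sizes and the "enough free memory" policy below are all assumptions.

```c
#include <stddef.h>

/* Bytes needed for one extra copy of the image and Z buffers across
 * `viewports` equally sized viewports (sizes in pixels and bytes). */
size_t dual_plane_bytes(size_t width, size_t height,
                        size_t color_bytes_per_pixel,
                        size_t z_bytes_per_pixel,
                        size_t viewports)
{
    return width * height *
           (color_bytes_per_pixel + z_bytes_per_pixel) * viewports;
}

/* Hypothetical policy: report the extension as enabled only when the
 * backing copies fit in the memory the driver is willing to spend. */
int should_enable_buffer_regions(size_t free_video_bytes, size_t needed_bytes)
{
    return free_video_bytes >= needed_bytes ? 1 : 0;
}
```

For example, four 512x384 viewports with 16-bit color and a 16-bit Z buffer would need 512 * 384 * 4 * 4 = 3,145,728 bytes (3 MB) of backing storage under these assumptions.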

Notes:

Buffer region data is stored per window. Any context associated with the window can access the buffer regions for that window. Buffer regions are cleaned up on deletion of the window.

MAX uses the buffer region calls to squirrel away complete copies of each viewport's image and Z buffers. Then, when a rectangular region of the screen must be updated because 'foreground' objects have moved, that subregion is moved from 'storage' back to the image and Z buffers used for scene display. MAX then renders the objects that have moved to complete the update of the viewport display.

Because most of the early applications designed to take advantage of the Direct3D API were games, most D3D drivers are currently optimized for getting very high throughput from scenes found in typical 3D games. These scenes typically have a low polygon count and are rendered as shaded, textured polygons in a full screen display mode.

MAX, of course, is a Windows program that requires a standard Windows UI to be displayed under typical usage (Expert Mode notwithstanding!). Thus, MAX opens the primary DirectX display surface in 'cooperative' mode. Moreover, since MAX does not permit overlapping 3D viewports, we can allocate backbuffer and Z-buffer resources in a somewhat unusual, but very efficient manner: we request a single backbuffer and a single z-buffer underlying the entire primary display surface, and then we manage the individual viewport drawing regions ourselves. (This differs from both the OpenGL and HEIDI drivers -- they treat each viewport as a totally separate 3D window having its own backbuffer and z-buffer.)

All viewport updates are done by blitting from the backbuffer to the primary display surface. Since MAX tracks the scene's damaged rectangles on a per-view basis, we only blit the smallest rectangular region of a viewport that has been changed. This allows for efficient incremental scene updates.
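The article does not describe how MAX stores its damaged rectangles. As a rough sketch of the general idea, assuming each view accumulates a single bounding rectangle, the bookkeeping might look like this in C (the Rect type and function names are hypothetical):

```c
/* Half-open pixel rectangle: covers [x0, x1) horizontally, [y0, y1) vertically. */
typedef struct {
    int x0, y0, x1, y1;
} Rect;

int rect_is_empty(Rect r)
{
    return r.x1 <= r.x0 || r.y1 <= r.y0;
}

/* Smallest rectangle containing both a and b; empty inputs are ignored. */
Rect rect_union(Rect a, Rect b)
{
    Rect u;
    if (rect_is_empty(a)) return b;
    if (rect_is_empty(b)) return a;
    u.x0 = a.x0 < b.x0 ? a.x0 : b.x0;
    u.y0 = a.y0 < b.y0 ? a.y0 : b.y0;
    u.x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    u.y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    return u;
}

/* Portion of r that lies inside bounds (for example, the viewport). */
Rect rect_clip(Rect r, Rect bounds)
{
    Rect c;
    c.x0 = r.x0 > bounds.x0 ? r.x0 : bounds.x0;
    c.y0 = r.y0 > bounds.y0 ? r.y0 : bounds.y0;
    c.x1 = r.x1 < bounds.x1 ? r.x1 : bounds.x1;
    c.y1 = r.y1 < bounds.y1 ? r.y1 : bounds.y1;
    return c;
}
```

Each moved object would add its old and new screen bounds with rect_union, and the clipped result is the only area that needs to be blitted for that view.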

The MAX Direct3D driver uses DrawPrimitive calls (as opposed to execute buffers) for all primitives.

MAX handles all 3D -> 2D transformations and clipping, and does all lighting using a lazy evaluation algorithm. This means that only primitives that actually need to get rendered will be handed off to the D3D driver. Thus, all primitives are rendered with D3D clipping turned off.

In order to provide optimal support for MAX, a low-level D3D driver should provide efficient support for lines (both solid and dashed), textures (with both MODULATE and MODULATEMASK addressing modes), and blit-based screen updates (as opposed to page flipping). The driver should be 'DrawPrimitive-aware', and texture formats should support at least a 1-bit alpha channel (which MAX uses for non-tiled texture display).

It is impossible to provide a finite set of benchmarks that can definitively rank display card performance for an application as complex as MAX. Moreover, because MAX is an interactive application, raw scene throughput numbers cannot fully describe how efficient (or pleasant) it is to use MAX within a particular hardware environment. Finally, how reliable or stable a driver is, or how easy it is to upgrade, are factors that affect the overall end-user experience in a way that will never show up in frames-per-second benchmarks.

All that being true, there is no doubt that some display cards provide better throughput than other cards (for certain work styles, at least), and it is useful to have a set of test scenes for getting some rough idea of the 'bang-per-buck' for a given hardware configuration. With that in mind, the following benchmark scenes (download size: 1 megabyte) may come in handy. (If you are already a MAX user, and you are looking for the optimal display setup for accelerating the kind of work you do, I highly encourage you to make up your own set of benchmarks that are more representative of the type of work you do with MAX!)

Before starting MAX, if you add the line

ShowFPS=1

to the [Performance] section of the 3dsmax.ini file, MAX will then display 5 indented fields in the status bar prompt area at the bottom of the MAX window. The numbers that appear in these boxes are the frames-per-second (FPS) update rates for each of the four MAX viewports, together with the overall FPS number for the entire 3D viewing region. When playing an animation (typically with 'real time update' set to OFF), the FPS numbers represent the average throughput for the entire animation. (The average is restarted every 1000 frames or so to filter out the effects of singular events.)
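The averaging behaviour described above can be sketched as follows. This is not MAX's code: the FpsAverage type, the function names and the exact restart rule are assumptions, chosen only to show a running average that restarts after a fixed number of frames.

```c
/* Hypothetical running FPS average that restarts after a fixed frame count. */
typedef struct {
    double elapsed_seconds;   /* time accumulated in the current window */
    unsigned frames;          /* frames accumulated in the current window */
    unsigned restart_after;   /* window length, e.g. 1000 frames */
} FpsAverage;

void fps_init(FpsAverage *a, unsigned restart_after)
{
    a->elapsed_seconds = 0.0;
    a->frames = 0;
    a->restart_after = restart_after;
}

/* Record one rendered frame that took `frame_seconds` to draw. */
void fps_add_frame(FpsAverage *a, double frame_seconds)
{
    if (a->frames >= a->restart_after) {
        /* Start a new window so old one-off stalls stop affecting the result. */
        a->frames = 0;
        a->elapsed_seconds = 0.0;
    }
    a->frames++;
    a->elapsed_seconds += frame_seconds;
}

/* Average frames per second over the current window, or 0 if none yet. */
double fps_value(const FpsAverage *a)
{
    if (a->frames == 0 || a->elapsed_seconds <= 0.0)
        return 0.0;
    return (double)a->frames / a->elapsed_seconds;
}
```

With a window of 1000 frames, a single slow frame (a disk access, say) is averaged away within the window and dropped entirely once the next window begins.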

The FPS readout is a good indication of how fast a display card accelerates scene rendering. Alternatively, you can write a MAXScript program that will load MAX, play back a series of test scenes, and record the run times to a text log file (or you could get fancier and have the script automatically log the data to an Excel spreadsheet!).

Here is a brief description of the scenes:

  • Bnchmrk1.max: 3 objects totalling 20k triangles lit by two spotlights. All three objects are moving and none of them are clipped by the viewport. This tests raw polygon throughput for a simple scene.
  • Bnchmrk2.max: Same scene as Bnchmrk1, only the geometric complexity of each of the three objects has been increased so that the scene now totals 100k triangles. This shows how performance throughput changes for a larger scene.
  • Bnchmrk3.max: Same scene as Bnchmrk2.max, only this time at most one of the three objects is visible in the viewport at a time. This tests how well the driver deals with 'rejected' (non-visible) scene data.
  • Bnchmrk4.max: Three objects totalling 100k triangles, with only one 20k triangle object moving. This scene shows (usually rather dramatically) the benefit of having dual planes and/or incremental scene update implemented and enabled. Try playing back the animation with these options on and off (if your driver allows for both!).
  • Bnchmrk5.max: One 20k triangle textured object, never clipped by the viewports. This tests raw texturing throughput. Try this test with varying degrees of texture fidelity, if your driver supports this adjustment.
  • Bnchmrk6.max: A single textured, bending cylinder (with 5000 faces). This tests the display pipeline throughput for deforming objects. If you have another 3D application handy, you may want to compare the MAX playback speed with the speed you get from that app doing something similar!
  • TexTest1.max: A 12.4k triangle scene developed by Alan Iglesias. This scene has a camera moving inside a fully textured room. It tests clipping and texturing throughput in a scene less contrived than the others.

Please send me e-mail with any comments about this web site.

Copyright 1997, D. L. Brittain.
Last revised: January 28, 1999.




