As the technical director at The Creative Assembly, I have been programming – mostly in video games – since 1979. Recently I have been helping with the graphics optimisation work on Total War: Shogun 2. The following is an overview of a single day of development on the latest Total War title.
Total War: Shogun 2 has a completely new deferred renderer with DX11 support added and many rendering system changes. With all the new features, we were left to optimise the new systems for all of our existing customers using DX9.
Movie playback performance in the front-end was poor on lower-end PCs, but fine when the same movies were played outside of the game. After the usual driver and movie player SDK version checks, I fired up the Intel Graphics Performance Analyzers (Intel GPA).
ON THE BLOCK
When running a GFX analyser, I first check that the total GPU time matches the frame rate, to confirm that what I am looking for is likely to be a GPU issue. In the GPU breakdown (see above right) I have grabbed a frame and labelled each of the larger GPU duration blocks with the part of the scene it renders.
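As a rough illustration of that first sanity check, the comparison can be written in a few lines. This is only a sketch: the function name, the 10 per cent tolerance and the sample numbers are assumptions made for the example, not values from the Shogun 2 investigation.

```python
def is_probably_gpu_bound(fps, gpu_time_ms, tolerance=0.10):
    """Return True when GPU time accounts for (nearly) the whole frame.

    fps         -- measured frame rate
    gpu_time_ms -- total GPU time for one captured frame, in milliseconds
    tolerance   -- assumed slack: how far below the frame time the GPU
                   time may fall and still count as a match
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_time_ms = 1000.0 / fps
    return gpu_time_ms >= frame_time_ms * (1.0 - tolerance)


# Made-up numbers: at 20 fps a frame lasts 50 ms.
print(is_probably_gpu_bound(20, 48.0))  # GPU busy for almost the whole frame
print(is_probably_gpu_bound(20, 15.0))  # GPU mostly idle: look at the CPU first
```

If the GPU time falls well short of the frame time, the bottleneck is more likely to be on the CPU side and a GPU frame capture will not show it.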
The movie was actually being played by code designed to play a movie in a window as part of the UI, so the scene behind it was still being rendered. This is a fairly common mistake in the industry, and it wasn't a problem until a very expensive 3D battlefield was added into the mix as a front-end backdrop.
With the 3D scene turned off during full-screen movie playback, all the movies now play well on all machines.
When thousands of trees were being displayed on the campaign map, there was a noticeable slow down.
Again, after checking driver version and GPU timing, I started the investigation. Looking at the [trees] graph, you can see that the trees take a long time to render, and that most of that time is spent in the pixel shader.
KNOCK ON WOOD
While looking at the pixel shader disassembly, I found that there were four texkill instructions. This was a very expensive instruction on the older hardware and can still be an issue on the more modern cards. I used the analyser to comment out all the texkill instructions – introduced as part of the DX11 deferred renderer – and turn on alpha test instead.
This showed a 50 per cent speed-up on the test machine, and a range from 15 per cent to 35 per cent on other machines. This was not a particularly accurate test, but it was a good enough method to convince the GFX team to then tweak the shader.
As non-imposter trees are liable to cause an overdraw issue, I then checked the overdraw for the scene in question and noted particularly high levels – and poly counts – on the trees. To address this, while the shaders were being changed, the art team remodelled the trees with fewer polygons and less overdraw while still avoiding imposters. The resulting trees looked just as good as the old ones, but reduced the GPU cost by 50 per cent.
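For readers unfamiliar with the term, overdraw is simply how many times each covered pixel ends up being shaded. The sketch below measures it on a toy render target. The rectangles standing in for leaf geometry, the function name and the data are all assumptions made for illustration; a real analyser works from the actual triangles in the captured frame.

```python
def overdraw_factor(width, height, rects):
    """Average number of times each covered pixel is drawn.

    rects -- axis-aligned rectangles (x, y, w, h) standing in for the
             screen-space footprint of each draw; a hypothetical
             simplification of real triangle coverage.
    """
    counts = [[0] * width for _ in range(height)]
    for x, y, w, h in rects:
        # Clamp each rectangle to the render target before counting.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                counts[row][col] += 1
    shaded = sum(sum(row) for row in counts)
    covered = sum(1 for row in counts for c in row if c > 0)
    return shaded / covered if covered else 0.0


# Three overlapping "leaf cards" on an 8x8 target (made-up data).
leaf_cards = [(0, 0, 4, 4), (2, 2, 4, 4), (2, 2, 4, 4)]
print(overdraw_factor(8, 8, leaf_cards))
```

A factor of 1.0 means every covered pixel was shaded exactly once; the higher the number, the more pixel shader work is being thrown away.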
ON CLOUD 800
The campaign map was still not performing in the way that we wanted it to, and a little further investigation (see above) into the issue showed 800-plus tiny draw calls. Looking at the pixels involved, the only possibility was clouds. I showed this to the GFX team lead, and he very quickly identified the Napoleon: Total War shader as the guilty code in this case.
It was both inappropriate for the Shogun 2 campaign map, and redundant as Shogun 2 clouds use a new system. With the NTW clouds removed, performance was good: 20-plus fps on one of our min-spec PCs.
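The same kind of hunt can be sketched in code: take the draw calls from a captured frame, keep the ones that touch very few pixels, and group them by shader. The data layout, the shader names and the 64-pixel cut-off below are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter


def tiny_draw_calls_by_shader(draw_calls, max_pixels=64):
    """Count draw calls that touch very few pixels, grouped by shader.

    draw_calls -- iterable of (shader_name, pixels_written) pairs, a
                  hypothetical export from a frame capture
    max_pixels -- assumed cut-off for what counts as a "tiny" draw
    """
    counts = Counter(
        shader for shader, pixels in draw_calls if pixels <= max_pixels
    )
    # Most frequent offender first.
    return counts.most_common()


# Made-up capture: a few large draws plus hundreds of tiny cloud draws.
capture = [("terrain", 250_000), ("water", 120_000), ("trees", 90_000)]
capture += [("legacy_clouds", 12)] * 800
capture += [("ui_icon", 40)] * 5

for shader, count in tiny_draw_calls_by_shader(capture):
    print(shader, count)
```

Hundreds of tiny draws sharing one shader are a strong hint that a whole system is costing far more in draw-call overhead than it contributes to the picture.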
When looking at the frame breakdown, I noticed that there was still a significant cost to the ground, and that parts of it were deep under the sea. In deep sea areas, the bottom does not affect the final render and these areas will be removed before shipping. With Shogun 2, we have many more coastal battles and these will also benefit from having the deep-sea bottom removed.
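One simple way to picture this kind of removal is a filter over ground tiles that discards any tile lying entirely below the depth at which the sea bed stops being visible. This is a sketch under stated assumptions, not a description of how the Shogun 2 terrain is actually built or stripped: the tile representation, the names and the depth threshold are all invented for the example.

```python
def visible_ground_tiles(tiles, sea_level=0.0, max_visible_depth=10.0):
    """Keep only ground tiles that can affect the final image.

    tiles             -- iterable of (tile_id, highest_point) pairs, where
                         highest_point is the tallest vertex in the tile
    sea_level         -- height of the water surface
    max_visible_depth -- assumed depth beyond which the sea bed cannot be
                         seen through the water
    """
    cutoff = sea_level - max_visible_depth
    return [tile_id for tile_id, highest in tiles if highest >= cutoff]


# Made-up heights: a land tile, a coastal tile and two deep-sea tiles.
tiles = [("hill", 35.0), ("beach", -1.5), ("trench", -80.0), ("shelf", -12.0)]
print(visible_ground_tiles(tiles))
```

Testing the highest point of each tile keeps any tile that rises into visible water, so shorelines and shallows are left untouched.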
ON THE CARDS
As part of our standard preparations for release we send our game to GFX card manufacturers and they come back to us with compatibility and performance issues. We got a couple of issues back:
As part of the move to DX11, the use of half had been replaced with float, causing a performance penalty on some hardware. The GFX team will reintroduce the use of half to some of the DX9 shaders to see if we get any benefit (the benefits may be reduced due to more texture usage than before).
An R32F format “depth” buffer was being used as part of the G-Buffer. This halves the performance of some cards, and we are looking at alternatives to improve the deferred renderer performance. As part of the work to support this, the GFX team were able to remove some z-buffer use and speed up other areas of the renderer.
The new water renderer is quite expensive on older hardware. We are currently at 20 fps on six-year-old-plus hardware, but we still have further optimisations to apply before release in March if we are to support all the min-spec configurations we would like to.