Every month, Jagex’s developers visit us from RuneScape’s Gielinor to talk about their latest adventures. This month, senior game engine developer Chris Knowles shares the company’s experience of more than 18 years of live title updates.
Game updates are a weekly affair at Jagex. For the last 18 years we’ve updated RuneScape every week, and for the past six years we’ve done the same for Old School RuneScape. Last year, this meant adjusting our processes to also update the iOS and Android clients for Old School RuneScape and for RuneScape (the latter currently in closed beta), so that we could keep the updates rolling across both desktop and mobile.
With that amount of experience behind us, the game update process is a comfortable part of continually expanding each game. But no matter how used we are to updating, we remain alert to the risks of an update not going to plan, or of an issue cropping up in the live game that needs to be fixed quickly. We’ve found two key measures of the quality of a launch pipeline: how safely you can launch an update, and how quickly (and safely) you can fix an issue if one arises.
As with many processes, humans are often the weak point in the system. Even with a script to follow, we’re liable to make mistakes, so the launch process should be automated as far as is sensible.
A fully automated process may be a step too far. There may be situations where a technical issue means an update needs to be aborted, and relying on automation to handle every possible failure case and stop when needed is probably unwise. A few pauses in the process, where a human can check that the universe is as it should be, make a great safety net. You probably don’t want your players downloading the new version of the client until you’re sure the new servers are up and running.
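To make the idea of automation with human checkpoints concrete, here is a minimal sketch, not Jagex’s actual tooling. It runs launch steps in order, stops on the first failure, and pauses at steps flagged as gates until an operator confirms. The `Step` type, the step names and the `confirm` callback are all hypothetical; in practice `confirm` would prompt a person rather than return a fixed answer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success
    gate: bool = False          # if True, a human must confirm before this step runs


def run_launch(steps: List[Step], confirm: Callable[[str], bool]) -> List[str]:
    """Run launch steps in order; stop on failure or when an operator declines a gate.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for step in steps:
        # Human checkpoint, e.g. "are the new servers really up?" before clients update.
        if step.gate and not confirm(step.name):
            print(f"Aborted by operator before '{step.name}'")
            break
        if not step.action():
            print(f"Step '{step.name}' failed; stopping launch")
            break
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    steps = [
        Step("deploy server build", lambda: True),
        Step("start game servers", lambda: True),
        # Players should only download the new client once the servers are confirmed healthy.
        Step("publish new client", lambda: True, gate=True),
    ]
    # Hypothetical operator who approves every gate.
    print(run_launch(steps, confirm=lambda name: True))
```

The design choice here is that everything up to a gate runs unattended, while the irreversible, player-facing step (publishing the client) waits for a human.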
But one vital question to ask yourself before pushing the first Big Red Button is: what will I do if this doesn’t work? If the deployment of the new version of your game is only partly successful, or the servers won’t start, can you redeploy the previous, working version of the game and get your players back in, or will they have to sit and stare at the login screen while you work out what’s gone wrong?
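One way to answer that question ahead of time is to make “redeploy the last good build” part of the deploy step itself. The sketch below assumes you can deploy any version by name and check server health afterwards; `deploy_with_fallback`, `deploy` and `servers_healthy` are hypothetical names. It only covers a deploy that fails before players are playing on the new version, which is the case described in the paragraph above.

```python
from typing import Callable


def deploy_with_fallback(
    new_version: str,
    previous_version: str,
    deploy: Callable[[str], bool],
    servers_healthy: Callable[[str], bool],
) -> str:
    """Try to deploy `new_version`; if deploy or startup fails, redeploy the last good build.

    Returns the version that ended up live. Raises if even the fallback fails,
    because at that point a human needs to investigate.
    """
    if deploy(new_version) and servers_healthy(new_version):
        return new_version
    # Partial or failed deploy: get players back in on the known-good build first,
    # then work out what went wrong.
    if deploy(previous_version) and servers_healthy(previous_version):
        return previous_version
    raise RuntimeError("fallback deploy failed; manual intervention required")


if __name__ == "__main__":
    # Simulate a new build whose servers never come up healthy.
    live = deploy_with_fallback(
        "build-101",
        "build-100",
        deploy=lambda v: True,
        servers_healthy=lambda v: v != "build-101",
    )
    print(live)  # build-100
```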
Even trickier is the question of what you’ll do if the update works but players find a game-breaking issue. This is where the speed at which you can do an update becomes important. When you’re just releasing the latest content, a quick process is handy but not strictly required. When you’re trying to patch a live issue, however, being able to deploy a solution quickly matters far more, particularly with player sentiment at stake.
With RuneScape, backing out an update that’s gone live and reverting to the previous version is rarely possible: players would suddenly be carrying items that didn’t exist in that version, or be left in some other broken state. What we can do instead is use our hotfix system, which lets us update content scripts without a full deploy or a game shutdown. This allows us either to fix an issue or, if need be, simply to disable the offending bit of content while we produce and test a definitive fix.
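As a rough model of what a hotfix system with a kill switch offers, the sketch below treats content scripts as plain Python functions held in a registry that can be swapped or disabled while the game keeps running. This is an illustrative assumption, not how RuneScape’s scripting works internally; the `ContentScripts` class and its methods are hypothetical.

```python
from typing import Any, Callable, Dict, Set


class ContentScripts:
    """Hypothetical registry of content scripts that can be swapped or disabled at runtime."""

    def __init__(self) -> None:
        self._scripts: Dict[str, Callable[..., Any]] = {}
        self._disabled: Set[str] = set()

    def register(self, name: str, script: Callable[..., Any]) -> None:
        self._scripts[name] = script

    def hotfix(self, name: str, script: Callable[..., Any]) -> None:
        # Replace the script in place; the next call uses the new version,
        # with no server restart or full deploy. Disabled state is left unchanged.
        self._scripts[name] = script

    def disable(self, name: str) -> None:
        # Kill switch: turn off the offending content while a proper fix is tested.
        self._disabled.add(name)

    def enable(self, name: str) -> None:
        self._disabled.discard(name)

    def run(self, name: str, *args: Any) -> Any:
        if name in self._disabled:
            return None  # content unavailable; callers treat this as "nothing happens"
        return self._scripts[name](*args)


if __name__ == "__main__":
    scripts = ContentScripts()
    scripts.register("boss_drop", lambda: 1_000_000)  # bug: drops far too much gold
    scripts.disable("boss_drop")                       # stop the damage immediately
    print(scripts.run("boss_drop"))                    # None
    scripts.hotfix("boss_drop", lambda: 100)           # tested fix
    scripts.enable("boss_drop")
    print(scripts.run("boss_drop"))                    # 100
```

Keeping `hotfix` and `enable` separate means a fix can be swapped in and checked before the content is switched back on for players.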
Our nightmare scenario for either version of RuneScape is a live issue that, even once the problem is fixed, has had a detrimental effect on the game’s integrity; perhaps players were able to gain unreasonable amounts of XP or gold in a very short time.
ROLLING IT BACK
In this scenario, the only option is to perform a save rollback: shutting the game down and restoring everyone’s state to the point just before the update. Our thorough code review process and QA testing mean that these are extremely rare events for us. Between RuneScape and Old School RuneScape, we’ve had to perform only three rollbacks in the past nine years.
If you do find yourself in this unhappy boat, it’ll be a real test of whether your backup strategy is worthy of the name. We take regular snapshots of players’ save games, so we’re able to retrieve saves from almost exactly the point at which the servers shut down for the update. For the overwhelming majority of players, the only progress lost is what they did after the faulty update went live.
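Choosing what to restore comes down to picking, for each player, the latest snapshot taken at or before the moment the servers went down for the update. The sketch below assumes snapshot timestamps are stored per player as a sorted list of numbers; `restore_point` and `plan_rollback` are hypothetical helpers, and how snapshots are actually stored and loaded is left out.

```python
import bisect
from typing import Dict, List, Optional


def restore_point(snapshot_times: List[float], cutoff: float) -> Optional[float]:
    """Return the latest snapshot taken at or before `cutoff`, or None if there is none.

    `snapshot_times` must be sorted in ascending order.
    """
    i = bisect.bisect_right(snapshot_times, cutoff)
    return snapshot_times[i - 1] if i > 0 else None


def plan_rollback(
    snapshots: Dict[str, List[float]], cutoff: float
) -> Dict[str, Optional[float]]:
    # For each player, pick the save to restore: the last one from before the faulty update.
    return {player: restore_point(times, cutoff) for player, times in snapshots.items()}


if __name__ == "__main__":
    shutdown_for_update = 1000.0
    snapshots = {"alice": [900.0, 1000.0, 1100.0], "bob": [950.0, 1200.0]}
    print(plan_rollback(snapshots, shutdown_for_update))  # {'alice': 1000.0, 'bob': 950.0}
```

A `None` result flags a player with no save from before the update (for example, an account created afterwards). Deciding what to do with those players is a policy question the sketch deliberately leaves open.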
The most important piece of advice if something goes wrong is: don’t panic. Better that the game stays offline or buggy for a little longer than that you jump to a solution and make things worse. But our ability to avoid panicking is often affected by how much we trust the systems that allow us to dig ourselves out of our current hole. Time and effort invested in an update process and, crucially, an update recovery process will pay dividends at some point. But hopefully, not for today’s update.