When we work on live games (mobile or web, Games as a Service, or GaaS), we are constantly adding new features and content. We also need to evaluate whether each feature is successful, and how successful, and then make decisions about it. In this article, I will cover some characteristics of a feature to consider beyond a straightforward readout from an A/B test.
The most basic way to judge a feature’s success is to compare experiment results against objectives. Usually the goal of a feature is tied to a specific metric: a feature’s intention could be to raise ARPU, increase retention, or increase the number of virals users send. You can then run an A/B test and evaluate whether the feature met the target you set. However, a simple read of the experiment results can be misleading and doesn’t give the whole picture, so here are some other factors to consider. For this article, I assume readers already have a basic understanding of A/B testing, experiment design, and common GaaS metrics.
Often, a day or two after launching a new feature, a PM would run to me, excitedly mention that this great feature had just lifted ARPU by 10% according to the experiment readout, and ask whether we should ramp up the feature. I would say, wait and see. Sometimes, after a couple of days, the revenue bump from the feature would decay: the ARPU lift would spike at launch and then fall back toward the baseline.
Ideally, of course, a good feature has an impact curve in which the lift on the metric is sustained over time.
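One way to tell a decaying lift from a sustained one is to compare the lift in the first days after launch with the lift afterward. The sketch below is a minimal illustration, not a prescribed method: the function names, the two-day launch window, the 50% threshold, and all sample numbers are assumptions made up for this example.

```python
from statistics import mean

def daily_lift(control_arpu, treatment_arpu):
    """Relative ARPU lift of treatment over control, one value per day."""
    return [(t - c) / c for c, t in zip(control_arpu, treatment_arpu)]

def lift_is_sustained(lifts, early_days=2, min_ratio=0.5):
    """True if the average lift after the launch window keeps at least
    min_ratio of the average lift seen during the launch window.
    Needs more than early_days of data (both thresholds are assumptions)."""
    early = mean(lifts[:early_days])
    late = mean(lifts[early_days:])
    return early > 0 and late >= min_ratio * early

# Hypothetical daily ARPU in dollars for the week after launch.
control   = [0.10] * 7
decaying  = [0.11, 0.11, 0.105, 0.102, 0.101, 0.10, 0.10]
sustained = [0.11, 0.11, 0.109, 0.110, 0.111, 0.109, 0.110]

print(lift_is_sustained(daily_lift(control, decaying)))   # lift fades after day 2
print(lift_is_sustained(daily_lift(control, sustained)))  # lift holds
```

In practice you would also want confidence intervals on each day's lift before reading too much into the shape of the curve.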
There could be many reasons for the decay. One common cause is that the feature isn’t actually beneficial to users. For example, you may have launched a new power-up that some users want to try, but after trying it once or twice, they find that the power-up simply isn’t useful.
Another common reason could be that the feature design is deceptive. For example, the interface could be designed so that the button to confirm spending premium currency is very large and “easy” to click, while the cancel button is less obvious. I strongly recommend against this, because users will soon wise up to it and you will lose them in the long term.
Another reason could be that the feature is content-based and paying users burn through the content. This isn’t a bad thing, because it means the feature can be repeated.
Even if a feature’s lift doesn’t sustain, it may still be worthwhile, especially since many features that sound good on paper turn out to have no lift or even a negative impact. Look at it in terms of return on investment: let’s say a feature costs two man-weeks of resources to make, and it makes $50,000 in two days before decaying. It could still be positive ROI. An even better way to look at it is to ask whether those two man-weeks could have been better spent elsewhere.
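The ROI reasoning above can be written out as simple arithmetic. In this sketch the $50,000 and the two man-weeks come from the example; the $5,000 loaded cost per man-week and the alternative project's revenue are assumed figures chosen only to make the calculation concrete.

```python
def feature_roi(incremental_revenue, dev_weeks, cost_per_week):
    """Return (net_gain, roi), where roi is net gain divided by cost."""
    cost = dev_weeks * cost_per_week
    net_gain = incremental_revenue - cost
    return net_gain, net_gain / cost

# $50,000 and two man-weeks are from the example in the text;
# the $5,000 cost per man-week is an assumption.
net_gain, roi = feature_roi(50_000, dev_weeks=2, cost_per_week=5_000)
print(net_gain, roi)  # 40000 4.0

# Opportunity cost: a hypothetical alternative use of the same two weeks.
alternative_net_gain, _ = feature_roi(80_000, dev_weeks=2, cost_per_week=5_000)
better_spent_elsewhere = alternative_net_gain > net_gain
print(better_spent_elsewhere)  # True
```

A positive ROI says the feature paid for itself; the opportunity-cost comparison says whether it was the best use of the team's time.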
A feature’s impact at first launch may not sustain, but the feature can sometimes be repeated multiple times with slight variations or with different types of content. This is actually a good thing, because the revenue from each launch of a repeatable feature often becomes very predictable. For an executive producer (EP) who needs to project revenue for a quarter, this can be very helpful for planning a long-term roadmap.
However, even with repeatable features, users can still get bored after a while. For example, among the events we run in games, some event types are easier and cheaper to produce than others. After we ran one event type too many times, there were many user complaints and a visible decline in revenue, even though each event had a different set of content. Therefore, even if a feature or mechanic is repeatable, it is necessary to keep things varied and fresh.
If a feature didn’t move the needle on a target metric after launch, a deeper analysis sometimes reveals cannibalisation. For example, for a well-known mechanism, we might expect the feature to contribute 0.5 cents of ARPU based on past data or funnel analysis. When that overall lift doesn’t appear after launch, we look at our monetisation stack (ARPU by feature or category). This often reveals that revenue from the new feature is indeed 0.5 cents, but revenue from another category or feature has decreased. This is called cannibalisation.
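The monetisation-stack comparison described above can be sketched as a per-category difference between control and treatment. This is an illustrative sketch only: the category names and all ARPU values other than the 0.5 cents for the new feature are made up, and the helper names are hypothetical.

```python
def stack_delta(control_stack, treatment_stack):
    """ARPU change per category (treatment minus control)."""
    categories = sorted(set(control_stack) | set(treatment_stack))
    return {c: treatment_stack.get(c, 0.0) - control_stack.get(c, 0.0)
            for c in categories}

def cannibalised_amount(delta, new_feature):
    """Total ARPU lost in categories other than the new feature."""
    return -sum(d for c, d in delta.items() if c != new_feature and d < 0)

# Hypothetical ARPU by category, in cents.
control   = {"continues": 2.0, "pre_game_power_ups": 1.5}
treatment = {"continues": 1.6, "pre_game_power_ups": 1.4, "new_feature": 0.5}

delta = stack_delta(control, treatment)
lost = cannibalised_amount(delta, "new_feature")
net_lift = sum(delta.values())

print(delta)
print(round(lost, 2), round(net_lift, 2))
```

Here the new feature earns its expected 0.5 cents, but the other categories lose about the same amount, so the net ARPU lift is roughly zero.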
First of all, some cannibalisation isn’t necessarily a bad thing if overall revenue increased.
In some cases cannibalisation is expected. For example, if you have an arcade casual game that already has three pre-game power-ups, adding one more probably won’t add much revenue. However, it might still be a good thing if the new power-up is much more fun than the previous ones and adds variety.
Sometimes, there could even be negative cannibalisation. For a game I was in charge of, we had in-game power-ups, i.e. power-ups that you can buy while you are in the middle of playing a level. When we first added this mechanism with only one type of item to buy, it was quite effective at increasing revenue, it caused almost no cannibalisation of revenue from the continue mechanism, and the ARPU lift was sustained. So we decided to make it “better” by adding many more types of in-game power-ups. However, revenue for the whole category of in-game power-ups dropped. We attribute this to the ironic psychological effect that sometimes too many choices aren’t a good thing, especially when the user is trying to make fast decisions during gameplay.
The decision on whether to keep a feature isn’t dependent on whether cannibalisation will happen; it is dependent on how much cannibalisation will happen. Setting up experiments with different combinations of features will help you make an informed decision.
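An experiment over combinations of features can be laid out as a full factorial design, where each arm switches every feature on or off. The sketch below shows one possible setup; the feature names, the salt string, and the hash-based assignment are assumptions for illustration rather than a description of any particular experimentation system.

```python
import hashlib
from itertools import product

def build_arms(features):
    """Every on/off combination of the given features (full factorial)."""
    return [dict(zip(features, flags))
            for flags in product([False, True], repeat=len(features))]

def assign_arm(user_id, arms, salt="combo-test"):
    """Deterministically map a user to one arm by hashing the user id,
    so the same user always sees the same combination."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

# Hypothetical feature flags for the two mechanics being compared.
arms = build_arms(["in_game_power_ups", "continue"])
for arm in arms:
    print(arm)

print(assign_arm("user-123", arms))
```

Comparing the arm with both features on against the arms with only one on shows how much of each feature's revenue comes at the expense of the other. The number of arms doubles with every feature added, so this only stays practical for a handful of features.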
The difference between cannibalisation and a trade-off is that cannibalisation affects only one metric, say ARPU. A trade-off is an increase in one metric (e.g. ARPU) at the expense of another (e.g. retention).
Trade-offs happen more often than I’d like, especially between retention and ARPU. For example, one of the easiest revenue features is to add a pay gate to new content. If you do that, the people who love your game will mostly pay for it, often begrudgingly. However, you’ll lose non-paying users, and even paying users who are frustrated with being nickel-and-dimed.
Given that long-term retention is hard to measure in the short term, it is especially important to consider the trade-offs that could happen as you design your features and experiments.
A trade-off isn’t necessarily bad. Since my background is in engineering, I know that engineers make trade-off decisions on a daily basis. Here, too, we may have to decide to move forward with a trade-off. For example, a feature that increases retention but decreases ARPU might be a good thing, because you retain more users and that bigger user base increases your overall revenue. Sometimes the reverse is true: you increase ARPU and reduce retention, but your overall revenue still increases because the higher ARPU makes up for the lost users.
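Both directions of this trade-off come down to the same arithmetic: overall revenue is the number of active users multiplied by ARPU. The numbers in the sketch below are invented purely to illustrate that either side of the trade-off can come out ahead; ARPU is kept in whole cents to avoid rounding noise.

```python
def revenue_cents(active_users, arpu_cents):
    """Overall revenue as active users times ARPU (both hypothetical)."""
    return active_users * arpu_cents

# Hypothetical baseline: 100,000 active users at 10 cents ARPU.
baseline = revenue_cents(100_000, 10)

# Retention up, ARPU down: more users stay, each pays a little less.
more_users = revenue_cents(120_000, 9)

# ARPU up, retention down: fewer users stay, each pays more.
higher_arpu = revenue_cents(90_000, 12)

print(baseline, more_users, higher_arpu)  # 1000000 1080000 1080000
```

In both hypothetical cases overall revenue rises by 8% even though one of the two metrics went down, which is why a trade-off has to be judged on the combined effect rather than on either metric alone.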
There are situations where we may have to make hard decisions for business reasons. In live game operations, a game lead has monthly or quarterly revenue projections to meet. Building good long-term features is challenging and takes time. Say you have only five days left in a quarter and a gap against the quarterly revenue projection: you may have to use some short-term features to raise revenue.
Are there features that raise both ARPU and Retention? Absolutely. One example is level progression – it is a very old mechanism, but used very effectively in games like Candy Crush Saga. Another example is Events, which I covered in another article. Both mechanics are proven to increase retention and revenue.
As we design features, we should think about not just the short-term metric we need to increase, but also whether the feature’s impact can be sustained, whether we can repeat it, and how it affects other features and metrics, since no feature is a closed system. At the same time, even if a feature has flaws, that doesn’t mean it is all bad. Understanding the overall business goals and the big picture will help you evaluate a feature and make the right decision about whether to deploy or remove it.
Xing Wang was most recently an executive producer at Zynga, where he led teams in charge of games such as card battler Ayakashi and Ruby Blast, one of Facebook's top 25 games of 2012. Currently, he is working on a start-up. He has Bachelor's and Master's degrees in Computer Science from MIT.