In 2019, the success or failure of user acquisition advertising campaigns comes down to creative testing.
Facebook and Google’s shifts toward automation have removed most of the advantages that third-party adtech tools used to deliver. Those platforms’ automation of bid management, budget management, and audience selection has leveled the playing field even more.
But creative is still an opportunity. The algorithms can test different creative elements of ads, but they cannot create those elements. Creative is still best done by human beings.
The trouble is that the vast majority of creative fails: 95% of new creative will not beat the current control. And if an ad can’t beat the control, there’s no point in running it. It’ll only cost you money.
So the real competitive advantage lies not just in creative, but in testing creative – to identify creative winners as quickly as possible, and for the lowest amount of spend per variation.
We’ve created and tested tens of thousands of ads in the last two years. We’ve profitably managed over $1 billion in ad spend. Here’s where our biggest wins have come from:
- 60% creative testing
- 30% audience expansion
- 10% everything else
Creative is the differentiator. Creative is where the bulk of the big wins are. And from those creative wins, here’s which elements tend to win the most:
- 60% video
- 30% text
- 10% headlines and calls to action
That gives you an idea of where to start your tests. But it’s not all you need to know. Take Facebook user acquisition advertising, for example. It has several hidden challenges, including:
- Multiple strategies for testing ads – It’s nice to have choices, but they can complicate things. You can test creative on Facebook with their split-test feature, or by setting up one ad per ad set, or by setting up many ads within an ad set. Which one you pick will affect your testing results.
- Data integrity – The data for each of your tests won’t come in evenly. Some ads will get more impressions than others. The CPM for different ads and ad sets will vary. This makes for noise in the data, which makes it harder to determine the winning ad.
- Cost – Testing has an extremely high ROI, but it can also have a very high investment cost. If you don’t set up your creative testing right, it can be prohibitively expensive.
- Bias – Facebook’s algorithm prefers winning ads. And because you’ll be running your control against each new ad, the system will favor the winning ad. This skews the data even more and makes it harder to establish which ad won.
Running tests in Google Ads has many similar challenges, but it should get easier when Google App Campaigns launches “asset reporting” (expected toward the end of this July).
Perfect Versus Cost-Effective Creative Testing
Let’s take a closer look at the cost aspect of creative testing – and how to overcome it.
In classic testing, you need a 95% confidence level to declare a winner. That’s nice to have, but reaching a 95% confidence level for in-app purchases will end up costing you $20,000 per variation.
Here’s why: To reach a 95% confidence level, you’ll need about 100 purchases. With a 1% purchase rate (which is typical for gaming apps) and a $200 cost per purchase, 100 purchases at $200 each comes to $20,000 for each variation before you have enough data for that 95% confidence level.
And that’s actually the best-case scenario. Because of the way the statistics work, you’d also have to find a variation that beats the control by 25% or more for it to cost “only” $20,000. A variation that beats the control by just 5% or 10% would have to run even longer to reach a 95% confidence level.
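To see why a smaller lift takes so much longer to detect, here is a rough sketch using the textbook sample-size approximation for comparing two conversion rates. The function name, the two-sided test, and the 80% power setting are assumptions for illustration, not the method behind the figures above, so the absolute counts will not match the 100-purchase rule of thumb. What matters is the scaling: the required sample grows roughly with one over the lift squared.

```python
from statistics import NormalDist


def sample_per_arm(base_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate sample size per arm to detect a relative lift.

    Standard two-proportion approximation. The power value is an
    assumption; the article only specifies the 95% confidence level.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2


if __name__ == "__main__":
    # 1% purchase rate, as in the gaming-app example above.
    for lift in (0.25, 0.10, 0.05):
        n = sample_per_arm(0.01, lift)
        print(f"lift {lift:.0%}: about {n:,.0f} per arm")
```

Under these assumptions, a 5% lift needs more than twenty times the sample of a 25% lift, which is why small improvements are so expensive to confirm at the purchase level.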
That’s a deal killer for a lot of businesses. Few advertisers can afford to spend $20,000 per variation, especially if 95% of new creative fails to beat the control.
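The arithmetic is worth spelling out. The sketch below uses the article's figures (100 purchases, $200 per purchase, a 5% chance that a new creative beats the control). The function names are hypothetical, and the "spend per winner" figure assumes each variation has an independent 5% chance of winning, which is a simplification.

```python
def spend_per_variation(conversions_needed, cost_per_conversion):
    """Budget needed for one variation to collect enough conversions."""
    return conversions_needed * cost_per_conversion


def expected_spend_per_winner(spend, win_rate):
    """Expected spend until the first winner appears.

    Assumes independent tests, so the expected number of variations
    tested per winner is 1 / win_rate.
    """
    return spend / win_rate


if __name__ == "__main__":
    per_variation = spend_per_variation(100, 200.0)
    per_winner = expected_spend_per_winner(per_variation, 0.05)
    print(f"per variation: ${per_variation:,.0f}")
    print(f"expected per winner: ${per_winner:,.0f}")
```

With a 5% win rate you would expect to test about 20 variations per winner, so purchase-level testing implies roughly $400,000 of test spend for each new winning ad.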
So what do you do?
You move the conversion event you’re targeting up a little in the sales funnel. For mobile apps, instead of optimizing for purchases, you optimize for installs per thousand impressions (IPM). For websites, you optimize for the rate of top-funnel conversions per impression.
Why impression to action rate?
The obvious concern here is that ads with high CTRs and high conversion rates for top-funnel events may not be true winners for down-funnel conversions and ROI / ROAS. But while there is a risk of identifying false positives with this method, we’d rather take this risk than the risk and expense of optimizing for bottom-funnel metrics.
If you decided to test for bottom-funnel performance anyway:
- You would be substantially increasing the spend per variation and you’d introduce substantial risk into your portfolio’s metrics.
- Or you’d need to rely on fewer conversions to make decisions, which runs the risk of identifying false positives.
Here’s one other benefit: When we’re optimizing for IPM (installs per thousand impressions), we’re effectively optimizing for relevance score.
As you know, a higher relevance score (Quality Rank, Engagement Rank or Conversion Rank) comes with lower CPMs and access to higher-quality impressions. Ads with higher relevance scores and lower revenue per conversion will often outperform ads with lower relevance scores and higher revenue per conversion because Facebook’s algorithm is biased towards ads with higher relevance scores.
So optimizing for installs works better than optimizing for purchases on several levels. Most importantly, it means you can run tests for about $200 per variation: at roughly $2 per install, 100 installs cost $200. For many advertisers, that alone can make more testing possible. There just aren’t a lot of companies that can afford to test if it’s going to cost $20,000 per variation.
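As a minimal sketch, IPM and a variation's lift over the control can be computed as follows. The function names and the install and impression counts are made up for illustration.

```python
def ipm(installs, impressions):
    """Installs per thousand impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return installs * 1000 / impressions


def lift_over_control(variation_ipm, control_ipm):
    """Relative lift of a variation's IPM over the control's IPM."""
    return variation_ipm / control_ipm - 1


if __name__ == "__main__":
    # Hypothetical test cells with equal impression volume.
    control = ipm(installs=120, impressions=40_000)
    variation = ipm(installs=150, impressions=40_000)
    print(f"control IPM: {control:.2f}")
    print(f"variation IPM: {variation:.2f}")
    print(f"lift: {lift_over_control(variation, control):.0%}")
```

Because IPM is normalized by impressions, it stays comparable even when one ad receives more delivery than another.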
Here are a few other best practices to make the whole creative testing system work:
- Facebook’s split-testing feature – We mentioned earlier that there are several different ways to test ads, even within Facebook. Skip the other options and just use their split-testing feature.
- Always test against a top-performing control – If you don’t test every variation against your control, you’ll never know if the new ad will actually beat your control. You’ll only know how the new ad performed compared to the other new ads you tested, which doesn’t actually help.
- Only test on Facebook’s news feed – There are 14 different placements available in Facebook’s ad inventory. Testing all of them at once creates a lot of noise in the test data as each placement has different CPMs, conversion rates, and CTRs. So don’t do that. Keep the data clean and just test for the news feed.
- Optimize for app installs – This is a major lever in getting your costs down. It may not be a perfect solution, but it works well enough.
- Aim for 100 installs minimum – You need at least 100 installs to reach statistical significance. We can bend the rules of statistics a bit to find winning ads faster and cheaper, but we cannot break the rules entirely.
- Use the right audience – Use an audience that’s considered high quality so it’s representative of performance at scale, but also one that isn’t being used elsewhere in your account. This minimizes the chance that audiences used in test cells could be exposed to other ads that are running concurrently in other ad sets.
- Consistent data drives higher confidence in results – Never judge a test by its first day of results. Look at aggregate performance for variations and for stability across day-to-day results. If test data is consistent and winners and losers are no longer changing day-over-day, your test results will be far more reliable than if cumulative variation performance is still changing day by day.
The data below shows the winner changing on the last day, which indicates that more data would increase our confidence in the result. The eventual winner (the orange line) had a poor day 1 but strong days 2 and 3, so the results were still changing when the test ended.
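One way to make the "are the winners still changing?" check concrete is to track the cumulative IPM leader after each day. This is a sketch, not a Facebook feature: the function names, the two-day stability window, and the daily numbers are all assumptions chosen to mirror the pattern described above.

```python
def cumulative_leaders(daily):
    """Return the cumulative IPM leader after each day.

    daily maps a variation name to a list of (installs, impressions)
    tuples, one per day.
    """
    n_days = min(len(rows) for rows in daily.values())
    leaders = []
    for day in range(1, n_days + 1):
        def cumulative_ipm(name):
            installs = sum(i for i, _ in daily[name][:day])
            impressions = sum(m for _, m in daily[name][:day])
            return installs * 1000 / impressions
        leaders.append(max(daily, key=cumulative_ipm))
    return leaders


def is_stable(leaders, min_days=2):
    """True if the same variation has led for the last min_days days."""
    return len(leaders) >= min_days and len(set(leaders[-min_days:])) == 1


if __name__ == "__main__":
    # Made-up data: "orange" starts poorly, then has two strong days.
    daily = {
        "control": [(40, 10_000), (35, 10_000), (36, 10_000)],
        "orange": [(30, 10_000), (44, 10_000), (48, 10_000)],
    }
    leaders = cumulative_leaders(daily)
    print(leaders)
    print("stable:", is_stable(leaders))
```

In this example the leader flips on the final day, so the check reports the test as not yet stable and the sensible move is to keep collecting data.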
How to Test “Predicted Winners”
If you follow all those best practices, you may have a few new ads that have performed well (and reliably) against the control. You’ll have what we call “predicted winners.”
We know these newly-tested predicted winners performed well against the control in a limited test. What we don’t know is what will happen if we really increase their exposure and test them against purchases and for ROAS. Will they continue to perform?
To find that out, each winning variation should be launched into existing ad sets. These variations should be allowed to compete for impressions versus other top ads. This will allow us to verify whether these new predicted winners are holding up at scale.
Run through this process enough, and you’ll finally have found that precious thing… an ad that beats your control.
Pro Tips for Mobile User Acquisition Testing
- Get back to testing – Your new winners will soon fatigue and performance will deteriorate. The best way to offset performance fatigue is to replace old creative with new creative winners.
- Do more competitive creative analysis – If you’re a new media buyer (or even an experienced one), spend every bit of free time you have doing competitive creative analysis. It’s a high-return activity that will help you generate better ads to run your tests with.
- Don’t trash the near winners – Sometimes in tests, we’ll have ads that were within 10% of the control’s performance but didn’t quite beat it. We don’t just kill those “near winners.” We’ll send them back to the creative team so they can tweak those ads just a bit to improve performance.
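The near-winner rule above can be written as a small triage function. This is a sketch: the function name is hypothetical, and using IPM as the performance metric is an assumption. The 10% margin comes from the tip above.

```python
def triage(variation_ipm, control_ipm, near_margin=0.10):
    """Sort a tested ad into winner, near winner, or loser.

    A near winner did not beat the control but landed within
    near_margin (10% by default) of the control's performance.
    """
    if variation_ipm > control_ipm:
        return "winner"
    if variation_ipm >= control_ipm * (1 - near_margin):
        return "near winner"
    return "loser"


if __name__ == "__main__":
    control_ipm = 3.0  # hypothetical control performance
    for name, value in [("ad_a", 3.3), ("ad_b", 2.8), ("ad_c", 2.4)]:
        print(name, triage(value, control_ipm))
```

Ads labeled "near winner" are the ones to send back to the creative team for another iteration rather than retiring them.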
Conclusion for Creative Testing
Quantitative creative testing has one of the highest ROIs of any business activity. No matter who you are, or how much creative testing you’re doing, do more of it. It’s the single best way to improve the ROAS for your accounts.