Twitter: @robkingston, https://mintmetrics.io/
...but it shouldn't be.
Let's explore common pitfalls so you can identify & fix them.
Finance client testing new CTA design. Mild styling/markup change, passed extensive QA:
<!-- Control: the original CTA -->
<button class="oldBtn">Create an account</button>
<!-- Treatment: the same CTA, rebuilt as a link -->
<a class="newBtn" href="/signup.html">Signup now</a>
After launching, the SaaS split testing tool reported
checkouts down 70%! 😧 %$#&!
Right, time for us to check what GA was reporting:
So, we went back to the SaaS tool:
And then compared GA's figures:
NB. Subjects = Visitors
The SaaS tool tracked 150% more "users" than GA: bots.
"Checkouts" dropped 70% because bots didn't recognise the new CTA.
Ecommerce client testing major redesign. Used JS redirects for a 50-50 split test that needed back-end changes.
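In essence, a JS-redirect split works something like this (the storage key and redesign URL are made up):

// Hypothetical JS-redirect split: bucket each visitor once, then
// send the treatment group to the redesigned back-end pages.
const TREATMENT_URL = "/v2/"; // hypothetical redesign path

function assignVariant(): string {
  // Persist the assignment so each visitor sees one variant only
  let variant = localStorage.getItem("exp_redesign"); // hypothetical key
  if (variant === null) {
    variant = Math.random() < 0.5 ? "control" : "treatment";
    localStorage.setItem("exp_redesign", variant);
  }
  return variant;
}

if (assignVariant() === "treatment") {
  // The redirect costs an extra page load: visitors who bail out
  // here are assigned to treatment but never reach the treatment page.
  window.location.replace(TREATMENT_URL);
}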
Load times will hurt the treatment, right?
Nope! Conversion rates were dead even...
But traffic should be split 50-50.
Why does the control group have more?
Assuming even assignment, the measured effect could be skewed by as much as -18%! A far cry from the reported +1.2%...
Use "SRM" tests & plot your assignment over time
Another ecommerce client testing a major SERP listing redesign:
Treatment was delivering a solid lift (everything stat sig). Just 1 week out from completion...
Days later, we check the results...
Time to investigate...
Errors spiked in the treatment group when a feature deployment broke our test pages.
The whole page was unusable, risking $00Ks of revenue.
Fortunately, we had protection.
Erroring users were booted from the test so they could continue browsing unhindered.
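That protection can be as simple as a global error listener that opts the visitor out (the storage keys below are hypothetical; most tools expose their own opt-out API):

// Boot erroring users out of the treatment so a broken page
// doesn't trap them.
window.addEventListener("error", () => {
  if (localStorage.getItem("exp_redesign") !== "treatment") return;
  // Pin the visitor to control for all future page views...
  localStorage.setItem("exp_redesign", "control");
  // ...and flag them so analysis can exclude their sessions.
  localStorage.setItem("exp_redesign_ejected", "1");
  // Reload into the working control experience.
  window.location.reload();
});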
On a lead-gen site, someone wanted to run a really, really important test to lift engagement.
"It's such a good idea - it's full of personalisation and has a
fantastic PIE score!"
So, we built it, QA'd it, tested it and launched the experiment...
With the experiment live, we waited for a result...
2 months passed & no cigar!
Hypothetical: Timmy wants to run a test on an ecommerce payments page (responsible for 100% of revenue).
"Let's redesign this step to make it fit our new brand..."
So, you build, test and QA it thoroughly.
On launch day, you publish it to 100% of traffic...
at 5pm on "Friyay". Job done ¯\_(ツ)_/¯
Yeah, nah.
Imagine a production config flag breaks the test.
Now, half the revenue is at risk.
PS. It's easier to launch a contentious idea to 10% traffic, too!
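A percentage gate only needs a stable hash of the visitor ID. A sketch (the FNV-1a hash and names here are illustrative, not any particular tool's API):

// Deterministically map a visitor ID to a bucket 0-99 (FNV-1a hash),
// then expose only the first N buckets to the test.
function bucketOf(visitorId: string): number {
  let h = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < visitorId.length; i++) {
    h ^= visitorId.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV-1a prime
  }
  return (h >>> 0) % 100;
}

function inRollout(visitorId: string, percent: number): boolean {
  return bucketOf(visitorId) < percent;
}

// Launch to 10% first; raise the percentage once nothing's on fire.
console.log(inRollout("visitor-123", 10));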
Hopefully now you know how easy these pitfalls are to spot and solve!
Any questions?