Web site optimization, commonly known as A/B testing, has become an expected competency among many web teams, yet there are few comprehensive and unbiased books, articles, or training opportunities aimed at individuals trying to create this capability within their organization.
In this series, I’ll present a detailed, practical guide on how to build, fine-tune, and evolve an optimization program. Part 1 will cover some basics: definitions, goals and philosophies. In Part 2, I’ll dive into a detailed process discussion covering topics such as deciding what to test, writing optimization plans, and best practices when running tests. Part 3 will finish up with communication planning, team composition, and tool selection. Let’s get started!
(See the original article on BoxesAndArrows.com)
The basics: What is web site optimization?
Web site optimization is an experimental method for testing which designs work best for your site. The basic process is simple:
- Create a few different design options, or variations, of a page/section of your website.
- Split up your web site traffic so that each visitor to the page sees either your current version (the control group) or one of these new variations.
- Keep track of which version performs better based on specific performance metrics.
The performance metrics are chosen to directly reflect your site’s business goals and might include things like how many product purchases were made on your site (a sales goal), how many people signed up for the company newsletter (an engagement goal), or how many people watched a self-help video in your FAQ section (a customer service goal). Performance metrics are often expressed as conversion rates: the percentage of visitors to a page who performed the action being tested.
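To make the process concrete, here is a minimal sketch in Python of two of the steps: splitting traffic and computing a conversion rate. The function names, the visitor ID format, and the sample numbers are all assumptions made for illustration. Real optimization tools handle assignment and tracking in their own ways.

```python
import hashlib
from typing import Sequence


def assign_variation(visitor_id: str, variations: Sequence[str]) -> str:
    """Map a visitor to one variation, always the same one for the same ID.

    Hashing the (hypothetical) visitor ID keeps the experience stable
    across repeat visits without storing any state.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variations[int(digest, 16) % len(variations)]


def conversion_rate(conversions: int, visitors: int) -> float:
    """Percentage of visitors who performed the action being tested."""
    if visitors == 0:
        return 0.0
    return 100.0 * conversions / visitors


# Hypothetical variation names: the current design plus two new options.
variations = ["control", "variation_a", "variation_b"]

print(assign_variation("visitor-123", variations))
print(conversion_rate(30, 1200))  # 30 sign-ups from 1,200 visitors -> 2.5
```

Hashing gives each variation a roughly equal share of traffic. An uneven split would need a weighted scheme instead.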
Optimization can be thought of as one component in the web site development ecosystem. Within optimization, the basic process is to analyze data, create and run tests, then implement the winners of those tests.
A/B vs. multivariate
There are two basic types of optimization tests: A/B tests (also known as A/B/N tests) and multivariate tests.
In an A/B test, you run two or more fixed design variations against each other. The variations might differ in only one individual element (such as changing the color of a button or swapping out an image for a video) or in many elements all at once (such as changing the entire page layout and design, or turning a long form into a step-by-step wizard).
In general, A/B tests are simpler to design and analyze and also return faster results since they usually contain fewer variations than multivariate tests. They seem to constitute the vast majority of manual testing that occurs these days.
Multivariate tests vary two or more attributes on a page and test which combination works best. The key difference between A/B and multivariate tests is that the latter are designed to tease apart how two or more dimensions of a design interact with each other and lead to that design’s success. In the example below, the team is trying to figure out what combination of button text and color will get the most clicks.
The simplest form of multivariate testing is called the full-factorial method, which involves testing every combination of factors against each other, as in the example above. The biggest drawback of these tests is that they generally take longer to reach statistically significant results, since you are splitting the same amount of site traffic among more variations than in a typical A/B test.
Other methods, known as fractional factorial methods, use statistics to try to interpolate the results of certain combinations, thereby reducing the traffic needed compared with testing every single variation. Many of today’s optimization tools allow you to play around with these different multivariate methods; just keep in mind that fractional factorial methods are often complex, named after deceased Japanese mathematicians, and require a degree in statistics to fully comprehend. Use at your own risk.
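A short Python sketch shows why full-factorial tests are so hungry for traffic. The button texts, colors, and daily visitor count below are hypothetical values chosen for illustration, not figures from a real test.

```python
from itertools import product

# Hypothetical factors for a button test.
button_texts = ["Buy now", "Add to cart"]
button_colors = ["green", "orange", "blue"]

# Full factorial: every text is paired with every color.
combinations = list(product(button_texts, button_colors))

daily_visitors = 6000  # hypothetical traffic to the page
per_combination = daily_visitors // len(combinations)
per_ab_variation = daily_visitors // 2  # a two-way A/B test, for comparison

for text, color in combinations:
    print(f"{color} button reading '{text}'")

print(f"{len(combinations)} combinations, {per_combination} visitors each per day")
print(f"A two-way A/B test would get {per_ab_variation} visitors per variation")
```

Two texts and three colors already produce six variations, so each one sees a third of the traffic that a simple two-way A/B test would give it.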
Why do we test? Goals, benefits, and rationale
There are many benefits of moving your organization to a more data-driven culture. Optimization establishes a metrics-based system for determining design success vs. failure, thereby allowing your team to learn with each test. No longer will people argue ad nauseam over design details. Cast away the chains of the HiPPO effect, in which the Highest Paid Person in the Office determines what goes on your site. Once you have established a clear set of goals and the appropriate metrics for measuring those goals, the data should speak as the deciding voice.
Optimization can also drastically improve your organization’s product innovation process by allowing you to test new product ideas at scale and quickly figure out which are good and which should be scrapped. In his article “How We Determine Product Success,” John Ciancutti of Netflix describes it this way:
“Innovation involves a lot of failure. If we’re never failing, we aren’t trying for something out on the edge from where we are today. In this regard, failure is perfectly acceptable at Netflix. This wouldn’t be the case if we were operating a nuclear power plant or manufacturing cars. The only real failure that’s unacceptable at Netflix is the failure to innovate.
So if you’re going to fail, fail cheaply. And know when you’ve failed vs. when you’ve gotten it right.”
Top three testing philosophies
1. Rigorously focus on metrics
I personally don’t subscribe to the philosophy that you should test every single change on your site. However, I do believe that every organization’s web strategies should be grounded in measurable goals that are mapped directly to your business goals.
For example, if management tells you that the web site should “offer the best customer service,” your job is to then determine which metrics adequately represent that conceptual goal. Maybe it can be represented by the total number of help tickets or emails answered from your site combined with a web customer satisfaction rating or the average user rating of individual question/answer pairs in your FAQ section. As Galileo supposedly said, “Measure what is measurable, and make measurable what is not so.”
Additionally, your site’s foundational architecture should allow, to the fullest extent possible, the measurement of true conversions and not simply indicators (often referred to as macro vs. micro conversions). For example, if your ecommerce site is only capable of measuring order submissions (or worse yet, leads), make it your first order of business to be able to track that order submission through to a true paid sale. Then ensure that your team always has an eye on these true conversions in addition to any intermediate steps and secondary website goals. There are many benefits of measuring micro conversion rates, but the work must be done to map them to a tangible macro conversion or you run the risk of optimizing for a false conversion goal.
2. Nobody really knows what will win
I firmly believe that even the experts can’t consistently predict the outcome of optimization tests with anything close to 100% accuracy. This is, after all, the whole point of testing. Someone with good intuition and experience will probably have a higher win rate than others, but for any individual test, anyone can be right. With this in mind, don’t let certain members of the team bully others into design submission. When in doubt, test it out.
3. Favor a “small-but-frequent” release strategy
In other words, err on the side of only changing one thing at a time, but perform the changes frequently. This strategy will allow you to pinpoint exactly which changes are affecting your site’s conversion rates. Let’s look at the earlier A/B test example to illustrate this point.
Let’s imagine that your new marketing director decides that your company should completely overhaul the homepage. After a few months of work, the team launches the new “3-column” design (above-right). Listening to the optimization voice inside your head, you decide to run an A/B test, continuing to show the old design to just 10% of the site visitors and the new design to the remaining 90%.
To your team’s dismay, the old design actually outperforms the new one. What should you do? It would be difficult to simply scrap the new design in its entirety, since it was a project that came directly from your boss and the entire team worked so hard on it. There are most likely a number of elements of the new design that actually perform better than the original, but because you launched so many changes all at once, it is difficult to separate the good from the bad.
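Before deciding what to do with a result like this, it helps to check whether the gap is bigger than chance alone would produce, especially with a lopsided 10/90 split. One common check is a two-proportion z-test. It is my choice for this sketch, not something the discussion above prescribes. The visitor and conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both designs convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conversion rate is 0% or 100% in both groups")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the old design gets 10% of traffic, the new one 90%.
z, p_value = two_proportion_z_test(conv_a=60, n_a=1000, conv_b=450, n_b=9000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value (0.05 is a common cutoff) suggests the old design's lead is unlikely to be noise. A large one suggests the test needs more traffic before anyone draws conclusions.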
A better strategy would have been to have constantly optimized different aspects of the page in small but frequent tests to gradually evolve towards a new version. This process, in combination with other research methods, would provide your team with a better foundation for performing site changes. As Jared Spool argued in his article The Quiet Death of the Major Relaunch, “the best sites have replaced this process of revolution with a new process of subtle evolution. Entire redesigns have quietly faded away with continuous improvements taking their place.”
By now you should have a strong understanding of optimization basics and may have started your own healthy internal dialogue related to philosophies and rationale. In Part 2 of this series, we’ll talk about more tactical concerns, specifically, the optimization process.