A/B Testing

If visitors leave a company's online store without purchasing the product or service they were looking for, this is a warning signal. A/B testing helps to get to the bottom of the problem: companies compare individual elements of their website to find the most user-friendly option and thereby counteract the loss of important leads. In this way, they can increase their conversion rate and reduce abandoned purchases, among other things.

What is A/B Testing?
5 steps to perform an A/B test
Examples for A/B tests
More test options
Challenges and possible sources of error
Frequently asked questions

What is A/B Testing?

A/B testing (also called split testing) involves testing different versions of a website, online store, newsletter or app against each other. The primary goal: to determine the better-performing version and to convert more of the existing traffic. A/B tests help to answer important questions for the marketing strategy and allow a precise analysis of individual elements. They are a statistical method and are easy to perform with the right tools.

5 steps to perform an A/B test

But how does an A/B test work? To get a comprehensive result, companies should run it over two complete business cycles. This way, buyers who first have to think about a purchase are also included, all traffic sources are integrated, and upcoming holidays do not carry too much weight. The following outlines an approach for the ecommerce sector:

Identifying optimization potential

At the beginning, companies need to define the goal of the upcoming test. Often, an A/B test is conducted to find out which page achieves the highest conversion rate. But A/B tests are also a popular marketing tool for apps or emails. In order to identify potential for improvement, the focus should be on factors that can be backed up with real data. Whether interviews, surveys, Google Analytics or eye tracking - there are many ways to get user data. These questions can help:

Which elements are important? What already works on a website and what needs improvement? If a page is hardly visited, it is not worth investing a lot of time and energy to adapt it.

At what point do customers drop off?

How can a bounce be prevented?

Creating the test groups

In the next step, all visitors to a website are divided into two test groups. These should be of equal size in order to obtain meaningful results at the end. One half is presented with variant A of the store and the other with variant B. The visitors do not know that they are part of a test, because they are shown one of the two variants at random, so the results are not distorted. For a newsletter mailing, too, targeted segmentation helps to narrow down the desired target group and exclude existing customers from the test.
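The random split described above can be sketched in a few lines. This is only an illustration under assumptions: the function name assign_variant, the experiment label and the visitor IDs are hypothetical, and hashing the visitor ID is one common way (not the only one) to keep the assignment random across visitors but stable for each returning visitor.

```python
import hashlib

def assign_variant(visitor_id, experiment="cta-test", variants=("A", "B")):
    """Map a visitor to a variant; the same visitor always gets the same variant."""
    # Hash the experiment label together with the visitor ID (both hypothetical).
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    # Turn the hash into a bucket index: 0 -> "A", 1 -> "B".
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Simulate 10,000 visitors and count how many land in each group.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"visitor-{i}")] += 1

print(counts)  # the two groups should be of roughly equal size
```

Because the assignment depends only on the visitor ID and the experiment label, a returning visitor sees the same variant on every visit.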

Tip: A sample size calculator helps to determine how large the sample needs to be per variant. A testing tool can often take care of dividing the entire target group into smaller groups.
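What such a sample size calculator does internally can be sketched as follows. This is a minimal example, assuming the standard formula for comparing two conversion rates with a two-sided test; the function name, the default values (5% significance level, 80% power) and the example rates are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate number of visitors needed per variant to detect the difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    p_bar = (baseline_rate + expected_rate) / 2    # average of both rates
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(
            baseline_rate * (1 - baseline_rate)
            + expected_rate * (1 - expected_rate)
        )
    ) ** 2
    return ceil(numerator / (expected_rate - baseline_rate) ** 2)

# Hypothetical example: conversion rate expected to rise from 5% to 6%.
n = sample_size_per_variant(0.05, 0.06)
print(n)
```

The smaller the expected difference between the variants, the more visitors are needed per group.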

Identify the variables

Which variables are examined? One example is the bounce rate, which indicates what share of visitors leave the store after viewing only a single page. Another option is to test different call-to-action buttons to see which is clicked more often ("Order now" vs. "Go to store"). Store operators then formulate a hypothesis that can be confirmed or rejected at the end of the test and thus shows whether the adjustments were successful. In most cases, several hypotheses are necessary. The following is an example hypothesis:

The bounce rate is reduced when the potential customer lands directly on the desired product page instead of on the overview page for pants.
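To check a hypothesis like this, the bounce rate has to be measured per variant. The following is a minimal sketch, assuming the common definition of bounce rate as the share of sessions with exactly one page view; the function name and the session data are made up for illustration.

```python
def bounce_rate(page_views_per_session):
    """Share of sessions in which the visitor viewed only a single page."""
    if not page_views_per_session:
        return 0.0
    bounces = sum(1 for views in page_views_per_session if views == 1)
    return bounces / len(page_views_per_session)

# Hypothetical page views per session for each landing page.
overview_landing = [1, 1, 3, 1, 2, 1, 4, 1]  # visitors land on the overview page
product_landing = [2, 1, 3, 5, 1, 2, 2, 1]   # visitors land on the product page

print(bounce_rate(overview_landing))
print(bounce_rate(product_landing))
```

With real data, far more sessions are needed before a difference between the two rates can be trusted.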

Implement the test

Once all preparations have been completed, the store operator should check which testing tool is best suited to their own site. It helps to look at the different options and compare their advantages and disadvantages. Then the test begins: an employee creates the two versions of the page and sets them live.

Whether holidays, vacations or major events such as the Olympics - all these factors can have a significant impact on the test phase. Testing tools help here by reporting how statistically reliable the results are. If no significant results are available after several weeks, the approach should be reconsidered. Accordingly, both the size of the test group and the runtime play an important role.

Data analysis and interpretation of results

First, the results should be compared with the previously defined hypotheses: do the test results correspond to the expectations? If they deviate strongly from what was expected, all factors must be examined again in detail. If variant A performs better than variant B, as expected, this is the version that companies should continue to work with. In the end, however, companies should always cross-check the better version with another test run. Now it's time to observe:

Will user behavior change in the long term or only in the short term?
Are there fewer support requests coming in?
Is customer satisfaction increasing?
How do the numbers change?

Once the test cycle is complete, it's on to testing other elements and seeing if they make a difference.
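Whether the difference between variant A and variant B is more than chance can be checked with a significance test. The following sketch assumes a two-proportion z-test, which is one common choice and not necessarily what a given testing tool uses; the function name and the visitor and conversion numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of two variants; returns rates, z-score and p-value."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value: probability of a difference at least this large by chance.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value

# Hypothetical result: 400 of 8,000 visitors converted on A, 480 of 8,000 on B.
rate_a, rate_b, z, p_value = two_proportion_z_test(400, 8000, 480, 8000)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p_value:.4f}")
```

A p-value below the chosen significance level (often 0.05) suggests that the difference is unlikely to be a coincidence; it does not replace the cross-check with another test run.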

Examples for A/B tests

Below are some examples that help to better understand the principle of A/B testing. In all tests, only one element is changed at a time so that it is possible to measure which elements are better received by the target audience. If several factors change at the same time, it is not possible to determine which change made the difference.

a) E-mail marketing

Most people regularly receive newsletters from companies whose products they have already purchased or whose newsletter they signed up for purely out of interest. Since a lot of companies use email marketing nowadays, it is difficult not to get lost in the mass of competitors. Here, it is a good idea to send the same newsletter with different subject lines, for example. Do click-bait headlines or numbers in the title work better with the desired target group?

b) Shop optimization

When visiting an online store, users expect a user-friendly environment, with fast loading times, a choice of payment systems and much more. Store operators can test different factors.

Example: Summer dresses are sold in the store. Instead of "red dress", the content creator could use the following description: "ruby red beach dress". This can be used to test what resonates better with readers.

c) Advertising campaigns

With both Google Ads and social ads, companies should stay flexible. In order to reach the right target group for their own brand, it is important to find out which version of a campaign performs better. Here, for example, cleaner ads can be tested against more playful ones to see how they affect the click and conversion rates. In addition, CTAs can be quickly adapted and tested.

More test options

In addition to classic A/B testing, there are other well-known testing options. However, these often require high traffic and are more suitable for medium-sized to large companies.

A/B/n tests

In A/B/n tests, several variants can be tested against a control version at the same time. The target group is split into smaller test groups (for example, 25% each for a control version and three variants). Testing several variants at once shows more quickly which version performs best on a page. For this type of testing to work, a company needs a lot of traffic, because every additional variant reduces the number of visitors per group.

Multivariate tests

Multivariate tests are a form of A/B testing in which not just one but several variables are examined in combination. The goal of the testing procedure is to compare combinations with each other and to identify the one that achieves the best possible conversion rate. While a multivariate test automatically covers all possible combinations of the selected elements, an A/B/n test requires someone to define every variant by hand.

There are two different types of multivariate tests:

Full factorial: This method describes the general case of multivariate testing. Here, all possible combinations of the variables are tested under the same conditions. This means that each defined combination is tested with the same number of subjects. This method makes it possible to analyze the interactions between the variables and to identify the optimal combination.

Partial factorial (also called fractional factorial): This method is used less frequently because it requires statistical derivations. Here, only a selected subset of the combinations is shown to visitors. The performance of the remaining combinations is then estimated using statistical methods. This approach makes it possible to cover a large number of combinations efficiently and still obtain meaningful results.

The choice between full-factorial and partial-factorial multivariate tests depends on various factors such as the available test size, resources and the desired level of detail for the analysis. Multivariate tests thus offer an advanced way to examine the effects of multiple variables on conversion rates and determine the best possible combination. By specifically optimizing the variables, companies can make their online marketing more effective.
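How quickly the number of combinations grows in a full factorial test can be sketched as follows. The element names, most of the values and the traffic figure are hypothetical; only the two button texts are taken from the call-to-action example earlier in this article.

```python
from itertools import product

# Elements to test and their possible values (hypothetical example).
elements = {
    "headline": ["Summer sale", "New arrivals"],
    "cta_text": ["Order now", "Go to store"],
    "image": ["model", "product only"],
}

# Full factorial: every value of every element combined with all the others.
combinations = [
    dict(zip(elements, values)) for values in product(*elements.values())
]

daily_visitors = 4000  # assumed traffic, split evenly across all combinations
visitors_per_combination = daily_visitors // len(combinations)

print(len(combinations))          # 2 * 2 * 2 = 8 combinations
print(visitors_per_combination)   # 500 visitors per combination and day
```

Each additional element or value multiplies the number of combinations, which is why full factorial tests need considerably more traffic than a simple A/B test.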

Challenges and possible sources of error

A/B tests have a significant impact on data-driven decisions. They offer potential for the continuous improvement of a company's marketing strategies and support, for example, the search for the best version of a website or an app. Even if not every test produces a positive result, each one provides further insight for future tests. The results should always be interpreted with caution so that no errors creep in. In addition, external influences and seasonality play an important role during the test phase.

Frequently asked questions

What is the difference between A/B testing and multivariate testing?

A/B testing and multivariate testing are both methods for optimizing marketing strategies, but they differ in their approach. A/B testing compares two variants (A and B), changing only one variable at a time. Multivariate testing tests multiple elements or variables simultaneously to analyze their individual and combined effects. While A/B testing is easier and faster to implement, multivariate testing allows for a more detailed analysis of the interactions between variables.

How long should an A/B test run to produce meaningful results?

The duration of an A/B test depends on various factors:

the size of the test groups
the expected effect of the changes
the traffic rate on the website

As a rule, however, tests should run long enough to obtain statistically significant results: at least one full week to account for weekday variations, and preferably two complete business cycles.
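A rough runtime estimate can be derived from these factors. The following sketch assumes that the required sample size per variant is already known (for example from a sample size calculator) and that traffic is spread evenly over the days; the function name and all numbers are hypothetical.

```python
from math import ceil

def estimated_runtime_days(sample_per_variant, variants, daily_visitors, min_days=7):
    """Days needed to reach the sample size, rounded up to full weeks."""
    days_for_sample = ceil(sample_per_variant * variants / daily_visitors)
    # Never run for less than the minimum (one week by default).
    days = max(days_for_sample, min_days)
    # Round up to whole weeks so that every weekday is covered equally often.
    return ceil(days / 7) * 7

# Hypothetical example: 8,158 visitors per variant, 2 variants, 1,500 visitors per day.
print(estimated_runtime_days(8158, 2, 1500))  # 14
```

If the estimate is shorter than two business cycles, the longer period mentioned above remains the safer choice.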

What metrics are used to evaluate an A/B test?

Selecting the right metric for evaluating an A/B test is critical to obtaining meaningful results. The metric should be closely related to the test objective and reflect the desired changes. Examples of common metrics are: