Growth Experimentation Is Not a Process. It’s a Discipline.
Growth experimentation is the structured practice of testing hypotheses about how your business acquires, retains, and expands its customer base, using real data to decide what to scale and what to kill. Done well, it replaces opinion-driven strategy with evidence-driven iteration. Done poorly, it becomes a theatre of A/B tests that never move the needle on anything that matters.
The difference between the two is almost never about tooling or process. It’s about whether the people running experiments have a clear commercial question they’re trying to answer, and the discipline to act on what they find.
Key Takeaways
- Growth experimentation only produces value when experiments are tied to a specific commercial question, not run for their own sake.
- Most teams over-index on lower-funnel optimisation and under-invest in experiments that reach genuinely new audiences.
- Velocity matters: running ten modest experiments beats running one perfect one, because you learn faster and compound the results.
- A failed experiment is only wasted effort if you didn’t design it to teach you something in advance.
- The biggest risk in experimentation programmes is not running bad tests. It’s drawing confident conclusions from statistically weak ones.
In This Article
- Why Most Experimentation Programmes Underdeliver
- What a Good Experiment Actually Looks Like
- The Velocity Principle: Why Running More Tests Beats Running Better Tests
- Where to Experiment: Mapping Tests to the Growth Model
- The Confidence Problem: When Experiments Lie to You
- Building an Experimentation Culture Without the Cult
- From Experiment to Scale: The Step Most Teams Skip
- The Experiments Worth Running Right Now
Why Most Experimentation Programmes Underdeliver
I’ve sat in a lot of growth reviews across a lot of different businesses, and the pattern is remarkably consistent. The team has a testing roadmap. They’re running experiments. The dashboard is full of results. And yet, when you ask what has actually changed about the business as a result, the room goes quiet.
The problem is usually one of three things: the experiments are too small to matter even if they win, the results are being read selectively to confirm what the team already believed, or there's no mechanism to actually scale the things that work. Experimentation without a path to deployment is a research project, not a growth programme.
There’s also a subtler issue that I think gets less attention than it deserves. Many teams are experimenting on the wrong part of the funnel entirely. They’re running endless conversion rate tests on audiences who were already going to buy. Earlier in my career, I made exactly this mistake. I was obsessed with lower-funnel performance metrics, optimising click-through rates and cost per acquisition with real precision. It looked impressive. But I came to understand that a significant portion of what we were crediting to performance activity was demand that existed independently of our efforts. We were capturing intent, not creating it. The growth wasn’t coming from the experiments. It was coming from the market.
Real growth requires reaching people who don’t yet know they want what you’re selling. That’s a much harder problem to experiment on, but it’s the right problem.
What a Good Experiment Actually Looks Like
A well-designed growth experiment starts with a falsifiable hypothesis. Not “let’s try a new landing page” but “we believe that removing the pricing table from the homepage will increase trial sign-ups by reducing decision fatigue at the awareness stage.” That specificity matters for two reasons. First, it forces you to think clearly about the mechanism you’re testing. Second, it gives you something concrete to evaluate when the results come in.
The structure I’ve seen work consistently across different businesses and categories follows a simple logic: hypothesis, metric, minimum detectable effect, sample size, duration, decision rule. Before you run the experiment, you decide what result would cause you to act, and you commit to that in advance. This sounds obvious but it’s violated constantly in practice. Teams run tests, see ambiguous results, and then spend three weeks debating what the data “really” means. That’s not experimentation. That’s post-hoc rationalisation with extra steps.
On sample size and duration: these are where a lot of programmes quietly fall apart. Running a test for five days because the team is impatient, or calling a result at 70% statistical confidence because it’s pointing in the right direction, produces conclusions that don’t replicate. The cost isn’t just the bad decision on that specific test. It’s the erosion of trust in the whole programme over time, as “winning” experiments fail to deliver when scaled.
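To make that concrete, here's a minimal sketch of the sample-size arithmetic for a two-variant conversion test, using nothing beyond the Python standard library. It assumes a two-sided z-test on proportions, and the baseline rate, minimum detectable effect, and thresholds are illustrative assumptions, not recommendations for your business.

```python
# Minimal sample-size sketch for a two-variant conversion test,
# assuming a two-sided z-test on proportions. All inputs are illustrative.
import math
from statistics import NormalDist

def sample_size_per_arm(p_base: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per arm to detect an absolute lift of mde_abs."""
    p_var = p_base + mde_abs
    p_bar = (p_base + p_var) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 at alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 at power = 0.80
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p_base * (1 - p_base)
                                       + p_var * (1 - p_var))) ** 2
    return math.ceil(numerator / mde_abs ** 2)

# A 4% baseline and a 0.5-point minimum detectable effect needs roughly
# 25,500 visitors per arm -- which is why impatient five-day tests on
# thin traffic so rarely replicate.
print(sample_size_per_arm(0.04, 0.005))
```

The point isn't the exact number. It's that the number exists before the test starts, and the decision rule is committed against it.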
If you’re building or rebuilding an experimentation programme, the go-to-market and growth strategy thinking that sits behind your experiments matters as much as the experimental design itself. Tests without strategic context tend to optimise for the wrong things.
The Velocity Principle: Why Running More Tests Beats Running Better Tests
There’s a tendency in more methodologically rigorous teams to treat each experiment as a significant event. Extensive pre-work, careful design, long run times, formal readouts. The experiments are good. There just aren’t very many of them.
The businesses I’ve seen compound growth most effectively through experimentation tend to operate differently. They run more tests, accept that some will be inconclusive, and treat the learning from failed experiments as genuinely valuable rather than embarrassing. The compounding effect of ten modest experiments over a quarter almost always outperforms two rigorous ones, not because rigour is bad but because the information value of more data points is higher.
This is one of the core insights behind agile scaling approaches. BCG’s work on scaling agile makes a related point: organisations that build iteration into their operating rhythm, rather than treating it as a special activity, consistently outperform those that treat change as a project. The same logic applies to growth experimentation. It shouldn’t be a programme you run. It should be how you work.
The practical implication is that you need to make experimentation cheap. Cheap to propose, cheap to design, cheap to run, cheap to kill. If every test requires a project brief, a steering committee sign-off, and a dedicated sprint, you’ll run four experiments a year. If you build infrastructure that lets a two-person team spin up a test in a day, you’ll run forty. The organisations running forty will learn faster. Full stop.
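Cheap, in practice, often means deterministic. One common pattern, sketched below with hypothetical experiment and user names rather than any specific product's API, is hash-based assignment: the same user always lands in the same bucket for a given experiment, so any team can add a test without standing up an assignment database.

```python
# Deterministic variant assignment: the same user always lands in the
# same bucket for a given experiment, with no assignment state needed.
import hashlib

def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return variants[int(bucket * len(variants)) % len(variants)]

# Hypothetical two-arm test on the pricing page:
print(assign_variant("user-8421", "pricing-page-copy", ["control", "variant"]))
```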
Where to Experiment: Mapping Tests to the Growth Model
Not all experiments are created equal, and the ones most worth running depend heavily on where your actual growth constraints are. This sounds obvious, but most teams default to experimenting where it’s easiest rather than where it matters most.
A useful frame is to think about growth across three distinct problems: acquiring new customers, converting the ones you’ve acquired, and retaining and expanding the ones you already have. Each of these requires a different experimental approach, and each has a different risk profile.
Acquisition experiments are typically the hardest to run well because the feedback loops are long and the variables are many. Testing a new channel, a new audience segment, or a new creative approach requires patience and a willingness to accept ambiguous results in the short term. Market penetration strategy thinking is useful here: the question isn’t just “does this channel work” but “does this channel reach people who aren’t already reachable through our existing mix.”
Conversion experiments are faster to run and easier to measure, which is why teams over-index on them. They’re valuable, but there’s a ceiling. You can optimise a funnel to near-perfection and still have a growth problem if the top of that funnel isn’t bringing in enough of the right people.
Retention and expansion experiments are chronically underfunded in most businesses I've worked with. The economics are compelling: it costs significantly less to grow revenue from an existing customer than to acquire a new one, and retention improvements compound over time in ways that acquisition improvements don't. If your experimentation programme is 80% acquisition and conversion and 20% retention, the allocation is probably wrong.
Forrester’s work on intelligent growth models makes a similar point about the importance of balancing growth levers rather than optimising a single one. The businesses that sustain growth over time are rarely the ones that found one thing that worked and scaled it indefinitely. They’re the ones that kept testing across the full model.
The Confidence Problem: When Experiments Lie to You
I judged the Effie Awards for a period, and one of the things that experience reinforced for me was how often marketers present correlation as causation with complete confidence. Award entries are full of “we did X, and then Y happened.” The causal link is asserted, not demonstrated. The same pattern shows up in growth experimentation, and it’s worth being direct about.
Experiments can mislead you in several distinct ways. Novelty effects make new things look better than they are in the short term. Seasonality confounds results when tests run across different time periods. Selection bias distorts results when the audience seeing your test isn’t representative of the audience you’ll eventually scale to. And p-hacking, whether intentional or not, produces false positives when teams test multiple variations and report only the one that worked.
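The p-hacking point is worth quantifying. Under the simplifying assumption that the variations are independent, the chance of at least one spurious "winner" grows quickly with the number of variations you test:

```python
# Family-wise false-positive rate when k independent variations are each
# tested at alpha = 0.05 and only the best performer is reported.
alpha = 0.05
for k in (1, 3, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:>2} variations -> {fwer:.0%} chance of a spurious winner")
# Ten variations already carry a ~40% chance that something "wins" by luck.
```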
None of this means experimentation is unreliable. It means you need to be honest about the confidence level of your conclusions. A result from a two-week test with 500 conversions tells you something, but it doesn’t tell you what will happen when you scale that approach to a £2 million budget. Treating it as though it does is how growth programmes produce a string of “winning” tests that somehow don’t add up to actual growth.
The discipline I’d recommend is to explicitly categorise your experimental results: directional signal, moderate confidence, high confidence. Most tests, honestly assessed, sit in the first category. That’s fine. Directional signals are useful. They stop being useful when they’re treated as certainties.
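If it helps to make that categorisation mechanical, here's an illustrative sketch. The thresholds are my assumptions, not universal standards; tune them to your own risk tolerance.

```python
# Illustrative tiering of a test result; thresholds are assumptions,
# not standards -- adjust to your own risk tolerance.
def confidence_tier(p_value: float, reached_planned_sample: bool,
                    replicated: bool) -> str:
    if p_value < 0.01 and reached_planned_sample and replicated:
        return "high confidence"
    if p_value < 0.05 and reached_planned_sample:
        return "moderate confidence"
    return "directional signal"
```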
Building an Experimentation Culture Without the Cult
There’s a version of “experimentation culture” that I’ve seen go wrong in agencies and in-house teams alike. It becomes a philosophy rather than a practice. Teams talk about being “data-driven” and “test-and-learn” as identity markers rather than as descriptions of how they actually work. The language of experimentation becomes a substitute for the discipline of it.
When I was growing a team from around twenty people to over a hundred, one of the things I had to be deliberate about was the difference between culture and process. Culture is what happens when no one is looking. Process is what happens when someone is. You need both, but they’re not the same thing, and conflating them produces organisations that talk about experimentation without running many experiments.
The practical foundations of an experimentation culture are less exciting than the philosophy. You need a shared log of experiments run, results achieved, and conclusions drawn. You need a norm that failed experiments are discussed openly rather than quietly buried. You need decision-making authority to be clear: who can approve an experiment, who can call it, who can scale the result. And you need leadership that responds to “we ran a test and it didn’t work” with curiosity rather than disappointment.
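The shared log doesn't need to be sophisticated. As a sketch, here is the minimum set of fields I'd want recorded per experiment, expressed as a simple record type; the field names are illustrative, not a standard schema.

```python
# A minimal experiment-log record; the field set is an illustrative
# assumption of a sensible minimum, not a standard schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class ExperimentRecord:
    hypothesis: str               # the falsifiable statement being tested
    primary_metric: str           # what "winning" is measured on
    decision_rule: str            # committed before launch, e.g. "ship if +0.5pt at p<0.05"
    min_detectable_effect: float  # the smallest lift worth detecting
    planned_sample_per_arm: int   # fixed in advance, not after peeking
    start: date
    end: date
    result: str                   # what actually happened, including null results
    decision: str                 # scale / kill / rerun -- and who called it
```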
The point about leadership's response is the hardest of those foundations. I've been in rooms where a failed test was treated as a failure of the person who proposed it. That dynamic kills experimentation programmes faster than any methodological problem. People stop proposing tests that might not work, which means they stop proposing the interesting ones.
There’s a moment early in my career that I still think about. I was in a brainstorm, the founder had to leave for a client meeting, and he handed me the whiteboard pen in front of a room full of people who’d been doing this longer than I had. My immediate internal reaction was something close to panic. But I did it anyway, and the thing I learned wasn’t about the specific ideas that came out of that session. It was that you don’t wait until you’re certain before you try something. You design for learning, not for certainty, and you act on what you find.
That instinct, applied at scale, is what a real experimentation culture looks like.
From Experiment to Scale: The Step Most Teams Skip
Running experiments is the visible part of the work. Scaling what works is where the value is actually created, and it’s where most programmes have their biggest gap.
The scaling problem is partly organisational and partly methodological. Organisationally, the team that runs experiments is often not the team that controls budget allocation, channel strategy, or creative production. A result that should trigger a meaningful shift in investment instead gets written up in a report and sits in a shared drive. The experiment worked. Nothing changed.
Methodologically, scaling requires understanding not just that something worked but why it worked and under what conditions. An experiment that succeeded with a specific audience segment at a specific spend level may not replicate at ten times the budget or across a broader audience. The mechanism matters as much as the result.
This is where growth tooling can genuinely help, not because the tools create the insight but because they make it easier to track what’s actually happening as you scale a result. The risk is treating the tool as the answer rather than as infrastructure for better questions.
The businesses that get this right tend to have a deliberate “scale gate” in their process: a set of criteria that a winning experiment has to meet before meaningful budget moves behind it. Sample size thresholds. Replication in a second test. A clear mechanism that explains the result. It slows things down slightly, but it prevents the common failure mode of scaling a false positive into a significant budget commitment.
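A scale gate can be as simple as a checklist that budget owners actually enforce. Here's a minimal sketch using the criteria above, with the thresholds left as explicit assumptions to adapt.

```python
# A minimal "scale gate": budget only moves when every criterion holds.
# The criteria mirror the list above; thresholds are assumptions to adapt.
def passes_scale_gate(observed_sample_per_arm: int,
                      planned_sample_per_arm: int,
                      replicated_in_second_test: bool,
                      mechanism_documented: bool) -> bool:
    reached_sample = observed_sample_per_arm >= planned_sample_per_arm
    return (reached_sample
            and replicated_in_second_test
            and mechanism_documented)
```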
If you’re working through how experimentation fits into a broader commercial growth framework, the go-to-market and growth strategy hub covers the wider strategic context that experimentation programmes need to operate within.
The Experiments Worth Running Right Now
Without knowing your specific business, I can't tell you what to test. But I can tell you which categories of experiment are most consistently starved of investment across the businesses I've worked with and observed.
Audience expansion experiments. Most businesses have a core customer profile and optimise relentlessly for it. The question worth testing is whether there are adjacent audiences who have the same underlying need but don’t yet know your product exists. This is harder to measure than conversion rate optimisation, but the upside is proportionally larger. Forrester’s thinking on agile scaling touches on the importance of building the organisational capacity to pursue these kinds of non-obvious growth paths.
Message and positioning experiments. Most teams test creative execution without testing the underlying positioning. The result is highly optimised delivery of a message that may not be the most compelling one available. Testing whether a different value proposition resonates more strongly with a target segment is a higher-order experiment than testing button colour, but it’s also higher-order in terms of potential impact.
Channel mix experiments. The default channel mix at most businesses is largely inherited from decisions made two or three years ago. Testing whether a channel that wasn’t viable then is viable now, or whether a channel you’ve deprioritised might perform differently with a different creative approach, is worth doing systematically. Research into pipeline and revenue potential consistently points to significant untapped opportunity in channels that teams have written off based on early or outdated tests.
Pricing and packaging experiments. These are the experiments most teams are most reluctant to run, because they touch revenue directly. That reluctance is understandable but often misplaced. If you don’t know how your customers respond to different pricing structures, you’re leaving commercial optimisation entirely to intuition. The same logic that applies to creative testing applies here.
About the Author
Keith Lacy is a marketing strategist and former agency CEO with 20+ years of experience across agency leadership, performance marketing, and commercial strategy. He writes The Marketing Juice to cut through the noise and share what works.
