Marketing Regression Analysis: What the Numbers Tell You

Marketing regression analysis is a statistical method that quantifies the relationship between marketing inputs and business outcomes, most commonly revenue or sales volume. It separates what your marketing actually contributed from what would have happened anyway, giving you a more honest read on where your budget is working and where it is not.

Done well, it is one of the few tools in the planner’s kit that can genuinely challenge assumptions rather than confirm them. Done poorly, it produces confident-looking numbers that are wrong in ways nobody notices until the business is already in trouble.

Key Takeaways

  • Regression analysis isolates the incremental contribution of marketing spend by controlling for variables like seasonality, price changes, and competitor activity, not just correlating spend with sales.
  • Most lower-funnel attribution overstates performance because it takes credit for demand that already existed. Regression helps expose that gap.
  • The quality of your output is entirely dependent on the quality and completeness of your input data. Garbage in, confident-looking garbage out.
  • Regression is a diagnostic tool, not a decision engine. It tells you what happened, not what to do next. That interpretation still requires commercial judgement.
  • Multicollinearity, omitted variable bias, and short data windows are the three most common ways regression results mislead marketers who do not know what to look for.

Why Most Marketers Are Working With Incomplete Numbers

Early in my career I was deeply invested in performance marketing. Click-through rates, cost per acquisition, return on ad spend. The dashboards were clean, the numbers moved in the right direction, and the story they told was compelling. We were generating results. I believed it.

It took me years to properly interrogate what those numbers were actually measuring. A significant portion of what lower-funnel channels were being credited for was demand that already existed. People who were going to buy anyway, finding a slightly different path to the checkout. The attribution model called it a conversion. The business called it growth. Neither label was quite right.

Regression analysis, when applied properly, is one of the tools that forces that conversation. It does not care about last-click attribution or platform-reported ROAS. It looks at the actual relationship between what you spent and what changed in the business, after accounting for everything else that was happening at the same time.

That is a harder question to answer. It is also a more useful one.

What Regression Analysis Actually Does

At its core, regression analysis estimates the relationship between a dependent variable (usually sales, revenue, or some other business outcome) and a set of independent variables (your marketing inputs, plus everything else that influences the outcome).

In a marketing context, those independent variables typically include spend by channel, pricing data, promotional activity, seasonality indices, distribution metrics, competitor spend where available, and macroeconomic indicators. The model estimates how much each variable contributes to changes in the dependent variable, holding everything else constant.

The output is a set of coefficients. Each coefficient tells you how much the dependent variable is expected to change for a one-unit increase in that independent variable. In plain terms: if you increase TV spend by £100,000, the model estimates you will generate an additional X units of sales, all else being equal.

That “all else being equal” clause is doing a lot of work. It is also where most of the complexity lives.
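The coefficient estimation described above can be sketched in a few lines. This is a minimal illustration using numpy's least-squares solver on invented weekly data; the variable names, figures, and "true" relationship are all hypothetical, constructed purely to show how a coefficient is read.

```python
import numpy as np

# Illustrative weekly data (all figures invented for the example):
# TV spend in £000s, a price index, and an annual seasonality cycle.
rng = np.random.default_rng(42)
weeks = 104
tv = rng.uniform(0, 200, weeks)                      # TV spend per week
price = rng.uniform(90, 110, weeks)                  # price index
season = np.sin(np.arange(weeks) * 2 * np.pi / 52)   # annual cycle

# Simulated outcome: 4 extra units per £1,000 of TV, minus a price effect.
sales = 5000 + 4.0 * tv - 20.0 * price + 300 * season + rng.normal(0, 50, weeks)

# Design matrix with an intercept column; solve by ordinary least squares.
X = np.column_stack([np.ones(weeks), tv, price, season])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# coef[1] is the estimated sales uplift per extra £1,000 of TV spend,
# holding price and seasonality constant -- the "all else equal" reading.
print(coef)
```

The fitted `coef[1]` lands close to the 4.0 used to generate the data, which is exactly the kind of recovery a well-specified model should manage.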

This type of modelling sits at the more rigorous end of the go-to-market and growth strategy toolkit. It is not a quick-win tactic. It is infrastructure for making better resource allocation decisions over time.

The Three Variables That Distort Marketing Regression Results

There are a handful of technical problems that come up repeatedly in marketing regression work. Not all of them are obvious, and some are actively obscured by the confidence of the output.

Multicollinearity occurs when two or more of your independent variables are highly correlated with each other. In marketing, this is common. TV spend and digital spend often move together because budgets are allocated in planning cycles. If both increase at the same time, the model struggles to separate their individual contributions. The coefficients become unstable and the confidence intervals widen. You end up with a model that fits the historical data reasonably well but cannot reliably attribute contribution to individual channels.

Omitted variable bias is what happens when an important driver of your outcome is not included in the model. If your sales are significantly influenced by a competitor running a major promotion, and you do not have that data in the model, its effect gets absorbed into whichever of your variables happens to correlate with it. The model is not wrong in a way you can easily see. It is wrong in a way that looks plausible.

I saw this play out at an agency I ran. We were building econometric models for a retail client, and the results kept overstating the contribution of their email programme. It took several iterations to identify that the email send schedule was correlated with their seasonal peaks, which were being driven primarily by weather and school holidays. Once we included a proper seasonality variable, the email contribution dropped to something more credible.

Short data windows are the third common problem. Regression models need enough data points to identify stable relationships. In practice, many marketing teams try to run models on 12 to 18 months of weekly data. That is 52 to 78 observations, which sounds like a lot until you consider how many variables you are trying to model simultaneously. The rule of thumb is at least 10 to 15 observations per variable. Violate that and the model overfits the historical data and performs poorly as a forward-looking tool.

How Regression Fits Into Marketing Mix Modelling

Marketing mix modelling (MMM) is the most common application of regression analysis in marketing. It is the same underlying methodology, applied at scale to decompose the contribution of every element of the marketing mix to a business outcome over time.

MMM has been around since the 1960s and was used extensively by large FMCG companies long before digital attribution became the dominant conversation. The irony is that the industry largely abandoned it during the rise of digital, seduced by the apparent precision of click-level data, and is now returning to it as the limitations of digital attribution become harder to ignore.

The appeal of MMM is that it is channel-agnostic. It does not care whether you are measuring TV, paid search, out-of-home, or trade promotions. If you have spend data and outcome data over a long enough time period, you can model the contribution of any input. That makes it genuinely useful for understanding the full marketing mix rather than just the channels that happen to have good tracking.

The limitation is that it is retrospective. MMM tells you what happened. It does not tell you what will happen if you change your mix. To use it for planning, you need to make assumptions about how the relationships you have observed historically will hold in future conditions. Those assumptions are often more uncertain than the models imply.

For context on how leading organisations approach scaling analytical capabilities like this, the BCG framework on scaling agile is worth reading, not because it addresses MMM directly, but because the organisational conditions it describes (cross-functional teams, iterative cycles, tolerance for learning) apply equally to building a credible measurement capability.

Building a Regression Model: What Good Practice Looks Like

The mechanics of running a regression are not the hard part. Any competent analyst with access to R, Python, or even Excel can produce a regression output in an afternoon. The hard part is building a model that is actually telling you something true.

Good practice starts with data collection. You need at minimum two to three years of weekly data to build a model with enough statistical power to be useful. That means historical spend by channel, sales or revenue data at the same frequency, pricing history, promotional calendars, distribution data if relevant, and some proxy for competitor activity. Sourcing all of this from organisations where data lives in different systems and different teams is often the single biggest constraint on the project timeline.

Once you have the data, variable selection matters more than the choice of regression technique. You are trying to include every meaningful driver of the outcome while avoiding multicollinearity. That requires both statistical testing and commercial judgement. A variable can be statistically significant and commercially meaningless. A variable can be commercially important and statistically insignificant because the data is too noisy. Neither situation is resolved by the model alone.

Adstock transformations are a standard step in marketing regression that deserves more attention than it usually gets. Most marketing channels have a carryover effect: the impact of a spend event does not disappear immediately but decays over time. TV advertising in particular can have effects that persist for weeks. Adstock transformations model that decay by applying a geometric decay function to the spend data, capturing the idea that last week’s advertising still has some residual effect this week. Getting the decay rate right matters. Too short and you understate the channel’s contribution. Too long and you overstate it.

Saturation curves are the other transformation worth understanding. Most channels exhibit diminishing returns at high spend levels. The relationship between spend and sales is not linear. A log transformation of spend, or an S-curve specification, captures this more accurately than a linear model. This matters enormously for budget optimisation, because a linear model will always recommend putting all your budget into the highest-coefficient channel, which is almost never the right answer in practice.

What Regression Tells You About Channel Contribution

One of the most valuable outputs of a well-built regression model is a decomposition of sales: a breakdown of what proportion of your total sales volume is attributable to each variable in the model. This typically shows a base level of sales (what you would sell with no marketing at all, driven by brand equity, distribution, and organic demand) and an incremental layer driven by each marketing input.

In my experience judging the Effie Awards, the entries that stood out were almost always the ones that could demonstrate genuine incrementality rather than correlation. Anyone can show that sales went up when spend went up. The more interesting question is how much of that sales increase would have happened anyway, and how much was genuinely caused by the marketing activity.

Regression-based decomposition gives you a defensible answer to that question. It also tends to produce some uncomfortable findings. The base level of sales is often higher than marketing teams expect, which implies that a larger proportion of sales than previously assumed would happen without any marketing at all. That is a useful finding for a CFO. It is a less comfortable one for a CMO whose budget is under scrutiny.

Understanding market penetration dynamics is relevant here because the decomposition often reveals that growth is being driven more by distribution and pricing than by advertising. That does not mean advertising is not working. It means the business needs to understand the full picture of what is driving performance before making channel investment decisions.

The Limits of Regression in a Modern Media Environment

Regression analysis has real limitations, and being honest about them is part of using it well.

The fragmentation of media has made the data collection problem harder. When a brand was running TV, press, and radio, the spend data was relatively clean and the channels were distinct. Now a single campaign might involve paid social across five platforms, programmatic display, influencer partnerships, podcast sponsorships, and connected TV. Each has different data quality, different reporting conventions, and different carryover characteristics. Aggregating that into a coherent model requires significant data engineering before you even start the statistical work.

The rise of creator-led marketing adds another layer of complexity. When a brand works with creators across platforms, the spend is often structured differently (flat fees, gifting, performance tiers) and the effect is harder to isolate because it interacts with organic amplification in ways that are difficult to model cleanly. Platforms like Later have explored how creator campaigns convert, but that kind of qualitative understanding needs to sit alongside, not instead of, quantitative modelling.

There is also the question of what regression cannot measure at all. Brand equity, customer experience, product quality, and word of mouth all influence sales outcomes. Some of these can be proxied in a model (brand tracking scores as a variable, for example) but many cannot. The model captures what it can measure. The things it cannot measure do not disappear from the real world just because they are absent from the equation.

I have seen businesses make significant budget cuts based on regression outputs that showed low contribution from brand-building activity, only to watch their base sales erode over the following 18 months as the unmeasured equity effects unwound. The model was technically correct about the short-term contribution. It was silent on the long-term consequences.

Using Regression for Budget Optimisation

The practical application most marketing teams want from regression is budget optimisation: given a fixed total budget, how should it be allocated across channels to maximise the outcome?

This is where the saturation curve work becomes critical. Once you have estimated the response curve for each channel (how sales respond to increasing spend levels, including diminishing returns), you can calculate the marginal return on the next pound spent in each channel. Optimal allocation means spending up to the point where the marginal return is equal across all channels. In practice this is an iterative calculation, but the principle is straightforward.

The output is typically a set of recommended spend ranges for each channel, along with a simulated revenue outcome for different total budget levels. This is useful for planning conversations because it translates statistical outputs into commercial language. It also makes the trade-offs explicit: if you cut the total budget by 20%, the model can show you which channels to cut first (those with the flattest response curves at current spend levels) and what the expected revenue impact will be.

When I was running agencies and managing large ad spend portfolios across multiple clients, the conversations that went best were the ones where we could show clients not just what we recommended, but what the data suggested would happen if they did something different. Regression-informed optimisation makes that kind of scenario planning possible in a way that gut feel and platform-reported metrics simply cannot.

For teams building out their broader growth infrastructure, the wider go-to-market and growth strategy thinking on this site covers how measurement capabilities like this sit within a broader commercial framework.

When Regression Reveals Something the Business Does Not Want to Hear

The most valuable outputs from regression analysis are often the most uncomfortable ones. A model that confirms what everyone already believed is intellectually satisfying but commercially limited. A model that challenges the narrative is where the real value sits.

The uncomfortable finding might be that your highest-spend channel has a lower incremental contribution than assumed. It might be that pricing has a larger effect on volume than all your marketing activity combined. It might be that distribution (the number of stores stocking the product or the prominence of its placement) is driving more of your sales than any media investment.

These findings are not arguments against marketing. They are arguments for understanding what marketing is actually doing relative to everything else that drives the business. A company that genuinely understood its full sales decomposition would make very different decisions about where to invest, and where to fix things that marketing cannot fix.

Marketing is sometimes used as a blunt instrument to prop up businesses with more fundamental problems. A regression model will not tell you that directly, but it will often reveal the symptoms: low base sales that are declining over time, high promotional dependency, or diminishing returns across all channels simultaneously. Those patterns point to product, distribution, or pricing issues that no amount of media spend will resolve.

For context on how organisations in complex sectors approach this kind of market analysis, the Forrester analysis of go-to-market challenges in healthcare illustrates how even well-resourced businesses can misread what is driving their commercial performance when they rely on incomplete measurement frameworks.

Similarly, BCG’s work on biopharma go-to-market strategy shows how rigorous pre-launch analysis, including regression-based demand modelling, shapes commercial decisions in high-stakes environments where the cost of misreading the market is severe.

What Good Looks Like: Practical Standards for Marketing Regression

If you are commissioning or reviewing regression-based analysis, there are a few standards worth holding the work to.

The model should be able to explain its variables clearly, including why each was included and what it is proxying. If the analyst cannot explain why a variable is in the model in plain English, that is a warning sign.

The model should report goodness-of-fit statistics (R-squared and adjusted R-squared) alongside the coefficients, and those statistics should be interpreted honestly. An R-squared of 0.85 sounds impressive. In a model with many variables and a small dataset, it may indicate overfitting rather than genuine explanatory power.

The model should be validated out-of-sample where possible. That means holding back a portion of the data, fitting the model on the remainder, and testing whether its predictions hold on the data it has not seen. Models that fit historical data well but fail on holdout periods are not reliable for forward planning.

The outputs should be presented with uncertainty ranges, not just point estimates. A coefficient of 2.3 with a confidence interval of 0.8 to 3.8 is a very different finding from a coefficient of 2.3 with a confidence interval of 2.1 to 2.5. The first is imprecise. The second is strong. Presenting only the point estimate without the uncertainty range is a form of false precision that misleads decision-makers.

And the findings should be stress-tested against commercial common sense. If the model says that radio has a higher ROI than your entire digital programme, that is worth interrogating before acting on. It might be true. It might also be an artefact of multicollinearity or a data issue with how radio spend was recorded. Statistical outputs need commercial interpretation, not just commercial acceptance.

For teams building growth programmes grounded in evidence rather than assumption, understanding how growth strategies translate into measurable outcomes is a useful complement to the modelling work itself.

About the Author

Keith Lacy is a marketing strategist and former agency CEO with 20+ years of experience across agency leadership, performance marketing, and commercial strategy. He writes The Marketing Juice to cut through the noise and share what works.

Frequently Asked Questions

What is marketing regression analysis used for?
Marketing regression analysis is used to quantify the relationship between marketing inputs (spend by channel, promotions, pricing) and business outcomes (sales, revenue). It separates the incremental contribution of marketing from baseline sales and other external factors, giving a more accurate picture of what is actually driving performance.
How is regression analysis different from standard marketing attribution?
Standard attribution models (last-click, multi-touch) track individual customer journeys through digital touchpoints. Regression analysis works at an aggregate level, modelling the relationship between total spend and total outcomes over time. It captures the effects of offline channels and controls for external variables that attribution models ignore entirely, making it more useful for full-mix budget decisions.
How much data do you need to run a marketing regression model?
As a general rule, you need at least two to three years of weekly data and a minimum of 10 to 15 observations per independent variable in the model. Shorter data windows increase the risk of overfitting, where the model fits historical patterns well but performs poorly as a predictive or planning tool.
What is an adstock transformation in marketing regression?
An adstock transformation accounts for the fact that advertising has a carryover effect: the impact of spend does not disappear immediately but decays over time. It applies a decay function to historical spend data so the model captures residual effects from previous weeks. Getting the decay rate right is important because it directly affects how much contribution is attributed to each channel.
What are the most common mistakes in marketing regression analysis?
The most common mistakes are multicollinearity (including variables that move together, making individual contributions impossible to separate), omitted variable bias (leaving out important drivers so their effect gets absorbed by other variables), using too short a data window, presenting point estimates without uncertainty ranges, and failing to validate the model on out-of-sample data before using it for planning.
