Proxies for Market Research: What to Use When Direct Data Doesn’t Exist

Proxies for market research are indirect data sources that stand in for direct measurement when primary research is unavailable, too slow, or too expensive. Instead of surveying your target market directly, you read signals from adjacent behaviours, observable outputs, and publicly available data to infer what you cannot measure outright.

The technique is older than the discipline itself. Economists have used it for decades. Epidemiologists rely on it constantly. In marketing, it tends to get overlooked in favour of research methods that feel more rigorous, even when those methods are slower, costlier, and often no more accurate in practice.

Key Takeaways

  • Proxies are indirect signals that substitute for direct measurement when primary research is impractical, too slow, or cost-prohibitive.
  • The strongest proxies share a causal or structural relationship with the thing you are trying to measure, not just a surface-level correlation.
  • Search behaviour, job postings, pricing signals, and competitor activity are four of the most reliable proxy categories available to marketers without primary research budgets.
  • Proxy research compounds well: a single indirect signal is weak, but three or four pointing in the same direction creates defensible commercial intelligence.
  • The risk is not in using proxies; it is in using them without understanding their limitations and presenting the result as something more certain than it is.

I have run agencies where commissioning a formal research study simply was not an option. Not because the insight was unimportant, but because the timeline was three weeks and the budget was already allocated. You either find a workable alternative or you make decisions with less information than you should have. Proxies are the workable alternative.

What Makes Something a Useful Proxy

Not all indirect signals are equally useful. The difference between a good proxy and a misleading one comes down to the relationship between the signal and the thing you are trying to measure.

A good proxy has one of three relationships with your target variable. It is structurally connected, meaning the proxy is a natural byproduct of the behaviour you care about. It is causally upstream, meaning the proxy tends to precede or predict the behaviour. Or it is a close substitute, meaning it measures something so similar that the gap between them is narrow enough to be manageable.

Search volume is a strong proxy for category interest because it is structurally connected. When someone searches for a product category, that search is itself an expression of intent. You are not inferring intent from something loosely related; you are reading a direct output of the intent you care about. That is why search engine marketing intelligence has become one of the most reliable inputs for market sizing, demand mapping, and competitive positioning, even when no primary research exists.

A weak proxy, by contrast, is one where the connection is plausible but not structural. Social media follower counts as a proxy for brand preference, for instance. It feels intuitive. It is also unreliable enough that decisions built on it tend to be wrong in ways that are hard to diagnose.

The test I apply is simple: if the proxy moved significantly but the underlying behaviour did not change, how often would that happen? If the answer is “frequently,” the proxy is too loose to rely on.

The Four Proxy Categories That Actually Deliver

Across the work I have done, in agency environments, client-side projects, and turnaround situations, four proxy categories have consistently proven their worth.

Search Behaviour

Organic and paid search data is the closest thing to a real-time demand census that most marketers have access to. When I was running paid search at scale, managing campaigns across multiple verticals and geographies, the keyword data was often more revealing about market structure than any research brief I received from clients. You could see which problem framings people were using, which competitor names were being searched alongside product categories, and where demand was growing or contracting, all without a single survey.

The early days of paid search had a rawness to them that made this particularly visible. Running a campaign and watching intent data accumulate in near real-time was genuinely instructive. It showed you what people wanted, in their own language, at the moment they wanted it. That is a hard thing to replicate through any other research method.

For proxy purposes, search behaviour is most useful for measuring category-level demand, identifying problem language your audience actually uses, and tracking whether interest in a topic is growing or declining over time.
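
As a rough illustration of that third use, here is a minimal Python sketch that labels a keyword cluster as growing, declining, or flat from monthly search volumes. The volumes, the keyword cluster, and the 2% monthly threshold are illustrative assumptions, not outputs of any particular tool.

```python
# Minimal sketch: classify whether search interest in a topic is growing or
# declining, given monthly search volumes exported from a keyword tool.
# The volumes below are illustrative placeholders, not real data.

def trend_slope(volumes: list[float]) -> float:
    """Least-squares slope of volume against month index (0, 1, 2, ...)."""
    n = len(volumes)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(volumes) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, volumes))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def classify(volumes: list[float], threshold: float = 0.02) -> str:
    """Label the trend as growing, declining, or flat.

    threshold is the monthly change as a fraction of the average volume;
    2% per month is an arbitrary cut-off you would tune to your category.
    """
    slope = trend_slope(volumes)
    avg = sum(volumes) / len(volumes)
    relative = slope / avg if avg else 0.0
    if relative > threshold:
        return "growing"
    if relative < -threshold:
        return "declining"
    return "flat"

# Twelve months of (made-up) search volume for a category keyword cluster.
monthly_volume = [8100, 8300, 8000, 8600, 8900, 9100,
                  9400, 9200, 9800, 10100, 10300, 10600]
print(classify(monthly_volume))  # -> "growing"
```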

Job Posting Data

This one is underused. When a company posts a job, it reveals something about its priorities, its technology stack, its growth trajectory, and sometimes its pain points. A B2B technology company that posts five data engineering roles in a quarter is probably building out infrastructure. A retail brand hiring aggressively for CRM roles tells you something about where they think their customer retention problem sits.

For competitive intelligence and market sizing, job posting data is particularly useful because it is a leading indicator. Companies hire in advance of revenue, not after it. If you are trying to understand whether a market segment is growing, watching the hiring patterns of companies in that segment will often tell you before the financial results do.
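
As a minimal sketch of what reading hiring patterns can look like, the Python below counts postings per quarter across a segment and reports the quarter-on-quarter change. The companies, roles, and counts are illustrative placeholders; in practice the data would come from a job-board export or feed you are licensed to use.

```python
from collections import Counter

# Minimal sketch: treat hiring activity as a leading indicator by counting
# job postings per quarter across companies in a segment.
# The postings list is an illustrative placeholder, not real data.
postings = [
    {"company": "AcmeSoft", "quarter": "2024-Q1", "role": "Data Engineer"},
    {"company": "AcmeSoft", "quarter": "2024-Q2", "role": "Data Engineer"},
    {"company": "AcmeSoft", "quarter": "2024-Q2", "role": "CRM Manager"},
    {"company": "Nimbus",   "quarter": "2024-Q1", "role": "Account Executive"},
    {"company": "Nimbus",   "quarter": "2024-Q2", "role": "Account Executive"},
    {"company": "Nimbus",   "quarter": "2024-Q2", "role": "Sales Engineer"},
]

# Postings per quarter across the whole segment.
per_quarter = Counter(p["quarter"] for p in postings)

# Quarter-on-quarter change as a rough growth signal.
quarters = sorted(per_quarter)
for prev, curr in zip(quarters, quarters[1:]):
    change = (per_quarter[curr] - per_quarter[prev]) / per_quarter[prev]
    print(f"{prev} -> {curr}: {change:+.0%} postings")
    # e.g. "2024-Q1 -> 2024-Q2: +100% postings"
```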

This connects directly to the kind of work involved in building a rigorous ICP scoring rubric for B2B SaaS, where signals like hiring activity, technology adoption, and organisational structure help qualify accounts at scale without relying on direct outreach for every data point.

Pricing and Commercial Signals

How competitors price their products tells you a great deal about how they perceive the market, who they are targeting, and what they believe buyers will pay. Price points, packaging structures, discount behaviours, and the presence or absence of enterprise tiers are all readable signals about competitive positioning and market segmentation.

I have used pricing analysis as a proxy for market maturity on several occasions. In an immature market, pricing tends to be variable and experimental. As a market matures, pricing structures converge and the variance narrows. If you are entering a new category and trying to understand where the market is in its development cycle, a systematic review of competitor pricing over time can tell you more than most analyst reports.
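
A minimal sketch of that kind of review, assuming you have collected competitor list prices by year: the coefficient of variation (standard deviation over mean) gives a crude measure of how much pricing has converged. The prices below are illustrative, not real market data.

```python
from statistics import mean, pstdev

# Minimal sketch: use the spread of competitor list prices as a rough
# maturity signal. A shrinking coefficient of variation (stdev / mean)
# suggests pricing structures are converging, i.e. the market is maturing.
# All prices are illustrative placeholders.
competitor_prices_by_year = {
    2022: [49, 120, 75, 210, 39],   # wide, experimental pricing
    2023: [59, 99, 79, 149, 69],
    2024: [69, 89, 79, 109, 75],    # narrower band
}

for year, prices in sorted(competitor_prices_by_year.items()):
    cv = pstdev(prices) / mean(prices)
    print(f"{year}: coefficient of variation = {cv:.2f}")
```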

Review and Complaint Data

Public reviews, forum discussions, and complaint threads are a form of unsolicited primary research. The people writing them had no incentive to participate in a survey. They wrote because they felt strongly enough to do so, which means the signal tends to be concentrated around genuine pain points and genuine satisfactions rather than the mild, averaged-out responses you often get from structured research.

The limitation is selection bias. The people who write reviews are not a representative sample. But for understanding the texture of customer experience, the language people use to describe problems, and the specific failure modes of competitor products, review data is often more useful than anything you would get from a structured survey. This is a core component of marketing services pain point research, where the goal is to understand friction at the level of actual customer language, not sanitised survey responses.

Grey Market Data and What It Tells You

There is a category of research inputs that sits in an interesting space: not primary, not conventional secondary, but observable data from sources that are publicly accessible but not designed for research purposes. Regulatory filings, planning applications, patent registrations, import and export records, and tender documents all fall into this territory.

I think of this as grey market research, and it is consistently undervalued. A patent filing tells you what a competitor is working on before they announce it. A planning application tells you where a retailer is expanding before the press release. An import record tells you what a brand is sourcing and at what volume. None of this requires a research budget. It requires someone who knows where to look and has the patience to read documents that most people ignore.

The challenge is that grey market data requires interpretation. It is raw signal without context, and drawing the wrong inference from it can send a strategy in the wrong direction. The discipline is in triangulating: if the patent filing, the job postings, and the pricing changes all point in the same direction, the inference becomes defensible. If only one signal is present, treat it as a hypothesis rather than a finding.

For a broader view of the research methods available across the spectrum, from structured primary approaches to these more indirect techniques, the Market Research and Competitive Intel hub covers the full landscape of tools and when to use them.

When Proxies Outperform Primary Research

There is a tendency to treat primary research as the gold standard and everything else as a compromise. That framing is too simple. Proxies outperform primary research in several specific situations.

When speed matters, proxies win. A search trend analysis can be completed in hours. A focus group takes weeks to commission, run, and analyse. If a market is moving quickly and you need to make a decision within a short window, waiting for primary research is itself a strategic error. I have seen clients commission research to validate a decision they needed to make in two weeks, receive the results in six, and find that the market had already moved. The research was not wrong; it was just irrelevant by the time it arrived.

When you are measuring behaviour rather than attitudes, proxies are often more accurate. People are not always honest in surveys, and they are not always self-aware about their own behaviour. But the search they ran at 11pm, the review they left unprompted, the product they bought: these are behaviours, and behaviours are harder to fake. The gap between what people say they do and what they actually do is well-documented across consumer research, and proxies that measure behaviour directly tend to close that gap.

When the sample you need does not exist in a panel, proxies are your only option. Niche B2B markets, emerging categories, and hard-to-reach professional audiences are all situations where primary research becomes logistically impractical. You either build a proxy-based picture of the market or you go in without one.

How to Build a Proxy Research Stack

The goal is not to find one perfect proxy. It is to build a small set of complementary signals that triangulate toward the same conclusion. I typically work with three to five proxy sources for any given research question, chosen so that each one has different failure modes. If they all fail in the same direction, I have not reduced my risk; I have just created an elaborate way to be wrong.

Start by defining the specific question you are trying to answer. “Understand the market” is not a research question. “Estimate whether demand in the mid-market segment is growing or contracting over the next twelve months” is a research question. The more specific the question, the easier it is to identify proxies that are genuinely relevant rather than vaguely interesting.

Map the behaviours that would be observable if your hypothesis were true. If mid-market demand is growing, you would expect to see increasing search volume in relevant category terms, more job postings for roles associated with the problem, and potentially new entrants or pricing changes in the competitive set. Each of these is a testable signal. None of them alone is conclusive. Together, they form a picture.

Assign confidence levels to each signal explicitly. A strong structural proxy, like search volume for a category you are entering, gets high confidence. A loose associative proxy, like social media sentiment, gets low confidence. When you synthesise the signals, weight them accordingly. This sounds obvious, but in practice it is easy to let the most available data point drive the conclusion rather than the most reliable one.
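
A minimal sketch of that weighting step, with signal names, directions, and confidence weights that are illustrative judgment calls rather than outputs of a formal model:

```python
# Minimal sketch of weighting proxy signals by confidence rather than
# availability. Names, directions, and weights are illustrative assumptions.
signals = [
    # (name, direction, confidence weight)
    # direction: +1 supports the hypothesis, -1 contradicts it, 0 is neutral
    ("category search volume trend", +1, 0.8),  # structural proxy: high confidence
    ("job postings in the segment",  +1, 0.6),  # leading indicator: moderate confidence
    ("competitor pricing changes",    0, 0.5),  # ambiguous this quarter
    ("social media sentiment",       +1, 0.2),  # loose associative proxy: low confidence
]

total_weight = sum(w for _, _, w in signals)
score = sum(direction * w for _, direction, w in signals) / total_weight

print(f"weighted support for the hypothesis: {score:+.2f} (range -1 to +1)")
# Document the assumptions alongside the score: which signal carried the
# conclusion, and which assumption would have to fail for it to be wrong.
```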

Document your assumptions. The value of proxy research is not just the conclusion it produces; it is the chain of reasoning that led there. If a decision is made based on proxy intelligence and it turns out to be wrong, you want to be able to identify which assumption failed. That is how you improve the methodology over time rather than just accepting that research is unreliable.

This kind of structured approach to indirect intelligence is also relevant when you are conducting a broader strategic assessment. The frameworks used in technology consulting business strategy alignment and SWOT analysis often depend on exactly this type of observable-signal-based market reading, particularly when client data is incomplete or when competitive dynamics are shifting faster than formal research can track.

The Limits of Proxy Research and How to Manage Them

Proxies are not a free lunch. The risks are real, and the most dangerous version of proxy research is the kind where the practitioner has forgotten they are working with indirect signals and started treating the proxy as if it were the thing itself.

Selection bias is the most common failure mode. Review data over-represents people who feel strongly. Search data over-represents people who are actively searching, which is not the same as the full population of potential buyers. Job posting data reflects what companies are willing to advertise publicly, which is not always what they are actually prioritising. Every proxy has a population it captures well and a population it misses. Knowing which is which is part of using the method responsibly.

Correlation without causation is the second major risk. Two signals moving together does not mean one causes the other, and it does not mean both are caused by the thing you think they are caused by. I have seen competitive intelligence built on proxy data that turned out to be tracking a confounding variable rather than the underlying market dynamic. The conclusion looked right, the logic seemed sound, and the decision it supported was wrong.

There is also the problem of what proxy research cannot measure at all. Latent demand, for instance, is almost impossible to capture through indirect signals. If a problem exists but people have not yet started searching for solutions, there is no search volume to measure. If a competitive threat is emerging in a market segment you are not currently monitoring, the proxy signals may not appear until the threat is already established. This is one reason why focus groups and qualitative research methods retain value even in an environment where behavioural data is abundant. They can surface things that people have not yet expressed through observable behaviour.

The honest framing for proxy research is that it produces hypotheses of varying confidence, not conclusions. The discipline is in being explicit about that distinction, especially when presenting findings to stakeholders who may not share your understanding of the methodology’s limitations.

Proxies in Practice: What This Looks Like in Real Decisions

Early in my career, when I was starting to understand how commercial decisions actually get made, I noticed that the most useful market intelligence was rarely the most expensive. The people making good calls were often reading signals that were freely available; they were just reading them more carefully and more systematically than everyone else.

One of the clearest examples I can give is from a period when I was working on a campaign launch with a very short runway. The brief was to validate whether there was meaningful demand in a category before committing significant budget. There was no time for primary research and no budget for it either. What we did instead was map search volume trends across relevant keyword clusters, analyse the pricing and positioning of the three main competitors, and read through several hundred customer reviews across the competitive set. That took about two days. The picture that emerged was specific enough to make a defensible budget recommendation, and the campaign performed in line with what the proxy signals had suggested.

That is not a dramatic story, but it is an honest one. Proxy research does not usually produce breakthroughs. It produces good-enough intelligence, quickly, at low cost, which is exactly what most commercial decisions actually need.

Win/loss analysis is another area where proxy methods add significant value. When you cannot get direct access to lost prospects, the signals left in their behaviour (the content they engaged with, the competitor they chose, the timing of their decision) can often reconstruct the decision logic well enough to be actionable. Win/loss analysis as a discipline has been formalised precisely because the indirect signals around a buying decision are often more honest than the direct feedback you get from the buyer after the fact.

The broader point is that proxy research is not a fallback for when you cannot do real research. It is a legitimate methodology with its own strengths, its own limitations, and its own best practices. Treating it as a second-class option means leaving a significant amount of commercial intelligence on the table.

If you are building out a broader research and intelligence capability, the articles in the Market Research and Competitive Intel hub cover the full range of methods, from structured primary approaches to competitive monitoring, with a consistent focus on what actually produces usable insight rather than what looks most rigorous on paper.

About the Author

Keith Lacy is a marketing strategist and former agency CEO with 20+ years of experience across agency leadership, performance marketing, and commercial strategy. He writes The Marketing Juice to cut through the noise and share what works.

Frequently Asked Questions

What is a proxy in market research?
A proxy in market research is an indirect data source used to infer something you cannot measure directly. Instead of surveying buyers about their intent, for example, you might analyse search volume trends as a structural signal of demand. Good proxies share a causal or structural relationship with the variable you are trying to measure, rather than just a surface-level correlation.
When should you use proxies instead of primary research?
Proxies are most appropriate when primary research is too slow for the decision at hand, when the budget does not support commissioning formal research, when the target population is too niche or hard-to-reach for a panel, or when you need to measure actual behaviour rather than stated attitudes. They are also useful as a complement to primary research, providing real-time signals between formal research cycles.
What are the most reliable proxy sources for B2B market research?
For B2B contexts, the most reliable proxy sources include search volume data for relevant category and problem-framing keywords, job posting data as a leading indicator of company priorities and growth trajectory, competitor pricing and packaging changes as signals of market positioning, and public review data on competitor products for understanding customer pain points. Regulatory filings, patent registrations, and tender documents can also provide high-value intelligence in specific sectors.
What are the main risks of relying on proxy data?
The main risks are selection bias, where the proxy captures a non-representative subset of the population you care about; false correlation, where two signals move together for reasons unrelated to your hypothesis; and the risk of treating the proxy as equivalent to the thing it represents rather than as an approximation. The discipline is in documenting your assumptions, assigning explicit confidence levels to each signal, and triangulating across multiple proxies with different failure modes.
How many proxy sources should you use for a single research question?
Three to five proxy sources is a practical working range for most research questions. The goal is triangulation: using sources that have different failure modes so that if they all point in the same direction, the inference becomes defensible. Using a single proxy, however strong, leaves you exposed to the specific blind spots of that source. Using more than five tends to produce diminishing returns and can make the synthesis harder to communicate clearly to stakeholders.