eBay’s Experiment on Paid Search Advertising Profitability

Tutorial Session 5

Data-driven experiments are the clearest way to separate luck from impact. If you cannot test it, you cannot know it.

Talvinder Singh, from a Pragmatic Leaders data session

eBay ran a controlled experiment to understand whether its spending on paid search advertising via Google AdWords actually increased revenue. It paused paid search ads in 70 designated market areas (DMAs) across the USA and compared revenue changes there against DMAs that continued advertising.

The actual job in this experiment was to quantify the causal effect of pausing Google AdWords on eBay’s revenue. Without this, the team would be guessing whether advertising spend was justified or not.

This kind of experiment — a randomized controlled trial — is the gold standard for product teams to validate the impact of major investments. eBay’s approach shows how to combine business intuition with statistical rigor.

The stakes of paid search advertising

Paid search ads are a major acquisition channel for many digital marketplaces. Their clicks and costs are easy to measure, but their incremental impact on revenue (sales that would not have happened without the ad) is often unclear.

For eBay, the question was: Does pausing paid search ads reduce revenue, and by how much?

A meaningful revenue drop from pausing would support continued spend, as long as the revenue the ads bring in exceeds what they cost. No measurable drop would call for reallocating the budget.

Without controlled testing, teams rely on correlation or intuition — which can lead to wasteful spending.

The experiment design: pausing ads in selected geographies

eBay chose 70 DMAs in the US as the experimental group where Google AdWords spending was paused. Other DMAs continued as usual.

This geographic split allowed comparison of revenue ratios between experimental and control groups.

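Absolute revenue varies widely across DMAs because of population size and market characteristics, so comparing raw revenue between groups would mostly reflect which DMAs are bigger. Comparing each DMA's revenue after the pause with its own revenue before the pause (a revenue ratio) removes most of that size effect.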
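To see why a ratio is easier to compare than raw revenue, here is a minimal sketch. The three DMAs and their revenue figures are invented for illustration and are not eBay data. The absolute changes differ by orders of magnitude, but the ratios land on a common scale.

```python
# Hypothetical pre/post revenue for three DMAs of very different sizes.
# All numbers are invented for illustration; they are not eBay data.
dmas = {
    "large_dma": {"pre": 5_000_000.0, "post": 4_800_000.0},
    "mid_dma": {"pre": 800_000.0, "post": 770_000.0},
    "small_dma": {"pre": 90_000.0, "post": 86_000.0},
}


def revenue_ratio(pre: float, post: float) -> float:
    """Post-period revenue divided by pre-period revenue for one DMA."""
    if pre <= 0:
        raise ValueError("pre-period revenue must be positive")
    return post / pre


for name, r in dmas.items():
    absolute_change = r["post"] - r["pre"]  # dominated by DMA size
    ratio = revenue_ratio(r["pre"], r["post"])  # comparable across sizes
    print(f"{name}: absolute change {absolute_change:,.0f}, ratio {ratio:.3f}")
```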

The experiment tested the null hypothesis:

There is no difference in revenue generation after pausing Google AdWords.

Rejecting this null would mean the pause caused a significant revenue change.

Using a linear model to measure impact

To analyze the data, eBay built a linear regression model:

The dependent variable was the revenue ratio (post-pause revenue / pre-pause revenue) for each DMA.
The independent variable was a binary indicator denoting whether the DMA was in the experimental group (paused ads) or control group.
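Written compactly, the model and hypothesis look like the block below. The symbols r_i (revenue ratio of DMA i) and d_i (pause indicator) are notation introduced here for illustration. They follow the β₀, β₁, ε convention used later in this section. With ordinary least squares and a single 0/1 regressor, the estimates reduce to group means.

```latex
r_i = \beta_0 + \beta_1 d_i + \varepsilon_i, \qquad
r_i = \frac{\text{revenue}_i^{\text{post}}}{\text{revenue}_i^{\text{pre}}}, \qquad
d_i \in \{0, 1\} \ (1 = \text{ads paused})

H_0 : \beta_1 = 0

\hat{\beta}_0 = \bar{r}_{\text{control}}, \qquad
\hat{\beta}_1 = \bar{r}_{\text{paused}} - \bar{r}_{\text{control}}
```

A negative β̂₁ means paused DMAs ended the period with a lower revenue ratio than control DMAs.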

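Because the only predictor is a binary indicator, the coefficient on that indicator is simply the difference in average revenue ratio between paused and control DMAs. Random assignment of DMAs, rather than additional control variables, is what lets that difference be attributed to pausing ads.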
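The following sketch checks that equivalence on synthetic data. The 70 treated DMAs match the text. The control count, means, and spread are assumptions chosen for illustration and are not eBay's figures. It fits the regression with NumPy's least-squares solver and confirms that the slope equals the difference in group means.

```python
import numpy as np

# Synthetic revenue ratios (post / pre); illustrative only, not eBay data.
rng = np.random.default_rng(seed=0)
n_control, n_treated = 140, 70
control = rng.normal(loc=1.00, scale=0.05, size=n_control)
treated = rng.normal(loc=0.95, scale=0.05, size=n_treated)

ratios = np.concatenate([control, treated])
# d_i = 1 if the DMA had ads paused (treatment), 0 otherwise
paused = np.concatenate([np.zeros(n_control), np.ones(n_treated)])

# Design matrix: intercept column plus the binary treatment indicator
X = np.column_stack([np.ones_like(paused), paused])
beta, *_ = np.linalg.lstsq(X, ratios, rcond=None)
beta0, beta1 = beta

print(f"beta0 (control mean ratio): {beta0:.4f}")
print(f"beta1 (paused minus control): {beta1:.4f}")

# With a single binary regressor, OLS reproduces the difference in group means
assert np.isclose(beta0, control.mean())
assert np.isclose(beta1, treated.mean() - control.mean())
```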

The estimated coefficient was negative: paused DMAs had a revenue ratio roughly 5% lower than control DMAs.

This means pausing ads led to an average revenue drop of about 5% in those DMAs, relative to DMAs where ads continued.

Accounting for randomness: randomization testing

A single linear model estimate can be misleading if the observed effect is due to chance.

To address this, eBay applied a randomization test:

They shuffled the experimental group labels randomly.
For each shuffle, they re-ran the linear model.
They repeated this 10,000 times to build a distribution of revenue differences under the null hypothesis.
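Those steps can be sketched as a two-sided permutation test. The helper names are hypothetical, and the demo data is synthetic rather than eBay's. The statistic is the difference in mean revenue ratio, which equals the regression slope on a 0/1 indicator, so refitting the one-variable model on each shuffle would give the same numbers.

```python
import numpy as np


def difference_in_means(outcome: np.ndarray, is_treated: np.ndarray) -> float:
    """Treated mean minus control mean; equals the OLS slope on a 0/1 indicator."""
    return outcome[is_treated].mean() - outcome[~is_treated].mean()


def permutation_test(outcome, is_treated, n_permutations=10_000, seed=0):
    """Two-sided randomization test of 'no treatment effect'.

    Shuffles the group labels, recomputes the statistic each time, and
    reports how often a shuffled statistic is at least as extreme as the
    observed one.
    """
    outcome = np.asarray(outcome, dtype=float)
    is_treated = np.asarray(is_treated, dtype=bool)
    rng = np.random.default_rng(seed)

    observed = difference_in_means(outcome, is_treated)
    null_stats = np.empty(n_permutations)
    for k in range(n_permutations):
        shuffled = rng.permutation(is_treated)  # same group sizes, new labels
        null_stats[k] = difference_in_means(outcome, shuffled)

    # Add-one correction so the p-value is never reported as exactly zero
    extreme = np.sum(np.abs(null_stats) >= abs(observed))
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value, null_stats


# Synthetic demo: 140 control DMAs, 70 paused DMAs with a lower mean ratio
rng = np.random.default_rng(1)
ratios = np.concatenate([rng.normal(1.00, 0.05, 140), rng.normal(0.95, 0.05, 70)])
paused = np.concatenate([np.zeros(140, dtype=bool), np.ones(70, dtype=bool)])
observed, p, _ = permutation_test(ratios, paused)
print(f"observed difference {observed:.4f}, permutation p-value {p:.4f}")
```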

The observed 5% difference was compared against this distribution.

The chance of seeing such a difference by random luck was very low, indicating statistical significance.

Interpreting the p-value and business impact

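The p-value is the probability of observing an effect at least as large as the one measured, assuming the null hypothesis is true. In a randomization test, it is the share of shuffled runs that produce a difference at least as extreme as the observed one.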

A very small p-value means the observed revenue drop is unlikely to be a coincidence.

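This gave eBay confidence that its Google AdWords spending materially increased revenue. To call the spend a good decision, that incremental revenue also has to exceed what the ads cost, which is the profitability question the experiment was meant to inform.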
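Whether paid search pays for itself depends on comparing the revenue the experiment attributes to ads with the ad spend. This is a minimal sketch, and every input below is a hypothetical number. The function name and the contribution-margin parameter are illustrative assumptions, not eBay's figures or method. It shows that a real revenue lift can still be unprofitable.

```python
def paid_search_roi(revenue_with_ads: float, drop_when_paused: float,
                    contribution_margin: float, ad_spend: float) -> float:
    """Return ROI of ad spend: (incremental contribution - spend) / spend.

    drop_when_paused: fractional revenue loss measured when ads are paused
        (0.05 for a 5% drop), treated here as the revenue the ads generate.
    contribution_margin: hypothetical share of each revenue unit left after
        variable costs.
    """
    incremental_revenue = revenue_with_ads * drop_when_paused
    incremental_contribution = incremental_revenue * contribution_margin
    return (incremental_contribution - ad_spend) / ad_spend


# Hypothetical inputs: the ads lift revenue, but not by enough to cover spend
roi = paid_search_roi(10_000_000, 0.05, 0.30, 200_000)
print(f"ROI: {roi:.0%}")
```

With these invented inputs the ROI is negative even though pausing the ads would cost revenue. The same revenue drop becomes profitable if the spend falls or the margin rises.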

Applying this to product feature testing at eBay

Imagine you are now a product manager at eBay, tasked with testing a new feature added to the landing page.

Here is how you would approach it, inspired by the paid search experiment:

What data would you collect?

Primary metric (dependent variable): revenue or conversion rate attributable to the landing page.
Feature flag: whether a user saw the new landing page or the old one.
User attributes: geography, device, referral source, time of day.
Engagement metrics: bounce rate, session duration, click-through rate on landing page elements.
Experiment group assignment: control vs treatment.
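One way to record the feature flag and group assignment is sketched below. It assumes deterministic hash-based bucketing, which is a common practice but not something the original describes. The experiment key, field names, and example values are hypothetical. Hashing the user ID keeps each user in the same group on every visit.

```python
import hashlib
from dataclasses import dataclass


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user so they always see the same variant.

    Hashing (experiment, user_id) gives a stable pseudo-random number in
    [0, 1); the same user lands in the same group on every visit.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 16**8  # first 32 bits mapped to [0, 1)
    return "treatment" if bucket < treatment_share else "control"


@dataclass
class LandingPageObservation:
    """One row of experiment data; field names are hypothetical."""
    user_id: str
    variant: str            # "control" or "treatment" (the feature flag)
    geography: str
    device: str
    referral_source: str
    hour_of_day: int
    bounced: bool
    session_seconds: float
    revenue: float


row = LandingPageObservation(
    user_id="u-123",
    variant=assign_variant("u-123", "landing_page_v2"),
    geography="Pune",
    device="mobile",
    referral_source="search",
    hour_of_day=21,
    bounced=False,
    session_seconds=184.0,
    revenue=0.0,
)
print(row.variant)
```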

How to design data collection strategies?

Use A/B testing with random assignment of users to control or treatment.
Ensure sample size is statistically powered to detect meaningful differences.
Collect data for a sufficient duration to capture seasonality effects.
Track experiment integrity — confirm users do not leak between groups.
Instrument all relevant touchpoints in analytics tools (Google Analytics, Mixpanel, etc.).
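For the "statistically powered" point, this is a minimal sketch of the standard normal-approximation sample-size formula for comparing two means with equal group sizes. The standard deviation and minimum detectable effect in the example are hypothetical placeholders, not eBay figures. In practice they would come from historical revenue-per-user data.

```python
import math
from statistics import NormalDist


def users_per_group(sigma: float, min_detectable_effect: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm for a two-sided, two-sample test
    on a difference in means (equal group sizes, equal variance,
    normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sigma / min_detectable_effect) ** 2
    return math.ceil(n)


# Hypothetical: revenue per user has SD 40; we want to detect a lift of 1.0
print(users_per_group(sigma=40.0, min_detectable_effect=1.0))
```

Because the required size scales with 1/effect², halving the detectable effect roughly quadruples the users needed.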

What variables to consider for regression analysis?

Variable type	Examples	Role
Independent variables	Feature flag (0 = old, 1 = new), user demographics, traffic source	Predictors
Dependent variables	Revenue per user, conversion rate, average order value	Outcomes
Control variables	Time of day, day of week, device type	Adjust for confounders

Building the regression model

You would build a linear regression model such as:

Revenue per user = β₀ + β₁ × Feature flag + β₂ × Device + β₃ × Traffic source + … + ε

The coefficient β₁ estimates the incremental impact of the new landing page.
A significance test on β₁ (its t-statistic and p-value) indicates whether the effect is distinguishable from zero.
Confidence intervals quantify the precision of the estimate.
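The model above can be estimated with ordinary least squares. This sketch uses synthetic users with invented effect sizes, including a true lift of 2.0 per user. The column names are hypothetical, and device and traffic source are dummy-coded. It computes β₁ with a classical standard error, which assumes homoskedastic errors, and a large-sample 95% confidence interval. A statistics library with robust standard errors would be a common alternative.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
n = 5_000

# Synthetic users; all effect sizes below are invented for illustration.
feature_flag = rng.integers(0, 2, size=n)          # 0 = old page, 1 = new page
is_mobile = rng.integers(0, 2, size=n)             # device dummy
source = rng.choice(["direct", "search", "social"], size=n)
from_search = (source == "search").astype(float)   # dummies vs. baseline "direct"
from_social = (source == "social").astype(float)

revenue = (10.0 + 2.0 * feature_flag - 1.5 * is_mobile
           + 3.0 * from_search + 0.5 * from_social
           + rng.normal(0.0, 8.0, size=n))

X = np.column_stack([np.ones(n), feature_flag, is_mobile, from_search, from_social])
beta, *_ = np.linalg.lstsq(X, revenue, rcond=None)

# Classical OLS standard errors (assumes homoskedastic errors)
residuals = revenue - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

# Large-sample 95% CI using the normal approximation to the t distribution
z = NormalDist().inv_cdf(0.975)
b1, se1 = beta[1], se[1]
ci_low, ci_high = b1 - z * se1, b1 + z * se1
print(f"beta1 = {b1:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```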

Validating results with randomization tests or permutation tests

To check robustness, you can apply randomization tests similar to eBay’s approach:

Shuffle feature flag assignments.
Recompute model coefficients repeatedly.
Compare observed effect to the null distribution.

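Because the null distribution is built from the data itself, this check does not rely on the normality assumptions behind the regression's standard p-value. It guards against mistaking chance variation for a real effect.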
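This is a minimal sketch of that check for the landing-page model. It assumes users were assigned completely at random, so shuffling the flag across all users mirrors the original assignment. The helper names are hypothetical and the data is synthetic. Each shuffle refits the full regression, so covariates stay attached to their users while only the flag moves.

```python
import numpy as np


def flag_coefficient(flag, covariates, y):
    """OLS coefficient on the feature flag, controlling for covariates."""
    X = np.column_stack([np.ones(len(y)), flag, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def permutation_p_value(flag, covariates, y, n_permutations=2_000, seed=0):
    """Two-sided p-value from refitting the model on shuffled flags."""
    rng = np.random.default_rng(seed)
    observed = flag_coefficient(flag, covariates, y)
    null = np.array([
        flag_coefficient(rng.permutation(flag), covariates, y)
        for _ in range(n_permutations)
    ])
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_permutations + 1)
    return observed, p


# Synthetic demo: true flag effect 1.5, two covariates (a device dummy and a
# continuous feature); all values invented for illustration.
rng = np.random.default_rng(7)
n = 2_000
flag = rng.integers(0, 2, size=n)
covariates = np.column_stack([rng.integers(0, 2, size=n), rng.normal(size=n)])
y = 10 + 1.5 * flag + covariates @ np.array([-1.0, 2.0]) + rng.normal(0, 5, size=n)
observed, p = permutation_p_value(flag, covariates, y)
print(f"flag coefficient {observed:.3f}, permutation p-value {p:.4f}")
```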

From experiment to decision

If your analysis shows a statistically significant positive revenue lift, you can recommend rolling out the new landing page.

If results are inconclusive or negative, you can iterate on the design or scrap the feature.

This data-driven approach reduces the risk of shipping a change that hurts revenue and shows which change actually moved the metric.

Indian context: relevance of experiment design

Indian digital businesses such as Flipkart and Meesho (marketplaces) and Razorpay (a payments company) also rely heavily on paid advertising.

However, experiment design must account for:

Diverse user demographics and behaviors across states.
Variable internet connectivity affecting session data quality.
Multiple payment and delivery preferences impacting conversion metrics.

Adapting experiment methodology to local market nuances is critical.
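One common way to build state-level differences into the design is stratified (blocked) randomization. Users are randomized separately within each state, so treatment and control end up with nearly the same state mix. This technique is swapped in here; the eBay case does not describe it. The function name and input format below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assignment(users, seed=0):
    """Randomize within each state so both arms get a similar state mix.

    users: iterable of (user_id, state) pairs. Returns {user_id: variant}.
    In a state with an odd number of users, the extra user goes to control.
    """
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for user_id, state in users:
        by_state[state].append(user_id)

    assignment = {}
    for state, ids in by_state.items():
        rng.shuffle(ids)          # random order within this state only
        half = len(ids) // 2
        for i, user_id in enumerate(ids):
            assignment[user_id] = "treatment" if i < half else "control"
    return assignment


users = [(f"u{i}", state) for i, state in
         enumerate(["Karnataka"] * 5 + ["Bihar"] * 4 + ["Kerala"] * 3)]
assignment = stratified_assignment(users, seed=42)
print(assignment)
```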

Test yourself: Landing page A/B test at eBay India

// learn the judgment

You are PM at eBay India launching a new landing page feature. You randomly assign 50% of users to the new page and 50% to the existing page. After two weeks, revenue per user in the treatment group is 3% higher but with a p-value of 0.12. Bounce rates are unchanged. Your analytics team suggests running the test for one more week.

The call: What should you recommend? Continue the test, launch the feature, or scrap it? How do you justify your decision?


Where to go next

If you want to master experimentation and A/B testing: Experimentation and A/B Testing
If you want to learn data-driven decision making: Data-Driven Product Management
If you want to build predictive models: Regression Analysis for PMs
If you want to understand Indian digital market nuances: Designing for India
If you want to apply statistics confidently: Statistics for Product Managers
