Data-driven experiments are the clearest way to separate luck from impact. If you cannot test it, you cannot know it.
eBay ran a controlled experiment to understand whether their spending on paid search advertising via Google AdWords actually increased revenue. They paused paid search ads in 70 designated market areas (DMAs) across the USA and compared revenue changes against DMAs that continued advertising.
The actual job in this experiment was to quantify the causal effect of pausing Google AdWords on eBay’s revenue. Without this, the team would be guessing whether advertising spend was justified or not.
This kind of experiment — a randomized controlled trial — is the gold standard for product teams to validate the impact of major investments. eBay’s approach shows how to combine business intuition with statistical rigor.
The stakes of paid search advertising
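Paid search ads are a major acquisition channel for many digital marketplaces. They are expensive, and clicks and attributed sales are easy to measure, but their incremental (causal) effect on revenue is often unclear, because some users who click an ad might have reached the site anyway.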
For eBay, the question was: Does pausing paid search ads reduce revenue, and by how much?
If pausing ads caused a meaningful revenue drop, that would justify continued spend. If revenue held steady or rose, the budget should be reallocated.
Without controlled testing, teams fall back on correlation or intuition, which can lead to wasteful spending.
The experiment design: pausing ads in selected geographies
eBay chose 70 DMAs in the US as the experimental group where Google AdWords spending was paused. Other DMAs continued as usual.
This geographic split allowed comparison of revenue ratios between experimental and control groups.
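Absolute revenue varies naturally across DMAs because of population size and market characteristics, so comparing absolute differences would mix market size with the effect of the pause. The key variable was therefore the revenue ratio within each DMA (post-pause revenue divided by pre-pause revenue), which normalizes for these differences.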
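Because raw revenue scales with market size, absolute changes are hard to compare across DMAs, while the post/pre revenue ratio used in the model puts every DMA on the same scale. A minimal sketch with two hypothetical DMAs (illustrative numbers only, not eBay data):

```python
# Hypothetical DMAs: illustrative numbers only, not eBay data.
dmas = {
    "large_dma": {"pre": 10_000_000.0, "post": 9_500_000.0},
    "small_dma": {"pre": 1_000_000.0, "post": 950_000.0},
}


def revenue_ratio(pre: float, post: float) -> float:
    """Post-period revenue divided by pre-period revenue for one DMA."""
    return post / pre


for name, rev in dmas.items():
    abs_change = rev["post"] - rev["pre"]       # depends on market size
    ratio = revenue_ratio(rev["pre"], rev["post"])  # size-free
    print(f"{name}: absolute change {abs_change:,.0f}, ratio {ratio:.2f}")
```

The absolute changes differ tenfold, yet both DMAs show the same 0.95 ratio, which is the comparison the experiment cares about.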
The experiment tested the null hypothesis:
Pausing Google AdWords has no effect on revenue; that is, the average revenue ratio is the same in paused and control DMAs.
Rejecting this null would mean the pause caused a statistically significant revenue change.
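In symbols, writing R_i for DMA i's revenue ratio and μ_T, μ_C for the mean ratio across paused (treatment) and control DMAs, the hypotheses can be sketched as follows (notation introduced here for illustration):

```latex
\begin{aligned}
R_i &= \frac{\text{revenue}_i^{\text{post}}}{\text{revenue}_i^{\text{pre}}} \\
H_0 &:\ \mu_T = \mu_C \qquad \text{vs.} \qquad H_1:\ \mu_T \neq \mu_C
\end{aligned}
```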
Using a linear model to measure impact
To analyze the data, eBay built a linear regression model:
- The dependent variable was the revenue ratio (post-pause revenue / pre-pause revenue) for each DMA.
- The independent variable was a binary indicator denoting whether the DMA was in the experimental group (paused ads) or control group.
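Because the model has a single binary predictor, its coefficient is simply the difference in average revenue ratio between paused and control DMAs. Because DMAs were assigned to groups at random, this difference can be attributed to pausing ads rather than to pre-existing differences between markets.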
The results indicated that the revenue ratio was roughly 5% lower in DMAs where AdWords spending was paused.
In other words, pausing ads led to a revenue drop of about 5% on average across those DMAs, relative to the control DMAs.
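With a single 0/1 regressor, the ordinary least squares slope reduces to the difference between the mean revenue ratio of treated and control DMAs, and the intercept is the control mean. A minimal standard-library sketch; the six revenue ratios are hypothetical and the helper name `ols_binary` is illustrative:

```python
from statistics import mean


def ols_binary(y: list[float], d: list[int]) -> tuple[float, float]:
    """Fit y = b0 + b1 * d by ordinary least squares, where d is a 0/1 flag.

    Uses the closed-form OLS formulas b1 = cov(d, y) / var(d), b0 = mean(y) - b1 * mean(d).
    """
    y_bar, d_bar = mean(y), mean(d)
    cov = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    var = sum((di - d_bar) ** 2 for di in d)
    b1 = cov / var
    b0 = y_bar - b1 * d_bar
    return b0, b1


# Hypothetical revenue ratios (post/pre) for six DMAs; 1 = ads paused.
ratios = [1.02, 0.98, 1.01, 0.96, 0.95, 0.97]
treated = [0, 0, 0, 1, 1, 1]

b0, b1 = ols_binary(ratios, treated)
control_mean = mean(r for r, t in zip(ratios, treated) if t == 0)
treated_mean = mean(r for r, t in zip(ratios, treated) if t == 1)
print(f"b0 = {b0:.4f} (control mean {control_mean:.4f})")
print(f"b1 = {b1:.4f} (difference in means {treated_mean - control_mean:.4f})")
```

In a real analysis a statistics package would fit the model and report standard errors; the point here is only that the coefficient on the binary indicator is a difference in group means.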
Accounting for randomness: randomization testing
A single linear model estimate can be misleading if the observed effect is due to chance.
To address this, eBay applied a randomization test:
- They shuffled the experimental group labels randomly.
- For each shuffle, they re-ran the linear model.
- They repeated this 10,000 times to build a distribution of the estimated difference in revenue ratio under the null hypothesis.
The observed 5% difference was compared against this distribution.
The chance of seeing such a difference by random luck was very low, indicating statistical significance.
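The shuffling procedure can be sketched with the standard library: shuffle the treatment labels, recompute the difference in mean revenue ratio (which equals the slope of the one-regressor linear model), and count how often a shuffled effect is at least as extreme as the observed one. The data, the seed, and the two-sided p-value with a +1 correction are assumptions of this sketch, not details of eBay's analysis:

```python
import random
from statistics import mean


def diff_in_means(y: list[float], d: list[int]) -> float:
    """Mean of y where d == 1 minus mean of y where d == 0 (the OLS slope on a 0/1 flag)."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return mean(treated) - mean(control)


def permutation_test(y: list[float], d: list[int], n_perm: int = 10_000, seed: int = 0):
    """Two-sided randomization test for a difference in means.

    Returns (observed_effect, p_value). The +1 in numerator and denominator
    counts the observed labelling as one of the permutations.
    """
    rng = random.Random(seed)
    observed = diff_in_means(y, d)
    labels = list(d)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between label and outcome
        if abs(diff_in_means(y, labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical revenue ratios: 1 = ads paused, 0 = control.
ratios = [1.02, 0.98, 1.01, 1.00, 0.99, 0.96, 0.95, 0.97, 0.94, 0.96]
treated = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
effect, p = permutation_test(ratios, treated)
print(f"observed effect {effect:.3f}, permutation p-value {p:.4f}")
```

Because the test only reshuffles labels that were assigned at random, it does not depend on the regression's normality assumptions.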
Interpreting the p-value and business impact
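The p-value is the probability of observing an effect at least as large as the one measured, assuming the null hypothesis is true. In a randomization test, it is the share of shuffled estimates that are at least as extreme as the observed one.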
A very small p-value means the observed revenue drop is unlikely to be a coincidence.
This gave eBay confidence that its Google AdWords spending materially increased revenue. Whether that spend was a good decision also depends on whether the incremental revenue exceeds what the ads cost.
Applying this to product feature testing at eBay
Imagine you are now a product manager at eBay, tasked with testing a new feature added to the landing page.
Here is how you would approach it, inspired by the paid search experiment:
What data would you collect?
- Primary metric (dependent variable): revenue or conversion rate attributable to the landing page.
- Feature flag: whether a user saw the new landing page or the old one.
- User attributes: geography, device, referral source, time of day.
- Engagement metrics: bounce rate, session duration, click-through rate on landing page elements.
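- Experiment group assignment: the group each user was randomized into (control vs treatment), logged separately from the feature flag so you can spot users who were assigned but never shown the page.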
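One way to log the fields above is a single record per user, with the group chosen by hashing a stable user ID so that each user always lands in the same group. The field names, the experiment key, and the `assign_group` helper are hypothetical; hash-based bucketing is a common pattern, not eBay's documented method:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    """One row per user in a hypothetical landing-page experiment log."""
    user_id: str
    group: str              # randomized assignment: "control" or "treatment"
    saw_new_page: bool      # feature flag actually served (exposure)
    revenue: float          # primary metric for this user
    converted: bool
    bounced: bool
    session_seconds: float
    state: str              # geography
    device: str             # e.g. "mobile", "desktop"
    referral_source: str    # e.g. "search", "direct"
    hour_of_day: int


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user by hashing (experiment, user_id).

    The same user always gets the same group for a given experiment,
    which helps keep users from switching groups between sessions.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"


group = assign_group("u123", "landing_page_v2")
record = ExperimentRecord(
    user_id="u123", group=group, saw_new_page=(group == "treatment"),
    revenue=0.0, converted=False, bounced=True, session_seconds=12.5,
    state="Karnataka", device="mobile", referral_source="search", hour_of_day=21,
)
print(record)
```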
How to design data collection strategies?
- Use A/B testing with random assignment of users to control or treatment.
- Ensure the sample size gives the test enough statistical power to detect the smallest difference that would matter to the business.
- Run the test long enough to cover full weekly cycles and other seasonality effects.
- Track experiment integrity — confirm users do not leak between groups.
- Instrument all relevant touchpoints in analytics tools (Google Analytics, Mixpanel, etc.).
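For a conversion-rate metric, a common normal-approximation formula for comparing two proportions gives a rough per-group sample size. A minimal sketch using Python's `statistics.NormalDist`; the baseline rate, the target lift, the 5% two-sided significance level, and 80% power are assumptions chosen for illustration:

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in EACH group to detect p_control -> p_treatment.

    Normal approximation for a two-sided, two-sample test of proportions
    (unpooled variance, no continuity correction).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Illustrative: 3.0% baseline conversion, aiming to detect a lift to 3.3%.
n = sample_size_per_group(0.030, 0.033)
print(f"~{n:,} users per group")
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum worthwhile effect should be agreed before the test starts.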
What variables to consider for regression analysis?
| Variable type | Examples | Role |
|---|---|---|
| Independent variables | Feature flag (0 = old, 1 = new), user demographics, traffic source | Predictors |
| Dependent variables | Revenue per user, conversion rate, average order value | Outcomes |
| Control variables | Time of day, day of week, device type | Adjust for confounders |
Building the regression model
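You would build a linear regression model such as the following, with categorical variables like device and traffic source entered as 0/1 indicator (dummy) variables: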
Revenue per user = β₀ + β₁ × Feature flag + β₂ × Device + β₃ × Traffic source + … + ε
- The coefficient β₁ estimates the incremental impact of the new landing page on revenue per user, holding device and traffic source constant.
- A t-test on β₁ (its p-value) indicates whether the effect is distinguishable from zero.
- Confidence intervals quantify the precision of the estimate.
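A dependency-free sketch of fitting this model by ordinary least squares, with device and traffic source dummy-coded ("mobile" and "direct" as reference categories) and a rough 95% interval for β₁. The rows are made-up illustrative data and the helper names are hypothetical; in practice a statistics library would do the fitting:

```python
import math


def invert(m: list[list[float]]) -> list[list[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X: list[list[float]], y: list[float]):
    """Return (coefficients, standard errors) for y = X @ beta + error."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    return beta, se


def design_row(flag: int, device: str, source: str) -> list[float]:
    # Intercept, feature flag, then dummies for the non-reference categories.
    return [1.0, float(flag), float(device == "desktop"), float(source == "search")]


# Hypothetical rows: (feature_flag, device, traffic_source, revenue_per_user).
rows = [
    (0, "mobile", "direct", 10.2), (0, "mobile", "search", 10.9),
    (0, "desktop", "direct", 12.8), (0, "desktop", "search", 14.1),
    (1, "mobile", "direct", 11.9), (1, "mobile", "search", 13.2),
    (1, "desktop", "direct", 15.1), (1, "desktop", "search", 15.8),
]
X = [design_row(f, d, s) for f, d, s, _ in rows]
y = [row[3] for row in rows]
beta, se = ols(X, y)
# Normal approximation (1.96); with few rows a t critical value is more appropriate.
low, high = beta[1] - 1.96 * se[1], beta[1] + 1.96 * se[1]
print(f"beta1 = {beta[1]:.2f}, approx. 95% CI [{low:.2f}, {high:.2f}]")
```

With these made-up rows the fit recovers β₁ ≈ 2.0, the lift built into the data.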
Validating results with randomization (permutation) tests
To ensure robustness, you can apply randomization tests similar to eBay’s approach:
- Shuffle feature flag assignments.
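- Refit the model for each shuffle and record β₁.
- Compare the observed β₁ to this null distribution; the share of shuffled estimates at least as extreme gives the p-value.
This guards against mistaking chance variation for a real effect, without relying on the model's distributional assumptions.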
From experiment to decision
If your analysis shows a statistically significant positive revenue lift, you can recommend rolling out the new landing page.
If results are inconclusive or negative, you can iterate on the design or scrap the feature.
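One way to make the rollout call explicit is to decide from the confidence interval of the lift rather than from the point estimate alone. A minimal sketch; the three-way rule and the `min_worthwhile_lift` threshold are assumptions for illustration, and a real decision would weigh other factors too:

```python
def rollout_decision(lift_low: float, lift_high: float,
                     min_worthwhile_lift: float = 0.0) -> str:
    """Map a confidence interval for the revenue lift to a recommendation.

    lift_low / lift_high: bounds of the CI for the treatment effect (e.g. beta1).
    min_worthwhile_lift: smallest lift that would justify shipping (hypothetical).
    """
    if lift_low > min_worthwhile_lift:
        return "roll out"          # whole interval clears the bar
    if lift_high < min_worthwhile_lift:
        return "iterate or scrap"  # whole interval falls short of the bar
    return "inconclusive"          # interval straddles the bar


print(rollout_decision(0.01, 0.05))
```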
This data-driven approach reduces the risk of shipping changes that hurt revenue, and every test, whatever its outcome, adds to what the team knows.
Indian context: relevance of experiment design
Indian digital marketplaces like Flipkart and Meesho, along with fintech players like Razorpay, also rely heavily on paid advertising.
However, experiment design must account for:
- Diverse user demographics and behaviors across states.
- Variable internet connectivity affecting session data quality.
- Multiple payment and delivery preferences impacting conversion metrics.
Adapting experiment methodology to local market nuances is critical.
Test yourself: Landing page A/B test at eBay India
You are PM at eBay India launching a new landing page feature. You randomly assign 50% of users to the new page and 50% to the existing page. After two weeks, revenue per user in the treatment group is 3% higher but with a p-value of 0.12. Bounce rates are unchanged. Your analytics team suggests running the test for one more week.
The call: What should you recommend? Continue the test, launch the feature, or scrap it? How do you justify your decision?
Your reasoning:
Where to go next
- If you want to master experimentation and A/B testing: Experimentation and A/B Testing
- If you want to learn data-driven decision making: Data-Driven Product Management
- If you want to build predictive models: Regression Analysis for PMs
- If you want to understand Indian digital market nuances: Designing for India
- If you want to apply statistics confidently: Statistics for Product Managers