When Metrics Move, Is It Behavior or Mix?
I spent four years on Google Search Ads working on UI and ad format changes. A useful analytical tool I carried forward from that work was a simple decomposition of ratio metrics into Within, Mix, and Interaction effects. I’ve used it in every role since, because it answers one of the most important questions in applied data science: when an aggregate metric moves, is it because behavior changed, or because the composition of traffic changed?
The Problem
I was at Reddit in 2018 and time spent per user was declining consistently. This was the metric that shaped the narrative around whether the platform was healthy. I dug into it and noticed newer users had lower activity levels, which made sense — they hadn’t built up habits yet. But Reddit was growing fast, so these newer, lower-activity users were making up a larger fraction of DAU. The average was getting pulled down even though engagement within each cohort was stable.
I made a couple of graphs showing cohort-level time spent alongside the shifting composition and shared them with leadership. They got it. But it wasn’t rigorous. I could show that a mix effect existed, but I couldn’t put a number on how much of the decline was mix and how much was real behavior change.
I didn’t find that tool until I moved to Google Ads. We shipped changes to search results — new ad formats, different placements. Some improved ad revenue, which immediately created a trust problem: if CPC goes up, are advertisers actually paying more, or did the mix of clicks just shift? We used mix-adjusted CPC metrics — decomposing the aggregate change into what happened within each advertiser versus what happened because the click distribution shifted. The decomposition didn’t just measure something — it changed what we shipped.
The Decomposition
Here’s the idea. Any ratio metric $R$ can be written as a weighted average of stratum-level rates:
$$R_t = \sum_i w_{i,t} \cdot r_{i,t}$$
Here, $w_{i,t}$ is stratum $i$’s share of the total denominator in period $t$, and $r_{i,t}$ is the stratum-level ratio. The change between two periods expands into three terms:
$$\Delta R = \underbrace{\sum_i w_{i,1} \cdot \Delta r_i}_{\text{Within}} + \underbrace{\sum_i \Delta w_i \cdot r_{i,1}}_{\text{Mix}} + \underbrace{\sum_i \Delta w_i \cdot \Delta r_i}_{\text{Interaction}}$$
Within asks: what changes if rates move but the mix stays fixed? Mix asks: what changes if composition shifts but stratum-level rates stay fixed? Interaction captures what happens when both move at the same time. These three terms sum exactly to the total change. This is not a model or an approximation; it is an accounting identity.
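Because it is an identity, it is easy to verify in a few lines. Here is a minimal sketch (the `decompose` helper and the toy inputs are illustrative, not from any production system):

```python
def decompose(w1, r1, w2, r2):
    """Within/Mix/Interaction decomposition of a ratio metric.

    w1, w2: stratum -> share of the denominator in periods 1 and 2
    (each sums to 1). r1, r2: stratum -> stratum-level rate.
    The three terms sum exactly to the total change in the aggregate.
    """
    within = sum(w1[i] * (r2[i] - r1[i]) for i in w1)
    mix = sum((w2[i] - w1[i]) * r1[i] for i in w1)
    interaction = sum((w2[i] - w1[i]) * (r2[i] - r1[i]) for i in w1)
    return within, mix, interaction

# Pure composition shift: rates are fixed, weight moves to the high-rate stratum
w1, r1 = {'a': 0.5, 'b': 0.5}, {'a': 1.0, 'b': 3.0}
w2, r2 = {'a': 0.25, 'b': 0.75}, {'a': 1.0, 'b': 3.0}
within, mix, interaction = decompose(w1, r1, w2, r2)

# The terms reproduce the total change exactly: all of it is Mix here
total = sum(w2[i] * r2[i] for i in w2) - sum(w1[i] * r1[i] for i in w1)
print(within, mix, interaction, total)  # 0.0 0.5 0.0 0.5
```

Because no stratum-level rate moved, Within and Interaction are exactly zero and Mix accounts for the entire change — the cleanest possible illustration of the accounting.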
Time Spent: The Reddit Problem, Quantified
Here’s what the Reddit problem looks like with real structure. I simulated user cohorts — new, low-tenure, medium, high, and tenured — with different engagement levels and growth rates. The aggregate time spent declines steadily:

But when you break it out by cohort, the story changes. Each cohort is roughly flat. The decline is driven by the fastest-growing segment (new users) having the lowest time spent:

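For readers who want to reproduce this shape of data, here is a sketch of the simulation. The cohort names match the ones above, but the engagement levels, growth rates, and noise are made-up parameters, and the `bucket` column is only there so a jackknife can be run on the result:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical cohort parameters: (minutes per user, weekly user growth factor).
# New users engage least but grow fastest.
cohorts = {
    'new': (8, 1.06), 'low_tenure': (12, 1.02), 'medium': (18, 1.01),
    'high': (25, 1.005), 'tenured': (30, 1.0),
}

rows = []
for dt in range(26):  # 26 weekly snapshots
    for segment, (minutes, growth) in cohorts.items():
        users = 1000 * growth ** dt
        # Within-cohort engagement is stable, up to small noise
        time_spent = users * minutes * rng.normal(1.0, 0.01)
        rows.append({'dt': dt, 'segment': segment, 'users': users,
                     'time_spent': time_spent,
                     'bucket': int(rng.integers(0, 20))})
df = pd.DataFrame(rows)

# Aggregate time spent per user declines as the 'new' share of users grows,
# even though every cohort's own rate is flat
totals = df.groupby('dt')[['time_spent', 'users']].sum()
agg = totals['time_spent'] / totals['users']
```

With these parameters the aggregate falls by roughly a fifth over the window while each cohort-level rate barely moves — the Reddit pattern in miniature.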
The decomposition makes this precise. Using meterstick with custom decomposition operations from brybrydataguy:
```python
import meterstick as ms
# WithinEffect, MixEffect, and InteractionEffect are the custom
# decomposition operations mentioned above, not part of core meterstick.

timespent = ms.Ratio('time_spent', 'users', name='TimeSpent')
(
    ms.MetricList([
        timespent | ms.PercentChange('dt', start_dt),
        timespent | WithinEffect('dt', start_dt, ['segment'], as_percent=True),
        timespent | MixEffect('dt', start_dt, ['segment'], as_percent=True),
        timespent | InteractionEffect('dt', start_dt, ['segment'], as_percent=True),
    ]) | ms.Jackknife('bucket', confidence=0.95)
).compute_on(df).display()
```

The table tells the story I could not tell at Reddit with rigor: the roughly 11% decline in aggregate time spent is more than fully explained by mix. Newer, lower-activity cohorts made up a larger share of users, while within-cohort engagement was slightly positive. The aggregate metric was directionally real, but the behavioral interpretation was wrong.
CPC: Simpson’s Paradox in Three Advertisers
CPC is a classic place where aggregate metrics can mislead. If CPC goes up after a product change, the obvious concern is that advertisers are paying more for the same traffic. But that conclusion does not necessarily follow from the aggregate metric alone.
Here’s a simple example with three advertisers over two quarters. Every advertiser’s CPC goes down, but aggregate CPC nearly doubles:
| Advertiser | Q1 CPC | Q2 CPC | Change |
|---|---|---|---|
| Luxury Co | $10.00 | $9.50 | -5% |
| Main St Retail | $2.00 | $1.90 | -5% |
| Local Biz | $0.50 | $0.45 | -10% |
| Aggregate | $2.05 | $3.89 | +90% |
The decomposition makes the answer obvious: Within is -5.6% because costs fell within advertisers, Mix is +100% because clicks shifted toward the expensive advertiser, and Interaction is -4.6%. The aggregate increase is almost entirely compositional. No individual advertiser is paying more per click.
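The arithmetic is easy to check by hand. The table above does not show click shares, so the shares below are a hypothetical allocation chosen to reproduce the aggregate CPCs (10%/40%/50% of clicks in Q1 shifting to 30%/50%/20% in Q2):

```python
# Illustrative click shares (not shown in the table) consistent with the
# aggregates: Q1 = 0.1*10 + 0.4*2 + 0.5*0.5 = 2.05
w1 = {'luxury': 0.10, 'main_st': 0.40, 'local': 0.50}
w2 = {'luxury': 0.30, 'main_st': 0.50, 'local': 0.20}
r1 = {'luxury': 10.00, 'main_st': 2.00, 'local': 0.50}
r2 = {'luxury': 9.50, 'main_st': 1.90, 'local': 0.45}

R1 = sum(w1[i] * r1[i] for i in w1)  # aggregate Q1 CPC: 2.05
R2 = sum(w2[i] * r2[i] for i in w2)  # aggregate Q2 CPC: 3.89

within = sum(w1[i] * (r2[i] - r1[i]) for i in w1)
mix = sum((w2[i] - w1[i]) * r1[i] for i in w1)
inter = sum((w2[i] - w1[i]) * (r2[i] - r1[i]) for i in w1)
print(f"Within {within/R1:+.1%}, Mix {mix/R1:+.1%}, Interaction {inter/R1:+.1%}")
# Within -5.6%, Mix +100.0%, Interaction -4.6%
```

The three terms sum to the full $1.84 aggregate change, and Mix alone is worth the entire Q1 baseline.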
Where This Gets Powerful: High Dimensionality
I built a simulation with 5,000 advertisers — Pareto-distributed click volumes, lognormal CPCs, and realistic correlation between volume and cost. Period 2 introduces 15% CPC inflation across the board, but volume simultaneously shifts toward cheaper inventory as bidding algorithms react. Aggregate CPC looks fairly benign. The decomposition shows meaningful within-advertiser inflation being masked by a negative mix effect. That is exactly the kind of result a top-line dashboard can hide, and the same story can be read directly off a table of the decomposition terms:

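The setup can be sketched in a few lines of NumPy. All parameters here — the Pareto shape, the lognormal spread, the volume-shift response — are invented for illustration, not the ones from my original simulation:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Pareto-distributed click volumes; lognormal CPCs mildly correlated
# with volume (larger advertisers tend to cost more per click)
clicks1 = (rng.pareto(1.5, n) + 1.0) * 100
cpc1 = np.exp(0.5 + 0.15 * np.log(clicks1) + 0.4 * rng.standard_normal(n))

# Period 2: 15% across-the-board CPC inflation, while volume shifts
# toward cheaper advertisers as bidders react (hypothetical response curve)
cpc2 = 1.15 * cpc1
shift = np.exp(-0.3 * (cpc1 - cpc1.mean()))
clicks2 = clicks1 * shift

w1 = clicks1 / clicks1.sum()
w2 = clicks2 / clicks2.sum()
R1, R2 = np.sum(w1 * cpc1), np.sum(w2 * cpc2)

within = np.sum(w1 * (cpc2 - cpc1))   # exactly +15% of R1 by construction
mix = np.sum((w2 - w1) * cpc1)        # negative: clicks move to cheap inventory
inter = np.sum((w2 - w1) * (cpc2 - cpc1))
print(f"total {(R2 - R1) / R1:+.1%} = within {within / R1:+.1%} "
      f"+ mix {mix / R1:+.1%} + interaction {inter / R1:+.1%}")
```

The within term recovers the full 15% inflation while the negative mix term offsets it, so the aggregate change understates what individual advertisers experienced.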
The point is not any single example. It is that this decomposition gives you a clear and robust way to quantify mix effects with confidence intervals across any number of dimensions, for any ratio metric.
When you are running experiments at scale with thousands of advertisers and dozens of stratification dimensions, you cannot manually inspect every slice. When a major metric moves, it is incredibly useful to have one number that answers the question: how much of this movement is compositional? This decomposition gives you a defensible answer.
When to Use It
Use this decomposition when you have a ratio metric, meaningful strata, and a genuine need to separate behavioral change from compositional change. CPC across advertisers, conversion rate across devices, time spent across cohorts, and latency across regions all fit naturally.
Do not use it when the strata are themselves the object of interest, when you need to stratify so finely that variance explodes, or when the relationship is continuous and nonlinear. In those cases, regression is usually the better tool. A useful heuristic is this: if you can enumerate and name your strata on a whiteboard, decomposition is a good fit. If you need a model to define them, regression probably is.
The calculation is straightforward, and it is easy to explore multiple combinations of dimensions. But the result is only as good as the stratification. This is still partly an art: you have to choose strata that capture the real confounding structure. Choosing your strata carefully matters more than the math.
Closing
Whenever a ratio metric moves, I want to know whether behavior changed or composition changed. This decomposition gives a clean, defensible answer. That is why I keep coming back to it.