Good Metric, Bad Metric
I’ve probably spent an embarrassing amount of time thinking about metrics — reading books, trying frameworks, tweaking dashboards, you name it. In a previous post, I talked about defining funnel metrics at Tailscale — and what I learned from doing it. In this one, I will go into a bit more depth about what makes a good metric.
Understanding the Basics
Created by Dave McClure back in 2007, the pirate metrics framework is perhaps the most popular way to measure user behavior. The idea is to define key points in the user flow that correspond to desirable actions you want to drive:
- Acquisition — Did we get a new user?
- Activation — Did that user experience any value?
- Retention — Did that user experience enough value to return?
- Revenue — Did that user experience enough value to want to pay us?
- Referral — Did that user experience enough value to want to refer us to other users?
There are other frameworks like HEART or RARRA, but I haven’t used them, and they’re not as widely adopted as pirate metrics. In most companies, people see this as a linear funnel that users move through from one stage to the next. In practice, users jump around the stages beyond acquisition all the time, but in aggregate (and with sufficiently large numbers) you should see something that resembles an actual funnel (i.e., fewer people generate revenue than activate). That makes these metrics great for measuring outcomes, but not for understanding user behavior.
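As a concrete sketch, the aggregate funnel counts could be computed from a raw event log like this (the event names and data are hypothetical, purely for illustration):

```python
from collections import defaultdict

# Hypothetical event log: (user_id, event) pairs.
events = [
    ("u1", "signed_up"), ("u1", "first_value_event"), ("u1", "returned"),
    ("u2", "signed_up"), ("u2", "first_value_event"),
    ("u3", "signed_up"),
    ("u1", "paid"),
]

# Map each funnel stage to the event that marks it (illustrative names).
STAGES = {
    "acquisition": "signed_up",
    "activation": "first_value_event",
    "retention": "returned",
    "revenue": "paid",
}

# Count unique users per stage; a user can appear in several stages.
users_by_stage = defaultdict(set)
for user_id, event in events:
    for stage, marker in STAGES.items():
        if event == marker:
            users_by_stage[stage].add(user_id)

funnel = {stage: len(users_by_stage[stage]) for stage in STAGES}
print(funnel)  # {'acquisition': 3, 'activation': 2, 'retention': 1, 'revenue': 1}
```

Even with users jumping around stages, the per-stage counts shrink as you move down, which is the funnel shape the aggregate view relies on.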
Managing these output metrics tends to be a shared responsibility between marketing, product management and sales. Marketers tend to focus on the top of the funnel (acquisition and, to some extent, activation) plus some supporting activity in the middle (like nurture emails to re-engage users). Product managers sit more in the middle (activation, retention and self-serve revenue if applicable). For instance, a PM might say: if we launch feature X, it should engage users more and improve our retention metrics. Sales sits further down the funnel (sales-assisted or sales-led revenue). In reality, the division of labor is not as neat as described here. One of the main downsides of this setup is that it tends to create siloed thinking and competing agendas, and increases the need for alignment between all the parties involved.
The measurement issues with the funnel and the siloed thinking it introduces are among the main reasons the idea of “growth loops” became more popular. Growth loops, if identified correctly, allow for more end-to-end ownership over something that actually drives growth. That said, defining good growth loops is much harder than defining pirate metrics. I also continue to believe that defining good pirate metrics is a necessary step toward defining good growth loops, though it may not be a sufficient one.
So What’s a Good Metric
I may regret this later, but I think most steps in the pirate metrics framework are relatively self-explanatory! Acquisition happens when a user signs up, revenue happens when a user pays us, referral happens when a user shares the app. Sure, there are nuances we should consider even with these (here’s a sample to illustrate the point):
- Acquisition — How do we account for accidental sign ups? What do we call someone who browses our pages and never signs up (are they not valuable)? What if someone deletes their account and signs up again (do we treat them differently)? Etc.
- Revenue — Do we count this when someone subscribes to a paid plan or when they actually generate their first dollar? How do we account for people who stop paying us? Etc.
- Referral — Do we count this when the invite is sent or when it is accepted? What about people who talk about us by word of mouth rather than through in-app sharing we can measure? Etc.
At large enough numbers, these nuances become funnel optimization issues; at small numbers (like a new startup), neither they nor optimizing for them matters much.
The metrics you need to pay very close attention to are activation and retention. The trick to defining a good activation metric is this: think about the smallest unit of value you deliver to the user, and measure that. Here are some examples to illustrate the point (all of these are my own opinion; I don’t actually know what their activation metrics are):
- For Snowflake or Databricks, the smallest unit of value I get from them is running a single query.
- For Amazon S3 or Google Drive, the smallest unit of value I get is when I store a single byte or a file.
- For Slack, it’s when I send or receive my first message.
- …
Understanding this is super important because it helps you resist the temptation of setting a vanity metric (like visits to your admin console) just as much as the temptation of over-engineering the metric (like prematurely assuming a single file or a single byte is too shallow an engagement to be worth something). I can’t stress this enough; I made this exact mistake. The file size doesn’t matter. The query complexity doesn’t matter. No, you don’t need three queries to call someone “active.” Just pick the simplest measurable metric; you can always change it later if it really doesn’t add value or help you drive meaningful outcomes.
Once you have a decent activation metric, you are ready to measure retention, and this is where I may deviate from conventional wisdom a bit! I believe retention should measure the exact same thing as activation, except with a time horizon assigned to it. In other words, retention shouldn’t mean just “they came back.” It should measure whether they got value, again. As for the time horizon, I am a big fan of keeping it to 1-4 weeks: users who stick around for a month are unlikely to churn, 1-4 weeks is still short enough to run experiments against, and telling signal from noise beyond 4 weeks is near impossible unless you have Google or Meta scale data. That said, every once in a while (say, once every other quarter), I do measure 52-week retention to test whether 1-4 week retention actually leads to longer-term retention as well.
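A minimal sketch of this “same event, plus a time horizon” definition of retention, with hypothetical timestamps and a configurable window:

```python
from datetime import datetime, timedelta

# Hypothetical timestamps of a user's value events (queries run, bytes
# stored, messages sent, whatever the activation event is).
value_events = [
    datetime(2024, 1, 1),   # activation: first value event
    datetime(2024, 1, 10),  # the same value event, again
]

def is_retained(events, window_weeks=4):
    """Retained = the same value event happened again within the window
    after activation, not merely 'the user came back'."""
    if len(events) < 2:
        return False
    activation = min(events)
    deadline = activation + timedelta(weeks=window_weeks)
    return any(activation < e <= deadline for e in events)

print(is_retained(value_events))  # True
```

Note that a second event outside the window (say, eight weeks later) would not count as retained under this definition, which is exactly the point of keeping the horizon short.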
Identifying the Smallest Unit of Value
By now, I wouldn’t blame you if you’re thinking “okay, thank you for the insights, captain obvious! But how do I find the smallest unit of value?” The answer, as I said in a previous post as well, is to talk to your customers. They are not here because of your shiny UI, or because you’ve implemented it in Rust, or anything of that sort.
They are here because they have a problem to solve or, as some purists may say, a Job-to-be-Done. Finding that out is the first step toward defining a good activation metric. For instance, in the case of Tailscale, where I used to work, customers were trying to build a private network. Once you have identified the main problem people are trying to solve with your product, think about the smallest solution you provide to that problem. For Tailscale, this is a private network between two devices. Finally, think about the very first instance someone experiences value (some people call it the Aha moment, but honestly I find that phrase vague). For Tailscale, this is when someone sees those two devices connect for the first time (even if that connectivity is just a network ping or a single byte of traffic).
Once you have these three points, put them together to form your activation metric. For Tailscale, this was network traffic (any amount) on your network (any size): you are activated if you have two devices on your network and you send a single byte of traffic between them. By the way, I can share this because how the company defines an active user is publicly documented.
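As a sketch, that definition boils down to a check like the following (the field names are my own; this is not Tailscale’s actual implementation):

```python
def is_activated(device_count: int, bytes_transferred: int) -> bool:
    """Activated = at least two devices on the network and any amount of
    traffic between them (even a single byte). Hypothetical fields."""
    return device_count >= 2 and bytes_transferred >= 1

print(is_activated(device_count=2, bytes_transferred=1))  # True
print(is_activated(device_count=5, bytes_transferred=0))  # False
```

The thresholds are deliberately minimal: any network size, any traffic amount, exactly in the spirit of measuring the smallest unit of value.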
Final Thoughts
Once you have your output metrics locked, you can start coming up with ideas and experiments to drive growth. Here you have a chance to opportunistically instrument input metrics that matter. For instance, if you believe friction in the onboarding flow is slowing down activation, instrument every step in your onboarding flow. That said, don’t over-instrument either. In my opinion, there is no need (especially early on) to instrument every single user action. The exception to this rule is, again, if you have Google or Meta scale data and the resources to feed it into an algorithm that does magic for you. Admittedly, I have not seen that work either, but I spent most of my time doing B2B product management.
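As an illustration, once each onboarding step is instrumented, computing step-to-step drop-off can be as simple as this (the step names and user counts are made up):

```python
# Hypothetical onboarding steps, in order, with the set of users who
# completed each one (as recorded by your instrumentation).
ONBOARDING_STEPS = ["account_created", "device_added", "invite_sent"]

step_users = {
    "account_created": {"u1", "u2", "u3", "u4"},
    "device_added": {"u1", "u2"},
    "invite_sent": {"u1"},
}

# Conversion rate from each step to the next highlights where users drop off.
for prev, nxt in zip(ONBOARDING_STEPS, ONBOARDING_STEPS[1:]):
    rate = len(step_users[nxt] & step_users[prev]) / len(step_users[prev])
    print(f"{prev} -> {nxt}: {rate:.0%}")
```

Even this simple view tells you which step of the flow is losing the most users, which is usually all you need to prioritize an experiment.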
Good luck — and may your growth curve always bend like a hockey stick.