Angkan Mukherjee

Growth Strategist | Venture Builder | GTM | Esade MBA | Cook

GTM Experiments by Stage: What to Test, When, and When to Stop

Every startup says they’re “running experiments.” Most of them are just guessing with extra steps.

The wrong experiment at the wrong stage is worse than no experiment at all. You burn runway, convince yourself you’ve learned something, and build a GTM motion on top of a broken foundation.

Here’s the actual framework.


The Core Problem

GTM experimentation fails for one reason: companies run the wrong experiments at the wrong stage.

A seed-stage founder A/B testing email subject lines for three months. A Series B team still doing one-off outbound blasts to “validate ICP” when they should have cracked that 18 months ago. A $15M ARR company with no idea which channels produce deals that actually close versus deals that churn.

Mark Leslie and Charles Holloway called this out in their 2006 Harvard Business Review piece “The Sales Learning Curve.” Every company moves through distinct phases of sales learning. Each phase has a different primary objective. Mixing them up is what burns cash and creates the illusion of progress where there is none.

Three stages. Each with its own priorities, metrics, and a hard exit criterion before you move on.


STAGE 1: $0 TO $1M ARR

Objective: Validate before you build anything.

At this stage you don’t have a GTM system. You have a founder making calls on instinct. That’s not a bug. That’s correct. Your job isn’t to build a repeatable process yet. Your job is to generate enough signal that when you do build one, it reflects reality rather than assumptions.

The trap is premature systematization. Founders who close a few deals start building dashboards, hiring SDRs, setting up automated sequences. They’re tuning a machine with no validated engine.

Geoffrey Moore described this in Crossing the Chasm as mistaking early adopter traction for mainstream demand. Your first ten customers bought on vision and founder charisma. The next hundred are pragmatists. They buy differently. Build your GTM around the first ten, and you’ll hit a wall exactly when momentum feels best.

What to run:

  • ICP trigger experiments: You probably have a demographic ICP (mid-market SaaS, 50 to 200 employees). What you don’t have is a trigger ICP: the specific event that makes a company ready to buy right now. “Series A companies in their first 90 days post-funding.” “Teams that just lost their head of sales.” Run outreach against different trigger hypotheses. Track reply-to-meeting rate. The trigger with the highest conversion is your real ICP.
  • Messaging experiments: Rahul Vohra’s approach at Superhuman is the most rigorous version of this published anywhere. Facing no clear product-market fit signal, he built a four-question survey sent after users experienced the core product. Key question: “How would you feel if you could no longer use Superhuman?” Benchmark: 40% or more answering “very disappointed” means PMF. Superhuman started at 22%. After segmenting by user type, doubling down on what users loved, and removing friction for on-the-fence users, they hit 58% in four quarters. The GTM version: find the message framing that makes your best-fit buyers feel they’d be “very disappointed” to go back to the old way. That’s your lead message.
  • Channel experiments: Slack’s early approach is the clearest model here. Stewart Butterfield deliberately called it a “preview release” rather than a beta, to avoid signaling unreliability. The pattern: test with a small group, measure adoption, expand only when the signal is positive. Small batch, specific audience, measured outcome. Scale or cut.
  • Pricing experiments: Dropbox’s early history is the instructive failure. Drew Houston ran Google AdWords. Cost to acquire a paying customer: $233 to $388, against a $99/year product. Unsustainable. He brought in Sean Ellis, who found nearly a third of Dropbox users were already coming from referrals. That one data point redirected the entire GTM model. The referral program that followed grew Dropbox from 100,000 to 4 million users in 15 months. 3,900% growth. The lesson: run your pricing and channel experiments early. The data will often redirect your whole model before you’ve committed to the wrong one.
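The Superhuman-style survey scoring described above reduces to a simple tally: the share of respondents answering “very disappointed,” overall and per segment. A minimal sketch, with hypothetical survey responses:

```python
# Sean Ellis–style PMF score: share of respondents answering
# "very disappointed" to "How would you feel if you could no
# longer use the product?" Benchmark: >= 40% suggests PMF.
# The responses below are hypothetical sample data.
PMF_THRESHOLD = 0.40

def pmf_score(responses):
    """responses: list of (segment, answer) tuples."""
    total = len(responses)
    very = sum(1 for _, a in responses if a == "very disappointed")
    return very / total if total else 0.0

def pmf_by_segment(responses):
    """Score each segment separately, as Superhuman did before
    doubling down on its highest-scoring user type."""
    segments = {}
    for seg, answer in responses:
        segments.setdefault(seg, []).append((seg, answer))
    return {seg: pmf_score(rs) for seg, rs in segments.items()}

responses = [
    ("founder", "very disappointed"),
    ("founder", "very disappointed"),
    ("founder", "somewhat disappointed"),
    ("manager", "somewhat disappointed"),
    ("manager", "not disappointed"),
]
overall = pmf_score(responses)      # 0.4: right at the 40% bar
by_segment = pmf_by_segment(responses)  # founders well above, managers below
```

The segment split is the actionable part: an overall score near the threshold usually hides one segment far above it and another dragging it down.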

Exit criterion for Stage 1:

Answer these three questions without a spreadsheet:

  1. What specific trigger makes a company ready to buy right now?
  2. Which message framing generates the most qualified conversations?
  3. What does our best customer look like six months after signing?

Can’t answer all three? You’re not done. Don’t move on.


STAGE 2: $1M TO $5M ARR

Objective: Make the motion repeatable.

This is what Leslie and Holloway call the “transition phase.” You’ve validated the market, the message, and the customer profile. Now the question is whether any of it survives contact with someone who isn’t you.

Most companies blow this phase badly. They hire a rep or two, give them a light briefing, and call it a playbook. The reps underperform. The founder concludes the reps aren’t good enough, goes back to running deals, and the cycle repeats.

The problem isn’t the reps. The knowledge transfer never happened. Every experiment in Stage 2 is about externalising what the founder knows and testing whether it can actually be reproduced.

What to run:

  • Playbook validation experiments: David Ly Khim ran growth experiments at HubSpot. His rule: “If experiments aren’t based on our own hypothesis about our users, we’re just trying random things without a clear reason.” Apply this to sales. Document your ten most common objections and the response that works for each. Have a rep use it on ten real calls. Compare close rates. Where they diverge is where your documentation has gaps. That’s your training curriculum.
  • Discovery call experiments: What questions, asked in what order, most reliably surface the pain that makes someone buy? Test three different discovery frameworks across 20 calls each. Track which sequence produces the clearest buying signal by end of call. Not a soft skill exercise. A repeatable protocol waiting to be found.
  • Channel scaling experiments: The 2025 State of B2B GTM report surveyed 195 companies. Most effective channels under $1M ARR: LinkedIn, warm outbound, founder brand. The Stage 2 experiment is whether those channels still convert when someone other than the founder is running them. Double outbound volume on your best-performing channel. If reply rate drops more than 20%, you’ve hit the ceiling of that channel at your current ICP definition.
  • ICP expansion experiments: You’ve validated one segment. Now test which adjacent segment responds similarly: different vertical, same trigger, same company size. Run a targeted 50-contact sequence. If conversion rates land within 15% of your core segment, it’s a viable expansion. If not, wrong adjacency. Keep looking.
  • First hire readiness experiment: Before you make a full-time sales hire, do what Gradient Ventures calls a “GTM sprint”: two weeks where someone other than the founder executes the exact outreach, messaging, and discovery framework. Measure results. If they hit 60% or more of what the founder produces, the system is ready to hand off. If not, there’s a gap in the playbook. Find it before you hire into it.
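Two of the go/no-go checks above are simple enough to write down as code: the adjacency test (conversion within 15% of the core segment) and the handoff test (a non-founder hitting at least 60% of founder output). The thresholds come from the article; the example numbers are hypothetical:

```python
# Two Stage 2 go/no-go checks. Thresholds (15% relative gap,
# 60% of founder output) are the article's; inputs are hypothetical.

def adjacency_viable(core_rate, adjacent_rate, tolerance=0.15):
    """Adjacent segment passes if its conversion rate lands
    within `tolerance` (relative) of the core segment's rate."""
    return abs(core_rate - adjacent_rate) <= tolerance * core_rate

def handoff_ready(founder_output, rep_output, floor=0.60):
    """GTM sprint passes if the non-founder produces at least
    `floor` of the founder's result on the same playbook."""
    return rep_output >= floor * founder_output

adjacency_viable(0.08, 0.07)  # True: 7% converts within 15% of an 8% core
handoff_ready(10, 5)          # False: 5 meetings is only 50% of the founder's 10
```

Note the adjacency check is relative, not absolute: a one-point gap matters far more against a 4% core rate than an 8% one.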

Exit criterion for Stage 2:

Two non-founder reps are consistently closing at 70% or more of your historical win rate using a documented playbook. Not once. Consistently, across a full quarter.

If you can’t hit that, you don’t have a scalable motion. You have a founder-dependent business with extra salaries on the payroll.


STAGE 3: $5M ARR AND BEYOND

Objective: Optimise the machine.

At Stage 3 you stop asking “does this work?” You know what works. The experiments are now about efficiency, scale, and diversification. Optimisation, not discovery.

This is where RevOps earns its keep. Ebsta’s research across 4.2 million opportunities found companies with strong revenue operations see 87% higher win rates and 21% shorter sales cycles. The mechanism is visibility. RevOps creates the measurement layer that shows where deals stall, where pipeline leaks, and which reps are quietly deviating from the playbook in ways that cost money.

What to run:

  • Attribution experiments: You’ve been running multiple channels. Now find out which produce deals that actually close, not just leads that enter the funnel. Set up multi-touch attribution and run it for a full quarter before drawing conclusions. Channels that look expensive top-of-funnel often look very efficient when traced to closed revenue. And vice versa.
  • Segmentation experiments: Which customer segments have the highest retention, fastest time to value, and highest expansion revenue? Run the cohort analysis. Shift acquisition resources toward those segments. Sounds obvious. Almost no company at Stage 3 has done it with real rigour.
  • Sales cycle compression experiments: Ebsta data: deals closed under 50 days have a 47% win rate. Beyond 50 days, it drops to 20%. Find where your deals are stalling. Common culprits: no clear next step set at end of discovery, too long between demo and proposal, economic buyer not in the room until it’s almost too late. Each one is a hypothesis. Test them one at a time.
  • Cross-functional GTM experiments: The most underused experiment type at Stage 3. Zoom’s 2020 growth (10 million to 300 million daily meeting participants within months) came from exactly this. Product, engineering, marketing, and customer support all worked from the same hypothesis: ease of use drives adoption. Marketing led with simplicity messaging, product streamlined onboarding, sales emphasised quick deployment in pitches, and support shipped breakout rooms and virtual backgrounds within weeks of user feedback. Each function tested the same hypothesis through its own lens. Revenue grew 326% year-over-year. A cross-functional experiment compounds faster than anything running in a single channel.
  • Content and inbound experiments: Heinz Marketing found companies running five or more GTM experiments per month grow three times faster than those running fewer than two. Content is where many of those experiments happen cheaply. Test content type (frameworks vs case studies vs data), distribution (LinkedIn vs email vs SEO), and CTA format. Measure pipeline influenced, not traffic.
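The attribution experiment above boils down to one ratio: closed-won revenue per dollar of channel spend, rather than leads per dollar. A sketch with hypothetical deal records (in practice these come from a CRM export):

```python
# Stage 3 attribution sketch: judge channels by closed revenue,
# not top-of-funnel volume. Deal records below are hypothetical.
from collections import defaultdict

deals = [
    {"channel": "linkedin_ads", "spend": 400, "stage": "closed_won",  "amount": 12000},
    {"channel": "linkedin_ads", "spend": 400, "stage": "closed_lost", "amount": 0},
    {"channel": "outbound",     "spend": 150, "stage": "closed_won",  "amount": 9000},
    {"channel": "seo",          "spend": 50,  "stage": "open",        "amount": 0},
]

def revenue_per_dollar(deals):
    """Closed-won revenue divided by total spend, per channel."""
    spend = defaultdict(float)
    revenue = defaultdict(float)
    for d in deals:
        spend[d["channel"]] += d["spend"]
        if d["stage"] == "closed_won":
            revenue[d["channel"]] += d["amount"]
    return {ch: revenue[ch] / spend[ch] if spend[ch] else 0.0 for ch in spend}

revenue_per_dollar(deals)
# Outbound looks modest per lead but wins per closed dollar
```

This is deliberately single-touch and last-channel; a real multi-touch model spreads each deal’s amount across every touchpoint, but the decision rule is the same: rank channels by closed revenue, not lead volume.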

Exit criterion for Stage 3:

There isn’t one. Stage 3 is the operating model, not a phase you graduate from.

The 2025 State of B2B GTM survey found the average software company runs five core GTM channels and another 5.5 experiments on top of them simultaneously. That’s not chaos. That’s the cost of staying competitive when every channel eventually saturates.


THREE RULES THAT APPLY AT EVERY STAGE

Write a proper hypothesis. Not “let’s try LinkedIn ads.” Instead: “We believe that targeting ops leaders at recently-funded Series A companies on LinkedIn with a case study ad will produce a 3% click-through rate and a 15% meeting-booked rate from clicks.” Independent variable. Dependent variable. Specific prediction. If you can’t write it that way, you’re not ready to run the experiment.
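The hypothesis rule above can be enforced by structure: if the record has no independent variable, dependent variable, and specific prediction, it can’t be created, and “did it work?” becomes a pass/fail check. A sketch (the class and field names are my own, not from any named tool; values mirror the LinkedIn-ads example):

```python
# A GTM hypothesis as a structured record, so "done" is a
# pass/fail check rather than a debate. Illustrative sketch only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hypothesis:
    independent_var: str            # what you change
    dependent_var: str              # what you measure
    predicted: float                # the specific prediction
    observed: Optional[float] = None

    def passed(self):
        if self.observed is None:
            raise ValueError("experiment not yet run")
        return self.observed >= self.predicted

h = Hypothesis(
    independent_var="case study ad to ops leaders at Series A companies",
    dependent_var="meeting-booked rate from clicks",
    predicted=0.15,
)
h.observed = 0.11
h.passed()   # False: prediction missed, log it and move on
```

A failed check is still a logged result, which feeds directly into the shared experiment log described below.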

Set a time box. GTM experiments die from indefinite extension. Two to four weeks, hard cap. If you need more time to get signal, your sample is too small or your ICP is too broad.

Document everything. Every experiment, win or loss, goes into a shared log: what you tested, what you predicted, what happened, what you learned. Stewart Butterfield put it plainly: customer feedback was at the centre of Slack’s operating culture from day one, qualitative signals checked against quantitative data constantly. That discipline took Slack from a failed gaming company’s internal tool to a $27.7 billion acquisition.


Conclusion

Stage 1 is discovery. Stage 2 is transfer. Stage 3 is optimisation.

Most companies run Stage 3 experiments at Stage 1 (optimising a motion they haven’t validated) and Stage 1 experiments at Stage 3 (still arguing about ICP when they should be scaling what works).

Know your stage. Run the right experiments. Don’t move on until you’ve hit the exit criterion.

That’s not a limitation. That’s how you build a GTM system that outlasts the founder.

If you’re not sure which stage you’re in, or your experiments aren’t generating clean signal, drop me a message at angkan.mukherjee@gmail.com and I’m happy to think it through with you.
