Analytics for AI Apps: Tracking Tokens, Latency, and Cost per User
AI products have a new unit economics problem: every user action has a marginal cost. How to track tokens, model latency, and cost per account with custom events.
Classic SaaS analytics assumes a comfortable fact: serving one more user action costs approximately nothing. AI apps broke that assumption. Every generation has a real marginal cost in tokens, every model call has tail latency users feel, and a single enthusiastic free-tier user can quietly cost you more than a paying customer brings in. Standard web analytics does not see any of this — but it can, with a handful of well-designed custom events.
The three quantities that decide AI unit economics
- Tokens per action: input + output tokens for each generation, the raw material of your COGS.
- Latency per action: time-to-first-token and total generation time — the AI product's equivalent of page speed, with the same conversion consequences.
- Cost per account: tokens × model price, accumulated per user, compared against what that user pays you. The margin question, per person.
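The "tokens × model price" arithmetic in the last bullet can be sketched as a small helper. This is an illustration under stated assumptions: the price table, its model names, and its numbers are placeholders rather than real vendor pricing, and this `computeCost` takes the model as a second argument, which may differ from how your own helper is shaped.

```javascript
// Hypothetical price table in USD per one million tokens.
// The numbers are placeholders; substitute your provider's current rates.
const PRICES_PER_MILLION_TOKENS = {
  'example-model-small': { input: 1.0, output: 5.0 },
  'example-model-large': { input: 10.0, output: 30.0 },
};

// Cost of one generation: input and output tokens are priced separately.
function computeCost(usage, model) {
  const price = PRICES_PER_MILLION_TOKENS[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  const inputCost = (usage.input_tokens / 1e6) * price.input;
  const outputCost = (usage.output_tokens / 1e6) * price.output;
  return inputCost + outputCost;
}

const exampleCost = computeCost(
  { input_tokens: 2000, output_tokens: 500 },
  'example-model-small'
);
console.log(exampleCost.toFixed(6)); // "0.004500"
```

Throwing on an unknown model is a deliberate choice in this sketch: a silent zero would make an unpriced model look free in every downstream report.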
The event design
One event type, fired server-side after each model call, carries all three:
// After each model call, server-side
await fetch('https://clycyo.com/api/collect', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    tracking_id: process.env.CLYCYO_TRACKING_ID,
    type: 'event',
    visitor_id: user.clycyo_visitor_id, // persisted at signup
    event_name: 'generation_completed',
    event_properties: {
      model: 'claude-sonnet-4-6',
      input_tokens: usage.input_tokens,
      output_tokens: usage.output_tokens,
      cost_usd: computeCost(usage),
      ttft_ms: timing.firstToken,
      total_ms: timing.total,
      feature: 'document_summary',
    },
  }),
});

The event fires server-side because token counts and true latency live there, and because client-side cost events would be trivially blockable. The visitor_id join (the same one used for Stripe revenue attribution) connects each generation to the user's full journey: the channel that acquired them, the pages they visited, and the plan they pay for.
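The ttft_ms and total_ms values in that event have to be measured around the model call itself. The following is a minimal sketch, assuming your model client exposes the streamed response as an async iterable of chunks; `timedGeneration`, `startStream`, and `fakeStream` are hypothetical names used only for illustration.

```javascript
// Measures time-to-first-token and total generation time for one streamed call.
// Assumption: startStream() sends the request and resolves to an async iterable.
async function timedGeneration(startStream, onChunk) {
  const start = performance.now(); // clock starts before the request is sent
  let firstToken = null;
  const stream = await startStream();
  for await (const chunk of stream) {
    if (firstToken === null) {
      firstToken = Math.round(performance.now() - start);
    }
    onChunk(chunk);
  }
  const total = Math.round(performance.now() - start);
  // firstToken stays null if the stream produced no chunks at all
  return { firstToken, total };
}

// Stand-in for a real model stream, used only to make the sketch runnable.
async function* fakeStream() {
  await new Promise((resolve) => setTimeout(resolve, 50));
  yield 'Hello';
  await new Promise((resolve) => setTimeout(resolve, 20));
  yield ' world';
}

const demoChunks = [];
timedGeneration(async () => fakeStream(), (chunk) => demoChunks.push(chunk))
  .then((timing) => console.log(timing, demoChunks.join('')));
```

Starting the clock before the request is sent keeps network and queueing time inside ttft_ms, which is closer to what the user actually waits for.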
The reports that change decisions
Cost per account vs revenue per account
Sum cost_usd per user and set the total against what that user pays for their subscription. The distribution matters more than the average: typically a long tail of cheap users and a small head of users burning multiples of their plan price. That head is your pricing-page redesign waiting to happen: usage caps, metered tiers, or an enterprise conversation.
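As a rough sketch of this report, assuming the generation events have been exported as plain rows and that revenue per visitor is available as a lookup (`costVsRevenue`, `monthlyRevenueByVisitor`, and the sample data are all hypothetical):

```javascript
// Sums cost_usd per visitor and compares it with that visitor's revenue.
function costVsRevenue(events, monthlyRevenueByVisitor) {
  const costByVisitor = new Map();
  for (const e of events) {
    const soFar = costByVisitor.get(e.visitor_id) || 0;
    costByVisitor.set(e.visitor_id, soFar + e.cost_usd);
  }
  return [...costByVisitor.entries()]
    .map(([visitorId, cost]) => {
      // Visitors missing from the lookup are treated as free tier (revenue 0).
      const revenue = monthlyRevenueByVisitor[visitorId] || 0;
      return { visitorId, cost, revenue, margin: revenue - cost };
    })
    .sort((a, b) => a.margin - b.margin); // worst margin first
}

// Made-up sample rows for illustration only.
const sampleEvents = [
  { visitor_id: 'v1', cost_usd: 0.5 },
  { visitor_id: 'v1', cost_usd: 0.25 },
  { visitor_id: 'v2', cost_usd: 12 },
  { visitor_id: 'v3', cost_usd: 30 },
];
const sampleRevenue = { v1: 20, v3: 20 };

const marginReport = costVsRevenue(sampleEvents, sampleRevenue);
console.log(marginReport[0]); // { visitorId: 'v2', cost: 12, revenue: 0, margin: -12 }
```

Sorting by margin rather than by cost puts the accounts that lose money at the top, which is the head of the distribution described above.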
Margin per feature
Group by the feature property. When document_summary costs 4× what chat costs but drives half the retention, you have a real product strategy question — and actual numbers to argue it with.
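The cost side of that comparison is a group-by on the feature property. Here is a small sketch with made-up rows; `costPerFeature` is a hypothetical helper, and the retention side of the argument would have to come from your existing product analytics.

```javascript
// Groups generation events by feature and computes the average cost per generation.
function costPerFeature(events) {
  const stats = {};
  for (const e of events) {
    if (!stats[e.feature]) {
      stats[e.feature] = { generations: 0, totalCost: 0 };
    }
    stats[e.feature].generations += 1;
    stats[e.feature].totalCost += e.cost_usd;
  }
  for (const s of Object.values(stats)) {
    s.avgCost = s.totalCost / s.generations;
  }
  return stats;
}

// Made-up sample rows for illustration only.
const featureEvents = [
  { feature: 'chat', cost_usd: 0.125 },
  { feature: 'chat', cost_usd: 0.125 },
  { feature: 'chat', cost_usd: 0.125 },
  { feature: 'document_summary', cost_usd: 0.5 },
  { feature: 'document_summary', cost_usd: 0.5 },
];

const featureStats = costPerFeature(featureEvents);
console.log(featureStats.document_summary.avgCost / featureStats.chat.avgCost); // 4
```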
Latency vs abandonment
Join ttft_ms with what users did next. If generations with more than 3 s of first-token latency correlate with session abandonment, that is your case for a faster model on the interactive path, or for streaming UX work. It is the same analysis as page speed vs conversion, applied to a new bottleneck.
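One way to sketch this analysis is to bucket generations by first-token latency and compare abandonment rates across buckets. The code assumes each row already carries an `abandoned` flag derived from the session's subsequent events; that flag, the bucket boundaries, and `abandonmentByLatency` are all assumptions for illustration.

```javascript
// Latency buckets; a row falls into the first bucket whose maxMs it does not exceed.
const LATENCY_BUCKETS = [
  { label: 'up to 1 s', maxMs: 1000 },
  { label: '1 to 3 s', maxMs: 3000 },
  { label: 'over 3 s', maxMs: Infinity },
];

// Computes the share of generations followed by abandonment, per latency bucket.
function abandonmentByLatency(rows) {
  const counts = LATENCY_BUCKETS.map((b) => ({ label: b.label, generations: 0, abandoned: 0 }));
  for (const row of rows) {
    const i = LATENCY_BUCKETS.findIndex((b) => row.ttft_ms <= b.maxMs);
    counts[i].generations += 1;
    if (row.abandoned) counts[i].abandoned += 1;
  }
  return counts.map((c) => ({
    ...c,
    rate: c.generations === 0 ? null : c.abandoned / c.generations,
  }));
}

// Made-up sample rows for illustration only.
const latencyRows = [
  { ttft_ms: 400, abandoned: false },
  { ttft_ms: 800, abandoned: false },
  { ttft_ms: 900, abandoned: true },
  { ttft_ms: 1000, abandoned: false },
  { ttft_ms: 1500, abandoned: false },
  { ttft_ms: 2500, abandoned: true },
  { ttft_ms: 3500, abandoned: true },
  { ttft_ms: 4200, abandoned: true },
  { ttft_ms: 5000, abandoned: false },
  { ttft_ms: 6000, abandoned: true },
];

console.log(abandonmentByLatency(latencyRows).map((b) => `${b.label}: ${b.rate}`));
// [ 'up to 1 s: 0.25', '1 to 3 s: 0.5', 'over 3 s: 0.75' ]
```

A rising rate across buckets is a correlation, not proof of cause: slow generations may also be the long, complex requests that users were more likely to abandon anyway, so it is worth checking within a single feature before acting on it.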
Cost per acquisition channel
The exotic one nobody computes: because generations carry visitor_id and the visitor carries first-touch UTM, you can see that users from Channel A cost twice as much to serve as users from Channel B. CAC was never the whole story; for AI apps, cost-to-serve by channel completes it.
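The join described here can be sketched as follows, assuming generation rows keyed by visitor_id and a separate lookup from visitor to first-touch channel. `costToServeByChannel`, `firstTouchByVisitor`, and the channel names are hypothetical.

```javascript
// Joins generation events to each visitor's first-touch channel and
// computes the average cost-to-serve per user in that channel.
function costToServeByChannel(events, firstTouchByVisitor) {
  const perChannel = new Map();
  for (const e of events) {
    const channel = firstTouchByVisitor[e.visitor_id] || 'unknown';
    if (!perChannel.has(channel)) {
      perChannel.set(channel, { totalCost: 0, users: new Set() });
    }
    const entry = perChannel.get(channel);
    entry.totalCost += e.cost_usd;
    entry.users.add(e.visitor_id); // only users with at least one generation are counted
  }
  return [...perChannel].map(([channel, { totalCost, users }]) => ({
    channel,
    users: users.size,
    totalCost,
    costPerUser: totalCost / users.size,
  }));
}

// Made-up sample data for illustration only.
const channelEvents = [
  { visitor_id: 'v1', cost_usd: 1.5 },
  { visitor_id: 'v1', cost_usd: 0.5 },
  { visitor_id: 'v2', cost_usd: 2 },
  { visitor_id: 'v3', cost_usd: 2.5 },
  { visitor_id: 'v3', cost_usd: 1.5 },
];
const firstTouch = { v1: 'newsletter', v2: 'newsletter', v3: 'paid_social' };

console.log(costToServeByChannel(channelEvents, firstTouch));
// newsletter: 2 users, costPerUser 2; paid_social: 1 user, costPerUser 4
```

Dividing by users rather than by generations is the point of this report: it answers what a typical user from each channel costs to serve, which is the number to set beside CAC.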
Keep the volume sane
High-frequency apps can emit a lot of generation events. Two pragmatic options: aggregate per session server-side before sending (one event per N generations with summed tokens), or sample uniformly and scale up — unit economics needs honest distributions, not every row. Start unsampled within the free 10k events/month and decide with data.
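The first option, aggregating before sending, might look like the sketch below. `GenerationAggregator` and the shape of the aggregated payload are assumptions for illustration, and the `send` callback stands in for whatever actually posts the event.

```javascript
// Buffers per-generation usage and emits one summed event every N generations.
class GenerationAggregator {
  constructor(flushEvery, send) {
    this.flushEvery = flushEvery;
    this.send = send; // callback that delivers one aggregated event
    this.reset();
  }

  reset() {
    this.pending = { generations: 0, input_tokens: 0, output_tokens: 0, cost_usd: 0 };
  }

  record(generation) {
    this.pending.generations += 1;
    this.pending.input_tokens += generation.input_tokens;
    this.pending.output_tokens += generation.output_tokens;
    this.pending.cost_usd += generation.cost_usd;
    if (this.pending.generations >= this.flushEvery) {
      this.flush();
    }
  }

  // Call this at session end as well, so a partial batch is not lost.
  flush() {
    if (this.pending.generations === 0) return;
    this.send(this.pending);
    this.reset();
  }
}

const sentEvents = [];
const aggregator = new GenerationAggregator(3, (event) => sentEvents.push(event));
for (let i = 0; i < 4; i += 1) {
  aggregator.record({ input_tokens: 100, output_tokens: 50, cost_usd: 0.25 });
}
aggregator.flush();
console.log(sentEvents.length); // 2
```

Summed events keep token and cost totals exact but discard the per-generation latency distribution, so if the latency report matters to you, uniform sampling of individual events may be the better trade.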
AI products get measured on a new axis, but the machinery is the same one good web analytics always had: events with properties, joined on one user record. Design one good event, build these four reports on it, and your unit economics goes from vibes to a dashboard.