Guides · May 18, 2026 · 7 min read

Analytics for AI Apps: Tracking Tokens, Latency, and Cost per User

AI products have a new unit economics problem: every user action has a marginal cost. How to track tokens, model latency, and cost per account with custom events.

Classic SaaS analytics assumes a comfortable fact: serving one more user action costs approximately nothing. AI apps broke that assumption. Every generation has a real marginal cost in tokens, every model call has tail latency users feel, and a single enthusiastic free-tier user can quietly cost you more than a paying customer brings in. Standard web analytics does not see any of this — but it can, with a handful of well-designed custom events.

The three quantities that decide AI unit economics

Tokens per action: input + output tokens for each generation, the raw material of your COGS.
Latency per action: time-to-first-token and total generation time — the AI product's equivalent of page speed, with the same conversion consequences.
Cost per account: tokens × model price, accumulated per user, compared against what that user pays you. The margin question, per person.
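The "tokens × model price" arithmetic can be sketched in a few lines. This is a minimal version under stated assumptions: the price table, the model key `example-model`, and its per-million-token rates are placeholders rather than real list prices, and the optional second argument to `computeCost` is added here for illustration.

```javascript
// Hypothetical per-million-token prices in USD. Replace with your provider's current rates.
const PRICES_PER_MILLION = {
  'example-model': { input: 3.0, output: 15.0 },
};

// Cost of one model call: input and output tokens are priced separately.
function computeCost(usage, model = 'example-model') {
  const price = PRICES_PER_MILLION[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  return (
    (usage.input_tokens * price.input + usage.output_tokens * price.output) /
    1_000_000
  );
}

// 2,000 input tokens and 500 output tokens at the placeholder prices:
// (2000 × 3 + 500 × 15) / 1,000,000 = 0.0135 USD
const exampleCost = computeCost({ input_tokens: 2000, output_tokens: 500 });
```

Keeping prices in one table means a price change or a model swap is a one-line edit, and the cost recorded on each event reflects the rate that applied when it was sent.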

The event design

One event type, fired server-side after each model call, carries all three:

// After each model call, server-side
await fetch('https://clycyo.com/api/collect', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    tracking_id: process.env.CLYCYO_TRACKING_ID,
    type: 'event',
    visitor_id: user.clycyo_visitor_id, // persisted at signup
    event_name: 'generation_completed',
    event_properties: {
      model: 'claude-sonnet-4-6',
      input_tokens: usage.input_tokens,
      output_tokens: usage.output_tokens,
      cost_usd: computeCost(usage),
      ttft_ms: timing.firstToken,
      total_ms: timing.total,
      feature: 'document_summary',
    },
  }),
});

Server-side, because token counts and true latency live there — and because client-side cost events would be trivially blockable. The visitor_id join (the same one used for Stripe revenue attribution) connects each generation to the user's full journey: the channel that acquired them, the pages they visited, and the plan they pay for.
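One way to capture the two latency numbers on the server is a small timer you call from your streaming loop. This is a sketch, not a prescribed implementation: `createGenerationTimer` is a hypothetical helper, and the injectable `now` clock exists only to make it testable. It returns the same `firstToken` and `total` fields the event above reads from `timing`.

```javascript
// Records time-to-first-token and total generation time, in milliseconds.
// `now` defaults to the global high-resolution clock (available in Node 16+).
function createGenerationTimer(now = () => performance.now()) {
  const start = now();
  let firstToken = null;
  return {
    // Call once per streamed chunk; only the first call sets firstToken.
    onChunk() {
      if (firstToken === null) firstToken = now() - start;
    },
    // Call when the stream ends. firstToken stays null if no chunk ever arrived.
    finish() {
      return { firstToken, total: now() - start };
    },
  };
}
```

Create the timer just before the model call, call `onChunk()` for every chunk you receive, and pass the result of `finish()` into the event as `timing`.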

The reports that change decisions

Cost per account vs revenue per account

Sum cost_usd per user, set against their subscription. The distribution is always more interesting than the average: typically a long tail of cheap users and a small head of users burning multiples of their plan price. That head is your pricing-page redesign waiting to happen — usage caps, metered tiers, or an enterprise conversation.
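As a rough sketch of that report, assume you have exported the events as flat rows with `visitor_id` and `cost_usd`, plus a lookup of what each user pays per month. Both input shapes and the function name are assumptions for illustration.

```javascript
// Sums cost_usd per user and sets it against that user's revenue.
// Users missing from the revenue lookup are treated as paying nothing (free tier).
function marginPerAccount(events, monthlyRevenueUsd) {
  const costByUser = new Map();
  for (const e of events) {
    costByUser.set(e.visitor_id, (costByUser.get(e.visitor_id) ?? 0) + e.cost_usd);
  }
  return [...costByUser.entries()]
    .map(([visitor_id, cost_usd]) => {
      const revenue_usd = monthlyRevenueUsd[visitor_id] ?? 0;
      return { visitor_id, cost_usd, revenue_usd, margin_usd: revenue_usd - cost_usd };
    })
    .sort((a, b) => a.margin_usd - b.margin_usd); // worst margin first
}

// Illustrative data only.
const accounts = marginPerAccount(
  [
    { visitor_id: 'u1', cost_usd: 0.5 },
    { visitor_id: 'u1', cost_usd: 0.25 },
    { visitor_id: 'u2', cost_usd: 30 },
  ],
  { u1: 20 }
);
// accounts[0] is u2: a free user with a margin of -30 USD.
```

Sorting worst margin first puts the expensive head of the distribution at the top, which is the part of the list that drives pricing decisions.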

Margin per feature

Group by the feature property. When document_summary costs 4× what chat costs but drives half the retention, you have a real product strategy question — and actual numbers to argue it with.

Latency vs abandonment

Join ttft_ms with what users did next. If generations with more than 3 s of first-token latency correlate with session abandonment, that is your case for a faster model on the interactive path, or for streaming UX work. It is the same analysis as page speed vs conversion, applied to a new bottleneck.
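A first pass at this comparison can be a simple two-bucket split. The sketch below assumes each row already carries an `abandoned` flag, a hypothetical field you would derive yourself (for example, no further events in the session after the generation). The 3,000 ms threshold mirrors the figure in the text and is a parameter, not a recommendation.

```javascript
// Compares abandonment rates for generations below and above a TTFT threshold.
function abandonmentByLatency(rows, thresholdMs = 3000) {
  const buckets = {
    fast: { count: 0, abandoned: 0 },
    slow: { count: 0, abandoned: 0 },
  };
  for (const row of rows) {
    const bucket = row.ttft_ms > thresholdMs ? buckets.slow : buckets.fast;
    bucket.count += 1;
    if (row.abandoned) bucket.abandoned += 1;
  }
  // A rate is null when a bucket is empty, so it cannot be mistaken for 0%.
  const rate = (b) => (b.count === 0 ? null : b.abandoned / b.count);
  return {
    fast: { ...buckets.fast, rate: rate(buckets.fast) },
    slow: { ...buckets.slow, rate: rate(buckets.slow) },
  };
}
```

A gap between the two rates is a correlation, not proof of cause: slow generations may cluster on long documents or particular features, so it is worth splitting by `feature` before acting on the result.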

Cost per acquisition channel

The exotic one nobody computes: because generations carry visitor_id and the visitor carries first-touch UTM, you can see that users from Channel A cost twice as much to serve as users from Channel B. CAC was never the whole story; for AI apps, cost-to-serve by channel completes it.
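The join behind this report might look like the sketch below. It assumes flat event rows with `visitor_id` and `cost_usd`, and a separate lookup from visitor to first-touch channel. The lookup shape, the `unknown` fallback, and the function name are illustrative assumptions.

```javascript
// Cost to serve per acquisition channel: total cost, distinct users, cost per user.
function costToServeByChannel(events, firstTouchChannel) {
  const perChannel = new Map(); // channel -> { cost_usd, users: Set of visitor ids }
  for (const e of events) {
    const channel = firstTouchChannel[e.visitor_id] ?? 'unknown';
    const entry = perChannel.get(channel) ?? { cost_usd: 0, users: new Set() };
    entry.cost_usd += e.cost_usd;
    entry.users.add(e.visitor_id);
    perChannel.set(channel, entry);
  }
  const report = {};
  for (const [channel, { cost_usd, users }] of perChannel) {
    report[channel] = {
      users: users.size,
      cost_usd,
      cost_per_user_usd: cost_usd / users.size,
    };
  }
  return report;
}

// Illustrative data only: channel A's users cost twice as much to serve as channel B's.
const channelReport = costToServeByChannel(
  [
    { visitor_id: 'u1', cost_usd: 2 },
    { visitor_id: 'u2', cost_usd: 1.5 },
    { visitor_id: 'u2', cost_usd: 0.5 },
    { visitor_id: 'u3', cost_usd: 1 },
  ],
  { u1: 'channel_a', u2: 'channel_a', u3: 'channel_b' }
);
```

Note that users who never triggered a generation do not appear in the events, so this measures cost per active user in each channel, not cost per signup.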

Keep the volume sane

High-frequency apps can emit a lot of generation events. Two pragmatic options: aggregate per session server-side before sending (one event per N generations with summed tokens), or sample uniformly and scale up — unit economics needs honest distributions, not every row. Start unsampled within the free 10k events/month and decide with data.
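The first option, aggregating before sending, could be sketched like this. The summary field names (`generation_count`, `max_ttft_ms`) are assumptions, not part of any documented schema, and the result is meant to be sent as the `event_properties` of a single event per batch.

```javascript
// Collapses N buffered generations into one summary with summed tokens and cost.
function aggregateGenerations(generations) {
  const summary = {
    generation_count: 0,
    input_tokens: 0,
    output_tokens: 0,
    cost_usd: 0,
    max_ttft_ms: 0, // summing latency is meaningless, so keep the worst case instead
  };
  for (const g of generations) {
    summary.generation_count += 1;
    summary.input_tokens += g.input_tokens;
    summary.output_tokens += g.output_tokens;
    summary.cost_usd += g.cost_usd;
    summary.max_ttft_ms = Math.max(summary.max_ttft_ms, g.ttft_ms);
  }
  return summary;
}
```

The trade-off is that per-generation latency distributions are lost once events are summed, so if the latency report matters to you, sampling may be the better of the two options.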

AI products get measured on a new axis, but the machinery is the same one good web analytics always had: events with properties, joined on one user record. Design one good event, build these four reports on it, and your unit economics goes from vibes to a dashboard.