AI credit pricing models: how tokens, credits, and hybrid billing actually work (2026)

Mar 11, 2026

Arnon Shimoni

I firmly believe lots of AI companies get credits wrong. I've personally consulted for companies that treat them like a payment method: a prepaid balance, deduct on use, done.

Then they hit a serious ARR number and realize their "credit system" is a column in Postgres, a cron job, and a finance team reconciling spreadsheets every month.

Credits are a pricing decision as much as a software architecture decision.

What is an AI credit?

Simply put, an AI credit is a unit of prepaid value that maps to compute consumption. When you buy 1,000 credits from an AI platform, each credit represents a defined amount of usage: tokens processed, images generated, API calls made, or minutes of audio transcribed.

The concept sounds simple but the implementation is usually far from it.

Credits sit at the intersection of three systems: your pricing logic (how much does a credit cost), your metering layer (what consumption does one credit represent), and your financial ledger (what's the revenue recognition treatment of unused credits).

For example:

  • OpenAI maps each credit to a specific number of tokens at a specific model tier.

  • Anthropic varies the exchange rate by model, so a Claude Haiku call burns fewer credits than a Claude Opus call.

  • A vertical AI company selling "document processing credits" might see wildly different compute costs per credit depending on document complexity.

That variability is the whole problem. And it's why a credit system built as an afterthought breaks faster than any other part of your billing stack.
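That variable exchange rate can be made concrete with a small sketch. The model names and credit rates below are hypothetical, not any provider's real pricing:

```python
# Hypothetical rate card: credits burned per unit of work, varying by model.
# Model names and rates are illustrative only.
RATE_CARD = {
    "small-model": {"unit": "1K tokens", "credits_per_unit": 1},
    "large-model": {"unit": "1K tokens", "credits_per_unit": 12},
    "image-gen":   {"unit": "megapixel", "credits_per_unit": 3},
}

def credits_for(model: str, units: float) -> float:
    """Look up the model's exchange rate and compute the credit burn."""
    return RATE_CARD[model]["credits_per_unit"] * units

# The same 50 units of work burns very different amounts of credits:
credits_for("small-model", 50)  # 50 credits
credits_for("large-model", 50)  # 600 credits
```

The moment your product spans more than one model tier, a flat credit-to-usage mapping stops describing reality, and the rate card becomes the thing you actually maintain.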

Wait - is an AI credit just consumption-based pricing?

Mostly, but not exactly. A credit can represent a token consumed, an image generated, or a minute of audio processed. Those are consumption events. But a credit can also represent an outcome: a document successfully classified, a support ticket resolved, a lead scored. That's not consumption in the traditional sense. The customer isn't paying for compute they used. They're paying for something the AI did for them.

In practice, the terms get used interchangeably. Consumption-based pricing, usage-based pricing, and metered pricing all describe roughly the same mechanic in most SaaS contexts. The real difference is framing: consumption emphasizes the resource used, usage emphasizes the customer's activity, and metered describes the billing mechanism that enables both.

Many AI systems price individual actions but meter them in tokens under the hood; the token count only becomes a price later, when a rate card is applied.


The distinction matters when you're designing your credit system because a credit that maps to token consumption needs a metering layer. A credit that maps to an outcome needs a classification layer. Your billing infrastructure needs to handle both, because many AI products will end up charging for some mix of the two.
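A minimal Python sketch of the two event shapes; all field names here are illustrative assumptions, not a real API:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class MeterEvent:
    """Consumption: priced by quantity through the metering layer."""
    kind: Literal["meter"]
    customer_id: str
    metric: str        # e.g. "tokens", "audio_minutes"
    quantity: float

@dataclass
class OutcomeEvent:
    """Outcome: priced per success through the classification layer."""
    kind: Literal["outcome"]
    customer_id: str
    outcome: str       # e.g. "document_classified"
    success: bool      # only successful outcomes are billable

def billable(event) -> bool:
    # Metered consumption always bills; outcomes bill only on success.
    return event.kind == "meter" or event.success

billable(MeterEvent("meter", "c1", "tokens", 1200))                  # True
billable(OutcomeEvent("outcome", "c1", "ticket_resolved", False))    # False
```

The failed outcome still consumed compute, which is exactly why the two shapes need to live in one billing layer: the cost side sees both events, the revenue side sees only one.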

The 5 pricing models AI companies actually run

AI pricing in 2026 isn't one model. Most companies run some combination of these, which is exactly what makes billing complicated.

| Model | How it works | Best for | Risk |
| --- | --- | --- | --- |
| Per-token / per-unit | Charge per token, API call, or output unit | API-first products with predictable per-unit costs | Bill shock causes churn; a single runaway automation = $40K invoice |
| Credit-based (prepaid wallet) | Customers buy credit packs; usage deducts from wallet | Products with variable usage across customers | Credit system complexity (rollover, pooling, tiered rates) outgrows most implementations fast |
| Subscription + usage overage | Base fee covers an allowance; overage rates above it | B2B AI products with a clear "normal" usage band | Invoice logic combining fixed + variable + tiers requires custom code |
| Hybrid | Subscription + credits + enterprise contracts in one system | Multi-segment companies (self-serve, mid-market, enterprise) | Separate billing systems per segment = $1M+/year in reconciliation costs |
| Outcome-based | Charge per successful outcome delivered | Vertical AI with well-defined, measurable outputs | Defining "success" vs. failed attempts that still consumed compute |

1. Per-token / per-unit pricing

The simplest model: charge per token, per API call, or per unit of output. OpenAI and Anthropic both publish per-token rates. You consume, you pay.

Where it works: API-first products with predictable per-unit costs.

Where it doesn't work: Customers hate unpredictable bills. A single runaway automation can generate a $40K invoice, and that's a churn event disguised as revenue.

2. Credit-based (prepaid wallet)

Customers buy credit packs upfront. Usage deducts from the wallet. This is the dominant model for AI platforms in 2026 because it solves two problems at once: customers get cost predictability, and you get cash upfront.

Where it works: Any AI product where usage varies by customer. Credits absorb the variance.

Where it doesn't work: When your credit system can't handle rollover, expiry, team pooling, tiered exchange rates by model, and burn-down visibility. Most "credit systems" are a balance field and a decrement function. That works until someone in finance asks: "How many credits did Team A's workspace consume against the shared pool last quarter, and what was the blended cost per credit across model tiers?"

If answering that requires a data engineer and two days, your credit system is a spreadsheet pretending to be infrastructure.
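A toy sketch of the difference between a balance column and a ledger, with illustrative field names: because every deduction is a transaction row tagged with workspace and model tier, the finance question above becomes a filter, not a data-engineering project.

```python
from collections import defaultdict

class CreditLedger:
    """Minimal ledger sketch: deductions are transaction rows, not a
    single mutable balance. Field names are illustrative assumptions."""

    def __init__(self):
        self.transactions = []

    def deduct(self, workspace: str, model_tier: str, credits: float):
        self.transactions.append(
            {"workspace": workspace, "tier": model_tier, "credits": credits}
        )

    def consumed_by(self, workspace: str) -> dict:
        """Credits consumed per model tier for one workspace."""
        totals = defaultdict(float)
        for tx in self.transactions:
            if tx["workspace"] == workspace:
                totals[tx["tier"]] += tx["credits"]
        return dict(totals)

ledger = CreditLedger()
ledger.deduct("team-a", "haiku", 40)
ledger.deduct("team-a", "opus", 300)
ledger.deduct("team-b", "haiku", 25)
ledger.consumed_by("team-a")  # per-tier breakdown for Team A
```

A production system adds rollover, expiry, and pooled allocations on top, but all of those are only tractable because the ledger keeps the full transaction history in the first place.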

3. Subscription + usage overage

A base subscription covers a usage allowance. Exceed it, and overage rates kick in. This is the model most B2B AI companies gravitate toward because it combines predictable base revenue with usage upside.

Where it works: Products with a clear "normal" usage band and occasional spikes.

Where it doesn't work: The invoice. Combining a fixed subscription charge with variable usage overage across multiple pricing tiers with volume discounts on a single invoice is where standard billing tools start requiring custom code. And once that custom code exists, you own it forever.
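The invoice math being described can be sketched in a few lines. The base fee, allowance, and tier breakpoints below are hypothetical:

```python
# Hypothetical plan: fixed base fee, included allowance, tiered overage.
BASE_FEE = 500.0          # monthly subscription, covers the allowance
ALLOWANCE = 1_000_000     # included units (e.g. tokens)
# Tiered overage rates: (cumulative overage boundary, rate per unit)
OVERAGE_TIERS = [(500_000, 0.00002), (float("inf"), 0.00001)]

def invoice_total(units_used: int) -> float:
    """Base fee plus tiered overage on usage above the allowance."""
    overage = max(0, units_used - ALLOWANCE)
    charge, prev = 0.0, 0
    for boundary, rate in OVERAGE_TIERS:
        in_tier = min(overage, boundary) - prev
        if in_tier <= 0:
            break
        charge += in_tier * rate
        prev = min(overage, boundary)
    return BASE_FEE + charge

invoice_total(800_000)    # within allowance: base fee only
invoice_total(1_600_000)  # base fee + two overage tiers
```

Even this toy version has three moving parts (fixed fee, allowance check, tier walk); add volume discounts, mid-cycle plan changes, and credits on the same invoice and you see why the custom code accumulates.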

4. Hybrid (subscription + credits + enterprise contracts)

The reality for AI companies past $5M ARR. Self-serve customers are on credits. Mid-market is on subscription + overage. Enterprise has custom contracts with committed spend, volume discounts, and quarterly true-ups.

Where it works: Companies serving multiple segments simultaneously.

Where it doesn't work: If your billing system can't model all three in one place, this won't work. The typical failure mode: self-serve billing runs on one system, enterprise billing runs on spreadsheets plus a CPQ tool, and finance spends two weeks every month reconciling them into something that passes audit.

5. Outcome-based pricing

Charge based on the outcome delivered, not the compute consumed. Per successful document processed. Per lead qualified. Per code review completed. This model is emerging fast because it aligns price with value better than any token-counting scheme.

Where it works: Vertical AI products where the output is well-defined and measurable.

Where it doesn't work: Defining "success." If your billing system can't distinguish between a successful outcome and a failed attempt that still consumed compute, you're either overcharging customers or eating the cost.
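A sketch of what that distinction looks like in code. The resolution rule here (closed without human escalation) and the per-resolution price are assumptions for illustration, not any vendor's actual logic:

```python
# Hypothetical per-resolution price; the classification rule below is
# one possible definition of "success", and agreeing on that rule is
# the hard part in practice.
PRICE_PER_RESOLUTION = 0.99

def billable_amount(conversations: list[dict]) -> float:
    """Bill only conversations classified as successfully resolved."""
    resolved = [
        c for c in conversations
        if c["closed"] and not c["escalated_to_human"]
    ]
    return round(len(resolved) * PRICE_PER_RESOLUTION, 2)

convos = [
    {"closed": True,  "escalated_to_human": False},  # billable
    {"closed": True,  "escalated_to_human": True},   # consumed compute, not billed
    {"closed": False, "escalated_to_human": False},  # still open, not billed
]
billable_amount(convos)  # only the first conversation bills
```

Note that two of the three conversations consumed compute but generated zero revenue; that gap between cost events and billable events is the margin risk of outcome-based pricing.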

How real companies price AI in 2026

Theory is one thing. Here's how eight companies actually implement these models, and what their pricing reveals about the billing complexity underneath.

| Company | Model | Unit | How it works | Billing complexity |
| --- | --- | --- | --- | --- |
| OpenAI | Per-token | Tokens | $0.15–$14 per 1M tokens depending on model. Batch endpoint offers a discount. | Straightforward metering, but customers need to track spend across multiple models with different rates. Each model tier is a different price point on the same invoice. |
| Mistral AI | Per-token | Tokens | $0.02–$2.00 per 1M tokens. Wider model range, lower floor. | Same metering challenge as OpenAI. The spread between cheapest ($0.02/1M) and most expensive ($2.00/1M) model is 100x, so credit-to-cost mapping matters enormously. |
| ElevenLabs | Credits + tiers | Characters (most commonly) | Monthly tiers from $5 (30K characters) to $1,320 (11M characters). 1 character = 1 credit. Overage at $0.06–$0.15/minute. | Hybrid in disguise. Looks like a subscription, but it's really a credit allowance with overage. The billing system needs to track character consumption against the tier ceiling, then switch to overage rates mid-cycle. |
| Black Forest Labs | Credit-based | Megapixels | 1 credit = $0.01. FLUX.2 Pro costs ~$0.03 per megapixel. Higher resolutions = more credits per image. | Credit wallet with variable burn rates. A 1024×1024 image costs different credits than a 2048×2048. The billing system needs to calculate megapixel-based pricing per request. |
| Slack | Subscription (bundled AI) | User/month | $7.25–$15/user/month. AI features bundled into plan tiers, no separate AI billing. Killed the $10/user AI add-on in 2025. | Classic seat-based. Slack chose to absorb AI compute costs into the subscription rather than meter them. Works because AI features (summaries, search) are supplementary, not the core product. |
| Datadog | Hybrid | Host + GB | $15–$40/host/month for infrastructure monitoring. $0.10/GB for log indexing. | Textbook hybrid billing. Fixed per-host subscription plus variable per-GB usage, with a high-water mark calculation that requires hourly metering. Two different billing dimensions on one invoice. |
| Fin (Intercom) | Outcome-based | Per resolution | $0.99 per resolved conversation. Base platform at $29/month. | The billing system needs to distinguish "resolved" from "escalated." Only successful outcomes get billed. This requires the AI agent's resolution logic to feed directly into the billing layer. |
| Decagon | Outcome-based | Per conversation or resolution | Custom pricing. Customers choose per-conversation (any interaction) or per-resolution (only fully resolved). Enterprise contracts range $95K–$590K/year. | Two billing modes for the same product. The per-resolution model shares Fin's complexity. The per-conversation model is simpler but still requires real-time event tracking at scale. |


A few patterns worth noting from my experience:

  • The per-token model is clean until it isn't. OpenAI and Mistral both publish simple per-token rates, but customers using multiple models on the same account face a multi-rate metering problem. Mistral's 100x spread between cheapest and most expensive model means a credit system with flat exchange rates would massively over- or under-charge depending on model mix.

  • Most subscription AI products are hiding usage billing. ElevenLabs looks like a subscription, but it's actually a credit allowance with overage pricing. Slack chose to bundle AI costs into the seat price, which works when AI features supplement the core product. It wouldn't work if AI usage were the primary value driver.

  • Outcome-based pricing shifts the billing problem from metering to classification. Fin charges $0.99 per resolution. The hard part isn't tracking the dollar amount. It's programmatically defining when a conversation counts as "resolved" and feeding that classification into the billing layer in real time. Decagon adds another layer by offering both per-conversation and per-resolution models, so the billing system needs to support two fundamentally different event definitions for the same product.

  • Hybrid billing is the norm, not the exception. Datadog runs per-host subscriptions plus per-GB usage with high-water mark calculations. ElevenLabs runs credit tiers with overage. Even the purest per-token providers end up with batch discounts, committed-use contracts, and volume tiers that turn simple metering into multi-dimensional billing. If your billing system only handles one model cleanly, you're already behind.

How major AI platforms structure their credit systems

The platforms that get this right treat credits as a first-class financial object, not a feature bolted onto a subscription tool.

Here's what to consider when designing your pricing architecture:

  1. Credit wallets with real ledger logic. Not a balance column. A wallet with transaction history, per-event metering, allocation rules (per-user vs. pooled), rollover policies, and expiry logic that connects to revenue recognition.

  2. Tiered exchange rates and rate-cards. One credit ≠ one unit of compute across all models. Snowflake does this very well with their rate cards. A well-designed credit system maps different exchange rates to different products, models, or tiers, and lets you adjust those rates without rewriting code.

  3. Burn-down tracking with cost visibility. Knowing how many credits a customer used is table stakes. The real question is how much those credits cost you in compute, and whether this customer is profitable at their current credit price. Without burn-down tracking tied to inference costs, you're pricing in the dark.

  4. Real-time metering. AI usage generates millions of sub-cent events per day. If your credit system reconciles usage nightly, your customers see stale balances, your finance team works with stale data, and your margin calculations are always a day behind.

Where credit systems don't work well

  1. When credits are treated as a payment method instead of a financial primitive. When credits live as a line item on a subscription invoice rather than as an independent ledger with its own rules, you can't do pooling, rollover, expiry, or tiered exchange rates without custom code. Every feature request from finance or product becomes an engineering project.

  2. No cost reconciliation against credit consumption. You know how many credits each customer consumed. You don't know what those credits cost you in infrastructure. This means you can't answer whether your $99 credit pack is profitable when a customer uses it entirely on your most expensive model. By the time you figure it out, you've been losing money on your best customers for months.

  3. Separate systems for self-serve and enterprise. Self-serve runs on credits through your billing tool. Enterprise runs on custom contracts through spreadsheets. The two systems don't share a ledger, so finance can't produce a unified view of revenue, and product can't see usage patterns across segments.
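The second failure mode, missing cost reconciliation, is easy to sketch. All prices and per-credit compute costs below are hypothetical:

```python
# Hypothetical credit pack and per-model compute costs per credit consumed.
CREDIT_PACK_PRICE = 99.0   # what the customer paid for the pack
COST_PER_CREDIT = {"cheap-model": 0.01, "expensive-model": 0.12}

def pack_margin(consumption: dict) -> float:
    """Margin on one credit pack, given credits consumed per model.

    consumption maps model name -> credits burned on that model.
    """
    cost = sum(
        COST_PER_CREDIT[model] * credits
        for model, credits in consumption.items()
    )
    return CREDIT_PACK_PRICE - cost

# Same $99 pack, very different margin depending on model mix:
pack_margin({"cheap-model": 1_000})      # healthy margin
pack_margin({"expensive-model": 1_000})  # negative: losing money on the pack
```

The arithmetic is trivial; the hard part is having per-customer, per-model consumption data joined to infrastructure cost at all, which is exactly what a balance-column credit system can't produce.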

What to look for in a credit system in 2026

The difference between a credit system that scales and one that doesn't comes down to whether credits are a primitive in your billing architecture or a workaround layered on top of it.

Don't settle for a wallet hack.

| Capability | Why it matters | The gap in most tools |
| --- | --- | --- |
| Credit wallets as ledger objects | Credits need transaction history, allocation rules, expiry, rollover — not just a balance | Most tools store credits as a metadata field on the subscription |
| Tiered exchange rates per model/product | Different AI models have different costs; credit pricing should reflect this | Fixed 1:1 credit-to-usage mapping, or manual rate tables in spreadsheets |
| Real-time burn-down with cost data | You need to see credit consumption alongside infrastructure cost per customer | Usage data available next-day at best; no cost layer at all |
| Hybrid invoice support | One invoice combining subscription + credit usage + overage | Requires stitching two systems together manually |
| Pooled and per-user allocation | Enterprise teams share credit pools; individuals have allowances | Credits are per-account only, no sub-allocation |

How Solvimon handles AI credit pricing

Solvimon was built by ex-Adyen engineers who built and operated Adyen's internal billing engine at €970B+ in annual payment volume. Credits and tokens are first-class primitives in Solvimon, not fields on a subscription record.

  1. Credit wallets in Solvimon carry full ledger logic: transaction history, allocation rules (per-user or pooled), rollover, expiry, and tiered exchange rates that map to different models or product tiers. When a customer's team of 50 shares a credit pool across three AI products with different compute costs per credit, Solvimon tracks that natively.

  2. Real-time metering processes millions of usage events and decrements credit balances as they happen. No nightly reconciliation. No stale balances.

  3. Hybrid billing runs subscriptions, credits, usage overage, and enterprise custom contracts in one system. One ledger. One invoice. No reconciliation across tools.

  4. PSP-agnostic: Solvimon works closely with Adyen, but also Stripe, Checkout.com and other PSPs. Your payments stack can evolve without rebuilding your credit architecture.

For AI companies, Solvimon is free up to $3M billed, then 0.40% of volume.

Choosing the right pricing model for your AI product

Your pricing model determines your billing architecture, your margin visibility, and how fast you can adjust when the market shifts.

Are you scaling self-serve? Credits give customers cost predictability and give you cash upfront. But build the credit system as infrastructure, not as a hack on top of your subscription tool.

Are you moving upmarket? You need hybrid billing that handles self-serve credits and enterprise contracts in one place. The companies that run these on separate systems spend $1M+/year in hidden costs reconciling them.

Past, say, $2M ARR with your credit system still living in a Postgres column and a cron job? Rebuilding it yourself costs at least 6–9 months of engineering time. Consider using infrastructure built for this problem instead.

For a comparison of platforms that handle AI billing, see our guide: AI billing software: 6 platforms built for tokens, credits, and inference pricing →

Solvimon's AI billing infrastructure is free up to $3M billed!