

Arnon Shimoni
tl;dr: Most billing systems model a credit wallet as a prepaid cash balance. That works at day zero. It breaks the moment your product has multiple types of credits with different per-unit costs, different margins, and different rate cards sitting between your token layer and your customer-facing price. Our customer Reson8 needed multiple wallets, structured around a rate card layer instead of one prepaid counter.
Lots and lots of billing systems model credits as prepaid cash. Pay $100, get $100 of credit. Burn the credit, the balance goes down. That's because credits often get treated as "just an engineering thing": a fancy counter.
The model is clean until your product has more than one thing customers buy credits for.
Why we'd rather hold credits than money
There's a line we come back to internally:
I'd rather have credits in a wallet than money, because money I have to give back. Credits I can expire.
A money wallet creates a liability. Whatever your customer deposits, you owe it back if they don't spend it. Credits work on your terms: when they expire, what they cover, under what conditions they're valid. The obligation is yours to design.
Not great for customers necessarily, but great for businesses running with AI.
A wallet targeting a single product category is what tax law calls single-purpose: intended use is known at purchase, so VAT is calculated when the funds go in. A general-purpose wallet, where a customer might spend across your product, is multiple-purpose: intended use is unknown at purchase, so VAT is calculated when they actually use it. Credits are a third category. You're selling a product (100 credits for $90), VAT applies at purchase, and the credit has its own exchange mechanics entirely separate from currency.
So if you have three wallet types, you get three different VAT treatments and three different points of revenue recognition.
If you model them as one thing and finance finds out later, they won't be happy with you.
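A minimal sketch of what "not one thing" could look like in a data model, assuming the VAT timing described above. The enum values and event names ("funding", "usage") are hypothetical labels for illustration, not a real billing API or a statement of tax rules for any jurisdiction.

```python
from enum import Enum


class WalletKind(Enum):
    SINGLE_PURPOSE = "single_purpose"  # one product category, use known at purchase
    MULTI_PURPOSE = "multi_purpose"    # spendable across the product, use unknown
    CREDITS = "credits"                # credits sold as a product in their own right


# Hypothetical event names: "funding" is value going into the wallet
# (a deposit or a credit-pack purchase), "usage" is the customer spending it.
VAT_EVENT = {
    WalletKind.SINGLE_PURPOSE: "funding",
    WalletKind.MULTI_PURPOSE: "usage",
    WalletKind.CREDITS: "funding",
}


def vat_applies(kind: WalletKind, event: str) -> bool:
    """Return True if VAT should be calculated for this wallet kind on this event."""
    return VAT_EVENT[kind] == event
```

The point of the sketch is only that the wallet kind is an explicit field the billing system can branch on, rather than something finance has to reconstruct afterwards.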
Why single wallets aren't right
The single-wallet model assumes interchangeability. All credits are equal. One credit buys one unit of anything in the product.
That assumption is kinda true for simple products, but not for modern AI products, where the cost of delivering one unit varies across workloads. A minute of transcription on a custom-trained domain model costs more to deliver than a minute of generic batch transcription. If you change languages it becomes even clearer, because the underlying token counts are not the same.
When those two workloads draw from the same credit pool, the billing system can't enforce that difference. That means a customer's credits become fungible across products that aren't, and the margin doesn't match what was invoiced.
That's a data model problem!
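To make the margin problem concrete, here is a small worked example. All numbers are made up for illustration (a 10-cent credit, 3 cents to deliver a standard batch minute, 9 cents for a custom real-time minute); they are not Reson8's actual costs.

```python
# Hypothetical figures, in cents, chosen only to illustrate the effect.
PRICE_PER_CREDIT_CENTS = 10
COST_PER_MINUTE_CENTS = {"standard-batch": 3, "custom-realtime": 9}


def margin_pct(workload: str, credits_per_minute: int) -> int:
    """Gross margin (whole percent, rounded down) on one minute of a workload."""
    revenue = credits_per_minute * PRICE_PER_CREDIT_CENTS
    cost = COST_PER_MINUTE_CENTS[workload]
    return (revenue - cost) * 100 // revenue


# Single pool: every minute burns 1 credit, whatever it cost to deliver.
single_pool = {w: margin_pct(w, 1) for w in COST_PER_MINUTE_CENTS}

# Workload-aware burn: the expensive minute burns 3 credits instead of 1.
rated = {
    "standard-batch": margin_pct("standard-batch", 1),
    "custom-realtime": margin_pct("custom-realtime", 3),
}
```

With these assumed numbers, the single pool yields 70% margin on standard minutes and 10% on custom real-time minutes, while the workload-aware burn brings both to 70%.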
What Reson8 needed
Reson8 builds hyper-customizable speech recognition for European languages: real-time, domain-adaptive, running on EU GPUs with no audio retention. They bill by the minute across multiple workload types. Standard transcription. Custom-domain models adapted on up to 1M tokens of customer context. Real-time processing versus batch. Each has a different cost basis.
The moment a customer at Reson8 pre-purchases minutes, the question becomes: which kind? Standard-transcription minutes and custom-domain real-time minutes are different products. You can't let customers use one pool for the other: the product doesn't allow it, and the margin profile doesn't support it.
What companies like Reson8, ElevenLabs, and Wispr need is at least three distinct rating systems that translate to wallet types: one for standard minutes, one for custom-domain minutes, and one that handles real-time processing overages as a metered charge when the pre-purchased pool runs empty.
In theory, each can have its own top-up schedule, expiry rules, and invoicing behavior.
A single prepaid balance doesn't model that. It guesses at it at best.
The token layer is why
Speech AI adds a dimension most billing systems weren't built around. The model thinks in tokens. The customer thinks in minutes. Contracts denominate in credits. Finance works in dollars (or euros).
Each link is a rate:
tokens → minutes → credits → dollars
A minute of real-time transcription in a custom-adapted model on a dedicated EU cluster consumes more tokens than a standard batch job on a generic model. The conversion factor varies by workload, by model version, by language pack, by the level of domain adaptation the customer has configured.
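The chain above can be sketched as three multiplications, one per link. The rates below are hypothetical placeholders (tokens per minute, credits per minute, dollars per credit); in practice they would also vary by model version, language pack, and level of domain adaptation.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rates:
    tokens_per_minute: Decimal   # model layer -> product unit
    credits_per_minute: Decimal  # product unit -> contract unit
    dollars_per_credit: Decimal  # contract unit -> currency


# Assumed values for illustration only.
RATES = {
    "standard-batch": Rates(Decimal("1000"), Decimal("1"), Decimal("0.10")),
    "custom-realtime": Rates(Decimal("2500"), Decimal("3"), Decimal("0.10")),
}


def convert(workload: str, tokens: int) -> tuple[Decimal, Decimal, Decimal]:
    """Walk tokens -> minutes -> credits -> dollars for one workload."""
    r = RATES[workload]
    minutes = Decimal(tokens) / r.tokens_per_minute
    credits = minutes * r.credits_per_minute
    dollars = credits * r.dollars_per_credit
    return minutes, credits, dollars
```

Under these assumptions, the same 5,000 tokens come out as 5 minutes, 5 credits, and $0.50 on the standard batch workload, but 2 minutes, 6 credits, and $0.60 on the custom real-time one.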
Most billing systems let you set a price per event, or a price per unit of consumption. What they don't support is a rate card layer sitting between the metering layer and the wallet layer. A rule that says: one meter event tagged workload: custom-realtime burns 3 credits from wallet B, while one meter event tagged workload: standard-batch burns 1 credit from wallet A.
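A rule like that is easier to reason about as data than as branching application code. Here is a minimal sketch: the wallet IDs, the `RateCardEntry` type, and `apply_event` are all hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCardEntry:
    wallet_id: str          # which wallet this workload draws from
    credits_per_event: int  # how many credits one meter event burns


# The rate card as configuration: workload tag -> wallet and burn rate.
RATE_CARD = {
    "custom-realtime": RateCardEntry("wallet_b", 3),
    "standard-batch": RateCardEntry("wallet_a", 1),
}


def apply_event(balances: dict[str, int], workload: str) -> dict[str, int]:
    """Return new balances after one meter event tagged with `workload`."""
    entry = RATE_CARD[workload]
    if balances[entry.wallet_id] < entry.credits_per_event:
        raise ValueError(f"{entry.wallet_id} has too few credits for {workload}")
    updated = dict(balances)  # leave the caller's balances untouched
    updated[entry.wallet_id] -= entry.credits_per_event
    return updated
```

Because the mapping is a plain table, changing a burn rate means editing one entry rather than redeploying code, and an event can never draw from a wallet it isn't mapped to.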

Remember I said engineers see it as a counter? That's not how counters work.
When that rate card lives in application code rather than billing configuration, it's invisible to finance and re-coded every time the cost model changes. Which, in AI, is quite often.
Why the rate card has to be in the billing layer
Lots of speech vendors also have custom model adaptation as a product in its own right: a one-time charge to adapt the model on a customer's data, then a recurring credit pool to use it. The initial adaptation and the ongoing credit wallet are related but distinct events.
The whole revenue stack has to handle both on the same invoice and recognize revenue at the right moment for each.
You could call this hybrid: a one-time charge triggers a wallet provisioning event, the wallet burns down as the customer runs workloads, it auto-tops-up on a schedule or on demand, and overages switch to real-time metering when the pool hits zero.
For that to work, the wallet has to carry more than a counter. It needs metadata: which plan provisioned it, which meters feed into it, which rate card converts events into credit deductions. A counter doesn't do that.
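As a rough sketch of the difference, here is a wallet record carrying that metadata, plus a burn step that hands any shortfall back as overage to be metered. Field names and the `burn` method are assumptions for illustration, not Solvimon's schema; auto-top-up is left out to keep it short.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Wallet:
    wallet_id: str
    balance: int                  # the counter itself, in credits
    provisioned_by_plan: str      # which plan created this wallet
    meter_ids: tuple[str, ...]    # which meters feed into it
    rate_card_id: str             # which rate card converts events to credits
    expires_on: Optional[date] = None

    def burn(self, credits: int) -> int:
        """Deduct up to `credits`; return the shortfall to bill as metered overage."""
        burned = min(self.balance, credits)
        self.balance -= burned
        return credits - burned


wallet = Wallet(
    wallet_id="custom_minutes",
    balance=5,
    provisioned_by_plan="custom-domain-annual",
    meter_ids=("custom-realtime",),
    rate_card_id="rc_2024_custom",
    expires_on=date(2025, 12, 31),
)
overage = wallet.burn(8)  # pool empties, 3 credits spill into metered overage
```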
Stripe (like lots of other vendors) has gift cards or "balances" given to a customer, but they're not separate objects: you can't spin them up and move them around. Connecting them requires orchestration code you write and maintain. Lago's credit primitives don't compose naturally with multi-wallet, multi-rate-card configurations, so if you're building on top of them you need your own custom logic that reimplements the billing layer on top of the billing layer.
In Solvimon, Wallets are a first-class primitive. Multiple wallet types per customer, each tied to a specific meter and a rate card, with configurable top-up and expiry rules. The rate card lives in configuration. Finance can see it, and the billing system enforces it.
What changes for revenue recognition
Good wallet modeling cleans up revenue recognition.
A customer pre-purchasing 10,000 standard minutes creates a deferred revenue liability. Each minute consumed triggers a recognition event against the right wallet. When the pool empties and the customer tips into metered overages, billing switches from credit burn to real-time usage. Finance sees one invoice, one ledger. The wallet's depletion is the revenue recognition schedule.
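Here is a minimal sketch of depletion driving recognition, assuming straight per-unit recognition and a hypothetical price of $500 for the 10,000-minute pool. Real recognition policies (expiry breakage, rounding, multi-currency) are more involved than this.

```python
from decimal import Decimal


class PrepaidPool:
    """A prepaid pool whose depletion doubles as the recognition schedule."""

    def __init__(self, units: int, price: Decimal):
        self.units_remaining = units
        self.unit_value = price / units  # value recognized per unit consumed
        self.deferred = price            # liability until the units are used
        self.recognized = Decimal("0")

    def consume(self, units: int) -> int:
        """Burn units from the pool; return units left over as metered overage."""
        burned = min(units, self.units_remaining)
        amount = burned * self.unit_value
        self.units_remaining -= burned
        self.deferred -= amount
        self.recognized += amount
        return units - burned


pool = PrepaidPool(units=10_000, price=Decimal("500"))  # assumed price
pool.consume(2_500)             # recognizes 125, leaves 375 deferred
overage = pool.consume(8_000)   # empties the pool, 500 minutes go to overage
```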
During billing calculation, credits are reserved against pending invoices, then deducted only when the invoice goes final. Available balance is real balance minus reservations. That gap between reservation and deduction prevents customers from overspending mid-cycle, and makes the recognition schedule traceable without a spreadsheet.
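The reserve-then-deduct flow can be sketched in a few lines. The class and method names are hypothetical; the only rule taken from the text is that available balance equals real balance minus reservations, and that deduction happens when the invoice is finalized.

```python
class ReservingWallet:
    """Credits are reserved per pending invoice, deducted only on finalization."""

    def __init__(self, balance: int):
        self.balance = balance                   # real balance
        self.reservations: dict[str, int] = {}   # invoice_id -> reserved credits

    @property
    def available(self) -> int:
        # Available balance is real balance minus reservations.
        return self.balance - sum(self.reservations.values())

    def reserve(self, invoice_id: str, credits: int) -> None:
        if credits > self.available:
            raise ValueError("insufficient available credits")
        self.reservations[invoice_id] = self.reservations.get(invoice_id, 0) + credits

    def finalize(self, invoice_id: str) -> None:
        # The invoice went final: turn the reservation into a real deduction.
        self.balance -= self.reservations.pop(invoice_id)


wallet = ReservingWallet(balance=100)
wallet.reserve("inv_1", 30)   # balance stays 100, available drops to 70
wallet.finalize("inv_1")      # balance and available are now both 70
```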
For Reson8, that matters. EU customers care about data handling and contract structure. An invoice that reflects custom-domain usage, standard usage, and real-time overages as distinct line items, each traced to its wallet and rate card, holds up in procurement, in audit, and in the customer relationship. That's not my favourite thing to have to design around, but healthcare and financial services procurement teams care quite a lot about this.
The big decision behind credit pricing
Credits carry a pricing decision inside them: what value to attach to each unit. The structural question is whether your billing system can model the relationships between your metering layer, your credit pools, and your revenue recognition without engineers maintaining the translation.
For AI companies running multiple workloads with different cost bases, that requires multiple wallets and a rate card layer that understands what each pool represents.
Frequently Asked Questions
Why would a company prefer credits over a money wallet?
A money wallet creates a financial liability: you owe the deposited amount back if the customer doesn't spend it. Credits are a product sale: the customer buys a defined unit with defined terms, including when those credits expire. That distinction affects your balance sheet, your VAT treatment, and how you recognize revenue. For many AI companies, credits are preferable precisely because you set the expiry terms rather than holding an open-ended obligation.
What is the difference between a credit wallet and a prepaid balance?
A prepaid balance is a single pool of value denominated in currency. A credit wallet is a typed pool denominated in a product-specific unit (minutes, tokens, API calls) with a rate card defining how it converts to currency and a meter defining what consumes it. For simple products, they're equivalent. For products with multiple workload types, only the wallet model holds up.
Why would an AI company need multiple credit wallets for the same customer?
When a product has multiple distinct workloads with different unit costs and different margins, a single pool treats all credit consumption as equivalent regardless of delivery cost. Multiple wallets enforce the boundary. Each wallet is tied to the workloads it covers, with its own rate card and top-up behavior.
What is a rate card in AI billing?
A rate card is a configuration layer that defines how a metering event converts into a credit deduction. For a speech AI company, this might be: one minute of real-time custom-domain transcription = 3 credits from the custom wallet. The rate card sits between the metering layer (which counts consumption) and the wallet layer (which tracks the balance).

When rate cards live in application code, they're invisible to finance and hard to update - which is why you should keep them in a billing system.
How does token-to-credit conversion work in practice?
The model layer consumes tokens. The product layer exposes a customer-facing unit (e.g., minutes). The billing layer converts that unit into credits at a rate defined by the rate card. A minute of transcription in a given model configuration costs a known number of tokens to produce. When the model changes and the per-minute token cost changes, the rate card updates in configuration.
Can Stripe Billing handle multiple credit wallet types?
Stripe Billing supports basic combinations of subscription billing and metered usage, and has credit grants, but they aren't tied to wallets. Getting them to interact, especially across multiple credit types with different rate cards, requires custom orchestration code. Most teams building multi-workload AI products end up maintaining that orchestration layer themselves.
How does Solvimon model credit wallets?
Solvimon treats Wallets as a first-class primitive: multiple wallet types per customer, each associated with a specific meter and a rate card, each with configurable top-up and expiry rules.

The rate card layer handles the conversion between consumption units and credit deductions. Revenue recognition is calculated against wallet events. Wallet configuration lives in Solvimon rather than in application code.
