What Is a Token? LLM Pricing Explained for Non-Technical Users

Deploybase · February 19, 2025 · LLM Guides

What Is a Token: Overview

As of early 2025, every major LLM API charges per token, not per word or per request. Understanding tokens is critical to predicting costs and optimizing spending. A token is a small chunk of text that models process internally. On average, 1 token ≈ 4 characters in English, but this varies widely with language, content type, and tokenization scheme.

Key Facts:

  • 1 English word ≈ 1.3 tokens
  • 1 email (300 words) ≈ 400 tokens
  • 1 page of code (50 lines) ≈ 200 tokens
  • "Hello world" = 2 tokens
  • "You're" = 2 tokens (contraction splits)
  • "123456789" ≈ 1-3 tokens, depending on the tokenizer (digit runs compress well)

Tokens are smaller than words but larger than individual characters. Models "see" text as sequences of tokens, not letters.
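The rules of thumb above can be turned into a quick back-of-envelope estimator. This is a sketch using the approximations from this article (1 token ≈ 4 characters, 1 word ≈ 1.3 tokens), not a real tokenizer; use the model's official tokenizer for exact counts.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate: 1 token ~ 4 characters of English."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(text: str) -> int:
    """Rough estimate: 1 English word ~ 1.3 tokens."""
    return max(1, round(len(text.split()) * 1.3))

email = "word " * 300  # stand-in for a 300-word email
print(estimate_tokens_from_words(email))  # 390, close to the ~400 quoted above
print(estimate_tokens_from_chars(email))  # 375
```

Both estimators land in the same ballpark, which is the point: for budgeting, the order of magnitude matters more than the exact count.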

What Is a Token

A token is the fundamental unit that language models process. It represents a chunk of text, typically a word, a portion of a word, or punctuation.

Examples:

Text              Tokens                          Count
"Hello"           ["Hello"]                       1
"Hello world"     ["Hello", "world"]              2
"You're"          ["You", "'re"]                  2
"running"         ["run", "ning"]                 2
"1234567890"      ["1234567890"]                  1
"Hello, world!"   ["Hello", ",", "world", "!"]    4
"ChatGPT"         ["Chat", "G", "P", "T"]         4

Notice how words are sometimes split. "running" breaks into "run" and "ning" because the tokenizer learned that "run" + "ning" is a common pattern. This saves tokens compared to character-by-character encoding.

Why Tokens Instead of Characters?

Models process tokens, not characters, because:

  1. Efficiency: 1 token vs 5 characters = 5x fewer inputs for the model
  2. Semantic meaning: "hello" is more meaningful as one chunk than h-e-l-l-o
  3. Consistent scaling: models have fixed input/output limits in tokens, not characters

A model with a 4K token limit can handle roughly 16K characters of English text. If models worked at character level, they'd be 5x less efficient.

Token Counting in Different Languages

English is token-efficient. 1 token ≈ 4 characters. But other languages are not.

English vs Other Languages (token count for "Hello world" equivalent):

Language   Phrase              Tokens   Char:Token Ratio
English    "Hello world"       2        11:2 = 5.5:1
French     "Bonjour monde"     2        13:2 = 6.5:1
Japanese   "こんにちは世界"       20       12:20 = 0.6:1
Chinese    "你好世界"            4        8:4 = 2:1
Korean     "안녕하세요 세계"      8        10:8 = 1.25:1
Arabic     "مرحبا بالعالم"      6        12:6 = 2:1

Japanese and Korean are token-expensive because their character sets are huge (10K+ characters). The tokenizer creates separate tokens for rare characters, which inflates token counts.

Cost Implication:

Using Claude Opus ($5/1M input tokens), the two-token phrase "Hello world" costs roughly $0.00001. The 20-token Japanese equivalent "こんにちは世界" costs 10x more: about $0.0001.

For global applications serving multiple languages, budget more for non-Latin script content.

How Tokenization Works

Tokenization is the process of splitting text into tokens. Models don't do this themselves. Before text reaches a model, an external tokenizer preprocesses it.

Tokenization Algorithm (Byte-Pair Encoding, BPE):

  1. Start with all individual characters: h, e, l, l, o
  2. Find the most frequent adjacent pair across the training corpus, say "l" + "l" = "ll"
  3. Replace all instances: h, e, ll, o
  4. Repeat: find the next most frequent pair, say "ll" + "o" = "llo"
  5. Replace: h, e, llo
  6. Repeat until the vocabulary reaches the desired size (typically 50K-100K entries)

Result: "hello" = [h, e, llo] = 3 tokens instead of 5 characters. (The exact merges depend on the training data.)
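The merge loop above can be sketched in a few lines. This toy version learns merges from a tiny made-up corpus and applies them to one word; real tokenizers train on terabytes of text and usually operate on bytes, so this is illustrative only.

```python
from collections import Counter

def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of the adjacent pair with one fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn merge rules by repeatedly fusing the most frequent adjacent pair."""
    words = [list(w) for w in corpus]          # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]      # most frequent adjacent pair
        merges.append(best)
        words = [merge_pair(w, best) for w in words]
    return merges

def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in training order, to a new word."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

merges = learn_bpe_merges(["hello", "hell", "help", "yellow"], num_merges=3)
print(tokenize("hello", merges))  # fewer pieces than 5 raw characters
```

More merges means a bigger vocabulary and fewer tokens per word; production tokenizers stop at a vocabulary size chosen in advance.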

Different Tokenizers, Different Token Counts:

  • GPT-4 uses "cl100k_base" tokenizer: "Hello world" = 2 tokens
  • Claude uses "claude-tokenizer": "Hello world" = 2 tokens
  • LLaMA uses "llama-tokenizer": "Hello world" = 2 tokens (usually, context dependent)

Same text, same token count across major models, but not guaranteed. Always use the model's official tokenizer.

Special Tokens:

Models use hidden tokens for structure:

  • <start> (beginning of sequence)
  • <end> (end of sequence)
  • <pad> (padding for batch processing)
  • Function calling and tool use add 30-50 tokens overhead

These hidden tokens count toward token limits but aren't visible in API responses.

Why Tokens Matter for Pricing

LLM APIs charge per token because it directly correlates to computational cost.

Model Compute Cost:

Each token processed requires floating-point operations (FLOPs). As a rough rule, a model with 70B parameters needs about 70B * 2 FLOPs per token (the "2" accounts for the multiply-accumulate in each matrix multiplication). On an NVIDIA H100 delivering an assumed 3 petaFLOPS, processing 1 token takes:

Time per token ≈ (70B params * 2 FLOPs/token) / (3000T FLOPs/sec) ≈ 0.047 milliseconds

Over 1 million tokens: 47,000 milliseconds = 47 seconds. Multiply by GPU cost ($2/hour = $0.00055/second) = $0.026 per 1M tokens for raw compute.

APIs add overhead (infrastructure, data center rent, support), so actual pricing is 10-100x higher. Claude Opus at $5/1M tokens reflects compute cost + overhead.
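The arithmetic above is easy to reproduce. A sketch using the same assumed figures (70B parameters, 2 FLOPs per parameter per token, 3 petaFLOPS of throughput, $2/hour GPU rental):

```python
params = 70e9                   # model parameters
flops_per_token = 2 * params    # ~2 FLOPs per parameter per token
gpu_flops = 3e15                # assumed H100 throughput, FLOPs/sec
gpu_cost_per_sec = 2 / 3600     # $2/hour

seconds_per_token = flops_per_token / gpu_flops
seconds_per_million = seconds_per_token * 1e6
cost_per_million = seconds_per_million * gpu_cost_per_sec

print(f"{seconds_per_token * 1000:.3f} ms/token")    # 0.047 ms
print(f"{seconds_per_million:.0f} s per 1M tokens")  # 47 s
print(f"${cost_per_million:.3f} per 1M tokens")      # $0.026
```

Change any input (bigger model, cheaper GPU) and the raw compute cost scales linearly, which is why per-token pricing tracks model size so closely.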

Token Pricing vs Flat Rate:

Why not charge per request instead of per token?

  • A request for 10 tokens and 10,000 tokens requires vastly different compute
  • Per-request pricing would waste resources on short requests or charge excessively for long requests
  • Per-token pricing is fair: short request = low cost, long request = high cost

If Claude charged a flat $0.10 per request:

  • A short prompt (10 tokens, worth a tiny fraction of a cent) would be overcharged by orders of magnitude
  • A long analysis (100K tokens) would be undercharged relative to its compute cost

Per-token pricing solves this.

Calculating Inference Cost

To estimate costs, count tokens in realistic prompts and multiply by pricing.

Step-by-Step:

  1. Write a representative prompt and response
  2. Count input tokens (prompt)
  3. Count output tokens (response)
  4. Look up model pricing
  5. Multiply: (input tokens * input rate + output tokens * output rate) / 1M tokens
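The five steps above can be wrapped in a small helper. Prices are in dollars per million tokens; the figures below use this article's Sonnet rates ($3 input / $15 output) as an example.

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     input_rate: float, output_rate: float) -> float:
    """Dollar cost of one request; rates are $ per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# ~120 prompt tokens and ~45 response tokens, as in the support-bot example
cost = interaction_cost(120, 45, input_rate=3.0, output_rate=15.0)
print(f"${cost:.6f} per interaction")   # $0.001035
print(f"${cost * 10_000:.2f} per day")  # $10.35 at 10,000 interactions/day
```

Swapping in Haiku rates ($1/$5) is a one-line change, which makes this kind of helper useful for comparing models before committing to one.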

Example: Customer Support Chatbot

Prompt:

You are a helpful customer support agent for an e-commerce company.
Answer the following question from a customer about our return policy.

Question: Can I return an item after 30 days?

Context: Our return policy allows returns within 30 days of purchase
for most items, with some exceptions for clearance items.

Response:

Yes, we accept returns within 30 days of purchase. Items must be in
original condition with tags attached. Clearance items are final sale.
To start a return, visit our Returns page or contact support@company.com.

Token Count (using Claude tokenizer):

  • Prompt: ~120 tokens
  • Response: ~45 tokens

Cost Calculation:

Using Claude Sonnet 4.6 ($3/$15 per 1M):

Cost = (120 * $3 + 45 * $15) / 1M
     = (360 + 675) / 1,000,000
     = $0.001035 per interaction

For 10,000 customer interactions/day:

Daily cost = 10,000 * $0.001035 = $10.35
Monthly cost = $10.35 * 30 = $310.50

Using Haiku instead ($1/$5):

Cost = (120 * $1 + 45 * $5) / 1M = $0.000345
Daily cost = $3.45
Monthly cost = $103.50

The model choice (Haiku vs Sonnet vs Opus) drives a 3-10x cost difference.

Common Tokenization Surprises

Tokenizers are unintuitive. Edge cases break mental models.

Surprise 1: Extra Spaces Are Tokens

"Hello world"    = 2 tokens
"Hello  world"   = 3 tokens (extra space is a token!)
"Hello\n\nworld" = 4 tokens (newlines add tokens)

Cleaning whitespace saves tokens.
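A pre-processing pass that collapses runs of whitespace is a cheap way to avoid paying for it. A minimal sketch:

```python
import re

def squeeze_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and excess blank lines before sending to the API."""
    text = re.sub(r"[ \t]+", " ", text)     # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)  # 3+ newlines -> one blank line
    return text.strip()

print(squeeze_whitespace("Hello   world\n\n\n\nGoodbye  "))
# Hello world
#
# Goodbye
```

Be careful with content where whitespace is meaningful (code indentation, Markdown); apply this only to free-form prose.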

Surprise 2: Numbers Are Cheap, Strings Are Expensive

"12345" = 1 token
"apple" = 1 token
"applesauce" = 2 tokens (tokens: "apple", "sauce")
"Apple_Sauce_123" = 4 tokens (underscores split tokens)

Numbers compress well. Long strings without spaces are expensive.

Surprise 3: Punctuation Adds Up

"Hi." = 2 tokens (Hi, .)
"Hi..." = 3 tokens (Hi, ., .)
"Hi!!!!!!!" = 9 tokens

Repeated punctuation is token-inefficient. Clean input = saved tokens.

Surprise 4: Common Phrases Are Optimized

"you are" = 2 tokens
"you're" = 2 tokens (no penalty for contraction)
"ChatGPT" = 4 tokens (proper nouns with caps are expensive)
"chatgpt" = 1 token (lowercase is cheaper)

Uppercase, especially in acronyms, often adds tokens; exact counts vary by tokenizer.

Surprise 5: Code is Token-Expensive

"print('hello')" = 7 tokens
"print hello" = 2 tokens

Code has more punctuation and special characters, so same logical content uses more tokens than prose.

Tokens vs Characters vs Words

Quick comparison for mental models:

Metric                        Per 1,000 Chars (English)   Per 1,000 Words
Tokens                        ~250                        ~1,300
Input cost (Sonnet, $3/1M)    ~$0.00075                   ~$0.0039

Rough Conversion:

  • 1 word ≈ 1.3 tokens
  • 1 sentence (15 words) ≈ 20 tokens
  • 1 paragraph (100 words) ≈ 130 tokens
  • 1 page (400 words) ≈ 520 tokens

For back-of-envelope estimates, use: 1 token ≈ 0.75 words or 1 word ≈ 1.3 tokens.

Tools for Token Counting

Don't estimate. Count actual tokens.

Official Tokenizer Libraries:

  • Anthropic: client.messages.count_tokens() in the Python SDK
  • OpenAI: tiktoken library (Python)
  • Google: google-cloud-aiplatform (Python)
  • Groq: Llama tokenizer (varies by model)

Python Example (Anthropic):

import anthropic

# Requires an API key in the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

prompt = "What is the meaning of life?"

# count_tokens is free: it returns the token count without running the model.
response = client.messages.count_tokens(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": prompt}],
)
print(f"Token count: {response.input_tokens}")

Always validate token counts before deploying to production. Token counting is cheap (free on most platforms).

Examples: Real Workload Costs

Example 1: Document Summarization

Input: 10-page report (3,000 words = 4,000 tokens) Output: 500-word summary (650 tokens)

Cost per document (Sonnet 4.6):

(4000 * $3 + 650 * $15) / 1M = $0.02175

Summarize 100 documents/day: $2.18/day or about $65/month. At the Opus rate quoted earlier ($5/1M input, with output assumed at a proportional $25/1M), the same document costs about $0.036, or roughly $109/month for 100 documents/day.

Example 2: Email Classification (High Volume)

Input: Email (500 tokens average) Output: Classification label (10 tokens)

Cost per email (Haiku 4.5):

(500 * $1 + 10 * $5) / 1M = $0.00055

Classify 50,000 emails/day: $27.50/day or $825/month.

Example 3: Chat Application (Conversational)

Session history: 1,000 tokens (context) New user message: 50 tokens Response: 200 tokens

Cost per message (Sonnet 4.6):

(1050 * $3 + 200 * $15) / 1M = $0.00315 + $0.003 = $0.00615

100,000 daily messages: $615/day or $18,450/month.

For comparison: ChatGPT Plus is $20/month per user with generous usage limits. An app paying raw per-token API prices at this volume costs dramatically more, which is why many apps use rate limiting, context truncation, or smaller models.
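Because the full conversation history is resent as input on every turn, per-message cost grows as a chat continues. A sketch of cumulative cost over N turns, using the same assumed figures as above (1,000 starting context tokens, 50-token user messages, 200-token replies, Sonnet rates):

```python
def conversation_cost(turns: int, base_context: int = 1000,
                      user_tokens: int = 50, reply_tokens: int = 200,
                      input_rate: float = 3.0, output_rate: float = 15.0) -> float:
    """Total dollars for a conversation; rates are $ per 1M tokens."""
    context = base_context
    total = 0.0
    for _ in range(turns):
        context += user_tokens    # history grows with each user message...
        total += (context * input_rate + reply_tokens * output_rate) / 1_000_000
        context += reply_tokens   # ...and with each assistant reply
    return total

print(f"1 turn:   ${conversation_cost(1):.5f}")   # $0.00615, as computed above
print(f"10 turns: ${conversation_cost(10):.4f}")  # more than 10x the 1-turn cost
```

This compounding is the main argument for truncating history: without it, a long chat's final turns pay for every earlier message again.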

Example 4: Code Generation (IDE Integration)

Prompt: Current file (3,000 tokens) + chat (200 tokens) Response: Generated code (300 tokens)

Cost per completion (Sonnet 4.6):

(3200 * $3 + 300 * $15) / 1M = $0.0141

50 completions/day per developer (a common rate): $0.71/day or about $21/month per developer.

Cursor Pro is $20/month, so raw API cost at this usage roughly equals the subscription price. This is why such products lean on techniques like prompt caching, cheaper models for routine completions, and usage limits to stay viable.

FAQ

How many tokens is the Bible?

Approximately 790,000 words ≈ 1,000,000 tokens. Input cost with Sonnet: (1M * $3) / 1M = $3. In principle you could feed the entire Bible to Claude for about $3 of input cost, though 1M tokens exceeds typical context windows, so it would need to be processed in chunks.

Does pagination or formatting affect token count?

Yes. Markdown, HTML, JSON, code blocks all add tokens. Plain text is most efficient. Structured data (JSON, XML) has overhead from symbols and keys.

If a model outputs fewer tokens than expected, does the cost decrease?

Yes. Only tokens actually generated are charged. If the prompt allows 200 tokens but generation stops at 50 tokens, only 50 are billed. Set max_tokens to control output length.

Why do input and output tokens have different prices?

Generating tokens is harder than processing them. Input tokens are processed in parallel in a single forward pass (and often benefit from prompt caching); output tokens are generated sequentially, one forward pass each. Most models therefore charge 2-10x more for output than input.

Is there a way to reduce token count without changing meaning?

Yes:

  1. Remove filler words and redundancy
  2. Use compressed formats (lists instead of prose)
  3. Prefer compact formats for complex data (terse key: value lines rather than verbose prose or deeply nested JSON)
  4. Truncate historical context in conversations (keep recent messages only)
  5. Use lowercase, minimal punctuation
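Item 4 above, truncating conversation history, can be sketched as a token-budgeted window that keeps the most recent messages. The per-message token counts here are assumed to be precomputed by whatever counter you use (official tokenizer or an estimate):

```python
def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose combined token count fits the budget.
    Each message is {"role": ..., "content": ..., "tokens": precomputed_count}."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk newest -> oldest
        if used + msg["tokens"] > max_tokens:
            break
        kept.append(msg)
        used += msg["tokens"]
    return list(reversed(kept))           # restore chronological order

history = [
    {"role": "user", "content": "old question", "tokens": 500},
    {"role": "assistant", "content": "old answer", "tokens": 400},
    {"role": "user", "content": "new question", "tokens": 60},
]
print(truncate_history(history, max_tokens=500))  # drops the oldest message
```

Production systems often pin the system prompt outside the budget and summarize dropped turns instead of discarding them outright.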

How accurate is the "1 token = 4 characters" rule?

Roughly 75% accurate for English. Fails for:

  • Code (more punctuation)
  • Non-English languages (more expensive)
  • Proper nouns with capitals (more expensive)
  • Numbers and repeated characters (cheaper)

Always use official tokenizers for precision.

Can tokens represent emojis?

Yes. An emoji is typically 1-3 tokens depending on complexity; multi-codepoint sequences (skin-tone modifiers, ZWJ combinations) add tokens.

Advanced Token Concepts

Token Probability Distribution:

Models don't just generate the next token. They generate a probability distribution over the entire vocabulary (typically tens of thousands of tokens) and sample from it. High-probability tokens (high confidence) and low-probability tokens (high uncertainty) both consume 1 token of output, but carry different information value.

High-entropy outputs (e.g., creative writing) might need more tokens to reach the same quality as low-entropy outputs (classification, summaries). This affects optimal use cases for different models.

Special Tokens and Hidden Costs:

Models use special tokens for formatting:

  • Chat messages are wrapped in special begin-of-turn and end-of-turn tokens
  • Function calling adds 30-50 token overhead per function
  • Tool use (web search, database queries) adds tokens for tool descriptions and schemas
  • Streaming adds minimal overhead but requires connection management

Budget 5-10% extra tokens for "hidden" infrastructure tokens beyond raw content.

Token Efficiency Across Domains:

Domain     Token Efficiency          Notes
Fiction    Highest (plain prose)     Natural language, minimal special chars
Code       Lower                     Operators and punctuation become extra tokens
Math       Lower still               Dense notation; rare symbols tokenize poorly
JSON       Low                       Braces, quotes, and repeated keys add overhead
Logs       Moderate                  Identifiers and timestamps mixed with natural language

Structured formats (JSON, XML) are less token-efficient than prose. If optimizing for cost, consider trading structure for plain prose where the consumer can tolerate it.

Token Counting for Streaming:

Streaming returns tokens one at a time. Total tokens consumed = sum of all tokens. Streaming doesn't save tokens, only improves UX (users see real-time output). Budget for full token count even in streaming scenarios.

Real-World Debugging: Token Count Mismatches

Common issue: API bill is 2x higher than expected.

Root causes:

  1. Hidden context. System prompts + retrieved documents + conversation history add up fast. Audit token counts on real requests.

  2. Batching inefficiency. If aggregating 100 small requests, the aggregate token count might surprise. Batch API still charges for every token, just at discount.

  3. Streaming overhead. Streaming doesn't save tokens. Every token streamed is still billed. Check if streaming is necessary.

  4. Retry loops. Requests that fail mid-generation can still bill for tokens already processed, and blind retries can double the cost. Implement careful retry logic with backoff.

  5. Model version mismatch. A typo switching from Sonnet ($3/1M input) to Opus ($5/1M input) is a ~67% price increase. Audit model selection in code.

To debug: enable token counting on 1% of production requests. Log expected vs actual token usage. Differences reveal hidden costs.
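The sampling approach above can be implemented as a lightweight audit: on a fraction of requests, compare the estimated token count against what was actually billed and flag large gaps. A sketch (the function name, threshold, and logger are illustrative, not a real API):

```python
import logging
import random

logger = logging.getLogger("token_audit")

def audit_usage(expected_tokens: int, billed_tokens: int,
                sample_rate: float = 0.01, threshold: float = 1.25,
                rng=random) -> bool:
    """On a sampled request, warn when billed tokens exceed the estimate
    by more than `threshold`x. Returns True if the request was flagged."""
    if rng.random() >= sample_rate:
        return False  # this request was not sampled
    ratio = billed_tokens / max(expected_tokens, 1)
    if ratio > threshold:
        logger.warning("billed %.1fx expected: %d vs %d",
                       ratio, billed_tokens, expected_tokens)
        return True
    return False

# Force sampling to demonstrate a 2x overage being flagged
flagged = audit_usage(expected_tokens=1000, billed_tokens=2000, sample_rate=1.0)
print("flagged:", flagged)  # flagged: True
```

A consistent overage ratio across samples usually points at one of the five root causes listed above, most often hidden context.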

Final Thoughts

Tokens are the fundamental unit of LLM economics. Understanding them is essential for cost prediction, optimization, and responsible scaling.

A token-aware developer:

  • Estimates costs before deploying
  • Optimizes prompts to use fewer tokens
  • Chooses models intentionally (not defaulting to expensive ones)
  • Monitors actual vs expected token usage
  • Explains costs to non-technical stakeholders

Tokens will remain the unit of LLM pricing for the foreseeable future. Mastering tokenization is a core skill for anyone building AI applications.
