Opus 4.7 Reads the Same Words. Counts More of Them.
Anthropic’s new model uses a new tokenizer that burns through up to 35% more tokens for the same text. Sticker price didn’t budge. Your bill might. Here’s who feels it first and what it actually costs you.
Anthropic released Opus 4.7 yesterday.
Same price per token.
Your bill can still climb up to 35% on the exact same prompts.
Not a price hike. Not officially. The rate card didn’t move a penny. Five bucks per million input tokens. Twenty-five per million output. Identical to 4.6. Anthropic went on the record and everything.
What changed is the tokenizer. (This is the part where half the audience’s eyes glaze over and they start planning dinner. Stay with me for ninety more seconds. Then you can go make Spaghetti-Os.)
A tokenizer is the thing that chops your text into the little billable pieces AI companies charge for. Opus 4.7 uses a new one that’s sharper at understanding what you wrote, which is great, and also counts the same paragraph as up to 35% more tokens, which is less great. The actual range Anthropic publishes is 1.0x to 1.35x. Plain English prose lands near the low end. Code, structured data, tables, JSON, and non-English text land near the high end. Nobody hits a flat 35% on everything. Some workloads feel it like a sneeze. Others feel it like a rent hike.
If you’re on Claude Pro or Max, you’re not paying per token. You’re paying per plan. Which means the 35% lands as “fewer effective messages before the app starts politely informing you to come back in four hours.” Same dollar. Less usage.
If you’re on the API, or Claude Code, or you’ve wired Claude into any automation or pipeline, you pay per token. Your ten-cent request can become thirteen cents on the exact same text. Nothing about your prompt changed. The math changed underneath it.
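If you want to see that ten-to-thirteen-cents math on paper, here is a minimal sketch using the rates quoted above. The 1.3x multiplier is just one point inside the published 1.0x–1.35x range, picked for illustration; the function name and the example token counts are mine, not Anthropic’s.

```python
# Back-of-envelope request cost at the published rates:
# $5 per million input tokens, $25 per million output tokens.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens, output_tokens, multiplier=1.0):
    """Dollar cost of one request at a given tokenizer multiplier.

    `multiplier` models the new tokenizer counting the same text as
    more tokens (1.0x for plain prose, up to 1.35x for code/JSON).
    """
    return (input_tokens * multiplier * INPUT_RATE
            + output_tokens * multiplier * OUTPUT_RATE)

# A request that cost ten cents on the old tokenizer...
old_cost = request_cost(10_000, 2_000)                  # $0.10
# ...costs thirteen if your text happens to tokenize at 1.3x.
new_cost = request_cost(10_000, 2_000, multiplier=1.3)  # $0.13
```

Same prompt, same output length, 30% more billable pieces. That is the whole story in four lines of arithmetic.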
The Tokenizer Change Is Not a Bug. It’s a Forcing Function.
Creators have been stuffing prompts with decorative adjectives, loading system instructions with clauses nobody reads, pasting entire 4,000-word articles into Claude when they needed one paragraph rewritten. This update is the first time that habit showed up on an invoice.
Nobody was going to tell you to trim your prompts when it didn’t matter.
Now it matters.
This is the tax on sloppy context. It is also, if we’re being fair, overdue.
I’ve watched creators write Voiceprints that sound like a man writing his own Tinder bio in the third person. Three pages of “my voice is authentic, engaging, warm yet professional, conversational but informed, casual but smart, bold but not aggressive.” (If your Voiceprint needs “warm yet professional” AND “casual but smart” AND “bold but not aggressive” in the same paragraph, you didn’t write a Voiceprint. You wrote your own LinkedIn recommendation and forgot to sign someone else’s name to it.)
Every adjective in that Voiceprint is now a billable token. Every “warm yet professional” is a small recurring surcharge on a document that was already doing approximately nothing for your writing. The lean-beats-bloat argument has been alive in the archive for a while, but it was theoretical. Now the math took a side.
The Six Habits That Just Got Expensive
Every one of these was always bad. Now they’re bad and metered.
1. The bloated system prompt. If your system prompt runs 800 words and contains the phrase “you are a helpful assistant” anywhere in it, it’s costing you. Every request. Forever. Trim to what’s load-bearing. Delete the decorative framing. If a line doesn’t change the model’s behavior when you remove it, it was never doing work.
2. The horoscope Voiceprint. Yours should be operational. Banned words. Rhythm patterns. Specific tics. Hard constraints. Not “conversational yet authoritative with a touch of playful warmth.” If your Voiceprint can’t tell Claude what NOT to do, it’s decoration. (I built mine for three pages before admitting two and a half were vibes. The current version is one page and produces tighter output than the bloated one ever did. Go figure.)
3. Pasting whole articles when you need one paragraph. You don’t need Claude to read 3,000 words to help you rewrite 200. Excerpt the section. Show the context you actually need. Nothing else. This one alone probably accounts for a quarter of the waste in most creator workflows.
4. Dragging 40-turn conversation histories. The model re-reads the entire conversation every turn. If you’ve been talking to the same thread for three hours about five unrelated tasks, you’re paying to include all of it in every new request. Compact. Start fresh when the task changes. Your first-hour prompt about a newsletter draft does not need to keep accompanying your fourth-hour question about a spreadsheet formula.
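The history tax compounds, because every turn re-sends everything before it. Here is a toy model of that growth; the 500-tokens-per-turn figure is an illustration, not a measurement of any real thread.

```python
def thread_input_tokens(turns, tokens_per_turn=500):
    """Total input tokens billed across a thread where every turn
    re-sends the full history, so turn t sends t * tokens_per_turn."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

# One three-hour thread covering five unrelated tasks:
one_long_thread = thread_input_tokens(40)         # 410,000 tokens

# The same work split into four fresh 10-turn threads:
four_fresh_threads = 4 * thread_input_tokens(10)  # 110,000 tokens
```

Same conversations, roughly a quarter of the input bill, just by starting over when the task changes.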
5. Using Opus for everything. Sonnet 4.6 is 40% cheaper per token and handles most content generation, classification, and routine work without meaningful quality loss. Save Opus for the tasks that actually benefit from it: long coding sessions, deep agentic workflows, high-stakes reasoning, vision-heavy work. If you’re using Opus to format a bulleted list, you’re ordering filet mignon to feed a raccoon. (The raccoon does not care. The raccoon is going to take a single bite, drop the rest into a storm drain, and then fight its own reflection. You could have served it a mitten. The outcome is identical.)
6. Ignoring caching on the API. If you’re repeatedly passing the same context (your Voiceprint, style guide, project knowledge, research doc), cache it. Cache reads are priced at 10% of standard input. The math pays back after one reuse for a five-minute cache, two reuses for a one-hour cache. If you’re running daily content workflows and not caching, you are lighting money on fire with excellent posture. This isn’t a minor tweak. It’s the single biggest lever for controlling Opus cost post-4.7, and it’s been sitting right there the whole time.
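The break-even claims in that paragraph check out on paper. This sketch assumes cache writes are billed at a premium over standard input, 1.25x for the five-minute cache and 2x for the one-hour cache, which are the write multipliers Anthropic has published for prompt caching, and reads at the 10% figure above. Costs are measured in units of “one uncached send” of the cached block.

```python
def breakeven_reuses(write_multiplier, read_multiplier=0.10):
    """Smallest number of cache reuses at which caching a context
    block beats re-sending it uncached every time.

    Uncached cost after n reuses: 1 + n  (initial send plus n resends)
    Cached cost after n reuses:   write_multiplier + n * read_multiplier
    """
    n = 1
    while write_multiplier + n * read_multiplier > 1 + n:
        n += 1
    return n

breakeven_reuses(1.25)  # five-minute cache: pays back after 1 reuse
breakeven_reuses(2.0)   # one-hour cache: pays back after 2 reuses
```

Every reuse past break-even is context you are getting at a 90% discount. For a Voiceprint or style guide you send on every single request, that discount applies all day, every day.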
The Deeper Point (Which Will Outlive Opus 4.7)
Every model update breaks somebody’s workflow.
Tokenizers change. Context windows shift. Default behaviors adjust. What worked at “medium effort” starts needing “high.” Prefill goes away. Extended thinking budgets disappear. A new effort tier appears between “high” and “max” and half your prompts suddenly feel underpowered.
This is the churn that nobody warns you about when they sell you the “just use AI” gospel. Models drift. Tokenizers get rewritten. The workflow built around one specific version, tuned to one specific model, decorated with tricks that happen to work this week. That workflow breaks every six months. Sometimes faster.
The workflow built on discipline survives.
Tight prompts survive model updates. Lean Voiceprints survive tokenizer changes. Curated context survives context-window shifts. Caching survives almost everything. Better inputs always beat bigger ones, and now they also cost less in a visible, measurable way.
This is not a glamorous insight. It is not going to win any conference keynotes. There’s no certification for “guy who audits his own prompts monthly.” But it’s the actual difference between creators who ship every day for three years and creators who get flattened by every platform announcement like they personally owe it money.
Opus 4.7 is better at almost everything. Better coding. Better vision. Better instruction-following. Better agentic self-verification. Better long-context retrieval. The benchmarks are real and the gains are real. For a lot of workloads, the quality jump more than justifies the extra tokens, especially once you factor in that low-effort 4.7 now matches medium-effort 4.6 on output quality.
It also eats more tokens.
Both things can be true at once. Most improvements in this space come with a tradeoff, and the tradeoff just got a price tag you can read.
The move isn’t to panic. The move isn’t to switch providers or write a rage post about Anthropic’s stealth pricing. The move is to go audit your own system today and remove the 30% of context that was never doing any work in the first place.
It was always costing you.
Now you can see it.
Expensive slop is still slop. It just comes with a receipt now.
🧉 What’s one thing in your AI setup you KNOW is bloat, but haven’t cut yet because you’re scared it’s secretly doing something? Adjective stacks. Mission-statement preambles. Voice descriptors. Instructions you wrote six months ago and never revisited. The line that survives every cleanup because you’re not sure what happens if you pull it.
Crafted with love (and AI),
Nick “Tokenizer Tax Survivor” Quick
PS: Half the creators reading this have a Voiceprint longer than their last three posts combined. If that’s you, the free Voiceprint Quick-Start Guide walks you through the version that actually earns its token count. Grab it, trim your doc in an afternoon, and save yourself the tokenizer tax on every request going forward. And if this landed, forward it to the creator you know who’s been “refining” the same 2,800-word voice prompt since November. They’ll thank you. Or they’ll block you. Either way, you’ll have helped.
PPS: Like, comment, restack, subscribe. I publish every day, which means your inbox can either get one more entry from me or one more from a real estate app you signed up for in 2019 and never figured out how to unsubscribe from. Choose wisely. (I am not an impartial party to this decision.)