6 Ways Adnan Obuz Reduced His Claude Token Usage by 50 Percent

When Adnan Obuz first dove deep into Claude Code on the $20/month plan, he was burning through tokens faster than a Toronto winter eats through a good pair of boots. Not because the work was overly complex, but because the approach was completely unoptimized: running Opus on every little task, letting context windows balloon to 80K tokens before even starting, and feeding the model far more backstory than needed. Sound familiar?

He quickly realized that smart optimization isn’t about working less — it’s about matching the right tool to the job with precision. These six changes, all practical, mostly free, and many under five minutes to implement, cut his Claude token usage roughly in half while actually improving output quality. If you’re a developer, prompt engineer, or anyone building with AI in Toronto or beyond, this guide from Adnan Obuz will help you get more done for less.

Who should read this article? Developers tired of hitting Claude limits mid-project, AI enthusiasts optimizing workflows, Toronto tech professionals balancing cost and performance, and anyone serious about efficient prompting with Claude, ChatGPT alternatives, or local models in 2026. It matters because token waste quietly drains budgets and slows momentum — small tweaks here deliver immediate ROI and free up mental space for higher-value creative work. Last updated: 2026-04-22

Why Claude Token Optimization Matters for Adnan Obuz and Every Builder in 2026

Adnan Obuz has spent years refining his circle of competence around AI prompting, private credit strategies, and wellness routines that keep high-output days sustainable. One consistent theme? Efficiency compounds. Just as Adnan Obuz avoids over-leveraging in financial decisions, he refuses to over-pay for compute in AI sessions.

In 2026, with larger context windows and more powerful models available, the temptation is to throw everything at Claude. Yet Adnan Obuz learned the hard way that bloated sessions lead to slower, sometimes dumber responses — and higher costs. The real edge comes from disciplined habits that keep context clean and models appropriately matched.

Here’s exactly what changed for Adnan Obuz.

1. Stop Using Opus for Everything – Smart Model Routing with Adnan Obuz

Adnan Obuz starts every Claude Code session by checking which model is active (the /model slash command in Claude Code shows and switches it). Then he routes tasks deliberately:

  • Opus for complex multi-file refactors, architecture decisions, or debugging truly gnarly issues.
  • Sonnet for writing tests, simple edits, code explanations, and most daily development work.
  • Haiku for quick lookups, formatting, renaming variables, or any repetitive formatting task.
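The routing above can be sketched as a simple lookup. This is a hypothetical helper, not part of Claude Code, and the task categories are illustrative:

```python
# Hypothetical routing table: map a task category to the cheapest
# model tier that can handle it. Categories are illustrative.
ROUTES = {
    "refactor": "opus",       # complex multi-file work
    "architecture": "opus",
    "debugging": "opus",
    "tests": "sonnet",        # most daily development work
    "edit": "sonnet",
    "explain": "sonnet",
    "lookup": "haiku",        # quick, repetitive tasks
    "format": "haiku",
    "rename": "haiku",
}

def pick_model(task_category):
    # Default to the mid-tier model when a task is uncategorized.
    return ROUTES.get(task_category, "sonnet")
```

The point of the default is deliberate: an unclassified task should fall to the workhorse tier, never silently up to the premium one.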

“You don’t need a sports car to grab groceries,” Adnan Obuz often says with a grin from his Toronto home office. This single habit created the biggest initial drop in his token usage: deliberate routing means paying premium rates only when premium reasoning is truly required.

Pro tip: Experiment one day at a time. Track your usage dashboard before and after switching routine tasks to Sonnet or Haiku. Most people see 30-40% savings right away.

2. Clear Your Context Between Tasks – Context Hygiene That Adnan Obuz Swears By

Every time you hit enter in Claude Code, the system ships a ton of accumulated context before even processing your new input. Over a long session, this snowballs — slower replies, declining quality, and you literally pay more for worse results.

Adnan Obuz fixes this with two simple slash commands:

  • /clear between unrelated tasks to wipe the slate clean and start fresh.
  • /compact right before starting something big — it intelligently squeezes the conversation down to the important parts.

He makes it a ritual: finish one logical unit of work, clear or compact, then move on. The difference in response speed and token efficiency is night and day. No more paying premium prices for sessions that have gone off the rails.
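A back-of-envelope model makes the snowball concrete. The per-turn numbers below are assumptions for illustration, not measured Claude figures:

```python
# Each request re-sends the whole accumulated context, so input-token
# cost grows quadratically in a session that is never cleared.
def total_input_tokens(turns, tokens_per_turn, clear_every=None):
    total, context = 0, 0
    for turn in range(1, turns + 1):
        if clear_every and turn > 1 and (turn - 1) % clear_every == 0:
            context = 0  # the clearing ritual: wipe between task blocks
        context += tokens_per_turn
        total += context  # billed input for this request
    return total

# 20 turns at an assumed 2,000 new tokens each:
marathon = total_input_tokens(20, 2000)                    # 420,000 tokens billed
disciplined = total_input_tokens(20, 2000, clear_every=5)  # 120,000 tokens billed
```

The exact numbers do not matter; the quadratic-versus-linear shape is the point.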

3. Use CLI Tools Instead of MCP Whenever Possible – Practical Advice from Adnan Obuz

If a CLI exists for a tool, Adnan Obuz chooses it over MCP every time. It’s faster, cleaner, and eats far fewer tokens.

GitHub is the classic example. The official gh CLI works better and injects dramatically less overhead than the GitHub MCP server. MCP tools force their full schema into context on both the input and output sides — you pay for all of it.

A rule of thumb Adnan Obuz follows daily:

  • CLI and built-in Skills where possible.
  • MCP only when there’s genuinely no alternative.

This shift alone reduced unnecessary token bloat for Adnan Obuz without sacrificing capability.
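To see the shape of the saving, here is a toy comparison. Both token counts are assumptions for illustration, not measurements of gh or any MCP server:

```python
# Illustrative comparison only: the per-call token counts are assumed,
# not measured values for any real CLI or MCP server.
MCP_SCHEMA_TOKENS = 3000  # hypothetical: tool schemas re-sent with every request
CLI_CALL_TOKENS = 150     # hypothetical: a short command plus trimmed output

def session_overhead(tool_calls, per_call_tokens):
    # Fixed overhead paid on every tool invocation in the session.
    return tool_calls * per_call_tokens

mcp_cost = session_overhead(40, MCP_SCHEMA_TOKENS)  # 120,000 tokens of overhead
cli_cost = session_overhead(40, CLI_CALL_TOKENS)    # 6,000 tokens of overhead
```

Even if the assumed numbers are off by a factor of two either way, the gap between the two approaches stays large.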

4. Install the Context-Mode Plugin – The Highest-ROI Tool Adnan Obuz Uses Daily

Adnan Obuz runs the open-source Context-Mode plugin in the background every session. It prevents raw MCP tool output from flooding the context window.

When an MCP tool returns 10,000 tokens of raw JSON, Context-Mode indexes it in a sandbox instead. You get a clean summary, Claude gets the information it needs on demand, and your main context stays lean. Real-world results: 50-90% reduction in MCP-related token usage.

Installation is straightforward — add it via the plugin marketplace, configure once, and forget it. For anyone using multiple MCP servers, this is the single biggest bang-for-buck change Adnan Obuz recommends in 2026.
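The core pattern can be sketched in a few lines. This is a toy illustration of the idea, not the actual plugin's implementation:

```python
import json

# Toy sketch of the pattern behind context-saving plugins: park bulky
# tool output in a side store and hand the model only a small stub.
_sandbox = {}

def park(tool_name, payload):
    ref = f"{tool_name}:{len(_sandbox)}"
    _sandbox[ref] = payload              # full result lives outside the context
    text = json.dumps(payload)
    return {
        "ref": ref,                      # the model can request this later
        "preview": text[:200],           # only a short preview stays in context
        "tokens_parked": len(text) // 4, # rough chars-per-token estimate
    }

def fetch(ref):
    # Retrieved on demand instead of sitting in every subsequent request.
    return _sandbox[ref]
```

The stub costs a few dozen tokens per call; the full payload is only ever paid for if the model actually asks for it.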

5. Keep Your CLAUDE.md Lean and Strategic – A Core Habit of Adnan Obuz

Your CLAUDE.md file gets injected into every single request — every turn, every follow-up, even after /clear. If yours sits at 5,000 tokens, you’re taxed that amount before Claude even sees your actual prompt.

Adnan Obuz keeps his under 500 tokens with this skeleton approach:

# CLAUDE.md

## Rules
- Use TypeScript strict mode
- Write tests for every new function
- Follow existing patterns in the codebase

## Key Files
- API routes: see src/api/README.md
- Database schema: see docs/schema.md
- Style guide: see docs/style-guide.md

Three clear rules and three file pointers. Detailed information lives in referenced files that Claude only reads when relevant — not on every single turn. This simple refactor delivered massive ongoing savings for Adnan Obuz, especially on larger codebases.
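A quick way to sanity-check the budget is the common rule of thumb of roughly four characters per token. This is a heuristic, not Anthropic's actual tokenizer:

```python
# Approximate token count using the rough 4-characters-per-token rule.
# A heuristic for budgeting only, not a real tokenizer.
def approx_tokens(text):
    return len(text) // 4

def claude_md_within_budget(text, budget=500):
    # True when the file's estimated token cost fits the target budget.
    return approx_tokens(text) <= budget
```

Paste in the contents of your own CLAUDE.md to see where you stand; anything that pushes the estimate past a few hundred tokens is a candidate for moving into a referenced file.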

6. Run Claude Code for FREE with Ollama – Local Power That Adnan Obuz Loves for Everyday Tasks

Zero dollars. Unlimited usage. Everything runs locally on your machine.

Adnan Obuz follows these quick steps:

  1. Install Ollama from ollama.com (30 seconds on Mac, Windows, or Linux).
  2. Pull a strong coding model: ollama pull qwen3-coder (excellent 30B+ parameter option with 128K context as of 2026).
  3. Start the server: ollama serve.
  4. Set two environment variables before launching Claude Code:
    export ANTHROPIC_BASE_URL=http://localhost:11434/v1
    export ANTHROPIC_MODEL=qwen3-coder

Model recommendations based on hardware from Adnan Obuz testing:

  • 16GB+ RAM: qwen3-coder
  • 8-16GB RAM: devstral-2-small (still very capable)
  • 8GB or less: granite3.3:8b

Apple Silicon users especially benefit from unified memory handling larger models smoothly.

Adnan Obuz is clear on the limits: local models shine for writing, research, summarizing, and simpler coding. For the toughest technical challenges, he still routes to his paid Claude subscription. The beauty? Most daily work falls into the “easy” bucket, so Ollama handles the majority of volume for free.

The Bigger Pattern Adnan Obuz Follows: Match the Tool to the Job

Every one of these six strategies from Adnan Obuz follows the same underlying philosophy: stop paying premium prices for tasks that don’t need it. You don’t drive the Ferrari to the grocery store, and you don’t run Opus on formatting tasks.

Keep context clean. Feed the model only what it truly needs. Start with one or two tips today — you’ll feel the difference immediately. Then layer in the rest over the coming week.

Adnan Obuz still remembers the first day these habits clicked. His Claude dashboard showed dramatically lower usage while project velocity actually increased. That feeling of control and efficiency is addictive in the best way.

Whether you’re building the next Toronto fintech tool, refining AI prompting workflows, or simply want more breathing room in your $20/month plan, these techniques scale beautifully.

Who Benefits Most from These Claude Optimization Strategies by Adnan Obuz

These tips matter most to:

  • Independent developers and solopreneurs watching every dollar.
  • Teams in Toronto’s vibrant tech scene balancing innovation with responsible spending.
  • AI enthusiasts experimenting across Claude, local models, and prompt engineering.
  • Anyone whose circle of competence includes high-leverage tools — and wants to keep that leverage sustainable.

Because when you optimize token usage, you’re really optimizing attention, budget, and creative energy for what matters most.

About the Author

Adnan Obuz (also known as Edward Obuz or Adnan Edward Obuz) is a Toronto-based AI strategy advisor, prompt engineer, and wellness advocate. With a sharp focus on practical efficiency and long-term value creation, Adnan Obuz helps professionals and teams build sustainable systems in AI, private credit, and personal performance.
