Local coding
Run a coding LLM on your own machine. Hardware, models, and the OSS agents that replace (or extend) Claude Code.
Why local?
Privacy by default, no rate limits, predictable cost after the hardware, and it works offline. What you give up is the top of the curve: cloud frontier models still pull ahead on long-horizon agent work and on the most complex repo-wide reasoning. For completions, refactors, code review, and small multi-file edits, local in 2026 is competitive.
Step 1 - Pick your hardware tier
| Tier | Top model size at usable quant | Tokens/sec ballpark | Watts |
|---|---|---|---|
| Apple Silicon, 16GB unified | 7-9B Q4 | 15-30 | ~30W |
| Apple Silicon, 32-64GB unified | up to ~30B dense, or ~80B MoE | 20-40 | 40-80W |
| Apple Silicon, 256GB (M3 Ultra Mac Studio) | up to ~120B MoE, or 70B dense at low quant | 15-25 | ~120W |
| GPU, 16-24GB VRAM | 14-27B Q4 | 60-120 | 250-450W |
| GPU, 48GB+ VRAM | 70B-class at Q4 | 80-150 | 450W+ |
| CPU + 32GB RAM | 7-9B Q4 | 3-8 | ~65W |
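Two quick sanity checks on the table. First, weight memory at Q4 is roughly half a byte per parameter - a rule of thumb, not a spec - so you can estimate fit with back-of-envelope arithmetic (the 0.55 bytes/param figure and the KV-cache headroom below are assumptions):
# Q4 weights ~= 0.55 bytes/param, plus a few GB for KV cache and runtime overhead.
# MoE models are sized by TOTAL parameters, not active ones.
echo "27 * 0.55 + 3" | bc   # 27B dense at Q4 -> ~17.9 GB: needs 24GB VRAM or 32GB unified
echo "80 * 0.55 + 4" | bc   # 80B MoE at Q4 -> ~48 GB: fits a 64GB Mac
Second, the tokens/sec column is a ballpark. llama.cpp ships a llama-bench tool that measures your actual prompt-processing and generation speed for a given GGUF (model path is a placeholder):
llama-bench -m ./your-model-q4.gguf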
Step 2 - Pick your model
Match to your tier. Each links to its tracked page on this site.
- 8GB GPU / 16GB Mac. Qwen3.5-9B at Q4. Best fit we track for this tier; small open coders are still the thinnest part of the 2026 landscape, so calibrate expectations - good for inline completions, less good for autonomous edits.
- 16-24GB GPU / 32GB Mac. Mistral Small 3.2 24B or Qwen3.6-27B. The volume sweet spot. gpt-oss-20b is a permissive-license alternative that runs on a wider range of hardware.
- 48GB GPU / 64GB+ Mac. Qwen3-Coder-Next. 3B active / 80B total MoE, ~70.6% SWE-Bench Verified, 256K context, Apache 2.0. One of the strongest open-weights coders that fits on a single workstation today - and because only 3B parameters are active per token, it is fast for its memory footprint.
What we're not recommending and why: DeepSeek V4-Pro (~80.6% SWE-Bench Verified, 1T params, 1M context) and GLM-5.1 (754B MoE, 58.4% SWE-Bench Pro - the leading open-weights score) are both real options, but GLM-5.1 wants ~8x H100 minimum and DeepSeek V4 is in the same data-center bracket. They're outside what one developer fits under a desk.
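Whatever you pick, verify the fit before building a workflow on it. A sketch using Ollama (the model tag is illustrative - substitute whatever the registry actually names your pick):
ollama show qwen3-coder-next   # prints parameter count, quantization, context length
ollama run qwen3-coder-next "write fizzbuzz in Python"   # quick smoke test
ollama ps   # shows loaded size and whether layers spilled from GPU to CPU
If ollama ps reports a CPU/GPU split, drop a quant level or pick a smaller model; partial offload is usually where tokens/sec collapses.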
Step 3 - Pick your runtime
- Ollama - easiest install, wraps llama.cpp, has a model registry. Default for new users.
- LM Studio - GUI on top of llama.cpp / MLX. Pleasant on a Mac.
- llama.cpp - GGUF directly, no abstraction, every flag exposed (serving sketch after this list).
- MLX - Apple Silicon native. Faster than llama.cpp on M-series for many models.
- vLLM - production GPU serving. Overkill for one developer; right answer if you're sharing the machine.
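Whichever runtime you land on, the practical common denominator is an OpenAI-compatible HTTP endpoint for the agents below to target. A minimal llama.cpp sketch; model path, context size, and port are placeholders, and -ngl 99 assumes the whole model fits in VRAM:
# llama-server exposes /v1/chat/completions and friends.
llama-server -m ./qwen3-coder-next-q4.gguf -c 32768 -ngl 99 --port 8080
curl http://localhost:8080/v1/models   # smoke test
Ollama and LM Studio expose equivalent endpoints out of the box (Ollama on port 11434 by default).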
Step 4 - Pick the agent
Coming from Claude Code, you have two paths.
Keep Claude Code, point it at a local backend. Set ANTHROPIC_BASE_URL to the address of a local proxy that translates to and from the Anthropic Messages API. LiteLLM in front of Ollama is the common setup. You keep the workflow you know; you lose the parts of Claude Code's scaffold that were tuned to Claude's behavior. Sensible if local is your sometimes-mode, not your default.
Switch tools. Pick by where you live:
- Aider - terminal, git-aware, native BYOM, repo map, auto-commits. The closest analogue to Claude Code's feel and the cleanest swap if you're going local-first.
- Continue.dev - VSCode / JetBrains. Inline completions plus a chat sidebar, BYOM.
- Cline - VSCode visual agent with step-by-step approval and browser automation.
- Roo Code, OpenCode - newer entrants, worth tracking but less stable.
A note on tool-use: agents tuned for frontier cloud models (Claude Code, Cline) tend to over-call tools or burn context on local models with weaker function-calling. Aider's diff-based approach degrades more gracefully on small models.
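Concretely, if you pair Aider with a small local model, two flags help it degrade gracefully - both are standard Aider options, though the best values are model-dependent:
# udiff edits are more forgiving than tool calls for models with weak
# function-calling; a smaller repo map leaves context for the actual task.
aider --model ollama/qwen3-coder-next --edit-format udiff --map-tokens 1024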
Step 5 - Wire it up
Two minimal recipes. Substitute your model and quant.
Aider with Ollama:
ollama pull qwen3-coder-next:q4
aider --model ollama/qwen3-coder-next
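One gotcha: Ollama's default context window is small (4K in recent releases), which silently truncates Aider's repo map and diffs. Raise it server-side before starting the agent - the variable below exists as of Ollama 0.5.x, so check your version:
OLLAMA_CONTEXT_LENGTH=32768 ollama serve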
Claude Code at a local endpoint:
litellm --model ollama/qwen3-coder-next --port 4000
ANTHROPIC_BASE_URL=http://localhost:4000 claude
The proxy handles Anthropic-to-OpenAI translation. Expect rough edges around tool-use streaming.
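If the one-liner misbehaves, a config file gives LiteLLM more to work with. A minimal sketch, assuming LiteLLM's standard proxy config format and Ollama on its default port; the model name "local-coder" is arbitrary:
# Map a local Ollama model under a stable name, then start the proxy.
cat > litellm-local.yaml <<'EOF'
model_list:
  - model_name: local-coder
    litellm_params:
      model: ollama/qwen3-coder-next
      api_base: http://localhost:11434
EOF
litellm --config litellm-local.yaml --port 4000
curl http://localhost:4000/v1/models   # confirm the proxy is up before launching claude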
Honest limits
The gap to cloud frontier has narrowed but not closed. On SWE-Bench Verified (April 2026): Qwen3-Coder-Next sits around 70%; the strongest open coders that fit a workstation (Qwen3.6-27B at 77.2%, MiniMax M2.5 at 80.2%) reach the high 70s to low 80s; the largest open models (DeepSeek V4-Pro at 80.6%) sit just above them - against Claude Opus 4.7 at 87.6% on the same benchmark. The headline gap is real but narrower than a year ago. On the kinds of tasks that fit in a few thousand tokens of context - completions, single-file edits, code review, small refactors - a 24GB+ local setup is hard to distinguish from frontier. Where cloud still pulls ahead: long-horizon agent loops, refactors that touch hundreds of files, and tasks that depend on the model holding a plan together for tens of minutes. Treat single-digit benchmark deltas as noise; calibrate against your own repo.
Last updated May 2026.