Optimization Articles
Browse 143 articles about Optimization.
ClaudeMem vs. Dumping Full Context into Claude Code: The 10x Token Cost Difference Explained
Dumping all past context into Claude Code is expensive. ClaudeMem's three-layer vector search cuts retrieval token costs by ~10x.
Context Mode for Claude Code Compresses 315KB Sessions to 5KB — Here's How to Install and Use It
Context Mode achieves a 63x compression ratio on Claude Code sessions. Install steps, slash commands, and when to use it over alternatives.
One Founder Video Lifted Conversion Rate 33% — Here's the Claude Code Landing Page Stack Behind a $1.2M Business
A founder video moved CVR from 10% to 15%. Video testimonials cut Google Ads CPA 7x. Here's the full Claude Code stack that powers it.
John Preskill's Quantum Paper Used an Open-Source LLM Optimizer — and It Made Algorithms 1,000x Better
Caltech's John Preskill co-authored a paper where AI did the heavy lifting — improving early quantum algorithms by 1,000x via OpenEvolve.
Landing Page Speed Kills Conversions: 4 Data Points and How Claude Code Fixes Them in One Pass
A 2-second load time cuts conversions in half. Four speed-to-CVR data points and how to fix them by pasting your Lighthouse report into Claude.
Omar Khattab's DSPy Follow-Up: How Auto-Optimizing Your Harness Put a Tiny Model at #1 on TerminalBench
DSPy's creator showed a Haiku-powered harness beat larger models on TerminalBench. The secret: 10M tokens of automated harness optimization.
9 Video Testimonials Cut Google Ads Cost Per Conversion from $200 to $30 — The Landing Page Data Behind the Drop
Adding 9 video testimonials to a landing page dropped Google Ads cost per conversion from $200 to $30. Page speed data shows a 4-second load kills 80% of conversions.
Andrej Karpathy Said 'The Tokenizer Must Go' — DeepSeek's Vision Architecture Is Starting to Prove Him Right
Karpathy called pixels better inputs than text tokens after DeepSeek's OCR paper. Their new visual-primitives model takes that idea further with a 7,000x compression ratio.
Claude Code Context Mode Compresses 315KB Sessions to 5KB — Here's How to Install It
Context Mode routes tool calls through a sandbox and shrinks a 56KB Playwright snapshot to 299 bytes. Two commands to install.
Claude Code /ultra review: 5 Things You Need to Know Before Running It ($5–$20 Per Run)
The /ultra review command spins up parallel reviewer agents but costs $5–$20 per run and requires a Claude account, not just an API key. What to know first.
DeepSeek's 'Thinking with Visual Primitives': 5 Technical Breakthroughs in the Paper That Briefly Disappeared
DeepSeek's vision paper was published then pulled. Here are 5 key technical details — including inline bounding-box tokens and a 7,000x compression ratio.
DeepSeek Vision's 7,000x Image Compression Pipeline: From 756px Input to 81 KV Cache Entries
DeepSeek's vision model compresses a 756x756 image through four stages down to 81 KV cache entries — a ~7,000x total compression ratio. Here's each step.
DeepSeek Vision vs. Claude Sonnet 4.6 vs. Gemini Flash 3: Which Vision Model Uses 10x Less KV Cache?
DeepSeek's vision model uses ~90 KV cache entries per image vs. ~870 for Sonnet 4.6 and ~1,000 for Gemini Flash 3. Here's what that means for cost.
Andrej Karpathy on DeepSeek's OCR Paper: Why Pixels May Beat Tokens as AI Inputs
Karpathy called DeepSeek's Oct 2025 OCR paper — 10x text compression, 97% accuracy — a sign that tokenizers are on the way out.
Anthropic's Harness Detection Bug: 3 Things That Triggered Unexpected Claude Code Charges
A git commit mentioning 'hermes.md' triggered a $200.98 overage on a plan showing 86% unused. Here's exactly what caused it and how Anthropic responded.
How to Build a Local AI Stack from Scratch: Ollama to vLLM, Step by Step
From Ollama for daily use to vLLM for serving to TensorRT-LLM for production — here's the complete local AI runtime stack and when to use each layer.
Claude Code Skills Architecture: 4 Layers That Keep Your AI Agent Fast and Focused
The .claude/skills/ folder uses progressive context loading — only ~100 tokens read at search time — to keep Claude Code lightweight across dozens of SOPs.
Cursor SDK + GPT-5.5 Scores 87.2% vs Native Codex's 61.5% — The Harness Is the Bottleneck
Switching GPT-5.5 from Codex's native harness to Cursor's SDK jumped functionality from 61.5% to 87.2% — a 26-point gain from the harness alone.
GitHub Copilot's CPO Says the Flat-Rate AI Pricing Model Is Dead — What Usage-Based Billing Means for Builders
GitHub Copilot CPO Mario Rodriguez said flat-rate AI pricing 'is no longer sustainable.' Here's what the shift to usage-based billing means for AI builders.
Andrej Karpathy's LLM Wiki Pattern: Cut Claude Token Usage 95% with a Two-Folder System
One user turned 383 files and 100+ meeting transcripts into a compact wiki using Karpathy's raw/wiki pattern — and dropped Claude token usage by 95%.