Why I switched the cheap default to Gemini
If you build with Claude, you probably default to Haiku for cheap, high-volume background work. I did the same for an image-transcription feature in my Bluesky tooling: Haiku 4.5 for routine scanning, Opus 4.7 for interactive use. Both are Anthropic models. Then I compared five models on the same transcription task:
| Model alias | Latency | Cost per image (USD) | Chord-token recall |
|---|---|---|---|
| gemini-2.5-flash-lite | ~8s | $0.001 | 95% |
| gemini-2.5-flash | ~10s | $0.003 | 100% |
| gemini-3.5-flash | ~10s | $0.014 | 100% |
| claude-haiku-4-5 | ~7s | $0.008 | 18% |
| claude-opus-4-7 | ~20s | $0.12 | 91% |
Gemini 2.5 Flash-Lite, Google's cheapest production model at $0.10 input / $0.40 output per million tokens, recalled 21 of 22 chord tokens from a dense chord-detection table. Haiku, at eight times the cost per image, recalled 4 of 22.
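The arithmetic behind those percentages is simple enough to check by hand, but here is a minimal sketch of it. The counts and per-image costs are the ones reported above; the `recall` helper is a name made up for illustration, not part of my tooling.

```python
def recall(found: int, total: int) -> float:
    """Fraction of expected chord tokens that appear in the transcription."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


# Counts reported above: 21 of 22 for Flash-Lite, 4 of 22 for Haiku.
lite_recall = recall(21, 22)
haiku_recall = recall(4, 22)

# Per-image costs from the table, in dollars.
cost_ratio = 0.008 / 0.001  # Haiku relative to Flash-Lite

print(f"Flash-Lite recall: {lite_recall:.0%}")
print(f"Haiku recall: {haiku_recall:.0%}")
print(f"Haiku / Flash-Lite cost: {cost_ratio:.0f}x")
```

Rounded to whole percentages, this gives the 95% and 18% shown in the table, and the eightfold cost gap.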
Haiku wasn't failing to read the image. It was failing to transcribe it. Asked to "preserve structure (tables, columns, ordered lists)," Haiku returned a prose summary with "Sample entries:" and gave up on the rest of the table. The Gemini models — including Lite — copied it out row by row. The gap wasn't OCR capability; it was instruction-following.
Acting on the result needed software work first. Gemini 3.5 Flash, announced at Google I/O on May 19 (four days before this conversation), was unknown to my tooling. The local Gemini library was three months old and was missing both the new model and a recent API parameter, thinking_level, which controls how much of the output budget the model spends on internal reasoning. The default is medium, which silently consumes output tokens: a call with max_output_tokens=1000 returned 160 characters of truncated output with finishReason: MAX_TOKENS and no error.
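Because truncation arrives as a normal response rather than an error, it is worth checking for explicitly. The sketch below assumes the response has already been parsed into a plain dict shaped like the Gemini REST response (`candidates[0].finishReason`, with text under `content.parts`). `TruncatedOutputError` and `extract_text` are hypothetical names for illustration, not functions from my library or from Google's SDK.

```python
from typing import Any


class TruncatedOutputError(RuntimeError):
    """Raised when the model stopped because it hit the output-token cap."""


def extract_text(response: dict[str, Any]) -> str:
    """Return the first candidate's text, or raise if it was cut off."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise ValueError("response contained no candidates")

    first = candidates[0]
    parts = first.get("content", {}).get("parts", [])
    text = "".join(part.get("text", "") for part in parts)

    # MAX_TOKENS means the budget ran out; the text is a partial result.
    if first.get("finishReason") == "MAX_TOKENS":
        raise TruncatedOutputError(
            f"output truncated after {len(text)} characters; "
            "raise max_output_tokens or lower thinking_level"
        )
    return text
```

Raising turns a silently short transcription into something a background job can retry or log, instead of storing 160 characters as if they were the whole image.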
Two PRs fixed it. The first updated the library: it added gemini-3.5-flash, threaded thinking_level through (silently dropping it for older models that return HTTP 400 when it is sent), and rewrote the model docs to match the May 2026 lineup. The second added five model choices to the bsky image transcriber (three Gemini, two Anthropic) and made gemini-2.5-flash-lite the routine default. 3.5 Flash is in the list too, but Lite is fourteen times cheaper at the cost of one fewer correct token out of 22.
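The "silently dropped for older models" behavior can be sketched roughly as follows. This is not the code from the PR: `SUPPORTS_THINKING_LEVEL` and `build_generation_config` are hypothetical names, the flat dict is illustrative rather than the actual request shape, and the assumption that only gemini-3.5-flash accepts the parameter is mine, based on the models discussed here.

```python
from typing import Any, Optional

# Assumption for illustration: of the models in this post, only the newest
# one accepts thinking_level; the older ones reject it with HTTP 400.
SUPPORTS_THINKING_LEVEL = frozenset({"gemini-3.5-flash"})


def build_generation_config(
    model: str,
    max_output_tokens: int,
    thinking_level: Optional[str] = None,
) -> dict[str, Any]:
    """Build per-call settings, omitting thinking_level where unsupported."""
    config: dict[str, Any] = {"max_output_tokens": max_output_tokens}
    if thinking_level is not None and model in SUPPORTS_THINKING_LEVEL:
        config["thinking_level"] = thinking_level
    # For other models the parameter is dropped rather than raising, so
    # callers can pass one set of arguments regardless of model.
    return config


print(build_generation_config("gemini-2.5-flash-lite", 1000, "low"))
print(build_generation_config("gemini-3.5-flash", 1000, "low"))
```

Dropping the parameter quietly keeps the transcriber's call site identical across all five model choices. The tradeoff is that a typo in a model name also drops it quietly, so an allowlist like this needs updating whenever a new model ships.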