System Update
Released: 2026-02-26
New feature: Cached responses now support streaming (SSE) delivery. This extends caching to the ~80% of API traffic that sets stream: true.
- On cache hit, the system emits synthetic SSE chunks from the stored response — no upstream API call needed
- Content is split on rune boundaries (50 runes/chunk, 10ms intervals) to preserve multi-byte characters
- Streaming cache hits now set the X-Cache-Hit, X-Cached-Tokens, and X-Actual-Model headers
- Non-streaming cache hits continue to work as before (direct JSON response)
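The chunking described above can be sketched roughly as follows. This is a minimal illustration, not the shipped emitter: splitRunes and emitSSE are hypothetical names, and the JSON-string payload in each data: line is an assumed placeholder for whatever chunk format the real system replays.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)

// splitRunes splits s into chunks of at most n runes each, so a multi-byte
// UTF-8 character is never cut in the middle. (Hypothetical helper.)
func splitRunes(s string, n int) []string {
	if n <= 0 {
		return nil
	}
	runes := []rune(s)
	chunks := make([]string, 0, (len(runes)+n-1)/n)
	for start := 0; start < len(runes); start += n {
		end := start + n
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

// emitSSE writes each chunk as one SSE "data:" event and pauses between
// chunks. The payload shape (a bare JSON string) is an assumption made for
// this sketch. It returns on the first write error so the caller can stop.
func emitSSE(w io.Writer, chunks []string, interval time.Duration) error {
	for i, c := range chunks {
		payload, err := json.Marshal(c) // escapes newlines so SSE framing stays intact
		if err != nil {
			return err
		}
		if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
			return err
		}
		if i < len(chunks)-1 {
			time.Sleep(interval)
		}
	}
	return nil
}

func main() {
	// 50 runes per chunk and 10ms intervals are the values from the changelog.
	chunks := splitRunes("héllo wörld 你好世界", 50)
	if err := emitSSE(os.Stdout, chunks, 10*time.Millisecond); err != nil {
		fmt.Fprintln(os.Stderr, "emit failed:", err)
	}
}
```

Converting to []rune before slicing is what keeps characters such as é or 你 intact. Slicing the string by byte offset could split them across two chunks.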
Cache Correctness Hardening
- Temperature guard: Caches only requests where temperature: 0 is explicitly present in the raw JSON body. An omitted temperature (which decodes to the Go zero value 0.0) is no longer falsely treated as cacheable, since providers default to ~1.0 for omitted values
- SSE error safety: If synthetic SSE emission fails mid-stream, the handler returns immediately instead of falling through to normal processing, preventing HTTP double-write corruption
- Tool call exclusion: Responses containing tool_calls are excluded from cache storage since the SSE emitter only supports text content replay
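One way to express the temperature guard and the tool-call exclusion is sketched below. It is an illustration under stated assumptions rather than the actual implementation: the chatRequest and chatResponse shapes, and the function names isCacheable and hasToolCalls, are hypothetical. The key idea is decoding temperature into a pointer, so that an omitted field (nil) can be told apart from an explicit 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a hypothetical request shape. Temperature is a pointer so
// an omitted field stays nil instead of collapsing to the zero value 0.0.
type chatRequest struct {
	Temperature *float64 `json:"temperature"`
}

// chatResponse is a hypothetical response shape with a tool_calls array
// nested under each choice's message.
type chatResponse struct {
	Choices []struct {
		Message struct {
			ToolCalls []json.RawMessage `json:"tool_calls"`
		} `json:"message"`
	} `json:"choices"`
}

// isCacheable reports whether the raw body explicitly sets temperature to 0.
// Omitted, null, or non-zero temperatures are not cacheable.
func isCacheable(rawBody []byte) bool {
	var req chatRequest
	if err := json.Unmarshal(rawBody, &req); err != nil {
		return false
	}
	return req.Temperature != nil && *req.Temperature == 0
}

// hasToolCalls reports whether any choice carries tool calls. Unparseable
// responses are treated as uncacheable (an assumption of this sketch).
func hasToolCalls(rawResp []byte) bool {
	var resp chatResponse
	if err := json.Unmarshal(rawResp, &resp); err != nil {
		return true
	}
	for _, c := range resp.Choices {
		if len(c.Message.ToolCalls) > 0 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isCacheable([]byte(`{"temperature": 0}`)))
	fmt.Println(isCacheable([]byte(`{"model": "example"}`)))
	fmt.Println(hasToolCalls([]byte(`{"choices":[{"message":{"content":"hi"}}]}`)))
}
```

With a plain float64 field, both requests in main would decode to 0.0 and look identical, which is the false positive the guard removes.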
Cache TTL & Infrastructure
- Default prompt cache TTL extended from 10 → 30 minutes
Enjoy it.