← Back to Changelog
Feature Added
Released: 2026-03-01
Context Compression automatically summarizes conversation history using a smaller, cost-efficient model before sending requests to your primary model. This significantly reduces input token costs while preserving conversation context.
### Highlights
- **Up to 78% token savings** on long multi-turn conversations
- **Three compression strategies** to balance quality vs. savings:
- `conservative` — compresses after 8+ turns (minimal context loss)
- `on` — compresses after 6+ turns (balanced)
- `aggressive` — compresses after 3+ turns (maximum savings)
- **All endpoints supported:**
- `POST /v1/chat/completions`
- `POST /v1/messages`
- `POST /v1/responses`
### How to Enable
#### Option 1: API Key Dashboard (Zero Code Changes)
Go to [API Key Management](https://apertis.ai/token) → Edit your API key →
Enable Context Compression and select your preferred strategy. All requests using
that key will automatically apply compression.
#### Option 2: Per-Request via Request Body
```json
{
"model": "gpt-4.1",
"messages": [...],
"compression": {
"enabled": true,
"strategy": "on",
"model": "gpt-4.1-mini"
}
}
```
#### Option 3: Per-Request via HTTP Headers
X-Context-Compression: on
X-Compression-Model: gpt-4.1-mini
#### SDK Support
Compression examples are now available for all supported SDKs:
- Python SDK (OpenAI, Anthropic, Responses API)
- TypeScript / Vercel AI SDK (@apertis/ai-sdk-provider)
- LangChain (via default_headers)
- LlamaIndex (via additional_kwargs)
- LiteLLM (via extra_headers)
#### Priority
Request body params > HTTP headers > API key defaults.
Per-request settings always override key-level defaults.
See more on [**Documentation**](https://docs.apertis.ai/api/text-generation/context-compression)