Prompt-cache observability for LLM APIs. Per-call hit ratio, cost saved, regression alerts. Anthropic, OpenAI, Bedrock.
Cost + latency aggregation for LLM agent runs. Group calls into named runs, get totals, p50/p95, and per-model breakdowns. Composes with cachebench.
Idempotency keys for LLM agent retries. Deterministic content-derived keys (UUIDv5 or sha256-hex) so retries dedupe at the provider.
Pre-warm Anthropic prompt cache before user traffic. Injects cache_control breakpoints, fires a tiny warmup call, optionally verifies the cache hit, and reports tokens, latency, and estimated cost. No SDK dependency.
Stable canonical hash of LLM request/message structures. Recursive key-sorting JSON canonicalization + sha256, with per-provider ignore-lists so semantically-equal Anthropic/OpenAI/Bedrock requests produce the same hash. Useful for cache keys and idempotency.
Stretto is a high performance thread-safe memory-bound Rust cache.
Content-addressable LRU cache for LLM agent tool calls. Same tool, same args -> same answer, returned from memory. Optional TTL, content-addressable on (tool_name, args) with canonical-JSON keys.