AI middleware that reduces LLM quota usage by 80-95% through smart caching, task decomposition, and context optimization
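
As a rough illustration of the caching idea only, here is a minimal sketch of an exact-match response cache placed in front of an LLM call. It is not this project's implementation: `CachedLLM`, `complete`, the `backend` callable, and the `"example-model"` name are all hypothetical, and the whitespace normalization is an assumption about what counts as "the same" prompt. Any actual quota savings depend on how often prompts repeat in a given workload.

```python
import hashlib
import json
from typing import Callable, Dict


class CachedLLM:
    """Hypothetical wrapper: exact-match response cache around a prompt -> text callable."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend            # the real (quota-consuming) LLM call
        self._cache: Dict[str, str] = {}   # cache key -> stored response
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Assumption: prompts differing only in whitespace are treated as identical.
        normalized = " ".join(prompt.split())
        payload = json.dumps({"model": model, "prompt": normalized}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, model: str = "example-model") -> str:
        key = self._key(model, prompt)
        if key in self._cache:
            # Cache hit: no backend call, so no quota is spent.
            self.hits += 1
            return self._cache[key]
        # Cache miss: call the backend once and remember the answer.
        self.misses += 1
        response = self._backend(prompt)
        self._cache[key] = response
        return response
```

Wrapping a client function as `CachedLLM(my_llm_call)` and routing requests through `complete` means repeated prompts are served from memory; the `hits` and `misses` counters give a simple way to measure how much of the traffic the cache actually absorbs.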