A local LLM inference platform with model pulling, quantization optimization, and high-performance serving