Fast GPU-accelerated speech-to-text CLI with streaming, quantization, speaker diarization, and multilingual support. Includes model conversion from HuggingFace safetensors.
Voice activity detection, audio segmentation, and SRT subtitle generation for speech transcription.
Speaker diarization for speech transcription via embedding clustering.
GPU-accelerated Whisper model inference with streaming audio, quantization, and KV-cached decoding.
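
To make the SRT subtitle generation mentioned above concrete, here is a minimal Python sketch of rendering timed transcript segments in the SubRip layout (sequence number, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text, blank line). The function names `format_srt_timestamp` and `segments_to_srt` and the `(start, end, text)` tuple shape are assumptions for illustration only; they are not taken from this project's code.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT `HH:MM:SS,mmm` form."""
    # Work in whole milliseconds to avoid float drift; clamp negatives to zero.
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as an SRT document.

    The tuple layout is a hypothetical stand-in for whatever the
    transcriber actually emits.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

Calling `segments_to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")])` yields two numbered cues separated by a blank line, which most subtitle players accept as-is.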