Voice Pipeline

Built-in voice system with STT, TTS, wake detection, 48 voice commands, and AI companion mode

Overview

Lite Suite has a built-in voice pipeline — press a hotkey to dictate text, navigate the workspace with voice commands, or have a full conversation with an AI companion. Everything runs locally using Whisper for transcription and Qwen3-TTS for speech synthesis.

Features

Global dictation — press a hotkey, speak, get text pasted into the active panel
LLM refinement — optional pass to clean up grammar and punctuation before pasting
48 voice commands — navigate panels, control the workspace, trigger actions hands-free
Wake detection — hands-free activation without pressing a hotkey
AI companion mode — full voice conversation with an AI that remembers context
Emotion classification — the system detects emotional tone and adjusts TTS expression
11 visualizer modes — audio-reactive visualizations during voice interaction
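
The dictation flow described above (transcribe, optionally refine, then paste) can be sketched as a small function. This is only an illustration under stated assumptions: `dictate`, `transcribe`, and `refine` are hypothetical names, and the callables stand in for whatever the Whisper and LLM stages actually expose.

```python
from typing import Callable, Optional


def dictate(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    refine: Optional[Callable[[str], str]] = None,
) -> str:
    """Turn recorded audio into text ready to paste.

    `transcribe` and `refine` are hypothetical stand-ins for the STT and
    LLM refinement stages; they are not Lite Suite APIs.
    """
    text = transcribe(audio).strip()
    # The refinement pass is optional and skipped for empty transcripts.
    if refine is not None and text:
        text = refine(text)
    return text
```

Passing the stages in as callables keeps the refinement step optional, which mirrors the on/off toggle in the settings.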

STT (Speech-to-Text)

Transcription runs locally through Whisper. The Whisper server listens on port 8080 and is started on demand. Several model sizes are available: smaller models are faster, larger models are more accurate.

| Model | Speed | Accuracy |
|-------|-------|----------|
| Tiny | Fastest | Basic |
| Base | Fast | Good |
| Small | Moderate | Better |
| Medium | Slower | Best |
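
The speed and accuracy trade-off in the table can be expressed as an ordered list. A minimal sketch, assuming the four sizes listed above; `WHISPER_MODELS` and `next_faster_model` are hypothetical names used for illustration only.

```python
from typing import Optional

# Model sizes ordered from fastest to most accurate, following the table.
WHISPER_MODELS = ["tiny", "base", "small", "medium"]


def next_faster_model(current: str) -> Optional[str]:
    """Return the next smaller (faster) model, or None if already the smallest.

    Raises ValueError if `current` is not one of the known sizes.
    """
    index = WHISPER_MODELS.index(current.lower())
    return WHISPER_MODELS[index - 1] if index > 0 else None
```

For example, `next_faster_model("Medium")` returns `"small"`, and `next_faster_model("tiny")` returns `None`.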

TTS (Text-to-Speech)

Speech synthesis uses Qwen3-TTS, served locally on port 5123 by a FastAPI server with CUDA acceleration. It supports voice selection, speed control, and emotion-aware synthesis.

Voice Commands

48 built-in voice commands for hands-free workspace control:

Navigation — "open terminal", "switch to editor", "go to settings"
Workspace — "zen mode", "canvas mode", "new panel"
Actions — "run build", "commit changes", "search files"
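
One way to picture how a transcript maps to a command is a normalized phrase lookup. This is a sketch, not the actual matcher: the phrases come from the examples above, while `COMMANDS`, `normalize`, and `match_command` are hypothetical names, and exact-phrase matching is an assumption.

```python
import re
from typing import Optional, Tuple

# Example phrases from the list above, keyed to their category.
# The real command set has 48 entries; only the documented examples appear here.
COMMANDS = {
    "open terminal": "navigation",
    "switch to editor": "navigation",
    "go to settings": "navigation",
    "zen mode": "workspace",
    "canvas mode": "workspace",
    "new panel": "workspace",
    "run build": "actions",
    "commit changes": "actions",
    "search files": "actions",
}


def normalize(transcript: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", transcript.lower())
    return " ".join(cleaned.split())


def match_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (phrase, category) for a known command, else None."""
    phrase = normalize(transcript)
    category = COMMANDS.get(phrase)
    return (phrase, category) if category is not None else None
```

Normalizing first means a transcript such as "Open terminal." still matches, since speech-to-text output often carries capitalization and trailing punctuation.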

Companion Mode

A full conversational AI that listens continuously and responds with voice. The conversation engine manages idle/listening/processing/speaking states with context awareness across the session.
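
The four states named above can be modeled as a small state machine. A minimal sketch: the state names come from the text, but the event names and the transition table are assumptions made for illustration, not the engine's actual behavior.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Hypothetical events and transitions; the real engine may differ,
# for example by returning to LISTENING directly after speaking.
TRANSITIONS = {
    (State.IDLE, "speech_detected"): State.LISTENING,
    (State.LISTENING, "speech_ended"): State.PROCESSING,
    (State.PROCESSING, "reply_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state, staying in place for events that do not apply."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring events that do not apply to the current state keeps the loop robust, for instance when a stray `playback_done` arrives while the engine is still listening.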

Configuration

Voice settings are accessible in Settings > Voice:

Hotkey — key combination to start/stop recording
Transcription model — Whisper model size
LLM refinement — toggle on/off, select provider and model
TTS voice — select voice and speed
Wake word — enable/disable hands-free activation

Troubleshooting

The hotkey doesn't trigger recording. Another application may have claimed the same key combination. Assign a different hotkey in Settings > Voice.

Transcription is slow. Switch to a smaller Whisper model (Tiny or Base) in Settings > Voice for faster results.

TTS sounds robotic. Ensure the TTS server is running (port 5123). It starts on demand when voice features are first used.
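
To check whether a local server is accepting connections, a plain TCP probe is enough. A minimal sketch using only the Python standard library; the ports 5123 (TTS) and 8080 (Whisper) come from this page, `is_port_open` is a hypothetical helper name, and the `127.0.0.1` host is an assumption.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (assumes the servers bind to localhost):
#   is_port_open("127.0.0.1", 5123)  # TTS server
#   is_port_open("127.0.0.1", 8080)  # Whisper server
```

A successful probe only shows that something is listening on the port. It does not confirm that the process is the TTS server or that it is healthy.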