Voice Pipeline

Built-in voice system with STT, TTS, wake detection, 48 voice commands, and AI companion mode

Overview

Lite Suite has a built-in voice pipeline: press a hotkey to dictate text, navigate the workspace with voice commands, or hold a full conversation with an AI companion. Speech processing runs locally, using Whisper for transcription and Qwen3-TTS for speech synthesis.

Features

  • Global dictation — press a hotkey, speak, get text pasted into the active panel
  • LLM refinement — optional pass to clean up grammar and punctuation before pasting
  • 48 voice commands — navigate panels, control the workspace, trigger actions hands-free
  • Wake detection — hands-free activation without pressing a hotkey
  • AI companion mode — full voice conversation with an AI that remembers context
  • Emotion classification — the system detects emotional tone and adjusts TTS expression
  • 11 visualizer modes — audio-reactive visualizations during voice interaction
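The dictation flow described in the first two features (transcribe, optionally refine, then paste) can be sketched as a small function. This is an illustration under stated assumptions, not Lite Suite's implementation: `dictate` and its `transcribe`, `refine`, and `paste` callables are hypothetical names standing in for the Whisper call, the LLM refinement pass, and the paste into the active panel.

```python
from typing import Callable, Optional


def dictate(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    paste: Callable[[str], None],
    refine: Optional[Callable[[str], str]] = None,
) -> str:
    """Transcribe audio, optionally refine the text, then paste it.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    text = transcribe(audio).strip()
    # The refinement pass is optional; skip it when disabled or when
    # there is nothing to refine.
    if refine is not None and text:
        text = refine(text)
    # Avoid pasting an empty string when nothing was recognized.
    if text:
        paste(text)
    return text
```

Passing `refine=None` models the case where LLM refinement is switched off, so the raw transcript is pasted unchanged.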

STT (Speech-to-Text)

Transcription runs locally through a Whisper server on port 8080, which starts on demand. Several model sizes are available: smaller models are faster, larger models are more accurate.

| Model | Speed | Accuracy |
|-------|-------|----------|
| Tiny | Fastest | Basic |
| Base | Fast | Good |
| Small | Moderate | Better |
| Medium | Slower | Best |
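The speed and accuracy trade-off can be treated as an ordered list from fastest to most accurate. The following is a minimal sketch of stepping down one size when transcription feels slow; the lowercase model identifiers and the `faster_model` helper are assumptions for illustration, not part of Lite Suite.

```python
# Whisper model sizes from the table, ordered fastest -> most accurate.
# The lowercase identifiers are assumed for illustration.
MODELS = ["tiny", "base", "small", "medium"]


def faster_model(current: str) -> str:
    """Return the next smaller (faster) model, or the same one if already smallest."""
    index = MODELS.index(current.lower())  # raises ValueError for unknown names
    return MODELS[max(index - 1, 0)]
```

For example, `faster_model("medium")` yields `"small"`, while `faster_model("tiny")` stays at `"tiny"` because there is nothing smaller to fall back to.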

TTS (Text-to-Speech)

Qwen3-TTS runs locally on port 5123, served through FastAPI with CUDA acceleration. It supports voice selection, speed control, and emotion-aware speech synthesis.

Voice Commands

48 built-in voice commands for hands-free workspace control:

  • Navigation — "open terminal", "switch to editor", "go to settings"
  • Workspace — "zen mode", "canvas mode", "new panel"
  • Actions — "run build", "commit changes", "search files"
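One simple way to map a transcript onto a command is to normalize the text and look it up in a phrase table. The sketch below assumes exact phrase matching, which may differ from how Lite Suite actually resolves commands; the action names in `COMMANDS` are invented for illustration, and only the spoken phrases come from the examples above.

```python
import string
from typing import Optional

# Hypothetical command table: spoken phrase -> internal action name.
# The phrases come from the examples above; the action names are invented.
COMMANDS = {
    "open terminal": "panel.open.terminal",
    "switch to editor": "panel.focus.editor",
    "zen mode": "workspace.zen",
    "run build": "action.build",
}

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(transcript: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    return " ".join(transcript.lower().translate(_PUNCTUATION).split())


def match_command(transcript: str) -> Optional[str]:
    """Return the action for an exact phrase match, or None if nothing matches."""
    return COMMANDS.get(normalize(transcript))
```

Normalizing first means a transcript such as "Open terminal." still matches, since speech-to-text output often adds capitalization and trailing punctuation. A `None` result could be treated as ordinary dictation.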

Companion Mode

A full conversational AI that listens continuously and responds with voice. The conversation engine manages idle/listening/processing/speaking states with context awareness across the session.
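The four states can be modeled as a small state machine. This is a sketch only: the state names come from the description above, but the event names and the specific transitions (including the return to idle after playback) are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "speech_started"): State.LISTENING,
    (State.LISTENING, "speech_ended"): State.PROCESSING,
    (State.PROCESSING, "reply_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Advance the state machine; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes it easy to see which events are ignored in each state, for example a stray `playback_done` while the engine is still listening.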

Configuration

Voice settings are accessible in Settings > Voice:

  • Hotkey — key combination to start/stop recording
  • Transcription model — Whisper model size
  • LLM refinement — toggle on/off, select provider and model
  • TTS voice — select voice and speed
  • Wake word — enable/disable hands-free activation

Troubleshooting

The hotkey doesn't trigger recording. Another application may have claimed the same key combination. Assign a different hotkey in Settings > Voice.

Transcription is slow. Try a smaller Whisper model (Tiny or Base) in Settings > Voice for faster results.

TTS sounds robotic. Ensure the TTS server is running (port 5123). It starts on demand when voice features are first used.
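Because the local services use fixed ports (8080 for Whisper, 5123 for Qwen3-TTS), a quick TCP probe shows whether each one has started. The snippet below is a minimal sketch using only the Python standard library; the service labels in `SERVICES` and the `127.0.0.1` host are assumptions, and an open port only shows that some process is listening there, not that it is the expected server.

```python
import socket

# Ports documented for the local voice services; the labels are illustrative.
SERVICES = {"whisper-stt": 8080, "qwen3-tts": 5123}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_listening(port) else "not running"
        print(f"{name} (port {port}): {status}")
```

Since both servers start on demand, a "not running" result before the first use of a voice feature is expected.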