Talk to your AI assistant naturally. Voice input, voice output, hands-free mode.
Voice mode in Pinchr uses a multi-step pipeline to process your speech:
The entire pipeline takes 2 to 5 seconds, depending on response length. Your recording is sent only to the transcription service (Whisper); the AI provider answering your request receives only the transcribed text.
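The individual pipeline steps are not listed in this section, so the following is only a sketch of the flow the rest of the page implies: transcribe the recording, send the resulting text to the AI provider, then synthesize the reply. The `VoicePipeline` class and its three callables are hypothetical names for illustration, not Pinchr's actual API, and the stand-in stages below do no real audio work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical three-stage voice pipeline (names are assumptions)."""
    transcribe: Callable[[bytes], str]  # speech-to-text, e.g. a Whisper client
    ask: Callable[[str], str]           # text-only request to the AI provider
    speak: Callable[[str], bytes]       # text-to-speech, returns audio bytes

    def run(self, recording: bytes) -> bytes:
        text = self.transcribe(recording)  # audio in, text out
        reply = self.ask(text)             # only text reaches this stage
        return self.speak(reply)           # text in, audio out


# Stand-in stages so the sketch runs without any API keys.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    ask=lambda prompt: f"You said: {prompt}",
    speak=lambda reply: reply.encode("utf-8"),
)
result = pipeline.run(b"open my calendar")
```

Passing the stages in as callables keeps the provider choice (which transcription or TTS service to call) separate from the order of the steps.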
There are three ways to use voice in Pinchr:
Hold Space while talking. Release to send.
Click the mic button 🎤 in the chat input to start recording.
Toggle always-on listening. Say "Hey Pinchr" to activate.
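The hold-Space-and-release behaviour can be pictured as a small state machine. This is a minimal sketch under assumptions: the `PushToTalk` class, its method names, and the `send` callback are hypothetical and do not reflect how Pinchr handles key events internally.

```python
class PushToTalk:
    """Hypothetical recorder: buffer audio while the key is held, send on release."""

    def __init__(self, send):
        self.send = send        # called once with the full recording
        self.recording = False
        self.chunks = []

    def key_down(self):
        # Key-repeat events arrive while the key is held; ignore them
        # so the buffer is not reset mid-sentence.
        if not self.recording:
            self.recording = True
            self.chunks = []

    def audio(self, chunk: bytes):
        # Audio that arrives while the key is up is dropped.
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self):
        if self.recording:
            self.recording = False
            self.send(b"".join(self.chunks))


sent = []
ptt = PushToTalk(send=sent.append)
ptt.audio(b"ignored")   # key not held yet
ptt.key_down()
ptt.audio(b"hel")
ptt.key_down()          # simulated key repeat
ptt.audio(b"lo")
ptt.key_up()
```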
Configure voice settings in Settings → Voice.
Voice mode requires an API key for transcription (Whisper) and, if you use a cloud voice, one for text-to-speech:
OpenAI's Whisper API transcribes your voice. Extremely accurate across 50+ languages.
Choose from OpenAI TTS, ElevenLabs, or macOS built-in voices. Configure in Settings.
Pinchr supports multiple TTS providers with different voices and quality levels:
OpenAI TTS: Natural-sounding voices with low latency. Six voice options: Alloy, Echo, Fable, Onyx, Nova, Shimmer.
ElevenLabs: Highest quality voices with emotion and tone control. Clone your own voice or use presets.
macOS: Built-in macOS voices. No API key required, but lower quality than cloud options.
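One way to think about the provider choice is a lookup that falls back to the key-free macOS voices when a cloud provider has no API key configured. The sketch below is illustrative only: the function name, the provider identifiers, and the fallback behaviour itself are assumptions, not documented Pinchr behaviour.

```python
# Provider identifiers are hypothetical labels for this sketch.
CLOUD_PROVIDERS = {"openai", "elevenlabs"}   # require an API key
LOCAL_PROVIDERS = {"macos"}                  # no API key required

# The six OpenAI voices listed above.
OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def pick_tts_provider(preferred: str, api_keys: dict) -> str:
    """Return the provider to use, falling back to macOS if a key is missing."""
    if preferred in LOCAL_PROVIDERS:
        return preferred
    if preferred not in CLOUD_PROVIDERS:
        raise ValueError(f"unknown TTS provider: {preferred!r}")
    # Assumed behaviour: degrade to the built-in voices rather than fail.
    return preferred if api_keys.get(preferred) else "macos"
```

For example, `pick_tts_provider("elevenlabs", {})` returns `"macos"`, while the same call with an ElevenLabs key present returns `"elevenlabs"`.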
Once you've selected a TTS provider, pick a voice personality:
For ElevenLabs, you can browse their voice library or clone your own voice for a truly personalized assistant.
Enable always-on listening for a truly hands-free experience:
Privacy: Hands-free mode processes audio locally for wake word detection. Audio is only sent to Whisper after the wake word is detected.
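The privacy note describes a gate: frames are inspected locally and nothing is forwarded until the wake word is heard. A minimal sketch of that idea follows, with plain strings standing in for audio frames. The `gate_on_wake_word` function and the `is_wake_word` detector are hypothetical, and a real detector would run an on-device model rather than compare text.

```python
from typing import Callable, Iterable, Iterator


def gate_on_wake_word(
    frames: Iterable[str],
    is_wake_word: Callable[[str], bool],
) -> Iterator[str]:
    """Yield only the frames that arrive after the wake word.

    Frames before the wake word, and the wake word itself, are
    dropped locally and never forwarded for transcription.
    """
    awake = False
    for frame in frames:
        if awake:
            yield frame
        elif is_wake_word(frame):
            awake = True


# Strings stand in for audio frames in this sketch.
heard = ["background chatter", "hey pinchr", "what's", "the weather"]
forwarded = list(gate_on_wake_word(heard, lambda f: f == "hey pinchr"))
```

Here `forwarded` holds only the two frames spoken after "hey pinchr"; the earlier chatter never leaves the gate.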
Whisper supports transcription in 50+ languages, including:
Whisper auto-detects your language — no configuration needed. TTS voice selection depends on your provider (OpenAI and ElevenLabs support most major languages).
Combine voice with computer use for truly hands-free automation:
Your agent narrates what it's doing on screen, keeping you in the loop without needing to look.