
Voice Mode

Talk to your AI assistant naturally. Voice input, voice output, hands-free mode.

How Voice Mode Works

Voice mode in Pinchr uses a multi-step pipeline to process your speech:

  1. Record Audio: Press and hold Space or click the mic button. Pinchr captures your voice locally.
  2. Transcribe (Whisper): Your audio is sent to OpenAI's Whisper API for transcription. The text appears in the chat.
  3. AI Response: Your agent processes the transcribed text like a normal chat message and generates a response.
  4. Text-to-Speech: The response is converted to audio using your selected TTS provider and played back.

The entire pipeline takes 2-5 seconds, depending on response length. Your recording is captured locally and sent only to the Whisper API for transcription; your agent's AI provider receives only the transcribed text.
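The four steps can be sketched as one function that chains the stages. This is a minimal illustration, not Pinchr's implementation: `run_voice_turn`, `TurnResult`, and the stage parameters `transcribe`, `respond`, and `speak` are hypothetical, and each stage is passed in as a plain function so a real Whisper call, agent, or TTS client could be swapped in. Only the transcription stage ever receives audio; the agent receives text.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (not Pinchr's real internals).
Transcriber = Callable[[bytes], str]   # recorded audio -> transcript (e.g. a Whisper call)
Agent = Callable[[str], str]           # transcript -> reply text
Synthesizer = Callable[[str], bytes]   # reply text -> audio for playback (TTS)


@dataclass
class TurnResult:
    transcript: str
    reply: str
    audio: bytes
    seconds: float  # wall-clock time for steps 2-4


def run_voice_turn(recording: bytes, transcribe: Transcriber,
                   respond: Agent, speak: Synthesizer) -> TurnResult:
    """Run one voice turn on an already-recorded clip (step 1 happens before this call)."""
    start = time.perf_counter()
    transcript = transcribe(recording)  # step 2: speech-to-text
    reply = respond(transcript)         # step 3: handled like a normal chat message
    audio = speak(reply)                # step 4: text-to-speech, ready to play back
    return TurnResult(transcript, reply, audio, time.perf_counter() - start)


if __name__ == "__main__":
    result = run_voice_turn(
        b"\x00\x01",
        transcribe=lambda audio: "what time is it",
        respond=lambda text: f"You asked: {text}",
        speak=lambda text: text.encode("utf-8"),
    )
    print(result.transcript, "->", result.reply)
```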

Activating Voice Mode

There are three ways to use voice in Pinchr:

  • Push-to-Talk (default mode): Hold Space while talking. Release to send.
  • Click-to-Talk (great for touch screens): Click the mic button 🎤 in the chat input to start recording.
  • Hands-Free (requires wake word): Toggle always-on listening. Say "Hey Pinchr" to activate.

Configure voice settings in Settings → Voice.
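The three modes differ only in which input events start and stop a recording. The sketch below models that as a small state machine. The `Recorder` class, the `Mode` enum, and the event names are hypothetical, and two behaviors are assumptions not stated on this page: that a second mic click stops a click-to-talk recording, and that a hands-free recording ends when silence is detected.

```python
from enum import Enum, auto


class Mode(Enum):
    PUSH_TO_TALK = auto()   # hold Space, release to send
    CLICK_TO_TALK = auto()  # click the mic button to start recording
    HANDS_FREE = auto()     # wake word starts recording


class Recorder:
    """Hypothetical handler that maps input events to recording state."""

    def __init__(self, mode: Mode):
        self.mode = mode
        self.recording = False
        self.sent = 0  # recordings handed off for transcription

    def _stop_and_send(self) -> None:
        self.recording = False
        self.sent += 1

    def on_event(self, event: str) -> None:
        if self.mode is Mode.PUSH_TO_TALK:
            if event == "space_down":
                self.recording = True  # key repeat while held is harmless
            elif event == "space_up" and self.recording:
                self._stop_and_send()
        elif self.mode is Mode.CLICK_TO_TALK:
            if event == "mic_click":
                # Assumption: the second click stops recording and sends it.
                if self.recording:
                    self._stop_and_send()
                else:
                    self.recording = True
        elif self.mode is Mode.HANDS_FREE:
            if event == "wake_word":
                self.recording = True
            elif event == "silence" and self.recording:
                # Assumption: end of speech is detected as silence.
                self._stop_and_send()
```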

Setting Up Voice

Voice mode requires API keys for transcription (Whisper) and text-to-speech:

🎙️
Speech-to-Text (Whisper)

OpenAI's Whisper API transcribes your voice with high accuracy in more than 50 languages.

Get an API key from OpenAI.
🔊
Text-to-Speech (TTS)

Choose from OpenAI TTS, ElevenLabs, or macOS built-in voices. Configure in Settings.

  1. Go to Settings → Voice
  2. Add your OpenAI API key for Whisper transcription
  3. Choose a TTS provider and configure your preferred voice
  4. Test with the Test Voice button
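These steps amount to a small configuration with a few required fields. As a sketch of what "ready to use" means here, the hypothetical `VoiceSettings` class below flags a missing Whisper key, an unknown provider, an unknown OpenAI voice, or a missing ElevenLabs key (ElevenLabs requires its own API key, while macOS voices need none). The field names and provider identifiers are illustrative, not Pinchr's actual settings format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative identifiers; not Pinchr's real settings keys.
TTS_PROVIDERS = {"openai", "elevenlabs", "macos"}
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


@dataclass
class VoiceSettings:
    openai_api_key: Optional[str]             # needed for Whisper transcription
    tts_provider: str = "openai"
    voice: str = "alloy"
    elevenlabs_api_key: Optional[str] = None  # only needed for ElevenLabs

    def problems(self) -> List[str]:
        """Return setup problems; an empty list means voice mode is configured."""
        issues = []
        if not self.openai_api_key:
            issues.append("Add an OpenAI API key for Whisper transcription.")
        if self.tts_provider not in TTS_PROVIDERS:
            issues.append(f"Unknown TTS provider: {self.tts_provider}")
        elif self.tts_provider == "openai" and self.voice not in OPENAI_VOICES:
            issues.append(f"Unknown OpenAI voice: {self.voice}")
        elif self.tts_provider == "elevenlabs" and not self.elevenlabs_api_key:
            issues.append("Add an ElevenLabs API key.")
        return issues
```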

Text-to-Speech Providers

Pinchr supports multiple TTS providers with different voices and quality levels:

OpenAI TTS (Recommended)

Natural-sounding voices with low latency. Six voice options: Alloy, Echo, Fable, Onyx, Nova, Shimmer.

Cost: ~$0.015 per 1K characters
Latency: ~1-2s
ElevenLabs (High Quality)

Highest-quality voices with emotion and tone control. Clone your own voice or use presets.

Cost: ~$0.30 per 1K characters
Latency: ~2-4s
macOS System Voices (Free)

Built-in macOS voices. No API key required, but lower quality than cloud options.

Cost: Free
Latency: <1s
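Because the cloud providers bill per character, the cost of a spoken reply scales linearly with its length. The helper below applies the approximate rates listed above; `tts_cost` and the provider keys are illustrative names, and actual prices depend on your plan and may change.

```python
# Approximate USD prices per 1,000 characters, from the provider list above.
PRICE_PER_1K_CHARS = {"openai": 0.015, "elevenlabs": 0.30, "macos": 0.0}


def tts_cost(text: str, provider: str) -> float:
    """Estimated cost in USD to speak `text` with `provider`."""
    return len(text) / 1000 * PRICE_PER_1K_CHARS[provider]
```

For example, a 2,000-character reply costs about 2 × $0.015 = $0.03 with OpenAI TTS, about 2 × $0.30 = $0.60 with ElevenLabs, and nothing with macOS voices.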

Choosing a Voice

Once you've selected a TTS provider, pick a voice personality:

OpenAI Voices

  • Alloy: Neutral, balanced tone
  • Echo: Warm, friendly
  • Fable: Calm, soothing
  • Onyx: Deep, authoritative
  • Nova: Energetic, upbeat
  • Shimmer: Bright, cheerful

Preview each voice with the Test Voice button in Settings.

For ElevenLabs, you can browse their voice library or clone your own voice for a truly personalized assistant.

Hands-Free Mode

Enable always-on listening for a truly hands-free experience:

  1. Go to Settings → Voice → Hands-Free Mode
  2. Toggle Always Listening
  3. Choose a wake word: "Hey Pinchr" or "Assistant"
  4. Pinchr listens in the background. Say your wake word to activate.
💡

Privacy: Hands-free mode detects the wake word locally. Audio is sent to Whisper only after the wake word is detected.
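This guarantee comes down to ordering: every audio frame is checked locally, and only frames that arrive after a wake-word hit are forwarded for transcription. The generator below is a minimal sketch of that gating under stated assumptions: `gate_on_wake_word` is a hypothetical helper, `is_wake_word` stands in for a local detector, and each utterance is modeled as a fixed number of frames, whereas a real implementation would more likely stop at detected end of speech.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def gate_on_wake_word(
    frames: Iterable[bytes],
    is_wake_word: Callable[[bytes], bool],
    utterance_frames: int,
) -> Iterator[List[bytes]]:
    """Yield utterances to transcribe; audio before a wake-word hit is never yielded."""
    utterance: Optional[List[bytes]] = None
    for frame in frames:
        if utterance is None:
            # Idle: inspect the frame locally, then drop it.
            if is_wake_word(frame):
                utterance = []
        else:
            # Listening after the wake word: collect frames to send to Whisper.
            utterance.append(frame)
            if len(utterance) == utterance_frames:
                yield utterance
                utterance = None
    # An utterance still incomplete when the stream ends is discarded.
```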

Language Support

Whisper supports transcription in 50+ languages, including:

• English
• Spanish
• French
• German
• Italian
• Portuguese
• Chinese
• Japanese
• Korean
• Russian
• Arabic
• Hindi

Whisper detects the spoken language automatically, so no configuration is needed. Language support for spoken responses depends on your TTS provider; OpenAI and ElevenLabs support most major languages.
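At the API level, auto-detection is the default: OpenAI's transcription endpoint accepts an optional `language` parameter (an ISO-639-1 code such as `ja`), and omitting it lets Whisper detect the language itself. The sketch below builds request arguments that way. `transcription_kwargs` is a hypothetical helper, and whether Pinchr exposes a language override is not documented here.

```python
from typing import Dict, Optional


def transcription_kwargs(model: str = "whisper-1",
                         language: Optional[str] = None) -> Dict[str, str]:
    """Build arguments for OpenAI's transcription endpoint.

    Omitting `language` leaves detection to Whisper; passing a two-letter
    ISO-639-1 code pins the input language instead.
    """
    kwargs = {"model": model}
    if language is not None:
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a two-letter ISO-639-1 code")
        kwargs["language"] = language.lower()
    return kwargs
```

With the official `openai` Python package, these arguments could be passed as `client.audio.transcriptions.create(file=audio_file, **transcription_kwargs())`.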

Tips & Best Practices

  • Use a good microphone: Built-in Mac mics work, but external USB mics improve accuracy
  • Speak naturally: Whisper handles accents, pauses, and filler words well
  • Reduce background noise: Quiet environments work best for transcription
  • Adjust playback speed: In Settings, change TTS speed from 0.5x to 2x
  • Interrupt anytime: Press Esc to stop playback
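Two of these tips describe playback controls. The sketch below shows how a speed setting limited to 0.5x-2x changes playback time, and how playback can be interrupted by checking a stop flag between audio chunks (for example, a flag set by an Esc key handler). The function names and the chunked-playback model are assumptions for illustration, not Pinchr's implementation.

```python
import threading
from typing import Callable, Iterable

MIN_SPEED, MAX_SPEED = 0.5, 2.0  # range offered in Settings


def clamp_speed(speed: float) -> float:
    """Keep a requested playback speed within the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def playback_seconds(audio_seconds: float, speed: float) -> float:
    """Wall-clock time to play a clip of `audio_seconds` at the clamped speed."""
    return audio_seconds / clamp_speed(speed)


def play_chunks(chunks: Iterable[bytes], stop: threading.Event,
                output: Callable[[bytes], None]) -> int:
    """Send chunks to `output` until finished or `stop` is set; return chunks played."""
    played = 0
    for chunk in chunks:
        if stop.is_set():  # e.g. set by a hypothetical Esc key handler
            break
        output(chunk)
        played += 1
    return played
```

A 30-second reply at 1.5x finishes in 20 seconds; a request for 3x is clamped to 2x and takes 15 seconds.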

Voice + Computer Use

Combine voice with computer use for truly hands-free automation:

You (voice):
"Open Safari and check my Gmail inbox."
Agent (spoken):
Opening Safari and navigating to Gmail...
→ Agent clicks Safari icon, types gmail.com, signs in
Agent (spoken):
You have 3 new emails. The first is from Sarah about the Q1 report.

Your agent narrates what it's doing on screen, keeping you in the loop without needing to look.

Need help with voice mode?

Join our community or reach out — we're here to help.