ctrlSPEAK: Voice-to-Text with a Triple-Tap
Tired of typing long emails, notes, or code comments? ctrlSPEAK (github.com/patelnav/ctrlspeak) is your set-it-and-forget-it speech-to-text companion for macOS. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks - effortlessly copied and pasted. It’s lightweight, low-overhead, and stays out of your way until you call it.
Key Features
🎙️ Triple-Tap Magic
Start and stop recording with a quick Ctrl triple-tap. No complicated shortcuts to remember:
- Triple-tap Ctrl → Recording starts (audio cue plays)
- Speak naturally → Your voice is captured
- Triple-tap Ctrl again → Recording stops, text appears at cursor
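The start/stop flow above boils down to "three Ctrl presses close together flip the recording state". Here is a minimal sketch of that logic in plain Python. It is not ctrlSPEAK's actual implementation: the class names, the 0.6-second window, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import deque


class TripleTapDetector:
    """Reports True when three taps arrive within `window` seconds."""

    def __init__(self, window: float = 0.6, clock=time.monotonic):
        self.window = window          # assumed threshold, not ctrlSPEAK's value
        self.clock = clock            # injectable so the logic can be tested
        self._taps = deque(maxlen=3)  # timestamps of the last three presses

    def tap(self) -> bool:
        """Record one Ctrl press; return True if it completes a triple-tap."""
        now = self.clock()
        self._taps.append(now)
        if len(self._taps) == 3 and now - self._taps[0] <= self.window:
            self._taps.clear()        # require three fresh taps next time
            return True
        return False


class Recorder:
    """Toggles between idle and recording on every triple-tap."""

    def __init__(self, detector: TripleTapDetector):
        self.detector = detector
        self.recording = False

    def on_ctrl_press(self) -> None:
        if self.detector.tap():
            self.recording = not self.recording
```

In a real app, `on_ctrl_press` would be wired to a global keyboard listener, which is the part that needs the Accessibility permission.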
📋 Auto-Paste Anywhere
Text lands exactly where you need it - no extra clicks, no manual copying. Works in any app where your cursor can blink: email clients, code editors, note-taking apps, chat windows.
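How ctrlSPEAK pastes internally isn't covered here, but the general "copy, then paste at the cursor" trick on macOS can be sketched with two standard tools: `pbcopy` to fill the clipboard and `osascript` to send Cmd+V through System Events. The function name `paste_at_cursor` and the injectable `run` parameter are hypothetical, added only to keep the sketch testable.

```python
import subprocess

# AppleScript that presses Cmd+V in the frontmost application.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def paste_at_cursor(text: str, run=subprocess.run) -> None:
    """Copy `text` to the clipboard, then send Cmd+V to the frontmost app."""
    # pbcopy reads stdin and places it on the macOS clipboard.
    run(["pbcopy"], input=text.encode("utf-8"), check=True)
    # System Events needs the Accessibility permission to send keystrokes.
    run(["osascript", "-e", PASTE_SCRIPT], check=True)
```

Going through the clipboard is what makes this approach app-agnostic: any field that accepts Cmd+V accepts the text.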
🍎 Apple Silicon Optimized
Harnesses Apple Silicon’s MPS (Metal Performance Shaders) for fast on-device inference. The default Parakeet 0.6B MLX model is optimized for M1/M2/M3 Macs: on an M2 Max it loads in under 1 second and transcribes a 7-second clip in about half a second.
🌟 Top-Tier Speech Models
Choose from multiple open-source recognition models:
- Parakeet 0.6B (MLX) - Default, optimized for Apple Silicon, best speed/accuracy balance
- NVIDIA Canary - Multilingual (En, De, Fr, Es) with excellent punctuation
- OpenAI Whisper - Fast, accurate, with superior punctuation and capitalization
- Nemotron (Experimental) - Real-time streaming transcription as you speak
📜 Transcription History Browser
Access your past transcriptions anytime:
- Press r in the UI to browse history
- Copy any previous transcription with Enter or c
- Search through past recordings
- Data stored locally in ~/.ctrlspeak/history.db
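To picture what a local, searchable history looks like, here is a small SQLite sketch. The table name, columns, and helper functions are assumptions for illustration; the real schema of history.db may differ.

```python
import sqlite3
from datetime import datetime, timezone


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a history database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcriptions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "created_at TEXT NOT NULL, "
        "text TEXT NOT NULL)"
    )
    return conn


def add_entry(conn: sqlite3.Connection, text: str) -> int:
    """Store one transcription and return its row id."""
    created = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO transcriptions (created_at, text) VALUES (?, ?)",
        (created, text),
    )
    conn.commit()
    return cur.lastrowid


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return matching transcriptions, newest first (substring match)."""
    rows = conn.execute(
        "SELECT text FROM transcriptions WHERE text LIKE ? ORDER BY id DESC",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Passing a file path instead of `:memory:` keeps entries between runs, which is all a history browser needs.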
🔊 Audio Feedback
Hear when recording begins and ends - no need to check the screen. Uses pleasant “Notification Pluck” sounds from Pixabay.
Requirements
- ⚠️ macOS 12.3+ required - Monterey or later
- ⚠️ Apple Silicon Mac recommended - For MLX acceleration (works on Intel with CUDA models)
- ⚠️ Python 3.10+ - For manual installation
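If you want to check these requirements yourself before installing, a rough sketch using only the standard library follows. It is not what the tool's own `--check-compatibility` flag does (that isn't documented here); `parse_version` and `check_requirements` are hypothetical helpers.

```python
import platform
import sys


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '12.3' or '14.4.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def check_requirements(mac_version: str, machine: str, python_version) -> dict:
    """Compare the given system facts against the requirements listed above."""
    return {
        # An empty version string (non-macOS system) parses to () and fails.
        "macos_ok": parse_version(mac_version) >= (12, 3),
        # Apple Silicon Macs report 'arm64'; Intel Macs report 'x86_64'.
        "apple_silicon": machine == "arm64",
        "python_ok": tuple(python_version[:2]) >= (3, 10),
    }


if __name__ == "__main__":
    print(check_requirements(platform.mac_ver()[0], platform.machine(), sys.version_info))
```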
Required Permissions:
- 🎤 Microphone - For recording your voice
- ⌨️ Accessibility - For global keyboard shortcuts
Grant these on first launch and you’re good to go!
Get Started
Option 1: Homebrew (Recommended)
# Add the tap
brew tap patelnav/ctrlspeak
# Basic installation (MLX models only)
brew install ctrlspeak
# Full installation with all model support (recommended)
brew install ctrlspeak --with-nvidia --with-whisper
Installation options explained:
- --with-nvidia - Enables NVIDIA Parakeet and Canary models (best performance)
- --with-whisper - Enables OpenAI Whisper models
Option 2: Manual Installation
# Clone the repository
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install core dependencies
pip install -r requirements.txt
# Optional: NVIDIA model support
pip install -r requirements-nvidia.txt
# Optional: Whisper model support
pip install -r requirements-whisper.txt
Usage
Start ctrlSPEAK
# If installed with Homebrew
ctrlspeak
# If installed manually
python ctrlspeak.py
Record Your Voice
- Triple-tap Ctrl to start recording
- Speak clearly into your microphone
- Triple-tap Ctrl again to stop
- Text appears automatically at your cursor position
UI Controls
Once running, use these shortcuts in the terminal:
| Key | Action |
|---|---|
| r | View transcription history |
| m | Switch speech recognition models |
| d | Change audio input device |
| l | View logs |
| h | Show help |
| q | Quit |
Model Selection
Switch models with the --model flag using short aliases:
# Default - Parakeet optimized for Apple Silicon
ctrlspeak --model parakeet
# Multilingual with punctuation (NVIDIA)
ctrlspeak --model canary
# Smaller Canary model
ctrlspeak --model canary-180m
# OpenAI Whisper
ctrlspeak --model whisper
# Real-time streaming (experimental)
ctrlspeak --model nemotron
View all available models:
ctrlspeak --list-models
Command Line Options
ctrlspeak [OPTIONS]
Options:
--model MODEL           Select speech recognition model (default: parakeet)
--list-models           Show all available models
--no-history            Disable transcription history saving
--history-db PATH       Custom path for history database
--source-lang LANG      Source language code (default: en)
--target-lang LANG      Target language code (default: en)
--debug                 Enable debug logging
--check-only            Verify configuration without running
--check-compatibility   Check system compatibility
Examples:
ctrlspeak # Run with defaults
ctrlspeak --model whisper # Use Whisper model
ctrlspeak --no-history # Disable history
ctrlspeak --history-db ~/backup/history.db # Custom DB location
Privacy & Data
- Local processing - All transcription happens on your device
- Local storage - History saved to ~/.ctrlspeak/history.db with user-only permissions
- No cloud - No data sent to external servers
- Full control - Disable history with --no-history or use a custom database location
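"User-only permissions" means the file mode is 0600: the owner can read and write, and nobody else can do anything. As a sketch of how a file like that is typically created and verified in Python (the helper names are hypothetical, and this is not taken from ctrlSPEAK's source):

```python
import os
import stat
from pathlib import Path


def create_private_file(path: Path) -> Path:
    """Create `path` (and its parent) readable and writable by the owner only."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # O_CREAT with mode 0o600 sets permissions at creation time; the chmod
    # below also tightens a file that already existed with looser permissions.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.close(fd)
    os.chmod(path, 0o600)
    return path


def is_user_only(path: Path) -> bool:
    """True when group and others have no permission bits set."""
    return (stat.S_IMODE(path.stat().st_mode) & 0o077) == 0
```

You can run the same check by hand with `ls -l ~/.ctrlspeak/history.db`, which should show `-rw-------` for a user-only file.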
Performance
Tested on a MacBook Pro (M2 Max) with a 7-second audio file:
| Model | Framework | Load Time | Transcription Time |
|---|---|---|---|
| parakeet-tdt-0.6b-v3 | MLX (Apple Silicon) | 0.97s | 0.53s |
| canary-1b-flash | NeMo (NVIDIA) | 32.06s | 3.20s |
| whisper-large-v3 | Transformers | 5.44s | 2.53s |
Recommendation: For most users on Apple Silicon, the default Parakeet MLX model provides the best balance of speed and accuracy.
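One way to read the table is as a speed factor: seconds of audio processed per second of compute. The sketch below does that arithmetic using only the numbers from the table; `speed_factor` is a hypothetical helper, and "cold start" here simply means load time plus transcription time.

```python
AUDIO_SECONDS = 7.0  # clip length from the benchmark above

# (load time, transcription time) in seconds, copied from the table
BENCHMARKS = {
    "parakeet-tdt-0.6b-v3": (0.97, 0.53),
    "canary-1b-flash": (32.06, 3.20),
    "whisper-large-v3": (5.44, 2.53),
}


def speed_factor(transcribe_seconds: float, audio_seconds: float = AUDIO_SECONDS) -> float:
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / transcribe_seconds


for name, (load, transcribe) in BENCHMARKS.items():
    print(f"{name:<22} {speed_factor(transcribe):5.1f}x real time, "
          f"cold start {load + transcribe:6.2f}s")
```

On these figures Parakeet runs at roughly 13x real time, while Canary and Whisper land between 2x and 3x, which is why the load time matters mostly for the first transcription after launch.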
Troubleshooting
- No sound on recording start/stop - Check your system volume isn’t muted
- Keyboard shortcuts not working - Grant accessibility permissions in System Settings
- Transcription errors - Try speaking more clearly or switch models
- “No module named 'nemo'” errors - Reinstall with brew reinstall ctrlspeak --with-nvidia
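For a manual install, the same "No module named ..." class of error can be diagnosed from Python by checking whether the optional packages are importable. This is a sketch under assumptions: `nemo` comes from the error message above, while `transformers` as the Whisper backend module is a guess based on the framework named in the performance table.

```python
import importlib.util

# Hypothetical mapping from install option to the module it should provide.
OPTIONAL_BACKENDS = {
    "nvidia": "nemo",
    "whisper": "transformers",  # assumed from the "Transformers" framework entry
}


def is_installed(module_name: str) -> bool:
    """True if the top-level `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def missing_backends(backends=OPTIONAL_BACKENDS) -> list[str]:
    """List the install options whose module cannot be found."""
    return [option for option, module in backends.items() if not is_installed(module)]
```

Run it inside the same virtual environment you start ctrlSPEAK from; an option showing up as missing points at the matching requirements file or Homebrew flag.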
🔗 Website: github.com/patelnav/ctrlspeak
Why This Tool Rocks
- Zero Friction: Triple-tap Ctrl and speak - no window switching, no manual copying
- Blazing Fast: Sub-second transcription on Apple Silicon with MLX acceleration
- Works Everywhere: Text appears wherever your cursor is - any app, any field
- Multiple Models: Choose the best engine for your needs (speed vs. accuracy vs. multilingual)
- Privacy First: All processing local, no cloud dependency, full control over your data
- History Built-In: Review, search, and reuse past transcriptions anytime
- Open Source: MIT licensed, free forever, community contributions welcome
- Audio Feedback: Know exactly when you’re recording without looking at the screen
Crepi il lupo! 🐺