ctrlSPEAK: Voice-to-Text with a Triple-Tap

Tired of typing long emails, notes, or code comments? ctrlSPEAK (github.com/patelnav/ctrlspeak) is your set-it-and-forget-it speech-to-text companion for macOS. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks, copied and pasted for you automatically. It's lightweight and stays out of your way until you call it.

Key Features

🎙️ Triple-Tap Magic

Start and stop recording with a quick Ctrl triple-tap. No complicated shortcuts to remember:

Triple-tap Ctrl → Recording starts (audio cue plays)
Speak naturally → Your voice is captured
Triple-tap Ctrl again → Recording stops, text appears at cursor
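One way to recognize a triple-tap is to remember the timestamps of the last three Ctrl presses and fire only when all three fall inside a short window. The sketch that follows illustrates that idea under stated assumptions; it is not ctrlSPEAK's actual code. The 0.5-second window and the class names `TripleTapDetector` and `RecordingToggle` are hypothetical. It also leaves out the global keyboard hook itself (which is what needs the Accessibility permission) and just receives press timestamps.

```python
from __future__ import annotations

import time
from collections import deque


class TripleTapDetector:
    """Report when a key is pressed `taps` times within `window` seconds.

    HYPOTHETICAL: the 0.5 s default is an assumed value, not ctrlSPEAK's setting.
    """

    def __init__(self, window: float = 0.5, taps: int = 3) -> None:
        self.window = window
        self.taps = taps
        self._times: deque[float] = deque(maxlen=taps)  # oldest press drops off

    def tap(self, now: float | None = None) -> bool:
        """Record one Ctrl press; return True if it completes a triple-tap."""
        now = time.monotonic() if now is None else now
        self._times.append(now)
        if len(self._times) == self.taps and now - self._times[0] <= self.window:
            self._times.clear()  # a fourth press starts a fresh sequence
            return True
        return False


class RecordingToggle:
    """HYPOTHETICAL glue: each detected triple-tap flips recording on or off."""

    def __init__(self) -> None:
        self.detector = TripleTapDetector()
        self.recording = False

    def on_ctrl_press(self, now: float | None = None) -> None:
        if self.detector.tap(now):
            self.recording = not self.recording
            # A real app would start/stop audio capture and play the cue here.


if __name__ == "__main__":
    toggle = RecordingToggle()
    for t in (0.0, 0.1, 0.2):  # three quick presses: start recording
        toggle.on_ctrl_press(t)
    print(toggle.recording)    # True
    for t in (5.0, 6.0, 7.0):  # three slow presses: ignored
        toggle.on_ctrl_press(t)
    print(toggle.recording)    # True
```

Using a fixed-length `deque` keeps only the presses that matter, and clearing it after a match prevents a fourth quick tap from immediately counting toward the next toggle.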

📋 Auto-Paste Anywhere

Text lands exactly where you need it - no extra clicks, no manual copying. Works in any app where your cursor can blink: email clients, code editors, note-taking apps, chat windows.
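A common way to paste into whatever app has focus on macOS is to put the text on the clipboard and then simulate Cmd+V. The sketch that follows takes that approach with the built-in `pbcopy` and `osascript` commands. It is an assumption about how such a tool can work, not ctrlSPEAK's confirmed implementation, and `paste_at_cursor` is a hypothetical name. The `run` parameter is injectable so the logic can be tested without a Mac.

```python
import os
import subprocess
from typing import Callable

# AppleScript that presses Cmd+V in the frontmost app via System Events.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def paste_at_cursor(text: str, run: Callable[..., object] = subprocess.run) -> None:
    """Copy `text` to the macOS clipboard, then simulate Cmd+V.

    HYPOTHETICAL helper; `run` is injectable so it can be exercised off macOS.
    """
    # pbcopy reads the new clipboard contents from stdin; a UTF-8 locale keeps
    # non-ASCII characters intact.
    env = {**os.environ, "LANG": "en_US.UTF-8"}
    run(["pbcopy"], input=text.encode("utf-8"), env=env, check=True)
    # Synthesizing keystrokes requires the Accessibility permission.
    run(["osascript", "-e", PASTE_SCRIPT], check=True)
```

Note the trade-off of this approach: whatever was on the clipboard before is replaced by the transcription.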

🍎 Apple Silicon Optimized

Harnesses the Apple Silicon GPU (via Metal) for fast transcription. The default Parakeet 0.6B model runs on MLX, Apple's machine-learning framework for M1/M2/M3 Macs; in the benchmark below (M2 Max, 7-second clip) it loaded in 0.97 s and transcribed in 0.53 s.

🌟 Top-Tier Speech Models

Choose from multiple open-source recognition models:

Parakeet 0.6B (MLX) - Default, optimized for Apple Silicon, best speed/accuracy balance
NVIDIA Canary - Multilingual (En, De, Fr, Es) with excellent punctuation
OpenAI Whisper - Fast, accurate, with superior punctuation and capitalization
Nemotron (Experimental) - Real-time streaming transcription as you speak

📜 Transcription History Browser

Access your past transcriptions anytime:

Press r in the UI to browse history
Copy any previous transcription with Enter or c
Search through past recordings
Data stored locally in ~/.ctrlspeak/history.db
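If `history.db` is an SQLite database (the `.db` extension suggests so, but the schema isn't documented here), it can also be queried outside the UI. This sketch first lists the tables via `sqlite_master`, which works on any SQLite file, then searches with a placeholder schema: the table `transcriptions`, the column `text`, and the function names are all hypothetical, so check `list_tables()` against the real file before relying on them. The database is opened read-only so a mistyped path raises an error instead of creating an empty file.

```python
from __future__ import annotations

import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

DEFAULT_DB = Path.home() / ".ctrlspeak" / "history.db"


def _connect_ro(db_path: Path) -> closing:
    # Read-only URI: errors out instead of silently creating an empty DB.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    return closing(sqlite3.connect(uri, uri=True))


def list_tables(db_path: Path = DEFAULT_DB) -> list[str]:
    """Table names in the database; works whatever the schema is."""
    with _connect_ro(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def search(db_path: Path, term: str,
           table: str = "transcriptions", column: str = "text") -> list[str]:
    """Substring search (SQLite's LIKE is case-insensitive for ASCII).

    HYPOTHETICAL: `table` and `column` are placeholders, not the real schema.
    """
    # Identifiers can't be bound as parameters, so they are quoted; the term is bound.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{column}" LIKE ?'
    with _connect_ro(db_path) as conn:
        return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


if __name__ == "__main__":
    # Demo on a throwaway database that uses the assumed schema.
    demo = Path(tempfile.mkdtemp()) / "history.db"
    with closing(sqlite3.connect(demo)) as conn:
        conn.execute("CREATE TABLE transcriptions (text TEXT)")
        conn.executemany("INSERT INTO transcriptions VALUES (?)",
                         [("Hello world",), ("Buy milk",)])
        conn.commit()
    print(list_tables(demo))      # ['transcriptions']
    print(search(demo, "HELLO"))  # ['Hello world']
```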

🔊 Audio Feedback

Hear when recording begins and ends - no need to check the screen. Uses pleasant “Notification Pluck” sounds from Pixabay.

Requirements

⚠️ macOS 12.3+ required - Monterey or later
⚠️ Apple Silicon Mac recommended - for MLX acceleration (Intel Macs can't run MLX; use the NVIDIA models there)
⚠️ Python 3.10+ - for manual installation
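ctrlSPEAK has its own `--check-compatibility` flag; as an illustration of what such a pre-flight check involves, here is a standard-library sketch of the requirements above. The function names and messages are hypothetical and not part of ctrlSPEAK. Inputs are passed in explicitly so the logic can be tested on any machine.

```python
from __future__ import annotations

import platform
import sys


def parse_version(v: str) -> tuple[int, ...]:
    """'12.3.1' -> (12, 3, 1); '' (not macOS) -> ()."""
    return tuple(int(part) for part in v.split(".") if part)


def check_requirements(mac_ver: str, machine: str, py: tuple[int, int]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not mac_ver:
        problems.append("not running on macOS")
    elif parse_version(mac_ver) < (12, 3):
        problems.append(f"macOS {mac_ver} is older than 12.3")
    if py < (3, 10):
        problems.append(f"Python {py[0]}.{py[1]} is older than 3.10")
    # Recommended, not required. An x86_64 Python running under Rosetta also
    # reports "x86_64" on Apple Silicon, and can't load MLX either.
    if machine != "arm64":
        problems.append(f"warning: {machine} is not Apple Silicon, so MLX models are unavailable")
    return problems


if __name__ == "__main__":
    issues = check_requirements(platform.mac_ver()[0], platform.machine(),
                                tuple(sys.version_info[:2]))
    print("\n".join(issues) or "All requirements met")
```

`platform.mac_ver()` returns an empty version string on non-macOS systems, which is why the empty case is treated as "not macOS".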

Required Permissions:

🎤 Microphone - For recording your voice
⌨️ Accessibility - For global keyboard shortcuts

Grant these on first launch and you’re good to go!

Get Started

Option 1: Homebrew (Recommended)

# Add the tap
brew tap patelnav/ctrlspeak

# Basic installation (MLX models only)
brew install ctrlspeak

# Full installation with all model support (recommended)
brew install ctrlspeak --with-nvidia --with-whisper

Installation options explained:

--with-nvidia - Enables the NVIDIA NeMo models (Parakeet and Canary)
--with-whisper - Enables OpenAI Whisper models

Option 2: Manual Installation

# Clone the repository
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install core dependencies
pip install -r requirements.txt

# Optional: NVIDIA model support
pip install -r requirements-nvidia.txt

# Optional: Whisper model support
pip install -r requirements-whisper.txt

Usage

Start ctrlSPEAK

# If installed with Homebrew
ctrlspeak

# If installed manually
python ctrlspeak.py

Record Your Voice

Triple-tap Ctrl to start recording
Speak clearly into your microphone
Triple-tap Ctrl again to stop
Text appears automatically at your cursor position

UI Controls

Once running, use these shortcuts in the terminal:

Key	Action
`r`	View transcription history
`m`	Switch speech recognition models
`d`	Change audio input device
`l`	View logs
`h`	Show help
`q`	Quit
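The shortcuts above amount to a key-to-action table. A minimal sketch of that dispatch pattern follows; `make_key_handler` and the placeholder actions are hypothetical illustrations, not ctrlSPEAK's code.

```python
from typing import Callable, Dict


def make_key_handler(actions: Dict[str, Callable[[], None]]) -> Callable[[str], bool]:
    """Return a function that runs the action bound to a key press.

    Unbound keys are ignored and reported as False.
    """
    def handle(key: str) -> bool:
        action = actions.get(key)
        if action is None:
            return False
        action()
        return True

    return handle


if __name__ == "__main__":
    log = []
    # Placeholder actions that just record which screen would open.
    handle = make_key_handler({
        "r": lambda: log.append("history"),
        "m": lambda: log.append("models"),
        "d": lambda: log.append("devices"),
        "l": lambda: log.append("logs"),
        "h": lambda: log.append("help"),
        "q": lambda: log.append("quit"),
    })
    for key in "rmx":
        handle(key)
    print(log)  # ['history', 'models'] ('x' is unbound)
```

Keeping the bindings in a dictionary makes adding a shortcut a one-line change, and unknown keys fall through harmlessly.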

Model Selection

Switch models with the --model flag using short aliases:

# Default - Parakeet optimized for Apple Silicon
ctrlspeak --model parakeet

# Multilingual with punctuation (NVIDIA)
ctrlspeak --model canary

# Smaller Canary model
ctrlspeak --model canary-180m

# OpenAI Whisper
ctrlspeak --model whisper

# Real-time streaming (experimental)
ctrlspeak --model nemotron

View all available models:

ctrlspeak --list-models

Command Line Options

ctrlspeak [OPTIONS]

Options:
  --model MODEL           Select speech recognition model (default: parakeet)
  --list-models           Show all available models
  --no-history            Disable transcription history saving
  --history-db PATH       Custom path for history database
  --source-lang LANG      Source language code (default: en)
  --target-lang LANG      Target language code (default: en)
  --debug                 Enable debug logging
  --check-only            Verify configuration without running
  --check-compatibility   Check system compatibility

Examples:
  ctrlspeak                                    # Run with defaults
  ctrlspeak --model whisper                    # Use Whisper model
  ctrlspeak --no-history                       # Disable history
  ctrlspeak --history-db ~/backup/history.db   # Custom DB location
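To make the option list concrete, here is an `argparse` parser that mirrors the documented flags and defaults. It is an illustrative reconstruction, not ctrlSPEAK's actual parser; in particular, `--model` is left unrestricted here, whereas a real implementation might validate it against the `--list-models` output.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (illustrative, not ctrlSPEAK's code)."""
    p = argparse.ArgumentParser(prog="ctrlspeak")
    p.add_argument("--model", default="parakeet",
                   help="speech recognition model alias (default: parakeet)")
    p.add_argument("--list-models", action="store_true", help="show all available models")
    p.add_argument("--no-history", action="store_true", help="disable transcription history saving")
    p.add_argument("--history-db", type=lambda s: Path(s).expanduser(),
                   default=Path.home() / ".ctrlspeak" / "history.db",
                   help="custom path for the history database")
    p.add_argument("--source-lang", default="en", help="source language code")
    p.add_argument("--target-lang", default="en", help="target language code")
    p.add_argument("--debug", action="store_true", help="enable debug logging")
    p.add_argument("--check-only", action="store_true", help="verify configuration without running")
    p.add_argument("--check-compatibility", action="store_true", help="check system compatibility")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "whisper", "--no-history"])
    print(args.model, args.no_history, args.source_lang)  # whisper True en
```

`argparse` turns dashed flags into underscored attributes (`--no-history` becomes `args.no_history`), and the `expanduser()` conversion handles a quoted `"~/backup/history.db"` that the shell did not expand.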

Privacy & Data

Local processing - All transcription happens on your device
Local storage - History saved to ~/.ctrlspeak/history.db with user-only permissions
No cloud - No data sent to external servers
Full control - Disable history with --no-history or use custom database location
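"User-only permissions" on macOS and other Unix systems means file mode `0600`: read and write for the owner, nothing for group or others. A sketch of how a history file can be created that way with the standard library follows; `ensure_private_db` is a hypothetical name, not ctrlSPEAK's function. The explicit `chmod` matters because the `mode` arguments are masked by the umask and have no effect on files or directories that already exist.

```python
import os
import tempfile
from pathlib import Path


def ensure_private_db(db_path: Path) -> Path:
    """Create the history file (and missing parent dirs) readable only by the owner."""
    db_path = Path(db_path).expanduser()
    # mode applies only to the last directory, if created here, and is masked
    # by the umask; missing intermediate parents get default permissions.
    db_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    db_path.touch(mode=0o600, exist_ok=True)
    # Enforce owner-only read/write even if the file already existed.
    os.chmod(db_path, 0o600)
    return db_path


if __name__ == "__main__":
    path = ensure_private_db(Path(tempfile.mkdtemp()) / "sub" / "history.db")
    print(oct(path.stat().st_mode & 0o777))  # 0o600
```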

Performance

Tested on a MacBook Pro (M2 Max) with a 7-second audio file:

Model	Framework	Load Time	Transcription Time
parakeet-tdt-0.6b-v3	MLX (Apple Silicon)	0.97s	0.53s
canary-1b-flash	NeMo (NVIDIA)	32.06s	3.20s
whisper-large-v3	Transformers	5.44s	2.53s

Recommendation: For most users on Apple Silicon, the default Parakeet MLX model provides the best balance of speed and accuracy.
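The table reports load and transcription time separately, which matters because loading is typically paid once per session while transcription is paid on every recording. Dividing audio length by transcription time gives a real-time factor: from the table, 7 s / 0.53 s is about 13× faster than real time for Parakeet, and roughly 2 to 3× for Canary and Whisper. A sketch of a harness that measures both; `load` and `transcribe` are hypothetical hooks you would wire to a real backend.

```python
import time
from typing import Any, Callable, Dict


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than real time the transcription ran."""
    return audio_seconds / processing_seconds if processing_seconds > 0 else float("inf")


def benchmark(load: Callable[[], Any], transcribe: Callable[[Any, str], str],
              audio_path: str, audio_seconds: float) -> Dict[str, Any]:
    """Time model loading and one transcription separately, as in the table.

    HYPOTHETICAL: `load` and `transcribe` are hooks for a real backend.
    """
    t0 = time.perf_counter()
    model = load()
    t1 = time.perf_counter()
    text = transcribe(model, audio_path)
    t2 = time.perf_counter()
    return {
        "load_s": t1 - t0,
        "transcribe_s": t2 - t1,
        "speedup": realtime_factor(audio_seconds, t2 - t1),
        "text": text,
    }


if __name__ == "__main__":
    # Transcription times from the table above (7-second clip).
    for name, secs in [("parakeet", 0.53), ("canary", 3.20), ("whisper", 2.53)]:
        print(f"{name}: {realtime_factor(7.0, secs):.1f}x real time")
```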

Troubleshooting

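No sound on recording start/stop - Make sure your system volume isn't muted
Keyboard shortcuts not working - Grant Accessibility permission in System Settings > Privacy & Security > Accessibility (System Preferences > Security & Privacy on Monterey)
Transcription errors - Speak more clearly, or switch models with m in the UI or the --model flag
"No module named 'nemo'" errors - Reinstall with brew reinstall ctrlspeak --with-nvidia, or for manual installs run pip install -r requirements-nvidia.txt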
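To see which optional backends are actually installed before launching, you can ask Python's import system without importing the heavy packages. This sketch assumes the import names: `nemo` comes from the error message above, while `transformers` and `mlx` are inferred from the frameworks listed in the performance table. Run it with the same Python environment ctrlSPEAK uses (for manual installs, the activated `.venv`). `missing_backends` is a hypothetical helper.

```python
from __future__ import annotations

import importlib.util

# Import names per optional backend. "nemo" comes from the error above;
# "transformers" and "mlx" are ASSUMED from the frameworks in the benchmark table.
BACKENDS = {
    "NVIDIA models (--with-nvidia)": "nemo",
    "Whisper models (--with-whisper)": "transformers",
    "MLX models (Apple Silicon)": "mlx",
}


def missing_backends(backends: dict[str, str] = BACKENDS) -> list[str]:
    """Labels of backends whose top-level package can't be found."""
    return [label for label, module in backends.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    for label in missing_backends():
        print("missing:", label)
```

`importlib.util.find_spec` only locates the package, so the check is fast even when the package itself (such as NeMo) takes a long time to import.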

🔗 GitHub: github.com/patelnav/ctrlspeak

Why This Tool Rocks

Zero Friction: Triple-tap Ctrl and speak - no window switching, no manual copying
Blazing Fast: Sub-second transcription on Apple Silicon with MLX acceleration
Works Everywhere: Text appears wherever your cursor is - any app, any field
Multiple Models: Choose the best engine for your needs (speed vs. accuracy vs. multilingual)
Privacy First: All processing local, no cloud dependency, full control over your data
History Built-In: Review, search, and reuse past transcriptions anytime
Open Source: MIT licensed, free forever, community contributions welcome
Audio Feedback: Know exactly when you’re recording without looking at the screen

Good luck! 🐺