ctrlSPEAK: Voice-to-Text with a Triple-Tap

Tired of typing long emails, notes, or code comments? ctrlSPEAK (github.com/patelnav/ctrlspeak) is your set-it-and-forget-it speech-to-text companion for macOS. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks - effortlessly copied and pasted. It’s lightweight, low-overhead, and stays out of your way until you call it.

Key Features

🎙️ Triple-Tap Magic

Start and stop recording with a quick Ctrl triple-tap. No complicated shortcuts to remember:

  • Triple-tap Ctrl → Recording starts (audio cue plays)
  • Speak naturally → Your voice is captured
  • Triple-tap Ctrl again → Recording stops, text appears at cursor
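
One way a triple-tap gesture like this could be recognized is to remember the timestamps of the most recent Ctrl presses and fire when three of them fall inside a short window. The sketch below is illustrative only: the `TripleTapDetector` class and the 0.6-second window are assumptions made for this example, not ctrlSPEAK's actual implementation.

```python
from collections import deque


class TripleTapDetector:
    """Detect three key presses within a short time window.

    Hypothetical helper for illustration; the window length is an assumed value.
    """

    def __init__(self, window_seconds: float = 0.6, taps_required: int = 3):
        self.window_seconds = window_seconds
        self.taps_required = taps_required
        # Only the most recent `taps_required` press times are kept.
        self._presses = deque(maxlen=taps_required)

    def on_press(self, timestamp: float) -> bool:
        """Record a press; return True when it completes a triple-tap."""
        self._presses.append(timestamp)
        if len(self._presses) < self.taps_required:
            return False
        if self._presses[-1] - self._presses[0] <= self.window_seconds:
            self._presses.clear()  # so a fourth quick press does not re-trigger
            return True
        return False


if __name__ == "__main__":
    detector = TripleTapDetector()
    recording = False
    # Simulated press times in seconds: two separate triple-taps.
    for t in [0.00, 0.15, 0.30, 5.00, 5.10, 5.20]:
        if detector.on_press(t):
            recording = not recording
            print(f"t={t:.2f}s recording={'on' if recording else 'off'}")
```

Clearing the stored presses after a match means the same gesture toggles recording on and then off without a stray fourth tap counting toward the next one.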

📋 Auto-Paste Anywhere

Text lands exactly where you need it - no extra clicks, no manual copying. Works in any app where your cursor can blink: email clients, code editors, note-taking apps, chat windows.
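
For readers curious how "copy and paste at the cursor" can work on macOS in general, here is a minimal sketch that places text on the clipboard with `pbcopy` and then sends Cmd+V through AppleScript's System Events. This is an assumption about one possible mechanism, not a description of ctrlSPEAK's code; the `paste_text` function and its `run` parameter are hypothetical names introduced for the example.

```python
import subprocess
from typing import Callable

# AppleScript that presses Cmd+V in whichever app currently has focus.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def paste_text(text: str, run: Callable[..., object] = subprocess.run) -> None:
    """Put text on the macOS clipboard, then send Cmd+V to the focused app.

    `run` is injectable so the function can be exercised without macOS.
    """
    # pbcopy reads the new clipboard contents from standard input.
    run(["pbcopy"], input=text.encode("utf-8"), check=True)
    # Sending keystrokes through System Events needs Accessibility permission.
    run(["osascript", "-e", PASTE_SCRIPT], check=True)


if __name__ == "__main__":
    paste_text("Hello from a clipboard-based paste sketch")
```

A clipboard-based approach like this overwrites whatever was previously copied, which is worth keeping in mind when dictating in the middle of other copy-and-paste work.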

🍎 Apple Silicon Optimized

Harnesses Apple Silicon’s MPS (Metal Performance Shaders) for fast inference. The default Parakeet 0.6B MLX model is optimized for M1/M2/M3 Macs: in the benchmark in the Performance section (M2 Max, 7-second clip) it loaded in 0.97 seconds and transcribed in 0.53 seconds.

🌟 Top-Tier Speech Models

Choose from multiple open-source recognition models:

  • Parakeet 0.6B (MLX) - Default, optimized for Apple Silicon, best speed/accuracy balance
  • NVIDIA Canary - Multilingual (En, De, Fr, Es) with excellent punctuation
  • OpenAI Whisper - Fast, accurate, with superior punctuation and capitalization
  • Nemotron (Experimental) - Real-time streaming transcription as you speak

📜 Transcription History Browser

Access your past transcriptions anytime:

  • Press r in the UI to browse history
  • Copy any previous transcription with Enter or c
  • Search through past recordings
  • Data stored locally in ~/.ctrlspeak/history.db
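
Because the history lives in a local file, it can also be inspected outside the UI. The sketch below assumes `history.db` is a SQLite database (the extension suggests this, but the article does not say so) and, since the schema is not documented here, it only lists the table names rather than guessing at columns. The `list_tables` helper is a hypothetical name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Default location named in this article.
DEFAULT_HISTORY_DB = Path.home() / ".ctrlspeak" / "history.db"


def list_tables(db_path: Path = DEFAULT_HISTORY_DB) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes to the history file.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    if DEFAULT_HISTORY_DB.exists():
        print(list_tables())
    else:
        print(f"No history database found at {DEFAULT_HISTORY_DB}")
```

Opening the file read-only keeps this kind of inspection from interfering with ctrlSPEAK while it is running.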

🔊 Audio Feedback

Hear when recording begins and ends - no need to check the screen. Uses pleasant “Notification Pluck” sounds from Pixabay.

Requirements

  • ⚠️ macOS 12.3+ required - Monterey or later
  • ⚠️ Apple Silicon Mac recommended - For MLX acceleration (works on Intel with CUDA models)
  • ⚠️ Python 3.10+ - For manual installation

Required Permissions:

  • 🎤 Microphone - For recording your voice
  • ⌨️ Accessibility - For global keyboard shortcuts

Grant these on first launch and you’re good to go!
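
The version requirements above can be checked by hand before installing. This is a small sketch using only the Python standard library; `parse_version` and `meets_minimum` are hypothetical helpers, and the tool's own `--check-compatibility` option remains the authoritative check once ctrlSPEAK is installed.

```python
import platform
import sys

MIN_MACOS = (12, 3)   # from the requirements above
MIN_PYTHON = (3, 10)  # from the requirements above


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '12.3.1' into (12, 3, 1); an empty string gives ()."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare version tuples element by element."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    # platform.mac_ver() returns an empty version string on non-macOS systems.
    macos = parse_version(platform.mac_ver()[0])
    print("macOS OK:", meets_minimum(macos, MIN_MACOS))
    print("Python OK:", meets_minimum(sys.version_info[:2], MIN_PYTHON))
    # A native Python on Apple Silicon reports 'arm64'; under Rosetta it reports 'x86_64'.
    print("Apple Silicon:", platform.machine() == "arm64")
```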

Get Started

Option 1: Homebrew (Recommended)

# Add the tap
brew tap patelnav/ctrlspeak

# Basic installation (MLX models only)
brew install ctrlspeak

# Full installation with all model support (recommended)
brew install ctrlspeak --with-nvidia --with-whisper

Installation options explained:

  • --with-nvidia - Enables NVIDIA Parakeet and Canary models (best performance)
  • --with-whisper - Enables OpenAI Whisper models

Option 2: Manual Installation

# Clone the repository
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install core dependencies
pip install -r requirements.txt

# Optional: NVIDIA model support
pip install -r requirements-nvidia.txt

# Optional: Whisper model support
pip install -r requirements-whisper.txt

Usage

Start ctrlSPEAK

# If installed with Homebrew
ctrlspeak

# If installed manually
python ctrlspeak.py

Record Your Voice

  1. Triple-tap Ctrl to start recording
  2. Speak clearly into your microphone
  3. Triple-tap Ctrl again to stop
  4. Text appears automatically at your cursor position

UI Controls

Once running, use these shortcuts in the terminal:

Key  Action
r    View transcription history
m    Switch speech recognition models
d    Change audio input device
l    View logs
h    Show help
q    Quit

Model Selection

Switch models with the --model flag using short aliases:

# Default - Parakeet optimized for Apple Silicon
ctrlspeak --model parakeet

# Multilingual with punctuation (NVIDIA)
ctrlspeak --model canary

# Smaller Canary model
ctrlspeak --model canary-180m

# OpenAI Whisper
ctrlspeak --model whisper

# Real-time streaming (experimental)
ctrlspeak --model nemotron

View all available models:

ctrlspeak --list-models

Command Line Options

ctrlspeak [OPTIONS]

Options:
  --model MODEL           Select speech recognition model (default: parakeet)
  --list-models           Show all available models
  --no-history            Disable transcription history saving
  --history-db PATH       Custom path for history database
  --source-lang LANG      Source language code (default: en)
  --target-lang LANG      Target language code (default: en)
  --debug                 Enable debug logging
  --check-only            Verify configuration without running
  --check-compatibility   Check system compatibility

Examples:
  ctrlspeak                                    # Run with defaults
  ctrlspeak --model whisper                    # Use Whisper model
  ctrlspeak --no-history                       # Disable history
  ctrlspeak --history-db ~/backup/history.db  # Custom DB location

Privacy & Data

  • Local processing - All transcription happens on your device
  • Local storage - History saved to ~/.ctrlspeak/history.db with user-only permissions
  • No cloud - No data sent to external servers
  • Full control - Disable history with --no-history or use custom database location
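
The "user-only permissions" claim is easy to verify yourself. The sketch below checks that a file grants no access to group or others; `is_user_only` is a hypothetical helper, and the path is the default location named above.

```python
import stat
from pathlib import Path


def is_user_only(path: Path) -> bool:
    """True if neither group nor others have any permission bits on the file."""
    mode = Path(path).stat().st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    db = Path.home() / ".ctrlspeak" / "history.db"
    if db.exists():
        mode = stat.S_IMODE(db.stat().st_mode)
        print(f"{db}: mode {mode:o}, user-only={is_user_only(db)}")
    else:
        print(f"{db} not found")
```

A mode such as 600 (read and write for the owner only) passes this check, while 644 does not because group and others can read the file.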

Performance

Tested on a MacBook Pro (M2 Max) with a 7-second audio file:

Model                 Framework            Load Time  Transcription Time
parakeet-tdt-0.6b-v3  MLX (Apple Silicon)  0.97s      0.53s
canary-1b-flash       NeMo (NVIDIA)        32.06s     3.20s
whisper-large-v3      Transformers         5.44s      2.53s

Recommendation: For most users on Apple Silicon, the default Parakeet MLX model provides the best balance of speed and accuracy.
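
To put the benchmark figures in perspective, the short script below converts them into a speed factor (seconds of audio transcribed per second of compute) and a cold-start time (load plus transcription). The numbers are copied from the benchmark above; `speed_factor` is a hypothetical helper name.

```python
AUDIO_SECONDS = 7.0  # clip length used in the benchmark above

# (load time, transcription time) in seconds, copied from the benchmark.
BENCHMARKS = {
    "parakeet-tdt-0.6b-v3": (0.97, 0.53),
    "canary-1b-flash": (32.06, 3.20),
    "whisper-large-v3": (5.44, 2.53),
}


def speed_factor(audio_seconds: float, transcription_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of compute."""
    return audio_seconds / transcription_seconds


if __name__ == "__main__":
    for model, (load_s, transcribe_s) in BENCHMARKS.items():
        factor = speed_factor(AUDIO_SECONDS, transcribe_s)
        cold_start = load_s + transcribe_s
        print(f"{model}: {factor:.1f}x real time, cold start {cold_start:.2f}s")
```

On these figures Parakeet runs at roughly 13 times real time, against about 2 to 3 times for Canary and Whisper. Load time is paid once per session, so it matters mostly for the first transcription.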

Troubleshooting

  • No sound on recording start/stop - Check that your system volume isn’t muted
  • Keyboard shortcuts not working - Grant accessibility permissions in System Settings
  • Transcription errors - Try speaking more clearly or switch models
  • “No module named 'nemo'” errors - Reinstall with brew reinstall ctrlspeak --with-nvidia

🔗 Website: github.com/patelnav/ctrlspeak

🔗 GitHub: github.com/patelnav/ctrlspeak

Why This Tool Rocks

  • Zero Friction: Triple-tap Ctrl and speak - no window switching, no manual copying
  • Blazing Fast: Sub-second transcription on Apple Silicon with MLX acceleration
  • Works Everywhere: Text appears wherever your cursor is - any app, any field
  • Multiple Models: Choose the best engine for your needs (speed vs. accuracy vs. multilingual)
  • Privacy First: All processing local, no cloud dependency, full control over your data
  • History Built-In: Review, search, and reuse past transcriptions anytime
  • Open Source: MIT licensed, free forever, community contributions welcome
  • Audio Feedback: Know exactly when you’re recording without looking at the screen

Crepi il lupo! 🐺