Bumblebee: Perplexity's Supply-Chain Security Scanner for Developer Machines


Bumblebee: Developer Supply-Chain Scanner

Bumblebee (https://github.com/perplexityai/bumblebee) is a read-only scanner from Perplexity that inventories packages, extensions, and developer-tool metadata on macOS and Linux machines. When a supply-chain advisory names a compromised package or version, Bumblebee tells you which developer machines have it on disk.

SBOMs answer what shipped. EDR answers what ran. Supply-chain response often needs a third view: messy local state across lockfiles, package-manager metadata, extension manifests, and MCP configs. Bumblebee turns that scattered state into structured NDJSON records and, given an exposure catalog, flags exact matches.

Key Features

  • Zero Dependencies - Single static Go binary with no runtime dependencies (building it requires Go 1.25+)
  • Read-Only Design - Scans on-disk metadata only, never executes package managers
  • Three Scan Profiles - baseline (global), project (workspaces), deep (incident response with exposure catalogs)
  • Broad Coverage - npm, pnpm, Yarn, Bun, PyPI, Go modules, RubyGems, Composer, MCP configs, editor extensions, browser extensions
  • Exposure Catalogs - Compare inventory against known-compromised package lists
  • NDJSON Output - One structured record per package and per finding, easy to pipe into a SIEM
  • Content-Addressed IDs - Stable record IDs across runs for deduplication
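
A content-addressed ID is derived from the record's own content rather than from a counter or timestamp, so the same package seen on two runs gets the same ID and duplicates collapse. The Python sketch below assumes a SHA-256 over a canonical JSON encoding of a few identity fields; the field list, the `scanned_at` field, and the `record_id` helper are hypothetical, not Bumblebee's documented scheme.

```python
import hashlib
import json

# Hypothetical identity fields; Bumblebee's real choice is not documented in this post.
IDENTITY_FIELDS = ("ecosystem", "package_name", "version", "source_type")


def record_id(record: dict) -> str:
    """Return a SHA-256 hex digest over a record's identity fields (illustrative sketch)."""
    identity = {field: record.get(field) for field in IDENTITY_FIELDS}
    # sort_keys plus fixed separators give one canonical encoding per identity,
    # so the same package always hashes to the same ID.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_run = {"ecosystem": "npm", "package_name": "@tanstack/query-core",
             "version": "5.59.20", "source_type": "pnpm-lockfile"}
# The same package on a later run, carrying a hypothetical volatile field.
second_run = {**first_run, "scanned_at": "2026-01-01T00:00:00Z"}

print(record_id(first_run) == record_id(second_run))  # True: non-identity fields are ignored
print(record_id(first_run) == record_id({**first_run, "version": "5.59.21"}))  # False
```

Excluding volatile fields from the hash is what keeps IDs stable across runs; anything that legitimately distinguishes two installs (such as the version) must stay inside it.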

What It Scans

Bumblebee reads lockfiles and metadata files. It does not run npm ls, pip show, or go list. It covers:

  • npm - package-lock.json, pnpm-lock.yaml, yarn.lock, bun.lock, plus node_modules/**/package.json
  • PyPI - *.dist-info/METADATA, INSTALLER, direct_url.json, *.egg-info/PKG-INFO
  • Go - go.sum, go.mod
  • RubyGems - Gemfile.lock, installed *.gemspec
  • Composer - composer.lock, vendor/composer/installed.json
  • MCP - mcp.json, claude_desktop_config.json, cline_mcp_settings.json, Gemini CLI settings
  • Editor Extensions - VS Code, Cursor, Windsurf, VSCodium manifests
  • Browser Extensions - Chromium manifest.json, Firefox extensions.json per profile
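
As an illustration of reading lockfiles instead of running package managers, here is a minimal Python sketch (not Bumblebee's Go parser) that pulls installed names and versions out of an npm package-lock.json. It assumes lockfileVersion 2 or 3, where the "packages" map is keyed by node_modules/ install paths; version 1 lockfiles use a nested "dependencies" tree and are not handled. parse_package_lock is a hypothetical helper.

```python
import json


def parse_package_lock(text: str) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from a lockfileVersion 2/3 package-lock.json.

    Only the file's "packages" map is read; nothing is executed.
    """
    lock = json.loads(text)
    found = set()
    for path, meta in lock.get("packages", {}).items():
        # "" is the root project; workspace folders (e.g. packages/foo) have no
        # node_modules/ segment and are the user's own code, so skip both.
        if "node_modules/" not in path:
            continue
        version = meta.get("version")
        if not version:  # symlinked workspace entries ("link": true) carry no version
            continue
        # The installed name is whatever follows the last node_modules/ segment,
        # which keeps scoped names such as @tanstack/query-core intact.
        name = path.rsplit("node_modules/", 1)[1]
        found.add((name, version))
    return sorted(found)


sample = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "app", "version": "1.0.0"},
        "node_modules/@tanstack/query-core": {"version": "5.59.20"},
        "node_modules/a/node_modules/b": {"version": "2.0.0"},
        "node_modules/ws-pkg": {"resolved": "packages/ws-pkg", "link": True},
        "packages/ws-pkg": {"name": "ws-pkg", "version": "0.1.0"},
    },
})
print(parse_package_lock(sample))
# [('@tanstack/query-core', '5.59.20'), ('b', '2.0.0')]
```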

Get Started

Installation

Requires Go 1.25+ on macOS or Linux:

# Install latest release
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

# Or pin a specific version
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

If bumblebee is not found after install, make sure Go's install directory ($GOBIN if set, otherwise $(go env GOPATH)/bin) is on your $PATH:

export PATH=$PATH:$(go env GOPATH)/bin

To make it permanent (zsh shown; bash users can append to ~/.bashrc instead):

echo 'export PATH=$PATH:$(go env GOPATH)/bin' >> ~/.zshrc

Smoke Test

Run the built-in self-test against embedded fixtures:

bumblebee selftest
# selftest OK (3 findings in 1ms)

The fixtures use deliberately fake package names, and the self-test makes no network calls. A non-zero exit status indicates a problem with the install.

Usage

Baseline Scan (Global Inventory)

Scans common global package roots, language toolchains, editor/browser extensions, and MCP configs:

bumblebee scan --profile baseline > inventory.ndjson

This is the command to run on a schedule (cron, launchd, systemd).
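
Because a scheduled baseline produces a series of inventory files, comparing two of them shows what appeared or disappeared between runs. Here is a minimal Python sketch of that comparison. It keys each package record on ecosystem, package_name, version and source_type (field names from the sample record in the Output Format section) rather than on Bumblebee's own stable record ID, whose field name is not shown in this post; the helper names are hypothetical.

```python
import json


def identity(rec: dict) -> tuple:
    """Hypothetical identity key: the fields that must match for two records to be 'the same'."""
    return tuple(rec.get(f, "") for f in ("ecosystem", "package_name", "version", "source_type"))


def load_packages(lines) -> set:
    """Collect identity keys for every package record in an NDJSON inventory."""
    keys = set()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        if rec.get("record_type") == "package":
            keys.add(identity(rec))
    return keys


def diff(old_lines, new_lines):
    """Return (added, removed) package keys between two inventory snapshots."""
    old, new = load_packages(old_lines), load_packages(new_lines)
    return sorted(new - old), sorted(old - new)


yesterday = ['{"record_type":"package","ecosystem":"npm","package_name":"left-pad",'
             '"version":"1.3.0","source_type":"pnpm-lockfile"}\n']
today = ['{"record_type":"package","ecosystem":"npm","package_name":"left-pad",'
         '"version":"1.3.1","source_type":"pnpm-lockfile"}\n']
added, removed = diff(yesterday, today)
print(added)    # [('npm', 'left-pad', '1.3.1', 'pnpm-lockfile')]
print(removed)  # [('npm', 'left-pad', '1.3.0', 'pnpm-lockfile')]
```

With real files, pass open file handles for the older and newer inventory.ndjson snapshots; a version bump shows up as one added and one removed key.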

Project Scan

Target specific development directories:

bumblebee scan --profile project \
  --root "$HOME/code" \
  --root "$HOME/Developer"

Deep Scan (Incident Response)

For on-demand checks against known compromises:

bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog ./threat_intel/ \
  --findings-only

--exposure-catalog accepts either a single JSON file or a directory of *.json catalog files. --findings-only suppresses the normal package records and emits only matches.
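
To make the file-or-directory behavior concrete, here is a hypothetical Python loader that mirrors it: a path to one JSON file yields that file's entries, and a directory yields the entries of every *.json file in it. The required-key check is an assumption based on the catalog format shown in the Threat Intel Catalogs section, not Bumblebee's own validation logic.

```python
import json
import tempfile
from pathlib import Path

# Assumed required fields, taken from the catalog example in this post.
REQUIRED_KEYS = {"id", "ecosystem", "package", "versions"}


def load_catalog(path: str) -> list[dict]:
    """Load entries from one catalog file, or from every *.json file in a directory."""
    p = Path(path)
    files = sorted(p.glob("*.json")) if p.is_dir() else [p]
    entries = []
    for f in files:
        doc = json.loads(f.read_text(encoding="utf-8"))
        for entry in doc.get("entries", []):
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                raise ValueError(f"{f.name}: entry missing {sorted(missing)}")
            entries.append(entry)
    return entries


# Demo: a throwaway directory holding one catalog and one non-JSON file.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"schema_version": "0.1.0", "entries": [{
        "id": "advisory-2026-0042", "name": "example-pkg 1.2.3 (compromised release)",
        "ecosystem": "npm", "package": "example-pkg", "versions": ["1.2.3"],
        "severity": "critical"}]}
    Path(tmp, "example.json").write_text(json.dumps(sample), encoding="utf-8")
    Path(tmp, "notes.txt").write_text("ignored: not a .json file", encoding="utf-8")
    demo_entries = load_catalog(tmp)

print(len(demo_entries), demo_entries[0]["id"])  # 1 advisory-2026-0042
```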

Filter by Ecosystem

Limit a run to specific package managers:

bumblebee scan --profile baseline --ecosystem npm,pypi
bumblebee scan --profile baseline --ecosystem go

Preview Scan Roots

See what directories Bumblebee will scan without actually scanning:

bumblebee roots --profile baseline

Threat Intel Catalogs

The repo ships with maintained exposure catalogs in threat_intel/. These are JSON files built from public threat-intelligence reporting on recent supply-chain campaigns. The format is straightforward:

{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "advisory-2026-0042",
      "name": "example-pkg 1.2.3 (compromised release)",
      "ecosystem": "npm",
      "package": "example-pkg",
      "versions": ["1.2.3"],
      "severity": "critical"
    }
  ]
}

Point --exposure-catalog at the threat_intel/ directory and Bumblebee will match every inventoried package against every entry. Findings output includes the catalog ID, severity, and evidence.
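
The matching step can be sketched in a few lines of Python. This illustrates an exact name+version match and is not Bumblebee's Go implementation: field names come from the catalog and record examples in this post, the lowercase "npm"/"pypi" ecosystem values are assumptions, and the sketch does no name normalization (PyPI, for instance, treats names case-insensitively and equates -, _ and .) and no version ranges.

```python
from collections import defaultdict


def build_index(entries: list[dict]) -> dict:
    """Map (ecosystem, package) -> list of (version set, catalog entry)."""
    index = defaultdict(list)
    for entry in entries:
        index[(entry["ecosystem"], entry["package"])].append((set(entry["versions"]), entry))
    return index


def match(records, entries):
    """Yield a finding for each package record whose name and version exactly match an entry."""
    index = build_index(entries)
    for rec in records:
        if rec.get("record_type") != "package":
            continue
        for versions, entry in index.get((rec["ecosystem"], rec["package_name"]), []):
            if rec["version"] in versions:
                yield {
                    "record_type": "finding",
                    "finding_type": "package_exposure",
                    "severity": entry.get("severity"),
                    "catalog_id": entry["id"],
                    "package_name": rec["package_name"],
                    "version": rec["version"],
                    "evidence": "exact name+version match",
                }


entries = [{"id": "advisory-2026-0042", "ecosystem": "npm", "package": "example-pkg",
            "versions": ["1.2.3"], "severity": "critical"}]
records = [
    {"record_type": "package", "ecosystem": "npm", "package_name": "example-pkg", "version": "1.2.3"},
    {"record_type": "package", "ecosystem": "npm", "package_name": "example-pkg", "version": "1.2.4"},
    {"record_type": "package", "ecosystem": "pypi", "package_name": "example-pkg", "version": "1.2.3"},
]
findings = list(match(records, entries))
print(len(findings))  # 1: only the npm 1.2.3 record matches
```

Indexing catalog entries by (ecosystem, package) keeps each record lookup constant-time, so matching every inventoried package against every entry does not require a nested loop over the whole catalog.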

Output Format

Records are NDJSON: one JSON object per line. A package record looks like this (pretty-printed here for readability):

{
  "record_type": "package",
  "ecosystem": "npm",
  "package_name": "@tanstack/query-core",
  "version": "5.59.20",
  "source_type": "pnpm-lockfile",
  "confidence": "high",
  "endpoint": {
    "hostname": "my-mbp",
    "os": "darwin",
    "arch": "arm64"
  }
}

Finding records add the exposure match details:

{
  "record_type": "finding",
  "finding_type": "package_exposure",
  "severity": "critical",
  "catalog_id": "advisory-2026-0042",
  "package_name": "example-pkg",
  "version": "1.2.3",
  "evidence": "exact name+version match"
}
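
Since every line is a standalone JSON object, the output can be consumed with nothing more than a JSON parser. A minimal Python sketch that tallies package records per ecosystem (the kind of breakdown in the results section) and collects finding records; the sample lines are made up and the "pypi" ecosystem value is an assumption.

```python
import io
import json
from collections import Counter


def summarize(stream):
    """Count package records per ecosystem and collect finding records from NDJSON."""
    per_ecosystem = Counter()
    findings = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        kind = rec.get("record_type")
        if kind == "package":
            per_ecosystem[rec.get("ecosystem", "unknown")] += 1
        elif kind == "finding":
            findings.append(rec)
    return per_ecosystem, findings


sample = io.StringIO(
    '{"record_type":"package","ecosystem":"npm","package_name":"@tanstack/query-core","version":"5.59.20"}\n'
    '{"record_type":"package","ecosystem":"pypi","package_name":"requests","version":"2.32.3"}\n'
    '{"record_type":"finding","finding_type":"package_exposure","severity":"critical",'
    '"catalog_id":"advisory-2026-0042","package_name":"example-pkg","version":"1.2.3"}\n'
)
counts, found = summarize(sample)
print(dict(counts))                       # {'npm': 1, 'pypi': 1}
print([f["catalog_id"] for f in found])   # ['advisory-2026-0042']
```

On a real run, open inventory.ndjson and pass the file handle to summarize in place of the StringIO sample.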

What I Found Running It

I tested Bumblebee on a developer machine. Here is what the numbers looked like:

  • Self-test passed: 3 findings (the expected fixture matches) in 1ms
  • Baseline scan found 615 package records (354 npm, 139 PyPI, 122 Go)
  • Deep scan with the threat intel catalogs covered 50,999 files and returned 0 findings
  • All 19 Go test packages in the repo pass

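The zero findings are good news: no inventoried package matched, by exact name and version, an entry in the current threat catalogs. That is not proof of a clean machine, since a catalog only covers compromises that have already been reported. Re-running the deep scan regularly, once a month or whenever the catalogs are updated, is a cheap way to keep that peace of mind.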

Platforms

  • 🍎 macOS (Apple Silicon & Intel)
  • 🐧 Linux

🔗 GitHub: github.com/perplexityai/bumblebee

Why This Tool Rocks

  • Fills a gap between SBOM and EDR for supply-chain incident response
  • Reads messy local state that existing tools overlook (browser extensions, MCP configs)
  • Ships with actual threat intel catalogs, not just a framework
  • Stable record IDs mean you can deduplicate across scan runs
  • Single static binary, no runtime dependencies
  • Apache 2.0 licensed, written in Go with zero non-stdlib dependencies

Crepi il lupo! 🐺 (Italian for "may the wolf die", the customary reply to a good-luck wish)