Defuddle: Extract Clean Content from Any Web Page

Defuddle (https://github.com/kepano/defuddle) extracts the main content from web pages, removing clutter such as comments, sidebars, headers, footers, and other non-essential elements.

With 5.6k stars and 224 forks on GitHub, Defuddle is a popular open-source tool for content extraction. It was originally created for the Obsidian Web Clipper but is designed to run in any environment.

How It Works

Defuddle takes a URL or HTML input, identifies the main content using heuristics, and returns cleaned HTML or Markdown. It was built as a replacement for Mozilla Readability with some key differences:

  • More forgiving: it removes fewer uncertain elements
  • Provides consistent output for footnotes, math, and code blocks
  • Uses a page's mobile styles to guess which elements are unnecessary
  • Extracts more metadata, including schema.org data
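
To make the idea of heuristic content detection concrete, here is a toy sketch. It is not Defuddle's actual algorithm: the block shape ({ tag, className, text, linkText }), the clutter keywords, and the weights are all assumptions chosen for illustration. It scores candidate blocks by how much non-link text they hold and penalizes class names that hint at clutter.

```javascript
// Illustrative only: a toy scoring heuristic, NOT how Defuddle works internally.
// Each candidate block is a plain object: { tag, className, text, linkText }.
const CLUTTER_HINTS = /comment|sidebar|footer|header|nav|promo|share/i;

function scoreBlock(block) {
  const textLength = block.text.length;
  if (textLength === 0) return 0;

  // Link density: the share of text that sits inside links.
  // Navigation and related-post lists tend to be almost all links.
  const linkDensity = block.linkText.length / textLength;
  let score = textLength * (1 - linkDensity);

  // Semantic containers get a boost (assumed weight).
  if (block.tag === "article" || block.tag === "main") score *= 2;

  // Class names that look like clutter get a heavy penalty (assumed weight).
  if (CLUTTER_HINTS.test(block.className)) score *= 0.1;

  return score;
}

// Return the highest-scoring block among the candidates.
function pickMainContent(blocks) {
  return blocks.reduce((best, block) =>
    scoreBlock(block) > scoreBlock(best) ? block : best
  );
}

const candidates = [
  { tag: "div", className: "sidebar", text: "x".repeat(600), linkText: "x".repeat(600) },
  { tag: "article", className: "post", text: "y".repeat(500), linkText: "" },
  { tag: "div", className: "comments", text: "z".repeat(800), linkText: "" },
];

console.log(pickMainContent(candidates).className);
```

A real extractor works on a live DOM and combines many more signals; the point here is only that several weak hints are merged into one score per candidate.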

Usage

Browser:

import Defuddle from "defuddle";

const defuddle = new Defuddle(document);
const result = defuddle.parse();

console.log(result.content);
console.log(result.title);
console.log(result.author);

Node.js:

import { JSDOM } from "jsdom";
import { Defuddle } from "defuddle/node";

// `html` is the page source as a string, fetched or read beforehand.
const dom = new JSDOM(html, { url: "https://example.com/article" });
const result = await Defuddle(
  dom.window.document,
  "https://example.com/article",
  {
    markdown: true,
  },
);

CLI:

npx defuddle parse https://example.com/article --markdown

Response Properties

Defuddle returns an object with:

  • content — Cleaned extracted content
  • title — Article title
  • author — Author name
  • description — Description or summary
  • domain — Domain name
  • image — Main article image
  • published — Publication date
  • wordCount — Total word count
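
As a sketch of how these properties might be consumed, the hypothetical helper below (toNote is not part of Defuddle) turns a result-shaped object into a Markdown note with front matter. It assumes only the property names listed above and skips any that are missing or empty; the sample object is made-up data standing in for a real parse result.

```javascript
// Hypothetical helper, not part of Defuddle: builds a note from a result-shaped object.
function toNote(result) {
  // Metadata properties taken from the list above; `content` becomes the body.
  const fields = ["title", "author", "published", "domain", "description", "image", "wordCount"];

  const frontMatter = fields
    .filter((key) => result[key] !== undefined && result[key] !== "")
    // JSON.stringify quotes strings and leaves numbers bare.
    .map((key) => `${key}: ${JSON.stringify(result[key])}`)
    .join("\n");

  return `---\n${frontMatter}\n---\n\n${result.content}`;
}

// Made-up sample standing in for the object returned by a parse call.
const sample = {
  content: "Hello world.",
  title: "Example",
  author: "",
  domain: "example.com",
  wordCount: 2,
};

console.log(toNote(sample));
```

Filtering out empty values keeps the front matter short when a page exposes little metadata.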

Bundles

  1. Core (defuddle) — Main bundle for browser, no dependencies
  2. Full (defuddle/full) — Includes math equation parsing and Markdown conversion
  3. Node.js (defuddle/node) — For Node environments, accepts any DOM implementation

Install

npm install defuddle

🔗 GitHub: github.com/kepano/defuddle