hfviewer: Visualize Any Hugging Face Model

hfviewer is a web tool that turns any Hugging Face model into an interactive architecture graph. Paste a URL or repo name, and it renders the model structure in your browser: how the layers connect, where embeddings enter, how attention repeats, how experts route.

Built by embedl, the same people behind embedl deploy and embedl hub.

How It Works

Hugging Face already has model cards, Spaces, and benchmarks. What it lacked was a fast way to see how a model is put together. hfviewer fills that gap.

The core interaction: paste a Hugging Face URL, open a visual map in the browser, and zoom from the broad system shape into the specific substructure that matters for understanding deployment, latency, and correctness.

Key Features

URL Magic

The simplest way to use it: replace huggingface.co with hfviewer.com in the URL.

huggingface.co/gpt2 → hfviewer.com/gpt2

No bookmarklet, no extension, no paste step. Just change the domain.
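
If you want to script the domain swap (for example, to turn a list of model links into viewer links), a minimal sketch could look like the following. The function name `to_hfviewer_url` is hypothetical, and the `https` scheme is an assumption. The source only shows the top-level `huggingface.co/gpt2` form, so whether deeper paths resolve on hfviewer is also an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

HF_HOST = "huggingface.co"
VIEWER_HOST = "hfviewer.com"


def to_hfviewer_url(hf_url: str) -> str:
    """Swap the Hugging Face host for the hfviewer host, keeping the path.

    Hypothetical helper: it only rewrites the URL string and does not
    check that the model exists or that hfviewer can render it.
    """
    # Accept links pasted without a scheme, e.g. "huggingface.co/gpt2".
    if "://" not in hf_url:
        hf_url = "https://" + hf_url
    parts = urlsplit(hf_url)
    host = parts.netloc.lower()
    if host not in (HF_HOST, "www." + HF_HOST):
        raise ValueError(f"not a Hugging Face URL: {hf_url}")
    # Keep scheme, path, query, and fragment; replace only the host.
    return urlunsplit(
        (parts.scheme, VIEWER_HOST, parts.path, parts.query, parts.fragment)
    )


print(to_hfviewer_url("huggingface.co/gpt2"))  # https://hfviewer.com/gpt2
```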

Granularity Levels

Switch between overview and detail, from the high-level architecture down to traced blocks and paths. Each granularity level reveals more structure without overwhelming the view.

Model Family Comparison

The Gemma 4 family page is the best example. It shows E2B, E4B, 26B-A4B, and 31B side by side with synchronized pan, zoom, and granularity controls. You can compare how the edge models differ from the dense model and the MoE variant in one view.

Embed in Model Cards

Press the Embed button to get the embed code, then drop the graph directly into any Hugging Face model card. This makes documentation interactive without hosting anything yourself.

No Setup Required

No install, no export step, no config hunting. The server analyzes the model and creates the graph on first request. That first request can take a minute or two for complex models.

What It Shows

hfviewer visualizes the full model graph, including:

Transformer blocks: how attention and feed-forward layers repeat
Embedding paths: where text, vision, and audio embeddings enter
Router gates: for MoE models, how tokens are routed to experts
Attention patterns: sliding window vs global attention, RoPE regimes
Multimodal merges: how vision and audio features merge into the language backbone
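
For readers unfamiliar with the "sliding window vs global attention" distinction in the list above, here is a small illustration of the two visibility patterns as boolean masks. This is a generic sketch, not hfviewer's code or any particular model's implementation. The function name `causal_mask` is hypothetical, and the convention that the window counts the current token is an assumption, since conventions vary between models.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j.
    window=None gives global (full causal) attention; an integer window
    limits each query to the most recent `window` positions, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future positions
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


global_mask = causal_mask(4)             # every earlier token is visible
sliding_mask = causal_mask(4, window=2)  # only the last 2 tokens are visible

print(global_mask[3])   # [True, True, True, True]
print(sliding_mask[3])  # [False, False, True, True]
```

In a graph view, layers that use one pattern or the other would typically show up as distinct block types, which is the kind of difference the attention-pattern nodes are meant to surface.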

Sample Models

The site offers quick links to popular models to get a feel for the interaction:

Model	What to Look For
gpt2	Classic decoder-only transformer
google/vit-base-patch16-224	Vision backbone
openai/clip-vit-base-patch32	Dual encoder architecture
t5-small	Encoder-decoder with more structural depth
deepseek-ai/DeepSeek-V4-Pro	Sparse MoE reasoning model
Qwen/Qwen3.5-4B	Larger reasoning stack
Qwen/Qwen3.5-0.8B	Small instruction-tuned LLM
Qwen/Qwen3.6-27B	Hybrid linear/full attention vision-language
nvidia/parakeet-tdt-0.6b-v3	Streaming Conformer-TDT speech recognizer

Why It Matters

Model understanding is rarely linear. Sometimes you start from a paper or blog post and need to verify the architecture visually. Sometimes you see a node, a route, or a merge in the graph and want to check against the config. hfviewer makes that loop fast.
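
One way to do the "check against the config" half of that loop is to pull a few structural numbers out of a model's config.json and compare them with what the graph shows. The sketch below is an assumption about how you might do that by hand. `summarize_config` and the `ALIASES` table are hypothetical names, the alias list covers only a few common key spellings, and the JSON string is a hand-typed excerpt of GPT-2-style fields rather than a downloaded file.

```python
import json

# Different model families name the same quantity differently. These aliases
# cover a few common config.json keys and are not exhaustive.
ALIASES = {
    "layers": ("num_hidden_layers", "n_layer", "num_layers"),
    "heads": ("num_attention_heads", "n_head", "num_heads"),
    "hidden_size": ("hidden_size", "n_embd", "d_model"),
    "experts": ("num_local_experts", "num_experts", "n_routed_experts"),
}


def summarize_config(config: dict) -> dict:
    """Return the structural fields found in a config, under uniform labels."""
    summary = {}
    for label, keys in ALIASES.items():
        for key in keys:
            if key in config:
                summary[label] = config[key]
                break  # first matching alias wins
    return summary


# Hand-typed excerpt in the style of a GPT-2 config.json.
gpt2_like = json.loads(
    '{"model_type": "gpt2", "n_layer": 12, "n_head": 12, "n_embd": 768}'
)
print(summarize_config(gpt2_like))
# {'layers': 12, 'heads': 12, 'hidden_size': 768}
```

If the graph shows twelve repeated transformer blocks and the summary reports `layers: 12`, the two views agree. A mismatch is a cue to look closer at either the config or the traced graph.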

The Gemma 4 family page goes a step further by connecting a technical blog article to the graph. Read about an architectural decision, jump into the corresponding part of the graph, then move back into the article. Graph-to-text and text-to-graph in one browser tab.

That kind of interactive model documentation has been missing from the ML ecosystem, and hfviewer is a step toward fixing that.

🔗 Website: hfviewer.com

Crepi il lupo! 🐺