hfviewer: Visualize Any Hugging Face Model
hfviewer is a web tool that turns any Hugging Face model into an interactive architecture graph. Paste a URL or repo name, and it renders the model structure in your browser: how the layers connect, where embeddings enter, how attention repeats, and how experts route.
Built by embedl, the same people behind embedl deploy and embedl hub.
How It Works
Hugging Face already has model cards, Spaces, and benchmarks. What it lacked was a fast way to see how a model is put together. hfviewer fills that gap.
The core interaction: paste a Hugging Face URL, open a visual map in the browser, and zoom from the broad system shape into the specific substructure that matters for understanding deployment, latency, and correctness.
Key Features
URL Magic
The simplest way to use it: replace huggingface.co with hfviewer.com in the URL.
huggingface.co/gpt2 → hfviewer.com/gpt2
No bookmarklet, no extension, no paste step. Just change the domain.
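If you build links programmatically (for example in a script that generates documentation), the domain swap is easy to automate. The sketch below assumes the mapping is exactly the pure domain swap described above and that hfviewer is served over https; `to_hfviewer` is a hypothetical helper, not part of any hfviewer API.

```python
from urllib.parse import urlsplit, urlunsplit


def to_hfviewer(url: str) -> str:
    """Rewrite a Hugging Face model URL to its hfviewer.com equivalent.

    Assumption: the mapping is a plain domain swap, so the repo path
    (e.g. "/gpt2" or "/google/vit-base-patch16-224") is kept unchanged.
    """
    # Accept bare "huggingface.co/..." strings by adding a scheme first.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("huggingface.co", "www.huggingface.co"):
        raise ValueError(f"not a huggingface.co URL: {url}")
    # Assumption: https scheme; query string and fragment are dropped.
    return urlunsplit(("https", "hfviewer.com", parts.path, "", ""))


print(to_hfviewer("huggingface.co/gpt2"))
print(to_hfviewer("https://huggingface.co/google/vit-base-patch16-224"))
```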
Granularity Levels
Switch between overview and detail, from the high-level architecture down to traced blocks and paths. Each granularity level reveals more structure without overwhelming the view.
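One way to picture granularity levels is collapsing dotted module paths to a chosen depth and folding repeated block indices together. This is only an analogy, not how hfviewer builds its graph internally; the module list is an abbreviated, hypothetical sample in the style of GPT-2's PyTorch module names, cut to two blocks.

```python
# Abbreviated sample in the style of GPT-2 module names (two blocks only).
MODULES = [
    "transformer.wte",
    "transformer.wpe",
    "transformer.h.0.ln_1",
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.mlp.c_fc",
    "transformer.h.1.ln_1",
    "transformer.h.1.attn.c_attn",
    "transformer.h.1.mlp.c_fc",
    "transformer.ln_f",
    "lm_head",
]


def collapse(paths, depth):
    """Group module paths into nodes at a given depth.

    Numeric path segments (repeated blocks) are replaced by "*", and each
    node maps to the number of distinct repeats it covers.
    """
    nodes = {}
    for path in paths:
        parts = path.split(".")[:depth]
        indices = tuple(p for p in parts if p.isdigit())
        key = ".".join("*" if p.isdigit() else p for p in parts)
        nodes.setdefault(key, set()).add(indices)
    return {key: len(repeats) for key, repeats in nodes.items()}


# Coarse overview, then progressively finer views.
for depth in (1, 3, 4):
    print(depth, collapse(MODULES, depth))
```

At depth 1 only `transformer` and `lm_head` remain; at depth 3 the repeated blocks appear as a single `transformer.h.*` node covering two repeats.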
Model Family Comparison
The Gemma 4 family page is the best example. It shows E2B, E4B, 26B-A4B, and 31B side by side with synchronized pan, zoom, and granularity controls. You can compare how the edge models differ from the dense model and the MoE variant in one view.
Embed in Model Cards
Press the Embed button to get the embed code, then drop the graph directly into any Hugging Face model card. This makes documentation interactive without hosting anything yourself.
No Setup Required
No install, no export step, no config hunting. The server analyzes the model and creates the graph on first request. It can take a minute or two for complex models.
What It Shows
hfviewer visualizes the full model graph including:
- Transformer blocks: how attention and feed-forward layers repeat
- Embedding paths: where text, vision, and audio embeddings enter
- Router gates: for MoE models, how tokens are routed to experts
- Attention patterns: sliding window vs global attention, RoPE regimes
- Multimodal merges: how vision and audio features merge into the language backbone
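To make the "sliding window vs global attention" distinction concrete, here is a generic sketch of the two causal masks. It is not hfviewer code and not any specific model's implementation: window sizes and how local and global layers are interleaved vary by model, and whether the window counts the current token is an assumption here.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True if query position i may attend to key j.

    window=None: global causal attention (every earlier position).
    window=k: sliding-window attention over the k most recent positions,
    the current position included (an assumed convention).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: no attending to future tokens
            if window is not None:
                allowed = allowed and i - j < window
            row.append(allowed)
        mask.append(row)
    return mask


def show(mask):
    # "x" marks an allowed query/key pair, "." a masked one.
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))


show(causal_mask(6))            # global: full lower triangle
print()
show(causal_mask(6, window=3))  # sliding window: a diagonal band of width 3
```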
Sample Models
The site offers quick links to popular models to get a feel for the interaction:
| Model | What to Look For |
|---|---|
| gpt2 | Classic decoder-only transformer |
| google/vit-base-patch16-224 | Vision backbone |
| openai/clip-vit-base-patch32 | Dual encoder architecture |
| t5-small | Encoder-decoder with more structural depth |
| deepseek-ai/DeepSeek-V4-Pro | Sparse MoE reasoning model |
| Qwen/Qwen3.5-4B | Larger reasoning stack |
| Qwen/Qwen3.5-0.8B | Small instruction-tuned LLM |
| Qwen/Qwen3.6-27B | Hybrid linear/full attention vision-language |
| nvidia/parakeet-tdt-0.6b-v3 | Streaming Conformer-TDT speech recognizer |
Why It Matters
Model understanding is rarely linear. Sometimes you start from a paper or blog post and need to verify the architecture visually. Sometimes you spot a node, a route, or a merge in the graph and want to check it against the config. hfviewer makes that loop fast.
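For the "check it against the config" half of that loop, a small script can pull the headline numbers out of a model's `config.json`. This is a sketch under assumptions: config key names differ across architectures, and the alias table below covers only some common spellings (for example GPT-2's `n_layer`/`n_head`/`n_embd`, Llama-style `num_hidden_layers`, T5's `num_layers`/`d_model`). `summarize_config` and `load_config` are hypothetical helpers; `load_config` uses `hf_hub_download` from the `huggingface_hub` package and needs network access.

```python
import json

# Assumed aliases: common config.json spellings, not an exhaustive list.
KEY_ALIASES = {
    "layers": ("num_hidden_layers", "num_layers", "n_layer"),
    "heads": ("num_attention_heads", "num_heads", "n_head"),
    "hidden_size": ("hidden_size", "n_embd", "d_model"),
    "experts": ("num_local_experts", "n_routed_experts", "num_experts"),
    "experts_per_token": ("num_experts_per_tok",),
    "sliding_window": ("sliding_window",),
}


def summarize_config(cfg: dict) -> dict:
    """Extract a few architecture numbers to compare against the graph."""
    # Multimodal checkpoints often nest the language model under "text_config".
    sources = [cfg, cfg.get("text_config") or {}]
    summary = {}
    for name, aliases in KEY_ALIASES.items():
        for source in sources:
            value = next((source[k] for k in aliases if k in source), None)
            if value is not None:
                summary[name] = value
                break
    return summary


def load_config(repo_id: str) -> dict:
    """Download and parse config.json (requires huggingface_hub + network)."""
    from huggingface_hub import hf_hub_download  # imported lazily on purpose

    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Offline example with a hand-written, GPT-2-style config dict.
print(summarize_config({"n_layer": 12, "n_head": 12, "n_embd": 768}))
# With network access: print(summarize_config(load_config("gpt2")))
```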
The Gemma 4 family page goes a step further by linking a technical blog article to the graph. Read about an architectural decision, jump to the corresponding part of the graph, then move back into the article. Graph-to-text and text-to-graph in one browser tab.
That kind of interactive model documentation has been missing from the ML ecosystem, and hfviewer is a step toward fixing that.
🔗 Website: hfviewer.com
Crepi il lupo! (Italian for "may the wolf die", the traditional reply to a good-luck wish) 🐺