What is TangleML and how does it work?

TangleML is an open-source platform for building and running machine learning pipelines using a visual drag-and-drop editor. You drag components onto a canvas, connect their inputs and outputs, and run the pipeline. Components are self-contained CLI programs that can be written in any language. TangleML handles orchestration, caching, and execution automatically.

Do I need to register or install anything to use TangleML?

No registration is needed to build pipelines. You can start immediately in the browser-based editor at the TangleML playground. To actually run pipelines, you can either run TangleML locally with Docker and uv, or use the Hugging Face hosted version.

What makes TangleML different from other pipeline tools?

TangleML uses content-based caching, which means only changed components re-execute. Components are CLI-based and language-agnostic, so you can mix Python, Java, Shell, and other languages in the same pipeline. The visual editor is also directly connected to the code; you can jump between the canvas and the component YAML whenever needed.

Is TangleML free and open source?

Yes. TangleML is fully open source. The backend and frontend code are available on GitHub under the tangleml organization. You can run it locally, on any cloud provider, or use the hosted playground. There is no vendor lock-in.

Build ML Pipelines with TangleML: A Drag-and-Drop Guide


🔧 What Is TangleML?

TangleML is a free, open-source system for building machine learning pipelines visually. Think of it as a node-based editor where each node is a reusable piece of code. You drag components onto a canvas, wire them together, and run the entire workflow.

The key idea: components are self-contained CLI programs. They can be written in Python, Java, Shell, or any language. TangleML orchestrates them in containers, caches results intelligently, and tracks every run.

The backend code lives in the tangleml/tangle repository on GitHub, and the UI frontend lives in tangleml/tangle-ui.

🚀 Why TangleML?

- No registration to build: Open the editor and start dragging components immediately.
- Visual + code: Edit on the canvas or jump into the component YAML anytime.
- Content-based caching: Only changed steps re-run, saving hours on large pipelines.
- Language agnostic: Mix Python, Java, Shell, and more in one pipeline.
- Reproducible by design: Every run is recorded with logs, artifacts, and metadata.
- No vendor lock-in: Run locally, on any cloud, or use the hosted version.

🏁 Quick Start: Your First Pipeline

Time needed: ~10 minutes.

Step 1: Open the Playground

Go to the TangleML Playground.

No account needed to build. You only need to log in when you want to submit a run.

Step 2: Find the Standard Library

On the left panel, click Standard Library Components. Navigate to the Quick Start folder. These are pre-built components for common ML tasks.

Step 3: Build the Pipeline

Add a data source:
- Drag Chicago Taxi Trips Dataset onto the canvas.
- This component fetches open data using a simple cURL command.

Add a training component:
- Drag Train XGBoost Model on CSV onto the canvas.

Add a prediction component:
- Drag XGBoost Predict on CSV onto the canvas.

You now have three unconnected nodes on the canvas.

Step 4: Connect the Nodes

- Click and drag from the output port of the dataset component to the input port of the training component.
- Connect the training component’s output to the prediction component’s input.

TangleML will validate the connections and highlight any type mismatches.
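As a rough illustration of what a connection check involves, the sketch below compares the declared type of each output port with the type of the input port it feeds. It is not TangleML's validation code: the `Port` class, the port names, and the type names (`CSV`, `XGBoostModel`) are all hypothetical, and the comparison assumes types match only when their names are equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """One input or output port on a task (all field values here are illustrative)."""
    task: str
    name: str
    type_name: str


def find_type_mismatches(connections):
    """Return one message per connection whose output and input types differ."""
    problems = []
    for source, target in connections:
        if source.type_name != target.type_name:
            problems.append(
                f"{source.task}.{source.name} ({source.type_name}) -> "
                f"{target.task}.{target.name} ({target.type_name})"
            )
    return problems


# Hypothetical ports for the Quick Start pipeline; the type names are made up.
dataset_out = Port("Chicago Taxi Trips Dataset", "table", "CSV")
train_in = Port("Train XGBoost Model on CSV", "training_data", "CSV")
train_out = Port("Train XGBoost Model on CSV", "model", "XGBoostModel")
predict_in = Port("XGBoost Predict on CSV", "data", "CSV")  # wrong target on purpose

connections = [(dataset_out, train_in), (train_out, predict_in)]
for problem in find_type_mismatches(connections):
    print("Type mismatch:", problem)
```

The second connection is wired to the wrong input deliberately, so the sketch reports one mismatch. A real editor would surface this on the canvas rather than as printed text.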

Step 5: Run It

- Click Submit (requires login for execution).
- Switch to the Pipeline Run view.
- Watch each step turn green as it completes.
- Inspect logs, outputs, and artifacts for every task.

If a step fails, click it to see the exact error log. Fix the component or its arguments, then resubmit. Because of caching, unchanged upstream steps will skip execution on the next run.

🧠 Core Concepts

Components

A component is a self-contained unit defined by a YAML file. It specifies inputs, outputs, and how to run the code (usually inside a Docker container). Components are reusable across pipelines and shareable between teams.
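To make "self-contained CLI program" concrete, here is a minimal sketch of the kind of script a component could wrap. The flag names (`--input-path`, `--output-path`) and the row-filtering logic are hypothetical and chosen only for illustration; how paths are actually handed to the program is decided by the component's YAML, which this sketch does not try to reproduce.

```python
import argparse
import csv
from pathlib import Path


def drop_incomplete_rows(input_path: Path, output_path: Path) -> int:
    """Copy a CSV file, skipping rows that contain an empty cell.

    Returns the number of data rows written (header excluded).
    """
    output_path.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with input_path.open(newline="") as src, output_path.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: leave an empty output file behind
        writer.writerow(header)
        for row in reader:
            if all(cell.strip() for cell in row):
                writer.writerow(row)
                written += 1
    return written


def main(argv=None) -> None:
    # Hypothetical flags: the orchestrator is assumed to supply both paths.
    parser = argparse.ArgumentParser(description="Drop CSV rows with empty cells.")
    parser.add_argument("--input-path", type=Path, required=True)
    parser.add_argument("--output-path", type=Path, required=True)
    args = parser.parse_args(argv)
    count = drop_incomplete_rows(args.input_path, args.output_path)
    print(f"wrote {count} rows to {args.output_path}")
```

Saved as a script that calls `main()` under the usual `__main__` guard, this reads one file and writes another, with no knowledge of the pipeline around it. That isolation is what lets the same program be reused in different pipelines or swapped for one written in another language.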

Tasks and Executions

When you connect components into a pipeline, you create a task graph. Each node becomes a task. When you submit, TangleML creates an execution that runs each task in the right order, passing data between them automatically.
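Running each task "in the right order" amounts to a topological sort of the task graph. The sketch below shows the idea with Python's standard `graphlib` module, using the three Quick Start components as task names. The dictionary layout is an assumption made for illustration and says nothing about how TangleML stores pipelines internally.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks whose outputs it consumes.
# Task names mirror the Quick Start pipeline; the dict layout is illustrative.
pipeline = {
    "Chicago Taxi Trips Dataset": set(),
    "Train XGBoost Model on CSV": {"Chicago Taxi Trips Dataset"},
    "XGBoost Predict on CSV": {"Train XGBoost Model on CSV"},
}


def execution_order(graph):
    """Return the tasks in an order where every dependency runs first."""
    return list(TopologicalSorter(graph).static_order())


for step, task in enumerate(execution_order(pipeline), start=1):
    print(f"{step}. {task}")
```

If the graph contained a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which is one reason a pipeline has to be a directed acyclic graph.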

Caching

TangleML hashes the container specification and input data for each task. If an identical task was run before, the result is reused. The cache also covers executions that are still in progress: when pipelines running in parallel contain an identical task, they can share the execution that is already under way instead of starting a duplicate. This saves significant time and compute cost.
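The core of content-based caching can be sketched in a few lines: derive a key from the container specification plus the digests of the inputs, and run the task only when that key has not been seen. The code below is a simplified, in-memory illustration. The function and class names, the JSON-plus-SHA-256 key scheme, and the example spec are all assumptions, not TangleML's actual implementation, and the sketch does not model the reuse of executions that are still running.

```python
import hashlib
import json


def cache_key(container_spec: dict, input_hashes: dict) -> str:
    """Derive a deterministic key from a task's container spec and input digests."""
    payload = json.dumps(
        {"container": container_spec, "inputs": input_hashes},
        sort_keys=True,  # same content gives the same key, whatever the dict order
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory stand-in for a store of previous task results."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times a task really ran

    def run(self, container_spec, input_hashes, execute):
        key = cache_key(container_spec, input_hashes)
        if key not in self._results:
            self.executions += 1
            self._results[key] = execute()
        return self._results[key]


# Hypothetical task: the image, command, and input name are made up.
cache = ResultCache()
spec = {"image": "python:3.12", "command": ["python", "train.py"]}
inputs = {"training_data": hashlib.sha256(b"csv bytes").hexdigest()}

cache.run(spec, inputs, lambda: "model-v1")
cache.run(spec, inputs, lambda: "model-v1")  # identical task: result is reused
print("executions:", cache.executions)
```

Changing anything in the spec or in an input digest produces a different key, so only that task and the tasks downstream of its new output need to run again.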

🛠️ Running Locally (Optional)

If you prefer to self-host:

Install Docker and uv

Clone the repositories:
```
git clone https://github.com/tangleml/tangle.git tangle/backend --branch stable
git clone https://github.com/tangleml/tangle-ui.git tangle/frontend_build --branch gh_pages_stable --single-branch --depth 1
```

Start the app:
```
cd tangle && backend/start_local.sh
```

Open localhost:8000 in your browser.
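Before running the steps above, it can help to confirm that the required tools are on your PATH. The following is a small optional helper, not part of TangleML. The tool list is taken from the steps in this section (`git` for cloning, plus Docker and uv), and the function name is made up for this example.

```python
import shutil

# Tools the local setup steps rely on.
REQUIRED_TOOLS = ("git", "docker", "uv")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

The check only confirms that the executables exist. It does not verify versions or that the Docker daemon is running.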

Google Cloud Shell is another free option (its quota is 50 hours per week). Follow the same clone and start steps inside Cloud Shell, then use its Web Preview feature to open port 8000.

📝 Tips

- Start with the Quick Start components; they are fully documented.
- Click any component’s info dialog and check the Implementation tab to see the underlying YAML.
- Run the pipeline after adding each component to verify it works incrementally.
- Use the clone-run feature to reproduce exact results later.

That is it. TangleML turns pipeline building from file editing into visual assembly. You focus on the logic; TangleML handles the orchestration.

Good luck! 🐺