Build ML Pipelines with TangleML: A Drag-and-Drop Guide
🔧 What Is TangleML?
TangleML is a free, open-source system for building machine learning pipelines visually. Think of it as a node-based editor where each node is a reusable piece of code. You drag components onto a canvas, wire them together, and run the entire workflow.
The key idea: components are self-contained CLI programs. They can be written in Python, Java, Shell, or any language. TangleML orchestrates them in containers, caches results intelligently, and tracks every run.
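To make "self-contained CLI program" concrete, here is a minimal sketch of what the code behind a component might look like in Python. The script name, the `--input-csv` and `--output-count` flags, and the row-counting task are all hypothetical, chosen for illustration and not taken from TangleML's standard library. The assumption is only that the orchestrator passes file paths on the command line and the program reads and writes those files.

```python
import argparse
import csv
import sys
from pathlib import Path


def count_rows(input_path: Path, output_path: Path) -> int:
    """Count the data rows in a CSV file and write the count to output_path."""
    with input_path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if there is one
        row_count = sum(1 for _ in reader)
    # Create the output directory in case the caller has not done so.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(str(row_count))
    return row_count


def main(argv=None) -> None:
    # Hypothetical flags: every input and output is just a path argument.
    parser = argparse.ArgumentParser(description="Count rows in a CSV file.")
    parser.add_argument("--input-csv", type=Path, required=True)
    parser.add_argument("--output-count", type=Path, required=True)
    args = parser.parse_args(argv)
    count_rows(args.input_csv, args.output_count)


# Run as a CLI only when arguments are supplied, so the file is also importable.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as, say, count_rows.py, this would be invoked as `python count_rows.py --input-csv trips.csv --output-count out/count.txt`. Because the program knows nothing about the pipeline around it, the same script can be tested on a laptop and reused in any pipeline that supplies a CSV file.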
The frontend source lives in the tangleml/tangle-ui repository on GitHub.
🚀 Why TangleML?
- No registration to build: Open the editor and start dragging components immediately
- Visual + code: Edit on the canvas or jump into the component YAML anytime
- Content-based caching: Only changed steps re-run, saving hours on large pipelines
- Language agnostic: Mix Python, Java, Shell, and more in one pipeline
- Reproducible by design: Every run is recorded with logs, artifacts, and metadata
- No vendor lock-in: Run locally, on any cloud, or use the hosted version
🏁 Quick Start: Your First Pipeline
Time needed: ~10 minutes.
Step 1: Open the Playground
Go to the TangleML Playground.
No account needed to build. You only need to log in when you want to submit a run.
Step 2: Find the Standard Library
On the left panel, click Standard Library Components. Navigate to the Quick Start folder. These are pre-built components for common ML tasks.
Step 3: Build the Pipeline
Add a data source
- Drag Chicago Taxi Trips Dataset onto the canvas
- This component fetches open data using a simple cURL command
Add a training component
- Drag Train XGBoost Model on CSV onto the canvas
Add a prediction component
- Drag XGBoost Predict on CSV onto the canvas
You now have three unconnected nodes on the canvas.
Step 4: Connect the Nodes
- Click and drag from the output port of the dataset component to the input port of the training component
- Connect the training component’s output to the prediction component’s input
TangleML will validate the connections and highlight any type mismatches.
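The idea behind connection validation can be sketched in a few lines of Python. This is an illustrative model, not TangleML's actual validator: the `Port` class, the port names, the type names, and the rule that an untyped port accepts anything are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    """One input or output of a component (hypothetical model)."""
    component: str
    name: str
    type_name: Optional[str] = None  # None means untyped: compatible with anything


def check_connection(output: Port, input_: Port) -> Optional[str]:
    """Return an error message for a type mismatch, or None if the ports are compatible."""
    if output.type_name is None or input_.type_name is None:
        return None
    if output.type_name == input_.type_name:
        return None
    return (
        f"{output.component}.{output.name} produces {output.type_name}, "
        f"but {input_.component}.{input_.name} expects {input_.type_name}"
    )


# Hypothetical ports for the Quick Start components.
dataset_out = Port("Chicago Taxi Trips Dataset", "table", "CSV")
train_in = Port("Train XGBoost Model on CSV", "training_data", "CSV")
predict_model_in = Port("XGBoost Predict on CSV", "model", "XGBoostModel")

print(check_connection(dataset_out, train_in))          # compatible: prints None
print(check_connection(dataset_out, predict_model_in))  # mismatch: prints a message
```

Checking types when an edge is drawn, rather than at run time, means a mismatch costs a second in the editor instead of a failed execution.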
Step 5: Run It
- Click Submit (requires login for execution)
- Switch to the Pipeline Run view
- Watch each step turn green as it completes
- Inspect logs, outputs, and artifacts for every task
If a step fails, click it to see the exact error log. Fix the component or its arguments, then resubmit. Because of caching, unchanged upstream steps will skip execution on the next run.
🧠 Core Concepts
Components
A component is a self-contained unit defined by a YAML file. It specifies inputs, outputs, and how to run the code (usually inside a Docker container). Components are reusable across pipelines and shareable between teams.
Tasks and Executions
When you connect components into a pipeline, you create a task graph. Each node becomes a task. When you submit, TangleML creates an execution that runs each task in the right order, passing data between them automatically.
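"The right order" here means a topological order of the task graph: a task runs only after every task it depends on has finished. The sketch below uses Python's standard `graphlib` module to compute such an order for the three Quick Start components. It illustrates the concept only; how TangleML schedules tasks internally is not described in this guide, and the dictionary layout is an assumption.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks whose outputs it consumes.
# The edges mirror the connections made in Step 4.
pipeline = {
    "Chicago Taxi Trips Dataset": set(),
    "Train XGBoost Model on CSV": {"Chicago Taxi Trips Dataset"},
    "XGBoost Predict on CSV": {"Train XGBoost Model on CSV"},
}


def execution_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


for position, task in enumerate(execution_order(pipeline), start=1):
    print(position, task)
```

For this linear pipeline there is only one valid order: dataset, then training, then prediction. In a graph with branches, tasks that do not depend on each other could run in parallel, and `TopologicalSorter` raises `CycleError` if the connections form a loop.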
Caching
TangleML hashes the container specification and input data for each task. If an identical task has run before, its result is reused instead of re-executing the task. Reuse also covers executions that are still in progress: when several pipelines running in parallel contain the same task, they can share one running execution rather than each starting their own. This saves significant time and compute cost.
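A content-based cache of this kind can be sketched as follows. The exact hashing scheme TangleML uses is not given here, so the key layout (SHA-256 over a canonical JSON encoding), the `ExecutionCache` class, and the in-memory store are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(container_spec: dict, input_digests: dict) -> str:
    """Derive a deterministic key from what would run and what it would read."""
    payload = json.dumps(
        {"container": container_spec, "inputs": input_digests},
        sort_keys=True,           # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExecutionCache:
    """Hypothetical in-memory cache mapping cache keys to task results."""

    def __init__(self):
        self._results = {}

    def run(self, container_spec: dict, input_digests: dict, execute):
        """Return (result, was_cached); call execute() only on a cache miss."""
        key = cache_key(container_spec, input_digests)
        if key in self._results:
            return self._results[key], True
        result = execute()
        self._results[key] = result
        return result, False
```

Calling `run` twice with the same specification and input digests invokes `execute` once; changing an image tag, a command-line argument, or any input digest produces a different key and forces a fresh execution. That is why, after fixing a failed step, only that step and the steps downstream of it need to run again.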
🛠️ Running Locally (Optional)
If you prefer to self-host:
Clone the repositories:
    git clone https://github.com/tangleml/tangle.git tangle/backend --branch stable
    git clone https://github.com/tangleml/tangle-ui.git tangle/frontend_build --branch gh_pages_stable --single-branch --depth 1
Start the app:
    cd tangle && backend/start_local.sh
Open localhost:8000
Google Cloud Shell is another free option (50 hours per week). Follow the same clone and start steps inside Cloud Shell, then use its Web Preview feature on port 8000 to open the app.
📝 Tips
- Start with the Quick Start components; they are fully documented
- Click any component’s info dialog and check the Implementation tab to see the underlying YAML
- Run the pipeline after adding each component to verify it works incrementally
- Use the clone-run feature to reproduce exact results later
That is it. TangleML turns pipeline building from file editing into visual assembly. You focus on the logic; TangleML handles the orchestration.
Crepi il lupo! 🐺