Build ML Pipelines with TangleML: A Drag-and-Drop Guide

🔧 What Is TangleML?

TangleML is a free, open-source system for building machine learning pipelines visually. Think of it as a node-based editor where each node is a reusable piece of code. You drag components onto a canvas, wire them together, and run the entire workflow.

The key idea: components are self-contained CLI programs. They can be written in Python, Java, Shell, or any language. TangleML orchestrates them in containers, caches results intelligently, and tracks every run.
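To make "self-contained CLI program" concrete, here is a minimal sketch of what the code behind a component might look like in Python. The script, its row-counting job, and the `--input-path` / `--output-path` flags are hypothetical and chosen only for illustration. The one thing taken from the text above is the shape: a program that reads its inputs and writes its outputs based on command-line arguments.

```python
import argparse
import csv


def count_rows(input_path: str, output_path: str) -> int:
    """Count the data rows in a CSV file and write the count to output_path."""
    with open(input_path, newline="") as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row, if there is one
        row_count = sum(1 for _ in reader)
    with open(output_path, "w") as dst:
        dst.write(str(row_count))
    return row_count


def main(argv=None) -> None:
    """Parse command-line arguments and run the row count."""
    # Hypothetical flag names; a real component defines its own interface.
    parser = argparse.ArgumentParser(description="Count rows in a CSV file.")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    args = parser.parse_args(argv)
    count_rows(args.input_path, args.output_path)
```

Saved as a script, this would call `main()` under the usual `if __name__ == "__main__":` guard. Because it only touches the paths it is given, the orchestrator is free to decide where the data lives.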

The UI frontend lives in the tangleml/tangle-ui repository on GitHub.

🚀 Why TangleML?

  • No registration to build: Open the editor and start dragging components immediately
  • Visual + code: Edit on the canvas or jump into the component YAML anytime
  • Content-based caching: Only steps whose code or inputs changed re-run, saving hours on large pipelines
  • Language agnostic: Mix Python, Java, Shell, and more in one pipeline
  • Reproducible by design: Every run is recorded with logs, artifacts, and metadata
  • No vendor lock-in: Run locally, on any cloud, or use the hosted version

🏁 Quick Start: Your First Pipeline

Time needed: ~10 minutes.

Step 1: Open the Playground

Go to the TangleML Playground.

No account needed to build. You only need to log in when you want to submit a run.

Step 2: Find the Standard Library

On the left panel, click Standard Library Components. Navigate to the Quick Start folder. These are pre-built components for common ML tasks.

Step 3: Build the Pipeline

  1. Add a data source

    • Drag Chicago Taxi Trips Dataset onto the canvas
    • This component fetches open data using a simple cURL command

  2. Add a training component

    • Drag Train XGBoost Model on CSV onto the canvas

  3. Add a prediction component

    • Drag XGBoost Predict on CSV onto the canvas

You now have three unconnected nodes on the canvas.

Step 4: Connect the Nodes

  • Click and drag from the output port of the dataset component to the input port of the training component
  • Connect the training component’s output to the prediction component’s input

TangleML will validate the connections and highlight any type mismatches.
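As a rough mental model of that validation, the sketch below compares the declared type of each output port with the type expected by the input port it is wired to. The port names and type names (`table`, `training_data`, `CSV`, `XGBoostModel`) are made up for this example, and the exact rules TangleML applies are not described here, so treat it as an illustration of the idea only.

```python
def find_type_mismatches(connections, output_types, input_types):
    """Return the connections whose output type differs from the input type."""
    mismatches = []
    for source, target in connections:
        produced = output_types[source]
        expected = input_types[target]
        if produced != expected:
            mismatches.append((source, target, produced, expected))
    return mismatches


# Hypothetical port declarations, keyed by (component name, port name).
output_types = {
    ("Chicago Taxi Trips Dataset", "table"): "CSV",
    ("Train XGBoost Model on CSV", "model"): "XGBoostModel",
}
input_types = {
    ("Train XGBoost Model on CSV", "training_data"): "CSV",
    ("XGBoost Predict on CSV", "model"): "XGBoostModel",
    ("XGBoost Predict on CSV", "data"): "CSV",
}

# The two connections from Step 4: dataset -> training, training -> prediction.
good_connections = [
    (("Chicago Taxi Trips Dataset", "table"),
     ("Train XGBoost Model on CSV", "training_data")),
    (("Train XGBoost Model on CSV", "model"),
     ("XGBoost Predict on CSV", "model")),
]

# A deliberate mistake: a trained model wired into a port that expects CSV data.
bad_connections = [
    (("Train XGBoost Model on CSV", "model"),
     ("XGBoost Predict on CSV", "data")),
]

valid_result = find_type_mismatches(good_connections, output_types, input_types)
invalid_result = find_type_mismatches(bad_connections, output_types, input_types)
```

Here `valid_result` is empty, while `invalid_result` holds one entry describing the model-to-CSV mismatch.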

Step 5: Run It

  1. Click Submit (requires login for execution)
  2. Switch to the Pipeline Run view
  3. Watch each step turn green as it completes
  4. Inspect logs, outputs, and artifacts for every task

If a step fails, click it to see the exact error log. Fix the component or its arguments, then resubmit. Because of caching, unchanged upstream steps will skip execution on the next run.

🧠 Core Concepts

Components

A component is a self-contained unit defined by a YAML file. It specifies inputs, outputs, and how to run the code (usually inside a Docker container). Components are reusable across pipelines and shareable between teams.
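The sketch below shows the kind of information such a definition carries, written as a Python dictionary rather than YAML so it can be run directly. The field names (`inputs`, `outputs`, `implementation`, `inputPath`, `outputPath`) and the `count_rows.py` script are assumptions made for illustration, not a confirmed TangleML schema. Check the Implementation tab of a real component for the actual format.

```python
# Hypothetical component definition; the field names are assumptions.
component = {
    "name": "Count CSV rows",
    "inputs": [{"name": "table", "type": "CSV"}],
    "outputs": [{"name": "row_count", "type": "Integer"}],
    "implementation": {
        "container": {
            "image": "python:3.12",
            "command": [
                "python", "count_rows.py",
                "--input-path", {"inputPath": "table"},
                "--output-path", {"outputPath": "row_count"},
            ],
        }
    },
}


def resolve_command(component, input_paths, output_paths):
    """Replace path placeholders in the command with concrete file paths."""
    resolved = []
    for part in component["implementation"]["container"]["command"]:
        if isinstance(part, dict) and "inputPath" in part:
            resolved.append(input_paths[part["inputPath"]])
        elif isinstance(part, dict) and "outputPath" in part:
            resolved.append(output_paths[part["outputPath"]])
        else:
            resolved.append(part)
    return resolved


command = resolve_command(
    component,
    input_paths={"table": "/data/in.csv"},
    output_paths={"row_count": "/data/out.txt"},
)
```

The point of the placeholders is that the component never hard-codes where its data lives. The orchestrator fills in the paths at run time, which is what lets the same component be reused across pipelines.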

Tasks and Executions

When you connect components into a pipeline, you create a task graph. Each node becomes a task. When you submit, TangleML creates an execution that runs each task in the right order, passing data between them automatically.
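"The right order" here means a topological order of the task graph: a task runs only after every task it depends on has finished. The sketch below works this out for the Quick Start pipeline using Python's standard `graphlib` module (Python 3.9+). It illustrates the concept and is not TangleML's actual scheduler.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose outputs it consumes,
# following the connections made in the Quick Start.
dependencies = {
    "Chicago Taxi Trips Dataset": set(),
    "Train XGBoost Model on CSV": {"Chicago Taxi Trips Dataset"},
    "XGBoost Predict on CSV": {"Train XGBoost Model on CSV"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(dependencies)
for step, task in enumerate(order, start=1):
    print(f"{step}. {task}")
```

For this linear pipeline there is only one valid order: dataset, then training, then prediction. In a graph with independent branches, several orders are valid and those branches could run in parallel.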

Caching

TangleML hashes the container specification and input data for each task. If an identical task was run before, the result is reused. If an identical task is still running, for example in another pipeline you submitted in parallel, TangleML can reuse that in-progress execution instead of starting a duplicate. This saves significant time and compute cost.
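Here is a minimal sketch of content-based caching, assuming SHA-256 over a JSON encoding of the container specification and the input digests. The hashing scheme, the `ResultCache` class, and the in-memory dictionary are all illustrative choices, not TangleML's implementation, and reuse of still-running executions is not modeled.

```python
import hashlib
import json


def digest_bytes(data: bytes) -> str:
    """Return a content digest for one input artifact."""
    return hashlib.sha256(data).hexdigest()


def cache_key(container_spec: dict, input_digests: dict) -> str:
    """Derive a key from what the task runs and what it reads."""
    payload = json.dumps(
        {"container": container_spec, "inputs": input_digests},
        sort_keys=True,  # stable encoding regardless of dict ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory stand-in for a result store (hypothetical)."""

    def __init__(self):
        self._results = {}

    def run(self, container_spec, inputs, execute):
        """Return (result, was_cached), calling execute only on a miss."""
        digests = {name: digest_bytes(data) for name, data in inputs.items()}
        key = cache_key(container_spec, digests)
        if key in self._results:
            return self._results[key], True
        result = execute(inputs)
        self._results[key] = result
        return result, False
```

Calling `run` twice with the same specification and the same input bytes executes the task once and then reuses the result. Changing either the specification or any input changes the key, so the task runs again.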

🛠️ Running Locally (Optional)

If you prefer to self-host:

  1. Install Docker and uv

  2. Clone the repositories:

    git clone https://github.com/tangleml/tangle.git tangle/backend --branch stable
    git clone https://github.com/tangleml/tangle-ui.git tangle/frontend_build --branch gh_pages_stable --single-branch --depth 1

  3. Start the app:

    cd tangle && backend/start_local.sh

  4. Open http://localhost:8000 in your browser

Google Cloud Shell is another free option (its quota is 50 hours per week). Follow the same clone and start steps inside Cloud Shell, then use Web Preview to open port 8000.

📝 Tips

  • Start with the Quick Start components; they are fully documented
  • Click any component’s info dialog and check the Implementation tab to see the underlying YAML
  • Build incrementally: run the pipeline after adding each component so errors surface early
  • Use the clone-run feature to reproduce exact results later

That is it. TangleML turns pipeline building from file editing into visual assembly. You focus on the logic; TangleML handles the orchestration.

Good luck! 🐺