Chaining Experiments with a DAG

Crystallize represents workflows as a directed acyclic graph. Nodes are regular Experiment instances; edges pass artifacts between them.

producer/config.yaml

```yaml
outputs:
  summary:
    file_name: summary.json
    writer: dump_json
    loader: load_json
steps:
  - produce_summary
```
  • The loader/writer functions live in outputs.py.
  • Pipeline steps accept artifacts by annotating parameters with Artifact (see the CLI tutorial for a full example).
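The `dump_json`/`load_json` pair named in the config above might look like the following minimal sketch of an `outputs.py`. It assumes path-based signatures (`writer(obj, path)`, `loader(path)`); the exact signatures Crystallize passes may differ, so treat this as illustrative only.

```python
# outputs.py -- illustrative writer/loader pair for the producer config.
# Assumed signatures: writer(obj, path) and loader(path); Crystallize's
# actual calling convention may differ.
import json


def dump_json(obj, path):
    """Write `obj` to `path` as JSON (the `writer`)."""
    with open(path, "w") as f:
        json.dump(obj, f)


def load_json(path):
    """Read JSON from `path` back into Python objects (the `loader`)."""
    with open(path) as f:
        return json.load(f)
```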

Reference upstream outputs using experiment#artifact inside your downstream config.yaml:

consumer/config.yaml

```yaml
datasource:
  producer_summary: producer#summary
steps:
  - inspect_summary
```

When the consumer runs, the datasource returns a dictionary whose values are the loader outputs (load_json(...) in this example).
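A downstream step therefore sees one entry per key under `datasource:`. The sketch below shows the shape of that dictionary; the function name and parameter are hypothetical, not Crystallize's required step signature.

```python
# Hypothetical downstream step; the parameter name and return value are
# illustrative, not a required Crystallize signature.
def inspect_summary(data):
    # `data` is the datasource dictionary: each key from `datasource:` in
    # config.yaml maps to its loader's output (here, parsed summary.json).
    summary = data["producer_summary"]
    return summary
```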

```python
from crystallize import ExperimentGraph

graph = ExperimentGraph.from_yaml("experiments/consumer/config.yaml")
results = graph.run()  # dict mapping experiment name to Result
```

ExperimentGraph.from_yaml inspects the folder hierarchy, finds dependencies, and executes them in topological order. The returned dictionary maps experiment name to Result.
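The execution order can be pictured with the standard library's `graphlib`; this only illustrates topological ordering for the producer/consumer pair above, not Crystallize's internals.

```python
from graphlib import TopologicalSorter

# Map each experiment to the experiments it depends on.
deps = {"consumer": {"producer"}, "producer": set()}
order = list(TopologicalSorter(deps).static_order())
print(order)  # producers come before their consumers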

Treatments flow through the graph by name. When you run a downstream experiment with treatment high_lr, the graph activates that treatment on the downstream node and any upstream node that defines a treatment with the same name. If an upstream experiment does not define high_lr, it runs with its baseline condition. This keeps treatment intent consistent while avoiding surprises when parents do not have matching variants.
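The matching rule can be sketched in one line (illustrative only, not Crystallize's code):

```python
# Sketch of the treatment-matching rule described above.
def resolve_treatment(defined_treatments, requested):
    # A node runs the requested treatment only if it defines one with
    # that name; otherwise it falls back to its baseline condition.
    return requested if requested in defined_treatments else "baseline"
```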

  • Press n to open Create New Experiment. Enable Use outputs from other experiments to select artifacts from existing folders. Each selection adds an experiment#artifact entry under datasource:.
  • Graph experiments display a 📈 icon and show dependencies in the run screen. When you run the downstream node, the CLI executes prerequisites first.

If you need more control, construct an ExperimentInput manually:

```python
from crystallize import ExperimentInput

ds = ExperimentInput(
    summary=producer.artifact_datasource(step="Produce_SummaryStep", name="summary.json"),
    metrics=analytics.artifact_datasource(step="WriteMetricsStep", name="metrics.csv"),
)
consumer_experiment.datasource = ds
```

ExperimentInput bundles multiple datasources and ensures replicate counts align when artifacts share the same upstream experiment.
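That alignment guarantee amounts to a consistency check like the following sketch (not the library's actual implementation):

```python
# Sketch of a replicate-count consistency check across bundled datasources.
def check_replicates(counts):
    # counts: datasource name -> replicate count of its upstream artifact
    if len(set(counts.values())) > 1:
        raise ValueError(f"replicate mismatch: {counts}")
    return next(iter(counts.values()))
```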

```python
ExperimentGraph.visualize_from_yaml("experiments/consumer/config.yaml")
```

The helper renders a Graphviz diagram (requires Graphviz to be installed) showing experiment dependencies, which is handy for large workflows.

  • Missing artifact – Ensure upstream experiments ran with ArtifactPlugin and that the file_name/step names match. The CLI error panel lists the missing path.
  • Replicate mismatch – If upstream artifacts have different replicate counts, update the producer configuration or homogenise the data before chaining.
  • Loader returns bytes – Provide a loader function in outputs.py to decode bytes into richer objects.