# Chaining Experiments with a DAG
Crystallize represents workflows as a directed acyclic graph. Nodes are regular `Experiment` instances; edges pass artifacts between them.
## 1. Publish Outputs Upstream

Declare each artifact the producing experiment exposes under `outputs:` in its `config.yaml`:

```yaml
outputs:
  summary:
    file_name: summary.json
    writer: dump_json
    loader: load_json
steps:
  - produce_summary
```

- The loader/writer functions live in `outputs.py`.
- Pipeline steps accept artifacts by annotating parameters with `Artifact` (see the CLI tutorial for a full example).
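For concreteness, here is a minimal sketch of what `dump_json` and `load_json` in `outputs.py` could look like. The signatures (writer takes the object plus an open file, loader takes an open file) are assumptions; match them to the contract shown in the CLI tutorial.

```python
import json


def dump_json(obj, f):
    # Assumed writer signature: serialize the step's output into the
    # artifact file declared as summary.json in config.yaml.
    json.dump(obj, f)


def load_json(f):
    # Assumed loader signature: decode the stored artifact back into a
    # Python object for downstream experiments.
    return json.load(f)
```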
## 2. Consume Them Downstream

Reference upstream outputs using `experiment#artifact` inside your downstream `config.yaml`:

```yaml
datasource:
  producer_summary: producer#summary
steps:
  - inspect_summary
```

When the consumer runs, the datasource returns a dictionary whose values are the loader outputs (`load_json(...)` in this example).
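As an illustration, a downstream step could read that payload as shown below. The step body is hypothetical, but the dictionary key matches the `producer_summary` name declared above.

```python
def inspect_summary(data):
    # Hypothetical step: the datasource payload is a dict keyed by the
    # names under `datasource:`; each value is whatever the upstream
    # loader returned (parsed JSON from load_json here).
    summary = data["producer_summary"]
    print("upstream summary:", summary)
    return summary
```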
## 3. Run the Graph Programmatically

```python
from crystallize import ExperimentGraph

graph = ExperimentGraph.from_yaml("experiments/consumer/config.yaml")
result = graph.run()
```

`ExperimentGraph.from_yaml` inspects the folder hierarchy and discovers dependencies; running the graph executes the experiments in topological order. The returned dictionary maps each experiment name to its `Result`.
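Because `run()` returns a name-to-`Result` mapping, you can iterate over it directly. What each `Result` exposes is version-specific, so the print below is only a placeholder:

```python
# Keys are experiment names; values are the corresponding Result objects.
for name, res in result.items():
    print(f"{name}: {res}")
```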
## 4. Treatment Propagation in DAGs

Treatments flow through the graph by name. When you run a downstream experiment with treatment `high_lr`, the graph activates that treatment on the downstream node and on any upstream node that defines a treatment with the same name. If an upstream experiment does not define `high_lr`, it runs in its baseline condition. This keeps treatment intent consistent while avoiding surprises when parents lack matching variants.
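The matching rule is simple enough to state as code. The sketch below illustrates the name-based activation logic; it is not the library's internals:

```python
def resolve_treatment(node_treatment_names, requested):
    # A node activates the requested treatment only if it defines one
    # with that exact name; otherwise it falls back to its baseline.
    return requested if requested in node_treatment_names else None


# The downstream node defines "high_lr" and activates it; an upstream
# node without it runs in its baseline condition.
assert resolve_treatment({"high_lr", "low_bs"}, "high_lr") == "high_lr"
assert resolve_treatment({"augmented"}, "high_lr") is None
```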
## 5. Using the CLI

- `n` opens **Create New Experiment**. Enable **Use outputs from other experiments** to select artifacts from existing folders. Each selection adds an `experiment#artifact` entry under `datasource:`.
- Graph experiments display a 📈 icon and show their dependencies in the run screen. When you run the downstream node, the CLI executes prerequisites first.
## 6. Combining Multiple Outputs

If you need more control, construct an `ExperimentInput` manually:

```python
from crystallize import ExperimentInput

ds = ExperimentInput(
    summary=producer.artifact_datasource(step="Produce_SummaryStep", name="summary.json"),
    metrics=analytics.artifact_datasource(step="WriteMetricsStep", name="metrics.csv"),
)
consumer_experiment.datasource = ds
```

`ExperimentInput` bundles multiple datasources and ensures replicate counts align when artifacts share the same upstream experiment.
## 7. Visualising

```python
ExperimentGraph.visualize_from_yaml("experiments/consumer/config.yaml")
```

The helper renders a Graphviz diagram (requires Graphviz to be installed) showing experiment dependencies, which is handy for large workflows.
## 8. Troubleshooting

- **Missing artifact** – Ensure upstream experiments ran with `ArtifactPlugin` and that the `file_name` and step names match. The CLI error panel lists the missing path.
- **Replicate mismatch** – If upstream artifacts have different replicate counts, update the producer configuration or homogenise the data before chaining.
- **Loader returns bytes** – Provide a `loader` function in `outputs.py` to decode the bytes into richer objects; see the sketch after this list.
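For instance, a loader that turns the raw bytes of `metrics.csv` into a list of dictionaries might look like this (the signature, as elsewhere on this page, is an assumption):

```python
import csv
import io


def load_csv(f):
    # Hypothetical loader for outputs.py: decode raw artifact bytes into
    # dict rows instead of handing opaque bytes to downstream steps.
    text = f.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```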