Putting It All Together – Building a Graph Experiment
Crystallize experiments can depend on one another by publishing artifacts and consuming them downstream. The CLI recognises these relationships and runs the graph in topological order. This tutorial builds a minimal two-stage pipeline: a producer writes an artifact, and a consumer reads it.
1. Create the Producer
- Launch `crystallize` and press `n`.
- Name the experiment `producer`. Include `datasources.py`, `steps.py`, and `outputs.py`. Enable "Add example code" for quick scaffolding.
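- In `outputs.py`, add helpers that turn a Python dictionary into JSON bytes and back:

  ```python
  import json
  from pathlib import Path


  def dump_json(payload: dict) -> bytes:
      # Writer: serialise the dict to UTF-8 encoded JSON bytes.
      return json.dumps(payload).encode("utf-8")


  def load_json(path: Path) -> dict:
      # Loader: read the file with the same encoding the writer used.
      return json.loads(path.read_text(encoding="utf-8"))
  ```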
- Update `config.yaml`:

  ```yaml
  name: producer
  outputs:
    summary:
      file_name: summary.json
      writer: dump_json
      loader: load_json
  steps:
    - produce_summary
  treatments:
    baseline: {}
  ```
- Implement `produce_summary` in `steps.py`:

  ```python
  from crystallize import pipeline_step, Artifact


  @pipeline_step()
  def produce_summary(data: dict, *, summary: Artifact):
      summary.write({"total": sum(data["numbers"])})
      return data
  ```
- Ensure the datasource returns the numbers:

  ```python
  from crystallize import data_source, FrozenContext


  @data_source
  def numbers(ctx: FrozenContext) -> dict[str, list[int]]:
      return {"numbers": [1, 2, 3]}
  ```
Running `producer` now writes `data/producer/v0/.../summary.json`.
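To see why the producer pairs a writer with a loader, you can exercise the two helpers from `outputs.py` without the CLI. This minimal sketch writes the bytes `dump_json` produces to a temporary `summary.json` and reads them back with `load_json`. The temporary directory stands in for the real `data/producer/...` location; that substitution and the `round_trip` helper are assumptions made for illustration, not how Crystallize stores artifacts.

```python
import json
import tempfile
from pathlib import Path


def dump_json(payload: dict) -> bytes:
    # Writer: dict -> UTF-8 encoded JSON bytes (same as outputs.py).
    return json.dumps(payload).encode("utf-8")


def load_json(path: Path) -> dict:
    # Loader: path to a JSON file -> dict.
    return json.loads(path.read_text(encoding="utf-8"))


def round_trip(payload: dict) -> dict:
    # Hypothetical helper: simulate the artifact lifecycle in a throwaway
    # directory standing in for the experiment's data/ folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "summary.json"
        path.write_bytes(dump_json(payload))
        return load_json(path)


numbers = [1, 2, 3]
restored = round_trip({"total": sum(numbers)})
print(restored)  # {'total': 6}
```

The writer only has to produce bytes. The loader is what lets a downstream experiment receive a `dict` rather than raw file contents.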
2. Create the Consumer
- Press `n` again and name the experiment `consumer`. Include `datasources.py` and `steps.py`. Tick "Use outputs from other experiments" and select `producer -> summary`.
- The generated `config.yaml` includes a datasource reference:

  ```yaml
  datasource:
    producer_summary: producer#summary
  steps:
    - inspect_summary
  ```
- Implement the step to utilise the loaded JSON:

  ```python
  from crystallize import pipeline_step


  @pipeline_step()
  def inspect_summary(data: dict):
      summary = data["producer_summary"]  # already a dict thanks to load_json
      total = summary["total"]
      return data, {"total_from_producer": total}
  ```
Because we defined a loader in the producer, `data["producer_summary"]` is a dictionary instead of raw bytes.
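The value `producer#summary` reads as `<experiment>#<output>`, stored under the alias `producer_summary`. The sketch below models that convention in plain Python. `OutputRef` and `parse_reference` are hypothetical names used for illustration; they are not part of the Crystallize API, and the CLI may resolve references differently.

```python
from typing import NamedTuple


class OutputRef(NamedTuple):
    # Hypothetical structure: which experiment, and which of its outputs.
    experiment: str
    output: str


def parse_reference(ref: str) -> OutputRef:
    # Split "experiment#output" at the first '#'.
    experiment, sep, output = ref.partition("#")
    if not sep or not experiment or not output:
        raise ValueError(f"expected '<experiment>#<output>', got {ref!r}")
    return OutputRef(experiment, output)


datasource = {"producer_summary": "producer#summary"}
refs = {alias: parse_reference(ref) for alias, ref in datasource.items()}
print(refs)  # {'producer_summary': OutputRef(experiment='producer', output='summary')}
```

The experiment half of each reference names an upstream dependency. That dependency is what makes `consumer` a graph experiment.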
3. Run the Graph
- On the selection screen, the `consumer` experiment is marked as a graph because its datasource references another experiment.
- Highlight `consumer` and press Enter. The CLI automatically runs `producer` first, then `consumer`, respecting the dependency.
- The summary tab includes separate sections for the producer and consumer, with artifacts and metrics for each.
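Running `producer` before `consumer` is a topological sort of the dependency graph. The sketch below illustrates the idea with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). It is not Crystallize's implementation. The `dependencies` mapping is written by hand here as an assumption, standing in for what the CLI derives from each `config.yaml`.

```python
from graphlib import TopologicalSorter

# Each experiment maps to the experiments whose outputs it consumes.
# consumer reads producer#summary, so producer is its only dependency.
dependencies = {
    "producer": set(),
    "consumer": {"producer"},
}

# static_order() yields every node after all of its predecessors and
# raises graphlib.CycleError if the dependencies form a cycle.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['producer', 'consumer']
```

A third experiment that consumed `consumer` outputs would be placed after both. Independent branches may appear in any order that respects their own dependencies.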
4. Tips for Larger Graphs
- Graph experiments can reference multiple outputs. Each alias becomes an entry in the datasource payload.
- The CLI persists treatment state per experiment. If you disable a treatment upstream, the consumer run uses the subset of outputs generated by the active treatments.
- You can load the entire directory programmatically via `ExperimentGraph.from_yaml("experiments/")` and call `.visualize_from_yaml(...)` to render the DAG.
- When branching graphs become large, use `cli.priority` and `cli.group` to keep the selection tree navigable.
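When a graph experiment references several outputs, each alias adds one key to the payload the consumer receives. The sketch below models that assembly using an in-memory stand-in for already-loaded artifacts. The following are all hypothetical and shown only to illustrate the alias-to-key relationship: `build_payload`, the `loaded_outputs` mapping, and a second upstream reference `tokenizer#vocab`.

```python
from collections.abc import Mapping
from typing import Any


def build_payload(
    datasource: Mapping[str, str],
    loaded_outputs: Mapping[tuple[str, str], Any],
) -> dict[str, Any]:
    # Hypothetical: map each alias to the already-loaded value of the
    # "<experiment>#<output>" reference it points at.
    payload: dict[str, Any] = {}
    for alias, ref in datasource.items():
        experiment, _, output = ref.partition("#")
        payload[alias] = loaded_outputs[(experiment, output)]
    return payload


# Stand-in for values that upstream runs wrote and their loaders decoded.
loaded_outputs = {
    ("producer", "summary"): {"total": 6},
    ("tokenizer", "vocab"): ["a", "b", "c"],
}
datasource = {
    "producer_summary": "producer#summary",
    "tokenizer_vocab": "tokenizer#vocab",
}

payload = build_payload(datasource, loaded_outputs)
print(sorted(payload))  # ['producer_summary', 'tokenizer_vocab']
```

A step in such a consumer would then read `data["producer_summary"]` and `data["tokenizer_vocab"]` side by side.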