How-To: Advanced DAG Caching
Crystallize stores every run under data/<experiment>/vN/. The completion marker .crystallize_complete and accompanying metadata.json allow later runs to skip work. Two strategies control this behaviour.
1. Strategies
Section titled “1. Strategies”| Strategy | Behaviour |
|---|---|
"rerun" (default) | Always execute the experiment, ignoring cached artifacts. |
"resume" | If the latest run completed (baseline + all active treatments) and artifacts are still present, load metrics and artifacts instead of re-running. |
You can set the strategy per experiment (experiment.strategy = "resume") or pass strategy="resume" to Experiment.run() / ExperimentGraph.run().
2. When Does Resume Re-run?
Section titled “2. When Does Resume Re-run?”Even under resume, Crystallize re-executes an experiment when:
- The pipeline signature changes (code edits or new parameters).
- Treatments differ from the cached run (new names or different apply payloads).
- Upstream dependencies rerun and publish new artifacts.
- The completion marker or metadata is missing.
These rules are covered by the test suite (tests/test_experiment_graph.py, tests/test_experiment_graph_resume.py).
3. Mixed Replicates
Section titled “3. Mixed Replicates”Artifact metadata stores the replicate count of the producing experiment. When a downstream experiment has more replicates than the upstream producer, Artifact.fetch cycles indices so replicate i reads i % upstream_replicates. This lets you generate expensive artifacts once and reuse them many times.
4. Example: Skipping Work
Section titled “4. Example: Skipping Work”graph = ExperimentGraph.from_yaml("experiments/consumer/config.yaml")
# First run produces artifactsgraph.run(replicates=10)
# Second run loads existing resultsgraph.run(strategy="resume")Use logging (or the CLI) to confirm that steps are skipped: cached steps show lock icons in the run screen, and plugins such as LoggingPlugin(verbose=True) emit messages when resume loads stored metrics.
5. Manual Cache Controls
Section titled “5. Manual Cache Controls”- Delete
data/<experiment>to force a full rerun. - Toggle caching for specific steps in the CLI (
lon the run screen) to recompute only the parts you care about. - Set
ArtifactPlugin(versioned=True, artifact_retention=N)to keep multiple runs and prune older ones automatically.
Understanding these knobs keeps large DAGs responsive while guaranteeing fresh results when you change code or configuration.