How-To: Configure Experiments (config.yaml)
Crystallize discovers experiments by scanning for config.yaml. Each folder typically includes datasources.py, steps.py, outputs.py, and verifiers.py. The YAML file stitches them together.
1. Top-Level Fields
```yaml
name: my-experiment        # optional – defaults to folder name
replicates: 12             # applies to baseline + each treatment (default: 1)
description: "Short blurb shown in the CLI details panel"
```

2. CLI Metadata
Controls how the experiment appears in the Textual UI.
```yaml
cli:
  group: Feature Experiments   # sidebar group
  priority: 10                 # lower numbers sorted first
  icon: "🧪"                   # emoji shown next to the label
  color: "#85C1E9"             # optional hex colour for the label
  hidden: false                # skip discovery when true
```

3. Datasource
Map aliases to factories defined in datasources.py or to outputs from upstream experiments.
```yaml
datasource:
  raw: load_csv                            # loads via @data_source in datasources.py
  features: feature_experiment#embeddings  # consumes another experiment’s output
```

- When referencing another experiment (experiment_name#output_name), the loader instantiates an Artifact. The downstream pipeline receives the return value of the artifact’s loader function. By default that is Path.read_bytes(), so override outputs.*.loader to decode bytes into richer objects (see the sketch below).
- If you provide a list of mappings instead of a dict, Crystallize merges them (useful when order matters).
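For example, the upstream experiment can expose a loader in its outputs.py that turns the stored bytes back into a Python object. This is only a sketch: the name load_embeddings and the JSON payload are illustrative, and it assumes the loader is called with the artifact’s Path, mirroring the Path.read_bytes() default.

```python
# outputs.py of the upstream experiment (illustrative names).
# Assumption: the loader receives the artifact's Path, like the Path.read_bytes() default.
import json
from pathlib import Path

def load_embeddings(path: Path):
    # Hand the downstream pipeline a decoded object instead of raw bytes.
    return json.loads(path.read_text())
```

Pointing the upstream experiment’s outputs.embeddings.loader at this function means the downstream datasource alias receives the decoded object rather than raw bytes.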
4. Steps
Ordered list of pipeline factories defined in steps.py.
```yaml
steps:
  - load_dataframe
  - { clean_columns: { drop_nulls: true } }
  - train_model
```

- Strings call the matching factory with no arguments.
- Dictionaries let you pass keyword arguments ({factory: {param: value}}). Parameters flow into the decorated function and still support context injection (see the sketch below).
- A step returning (data, metrics_dict) records metrics without mutating the context.
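As a sketch of the factory side, here is what clean_columns from the YAML above could look like in steps.py. The parameter handling is an assumption (keyword arguments from the YAML are assumed to arrive as plain function parameters), and the pandas-style dropna() call is illustrative.

```python
# steps.py: a minimal sketch. The YAML entry { clean_columns: { drop_nulls: true } }
# above is assumed to pass drop_nulls=True into the decorated function.
from crystallize import pipeline_step

@pipeline_step()
def clean_columns(data, drop_nulls: bool = False):
    if drop_nulls:
        data = data.dropna()  # assumes a pandas-like DataFrame
    # Returning (data, metrics_dict) records a metric without mutating the context.
    return data, {"rows_after_cleaning": len(data)}
```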
5. Outputs
Declare artifacts produced by the pipeline. Each entry becomes an Artifact instance.
```yaml
outputs:
  model_blob:
    file_name: model.pkl   # optional – defaults to alias
    writer: dump_pickle    # function in outputs.py
    loader: load_pickle    # used when another experiment consumes it
```

Pipeline steps accept these artifacts by annotating a parameter with Artifact:

```python
from crystallize import pipeline_step, Artifact

@pipeline_step()
def save_model(data, *, model_blob: Artifact):
    model_blob.write(data["model_bytes"])
    return data
```

Artifacts are written under data/<experiment>/vN/... by the default ArtifactPlugin. Enable versioned: true on the plugin to retain multiple runs.
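The dump_pickle and load_pickle named in the YAML live in outputs.py. The sketch below shows one plausible shape for them; the exact call signatures Crystallize uses are assumptions (both functions are written here as receiving the artifact’s Path, which matches the Path.read_bytes() default on the loader side).

```python
# outputs.py: illustrative sketch only. The writer/loader call signatures are
# assumptions; both are written here as taking the artifact's Path.
import pickle
from pathlib import Path

def dump_pickle(path: Path, obj) -> None:
    # Serialize the pipeline's object to the artifact file.
    path.write_bytes(pickle.dumps(obj))

def load_pickle(path: Path):
    # Rebuild the object when another experiment consumes this artifact.
    return pickle.loads(path.read_bytes())
```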
6. Treatments
Context changes evaluated against the baseline.
```yaml
treatments:
  baseline: {}
  tuned_lr:
    learning_rate: 0.05
  temperature_sweep:
    temperature: 0.9
```

- Keys become treatment names in the CLI and result summaries.
- Values merge into the context. Use nested dictionaries if you want to group related parameters.
7. Hypotheses
Hook up verifiers defined in verifiers.py.
```yaml
hypotheses:
  - name: significance_check
    verifier: welch_t_test   # function wrapped with @verifier
    metrics: total_reward    # string, list, or nested lists
```

- Metrics refer to keys recorded with ctx.metrics.add or returned in (data, metrics_dict).
- You can include multiple hypotheses; each runs independently after all replicates finish (a verifier sketch follows below).
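A verifier such as welch_t_test lives in verifiers.py. The following is only a sketch: the argument names and return shape are assumptions (baseline and treatment samples for the configured metric in, a result mapping out); the @verifier decorator and the function name come from the example above, and the statistics are a standard Welch’s t-test via SciPy.

```python
# verifiers.py: a minimal sketch. Argument names and the returned mapping are
# assumptions; the test itself is Welch's t-test (unequal variances).
from crystallize import verifier
from scipy import stats

@verifier
def welch_t_test(baseline_samples, treatment_samples):
    result = stats.ttest_ind(treatment_samples, baseline_samples, equal_var=False)
    return {"p_value": float(result.pvalue), "significant": result.pvalue < 0.05}
```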
8. Putting It Together
Minimal example (examples/yaml_experiment/config.yaml):
```yaml
name: yaml-demo
replicates: 8

cli:
  group: Demo
  priority: 5
  icon: "🧪"

datasource:
  numbers: load_numbers

steps:
  - add_delta
  - record_total

outputs:
  total_blob:
    file_name: total.json

treatments:
  baseline: {}
  plus_one:
    delta: 1

hypotheses:
  - name: better_than_baseline
    verifier: welch_t_test
    metrics: total
```

9. Tips
- The CLI writes per-experiment state to config.state.json (inactive treatments, cache toggles). Check that file into git if you want to share defaults.
- Use YAML anchors if you need to reuse fragments; the config is parsed with standard yaml.safe_load, so custom tags such as !include are not understood out of the box.
- For DAGs, consider adding a description: to each node so the selection screen shows helpful detail.