Migration Guide
Upgrading to v0.25.2? Learn about strict injection and deterministic seeding changes.
Crystallize is a lightweight Python library for structuring scientific and machine learning (ML) experiments. It provides a clear, declarative framework to ensure your work is reproducible, statistically rigorous, and easy to understand. Instead of writing tangled scripts, you define modular components that Crystallize orchestrates into a controlled experiment.
At its heart, Crystallize organizes your research around a few key ideas:
PipelineStep objects that deterministically transform data.
Get a feel for Crystallize by defining a simple A/B test. In this example, we test whether adding a value (delta) to our initial data significantly changes the outcome.
from crystallize import (
    data_source,
    hypothesis,
    pipeline_step,
    treatment,
    verifier,
)
from crystallize import (
    SeedPlugin,
    ParallelExecution,
    FrozenContext,
    Experiment,
    Pipeline,
)
from scipy.stats import ttest_ind
import random

# 1. Define how to get data
@data_source
def initial_data(ctx: FrozenContext):
    return [0, 0, 0]

# 2. Define the data processing pipeline
@pipeline_step()
def add_delta(data, ctx: FrozenContext):
    # The 'delta' value is injected by our treatment
    return [x + ctx.get("delta", 0.0) for x in data]

@pipeline_step()
def add_random(data, ctx: FrozenContext):
    # Add some random noise to the data
    # So p-values don't throw errors
    return [x + random.random() for x in data]

@pipeline_step()
def compute_metrics(data, ctx: FrozenContext):
    # Record a metric for the hypothesis
    ctx.metrics.add("result", sum(data))
    return data

# 3. Define the treatment (the change we are testing)
add_ten = treatment(
    name="add_ten_treatment",
    apply={"delta": 10.0},  # This dict is added to the context
)

# 4. Define the hypothesis to verify
@verifier
def welch_t_test(baseline, treatment, alpha: float = 0.05):
    t_stat, p_value = ttest_ind(
        treatment["result"], baseline["result"], equal_var=False
    )
    return {"p_value": p_value, "significant": p_value < alpha}

@hypothesis(verifier=welch_t_test(), metrics="result")
def check_for_improvement(res):
    # The ranker function determines the "best" treatment.
    # Lower p-value is better.
    return res.get("p_value", 1.0)

# 5. Build and run the experiment
if __name__ == "__main__":
    experiment = Experiment(
        datasource=initial_data(),
        pipeline=Pipeline([add_delta(), add_random(), compute_metrics()]),
        plugins=[SeedPlugin(), ParallelExecution()],
    )
    experiment.validate()  # optional
    result = experiment.run(
        treatments=[add_ten()],
        hypotheses=[check_for_improvement],
        replicates=20,  # Run the experiment 20 times for statistical power
    )

    # Print the results for our hypothesis
    hyp_result = result.get_hypothesis("check_for_improvement")
    print(hyp_result.results)

This code defines a complete experiment, runs it 20 times for both the baseline (delta=0) and the treatment (delta=10), and uses a t-test to check if the difference in the result metric is statistically significant.
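To make the statistical comparison more concrete, here is a rough standard-library sketch of the quantities behind a Welch t-test (the unequal-variance test that `ttest_ind(..., equal_var=False)` performs). It does not use Crystallize at all: the helper `welch_t` and the simulated `baseline` and `treated` samples are illustrative assumptions that only mimic the 20 replicates described above, and the p-value is left out because it needs a t-distribution that the standard library does not provide.

```python
import math
import random
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (unbiased sample variance / n)
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = diff / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    return t_stat, dof


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 20 simulated replicates each: the sum of three noisy values,
# without (baseline) and with (treated) a delta of 10 per value
baseline = [sum(0 + rng.random() for _ in range(3)) for _ in range(20)]
treated = [sum(10 + rng.random() for _ in range(3)) for _ in range(20)]

t_stat, dof = welch_t(treated, baseline)
print(f"t = {t_stat:.1f}, dof = {dof:.1f}")
```

Because the shift of 10 per value dwarfs the uniform noise, the t statistic comes out very large, which is why the example experiment is expected to report a significant difference.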
Crystallize’s documentation is organized according to the Diátaxis framework, which is designed to help you find what you need quickly.
Troubleshooting
Fix common runtime errors (async loops in notebooks, missing context params, extras imports).
What makes Crystallize different from other experiment frameworks?
Crystallize is uniquely focused on scientific and statistical rigor. Its design enforces a clean separation between data processing (Pipelines), variations (Treatments), and evaluation (Hypotheses), which helps prevent common pitfalls in experimental design. The immutable context and automatic caching ensure results are trustworthy and reproducible.
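As a rough illustration of why an immutable context helps, the sketch below shows a write-once mapping: values can be read and new keys added, but an existing key cannot be silently overwritten by a later step. This is not Crystallize's `FrozenContext`; the class name `WriteOnceContext`, its `add` method, and the choice of `KeyError` are all hypothetical, and the library's actual behavior may differ.

```python
class WriteOnceContext:
    """Hypothetical stand-in: keys can be added once but never overwritten."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def get(self, key, default=None):
        # Reads never mutate the stored values
        return self._data.get(key, default)

    def add(self, key, value):
        # Refuse to replace a value that an earlier step already set
        if key in self._data:
            raise KeyError(f"cannot overwrite existing key: {key!r}")
        self._data[key] = value


ctx = WriteOnceContext({"delta": 10.0})
ctx.add("replicate", 0)  # adding a new key is allowed

try:
    ctx.add("delta", 0.0)  # overwriting an existing key is rejected
except KeyError as exc:
    print("rejected:", exc)

print(ctx.get("delta"))  # the original value is still in place
```

With this kind of guard, a pipeline step cannot quietly change a parameter that a treatment injected, so the baseline and treatment runs differ only in what the treatment declares.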
How does Crystallize improve reproducibility?
Is Crystallize ready for production?
Crystallize is currently in a pre-alpha stage, as noted in the README.md. The core API is stabilizing, but breaking changes are still possible. It is best suited for research and development environments where rigorous experimentation is valued.
Ready to dive in? Head over to the Getting Started Tutorial to build your first experiment.