Contributing to Crystallize

First off, thank you for considering contributing to Crystallize! We welcome contributions of all kinds, from bug reports and documentation improvements to new features and architectural suggestions. Every contribution helps make Crystallize a better tool for everyone.

This project adheres to a philosophy of clarity, reproducibility, and scientific rigor. Please keep these principles in mind as you propose changes.

Getting Started: Setting Up Your Environment

To contribute code, you’ll need to set up a local development environment. Crystallize uses pixi to manage dependencies and environments, which ensures a consistent setup for all contributors.

  1. Prerequisite: Ensure you have pixi installed on your system.

  2. Fork and Clone the Repository:

git clone https://github.com/<YOUR-USERNAME>/crystallize.git
cd crystallize
  3. Install Dependencies:

Use pixi to install all required dependencies defined under [tool.pixi] in pyproject.toml. This command creates a virtual environment in the .pixi directory and installs everything needed for development and testing.

pixi install

To maintain code quality and stability, we require that all contributions pass our suite of tests and linting checks.

  • Run Unit Tests: Execute the full test suite using pytest.
pixi run test
  • Run Linting and Formatting Checks: We use ruff for linting and formatting.
pixi run lint
  • Coverage (required for CI):
pixi run cov # generates coverage.xml
pixi run diff-cov # checks patch coverage against main

Please ensure all of these commands run successfully without errors before submitting a pull request.

We follow a standard GitHub workflow for code changes.

  1. Create a Branch: From the main branch, create a new feature branch for your changes. Please use a descriptive name.
# Example: git checkout -b feature/new-statistical-verifier
# Example: git checkout -b fix/cache-invalidation-bug
git checkout -b <type>/<short-description>
  2. Make Your Changes: Write the code for your new feature or bug fix.

  3. Add Tests: All new features must be accompanied by tests. Bug fixes should ideally include a test that exposes the bug and verifies the fix. Tests are located in the /tests directory.

  4. Validate: Run the tests and linter to ensure your changes haven’t introduced any issues.

pixi run test
pixi run lint
  5. Commit and Push: Commit your changes with a clear, descriptive commit message and push them to your forked repository.

  6. Open a Pull Request: Navigate to the Crystallize repository on GitHub and open a pull request from your feature branch to the main branch.
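
To illustrate the "Add Tests" step, a bug-fix test typically reproduces the input that triggered the bug and then asserts the corrected behavior. The sketch below is only an example of that shape: `clamp_fraction` is a hypothetical helper invented for illustration and is not part of Crystallize.

```python
# Hypothetical example: `clamp_fraction` is NOT part of Crystallize.
# It stands in for whatever function your bug fix touches.


def clamp_fraction(value: float) -> float:
    """Clamp a value into the closed interval [0.0, 1.0]."""
    # Imagined bug: negative inputs used to be returned unchanged.
    return max(0.0, min(1.0, value))


def test_clamp_fraction_handles_negative_input() -> None:
    # Regression test: this is the input that exposed the imagined bug.
    assert clamp_fraction(-0.5) == 0.0


def test_clamp_fraction_leaves_valid_input_unchanged() -> None:
    # Guard against the fix changing behavior for other inputs.
    assert clamp_fraction(0.25) == 0.25
    assert clamp_fraction(1.5) == 1.0
```

In a real contribution the helper would live in the package and only the `test_` functions would go in the tests directory, where pytest collects them by name.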

Our documentation is built with Starlight, and the source files are located in the /docs directory. We welcome improvements, corrections, and new content.

  • Editing Pages: Most pages are standard Markdown (.md) files. You can edit them directly.
  • Structure: The documentation follows the Diátaxis framework. Please try to place new content in the appropriate section (Tutorial, How-To, Reference, or Explanation).
  • Submitting Changes: For small changes like typos or clarifications, you can use the “Edit this page” link on the documentation site. For larger changes, please follow the same PR process as for code contributions.

To help us review your PR efficiently, please ensure the following:

  • Clear Title: The PR title should be a concise summary of the change (e.g., “Feat: Add support for Chi-squared verifier”).
  • Detailed Description: Fill out the PR description template to explain the what and why of your changes. A good description helps the reviewer understand your thought process.

A good PR description looks like this:

### Summary
Clearly describe the motivation and objectives for this PR. For example: "This PR introduces a new `ChiSquaredVerifier` to enable goodness-of-fit tests within hypotheses, addressing issue #42."
### Changes
- Added `crystallize/verifiers/chi_squared.py` with the new verifier logic.
- Created `tests/test_chi_squared_verifier.py` with unit tests covering key scenarios.
- Updated documentation in `docs/reference/verifiers.md` to include the new class.

Explain exactly how the changes were tested. For example: "All new and existing unit tests pass via `pixi run test`. Manually verified the verifier's output against a known example from a statistics textbook."

Q: I’m getting a ContextMutationError. Why?

A: Crystallize uses an immutable FrozenContext to ensure reproducibility and prevent side effects. This error means a PipelineStep or Treatment tried to change an existing key in the context.

  • Incorrect: ctx["learning_rate"] = 0.001 (This will fail if learning_rate already exists)
  • Correct: ctx.add("new_learning_rate", 0.001) (Only add new, non-conflicting keys)

Treatments should only add new parameters to the context for downstream steps to use.
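
The add-only rule can be pictured with a small stand-in class. This is a simplified sketch written for illustration, not Crystallize's actual implementation: `SketchFrozenContext` and the locally defined `ContextMutationError` are assumptions that only mimic the behavior described above.

```python
from typing import Any, Optional


# Illustration only: these classes are NOT Crystallize's real code.
class ContextMutationError(Exception):
    """Raised when code tries to overwrite an existing context key."""


class SketchFrozenContext:
    """Add-only mapping: new keys are accepted, existing keys are protected."""

    def __init__(self, initial: Optional[dict] = None) -> None:
        self._data = dict(initial or {})

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        # Item assignment follows the same rule as add().
        self.add(key, value)

    def add(self, key: str, value: Any) -> None:
        if key in self._data:
            raise ContextMutationError(f"Cannot mutate existing key: {key!r}")
        self._data[key] = value


ctx = SketchFrozenContext({"learning_rate": 0.01})
ctx.add("new_learning_rate", 0.001)  # accepted: the key is new

try:
    ctx["learning_rate"] = 0.001  # rejected: the key already exists
except ContextMutationError as exc:
    print(f"ContextMutationError: {exc}")
```

The original `learning_rate` value stays untouched, so downstream steps always see the value that was first recorded.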

Q: My tests are failing due to cache inconsistencies. What should I do?

A: The cache lives in the .cache/ directory. If you suspect it’s stale or causing issues during development, you can safely delete it:

rm -rf .cache/
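
On systems where `rm` is not available, the same cleanup can be done from Python. This is a minimal sketch using only the standard library; the function name `clear_cache` is made up for this example and is not a Crystallize command.

```python
import shutil
from pathlib import Path


def clear_cache(project_root: str = ".") -> bool:
    """Delete <project_root>/.cache if it exists.

    Returns True if a directory was removed, False if there was nothing to do.
    `clear_cache` is a hypothetical helper, not part of Crystallize.
    """
    cache_dir = Path(project_root) / ".cache"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False
```

Calling `clear_cache()` from the repository root should have the same effect as the shell command above.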

Q: Why is my experiment running slowly with the ThreadPoolExecutor?

A: Python’s Global Interpreter Lock (GIL) means the ThreadPoolExecutor provides limited benefit for CPU-bound tasks (like heavy numerical computation), as only one thread can execute Python bytecode at a time.

  • For I/O-bound steps (e.g., waiting for API calls, reading from disk), use the default "thread" executor.
  • For CPU-bound steps (e.g., complex simulations, training a model), use the ParallelExecution plugin with the "process" executor:
Experiment(
    plugins=[ParallelExecution(executor_type="process")]
)

Remember that all data passed between processes must be “picklable”.
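
If you are unsure whether an object can cross a process boundary, one quick check is to try serializing it with the standard `pickle` module. The helper below, `is_picklable`, is a hypothetical name introduced for this sketch; it is not part of Crystallize.

```python
import pickle
import threading


def is_picklable(obj: object) -> bool:
    """Return True if `obj` can be serialized by the standard pickle module."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Plain data such as numbers, strings, lists, and dicts pickles fine.
print(is_picklable({"learning_rate": 0.001}))

# Lambdas cannot be pickled by the standard pickle module.
print(is_picklable(lambda x: x + 1))

# Neither can OS-level resources such as locks.
print(is_picklable(threading.Lock()))
```

Common culprits are lambdas, open file handles, locks, and database connections stored on objects that end up being sent to worker processes.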

All contributors are expected to adhere to our Code of Conduct. Please be respectful and constructive in all interactions.