Contributing to Crystallize
First off, thank you for considering contributing to Crystallize! We welcome contributions of all kinds, from bug reports and documentation improvements to new features and architectural suggestions. Every contribution helps make Crystallize a better tool for everyone.
This project adheres to a philosophy of clarity, reproducibility, and scientific rigor. Please keep these principles in mind as you propose changes.
Getting Started: Setting Up Your Environment
To contribute code, you’ll need to set up a local development environment. Crystallize uses pixi to manage dependencies and environments, which ensures a consistent setup for all contributors.
- Prerequisite: Ensure you have pixi installed on your system.
- Fork and Clone the Repository:
  - Fork the official Crystallize repository on GitHub.
  - Clone your fork to your local machine:
        git clone https://github.com/<YOUR-USERNAME>/crystallize.git
        cd crystallize
- Install Dependencies: Use pixi to install all required dependencies defined under [tool.pixi] in pyproject.toml. This command creates a virtual environment in the .pixi directory and installs everything needed for development and testing.
        pixi install
Running Tests and Checks
To maintain code quality and stability, we require that all contributions pass our suite of tests and linting checks.
- Run Unit Tests: Execute the full test suite using pytest.
        pixi run test
- Run Linting and Formatting Checks: We use ruff for linting and formatting.
        pixi run lint
- Coverage (required for CI):
        pixi run cov       # generates coverage.xml
        pixi run diff-cov  # checks patch coverage against main
Please ensure all of these commands run successfully without errors before submitting a pull request.
How to Contribute
Code Contributions
We follow a standard GitHub workflow for code changes.
- Create a Branch: From the main branch, create a new feature branch for your changes. Please use a descriptive name.
        # Example: git checkout -b feature/new-statistical-verifier
        # Example: git checkout -b fix/cache-invalidation-bug
        git checkout -b <type>/<short-description>
- Make Your Changes: Write the code for your new feature or bug fix.
- Add Tests: All new features must be accompanied by tests. Bug fixes should ideally include a test that exposes the bug and verifies the fix. Tests are located in the /tests directory.
- Validate: Run the tests and linter to ensure your changes haven’t introduced any issues.
        pixi run test
        pixi run lint
- Commit and Push: Commit your changes with a clear, descriptive commit message and push them to your forked repository.
- Open a Pull Request: Navigate to the Crystallize repository on GitHub and open a pull request from your feature branch to the main branch.
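To illustrate the "Add Tests" step for a bug fix, here is a minimal sketch of a regression test in pytest style. The function `mean_or_default` is a hypothetical helper invented for this example and is not part of Crystallize; the point is only the pattern of pairing a fix with one test that would have failed before it and one that guards the existing behavior.

```python
# Hypothetical helper under test; it is NOT part of the Crystallize codebase.
def mean_or_default(values, default=0.0):
    """Return the arithmetic mean of `values`, or `default` for empty input."""
    values = list(values)
    if not values:
        # The fix: without this guard, empty input raises ZeroDivisionError.
        return default
    return sum(values) / len(values)


def test_mean_or_default_handles_empty_input():
    # Regression test: fails without the guard above, passes with it.
    assert mean_or_default([]) == 0.0


def test_mean_or_default_regular_input():
    # Guards the pre-existing behavior so the fix does not change it.
    assert mean_or_default([1.0, 2.0, 3.0]) == 2.0
```

pytest collects functions whose names start with `test_`, so a file like this placed under the tests directory would be picked up by `pixi run test`, assuming the project uses pytest's default discovery rules.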
Documentation Contributions
Our documentation is built with Starlight and the source files are located in the /docs directory. We welcome improvements, corrections, and new content.
- Editing Pages: Most pages are standard Markdown (.md) files. You can edit them directly.
- Structure: The documentation follows the Diátaxis framework. Please try to place new content in the appropriate section (Tutorial, How-To, Reference, or Explanation).
- Submitting Changes: For small changes like typos or clarifications, you can use the “Edit this page” link on the documentation site. For larger changes, please follow the same PR process as for code contributions.
Pull Request Guidelines
To help us review your PR efficiently, please ensure the following:
- Clear Title: The PR title should be a concise summary of the change (e.g., “Feat: Add support for Chi-squared verifier”).
- Detailed Description: Fill out the PR description template to explain the what and why of your changes. A good description helps the reviewer understand your thought process.
A good PR description looks like this:
### Summary
Clearly describe the motivation and objectives for this PR. For example: "This PR introduces a new `ChiSquaredVerifier` to enable goodness-of-fit tests within hypotheses, addressing issue #42."
### Changes
- Added `crystallize/verifiers/chi_squared.py` with the new verifier logic.
- Created `tests/test_chi_squared_verifier.py` with unit tests covering key scenarios.
- Updated documentation in `docs/reference/verifiers.md` to include the new class.
### Testing & Verification
Explain exactly how the changes were tested. "All new and existing unit tests pass via `pixi run test`. Manually verified the verifier's output against a known example from a statistics textbook."
Developer FAQ & Common Pitfalls
Q: I’m getting a ContextMutationError. Why?
A: Crystallize uses an immutable FrozenContext to ensure reproducibility and prevent side effects. This error means a PipelineStep or Treatment tried to change an existing key in the context.
- Incorrect: ctx["learning_rate"] = 0.001 (This will fail if learning_rate already exists)
- Correct: ctx.add("new_learning_rate", 0.001) (Only add new, non-conflicting keys)
Treatments should only add new parameters to the context for downstream steps to use.
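The add-only rule can be sketched with a small stand-in class. This is an illustration of the behavior described above, not Crystallize's actual implementation: `SketchFrozenContext` and `SketchContextMutationError` are hypothetical names, and the real FrozenContext may differ in its constructor and methods.

```python
class SketchContextMutationError(Exception):
    """Stand-in for the error raised when an existing key is overwritten."""


class SketchFrozenContext:
    """Hypothetical add-only mapping; NOT Crystallize's real FrozenContext."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Assignment is allowed only for keys that do not exist yet.
        if key in self._data:
            raise SketchContextMutationError(
                f"cannot overwrite existing key: {key!r}"
            )
        self._data[key] = value

    def add(self, key, value):
        # Same rule as item assignment: new keys only.
        self[key] = value


ctx = SketchFrozenContext({"learning_rate": 0.01})
ctx.add("new_learning_rate", 0.001)  # accepted: the key is new

try:
    ctx["learning_rate"] = 0.001  # rejected: the key already exists
except SketchContextMutationError as err:
    print(err)
```

Because existing keys can never change, every downstream step sees the same values no matter which other steps ran first, which is the reproducibility property the error protects.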
Q: My tests are failing due to cache inconsistencies. What should I do?
A: The cache lives in the .cache/ directory. If you suspect it’s stale or causing issues during development, you can safely delete it:
    rm -rf .cache/
Q: Why is my experiment running slowly with the ThreadPoolExecutor?
A: Python’s Global Interpreter Lock (GIL) means the ThreadPoolExecutor provides limited benefit for CPU-bound tasks (like heavy numerical computation), as only one thread can execute Python bytecode at a time.
- For I/O-bound steps (e.g., waiting for API calls, reading from disk), use the default "thread" executor.
- For CPU-bound steps (e.g., complex simulations, training a model), use the ParallelExecution plugin with the "process" executor:
        Experiment(
            plugins=[ParallelExecution(executor_type="process")]
        )
Remember that all data passed between processes must be “picklable”.
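One way to check the picklability requirement before switching to the process executor is to try serializing the object with the standard library's `pickle` module. The helper `is_picklable` below is a hypothetical utility written for this example, not a Crystallize function, and it assumes the process executor relies on standard pickling as Python's multiprocessing does by default.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if `obj` can be serialized with the standard pickle module."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle signals unsupported objects with any of these exception types.
        return False
    return True


# Plain data structures are picklable.
print(is_picklable({"lr": 0.001, "layers": [64, 32]}))

# Lambdas cannot be pickled by the standard pickle module.
print(is_picklable(lambda x: x + 1))

# OS-level resources such as locks cannot be pickled either.
print(is_picklable(threading.Lock()))
```

In practice this usually means defining step functions at module level rather than as lambdas or nested closures, and passing file paths or plain data between processes instead of open handles, locks, or connections.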
Code of Conduct
All contributors are expected to adhere to our Code of Conduct. Please be respectful and constructive in all interactions.