Hypothesis


rank_by_p_value(result: dict) → float

A simple, picklable ranker function. Lower p-value is better.
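As a rough illustration of what such a ranker might look like, the sketch below defines a module-level function (module-level functions can be pickled, while lambdas cannot) and uses it to order a few results. The name `p_value_ranker`, the `"p_value"` key, and the sample data are all assumptions made for this example, not part of the documented API.

```python
from typing import Any, Mapping


def p_value_ranker(result: Mapping[str, Any]) -> float:
    """Hypothetical ranker: use the p-value as the score (lower is better)."""
    # Assumption: the verifier output stores its p-value under "p_value".
    return float(result["p_value"])


# Hypothetical verifier outputs, keyed by treatment name.
results = {
    "treatment_a": {"p_value": 0.20},
    "treatment_b": {"p_value": 0.01},
    "treatment_c": {"p_value": 0.07},
}

# Sort ascending so the treatment with the lowest p-value comes first.
ranked = sorted(results, key=lambda name: p_value_ranker(results[name]))
print(ranked)  # ['treatment_b', 'treatment_c', 'treatment_a']
```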


Encapsulate a statistical test to compare baseline and treatment results.

__init__(
verifier: Callable[[Mapping[str, Sequence[Any]], Mapping[str, Sequence[Any]]], Mapping[str, Any]],
metrics: Optional[Union[str, Sequence[str], Sequence[Sequence[str]]]] = None,
ranker: Optional[Callable[[Mapping[str, Any]], float]] = None,
name: Optional[str] = None
) → None
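The `verifier` parameter expects a callable that takes two mappings of metric name to values and returns a mapping. A minimal sketch of one such callable follows, using a two-sided permutation test written with the standard library only. The function name, the `"accuracy"` metric, the output keys, and the sample values are assumptions for illustration; any statistical test with the same call shape could be substituted.

```python
import random
from statistics import mean
from typing import Any, Mapping, Sequence


def permutation_verifier(
    baseline: Mapping[str, Sequence[Any]],
    treatment: Mapping[str, Sequence[Any]],
) -> Mapping[str, Any]:
    """Hypothetical verifier: permutation test on the difference in means."""
    # Assumption: both mappings hold a numeric metric named "accuracy".
    a = list(baseline["accuracy"])
    b = list(treatment["accuracy"])
    observed = abs(mean(b) - mean(a))

    pooled = a + b
    rng = random.Random(0)  # fixed seed so repeated calls agree
    n_resamples = 2000
    extreme = 0
    for _ in range(n_resamples):
        # Reassign the pooled values to two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return {"p_value": p_value, "mean_diff": mean(b) - mean(a)}


out = permutation_verifier(
    {"accuracy": [0.70, 0.72, 0.71, 0.69]},
    {"accuracy": [0.80, 0.82, 0.79, 0.81]},
)
print(out)
```

Returning the p-value under a fixed key is a design choice made here so that a ranker can read it back later; the documented signature only requires that the result be a mapping.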

rank_treatments(verifier_results: Mapping[str, Any]) → Mapping[str, Any]

Rank treatments using the ranker score function.


verify(
baseline_metrics: Mapping[str, Sequence[Any]],
treatment_metrics: Mapping[str, Sequence[Any]]
) → Any

Evaluate the hypothesis using selected metric groups.

Args:

  • baseline_metrics: Aggregated metrics from baseline runs.
  • treatment_metrics: Aggregated metrics from a treatment.

Returns: The output of the verifier callable. When multiple metric groups are specified, the result is a list of outputs, one per group, in the same order as the groups.
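The return behavior described above can be sketched roughly as follows. This is not the library's implementation: `normalize_groups`, `verify_sketch`, and `toy_verifier` are hypothetical names, and the sketch assumes that a single string or a flat sequence of strings forms one group, that a sequence of sequences forms several groups, and that `None` passes all metrics through unfiltered.

```python
from typing import Any, Callable, List, Mapping, Optional, Sequence, Union

MetricSpec = Optional[Union[str, Sequence[str], Sequence[Sequence[str]]]]
Metrics = Mapping[str, Sequence[Any]]


def normalize_groups(metrics: MetricSpec) -> Optional[List[List[str]]]:
    """Turn the metrics argument into a list of metric-name groups."""
    if metrics is None:
        return None  # assumption: no selection means "use everything"
    if isinstance(metrics, str):
        return [[metrics]]
    items = list(metrics)
    if all(isinstance(m, str) for m in items):
        return [items]  # a flat sequence is treated as one group
    return [list(group) for group in items]


def verify_sketch(
    verifier: Callable[[Metrics, Metrics], Mapping[str, Any]],
    metrics: MetricSpec,
    baseline: Metrics,
    treatment: Metrics,
) -> Any:
    """Call the verifier once per group; return one output or a list."""
    groups = normalize_groups(metrics)
    if groups is None:
        return verifier(baseline, treatment)
    outputs = [
        verifier({m: baseline[m] for m in g}, {m: treatment[m] for m in g})
        for g in groups
    ]
    return outputs[0] if len(outputs) == 1 else outputs


def toy_verifier(b: Metrics, t: Metrics) -> Mapping[str, Any]:
    # Stand-in for a real test: just report which metrics were compared.
    return {"compared": sorted(b)}


baseline = {"accuracy": [0.70, 0.72], "latency": [120, 118], "loss": [0.5, 0.4]}
treatment = {"accuracy": [0.80, 0.82], "latency": [110, 112], "loss": [0.3, 0.2]}

single = verify_sketch(toy_verifier, "accuracy", baseline, treatment)
grouped = verify_sketch(
    toy_verifier, [["accuracy", "loss"], ["latency"]], baseline, treatment
)
print(single)   # {'compared': ['accuracy']}
print(grouped)  # [{'compared': ['accuracy', 'loss']}, {'compared': ['latency']}]
```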