API Reference for Cross-PPI

This page documents the functions implementing cross-prediction-powered inference (cross-PPI) [ZC23].

ppi_py.crossppi_mean_pointestimate(Y, Yhat, Yhat_unlabeled)

Computes the cross-prediction-powered point estimate of the mean.

Parameters:
  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

Returns:

Cross-prediction-powered point estimate of the mean.

Return type:

float or ndarray
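The cross-PPI estimate averages all K models' predictions on the unlabeled data and adds the mean labeled residual Y - Yhat as a bias correction. The sketch below is illustrative, not the library's implementation. It assumes the cross-fitting layout of [ZC23]: the labeled data is split into K folds, model j is trained on every fold except fold j, Yhat holds each labeled point's prediction from the model that never saw it, and column j of Yhat_unlabeled holds model j's predictions. The helpers crossfit_predictions and crossppi_mean_sketch are hypothetical, and a least-squares line stands in for an arbitrary ML model.

```python
import numpy as np


def crossfit_predictions(X, Y, X_unlabeled, K=5, seed=0):
    """Hypothetical helper: K-fold cross-fitting with a least-squares line as the model.

    Returns Yhat of shape (n,), where each labeled point is predicted by the
    model trained without its fold, and Yhat_unlabeled of shape (N, K), whose
    column j holds the predictions of the model trained without fold j.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    Yhat = np.empty(n)
    Yhat_unlabeled = np.empty((X_unlabeled.shape[0], K))
    for j, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Stand-in model: ordinary least squares with an intercept column.
        A = np.column_stack([np.ones(train_idx.size), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A, Y[train_idx], rcond=None)
        Yhat[test_idx] = np.column_stack([np.ones(test_idx.size), X[test_idx]]) @ coef
        Yhat_unlabeled[:, j] = (
            np.column_stack([np.ones(X_unlabeled.shape[0]), X_unlabeled]) @ coef
        )
    return Yhat, Yhat_unlabeled


def crossppi_mean_sketch(Y, Yhat, Yhat_unlabeled):
    """Mean of all N*K unlabeled predictions plus the mean labeled residual Y - Yhat."""
    return Yhat_unlabeled.mean() + (Y - Yhat).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 1))
    Y = 2.0 * X[:, 0] + 1.0 + rng.normal(size=200)
    X_unlabeled = rng.normal(size=(2000, 1))
    Yhat, Yhat_unlabeled = crossfit_predictions(X, Y, X_unlabeled, K=5)
    print(crossppi_mean_sketch(Y, Yhat, Yhat_unlabeled))  # estimates E[Y], which is 1.0 here
```

Because each labeled prediction comes from a model that did not see that point, the residual average is an honest estimate of the prediction bias; this is the reason for cross-fitting rather than reusing a single model trained on all labeled data.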

ppi_py.crossppi_mean_ci(Y, Yhat, Yhat_unlabeled, alpha=0.1, alternative='two-sided', bootstrap_data=None)

Computes the cross-prediction-powered confidence interval for the mean.

Parameters:
  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

  • alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).

  • alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.

  • bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “Y”, “Yhat”, “Yhat_unlabeled”.

Returns:

Lower and upper bounds of the cross-prediction-powered confidence interval for the mean.

Return type:

tuple
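A normal-approximation interval around the cross-PPI mean can be sketched as estimate ± z·sqrt(var(f̄)/N + var(Y - Yhat)/n), where f̄ is the row-wise average of Yhat_unlabeled. This is a simplifying assumption: the sketch plugs in the data it is given and treats the unlabeled and labeled terms as independent, whereas the library can estimate the variance from bootstrap_data. The one-sided convention ('larger' gives (lower, inf), 'smaller' gives (-inf, upper)) mirrors statsmodels' z-interval convention and is an assumption here. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def crossppi_mean_ci_sketch(Y, Yhat, Yhat_unlabeled, alpha=0.1, alternative="two-sided"):
    """Normal-approximation interval around the cross-PPI mean point estimate."""
    n, N = Y.shape[0], Yhat_unlabeled.shape[0]
    # Average the K models' predictions for each unlabeled point.
    f_bar = Yhat_unlabeled.mean(axis=1)
    residual = Y - Yhat
    estimate = f_bar.mean() + residual.mean()
    # Assumption: the two terms are independent, so their variances add.
    se = np.sqrt(f_bar.var(ddof=1) / N + residual.var(ddof=1) / n)
    z = NormalDist()
    if alternative == "two-sided":
        half = z.inv_cdf(1 - alpha / 2) * se
        return estimate - half, estimate + half
    if alternative == "larger":
        # One-sided interval bounded below.
        return estimate - z.inv_cdf(1 - alpha) * se, np.inf
    if alternative == "smaller":
        # One-sided interval bounded above.
        return -np.inf, estimate + z.inv_cdf(1 - alpha) * se
    raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
```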

ppi_py.crossppi_quantile_pointestimate(Y, Yhat, Yhat_unlabeled, q, exact_grid=False)

Computes the cross-prediction-powered point estimate of the quantile.

Parameters:
  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

  • q (float) – Quantile to estimate. Must be in the range (0, 1).

  • exact_grid (bool, optional) – Whether to compute the exact solution (True) or an approximate solution based on a linearly spaced grid of 5000 values (False).

Returns:

Cross-prediction-powered point estimate of the quantile.

Return type:

float
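Setting the subgradient of the cross-PPI pinball loss to zero amounts to solving F(θ) = q for a rectified CDF F: the empirical CDF of all N*K unlabeled predictions, plus the labeled difference between the empirical CDFs of Y and Yhat. The sketch below searches a grid for the point whose rectified CDF is closest to q, with the grid chosen as described for exact_grid. It is illustrative only; the function names are hypothetical and the library's internals may differ.

```python
import numpy as np


def _ecdf(sorted_values, grid):
    # Fraction of values <= each grid point.
    return np.searchsorted(sorted_values, grid, side="right") / sorted_values.size


def crossppi_rectified_cdf(Y, Yhat, Yhat_unlabeled, grid):
    """ECDF of all N*K unlabeled predictions plus ECDF(Y) - ECDF(Yhat) on labeled data."""
    imputed = _ecdf(np.sort(Yhat_unlabeled, axis=None), grid)
    rectifier = _ecdf(np.sort(Y), grid) - _ecdf(np.sort(Yhat), grid)
    return imputed + rectifier


def crossppi_quantile_sketch(Y, Yhat, Yhat_unlabeled, q, exact_grid=False):
    """Grid point whose rectified CDF is closest to q."""
    all_values = np.concatenate([Y, Yhat, Yhat_unlabeled.ravel()])
    if exact_grid:
        # Every observed value is a candidate.
        grid = np.unique(all_values)
    else:
        # 5000 linearly spaced candidates between the smallest and largest value.
        grid = np.linspace(all_values.min(), all_values.max(), 5000)
    cdf = crossppi_rectified_cdf(Y, Yhat, Yhat_unlabeled, grid)
    return grid[np.argmin(np.abs(cdf - q))]
```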

ppi_py.crossppi_quantile_ci(Y, Yhat, Yhat_unlabeled, q, alpha=0.1, alternative='two-sided', bootstrap_data=None, exact_grid=False)

Computes the cross-prediction-powered confidence interval for the quantile.

Parameters:
  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

  • q (float) – Quantile to estimate. Must be in the range (0, 1).

  • alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).

  • alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.

  • bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “Y”, “Yhat”, “Yhat_unlabeled”.

  • exact_grid (bool, optional) – Whether to use the exact grid of observed values (True) or a linearly spaced grid of 5000 values (False).

Returns:

Lower and upper bounds of the cross-prediction-powered confidence interval for the quantile.

Return type:

tuple
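One way to obtain a quantile interval is test inversion: keep every grid point θ at which the rectified CDF is within z_{1-α/2} standard errors of q, then report the smallest and largest kept points. The sketch below does this for the two-sided case on a linear grid only, treating the unlabeled and labeled terms as independent; it ignores bootstrap_data and exact_grid, and the function name and grid_size parameter are hypothetical. It builds an (N, K, grid_size) boolean array, so memory grows with all three.

```python
import numpy as np
from statistics import NormalDist


def crossppi_quantile_ci_sketch(Y, Yhat, Yhat_unlabeled, q, alpha=0.1, grid_size=5000):
    """Two-sided interval by inverting a z-test of F(theta) = q over a grid."""
    n, N = Y.shape[0], Yhat_unlabeled.shape[0]
    all_values = np.concatenate([Y, Yhat, Yhat_unlabeled.ravel()])
    grid = np.linspace(all_values.min(), all_values.max(), grid_size)
    # Per unlabeled point: share of the K models predicting <= theta, shape (N, grid_size).
    imputed = (Yhat_unlabeled[:, :, None] <= grid).mean(axis=1)
    # Per labeled point: 1{Y <= theta} - 1{Yhat <= theta}, shape (n, grid_size).
    rectifier = (Y[:, None] <= grid).astype(float) - (Yhat[:, None] <= grid)
    cdf = imputed.mean(axis=0) + rectifier.mean(axis=0)
    # Assumption: the two terms are independent, so their variances add.
    se = np.sqrt(imputed.var(axis=0, ddof=1) / N + rectifier.var(axis=0, ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    kept = grid[np.abs(cdf - q) <= z * se]
    if kept.size == 0:
        raise ValueError("No grid point passed the test; try a finer grid.")
    # Report the hull of the accepted points.
    return kept.min(), kept.max()
```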

ppi_py.crossppi_ols_pointestimate(X, Y, Yhat, X_unlabeled, Yhat_unlabeled)

Computes the cross-prediction-powered point estimate of the OLS coefficients.

Parameters:
  • X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).

  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

Returns:

Cross-prediction-powered point estimate of the OLS coefficients.

Return type:

ndarray
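For linear regression, a prediction-powered estimate can be written as an OLS fit to the imputed unlabeled responses minus an OLS fit of the prediction errors Yhat - Y on the labeled covariates. The sketch below applies that form with the K models' unlabeled predictions averaged per row; this shape of the estimator is an assumption, the library's internals may differ, and the function name is hypothetical. The sketch uses X as given, so include a column of ones if an intercept is wanted.

```python
import numpy as np


def crossppi_ols_sketch(X, Y, Yhat, X_unlabeled, Yhat_unlabeled):
    """Imputed OLS fit on unlabeled data minus an OLS fit of the labeled errors."""
    # OLS is linear in the response, so regressing on the row average of the
    # K columns equals averaging the K per-model fits.
    f_bar = Yhat_unlabeled.mean(axis=1)
    imputed, *_ = np.linalg.lstsq(X_unlabeled, f_bar, rcond=None)
    # Coefficients describing how the predictions err on the labeled data.
    rectifier, *_ = np.linalg.lstsq(X, Yhat - Y, rcond=None)
    return imputed - rectifier
```

With uninformative all-zero predictions the imputed fit vanishes and the estimate reduces to classical OLS on the labeled data; with error-free predictions the rectifier vanishes instead.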

ppi_py.crossppi_ols_ci(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, alpha=0.1, alternative='two-sided', bootstrap_data=None)

Computes the cross-prediction-powered confidence interval for the OLS coefficients.

Parameters:
  • X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).

  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

  • alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).

  • alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.

  • bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “X”, “Y”, “Yhat”, “Yhat_unlabeled”.

Returns:

Lower and upper bounds of the cross-prediction-powered confidence interval for the OLS coefficients.

Return type:

tuple
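A simple coordinate-wise interval for the OLS estimate is a Wald interval whose covariance adds the heteroskedasticity-robust (HC0) sandwich covariances of the two least-squares fits that make up a PPI-style estimate: the imputed fit on the unlabeled data and the fit of Yhat - Y on the labeled data. Adding the two covariances treats the pieces as independent, a simplifying assumption (in cross-PPI the models are trained on the labeled data), and the library's own variance estimate, which can use bootstrap_data, may differ. The sketch covers the two-sided case only; the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def _hc0_cov(X, y, coef):
    # Heteroskedasticity-robust (HC0) sandwich covariance of an OLS fit.
    resid = y - X @ coef
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    return bread @ meat @ bread


def crossppi_ols_ci_sketch(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, alpha=0.1):
    """Coordinate-wise two-sided Wald intervals; returns (lower, upper) arrays."""
    f_bar = Yhat_unlabeled.mean(axis=1)
    imputed, *_ = np.linalg.lstsq(X_unlabeled, f_bar, rcond=None)
    rectifier, *_ = np.linalg.lstsq(X, Yhat - Y, rcond=None)
    estimate = imputed - rectifier
    # Assumption: the two fits are independent, so their covariances add.
    cov = _hc0_cov(X_unlabeled, f_bar, imputed) + _hc0_cov(X, Yhat - Y, rectifier)
    half = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(np.diag(cov))
    return estimate - half, estimate + half
```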

ppi_py.crossppi_logistic_pointestimate(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, optimizer_options=None)

Computes the cross-prediction-powered point estimate of the logistic regression coefficients.

Parameters:
  • X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).

  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

  • optimizer_options (dict, optional) – Options to pass to the optimizer. See scipy.optimize.minimize for details.

Returns:

Cross-prediction-powered point estimate of the logistic regression coefficients.

Return type:

ndarray

ppi_py.crossppi_logistic_ci(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, alpha=0.1, alternative='two-sided', bootstrap_data=None, optimizer_options=None)

Computes the cross-prediction-powered confidence interval for the logistic regression coefficients.

Parameters:
  • X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).

  • Y (ndarray) – Gold-standard labels. Shape (n,).

  • Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).

  • X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).

  • Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).

  • alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).

  • alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.

  • bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “X”, “Y”, “Yhat”, “Yhat_unlabeled”.

  • optimizer_options (dict, optional) – Options to pass to the optimizer. See scipy.optimize.minimize for details.

Returns:

Lower and upper bounds of the cross-prediction-powered confidence interval for the logistic regression coefficients.

Return type:

tuple

[ZC23] T. Zrnic and E. J. Candès. Cross-Prediction-Powered Inference. arXiv:2309.16598, 2023.