API Reference for Cross-PPI
This page documents the functions implementing cross-prediction-powered inference [ZC23].
- ppi_py.crossppi_mean_pointestimate(Y, Yhat, Yhat_unlabeled)
Computes the cross-prediction-powered point estimate of the mean.
- Parameters:
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
- Returns:
Cross-prediction-powered point estimate of the mean.
- Return type:
float or ndarray
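The shapes above suggest a cross-fitting workflow: Yhat holds out-of-fold predictions for the labeled points, and column k of Yhat_unlabeled holds the predictions of the model trained without fold k. The sketch below builds such inputs with NumPy only and combines them into a mean estimate. The helper names (cross_fit_predictions, mean_estimate_sketch), the least-squares model, and the estimator form (average prediction on the unlabeled data plus average labeled residual, as we read it from [ZC23]) are assumptions for illustration, not ppi_py's implementation.

```python
import numpy as np


def cross_fit_predictions(X, Y, X_unlabeled, K=5, seed=0):
    """Hypothetical helper (not part of ppi_py): K-fold cross-fitting.

    Returns Yhat of shape (n,) and Yhat_unlabeled of shape (N, K).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), K)
    Yhat = np.empty(n)
    Yhat_unlabeled = np.empty((X_unlabeled.shape[0], K))
    for k, held_out in enumerate(folds):
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit on the other K - 1 folds so the held-out labels stay unseen.
        coef, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
        Yhat[held_out] = X[held_out] @ coef        # out-of-fold predictions
        Yhat_unlabeled[:, k] = X_unlabeled @ coef  # column k = model k
    return Yhat, Yhat_unlabeled


def mean_estimate_sketch(Y, Yhat, Yhat_unlabeled):
    """Assumed form: mean prediction on unlabeled data plus mean residual."""
    return float(Yhat_unlabeled.mean() + (Y - Yhat).mean())


rng = np.random.default_rng(1)
n, N, d = 200, 2000, 3
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
X_unlabeled = rng.normal(size=(N, d))
# The true mean of Y is 3.0; the no-intercept model above cannot capture it,
# so the residual term has to correct the predictions.
Y = 3.0 + X @ beta + rng.normal(size=n)

Yhat, Yhat_unlabeled = cross_fit_predictions(X, Y, X_unlabeled, K=5)
estimate = mean_estimate_sketch(Y, Yhat, Yhat_unlabeled)
print(Yhat.shape, Yhat_unlabeled.shape, round(estimate, 2))
```

Arrays built this way have the shapes the function expects, so they could be passed as Y, Yhat, and Yhat_unlabeled in place of the sketch estimator.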
- ppi_py.crossppi_mean_ci(Y, Yhat, Yhat_unlabeled, alpha=0.1, alternative='two-sided', bootstrap_data=None)
Computes the cross-prediction-powered confidence interval for the mean.
- Parameters:
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).
alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.
bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “Y”, “Yhat”, “Yhat_unlabeled”.
- Returns:
Lower and upper bounds of the cross-prediction-powered confidence interval for the mean.
- Return type:
tuple
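To make the roles of alpha and alternative concrete, here is a minimal sketch of how an error level and an alternative can map to the bounds of a normal-approximation interval. It assumes the convention used by statsmodels' z-interval helpers ('larger' gives a lower bound only, 'smaller' an upper bound only). The function name normal_interval is hypothetical, and whether ppi_py follows exactly this convention should be checked against its source.

```python
import math
from statistics import NormalDist


def normal_interval(estimate, std_error, alpha=0.1, alternative="two-sided"):
    """Hypothetical helper: z-interval with target coverage 1 - alpha."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    if alternative == "two-sided":
        half_width = z(1 - alpha / 2) * std_error
        return estimate - half_width, estimate + half_width
    if alternative == "larger":
        # One-sided: finite lower bound, unbounded above (z(alpha) < 0).
        return estimate + z(alpha) * std_error, math.inf
    if alternative == "smaller":
        # One-sided: unbounded below, finite upper bound.
        return -math.inf, estimate + z(1 - alpha) * std_error
    raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")


two_sided = normal_interval(0.0, 1.0, alpha=0.1)
lower_only = normal_interval(0.0, 1.0, alpha=0.1, alternative="larger")
upper_only = normal_interval(0.0, 1.0, alpha=0.1, alternative="smaller")
print(two_sided, lower_only, upper_only)
```

Under this convention a one-sided interval spends the whole error budget alpha on one tail, so its finite bound sits closer to the estimate than the corresponding two-sided bound.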
- ppi_py.crossppi_quantile_pointestimate(Y, Yhat, Yhat_unlabeled, q, exact_grid=False)
Computes the cross-prediction-powered point estimate of the quantile.
- Parameters:
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
q (float) – Quantile to estimate. Must be in the range (0, 1).
exact_grid (bool, optional) – Whether to compute the exact solution (True) or an approximate solution based on a linearly spaced grid of 5000 values (False).
- Returns:
Cross-prediction-powered point estimate of the quantile.
- Return type:
float
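The exact_grid option can be illustrated with a grid search over a bias-corrected empirical CDF. The sketch below is an assumption about the general approach, not ppi_py's code: it evaluates the CDF of the unlabeled predictions, corrects it with the labeled data, and returns the grid point whose corrected CDF is closest to q. The names ecdf and quantile_sketch are hypothetical, and using the pooled sample values as the "exact" grid is also an assumption.

```python
import numpy as np


def ecdf(values, grid):
    """Empirical CDF of `values` evaluated at each point of `grid`."""
    v = np.sort(np.ravel(values))
    return np.searchsorted(v, grid, side="right") / v.size


def quantile_sketch(Y, Yhat, Yhat_unlabeled, q, exact_grid=False):
    """Hypothetical grid search for the q-quantile (illustration only)."""
    pooled = np.concatenate([Y, Yhat, np.ravel(Yhat_unlabeled)])
    if exact_grid:
        grid = np.sort(pooled)  # every observed value is a candidate
    else:
        grid = np.linspace(pooled.min(), pooled.max(), 5000)
    # CDF of the predictions, corrected by the labeled-data discrepancy.
    corrected = ecdf(Yhat_unlabeled, grid) + ecdf(Y, grid) - ecdf(Yhat, grid)
    return float(grid[np.argmin(np.abs(corrected - q))])


rng = np.random.default_rng(0)
Y = rng.normal(size=500)
Yhat = Y + 0.3 * rng.normal(size=500)
latent = rng.normal(size=(2000, 1))
Yhat_unlabeled = latent + 0.3 * rng.normal(size=(2000, 5))

approx = quantile_sketch(Y, Yhat, Yhat_unlabeled, q=0.5)
exact = quantile_sketch(Y, Yhat, Yhat_unlabeled, q=0.5, exact_grid=True)
print(approx, exact)
```

In this sketch the two grids trade speed for resolution: the fixed grid keeps the cost independent of the sample size, while the data-driven grid can only return values that actually occur in the inputs.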
- ppi_py.crossppi_quantile_ci(Y, Yhat, Yhat_unlabeled, q, alpha=0.1, alternative='two-sided', bootstrap_data=None, exact_grid=False)
Computes the cross-prediction-powered confidence interval for the quantile.
- Parameters:
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
q (float) – Quantile to estimate. Must be in the range (0, 1).
alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).
alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.
bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “Y”, “Yhat”, “Yhat_unlabeled”.
exact_grid (bool, optional) – Whether to compute the exact solution (True) or an approximate solution based on a linearly spaced grid of 5000 values (False).
- Returns:
Lower and upper bounds of the cross-prediction-powered confidence interval for the quantile.
- Return type:
tuple
- ppi_py.crossppi_ols_pointestimate(X, Y, Yhat, X_unlabeled, Yhat_unlabeled)
Computes the cross-prediction-powered point estimate of the OLS coefficients.
- Parameters:
X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
- Returns:
Cross-prediction-powered point estimate of the OLS coefficients.
- Return type:
ndarray
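As a rough illustration of how the five inputs can combine, the sketch below fits OLS to the unlabeled covariates against the predictions averaged over the K columns, then adds a correction fitted on the labeled residuals Y - Yhat. This decomposition is an assumption in the spirit of [ZC23], not necessarily what ppi_py computes. The names ols and ols_estimate_sketch are hypothetical, and the toy data is noise-free with the same bias in all K models, so the correction cancels the bias exactly.

```python
import numpy as np


def ols(X, Y):
    """Least-squares coefficients for Y regressed on X."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


def ols_estimate_sketch(X, Y, Yhat, X_unlabeled, Yhat_unlabeled):
    """Hypothetical estimator: imputed OLS fit plus labeled-data correction."""
    imputed = ols(X_unlabeled, Yhat_unlabeled.mean(axis=1))  # average over K
    rectifier = ols(X, Y - Yhat)  # how far the predictions are off
    return imputed + rectifier


rng = np.random.default_rng(0)
n, N, d, K = 50, 500, 2, 4
beta = np.array([2.0, -1.0])   # true coefficients
bias = np.array([0.5, 0.25])   # systematic error shared by all K models
X = rng.normal(size=(n, d))
X_unlabeled = rng.normal(size=(N, d))
Y = X @ beta
Yhat = X @ (beta + bias)
Yhat_unlabeled = np.tile((X_unlabeled @ (beta + bias))[:, None], (1, K))

theta = ols_estimate_sketch(X, Y, Yhat, X_unlabeled, Yhat_unlabeled)
print(theta)
```

Using the predictions alone would recover beta + bias here; the labeled residuals are what bring the estimate back to beta.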
- ppi_py.crossppi_ols_ci(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, alpha=0.1, alternative='two-sided', bootstrap_data=None)
Computes the cross-prediction-powered confidence interval for the OLS coefficients.
- Parameters:
X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).
alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.
bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “X”, “Y”, “Yhat”, “Yhat_unlabeled”.
- Returns:
Lower and upper bounds of the cross-prediction-powered confidence interval for the OLS coefficients.
- Return type:
tuple
- ppi_py.crossppi_logistic_pointestimate(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, optimizer_options=None)
Computes the cross-prediction-powered point estimate of the logistic regression coefficients.
- Parameters:
X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
optimizer_options (dict, optional) – Options to pass to the optimizer. See scipy.optimize.minimize for details.
- Returns:
Cross-prediction-powered point estimate of the logistic regression coefficients.
- Return type:
ndarray
- ppi_py.crossppi_logistic_ci(X, Y, Yhat, X_unlabeled, Yhat_unlabeled, alpha=0.1, alternative='two-sided', bootstrap_data=None, optimizer_options=None)
Computes the cross-prediction-powered confidence interval for the logistic regression coefficients.
- Parameters:
X (ndarray) – Covariates corresponding to the gold-standard labels. Shape (n, d).
Y (ndarray) – Gold-standard labels. Shape (n,).
Yhat (ndarray) – Predictions corresponding to the gold-standard labels. Shape (n,).
X_unlabeled (ndarray) – Covariates corresponding to the unlabeled data. Shape (N, d).
Yhat_unlabeled (ndarray) – Predictions corresponding to the unlabeled data. Columns contain predictions from different models. Shape (N, K).
alpha (float, optional) – Error level; the confidence interval will target a coverage of 1 - alpha. Must be in the range (0, 1).
alternative (str, optional) – Alternative hypothesis, either ‘two-sided’, ‘larger’ or ‘smaller’.
bootstrap_data (dict, optional) – Bootstrap data used to estimate the variance of the point estimate. Assumes keys “X”, “Y”, “Yhat”, “Yhat_unlabeled”.
optimizer_options (dict, optional) – Options to pass to the optimizer. See scipy.optimize.minimize for details.
- Returns:
Lower and upper bounds of the cross-prediction-powered confidence interval for the logistic regression coefficients.
- Return type:
tuple
[ZC23] T. Zrnic and E. J. Candès. Cross-Prediction-Powered Inference. arXiv:2309.16598, 2023.