Benchmarks

This page summarizes benchmark results for the MultivariateImputer across multiple datasets and missingness patterns.

Benchmark Table

The table below is rendered with DataTables, the same third-party display library used in How to Use.

Methodology

Each benchmark starts from a complete copy of the dataset: rows that already contain missing values are dropped so that every cell has a known ground truth. Synthetic missingness is then injected, the masked data is imputed, and only the masked entries are scored. Two missingness patterns are evaluated:

MAR_0.10: 10% missing-at-random across all cells.
Blocks_0.20x0.30: contiguous blocks covering 20% of the rows in 30% of the columns.
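
As a rough illustration of the two patterns, the sketch below builds boolean masks with NumPy. It is not the benchmark's actual code: the function names mar_mask and block_mask are hypothetical, the random pattern is assumed to mean independent uniform draws per cell, and the block pattern is assumed to place one contiguous run of rows independently in each selected column.

```python
import numpy as np

def mar_mask(shape, frac=0.10, seed=0):
    """Mask roughly `frac` of all cells (assumption: independent uniform draws)."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < frac

def block_mask(shape, row_frac=0.20, col_frac=0.30, seed=0):
    """Mask one contiguous run of rows in a random subset of columns.

    Assumption: each selected column gets its own block start position.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = shape
    block_len = max(1, int(round(row_frac * n_rows)))
    n_block_cols = max(1, int(round(col_frac * n_cols)))
    mask = np.zeros(shape, dtype=bool)
    cols = rng.choice(n_cols, size=n_block_cols, replace=False)
    for c in cols:
        # `high` is exclusive, so the block always fits inside the column.
        start = rng.integers(0, n_rows - block_len + 1)
        mask[start:start + block_len, c] = True
    return mask

# Toy complete matrix standing in for a dataset with no missing values.
X = np.random.default_rng(1).normal(size=(100, 10))
mask = block_mask(X.shape)
X_missing = X.copy()
X_missing[mask] = np.nan  # an imputer would see X_missing; scoring uses X[mask]
```

Keeping the mask separate from the data means the same mask can later select both the ground-truth values and the imputed values for scoring.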

The datasets are three numeric-only scikit-learn tabular datasets (Diabetes, Wine, Breast Cancer), the mixed-type Titanic dataset, and a synthetic mixed-type dataset. Metrics are split by data type: regression metrics (RMSE, MAE, R², MAPE, SMAPE, median absolute error, bias, normalized RMSE) for numeric columns and classification metrics (accuracy, balanced accuracy, macro precision/recall/F1, MCC, Cohen’s kappa) for categorical columns. Coverage reports the fraction of masked values that received finite predictions.
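
To make the scoring step concrete, here is a minimal sketch of how coverage and a few of the listed metrics could be computed on masked entries only. The helper names score_numeric and score_categorical are hypothetical, only a subset of the metrics is shown, and the sign convention for bias (prediction minus truth) is an assumption rather than something this page specifies.

```python
import numpy as np

def score_numeric(y_true, y_pred):
    """Score masked numeric entries; non-finite predictions reduce coverage."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    finite = np.isfinite(y_pred)
    # Error metrics use only the entries that received a finite prediction.
    err = y_pred[finite] - y_true[finite]
    return {
        "coverage": float(finite.mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "median_ae": float(np.median(np.abs(err))),
        "bias": float(np.mean(err)),  # assumption: prediction minus truth
    }

def score_categorical(y_true, y_pred):
    """Plain accuracy over masked categorical entries."""
    y_true = np.asarray(y_true, dtype=object)
    y_pred = np.asarray(y_pred, dtype=object)
    return {"accuracy": float(np.mean(y_true == y_pred))}

# Four masked numeric cells, one of which the imputer failed to fill.
numeric_scores = score_numeric([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, np.nan, 3.0])
# Four masked categorical cells, one imputed incorrectly.
categorical_scores = score_categorical(["S", "C", "S", "Q"], ["S", "S", "S", "Q"])
```

In this toy example coverage is 0.75 because one of the four numeric predictions is NaN, and the error metrics are computed over the remaining three cells.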