Spotlight Poster
CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation
Aditya Gorla · Ryan Wang · Zhengtong Liu · Ulzee An · Sriram Sankararaman
East Exhibition Hall A-B #E-1300
Imagine trying to complete a puzzle where some pieces are missing: this is what data scientists face daily when working with incomplete datasets. Missing information in medical records, survey responses, or business data can lead to flawed analyses and poor decisions. Current methods for filling these gaps treat all missing data the same way, as if puzzle pieces had disappeared at random. We created CACTI, a new machine learning approach that recognizes that data often goes missing in patterns. CACTI learns these real-world patterns by reusing observed missingness structures, which improves its predictions when filling in missing data. CACTI also reads column descriptions to understand relationships between different types of information, much like understanding that "blood pressure" and "heart rate" are related health measurements. When tested on real datasets, CACTI outperformed the best existing methods by an average of 7.8%, with improvements of up to 13.4% in the most complex cases. This means researchers and organizations can extract more accurate insights from the same imperfect datasets they already have, leading to more reliable findings, better analyses, and improved downstream decisions.
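The idea of reusing observed missingness structures can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `copy_mask`, the random donor pairing, and the toy table are all assumptions made for illustration. Each row borrows the missingness pattern of another row, and the cells that are present in the row but absent in its donor are hidden, so a model can be trained to recover them.

```python
import numpy as np


def copy_mask(observed: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a boolean training mask built from donor rows' missingness.

    observed: (n_rows, n_cols) boolean array, True where a value is present.
    The result is True where an observed cell should be hidden for training.
    Hypothetical helper; the pairing rule below is an assumption.
    """
    n_rows = observed.shape[0]
    # Pair every row with a donor row via a random permutation.
    donors = rng.permutation(n_rows)
    donor_missing = ~observed[donors]
    # Hide a cell only if it is observed here and missing in the donor row.
    return observed & donor_missing


# Toy table: NaN marks a missing entry.
X = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, np.nan],
    [np.nan, 8.0, 9.0],
    [10.0, 11.0, 12.0],
])
observed = ~np.isnan(X)
train_mask = copy_mask(observed, np.random.default_rng(0))

# Cells flagged in train_mask are withheld and can serve as training targets.
X_train = np.where(train_mask, np.nan, X)
```

Because the hidden cells follow patterns that actually occur in the table, the training task resembles the real imputation task more closely than masking cells uniformly at random would. A practical version would likely also need to handle rows that end up with no visible cells, which this sketch does not address.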