

Poster in Workshop: Methods and Opportunities at Small Scale (MOSS)

Restoring Task-Relevant Information in Synthetic Data: A Small-Scale V-Information View

Sid Bharthulwar

Keywords: [ Small-Scale Experiments ] [ Synthetic Data ] [ Inductive Biases ] [ LLMs ] [ Alignment ] [ Model Capacity ] [ CNNs ] [ Information Restoration ] [ V-Information ]


Abstract:

This paper investigates synthetic data generation as a mechanism for restoring or reformatting task-relevant information that is obscured or unusable for a specific, computationally bounded learner. We conduct a small-scale, controlled experiment on CIFAR-10, involving pixel permutation to corrupt data, a Convolutional Autoencoder (Conv-AE) synthesizer for information restoration, and a downstream CNN learner. Framed through V-Information, which quantifies information accessible to such a learner, empirical results demonstrate that while permutation drastically reduces usable V-Information, the synthesizer partially restores it, leading to significant performance recovery. We further explore how model capacities interact with this process, finding learner capacity beneficial only when usable information is present. This highlights computation’s role in making latent information accessible, a principle highly relevant to current synthetic data practices in capabilities and alignment of foundation models.
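
The pixel-permutation corruption mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the permutation is drawn, so this sketch assumes a single fixed permutation of spatial positions shared by all images and channels; the function names, the seed, and the random stand-in images are hypothetical and not taken from the paper. It shows that the corruption is exactly invertible, so no information is destroyed in the Shannon sense, even though the spatial locality a CNN relies on is gone.

```python
import numpy as np

def make_pixel_permutation(height, width, seed=0):
    """Draw one fixed permutation of the height*width pixel positions (assumed setup)."""
    rng = np.random.default_rng(seed)
    return rng.permutation(height * width)

def permute_pixels(images, perm):
    """Apply the same spatial permutation to every image in an (N, H, W, C) batch."""
    n, h, w, c = images.shape
    flat = images.reshape(n, h * w, c)           # flatten the spatial grid
    return flat[:, perm, :].reshape(n, h, w, c)  # reorder positions, keep channels together

def invert_permutation(perm):
    """Return inv such that applying perm and then inv restores the original order."""
    inv = np.empty_like(perm)
    inv[perm] = np.arange(perm.size)
    return inv

# Random stand-in for a CIFAR-10 batch: 4 images of 32x32 pixels with 3 channels.
rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

perm = make_pixel_permutation(32, 32, seed=0)
scrambled = permute_pixels(images, perm)
restored = permute_pixels(scrambled, invert_permutation(perm))
```

Because `restored` matches `images` exactly, any drop in a downstream CNN's accuracy on `scrambled` reflects information that has become unusable for that learner, not information that has been lost.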
