Poster
Nonparametric Identification of Latent Concepts
Yujia Zheng · Shaoan Xie · Kun Zhang
East Exhibition Hall A-B #E-1809
We are born with the ability to learn concepts by comparing diverse observations. This helps us to understand the new world in a compositional manner and facilitates extrapolation, as objects naturally consist of multiple concepts. In this work, we argue that the cognitive mechanism of comparison, fundamental to human learning, is also vital for machines to recover true concepts underlying the data. This offers correctness guarantees for the field of concept learning, which, despite its impressive empirical successes, still lacks general theoretical support. Specifically, we aim to develop a theoretical framework for the identifiability of concepts with multiple classes of observations. We show that with sufficient diversity across classes, hidden concepts can be identified without assuming specific concept types, functional relations, or parametric generative models. Interestingly, even when conditions are not globally satisfied, we can still provide alternative guarantees for as many concepts as possible based on local comparisons, thereby extending the applicability of our theory to more flexible scenarios. Moreover, the hidden structure between classes and concepts can also be identified nonparametrically. We validate our theoretical results in both synthetic and real-world settings.
Imagine how we learn about the world as children. We see many different things – a fluffy cat, a furry dog, a feathered bird – and by comparing them, we start to understand underlying concepts like "animal," "furry," or "has wings." This ability to compare diverse examples helps us make sense of new things we've never encountered before.
This paper argues that a similar process of comparison is crucial for computers to truly learn the basic concepts hidden in data. While current computer learning methods are powerful, they often lack a solid guarantee that they're learning the right concepts.
Our work provides a new understanding of how computers can reliably identify these hidden concepts. We show that if a computer system is fed enough varied examples across different categories, it can pinpoint the fundamental concepts without needing to be told beforehand what types of concepts to look for or how they are connected.
Even when the data is not perfectly diverse, our approach can still identify as many concepts as possible by making local comparisons. Furthermore, this method can also uncover the natural relationships between different categories of data and the concepts they represent. We've tested these ideas and confirmed they work, both in controlled experiments and with real-world datasets.