Poster in Workshop: 1st Workshop on Foundation Models for Structured Data (FMSD)
DriMM: Drilling Multimodal Model for Time-Series and Text in the Era of Large Models
Sebastiaan Buiting · Soumyadipta Sengupta · Abdallah Benzine · Amine EL KHAIR · Imane Khaouja · Youssef Tamaazousti
Multimodal contrastive learning can align time-series sensor data with textual descriptions, but its use in industrial settings remains rare. This paper introduces DriMM, a Drilling Multimodal Model that learns joint representations from time-series sensor data and textual activity labels from Daily Drilling Reports. DriMM uses large time-series models and pretrained language models to build a shared embedding space across modalities. Our experiments show that DriMM enables cross-modal retrieval and zero-shot classification of drilling activities. As a byproduct, the learned mono-modal representations also improve linear-probing classification accuracy compared to generic pretrained baselines. These results demonstrate the potential of large models for multimodal learning in domain-specific industrial tasks.
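The abstract does not give implementation details, so the following is only a rough illustration of the kind of alignment it describes: a CLIP-style symmetric InfoNCE objective that pulls matched (sensor-window, report-text) embedding pairs together in a shared space. All names and dimensions here (ContrastiveAligner, ts_dim, proj_dim, the frozen 768-d encoders) are assumptions for the sketch, not details taken from DriMM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveAligner(nn.Module):
    """Projects time-series and text embeddings into a shared space and
    trains them with a CLIP-style symmetric InfoNCE objective (illustrative sketch)."""

    def __init__(self, ts_dim: int, text_dim: int, proj_dim: int = 256):
        super().__init__()
        self.ts_proj = nn.Linear(ts_dim, proj_dim)      # time-series projection head
        self.text_proj = nn.Linear(text_dim, proj_dim)  # text projection head
        self.log_temp = nn.Parameter(torch.tensor(0.07).log())  # learnable temperature

    def forward(self, ts_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # L2-normalize both modalities so the dot product is a cosine similarity.
        z_ts = F.normalize(self.ts_proj(ts_emb), dim=-1)
        z_txt = F.normalize(self.text_proj(text_emb), dim=-1)

        # Pairwise similarity matrix; the diagonal holds the matched pairs.
        logits = (z_ts @ z_txt.t()) / self.log_temp.exp()
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric cross-entropy: time-series -> text and text -> time-series.
        loss_ts = F.cross_entropy(logits, targets)
        loss_txt = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_ts + loss_txt)


# Example usage with hypothetical frozen pretrained encoders producing 768-d
# embeddings for a batch of 32 (sensor-window, report-text) pairs.
aligner = ContrastiveAligner(ts_dim=768, text_dim=768)
ts_emb = torch.randn(32, 768)    # stand-in for a time-series foundation model output
text_emb = torch.randn(32, 768)  # stand-in for a pretrained language model output
loss = aligner(ts_emb, text_emb)
loss.backward()
```

Once trained, the same shared space supports the cross-modal retrieval and zero-shot classification the abstract mentions: embed candidate activity descriptions with the text branch and rank them by cosine similarity against a sensor-window embedding.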