Poster in Affinity Workshop: LatinX in AI
Evaluation of Machine Learning Regression Techniques for Analyzing Contaminated Soils
Rosa Encinas · Felipe Almeida
Abstract:
Environmental monitoring of contaminated urban zones is fundamental to risk management and decision-making. Although the use of Machine Learning (ML) in these contexts is growing, its application to underground gas and vapor monitoring data remains limited. This study evaluates supervised ML regression models for analyzing multivariate time series of gas emissions (including CH$_4$, CO$_2$, O$_2$, H$_2$S, and CO) from contaminated soil and subsoil environments. The dataset comprises observations from 131 gas monitoring wells across 14 buildings on the USP Leste campus in São Paulo, Brazil, collected between 2014 and 2022. Five regression models were tested: Linear Regression, k-Nearest Neighbors (kNN), Decision Tree (DT), Random Forest (RF), and XGBoost. Model performance was assessed using the coefficient of determination ($R^2$), mean absolute error (MAE), and root mean squared error (RMSE). In the per-well modeling scenario (Experiment 1), Random Forest consistently achieved the best performance, with lower error rates and effective pattern recognition across individual monitoring wells. In the all-well configuration (Experiment 2), the best-performing model depended on the feature composition, with DT, RF, and XGBoost each excelling in different settings.
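The three evaluation metrics named in the abstract have standard definitions: $R^2 = 1 - SS_{res}/SS_{tot}$, MAE is the mean of absolute residuals, and RMSE is the square root of the mean squared residual. The sketch below computes them in plain Python and contrasts scoring each well separately with pooling all wells, loosely mirroring the per-well and all-well setups. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the dictionary layout keyed by well ID, and the toy values for the hypothetical wells "W1" and "W2" are all illustrative, and no model training is shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2, MAE and RMSE for one set of observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        # R^2 = 1 - SS_res / SS_tot; undefined (NaN here) when the target is constant
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical layout: well ID -> (observed values, predicted values)
Groups = Dict[str, Tuple[List[float], List[float]]]


def per_well_scores(groups: Groups) -> Dict[str, Dict[str, float]]:
    """Score each well on its own observations (per-well style evaluation)."""
    return {well: regression_metrics(yt, yp) for well, (yt, yp) in groups.items()}


def pooled_scores(groups: Groups) -> Dict[str, float]:
    """Score all wells at once by concatenating their observations (all-well style)."""
    y_true: List[float] = []
    y_pred: List[float] = []
    for yt, yp in groups.values():
        y_true.extend(yt)
        y_pred.extend(yp)
    return regression_metrics(y_true, y_pred)


if __name__ == "__main__":
    # Toy values for two made-up wells; these are NOT data from the study.
    toy = {"W1": ([1.0, 2.0], [1.5, 2.0]), "W2": ([3.0, 4.0], [2.5, 4.0])}
    print(per_well_scores(toy))
    print(pooled_scores(toy))
```

With these toy numbers each well scores $R^2 = 0.5$ while the pooled set scores $R^2 = 0.9$, even though MAE (0.25) and RMSE (about 0.354) are identical in all three cases. Because $R^2$ is measured against the variance of whichever set is being evaluated, per-well and pooled $R^2$ values are not directly comparable, whereas MAE and RMSE stay in the units of the measured gas concentration.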