Incorporating forest canopy openness and environmental covariates in predicting soil organic carbon in oak forest

The historical conversion of forests to rainfed agricultural land in semi-arid forest ecosystems is one of the primary sources of human-induced greenhouse gas emissions and a major cause of soil organic carbon (SOC) loss. This study aims to predict SOC content, a crucial factor in soil formation and fertility, in the topsoil of traditionally managed semi-arid oak forests. The research investigates the complex relationship between SOC and multiple soil and environmental factors using machine learning (ML) techniques. A total of 175 soil samples were taken from the topsoil (0-30 cm). In total, 59 soil-environmental covariates were acquired from various sources, including soil property maps, derivatives of a digital elevation model, remote sensing data, climatic model data, and geological data. Random Forest (RF) and k-nearest neighbors (k-NN) models were trained to predict SOC, and the two models were also combined into an ensemble model (k-NN-RF). In addition, a bootstrapping method was used to quantify the uncertainty of the SOC predictions. To understand the soil-environmental relationships, a post-hoc model assessment was conducted using variable importance analysis, which showed that the main predictors were canopy cover percentage, terrain surface texture, calcium carbonate equivalent, and midslope position. External validation of ML performance revealed that RF had the highest accuracy, achieving a Lin's concordance correlation coefficient (CCC) of 0.74, followed by k-NN-RF with a CCC of 0.63. The uncertainty analysis demonstrated consistently low uncertainty across the entire study area, although areas with complex canopy cover exhibited a higher potential for uncertainty than areas with more uniform canopy cover. The highest SOC contents were found in forest and pasture lands.
These findings underscore the significance of canopy cover in predicting SOC, which serves as a crucial indicator for forest ecosystem management and land use change. Therefore, this research provides valuable insights for developing management strategies that consider the interplay between canopy cover and soil organic carbon in semi-arid ecosystems.
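For readers unfamiliar with the validation metric, Lin's CCC used in the external validation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `lins_ccc` and `average_ensemble` are hypothetical, the sample values are invented, and simple averaging of the two model outputs is only an assumption, since the abstract does not state how the k-NN and RF predictions were combined.

```python
from statistics import fmean


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    CCC = 2 * cov(o, p) / (var(o) + var(p) + (mean(o) - mean(p))**2),
    using population (divide-by-n) variances and covariance.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_o = fmean(observed)
    mean_p = fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are constant and equal")
    return 2 * cov / denom


def average_ensemble(pred_a, pred_b):
    """Combine two prediction lists by unweighted averaging (an assumed scheme)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    # Invented SOC values (%), for illustration only; not data from the study.
    observed = [1.2, 2.5, 0.8, 3.1, 1.9]
    rf_pred = [1.4, 2.2, 1.0, 2.8, 2.0]
    knn_pred = [1.6, 2.0, 1.3, 2.4, 2.1]
    ensemble_pred = average_ensemble(rf_pred, knn_pred)
    print("CCC (RF):", round(lins_ccc(observed, rf_pred), 3))
    print("CCC (ensemble):", round(lins_ccc(observed, ensemble_pred), 3))
```

Unlike Pearson's correlation, the CCC penalizes systematic bias through the squared difference of the means in the denominator, so predictions that are perfectly correlated with the observations but shifted by a constant score below 1.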