Parameterization of Shallow Cumulus Entrainment and Detrainment Rates Using Global Satellite Observations: Physical and Machine Learning Approaches

https://doi.org/10.1029/2025GL117775
2025-09-12
Geophysical Research Letters, Volume 52, Issue 18
Lei Zhu, Chunsong Lu, Dezhi Xiao, Sinan Gao, Yichuan Wang, Yannian Zhu, Xin He, Yubin Li, Jing Yang

Abstract

Accurate parameterizations of entrainment and detrainment rates (ε and δ) of shallow cumulus are crucial for improving simulations of atmospheric energy and water cycles. Existing ε and δ parameterizations are often based on theoretical derivations, limited observations, or numerical simulations, which lack comprehensive global observational support. To address this limitation, parameterizations of shallow cumulus ε and δ are developed and evaluated using a global satellite-derived data set. For parameterizations based on the physical approach, the recommended scheme incorporates environmental relative humidity (RHe) and vertical velocity to parameterize ε, while δ is parameterized using ε and RHe. Furthermore, the machine learning (ML) approach trained on thermodynamic, dynamic, and cloud microphysical properties can accurately predict ε and δ. Comparative analysis reveals that ML performs better than the physical approach. These findings provide valuable insights for refining cumulus parameterizations and enhancing the accuracy of climate model simulations.

Plain Language Summary

Clouds play a critical role in shaping the Earth's climate, but accurately representing their behavior in climate models remains a challenge. Entrainment and detrainment are two key processes that describe how cumulus clouds interact with the atmosphere, influencing cloud formation, lifespan, and their impact on climate. However, most existing methods for determining entrainment and detrainment rely heavily on limited data or theoretical assumptions, which can introduce inaccuracies in climate simulations. To address this, our study used a global satellite data set to improve the representation of entrainment and detrainment in climate models. We explored two approaches: a physically based one and one using machine learning (ML), where algorithms learn patterns from extensive cloud and atmospheric data to make predictions. Our results show that while the physical approach offers valuable insights, the ML approach performs better at predicting entrainment and detrainment. This is because ML can uncover complex relationships in the data that traditional approaches are unable to capture. These findings help improve cloud representations in climate models, thereby enhancing the accuracy of long-term climate change predictions.

Key Points

  • Entrainment and detrainment rate parameterizations for shallow cumulus are derived using global satellite observations for the first time

  • Optimal entrainment and detrainment rate parameterizations based on the physical approach are determined

  • Machine learning demonstrates superior performance compared to the physical approach in predicting entrainment and detrainment rates

1 Introduction

Shallow cumulus clouds, which cover an average of 5%–30% of the sky (Mieslinger et al., 2019; Norris, 1998), significantly impact the atmospheric thermal structure (Li et al., 2019; Neggers et al., 2007), large-scale circulation (Nie, 2013; A. Siebesma, 1998), and cloud-climate feedback (Bony & Dufresne, 2005), and their effects are represented through parameterizations in current climate models (Arakawa, 2004; Gu et al., 2020; Yang et al., 2021). Cumulus parameterizations are essential for simulating climate phenomena, such as typhoons (Zhao et al., 2018a), the Madden-Julian Oscillation (Del Genio et al., 2012; Jiang et al., 2020), monsoons (Zou & Zhou, 2011), and the El Niño-Southern Oscillation (B. Lu and Ren, 2016). Among cumulus parameterizations, entrainment and detrainment rates (ε and δ) are two of the most essential parameters (Stanfield et al., 2019; Zhao, Wang, Liu and Wu, 2024; Zhao, Wang, Liu, Wu, et al., 2024; Zhu, Lu, Xu, et al., 2024). In commonly used mass-flux schemes, ε and δ govern the mass exchange between cumulus clouds and the environment (Arakawa & Schubert, 1974; Luo et al., 2010; Z. Wang, 2020). However, as noted by de Rooy et al. (2013), ε and δ are poorly constrained and show significant spatial and temporal variability. This uncertainty constitutes a significant limitation in cumulus parameterizations and directly affects their ability to realistically represent cumulus processes. Numerous studies have highlighted the pivotal role of ε and δ in simulating climate phenomena (Bush et al., 2015; Sanderson et al., 2008; Tokioka et al., 1988; Zhao et al., 2012), underscoring the need for improved parameterizations of ε and δ.

Based on observational data or model simulations, numerous studies have developed parameterizations for ε and δ by linking them to thermodynamic and dynamic factors (Li et al., 2025; Villalba-Pradas & Tapiador, 2022). However, a wide variety of such schemes exist, and no consensus has been reached, nor has a unified scheme been established (de Rooy et al., 2013). Parameterizations of ε are often developed based on cloud radius (Simpson, 1971; Squires & Turner, 1962; Takahashi et al., 2021; Turner, 1962), constant values (Derbyshire et al., 2004; A. Siebesma, 1998; Soares et al., 2004), or cloud height (Gregory & Rowntree, 1990; A. P. Siebesma and Cuijpers, 1995). However, these parameterizations are overly simplified and fail to represent the intricate entrainment-mixing processes occurring in the atmosphere. Consequently, researchers have utilized both observational data and model simulations to explore more advanced parameterizations based on vertical velocity, buoyancy, and their derivative forms (Gregory, 2001; Lin, 1999; Lu et al., 2016; Xu et al., 2021; Zhang et al., 2016). Environmental relative humidity (RHe) is another commonly used parameter for ε parameterization, but its relationship with ε is still under debate. Some studies reported a positive correlation between ε and RHe (Jensen & Del Genio, 2006; Lu et al., 2018), while others found an inverse relationship (Bechtold et al., 2008; Zhao et al., 2018b), making it challenging to parameterize ε accurately. The differing ε-RHe relationships arise from two competing mechanisms related to cloud size and natural selection (Derbyshire et al., 2011). The first operates in convection-favorable environments (with higher RHe), where larger clouds tend to develop more adiabatic cores and thus exhibit lower ε, that is, a negative relationship between ε and RHe. The second becomes significant when there is sufficient variability: a positive ε-RHe correlation can emerge because, in drier environments, only clouds with lower ε can survive. Furthermore, large-eddy simulations by Romps (2010) suggest that ε exhibits weak or no correlation with cloud height, buoyancy, or buoyancy gradient. These results underscore the persistent uncertainties in ε parameterizations.

Because of the difficulties in estimating δ, parameterizations for δ are relatively scarce. Early cumulus parameterizations assumed that detrainment occurs only near the cloud top (Arakawa & Schubert, 1974) or at levels of neutral buoyancy (Emanuel, 1991; Moorthi & Suarez, 1992). In many studies, δ is either prescribed as a constant (Gregory, 2001; Soares et al., 2004) or set equal to ε (Han & Pan, 2011; Tiedtke, 1989). Given the strong coupling between detrainment and entrainment processes, several studies have incorporated ε into the parameterization of δ (A. Siebesma, 1998). In the classical Kain-Fritsch scheme (Kain, 2004; Kain & Fritsch, 1990), δ is parameterized as a function of the critical mixing fraction between environmental air and cloud air (Böing et al., 2012; Bretherton et al., 2004; Dawe & Austin, 2013). In addition, RHe is also commonly employed in δ parameterization (Bush et al., 2015; Stirling & Stratton, 2012).

Existing parameterizations for ε and δ are primarily derived from theoretical derivations, limited observational data sets, or numerical simulations, and thus lack support from extensive observational data sets. Moreover, most parameterizations rely on one or two influencing factors and are developed using conventional empirical curve-fitting approaches (Villalba-Pradas & Tapiador, 2022), hereafter referred to as “physical approaches”, which fail to account for the full complexity of processes affecting ε and δ. Machine learning (ML) approaches, capable of identifying nonlinear relationships between multiple variables via data-driven training, have been widely applied in geosciences (Christopoulos et al., 2024; Colfescu et al., 2024; Eusebi et al., 2025; Gao et al., 2024; Su et al., 2020; Wickramasinghe et al., 2024; Zhang et al., 2024; Zhao et al., 2023). For instance, Shin and Baik (2022) used ML to predict ε and δ using data obtained from large-eddy simulations. However, their results lack validation against global observational data sets. To address this gap, this study utilizes the global ε and δ data set derived from satellite observations by Zhu et al. (2025) and develops parameterizations for shallow cumulus ε and δ through both physical and ML approaches. The performance of the two approaches is evaluated and compared, providing insights into the improvement of cumulus parameterizations.

2 Data and Methods

2.1 Data Set

The ε and δ data set used in this study is derived from global observations from June–August 2017 by the Visible Infrared Imaging Radiometer Suite (VIIRS) sensor onboard the Suomi National Polar-orbiting Partnership (SNPP) satellite. Additionally, the 6-hourly NCEP FNL reanalysis data (National Centers for Environmental Prediction, 2000) and National Oceanic and Atmospheric Administration (NOAA) sea surface temperature (SST) data (Huang et al., 2021), both on a 1° × 1° grid, were used in the retrieval process. The cloud product retrievals and the derivation of the ε and δ data sets are detailed in Zhu et al. (2025). The uncertainty analysis indicates that ε and δ exhibit relatively small uncertainty under the current retrieval framework (see Text S1 in Supporting Information S1 for details). Cloud droplet number concentration (Nd) was retrieved using cloud optical thickness (COT) and cloud droplet effective radius (re) (Szczodrak et al., 2001; Wang et al., 2021, 2023). The RHe was obtained from the reanalysis data, while cloud base vertical velocity (w) was retrieved using the algorithm proposed by Zheng and Rosenfeld (2015). A total of 83,240 cloud samples were collected on a global scale, exhibiting broad spatial coverage across both land and ocean areas with no significant seasonal or regional biases (Figures S1 and S2 in Supporting Information S1). To ensure the reliability and generalizability of the parameterizations, 80% of the data was randomly selected for parameterization development, while the remaining 20% was reserved for independent evaluation. In the ML approach, the 80% portion used for model construction was further split into training and validation sets in an 8:2 ratio. The sampling process was repeated ten times with different random seeds to ensure robustness and quantify uncertainty.
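As a minimal sketch, the splitting protocol described above (80% development / 20% evaluation, with the development portion further split 8:2 into training and validation, repeated over ten random seeds) could be implemented as follows; the sample count is from the text, while function and variable names are illustrative:

```python
import numpy as np

def split_indices(n_samples, seed):
    """Randomly split sample indices: 80% development / 20% evaluation,
    then split the development part 8:2 into training and validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_dev = int(0.8 * n_samples)
    dev, test = idx[:n_dev], idx[n_dev:]
    n_train = int(0.8 * len(dev))
    train, val = dev[:n_train], dev[n_train:]
    return train, val, test

# Repeat with ten different random seeds, as in the study,
# to assess robustness and quantify uncertainty.
splits = [split_indices(83240, seed) for seed in range(10)]
```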

2.2 Machine Learning Approach

Machine learning has found widespread applications in data analysis across multiple disciplines due to its ability to automatically learn data patterns and optimize models. Among various ML algorithms, gradient boosting, an ensemble learning method, combines multiple weak learners into a robust predictive model. Studies have demonstrated that it outperforms random forests, artificial neural networks, and convolutional neural networks (Abdollahi-Arpanahi et al., 2020; Ma et al., 2018; Yan et al., 2021). Light Gradient Boosting Machine (LightGBM), developed by Microsoft, is a gradient-boosting algorithm optimized for efficiently handling large-scale structured data sets (Ke et al., 2017). It offers advantages such as high training speed, low memory usage, and the ability to perform parallel computations. Gao et al. (2024) compared four ML algorithms for predicting cloud droplet number concentration and cloud droplet relative dispersion and found that LightGBM achieves the best performance in both computational efficiency and prediction accuracy, emphasizing its potential for advancing research on entrainment-mixing processes.

To achieve the study's objective of predicting ε and δ during entrainment-mixing processes with ML, LightGBM was selected as the preferred algorithm. Model input features were chosen based on their relevance to ε and δ and their availability in numerical models. For predicting ε, the selected features included environmental variables (RHe, specific humidity (qve), environmental temperature (Te), and environmental pressure (Pe)), in-cloud variables (w, liquid water content (LWC), Nd, re, liquid water path, and COT), and meteorological conditions (cloud fraction, lower tropospheric stability (LTS), SST, surface relative humidity (RHsurf), and terrain height (Hterrain)). For predicting δ, the observed ε was included as an additional input feature. Note that all variables used are temporally matched to the satellite observation time and spatially matched to a 1° × 1° grid, as detailed in Zhu et al. (2025).

One of the most widely used methods for interpreting ML is the SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017), which has been widely applied for ML interpretation in geoscientific applications (García & Aznarte, 2020). SHAP values provide a fair allocation of each feature's contribution to the prediction, facilitating a deeper understanding of how ML makes predictions. For any given feature, a positive SHAP value indicates a positive contribution relative to the baseline prediction, while a negative value indicates a negative contribution. The mean absolute SHAP value is a standard metric for ranking feature importance. Given that entrainment-mixing is a small-scale process characterized by complex nonlinear interactions among variables, SHAP offers a robust framework for elucidating the complex physical mechanisms underlying ML prediction of ε and δ (Gao et al., 2024).

3 Results

3.1 Parameterization of Entrainment and Detrainment Rates Based on the Physical Approach

Previous studies have proposed parameterizations through curve fitting and other physical approaches (Li et al., 2025; Xu et al., 2021; Zhang et al., 2016). In this section, parameterizations for ε and δ are developed using physical approaches. Following similar methods in earlier research (Lu et al., 2016; Xu et al., 2021), ε is parameterized as a function of RHe and w. Figure 1 presents the fitted functions derived from a randomly selected 80% of the data, as well as the correlation coefficient (R) and root mean squared error (RMSE) between the fitted and calculated values. Additionally, it shows the R and RMSE between the fitted and calculated ε for the remaining 20% of the data, with calculated ε serving as the ground truth for evaluation. The parameterization based on RHe (Figure 1a) outperforms that based on w (Figure 1b), achieving higher R and lower RMSE across both the fitting and test data sets. Due to the complexity of cumulus processes (Li et al., 2019; Wu et al., 2023; Yeom et al., 2025), ε is influenced by multiple factors (Lu et al., 2016; Xu et al., 2021). Therefore, a multi-variable parameterization using both RHe and w, which are independent of each other, is presented in Figure 1c. The results indicate that the multi-variable parameterization (Figure 1c) yields higher R and lower RMSE than the single-variable cases (Figures 1a and 1b) in both fitting and test data sets. Overall, for the parameterization of ε, the multi-variable parameterization based on RHe and w performs better than single-variable parameterizations.
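As an illustration of this curve-fitting step, power-law forms such as ε = a·RHe^b·w^c can be fitted by ordinary least squares after a log transform; the sketch below uses synthetic data with known parameters (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
RHe = rng.uniform(0.4, 1.0, n)
w = rng.uniform(0.2, 3.0, n)
# Synthetic eps from a known power law with multiplicative noise.
a_true, b_true, c_true = 1.5, 2.0, -0.5
eps = a_true * RHe**b_true * w**c_true * np.exp(rng.normal(0, 0.1, n))

# Taking logs turns eps = a * RHe**b * w**c into a linear regression:
#   log eps = log a + b log RHe + c log w
A = np.column_stack([np.ones(n), np.log(RHe), np.log(w)])
coef, *_ = np.linalg.lstsq(A, np.log(eps), rcond=None)
a_fit, b_fit, c_fit = np.exp(coef[0]), coef[1], coef[2]
```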

Figure 1. Relationship between the fitted and calculated values of entrainment (ε) or detrainment rate (δ) based on satellite observations. The parameterizations of ε use environmental relative humidity (RHe) and vertical velocity (w), and the parameterizations of δ use ε and RHe. The total number of samples is given in the upper-right corner of each subplot, with 80% of the data used for parameterization development and 20% used for testing the scheme. R and RMSE represent the correlation coefficient and root mean square error, respectively.

For the parameterization of δ, similar to Gregory and Rowntree (1990), a linear fit based on ε demonstrates strong performance, with high R and low RMSE in both the fitting and test data sets (Figure 1d). Moreover, except for a few outliers, the fitted and calculated values of δ align closely with the 1:1 line. Compared to the linear fitting, a power-law fitting based on ε achieves comparable R yet lower RMSE, and the agreement between fitted and calculated values is closer to the 1:1 line (Figures 1d and 1e). These simple parameterizations of δ thus perform adequately. However, given the significant influence of RHe on δ and its frequent inclusion in δ parameterizations, this study also explores two alternative formulations (δ = ε(a − RHe) and δ = aε(1 − RHe), where a is a fitting parameter), inspired by Bechtold et al. (2014) and Stratton and Stirling (2012). Figures 1f and 1g illustrate the performance of these two parameterizations. Although the two forms are similar, the parameterization based on δ = ε(a − RHe) outperforms the alternative, showing higher R and reduced RMSE in both the fitting and test data sets. Considering that the single-variable power-law parameterization exhibits enhanced performance, this study further optimizes the two-variable parameterizations by introducing a power-law exponent, resulting in two new parameterizations: δ = ε^b(a − RHe) and δ = aε^b(1 − RHe). Compared to the original schemes from Bechtold et al. (2014) and Stratton and Stirling (2012) (Figures 1f and 1g), the new parameterizations show comparable or improved R and reduced RMSE in both the fitting and test data sets. Moreover, the comparison between fitted and calculated values under the new parameterizations aligns more closely with the 1:1 line (Figures 1h and 1i). Overall, the δ parameterization based on δ = ε^b(a − RHe) demonstrates the strongest performance.
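For illustration, a two-parameter form such as δ = ε^b(a − RHe) can be fitted with nonlinear least squares, e.g. via `scipy.optimize.curve_fit`; the data below are synthetic and the parameter values illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def delta_form(X, a, b):
    """delta = eps**b * (a - RHe), a two-variable power-law form."""
    eps, RHe = X
    return eps**b * (a - RHe)

rng = np.random.default_rng(3)
n = 1000
eps = rng.uniform(0.1, 3.0, n)
RHe = rng.uniform(0.4, 0.9, n)
# Synthetic delta from known parameters plus small noise.
a_true, b_true = 1.2, 0.8
delta = delta_form((eps, RHe), a_true, b_true) + rng.normal(0, 0.02, n)

# Recover the fitted parameters from the noisy synthetic data.
(a_fit, b_fit), _ = curve_fit(delta_form, (eps, RHe), delta, p0=(1.0, 1.0))
```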

3.2 Machine Learning-Based Prediction of Entrainment and Detrainment Rates

Due to the inherent complexity of the cumulus entrainment-mixing process, ε and δ are influenced by multiple factors, including environmental meteorological conditions, cloud microphysical properties, and macrophysical characteristics. As a result, in addition to developing traditional parameterizations, ML offers an alternative method for predicting ε and δ. Previous studies have shown that LightGBM outperforms other algorithms in investigating entrainment processes (Gao et al., 2024). Accordingly, this section utilizes the LightGBM to predict ε and δ.

With LightGBM trained on 80% of the data, Figures 2a and 2b present the validation results obtained using the remaining 20% of the data. The ε values predicted by the ML approach align well with the calculated values, with a high R and a low RMSE. Similarly, the predicted δ values exhibit strong correlations with the calculated values and align closely along the 1:1 line, also showing high R and low RMSE. Overall, the ML approach demonstrates its effectiveness in accurately predicting both ε and δ.

Figure 2. Comparison between machine learning (ML) predictions and satellite-derived values of entrainment and detrainment rates (ε and δ), and interpretation of feature contributions. Panels (a) and (b) show the relationships between predicted and calculated ε and δ, respectively, with the total number of samples indicated in the upper-right corner. In each case, 80% of the data were used for training the ML model and 20% were used for testing it. R and RMSE denote the correlation coefficient and root mean square error, respectively. Panels (c) and (d) present scatter plots of each feature based on the SHapley Additive exPlanations values for predicting (c) ε and (d) δ. Panels (e) and (f) show the corresponding rankings of feature importance. qve: specific humidity; RHe: relative humidity; Te: environmental temperature; w: cloud base vertical velocity; Pe: environmental pressure; LWC: liquid water content; Hterrain: terrain height; LTS: lower tropospheric stability; SST: sea surface temperature; RHsurf: surface relative humidity; CF: cloud fraction; COT: cloud optical thickness; re: effective radius; Nd: cloud droplet concentration; LWP: liquid water path.

Although the ML approach shows strong performance in predicting ε and δ, interpreting how the ML generates these predictions is equally important. This not only enhances the physical interpretability of the ML approach but also offers useful insights for future parameterizations. Through SHAP analysis, the contributions of each input feature to the predictions can be explored. Figure 2 also presents the scatter plots of SHAP values for the features contributing to the predictions of ε and δ, as well as their rankings. From the SHAP scatter plots for predicting ε (Figure 2c), higher qve and RHe contribute positively to the prediction, while higher Te, w, Pe, and LWC have negative contributions. The feature ranking (Figure 2e) indicates that the top six features influencing ε are qve, RHe, Te, w, Pe, and LWC, followed by Hterrain, LTS, and SST. The remaining features have negligible impact. Among these, qve and RHe are the most influential factors, with SHAP value contributions accounting for 26.6% and 20.6%, respectively. These results are consistent with physical expectations: higher RHe reduces buoyancy and vertical velocity within the cloud core, allowing more time for cloud-environment mixing and thus increasing ε (Lu et al., 2016, 2018; Zhu et al., 2021; Zhu, Lu, Chen, et al., 2024). The global satellite observations used for training include cloud samples with sufficient variability (relevant to the natural selection mechanism) and are not confined to convection-favorable environments (associated with the cloud-size mechanism). Therefore, the ε-RHe relationship learned by the ML approach is more likely to reflect the natural selection mechanism rather than the cloud-size mechanism, as discussed in Section 1 (Derbyshire et al., 2011). The ML approach effectively identifies the key factors influencing ε. Beyond RHe and w, the interpretation of the ML also highlights the significant influence of Te and Pe.

For the prediction of δ, Figures 2d and 2f also present the SHAP-based interpretation. The results reveal that an increase in ε contributes positively to the prediction of δ, and ε is the most significant one among the input features, with a SHAP value contribution as high as 55.9%. Other features that play relatively important roles in predicting δ include Pe, w, LWC, RHe, Te, SST, qve, and RHsurf. Among these, Pe, w, LWC, and Te contribute positively to δ prediction, while RHe, SST, qve and RHsurf have negative contributions. The remaining features exhibit negligible correlation with δ and thus can be neglected. The dominant contribution of ε to δ prediction is consistent with physical expectations, as entrainment and detrainment processes are closely related. Beyond ε, the contributions of environmental variables and w to the prediction of δ are also notable.

Note that the ML approach identifies several physically meaningful variables not directly involved in the retrieval of ε and δ as important predictors (such as w and RHe), and SHAP analysis confirms that their relationships with ε and δ align with those in physical parameterizations, indicating that the ML approach likely captures underlying physical processes rather than retrieval-related patterns. Moreover, results show that the ML approach consistently yields accurate predictions of both ε and δ across diverse spatial and environmental conditions (Figures S3, S4, and S5 in Supporting Information S1), supporting the generalizability of the developed ML-based parameterizations. Nevertheless, we still acknowledge the possibility that the ML approach may learn the systematic and structural biases embedded in the retrieval algorithm.

3.3 Discussion: Comparison Between Physical and Machine Learning Approaches

The results from the previous two sections indicate that both physical and ML approaches achieve strong performance in predicting ε and δ. Table 1 provides a comprehensive comparison of the ε and δ parameterizations developed using the two approaches, based on statistical evaluation metrics (see Text S2 for details) from ten independent runs with different random seeds. Compared to the physical approach, the LightGBM trained on all features (LGB1) yields predictions with lower mean absolute error (MAE), mean squared error (MSE), and RMSE, indicating that ML outperforms the physical approach in accurately predicting ε. Notably, the physical approach relies on only one or two parameters for parameterization (e.g., the ε parameterization based on RHe is referred to as P1_a; based on w as P1_b; and based on both RHe and w as P1_c). In contrast, LGB1 uses 14 features for training and predicting ε. To ensure a fair comparison, this study further trained ML using the same input parameters as those employed in the physical approach. Specifically, the LightGBM trained on RHe is labeled as LGB1_a, the LightGBM trained on w as LGB1_b, and the LightGBM trained on both RHe and w as LGB1_c. As shown in Table 1, LGB1_a, LGB1_b, and LGB1_c yield lower MAE, MSE, and RMSE compared to P1_a, P1_b, and P1_c, respectively. This demonstrates that, even when using the same parameters for parameterization, ML consistently outperforms the physical approach in accurately predicting ε. These results demonstrate the superiority of ML over the physical approach in developing ε parameterizations.
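The error metrics used throughout this comparison can be computed directly; a minimal sketch (the arrays below are placeholders for predicted and satellite-derived values):

```python
import numpy as np

def evaluate(pred, truth):
    """Return MAE, MSE, and RMSE between predicted and calculated values."""
    err = np.asarray(pred) - np.asarray(truth)
    mae = np.mean(np.abs(err))  # mean absolute error
    mse = np.mean(err**2)       # mean squared error
    rmse = np.sqrt(mse)         # root mean squared error
    return mae, mse, rmse

# Placeholder arrays standing in for predicted and calculated values.
mae, mse, rmse = evaluate([1.0, 2.0, 3.5], [1.5, 2.0, 3.0])
```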

Table 1. Comparison of the Evaluation Metrics for Entrainment (ε) or Detrainment Rate (δ) Parameterizations Established Using Physical and Machine Learning Approaches
Predicted variable | Parameterization | Specific form or features | MAE | MSE | RMSE
ε | P1_a | ε = aRHe^b | 0.589 | 1.126 | 1.061
ε | P1_b | ε = aw^b | 0.753 | 1.669 | 1.292
ε | P1_c | ε = aRHe^b·w^c | 0.578 | 1.133 | 1.063
ε | P1_GAM | RHe, qve, Te, Pe, LWC, Nd, re, LWP, COT, CF, LTS, SST, RHsurf, Hterrain | 0.491 | 0.835 | 0.914
ε | LGB1_a | RHe | 0.521 | 1.068 | 1.033
ε | LGB1_b | w | 0.737 | 1.638 | 1.280
ε | LGB1_c | RHe, w | 0.526 | 1.043 | 1.021
ε | LGB1 | RHe, qve, Te, Pe, LWC, Nd, re, LWP, COT, CF, LTS, SST, RHsurf, Hterrain | 0.329 | 0.510 | 0.714
δ | P2_a | δ = aε | 0.314 | 0.221 | 0.469
δ | P2_b | δ = aε^b | 0.199 | 0.158 | 0.396
δ | P2_c | δ = ε(a − RHe) | 0.395 | 0.284 | 0.532
δ | P2_d | δ = aε(1 − RHe) | 0.763 | 1.289 | 1.135
δ | P2_e | δ = ε^b(a − RHe) | 0.226 | 0.173 | 0.415
δ | P2_f | δ = aε^b(1 − RHe) | 0.376 | 0.514 | 0.716
δ | P2_GAM | ε, RHe, qve, Te, Pe, LWC, Nd, re, LWP, COT, CF, LTS, SST, RHsurf, Hterrain | 0.134 | 0.129 | 0.358
δ | LGB2_a | ε | 0.153 | 0.145 | 0.379
δ | LGB2_b | ε, RHe | 0.140 | 0.141 | 0.374
δ | LGB2 | ε, RHe, qve, Te, Pe, LWC, Nd, re, LWP, COT, CF, LTS, SST, RHsurf, Hterrain | 0.072 | 0.086 | 0.292
  • Note. For entrainment rate (ε), the physical parameterization based on environmental relative humidity (RHe) is denoted as P1_a, that based on vertical velocity (w) is P1_b, and that based on both RHe and w is P1_c, where a, b, and c are the fitted parameters. The generalized additive model trained on all features for ε prediction is denoted as P1_GAM. The LightGBM models trained on RHe, on w, on both RHe and w, and on all features are denoted as LGB1_a, LGB1_b, LGB1_c, and LGB1, respectively. For detrainment rate (δ), the physical parameterizations based on ε are denoted as P2_a and P2_b, while those based on both ε and RHe are denoted as P2_c, P2_d, P2_e, and P2_f. The generalized additive model trained on all features for δ prediction is denoted as P2_GAM. The LightGBM models trained on ε, on both ε and RHe, and on all features are denoted as LGB2_a, LGB2_b, and LGB2, respectively.

Table 1 also summarizes the performance evaluation of the δ parameterizations. Compared to the physical approach, the LightGBM trained on all features (LGB2) significantly reduces the MAE, MSE, and RMSE between the predicted and calculated δ, demonstrating its superior predictive accuracy. Similarly, ML for δ prediction was developed using the same parameters as those used in the physical parameterizations. Specifically, the LightGBM trained on ε is labeled as LGB2_a, and the LightGBM trained on ε and RHe is labeled as LGB2_b. The results show that LGB2_a and LGB2_b yield lower MAE, MSE, and RMSE compared to their respective physical parameterizations. This indicates that ML exhibits superior accuracy in predicting δ compared to the physical approach.

Compared to the physical approach, the ML approach inherently possesses more flexible functional forms, making the comparison structurally unbalanced. To ensure a more balanced comparison, a generalized additive model (GAM) with comparable complexity was applied to predict ε and δ using all 14 input variables. As shown in Table 1, the ML approach yields lower MAE, MSE, and RMSE than the GAM, confirming its superior predictive performance. Based on the evaluation of the prediction performance for ε and δ (Table 1), the parameterizations developed using ML exhibit significantly higher accuracy than those developed using physical approaches. This advantage likely arises from the fact that physical approaches are limited in their ability to account for the combined effects of multiple influencing factors, whereas ML can capture complex nonlinear relationships between variables through data-driven training. Furthermore, Figure S6 in Supporting Information S1 shows that the training and validation loss curves of the abovementioned ML models exhibit convergence without signs of overfitting (Gao et al., 2024; Seifert & Rasp, 2020). These findings offer valuable guidance for the future development of ε and δ parameterizations and support their applicability in broader climate and weather modeling contexts.

4 Concluding Remarks

The current parameterization schemes for entrainment and detrainment rates (ε and δ) are typically based on theoretical derivations, observations from individual sites, or numerical simulations, lacking support from global observational data sets. To fill this gap, this study utilizes a data set of ε and δ derived from global satellite observations for the development and evaluation of parameterizations using both physical and ML approaches, followed by a comparison of the two approaches. The main conclusions are as follows:

First, parameterizations of ε and δ are developed using physical approaches. It is recommended that ε be parameterized with a scheme incorporating RHe and vertical velocity, and that δ be parameterized with a scheme incorporating ε and RHe.

Second, ML-based predictions for ε and δ are developed and interpreted. The ML approach demonstrates high accuracy in predicting ε and δ. SHAP analysis reveals that, for predicting ε, environmental variables and vertical velocity make significant contributions. For predicting δ, ε contributes the most, while environmental variables and vertical velocity make smaller but still meaningful contributions.

Finally, comparative analysis highlights the significant advantages of ML in predicting ε and δ. Compared to physical approaches, ML achieves lower prediction errors for ε and δ. This advantage primarily stems from the ability of ML to effectively capture complex nonlinear relationships. In future efforts to improve parameterizations for numerical models, ML can be applied to predict variables that cannot be directly obtained from models or observations. Although challenges such as model interpretability and limited sample sizes may arise in practical implementation, the results of this study provide confidence for further application of ML in meteorological research.

In this study, parameterizations for ε and δ are developed based on the only currently available global-scale data set derived from the SNPP satellite. We acknowledge that the parameterizations developed in this study have limited temporal representativeness, as they are based on a data set confined to 3 months in 2017. This temporal constraint may limit their ability to capture the full range of seasonal variability and extreme meteorological regimes. In the future, the spatial and seasonal variability of ε and δ can be further investigated by incorporating more data sets covering the entire year. Meanwhile, the universality and robustness of the proposed parameterizations can be further evaluated against ε and δ data sets derived from other satellite platforms, such as the Terra and Aqua satellites, the Himawari-8/9 satellite, and the Fengyun series satellites.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (42325503, 42475089). The appointment of Chunsong Lu at Nanjing University of Information Science & Technology is partially supported by the Jiangsu Specially-Appointed Professor (R2024T01). Sinan Gao is supported by the National Science Foundation of China (42305091), Guangdong Basic and Applied Basic Research Foundation (2024A1515510021), and Open Grants of the China Meteorological Administration Aerosol-Cloud and Precipitation Key Laboratory (KDW2414). We acknowledge the High Performance Computing Center of Nanjing University of Information Science & Technology for the support of this work.

Conflict of Interest

The authors declare no conflicts of interest relevant to this study.

Data Availability Statement

The data set used in this study is available from the Level-1 and Atmosphere Archive and Distribution System (LAADS) Distributed Active Archive Center, https://dx.doi.org/10.5067/VIIRS/CLDPROP_L2_VIIRS_SNPP.011 (Platnick et al., 2017), the National Center for Atmospheric Research (NCAR), https://doi.org/10.5065/D6M043C6 (National Centers for Environmental Prediction, 2000), and NOAA, https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html (Huang et al., 2021). Figures are made with MATLAB version R2024a, available at https://www.mathworks.com/products/matlab.html (MathWorks, 2024).