Innovative approach for estimating evapotranspiration and gross primary productivity by integrating land data assimilation, machine learning, and multi-source observations

Integrating data assimilation (DA) with machine learning (ML) helps to incorporate multi-source observations into physical models, enabling more accurate estimation of evapotranspiration (ET) and gross primary productivity (GPP). In this study, an ML-based soil moisture (SM) bias-correction model and solar-induced chlorophyll fluorescence (SIF) observation operator are incorporated into a land data assimilation system (LDAS). Remotely sensed leaf area index (LAI), SM, land surface temperature (LST), and SIF data are then assimilated to improve the performance of the Noah-MP model. The resulting LDAS-ML framework is developed and evaluated in the Heihe River Basin of China. The results suggest that the LDAS-ML system can exploit information from remotely sensed LAI, LST, and SIF data, along with multi-source SM observations, to improve the accuracy of ET and GPP estimates. The root mean square errors (RMSEs) of daily ET (GPP) estimates from LDAS-ML at the Arou, Daman, and Sidaoqiao sites are 27.27 % (59.35 %), 51.71 % (56.28 %), and 61.07 % (53.73 %) lower than those of Noah-MP, respectively. Comparisons of the daily ET and GPP estimates from LDAS-ML with three ET products (GLEAM, ET-Monitor, and HiTLL) and three GPP products (GLASS, GOSIFGPP, and VPM) indicate that LDAS-ML outperforms these remote sensing products, yielding estimates with higher accuracy and lower relative uncertainty. Additionally, in arid and sparsely vegetated areas, integrating multi-source SM observations improves the land surface model more than integrating vegetation information does. This study suggests that ML methods can effectively exploit multi-source observations to improve the performance of LDAS and provide more accurate estimates of land surface variables.
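The percentage RMSE reductions quoted above can be read as a relative difference between the baseline (Noah-MP) RMSE and the LDAS-ML RMSE, each computed against the same site observations. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The function names `rmse` and `rmse_reduction_percent` are hypothetical, and the arrays are synthetic values chosen only to make the calculation easy to follow.

```python
import numpy as np


def rmse(estimates, observations):
    """Root mean square error between two equally sized series."""
    est = np.asarray(estimates, dtype=float)
    obs = np.asarray(observations, dtype=float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))


def rmse_reduction_percent(baseline, improved, observations):
    """Percentage by which the improved RMSE is lower than the baseline RMSE."""
    rmse_base = rmse(baseline, observations)
    rmse_new = rmse(improved, observations)
    return 100.0 * (rmse_base - rmse_new) / rmse_base


# Synthetic daily values (illustrative only, not data from the study).
obs = np.array([2.0, 3.0, 4.0, 5.0])
open_loop = obs + np.array([1.0, -1.0, 1.0, -1.0])    # baseline model, RMSE = 1.0
assimilated = obs + np.array([0.5, -0.5, 0.5, -0.5])  # after assimilation, RMSE = 0.5

print(rmse_reduction_percent(open_loop, assimilated, obs))  # 50.0
```

Under this reading, a value of 27.27 % for ET at Arou would mean that the LDAS-ML RMSE is roughly 0.73 times the Noah-MP RMSE at that site.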