Refine Extreme Hot Day Predictions With the Sea Surface Temperature Tendency

https://doi.org/10.1029/2025GL116339
2025-09-17
Geophysical Research Letters, Volume 52, Issue 18
Hui Tan, Zhiwei Zhu, Fenghua Ling, Bin Wang

Abstract

Extreme high temperatures in western North America (WNA) exert profound impacts on industrial and agricultural production and trigger catastrophic wildfires. Exploring the mechanisms that influence extreme hot days over WNA (WEHDs) and improving their seasonal prediction are of great scientific and social significance. This study reveals that two independent precursor signals, the persistent negative sea surface temperature (SST) anomalies in the tropical eastern Pacific and the cooling tendency of tropical North Atlantic SST during springtime, exhibit significant influence on WEHDs. A physics-based empirical model constructed using these two predictors exhibits robust independent prediction skills. Guided by the underlying physical mechanisms, we integrate SST tendency fields as critical input features into a convolutional neural network (CNN) to further enhance the prediction accuracy. The physically informed CNN achieves significantly improved performance and successfully predicts the extreme WEHD events of 2021. The results emphasize the pivotal role of physical cognition in advancing deep learning-based climate prediction.

Plain Language Summary

Western North America is one of the regions most vulnerable to extreme high temperatures, which makes accurate seasonal prediction of such events a matter of critical practical significance. While deep learning has demonstrated considerable success in weather forecasting and climate prediction, its application to the prediction of extreme events remains relatively underexplored. Furthermore, the integration of physical understanding of extreme events with deep learning to improve prediction skills warrants deeper investigation. This study identifies that, aside from the traditional persistent sea surface temperature (SST) precursor, the tendency of the SST independently influences summer extreme hot days in western North America (WNA). Incorporating SST tendency fields as additional input fields in a convolutional neural network achieves superior independent skill in predicting the seasonal mean WNA extreme hot days. The results offer a novel perspective and methodological guidance for operational agencies and the climate-hydrology research community in leveraging deep learning for seasonal climate prediction.

Key Points

  • The sea surface temperature (SST) cooling tendency over the north tropical Atlantic contributes to the summer hot days in western North America (WNA)

  • Integrating SST tendency as critical input features refined the convolutional neural network (CNN) prediction skills of the hot days in WNA

  • The improved prediction skill of the CNN relies on successfully capturing the physically meaningful predictor of the Atlantic SST tendency

1 Introduction

North America has emerged as one of the regions severely impacted by extreme heatwaves (Grotjahn et al., 2016). For example, the unprecedented western North America (WNA) heatwave during June-July 2021 caused nearly 1,400 deaths (Lin et al., 2022; X. Zhang et al., 2023), widespread infrastructure damage, and economic losses amounting to several billion USD (Lucarini et al., 2023). Under global warming, this region is projected to experience more frequent, prolonged, and intense extreme heat events (Bartusek et al., 2022; Thompson et al., 2022). Concurrent with the increasing frequency of extreme heatwaves, North America has witnessed a marked escalation in the frequency and intensity of wildfires, affecting millions of acres of land and leading to extensive property destruction, ecological degradation, deteriorated air quality, and severe haze pollution (White et al., 2023). Therefore, understanding the physical processes controlling the variability of summer extreme hot days (EHDs) in North America and improving their seasonal prediction are of profound scientific significance and practical importance.

At the seasonal timescale, a tropospheric high-pressure (anticyclonic) anomaly leads to the formation of a heat dome, which triggers heatwaves through clear-sky radiative forcing (Pfahl & Wernli, 2012), subsidence-induced adiabatic heating (Nabizadeh et al., 2021), or anomalously warm low-level horizontal advection (Miralles et al., 2014; Woollings et al., 2018). The formation of a high-pressure anomaly is physically linked to large-scale teleconnection patterns and remote underlying boundary forcing. For example, anomalous convection over the tropical western Pacific and Maritime Continent region (Jong et al., 2020; Luo & Lau, 2020) and the Indian Ocean (Song et al., 2024; B. Wang et al., 2022), along with the East Asian summer monsoon (Lopez et al., 2019; Zhu & Li, 2016, 2018), can excite cross-Pacific Rossby wave trains of different shapes, ultimately generating a high-pressure anomaly over North America. The sea surface temperature (SST) anomalies (SSTA) in the tropical Atlantic can induce meridional circulation anomalies that produce anomalous subsidence over North America, reducing local cloud cover and enhancing downward shortwave radiation, thereby favoring heatwaves (Lopez et al., 2022; Ruprich-Robert et al., 2018). Reduced Arctic sea ice can induce a Rossby wave train propagating southward, leading to a high-pressure (anticyclonic) anomaly over North America (Neal et al., 2022; H. Wang et al., 2022). Additionally, the occurrence of EHDs over North America is also modulated by the circum-global teleconnection (Luo & Lau, 2020), the Arctic Oscillation (Loikith & Broccoli, 2014), the Atlantic Multidecadal Oscillation (Ruprich-Robert et al., 2018), the Pacific Decadal Oscillation (Hulley et al., 2020), and anthropogenic aerosol forcing (Bercos-Hickey et al., 2022).

Physics-based empirical models (PEMs) leveraging these mechanisms have demonstrated moderate success in seasonal prediction (B. Wang et al., 2013, 2015). For instance, Wei et al. (2023) effectively predicted autumn temperature in central North America by incorporating spring western Pacific SST and preceding winter ENSO indices. However, the PEM has certain limitations. When the climate system undergoes significant interdecadal changes, previously identified predictors may no longer impact the predictand, leading to substantial degradation in prediction skill (Li et al., 2023; B. Wang et al., 2015). Furthermore, as the PEM relies on linear combinations of predictors, its prediction skill diminishes significantly when nonlinear processes start to play a role (Camps-Valls et al., 2025; Y. Wang et al., 2023).

In recent years, deep learning approaches have emerged as a promising alternative, with convolutional neural networks (CNNs) demonstrating particular success in capturing spatial-temporal patterns for climate prediction (Ham et al., 2019; Hwang et al., 2019). Unlike PEMs, these deep learning models autonomously extract nonlinear predictive signals from raw input fields without requiring explicit physical assumptions. A notable example is the application of deep learning models in predicting the Indian Ocean Dipole (IOD) index, where the model autonomously focuses on distinct regional information during different IOD phases, consequently achieving good prediction skills (Ling et al., 2022). However, the relatively limited observational data restrict the generalization ability of deep learning in the ever-changing climate system (Oh & Ham, 2024; Shen et al., 2024), and the "black box" nature of these models obscures physical interpretability, which is a critical barrier to improving prediction skills and gaining scientific insight (M. Chen et al., 2024).

Consequently, the integration of physical understanding with deep learning approaches to genuinely improve climate prediction emerges as a critical frontier in scientific research. Therefore, this study addresses the following pivotal questions: What are the physical processes affecting summer EHDs in WNA? What prediction skills can be achieved by a PEM and a classical CNN architecture? How can the prediction skills of the CNN be improved based on physical cognition?

2 Data and Methods

2.1 Data

The data used in this study include: daily maximum temperature data from the Climate Prediction Center (CPC) global temperature data set with a horizontal resolution of 0.5° × 0.5°; monthly SST data from the National Oceanic and Atmospheric Administration's (NOAA) Extended Reconstructed SST data set version 5 (ERSST.v5) with a horizontal resolution of 2° × 2° (Huang et al., 2017a); monthly precipitation data from NOAA's Precipitation Reconstruction (PREC) with a horizontal resolution of 2.5° × 2.5° (M. Chen et al., 2002); monthly sea-ice concentration data from the Hadley Centre Sea Ice and SST data set (HadISST) with a horizontal resolution of 1° × 1° (Rayner et al., 2003); monthly soil moisture data from the ERA5 reanalysis with a horizontal resolution of 1° × 1° (Hersbach et al., 2020); monthly atmospheric variables, including zonal and meridional winds, geopotential height, 2-m air temperature (T2M), and sea level pressure (SLP), from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis-1 with a horizontal resolution of 2.5° × 2.5° (Kalnay et al., 1996); and historical simulations from the Coupled Model Intercomparison Project Phase 6 (CMIP6) (Eyring et al., 2016) (see Table S1 in Supporting Information S1 for detailed information). The study period is 1979–2023.

2.2 Definition of EHDs

In this study, an EHD is defined as a day on which the daily maximum temperature exceeds 35°C. The largest climatological EHDs and year-to-year standard deviations of EHDs are observed from June to August (Figure S1 in Supporting Information S1). Therefore, we focus on the summer season (June–August, JJA) in the following analysis. While maximum centers of summer EHDs appear in both western and southern North America (Figure 1a), the correlation map between the EHDs at the point of maximum climatological EHDs and the EHDs at each grid point shows that the coherent variation of EHDs is mainly confined to WNA (Figure 1b). Therefore, a local index for EHDs over WNA (WEHDs) is defined as the area-mean EHDs over 30°–50°N and 125°–105°W.
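The definition above can be illustrated with a minimal NumPy sketch. It assumes daily maximum temperature in °C on a regular latitude-longitude grid with longitudes in the −180° to 180° convention and missing values stored as NaN. The function names, the cosine-latitude weighting, and the NaN handling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

EHD_THRESHOLD_C = 35.0  # threshold stated in the text


def count_ehd(tmax, threshold=EHD_THRESHOLD_C):
    """Count extreme hot days per grid cell.

    tmax: daily maximum temperature (deg C), shape (day, lat, lon).
    Returns an array (lat, lon); cells with no valid data are NaN.
    """
    tmax = np.asarray(tmax, dtype=float)
    counts = np.sum(tmax > threshold, axis=0).astype(float)  # NaN compares as False
    counts[np.all(np.isnan(tmax), axis=0)] = np.nan          # keep missing cells missing
    return counts


def wehd_index(ehd, lats, lons, lat_bounds=(30.0, 50.0),
               lon_bounds=(-125.0, -105.0), weighted=True):
    """Area-mean EHDs over the WNA box (30-50N, 125-105W).

    weighted=True applies cos(latitude) weights (an assumption; the text
    only says "area-mean").
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    box = np.asarray(ehd, dtype=float)[np.ix_(lat_mask, lon_mask)]
    w_lat = np.cos(np.deg2rad(lats[lat_mask])) if weighted else np.ones(lat_mask.sum())
    weights = np.repeat(w_lat[:, None], box.shape[1], axis=1)
    valid = ~np.isnan(box)  # ignore cells without data
    return float(np.sum(box[valid] * weights[valid]) / np.sum(weights[valid]))
```

Applying `count_ehd` to the daily data of each summer and then `wehd_index` to the result would give one index value per year under these assumptions.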

Figure 1. The definition and time series of the extreme hot days over western North America (WEHDs) index and the simultaneous associated anomalous fields. (a) The distribution of climatological mean summer EHDs (shading; units: days month⁻¹) over North America. (b) The correlation coefficient between the year-to-year variations of EHDs at the point of maximum climatological EHDs (34.5°N, 116°W) and at each grid point. (c) The time series of the original and detrended WEHDs. The dashed gray line is the trend of the WEHDs. (d) The 200 hPa geopotential height (shading; units: gpm) and wind (black vectors; units: m s⁻¹), and the associated wave activity flux (purple vectors; units: m² s⁻²), regressed onto the standardized detrended WEHDs. Panels (e) and (f) are the same as (d) but for (e) the sea surface temperature and 2-m air temperature (shading; units: °C), 500 hPa geopotential height (contours; units: gpm) and wind (black vectors; units: m s⁻¹), and (f) the precipitation (shading; units: mm d⁻¹), 850 hPa geopotential height (contours; units: gpm) and wind (black vectors; units: m s⁻¹). Values that are significant at the 90% confidence level by a two-tailed Student's t-test are dotted.

2.3 Wave Activity Flux

The phase-independent wave activity flux (WAF) is calculated based on the following formula (Takaya & Nakamura, 2001):
$$W=\frac{1}{2\vert \overline{U}\vert }\begin{bmatrix}\overline{u}\left(\psi_{x}^{\prime 2}-\psi^{\prime}\psi_{xx}^{\prime}\right)+\overline{v}\left(\psi_{x}^{\prime}\psi_{y}^{\prime}-\psi^{\prime}\psi_{xy}^{\prime}\right)\\ \overline{u}\left(\psi_{x}^{\prime}\psi_{y}^{\prime}-\psi^{\prime}\psi_{xy}^{\prime}\right)+\overline{v}\left(\psi_{y}^{\prime 2}-\psi^{\prime}\psi_{yy}^{\prime}\right)\end{bmatrix},$$
where an overbar and a prime represent the climatological mean and anomaly, respectively; ψ and U = (u, v) represent the stream function and the horizontal wind, respectively; and W denotes the two-dimensional Rossby WAF.
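As a rough illustration of how the two components of W could be evaluated numerically, the sketch below applies the formula on a uniform Cartesian grid using centered finite differences. This is a simplification for illustration only: an application to global reanalysis data would need the spherical-coordinate form of Takaya and Nakamura (2001), and the function name, argument layout, and grid assumptions here are hypothetical.

```python
import numpy as np


def wave_activity_flux(psi_anom, u_clim, v_clim, dx, dy):
    """Evaluate (Wx, Wy) from the plane-geometry formula.

    psi_anom: stream function anomaly, shape (ny, nx)
    u_clim, v_clim: climatological winds, same shape
    dx, dy: uniform grid spacing along x (axis 1) and y (axis 0)
    Note: the flux is undefined where the climatological wind speed is zero.
    """
    # np.gradient returns derivatives in axis order: (d/dy, d/dx)
    psi_y, psi_x = np.gradient(psi_anom, dy, dx)
    psi_xy, psi_xx = np.gradient(psi_x, dy, dx)
    psi_yy, _ = np.gradient(psi_y, dy, dx)

    coef = 1.0 / (2.0 * np.hypot(u_clim, v_clim))
    cross = psi_x * psi_y - psi_anom * psi_xy  # term shared by both components
    wx = coef * (u_clim * (psi_x**2 - psi_anom * psi_xx) + v_clim * cross)
    wy = coef * (u_clim * cross + v_clim * (psi_y**2 - psi_anom * psi_yy))
    return wx, wy
```

For a plane wave ψ′ = sin(kx) embedded in a uniform westerly flow, the formula gives Wx = k²/2 and Wy = 0 independent of the wave phase, which is a convenient check of the "phase-independent" property.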

2.4 Physics-Based Empirical Model

The PEM is established based on understanding of the physical linkage between the predictors and the predictand (B. Wang et al., 2013, 2015) using the data sets from 1979 to 2013. It is established using the physically meaningful and independent predictors, and an independent forecast is made for the remaining 10 years from 2014 to 2023. The detailed description is shown in Text S1 in Supporting Information S1.

Four verification metrics are used to evaluate the prediction skills: the temporal correlation coefficient (TCC), the root mean squared error (RMSE), the mean square skill score (MSSS), and the same sign rate (SSR). The detailed calculation of these metrics is given in Text S2 in Supporting Information S1.
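For orientation, the four metrics can be sketched as follows using their commonly used definitions. The exact formulas adopted by the authors are in Text S2 and are not reproduced here, so the choice of reference forecast for the MSSS (the mean of the observations) and the sign convention for the SSR (signs of anomalies relative to zero) are assumptions, as are the function names.

```python
import numpy as np


def tcc(obs, pred):
    """Temporal (Pearson) correlation coefficient."""
    return float(np.corrcoef(obs, pred)[0, 1])


def rmse(obs, pred):
    """Root mean squared error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def msss(obs, pred, reference=None):
    """Mean square skill score: 1 - MSE / MSE_reference.

    The reference forecast defaults to the mean of obs (assumption).
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ref = np.mean(obs) if reference is None else reference
    mse = np.mean((pred - obs) ** 2)
    mse_ref = np.mean((ref - obs) ** 2)
    return float(1.0 - mse / mse_ref)


def ssr(obs, pred):
    """Same sign rate: fraction of years in which the predicted and
    observed anomalies have the same sign."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.sign(obs) == np.sign(pred)))
```

Under these definitions, an MSSS of 0 means the forecast is no better than the reference forecast, and an SSR of 0.8 means the sign of the anomaly is correct in 8 out of 10 years.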

2.5 CNN Model and Explainability

Following the methodology of Ham et al. (2019), we construct a CNN model to predict the WEHDs index. The SST, T2M, and SLP anomalies from January to May over the region north of 30°S are used as the input layer. The CNN model has three convolutional layers and three max-pooling layers, and the last pooling layer is linked to a fully connected layer and the final output. We also apply transfer learning: the CNN model is first trained using the CMIP6 data, and the trained weights are then used as initial weights to train the final CNN model with the reanalysis (Ham et al., 2019). A dropout layer with a rate of 0.2 is applied to the fully connected layer during both the training and testing phases (Oh & Ham, 2024; Srivastava et al., 2014). Additionally, we employ weight decay in the Adam optimizer as an L2 regularization technique to further mitigate overfitting (Kingma & Ba, 2014). To ensure the robustness of the model predictions, we train the CNN model with 12 different random seeds, which represent different ensemble members. Notably, to ensure an equitable comparison with the PEM, the same training and independent prediction periods are used for the CNN model (the detailed data division is shown in Table S2 in Supporting Information S1).

Gradient-weighted Class Activation Mapping (Grad-CAM) is used to visualize the contribution of the input signals to the output of the CNN model (Selvaraju et al., 2020). The key regions that contribute to the prediction can be identified through heat maps. More detailed descriptions of the model hyperparameter settings and the Grad-CAM calculation are given in Text S3 in Supporting Information S1.
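The core combination step of Grad-CAM (Selvaraju et al., 2020) can be sketched with NumPy as follows. The sketch assumes that the feature maps of a chosen convolutional layer and the gradients of the predicted WEHDs with respect to those feature maps have already been extracted from the network. The function name and array layout are illustrative, and the authors' exact settings (Text S3) may differ.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heat map.

    activations: feature maps of one convolutional layer, shape (channels, H, W)
    gradients:   d(output)/d(activations), same shape
    Returns a non-negative heat map of shape (H, W).
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # Channel weights: global average pooling of the gradients
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over the channel axis
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that push the output upward
    return np.maximum(cam, 0.0)
```

The resulting map has the resolution of the chosen layer and would typically be interpolated back to the input grid before plotting.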

2.6 Model Experiments

To verify the physical connections between the selected predictors and WEHDs in the PEM, a linear baroclinic model (LBM) (Watanabe & Kimoto, 2000) is employed. The model is run at a horizontal resolution of T21 (triangular truncation) with 20 sigma levels in the vertical. The 1981–2010 climatological JJA mean of the NCEP/NCAR reanalysis is used as a realistic mean state in the model. The LBM has been widely employed to investigate the atmospheric circulation responses to diabatic heating or vorticity forcing. The model integration spans 30 days, and the output averaged over the last 10 days represents the equilibrium state of the atmospheric response.

3 Results

3.1 Simultaneous Anomalous Fields Associated With WEHDs

The WEHDs index exhibits a significant increasing trend of 0.06 days year⁻¹ (Figure 1c). The detrended WEHDs index demonstrates pronounced interannual variability. Notably, the WEHDs in 2021 remain an extreme outlier in both the original and detrended indices, indicating that the record-breaking EHDs were influenced not only by global warming but also by internal climate variability.

To reveal the physical mechanisms driving the variations of WEHDs, the simultaneous circulation and SST fields are regressed onto the detrended WEHDs index. Over the WNA, a quasi-barotropic anomalous high-pressure system, acting as a heat dome, is the dominant circulation pattern. This feature raises local temperatures and increases the frequency of EHDs. Tracing the origins of this anomalous high-pressure system, the associated upper-level WAF shows a cross-Pacific Rossby wave train (Figure 1d), consistent with the negative phase of the Asia-North America teleconnection pattern (Zhu & Li, 2016, 2018). Simultaneously, negative SSTA appear over the tropical eastern Pacific (TPS), while positive SSTA emerge over the western Pacific (Figure 1e). The SSTA pattern corresponds to enhanced precipitation over the western North Pacific and suppressed precipitation over subtropical East Asia, presenting a pattern opposite to that of the enhanced East Asian Meiyu front (Figure 1f). A critical question then arises: What precursor signals could potentially induce these simultaneous circulation anomalies?

3.2 Selection of the Predictors and Their Physical Processes

Based on stepwise regression and the physical cognition validated by numerical simulations, two precursors are selected from 44 potential precursors as predictors for enhanced WEHDs (Figures S2 and S3 in Supporting Information S1): the persistent negative SSTA in the TPS during April-May (Figure 2a) and the cooling tendency of SST over the tropical North Atlantic (CSA) from March to May (Figure 2b). Other precursors are excluded either because of their dependence on the selected predictors or because their linkage with WEHDs cannot be explained physically. The two selected predictors are independent of each other and represent different physical processes influencing the WEHDs (Table S3 in Supporting Information S1).

Figure 2. The selected predictors and establishment of the physics-based empirical model (PEM). Correlation between the detrended extreme hot days over western North America (WEHDs) and (a) April-May mean sea surface temperature (SST), (b) May minus March SST. Values that are significant at the 90% confidence level by a two-tailed Student's t-test are dotted. The red boxes denote the domains used for the definition of each predictor: TPS (10°S–10°N, 120°W–80°W), CSA (7°N–34°N, 70°W–25°W). (c) Time series of the observed (black line), simulated (blue line), cross-validated reforecast (green line), and independently predicted (red line) detrended WEHDs. The temporal correlation coefficient, root mean squared error, mean square skill score, and same sign rate during the training period (blue for simulated and green for cross-validated reforecast) and the independent prediction period (red) are shown. Panel (d) is as in (c), but for the original WEHDs.

For the first predictor, TPS, the negative SSTA in the tropical eastern Pacific during April and May persist into the following summer (Figure 3b), which leads to negative diabatic heating anomalies in the central and western equatorial Pacific (Figure 3c). As a Gill-type response to the negative diabatic heating anomalies, an upper-level cyclonic anomaly appears to the southeast of Japan (Figure 3a). The easterlies over the northern flank of the cyclonic anomaly may further perturb the East Asian subtropical westerly jet and lead to a downstream quasi-barotropic Rossby wave train (Figure 3a) with an anticyclonic anomaly over the WNA and enhanced WEHDs (Qian et al., 2022; W. Zhang et al., 2020).

Figure 3. The physical processes by which each predictor impacts the extreme hot days over western North America (WEHDs). Regressions of the JJA mean (a) geopotential height (shading; units: gpm) and wave activity flux (vectors; units: m² s⁻²) at 200 hPa, (b) sea surface temperature and 2-m air temperature (shading; units: °C), geopotential height (contours; units: gpm) and wind (vectors; units: m s⁻¹) at 500 hPa, (c) precipitation (shading; units: mm d⁻¹), geopotential height (contours; units: gpm) and wind (vectors; units: m s⁻¹) at 850 hPa onto the predictor TPS. (d–f) Similar to (a–c) but onto the predictor CSA. Values that are significant at the 90% confidence level by a two-tailed Student's t-test are dotted.

The aforementioned physical mechanism is well reproduced by the LBM. Whether the numerical experiment is conducted with the TPS-related negative diabatic heating (Figure S4 in Supporting Information S1) or with the positive vorticity forcing associated with the cyclonic anomaly over the subtropical western Pacific (Figure S5 in Supporting Information S1), the steady atmospheric responses in both simulations show a downstream cross-Pacific Rossby wave train and an anticyclonic anomaly over WNA, consistent with the observation.

For the second predictor, CSA, the cooling tendency of SST over the tropical North Atlantic from March to May is related to a quasi-barotropic anticyclonic anomaly over the mid-latitude Atlantic (Figures 3d–3f). The northerly wind anomalies along the eastern flank of the anomalous anticyclone tend to enhance the climatological northerly winds in the tropical North Atlantic, thereby facilitating local SST cooling via the wind-evaporation-SST feedback mechanism (Figures 3e and 3f). This is also confirmed by the heat budget analysis of the mixed layer ocean temperature (Jin et al., 2003), which suggests that the net surface heat flux plays a dominant role in driving the negative tendency of the mixed layer temperature in the tropical North Atlantic (Figure S6 in Supporting Information S1). Therefore, CSA reflects the maintenance and intensification of the mid-latitude Atlantic anticyclonic anomaly coupled with the SSTA. In the following summer, the negative SSTA over the tropical North Atlantic induce easterly wind anomalies to their west, resulting in anticyclonic wind shear over the WNA. In addition, the Atlantic anticyclonic anomaly can act as a Rossby wave source to excite a downstream quasi-barotropic Rossby wave train, leading to a cyclonic anomaly over the western Pacific. The easterlies along the northern flank of the anomalous cyclone (Figure 3f) further excite a downstream cross-Pacific Rossby wave train, with an anticyclonic anomaly over the WNA (Figures 3d–3f).

This physical mechanism can also be reproduced by a two-step process in the LBM. First, the LBM is forced by prescribed negative vorticity over the mid-latitude Atlantic, which represents the CSA-related anticyclonic anomaly (Figures S7a–S7c in Supporting Information S1). The steady atmospheric response shows a quasi-barotropic Rossby wave train from the Atlantic across the Eurasian continent to the western Pacific, with a cyclonic wind shear over the western Pacific. Second, because the LBM cannot simulate localized air-sea coupling phenomena, an additional LBM experiment is forced by prescribed positive vorticity representing the cyclonic anomaly over the western Pacific (Figures S7d–S7f in Supporting Information S1). The positive vorticity forcing perturbs the westerly jet and induces a downstream quasi-barotropic Rossby wave train with an anticyclonic anomaly over the WNA, similar to the observation.

Based on these two physically meaningful predictors, the PEM is constructed by regression over the training period of 1979–2013, yielding the simulation equation WEHDs = 0.22 × TPS + 0.35 × CSA. In the training period, the PEM obtains a TCC of 0.65 (p < 0.01), an RMSE of 0.53, an MSSS of 0.42, and an SSR of 71%; the corresponding values in the independent prediction period are 0.49, 0.47, 0.16, and 80% (Figure 2c). In addition, incorporating the trend component of the WEHDs further improves the prediction skill during both the training and independent prediction periods (Figure 2d).
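The construction of such a two-predictor regression can be sketched as follows. The sketch assumes that the TPS and CSA indices have already been computed and normalized with the sign convention used by the authors (Text S1, not reproduced here), so that the quoted coefficients 0.22 and 0.35 apply directly. The function names and the absence of an intercept term are illustrative assumptions.

```python
import numpy as np


def split_periods(years, last_training_year=2013):
    """Boolean masks for the training (1979-2013) and independent
    prediction (2014-2023) periods described in the text."""
    years = np.asarray(years)
    train = years <= last_training_year
    return train, ~train


def fit_pem(tps, csa, wehd):
    """Least-squares fit of wehd = a * tps + b * csa (no intercept)."""
    design = np.column_stack([tps, csa])
    coef, *_ = np.linalg.lstsq(design, np.asarray(wehd, dtype=float), rcond=None)
    return coef


def predict_pem(tps, csa, coef=(0.22, 0.35)):
    """Apply the regression; the default coefficients are those quoted in the text."""
    return coef[0] * np.asarray(tps, dtype=float) + coef[1] * np.asarray(csa, dtype=float)
```

In this setup, the coefficients are estimated from the training years only and then applied unchanged to the independent years, which is what makes the 2014–2023 forecast independent.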

However, while the PEM effectively captures the primary physical mechanisms influencing WEHDs, it struggles to capture the extreme value of the 2021 EHDs over WNA. Such extreme records are often driven by intricate and nonlinear feedbacks that are not fully encapsulated within the linear regression (LR) framework of the PEM (Bartusek et al., 2022). Therefore, to better account for these nonlinear dynamics and improve the prediction of extreme events, we explore how to improve the prediction framework using CNN.

3.3 Prediction of WEHDs With Convolutional Neural Network

We construct a CNN model similar to that of Ham et al. (2019) to predict the WEHDs index. The SST, T2M, and SLP anomalies from January to May over the region north of 30°S are used as the input layer. Note that both the input and output data retain their original trends, which enables the CNN to autonomously identify and incorporate global warming signals in its predictions.

While the CNN model demonstrates nearly perfect prediction skill during the training period (Figure 4a), its independent prediction performance with TCC of 0.48, RMSE of 0.54 and MSSS of 0.14 falls short of the PEM. Meanwhile, the CNN exhibits limited prediction performance for the values in 2014 and 2017–2019, and crucially, fails to capture the extreme record in 2021.

Figure 4. The convolutional neural network (CNN) prediction, explainability, and architecture. (a) Time series of the observed (black line), CNN simulated (solid blue line), and independently forecasted (solid red line) extreme hot days over western North America (WEHDs). The shaded areas indicate the CNN ensemble spread. The temporal correlation coefficient, root mean squared error, mean square skill score, and same sign rate during the training period (blue) and the independent prediction period (red) are shown. (b) Grad-CAM map during the training period. Only the values statistically significant at the 10% level are displayed. (c) Architecture of the CNN model used for predicting WEHDs. The monthly fields are three variables (sea surface temperature (SST), 2-m air temperature, and sea level pressure (SLP)) from January to May. The tendency fields include May minus April, May minus March, and April-May minus December-January SST, 2-m air temperature, and SLP. The total number of input channels is 24. Panels (d) and (e) are as in (a) and (b), but for the CNN_AT model, which uses both monthly mean and tendency fields as input fields.

To understand why the prediction skill of the CNN model in the independent period is limited, we produce the composite heatmap for the training period (Figure 4b). The heatmap quantifies the contributions of the variables at each grid point to the predictand, and a larger value denotes a greater contribution to the WEHDs. Large values mainly appear over the eastern Pacific and along the western North American coast, indicating that the CNN model assigns greater importance to the variables in these regions. This implies that the model may have recognized the physically meaningful predictor TPS. However, the absence of large values over the Atlantic suggests that the CNN model may fail to capture the other physically meaningful predictor, CSA.

Constrained by limited model parameters and observational data, the current CNN model struggles to identify critical signals and achieve robust prediction skills. Then, the understanding of the physical processes provides important insights for feature engineering within existing frameworks. Because the CNN missed the ocean surface tendency predictor CSA, we directly integrate SST tendency fields as critical input features into the CNN, guided by the underlying physical mechanisms of WEHDs.

3.4 Improve the CNN Prediction From Physical Cognition

Based on the understanding of the physical processes influencing WEHDs, both the monthly mean fields and the tendency fields are used as input fields to train the CNN model (Figure 4c). This refined CNN model that adds SST tendency fields (CNN_AT) shows significant improvement in independent prediction skill (Figure 4d and Table S4 in Supporting Information S1) in both TCC (from 0.48 to 0.80) and MSSS (from 0.14 to 0.33). Additionally, the CNN_AT model exhibits encouraging performance for the extreme WEHDs records in both 2021 and 2018. The difference in performance is significant and consistent across different random seeds (Figures S8 and S9 in Supporting Information S1). Among the 12 CNN ensemble members, several can perfectly predict these extreme events (Figures S9i and S9l in Supporting Information S1).
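One way to assemble the 24 input channels listed in the Figure 4 caption (three variables × five monthly fields plus three variables × three tendency fields) is sketched below. The channel ordering, the dictionary keys, and the interpretation of "April-May minus December-January" as a difference of two-month means are assumptions made for illustration.

```python
import numpy as np

# Month positions along the first axis of each input array (assumed ordering)
DEC, JAN, FEB, MAR, APR, MAY = range(6)
VARIABLES = ("sst", "t2m", "slp")  # hypothetical dictionary keys


def build_cnn_at_inputs(fields):
    """Stack monthly-mean and tendency channels.

    fields: dict mapping each variable name to an anomaly array of shape
            (6, lat, lon), ordered Dec (previous year), Jan, ..., May.
    Returns an array of shape (24, lat, lon).
    """
    channels = []
    for var in VARIABLES:
        f = np.asarray(fields[var], dtype=float)
        channels.extend(f[JAN:MAY + 1])  # 5 monthly-mean channels per variable
    for var in VARIABLES:
        f = np.asarray(fields[var], dtype=float)
        channels.append(f[MAY] - f[APR])                                   # May minus April
        channels.append(f[MAY] - f[MAR])                                   # May minus March
        channels.append(0.5 * (f[APR] + f[MAY]) - 0.5 * (f[DEC] + f[JAN]))  # Apr-May minus Dec-Jan
    return np.stack(channels)
```

Although the tendency channels are linear combinations of the monthly fields, supplying them explicitly spares a small network trained on few samples from having to learn these differences on its own.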

The heatmap of the newly developed CNN_AT shows large values over the eastern Pacific, the Atlantic, and the Eurasian continent (Figure 4e). This suggests that the model may capture the physically meaningful persistent signal in the tropical Pacific (TPS) as well as the tendency signal in the Atlantic (CSA) and its downstream Rossby wave train over the Eurasian continent, thereby contributing to the substantial improvement in the prediction skill of the CNN_AT compared to the original CNN.

One may suspect that the improved performance and the appearance of Atlantic signals in the heatmap arise purely from the larger amount of input data rather than from the physical process understanding. To clarify this issue, an ablation experiment is conducted using monthly fields and noise fields as input fields. The experiment shows a decrease in prediction skill, and no large values appear over the Atlantic or the Eurasian continent in the heatmap (Figure S10 in Supporting Information S1). The results confirm that the physical information in the SST tendency fields improves the prediction skill of the CNN model.

4 Conclusions

The present study reveals two independent precursor signals influencing WEHDs, each representing different physical processes. The first predictor, TPS, represents the persistent negative SSTA in the tropical eastern Pacific. It leads to negative diabatic heating anomalies and further stimulates a cyclonic anomaly over the western Pacific. The cyclonic anomaly perturbs the East Asian subtropical westerly jet, leading to a downstream quasi-barotropic Rossby wave train with an anticyclonic anomaly over the WNA. The second predictor, CSA, reflects the cooling tendency of the SST over the tropical North Atlantic, which is related to an anomalous anticyclone over the Atlantic. This anticyclone acts as a Rossby wave source, exciting a quasi-barotropic Rossby wave train that leads to an upper-level anomalous cyclone over the western Pacific, which further perturbs the East Asian jet stream and excites a downstream cross-Pacific Rossby wave train toward WNA.

A CNN model similar to that of Ham et al. (2019) is constructed, which utilizes monthly mean fields from January to May as input fields to predict WEHDs. However, it shows limited prediction skill. Then, based on the understanding of the physical processes impacting WEHDs, we further incorporate the tendency fields as additional input fields. The newly developed CNN_AT model shows significantly improved prediction skills and successfully predicts the 2021 extreme WEHDs. While both CNN and CNN_AT exhibit nearly perfect prediction skills during training, the skill of the CNN drops during the independent period, indicating overfitting. In contrast, CNN_AT has significantly better skill in the independent prediction period. This comparison highlights that incorporating physically meaningful SST tendency signals effectively alleviates overfitting and enhances independent prediction accuracy. These findings highlight the importance of integrating physical understanding into machine learning-based seasonal prediction frameworks through feature engineering that enriches the physical information of the input data.

To further confirm that the importance of the SST tendency predictors is not limited to the CNN, we also use linear regression (LR) and two widely used tree-based models, Random Forest (RF; Breiman, 2001) and Extreme Gradient Boosting (XGBoost; T. Chen & Guestrin, 2016), to predict WEHDs. For the LR model, although the incorporation of tendency signals does not result in a significant improvement in the prediction skills, it still improves the capability to capture the interannual variability of WEHDs (Figures S11a and S11b in Supporting Information S1), suggesting the importance of the Atlantic Ocean tendency predictor. For the tree-based models, we use Bayesian Optimization (Shahriari et al., 2015) to find optimal hyperparameters for each method. Adding the Atlantic Ocean tendency predictor significantly improves the prediction skills in the independent prediction period. For RF, the TCC improves from 0.32 to 0.61 (Figures S11c and S11d in Supporting Information S1). XGBoost exhibits an even greater improvement, with the TCC increasing from 0.27 to 0.72 (Figures S11e and S11f in Supporting Information S1). In sum, the tendency signals are also important for other types of machine learning models because they include the key physical processes for the formation of WEHDs.

It is noted that the heatmap of the CNN_AT also shows large values over Alaska and the Indian Ocean (Figure 4e). This raises critical questions: why does the CNN_AT focus on these regions? Are they related to the Arctic signals or Eurasian teleconnection? Do these regions reflect genuine physical processes or merely serve as statistical artifacts for model fitting? These questions merit further investigation to elucidate the implication of the heatmap in these identified regions.

In this study, a straightforward three-layer CNN model was utilized for the prediction of EHDs. However, the potential improvement in predictive performance and ability to capture physically meaningful signals through more complex deep learning architectures, such as Residual Neural Networks and Long Short-Term Memory Networks, warrants further investigation.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (42175033 & 42375045). We acknowledge the High-Performance Computing Center at Nanjing University of Information Science & Technology for their computing support.

Conflict of Interest

The authors declare no conflicts of interest relevant to this study.

Data Availability Statement

The CPC data are from https://psl.noaa.gov/data/gridded/data.cpc.globaltemp.html. ERSST v5 data and ERA5 reanalysis are respectively available at Huang et al. (2017a, 2017b), Hersbach et al. (2020, 2023). NOAA's Precipitation Reconstruction (PREC) data is from M. Chen et al. (2002) at https://psl.noaa.gov/data/gridded/data.prec.html. HadISST data is from Rayner et al. (2003) obtained from https://www.metoffice.gov.uk/hadobs/hadisst/data/download.html. NCEP/NCAR Reanalysis is from Kalnay et al. (1996) at https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html. The CMIP6 models used in this study are listed in Table S1 in Supporting Information S1 and are available at Eyring et al. (2016) (https://esgf-node.ipsl.upmc.fr/projects/cmip6-ipsl/).