author: Chen Yutong (陈昱同), Huang Gang (黄刚), Wang Ya (汪亚), et al.
keywords: bias correction, deep learning, data imbalance, extreme rainfall
Abstract
Weather forecasting plays an important role in society and the economy. However, operational numerical weather prediction (NWP) is not accurate enough for precipitation forecasting, especially for heavy rainfall. Previous work on NWP bias correction with deep learning (DL) methods mostly focused on local regions, and correction of precipitation forecasts over the whole of China had not been attempted. Earlier studies also placed no particular focus on heavy rainfall, despite its severe and often catastrophic impacts. In this study, we propose a DL model called the weighted U-Net (WU-Net), which assigns different sample weights to different precipitation events during training in order to improve forecasts of intense precipitation over China. We find that WU-Net further improves the forecasting skill for the heaviest rainfall compared with both the ordinary U-Net and ECMWF-IFS. Further analysis shows that this improvement increases with lead time and is concentrated mainly in eastern China. These findings suggest that a DL model that accounts for the imbalance of meteorological data can further improve the precipitation forecasts generated by numerical weather prediction.
https://doi.org/10.3389/fenvs.2023.1116672
Point 1. Data imbalance
Precipitation amounts follow a long-tailed distribution: extremely heavy events occur rarely, so they account for only a small fraction of the samples in the dataset. Because DL is a statistical method, it tends to neglect these scarce extreme samples. As a result, DL usually performs poorly when predicting heavy rainfall.
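The effect of the long tail can be illustrated with a small sketch. The grade thresholds and the synthetic sample below are hypothetical and are not taken from the study; they only show how a "model" that ignores the rare heavy-rain grade can still look accurate overall.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical grade thresholds in mm/day (illustrative only, not from the study).
THRESHOLDS = [0.1, 10.0, 25.0, 50.0]


def grade(amount_mm):
    """Return the grade index (0 = no rain ... 4 = heaviest) for one grid value."""
    return bisect_right(THRESHOLDS, amount_mm)


# Synthetic long-tailed sample: many dry or light grids, very few heavy ones.
amounts = [0.0] * 700 + [3.0] * 200 + [15.0] * 70 + [30.0] * 25 + [80.0] * 5
grades = [grade(a) for a in amounts]
counts = Counter(grades)

# A degenerate "model" that always predicts the most frequent grade.
majority = counts.most_common(1)[0][0]
accuracy = sum(g == majority for g in grades) / len(grades)
heavy_hits = sum(1 for g in grades if g == 4 and majority == 4)

print(dict(sorted(counts.items())))  # {0: 700, 1: 200, 2: 70, 3: 25, 4: 5}
print(f"accuracy = {accuracy:.2f}, heavy-rain hits = {heavy_hits}")
# accuracy = 0.70, heavy-rain hits = 0
```

In this toy sample, always predicting "no rain" is right 70% of the time yet never captures a single heavy-rain grid, which is the kind of behaviour an unweighted training objective can drift towards.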
Point 2. U-Net
U-Net is a popular DL architecture in computer vision (CV), first proposed for biomedical image segmentation. It gets its name from its U-shaped structure. The down-branch acts as an encoder that extracts features from the input data. The up-branch does the opposite: as a decoder, it restores the low-dimensional features to forecast fields. Skip connections between the two transmit features of different levels from the down-branch to the up-branch. In this study, a U-Net was built to capture the features of the input meteorological data and produce weather predictions from them.
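To make the U shape concrete, here is a minimal sketch that only tracks feature-map shapes through an encoder, a bottleneck, and a decoder with skip connections. It is not the network used in the study: the function name, the channel widths, the depth, the grid size, and the number of input and output channels are all hypothetical, and the convolutions are assumed to be padded so that only the down- and up-sampling steps change the spatial size.

```python
def unet_shape_trace(in_channels, height, width,
                     base_channels=16, depth=3, out_channels=5):
    """Trace (channels, height, width) through a U-shaped encoder-decoder.

    Assumptions (hypothetical, not from the study): padded convolutions that
    keep the spatial size, 2x down/upsampling per level, channel doubling on
    the way down, and an up-convolution that halves the channels.
    """
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2**depth")

    trace = [("input", (in_channels, height, width))]
    skips = []
    h, w = height, width

    # Down-branch (encoder): conv block, store features for the skip, downsample.
    for level in range(depth):
        c = base_channels * 2 ** level
        trace.append((f"enc{level}", (c, h, w)))
        skips.append((c, h, w))
        h, w = h // 2, w // 2

    # Bottleneck at the coarsest resolution.
    c = base_channels * 2 ** depth
    trace.append(("bottleneck", (c, h, w)))

    # Up-branch (decoder): upsample, concatenate the matching skip, conv block.
    for level in reversed(range(depth)):
        h, w = h * 2, w * 2
        skip_c, _, _ = skips[level]  # same (h, w) as the upsampled map
        trace.append((f"cat{level}", (c // 2 + skip_c, h, w)))
        c = skip_c
        trace.append((f"dec{level}", (c, h, w)))

    # Output head: one channel per precipitation grade at every grid point.
    trace.append(("head", (out_channels, h, w)))
    return trace


for name, shape in unet_shape_trace(in_channels=8, height=64, width=96):
    print(f"{name:>10}: {shape}")
```

The `cat` rows show where the skip connections enter: each decoder level sees both the upsampled coarse features and the encoder features of the same resolution before it is reduced again.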
Point 3. Sample weight
As mentioned in Point 1, data imbalance degrades the performance of DL on heavy precipitation. Therefore, the precipitation at each grid point was graded by amount, and each grade was assigned a weight according to:
$$ w_{i} = \frac{S}{n \times s_{i}} \tag{3.1}\label{3.1} $$ where $ S $ is the total number of sample grids, $ n $ is the number of grades, and $ s_{i} $ is the number of sample grids in grade $ i $. Grades that occur less often receive larger weights, so the DL model learns them more thoroughly. These weights are applied in the loss function:
$$ \mathrm{Loss}(x) = -w_{\arg y_{j} = 1} \sum_{j = 1}^{n} y_{j} \log P_{j}(x) \tag{3.2}\label{3.2} $$ where $ j $ indexes the dimensions of the probability vector $ P(x) $ (the output of the softmax layer), $ y_{j} $ is the corresponding component of the one-hot ground truth, and $ w_{\arg y_{j} = 1} $ denotes the weight $ w_{j} $ of the grade $ j $ to which the ground truth of the sample grid belongs. The U-Net trained with this loss function is referred to as the weighted U-Net (WU-Net).
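Equations (3.1) and (3.2) can be sketched in a few lines of plain Python. The function names, the grade counts, and the probability vector below are hypothetical examples rather than values from the study, and the small clipping constant is an added safeguard against `log(0)` that the equations themselves do not include.

```python
import math


def grade_weights(counts):
    """Eq. (3.1): w_i = S / (n * s_i), given the sample-grid count s_i of each grade."""
    if any(s <= 0 for s in counts):
        raise ValueError("every grade needs at least one sample grid")
    total = sum(counts)  # S: number of all sample grids
    n = len(counts)      # n: number of grades
    return [total / (n * s) for s in counts]


def weighted_cross_entropy(probs, true_grade, weights, eps=1e-12):
    """Eq. (3.2) for one sample grid: -w_c * sum_j y_j * log P_j(x).

    y is one-hot at the true grade c, so only the j = c term is non-zero.
    eps is an assumption added here to avoid log(0).
    """
    one_hot = [1.0 if j == true_grade else 0.0 for j in range(len(probs))]
    cross_entropy = -sum(
        y * math.log(max(p, eps)) for y, p in zip(one_hot, probs) if y > 0.0
    )
    return weights[true_grade] * cross_entropy


# Hypothetical grade counts with a long tail (grade 4 = heaviest rainfall).
counts = [700, 200, 70, 25, 5]
weights = grade_weights(counts)
print([round(w, 3) for w in weights])  # [0.286, 1.0, 2.857, 8.0, 40.0]

# The same softmax output is penalised far more when the truth is the rare grade.
probs = [0.6, 0.2, 0.1, 0.05, 0.05]
loss_if_heavy = weighted_cross_entropy(probs, true_grade=4, weights=weights)
loss_if_dry = weighted_cross_entropy(probs, true_grade=0, weights=weights)
print(loss_if_heavy > loss_if_dry)  # True
```

With these weights every grade contributes the same total weight, $ S / n $, to the loss, so the five heavy-rain grids together count as much as the 700 dry ones.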
Results
Both the U-Net and the WU-Net improved on the NWP precipitation forecasts. With the sample-weight modification, WU-Net outperformed the ordinary U-Net on heavy precipitation. This advantage was maintained at longer lead times, and the improvement in heavy-precipitation forecasts was concentrated mainly in eastern China.

