-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmain.tex
426 lines (289 loc) · 44.3 KB
/
main.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
% Template for Elsevier CRC journal article
% version 1.1 dated 16 March 2010
% This file (c) 2010 Elsevier Ltd. Modifications may be freely made,
% provided the edited file is saved under a different name
% This file contains modifications for Procedia Computer Science
% but may easily be adapted to other journals
% Changes since version 1.0
% - elsarticle class option changed from 1p to 3p (to better reflect CRC layout)
%-----------------------------------------------------------------------------------
%% This template uses the elsarticle.cls document class and the extension package ecrc.sty
%% For full documentation on usage of elsarticle.cls, consult the documentation "elsdoc.pdf"
%% Further resources available at http://www.elsevier.com/latex
%-----------------------------------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% %%
%% Important note on usage %%
%% ----------------------- %%
%% This file must be compiled with PDFLaTeX %%
%% Using standard LaTeX will not work! %%
%% %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% The '3p' and 'times' class options of elsarticle are used for Elsevier CRC
\documentclass[3p,times]{elsarticle}
%% Sout text (bar)
\usepackage{soul}
%% Hyperlinks
\usepackage{hyperref}
%% The `ecrc' package must be called to make the CRC functionality available
\usepackage{ecrc}
%% The ecrc package defines commands needed for running heads and logos.
%% For running heads, you can set the journal name, the volume, the starting page and the authors
%% set the volume if you know. Otherwise `00'
\volume{00}
%% set the starting page if not 1
\firstpage{1}
%% Give the name of the journal
\journalname{Computers \& Geosciences}
%% Give the author list to appear in the running head
%% Example \runauth{C.V. Radhakrishnan et al.}
\runauth{P. R. Larraondo et al.}
%% The choice of journal logo is determined by the \jid and \jnltitlelogo commands.
%% A user-supplied logo with the name <\jid>logo.pdf will be inserted if present.
%% e.g. if \jid{yspmi} the system will look for a file yspmilogo.pdf
%% Otherwise the content of \jnltitlelogo will be set between horizontal lines as a default logo
%% Give the abbreviation of the Journal.
\jid{procs}
%% Give a short journal name for the dummy logo (if needed)
\jnltitlelogo{Computers \& Geosciences}
%% Hereafter the template follows `elsarticle'.
%% For more details see the existing template files elsarticle-template-harv.tex and elsarticle-template-num.tex.
%% Elsevier CRC generally uses a numbered reference style
%% For this, the conventions of elsarticle-template-num.tex should be followed (included below)
%% If using BibTeX, use the style file elsarticle-num.bst
%% End of ecrc-specific commands
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% The amssymb package provides various useful mathematical symbols
\usepackage{amssymb}
%% The amsthm package provides extended theorem environments
%% \usepackage{amsthm}
%% The lineno packages adds line numbers. Start line numbering with
%% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
%% for the whole article with \linenumbers after \end{frontmatter}.
\usepackage{lineno}
%% natbib.sty is loaded by default. However, natbib options can be
%% provided with \biboptions{...} command. Following options are
%% valid:
%% round - round parentheses are used (default)
%% square - square brackets are used [option]
%% curly - curly braces are used {option}
%% angle - angle brackets are used <option>
%% semicolon - multiple citations separated by semi-colon
%% colon - same as semicolon, an earlier confusion
%% comma - separated by comma
%% numbers- selects numerical citations
%% super - numerical citations as superscripts
%% sort - sorts multiple citations according to order in ref. list
%% sort&compress - like sort, but also compresses numerical citations
%% compress - compresses without sorting
%%
%% \biboptions{comma,round}
% \biboptions{}
% if you have landscape tables
\usepackage[figuresright]{rotating}
% put your own definitions here:
% \newcommand{\cZ}{\cal{Z}}
% \newtheorem{def}{Definition}[section]
% ...
% add words to TeX's hyphenation exception list
%\hyphenation{author another created financial paper re-commend-ed Post-Script}
% declarations for front matter
\begin{document}
\begin{frontmatter}
%% Title, authors and addresses
%% use the tnoteref command within \title for footnotes;
%% use the tnotetext command for the associated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for the associated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for the associated footnote;
%% use the ead command for the email address,
%% and the form \ead[url] for the home page:
%%
%% \title{Title\tnoteref{label1}}
%% \tnotetext[label1]{}
%% \author{Name\corref{cor1}\fnref{label2}}
%% \ead{email address}
%% \ead[url]{home page}
%% \fntext[label2]{}
%% \cortext[cor1]{}
%% \address{Address\fnref{label3}}
%% \fntext[label3]{}
\dochead{}
%% Use \dochead if there is an article header, e.g. \dochead{Short communication}
\title{Insights into Precipitation Estimation from Geostationary Satellites using Deep Learning Models}
%% use optional labels to link authors explicitly to addresses:
%% \author[label1,label2]{<author name>}
%% \address[label1]{<address>}
%% \address[label2]{<address>}
\author[1]{Pablo R. Larraondo}
\author[1]{Luigi J. Renzullo}
\author[1]{Albert I. J. M. van Dijk}
\address[1]{Fenner School of Environment and Society. Australian National University. Canberra, Australia}
\begin{abstract}
This work explores the application of convolutional neural networks for estimating precipitation using geostationary satellite images. Using data from Himawari 8 and the IMERG precipitation product over Australia, we train a large number of models to identify the combinations of Himawari 8 bands that result in the most accurate models at estimating precipitation. The resulting models are compared with baseline machine learning and physics based models to provide evidence of their improved ability to estimate precipitation. The results shown in this work provide new insights and challenge common practices in previous machine learning and physics based methodologies in this area.
\end{abstract}
\begin{keyword}
remote sensing \sep machine learning \sep weather forecasting
%% MSC codes here, in the form: \MSC code \sep code
%% or \MSC[2008] code \sep code (2000 is the default)
\end{keyword}
\end{frontmatter}
\footnotetext{P. R. Larraondo carried out the research and wrote the manuscript.}
\footnotetext{L. J. Renzullo contributed designing the experiments and writing the manuscript.}
\footnotetext{A. I. J. M. van Dijk contributed structuring and writing the manuscript.}
%%
%% Start line numbering here if you want
%%
\linenumbers
%% main text
\section{Introduction}
\label{}
Reliable and accurate estimation of surface precipitation over continuous geographical areas remains an active and challenging research topic, with implications for predicting Earth surface processes driven by rainfall, such as water resources generation, flooding and vegetation growth, among others. Models of these processes are very sensitive to precipitation input data and require accurate representations of precipitation in space and time. Different sensors can be used to measure precipitation, with varying levels of accuracy and resolution. Rain gauges remain the most accurate quantitative measurement method for a given location, but the area for which they are representative can be small. Therefore, other sources of information are required to provide adequate spatial detail. Ground-based rainfall radar, satellite radiometers, space-borne radar instruments and numerical weather prediction models constitute nowadays the main sources to estimate precipitation. Each of these sources has its advantages and limitations associated with their inherent reliability, accuracy, coverage and spatiotemporal resolution. Combining or merging different precipitation estimates to generate improved estimations constitutes an important and active area of research in atmospheric science \citep{gourley2002exploratory,sideris2014real,nerini2015comparative,hasan2016merging}. An important challenge is to identify and incorporate the strengths of each information source through optimal preprocessing, weighting and merging schemes (e.g., \citealp{beck2019mswep,beck2020evaluation}).
Geostationary satellites provide a frequent and reliable source of information describing the state and evolution of the atmosphere over fixed regions of the Earth. Passive geostationary radiometers do not measure precipitation directly; it is indirectly inferred using its relationship with the reflected and emitted signals captured by the sensor. The main advantage of geostationary satellites is their high spatial resolution (0.5-5 km) and frequency (5-15 minutes) compared to other sources.
So far, precipitation retrievals from geostationary satellites have typically been generated using models based on the physics of the atmosphere that relate precipitation to observed radiation in particular spectral bands. Specifically, higher cloud tops from convective cells show colder temperatures in the Infra-Red (IR) channel which are normally associated with higher rain rates \citep{scofield1987nesdis,vicente1998operational}. Other bands in the Water Vapour (WV) channels have also been shown to indicate certain types of precipitation. For example, heavy precipitation from deep convective cells can be detected using the difference between IR and WV channels \citep{kurino1997satellite,schmetz1997monitoring}. A disadvantage of these methodologies is the need for comparatively complex methods to tune model parameters for a specific region and precipitation type \citep{aemetsaf2013}. Furthermore, the accuracy of derived precipitation estimates tends to be low, especially when evaluated at small scales \citep{arkin1987relationship}.
An alternative approach is to derive a relationship between the observed multi-spectral measurements and the instantaneous precipitation measured using an independent source or methodology. Available methods include machine-learning models that are able to identify and learn the relationship between predictor and precipitation fields. A notable contribution is the PERSIANN-MSA model \citep{behrangi2009persiann}, which has demonstrated the benefit of including other input multi-spectral bands using Artificial Neural Networks (ANNs). More recent studies have built upon the same idea. For example, \citet{beusch2018satellite} compared the accuracy of generalized linear models and ANNs to estimate precipitation using geostationary satellite data.
During the last decade, advances in machine learning have spawned a new generation of models based on neural networks that are commonly referred to as Deep Learning \citep{lecun2015deep}. These models have been able to demonstrate a unique capacity to model complex non-linear processes by defining ANNs with a large number of hidden layers. Convolutional Neural Networks (CNNs) are one type of these networks and have been very successful in the field of computer vision. Since their popularisation in the early 2010s, CNNs have achieved unprecedented results in other scientific domains, such as medical imaging \citep{ronneberger2015u} or weather forecasting \citep{larraondo2019data,rasp2020weatherbench}. The advantage of CNN models is their ability to integrate the spatial context that is inherent in image data into the model. Atmospheric processes associated with precipitation usually have well-defined structures. Therefore, the capacity of CNNs to understand spatial structure in images, beyond a ``per-pixel'' approach, could potentially make this type of model very suitable for deriving precipitation from geostationary satellites.
The application of CNNs for modelling precipitation from geostationary satellites was recently explored in the context of nowcasting \citep{lebedev2019precipitation}. Their findings demonstrate the advantage of using CNNs to incorporate the spatial context, by showing that simpler models trained on individual pixels produce inferior results. Similar CNN models, which introduce a temporal dimension, have been explored to enhance precipitation forecasts \citep{sonderby2020metnet}. Unfortunately, neither study provided the models and training data or described them in sufficient detail to be understood or reproduced. However, it appears both studies followed a similar methodological approach, i.e., discretizing the output precipitation estimates into categories, and reducing the problem to a classification task.
We wished to pursue a different approach by using a similar CNN model to produce quantitative precipitation estimates through a continuous or regression approach. Furthermore, we used publicly available data and published the source code online to ensure reproducibility of the results. Moving beyond the ``black box'' use of ANNs, we sought to use them to explore the relationship between each spectral band and precipitation, and to identify the combination of bands that provides the most accurate estimates. Specifically, we sought to answer two questions:
1) Can a CNN and passive geostationary observations be combined to produce precipitation estimates with an accuracy that is commensurate to that of other methods?
2) Which combination of spectral bands produces the most accurate precipitation estimates?
\section{Data and methodology}
This section describes the data sources and preprocessing steps performed to create the datasets and the experimental design followed to train and evaluate the models. In a nutshell, this work explores the use of U-net, a type of CNN model, to learn the relationship between geostationary Himawari 8 spectral bands and estimated precipitation coming from the IMERG satellite product. To evaluate the models we consider three regions in Australia and use the F1 categorical measurement, which combines precision and recall in a single metric. Accuracy of models trained using different combinations of the input Himawari 8 spectral bands is compared to identify the set of bands that produces the most accurate precipitation estimations. The accuracy of the resulting model is also compared with a popular machine learning methodology and an operational physical model. The rest of this section provides a detailed description of the data and experimental design.
\subsection{Data}
Himawari 8 is a geostationary satellite operated by the Japan Meteorological Agency \citep{bessho2016introduction}. The satellite is positioned over 140.7$^\circ$ E, and is equipped with a 16-band multi-spectral sensor, the Advanced Himawari Imager (AHI), spanning the wavelength range between [0.4-13.4] \textmu m. The spatial resolution of the generated images varies for the spectral bands and ranges between 0.5 and 2 km and there is a new full-disk image available every 10 minutes.
The spectral bands of Himawari 8 are able to sense the reflected and emitted radiations of the Earth. For this work we focus on the emission bands exclusively as our purpose is to develop a methodology that is independent of time-of-day. Specifically, the spectral region of interest is between 6 - 13 \textmu m, which corresponds to Bands 8 to 16 in Himawari 8. These 9 bands provide consistent information about the Earth's surface and atmosphere conditions during day and night. These are the input data we use to train the models for estimating precipitation.
The Integrated Multi-satellitE Retrievals (IMERG) is an algorithm that combines information from the Global Precipitation Measurement (GPM) satellite constellation \citep{huffman2015integrated}. IMERG provides precipitation estimates for the majority of the Earth's surface with 10-km resolution at 30 minute intervals.
There are three versions of the IMERG product [``Early'', ``Late'', ``Final''] which are processed with increasing latency and accuracy. The ``Final'' product is a merged satellite-gauge product and becomes available with a latency of around 4 months after observation. IMERG ``Final'' uses both forward and backward morphing and includes monthly gauge measurements into the algorithm resulting in the most accurate estimations of precipitation from all IMERG products. We use the ``Final'' version of IMERG to carry out the experiments in this work.
The 9 emittance bands of Himawari 8 constitute the predictor variables and the calibrated precipitation from IMERG is the predictand in this work.
\subsection{Data preparation}
Himawari 8 and IMERG products are made available from their providers using different spatial and temporal resolutions and map projections. To train our models, we need to transform the data into a common coordinate reference system and resolution, so grid points represent the same locations on both products. A desirable characteristic in scientific applications is to represent geospatial gridded data using coordinate reference systems that preserve the area of pixels regardless of their location. This is especially important when using machine learning methods to derive physical parameters such as precipitation. Non-equal area projections would report different amounts of precipitation depending on the area, misleading the learning process of the models. In this work we choose the Australian Albers Equal Area projection (EPSG:3577) as the common projection for both datasets. We use the GDAL library \citep{warmerdam2008geospatial} to reproject the datasets using bilinear interpolation. IMERG is reprojected into 8km side pixels whereas Himawari 8 is reprojected using 2km side pixels. These numbers approximately match their original resolutions and allow for an even training of the CNN model as they are both powers of 2 -- this is better explained in the section that introduces the model.
The IMERG-Final product is generated every 30 minutes, aggregating satellite data gathered during each half-hour to create an estimate of the precipitation accumulated during each period in millimeters per hour [mm/h]. Himawari 8 is available every 10 minutes. To match the temporal dimensions in both datasets we choose the Himawari 8 image that is closest to the centre of the accumulation period in IMERG. This means using the [10, 40] minutes timestamps of every hour in Himawari 8 to match the [00-30, 30-60] minutes accumulations of IMERG respectively.
For this study we consider three regions of equal area in different parts of Australia and a period of 4 months, from the 1st of November 2018 until the 28th of February 2019. Each region is a square of 1024 km per side covering the South-Eastern region (SE), Northern Territory (NT) and Western Australia (WA). Figure \ref{regions} represents the position and extent of these three regions in Australia.
\begin{figure}%
\includegraphics[width=12cm]{regions.png}
\caption{Geographical location of the three regions used in the experiments.}%
\label{regions}%
\end{figure}
Each of the regions presents different climatic conditions and precipitation patterns. The 4-month period from November to February covers the majority of the rainy season, when large convective cells develop over the continent. The resulting Himawari 8 dataset is composed of a set of cubes with 512 points along the northing and easting dimensions and 5760 points along the temporal dimension. We considered 9 bands and 3 regions, which results in 27 cubes of data. In the case of IMERG, equivalent preprocessing steps result in 3 cubes, one for each region, with 128 points along the northing and easting dimensions and the same 5760 points in time.
\begin{figure}%
\includegraphics[width=14cm]{dataset.png}
\caption{Sample of Himawari 8 bands and corresponding GPM precipitation (bottom-right) for the SE region.}%
\label{dataset}%
\end{figure}
Figure \ref{dataset} shows one random sample from the SE region dataset representing the Himawari 8 emission bands and the corresponding IMERG precipitation. As an example of the different sensitivity shown by different parts of the spectrum, the precipitation structure at the bottom part of Figure \ref{dataset} is only noticeable for the spectral bands above 8 \textmu m.
\subsection{Model}
To carry out the experiments in this work we use a type of CNN model called U-net \citep{ronneberger2015u}. U-net is an implementation of a CNN encoder-decoder which has demonstrated state-of-the-art capacity to learn complex non-linear mappings between image domains, generating models that are able to take into account the spatial information and context in the images. Although U-net was originally proposed for segmenting medical images, it has since found applications in other domains and tasks, such as image regression. In this work we use U-net to learn a model that is able to map between Himawari 8 emission bands and the corresponding IMERG precipitation estimation. U-net models have been used to estimate precipitation from Numerical Weather Prediction (NWP) gridded geopotential fields \citep{larraondo2019data}. In this work, we use a similar approach applied to higher resolution remotely sensed data for estimating precipitation.
U-nets, and CNN encoder-decoder architectures in general, combine convolution with subsampling (upsampling) operations to compress (decompress) the image space between two domains. In this work we use U-net to learn the relationship between the Himawari 8 emittance domain and the precipitation domain of the IMERG product. Figure \ref{model_cmp} represents the U-net architecture proposed in this work. Himawari 8 images are the input to the network, on the left of the figure, and IMERG precipitation is the target variable or output, represented on the right side. The intermediate, or hidden, layers perform convolutions to first reduce the spatial extent and expand the number of channels in the image (encoder) and then to expand the spatial extent and decrease the number of channels (decoder). The image size is halved in each hidden layer of the encoder section, allowing convolution kernels to double their spatial domain and find relationships between pixels that are progressively further away. The middle part of the network (the so-called bottleneck) contains a compressed latent representation of the relationship between the input and output domains. The direct connections between hidden layers in the encoder and decoder parts, also referred to as skip connections, concatenate the encoder hidden layers into the decoder allowing the network to recover the spatial information that would be otherwise lost during the subsampling process performed by the encoder.
\begin{figure}%
\includegraphics[width=14cm]{unet.png}
\caption{CNN U-net architecture transforming Himawari 8 input images into IMERG precipitation estimations.}%
\label{model_cmp}%
\end{figure}
The U-net model used in this work contains a few modifications with respect to its basic implementation. First of all, images from Himawari 8 and IMERG, resulting from the preprocessing step previously described, have different resolutions and require an asymmetrical version of U-net in which the encoder has two more hidden layers than the decoder, as represented in Figure \ref{model_cmp}. Another modification introduced in the proposed U-net model is the application of a Rectified Linear Unit (RELU) operation in the last layer of the network to eliminate negative values in the output of the network. This operation constrains the model to generate only positive values in accordance with the physical nature of precipitation. Finally, we use the Mean Squared Error (MSE) loss function to train a regression model that generates quantitative precipitation, as opposed to segmentation, a kind of classification task, which U-net was initially designed for. The model presented in this work is implemented using Keras \citep{chollet2018keras}.
\subsection{Experimental design}
To evaluate the performance of the proposed model we propose two types of experiments. First we use the previously described U-net model to perform a variable selection process iterating over the input Himawari 8 bands. The objective is to identify the bands that are most related to precipitation, in order of importance. The second part builds upon these results and performs a comparison between the accuracy of the proposed model and other baseline machine learning and physics based methodologies.
The variable selection identifies the Himawari 8 bands that result in the most accurate models at estimating precipitation, as measured by IMERG. To do this, we train identical models using different combinations of the input bands as input and the corresponding IMERG precipitation as output and compare their accuracy. Accuracy is measured using categorical statistics using the 0.2 mm/h precipitation threshold to ensure an evaluation metric independent from MSE, used to train the models. We use the F1 score, which combines recall (aka sensitivity, hit rate or probability of detection) and precision (aka positive prediction value or success ratio) into one single metric.
\[
F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]
To ensure fairness in the evaluation of the models, we propose using a 4-fold cross-validation, leaving out one month of data each time for testing and training with the remaining three months. Validating on complete months, as opposed to shuffling the entire dataset, is particularly important in this case because of the high correlation between temporally close samples. The models are trained using a form of stochastic gradient descent, that shuffles and splits in batches the training dataset. This form of training is not deterministic and results in models that can show slightly different results for the same input. To minimise the influence of particular input configurations, we propose to train five versions of each model using different seeds to randomise the order of the input data. As a result, we train 60 models to evaluate each combination of input bands, which is the product of evaluating 3 geographical locations, 4 cross-validation experiments and 5 random number seeds.
The number of combinations of the input bands grows quickly as the number of elements increases. If the number of input bands is one, there are 9 possible models, one for each Himawari 8 input band. For the case of two input bands there are 36 possible combinations and this number increases to 84 when we consider three input bands. Given that evaluating each set of input bands would require training 60 models, and the high computational cost of training each of these models, an exhaustive search through all the possible combinations of input bands is infeasible. Alternatively, we propose a methodology for identifying the best combination of input bands incrementally, selecting and adding one band to the set in each iteration. This method starts by identifying the single band that generates the most accurate model. In a second iteration, a new set of models is trained using the previously identified band, adding each of the remaining bands, one at a time. The process continues adding new bands to the set until there is no improvement in the accuracy of the resulting models. This process assumes that the relative improvement in accuracy shown by individual bands remains consistent when new bands are added to the set. We acknowledge the drawbacks of this non-exhaustive methodology, discussed in Section 4, but we benefit from this simplification to make the search process feasible and to identify the set of input Himawari 8 bands that results in the most accurate models. Given the large number of models and the size of the data, we train the U-net models in parallel using the CSIRO Bracewell HPC cluster equipped with NVIDIA Tesla P100 GPU accelerators.
To understand the relative improvement in accuracy of U-net models, we perform a comparison with one operational physics based model and an alternative machine learning methodology. First, we compare the output of the resulting model with Convective Rain Rate (CRR) \citep{aemetsaf2013}. CRR is a product originally developed by the Spanish national weather agency (AEMet) to generate convective precipitation retrievals from the Meteosat geostationary satellite. This methodology was recently adapted by the Australian Bureau of Meteorology to work with Himawari 8, using the IR band 8 and the (IR-WV) difference bands 8 and 13. The comparison also includes a regression Random Forest (RF) model \citep{breiman2001random} trained on the same data but on a per pixel basis. This should provide an indication of the relative importance of the spatial context in CNN models.
\section{Results}
\subsection{Himawari 8 band selection process}
Following the process described in the previous section, we start the input variable selection process by training a set of models using each of the Himawari 8 bands as input to the models. Evaluating 9 input bands requires training 540 models -- 60 models per input combination. The total training time of this experiment exceeds 1000 hours on the GPU cluster. Figure \ref{experiment_1a} represents a) precision scores, b) recall scores and c) F1 scores, summarising the previous two scores, for the three considered regions.
The results presented in Figure \ref{experiment_1a} combine the categorical statistics of individual models trained with each band. Each box plot in the figure is generated by aggregating 20 scores -- 4 cross-validation folds using 5 random number seeds. The box plots represent the statistical distribution of the accuracy results for each band and location using the 0.2 mm/h threshold. Higher F1 values imply better accuracy of the models.
\begin{figure}%
\centering
\subfloat[Precision scores]{{\includegraphics[width=8cm]{figure1a.png} }}%
\quad
\subfloat[Recall scores]{{\includegraphics[width=8cm]{figure1b.png} }}%
% Leave one line blank for new row
\subfloat[F1 scores]{{\includegraphics[width=14cm]{figure1c.png} }}%
\caption{Comparison between (a) precision, (b) recall and (c) F1 scores for each of the three regions evaluated using a precipitation threshold of 0.2 mm/h}%
\label{experiment_1a}%
\end{figure}
To facilitate the interpretation of the results we combine the F1 scores of the three regions considered. Due to the relative difference in accuracy in each region, the F1 scores are normalised before the aggregation, dividing by the median F1 score of the best performing band for each location, band 11 [8.44–8.76] \textmu m, in the previous figure. Figure \ref{experiment_1b} represents the normalised F1 scores aggregating the three locations. This figure shows a pattern with higher F1 scores for the bands at the centre of the spectrum and a decrease in performance towards the sides. Bands 11 [8.44–8.76] \textmu m and 13 [10.3–10.6] \textmu m appear as the best performing bands. We choose band 11 as the single best band to continue the variable selection process, although we acknowledge the small statistical difference between the scores of bands 11 and 13. Band 13 remains as a candidate to be picked up in the next iteration of the variable selection process.
\begin{figure}%
\includegraphics[width=14cm]{figure2.png}
\caption{Comparison between the combined and normalised F1 scores [0.2 mm/h] for the models trained with each Himawari 8 band.}%
\label{experiment_1b}%
\end{figure}
We continue the experiment performing a similar iteration with two Himawari 8 bands as input to the models this time. As explained in the previous section, we limit the variable search space by fixing band 11, based on our previous results, and adding the rest of the bands one at a time. Similarly to the first iteration, we train another batch of 520 models to test the accuracy of the models.
\begin{figure}%
\includegraphics[width=14cm]{figure3.png}
\caption{Comparison between the combined and normalised F1 scores [0.2 mm/h] for the models resulting from fixing band 11 and iterating through the rest of the bands.}%
\label{experiment_2}%
\end{figure}
Figure \ref{experiment_2} represents the validation results for the models trained using two Himawari 8 input bands and normalised using the same median values used in the previous Figure \ref{experiment_1b}. Overall, we observe an increase in the scores compared with the previous figure -- the horizontal gray dotted line represents the best score in Figure \ref{experiment_1b} as reference. This means that adding new bands provides useful information to the models improving their precipitation estimates.
Figure \ref{experiment_2} indicates that models identify more valuable information in the bands that are further away in the spectrum to band 11. For example, band 8 [6.06–6.43] \textmu m, on the left-hand side, and band 16 [13.2–13.4] \textmu m, on the right, seem to better complement the information provided by band 11 [8.44–8.76] \textmu m than closer bands. Using this plot we identify the combination of bands 11 and 16 as the best performing based on their F1 scores.
The next iteration of models trained with three input bands results in a new batch of 500 models. These models use bands 11 and 16, and iterate through the remaining bands one at a time. Figure \ref{experiment_3} represents the F1 scores of the models trained using the different combinations of three input bands -- the horizontal gray dotted line represents the best score in the previous experiment considering two bands. The F1 scores for this iteration show similar or lower values compared to the previous one. This serves as an indication that adding an extra band to the combination of bands 11 and 16 does not improve the accuracy of the considered models.
\begin{figure}%
\includegraphics[width=14cm]{figure4.png}
\caption{Comparison between the combined and normalised F1 scores [0.2 mm/h] for the models resulting from fixing Bands 11 \& 16 and iterating through the rest of the bands.}%
\label{experiment_3}%
\end{figure}
\subsection{Methodology comparison}
To perform this comparison we train the RF models using a similar cross-validation methodology used for the proposed CNN models. Bands 8, 11, 13 and 16 are used as inputs to match the information used by CRR and the CNN models. The precipitation estimations generated by the resulting models and CRR are then compared using the F1 score at different thresholds. These scores are aggregated for the different locations and cross validations. To perform the comparison we use the mean of the ensemble estimations for each pixel.
Figure \ref{model_cmp} offers a visual comparison between the F1 scores obtained for the different models. The CNN model performs significantly better than the other methodologies in the comparison. This is justified by the larger capacity of the model but also its ability to incorporate the spatial context. This is especially noticeable at the higher threshold values, where the superior accuracy of the CNN model is likely to be related to the model's understanding of the size and structure of convective cells.
\begin{figure}%
\includegraphics[width=8cm]{model_cmp.png}
\caption{Comparison between F1 scores between machine learning methods at different rainfall thresholds for the SE region}%
\label{model_cmp}%
\end{figure}
\section{Discussion}
To motivate the discussion in this section we start presenting a visual comparison between some of the validation outputs of the models previously trained together with the corresponding reference IMERG precipitation. Figure \ref{output_cmp} compares the output of three U-net models trained using band 8 [6.06–6.43] \textmu m, band 11 [8.44–8.76] \textmu m and the combination of bands 11 and 16 [8.44–8.76], [13.2–13.4] \textmu m showing one sample for each region. The column on the left corresponds to the model trained using band 8 [6.06–6.43] \textmu m, which was identified as the one with the lowest performance in the first iteration of our experiment. The precipitation generated using this model is able to identify the main areas of precipitation but largely underestimates the amount and spatial extent of rain. The second column corresponds to the outputs of the models trained using band 11 [8.44–8.76] \textmu m, identified as the best single performing band. These models offer a clear improvement in both the amount and extent of estimated precipitation compared to the previous one. Finally, the third column represents the combination of bands 11 and 16 [8.44–8.76], [13.2–13.4] \textmu m. These models present similar extents to the ones coming from band 11, but significantly improve the detection of intense precipitation areas.
\begin{figure}%
\includegraphics[width=14cm]{output_cmp.png}
\caption{Visual comparison between the rain estimated by different models trained with band 8, band 11 and bands 11 \& 16. The image on the right represents the reference IMERG precipitation}%
\label{output_cmp}%
\end{figure}
As shown in the results section, adding more than two Himawari 8 bands to train precipitation models does not lead to more accurate precipitation estimations. This result challenges the practice of using all the bands to train models seen in several works in the reviewed literature. Given the high data volume and computational cost of training CNN models, we think that this is a step forward in delivering computationally efficient and accurate precipitation models. In addition, the bands that we identify in this work challenge the traditional practice of using specific IR and WV bands, which opens the door to explore the physical interpretations of the results.
In general, the precipitation models trained with Himawari 8 are not able to reproduce the sharp contours and detailed spatial structure of precipitation systems detected by IMERG. In the presence of clouds, geostationary satellites sense thermal emissions coming mainly from the top part of the clouds. The relationship between emission bands from cloud tops and precipitation seems to be partial and not enough to estimate accurate precipitation considering these results and methodologies described.
Similarly to the previous figure, Figure \ref{baseline_cmp} compares the output of U-net with baseline machine learning and physically based models. The column on the right represents the output of the RF models trained using 4 Himawari 8 bands. Although these simple models are able to detect precipitation areas, slightly overestimating their extent, they lack structure and the capacity to identify the areas with high rates of precipitation. The physics based model CRR, represented on the second column from the right, shows a large inconsistency at identifying the intensity of precipitation cells. CRR uses a series of coefficients to estimate precipitation. The coefficients used to generate the output in this study have been produced by the Australian Bureau of Meteorology fitting the CRR model with Himawari 8 data and the local precipitation conditions in Australia. The results in this work seem to indicate that these coefficients might need to be fitted taking into account regional factors and the type of precipitation. The difference in performance between the study areas (WA, NT, SE) might suggest a need for local model fitting, which can readily be done with NN as opposed to the more traditional calibrated physical modelling approaches. U-net shows an overall improvement at estimating both the structure and quantity of precipitation compared to the baseline models. This figure also provides a visual proof of the capacity of U-nets to incorporate the spatial structure in precipitation compared to other methodologies that operate on individual pixels.
\begin{figure}%
\includegraphics[width=14cm]{baseline_cmp.png}
\caption{Visual comparison between the output generated by U-net and baseline models for different precipitation situations.}%
\label{baseline_cmp}%
\end{figure}
While we focused on geostationary data, we know that there is a fundamental limitation of the approach given uncertainty in cloud-top temperature association with rainfall rate in terms of quantity and location. The method we propose can readily incorporate additional independent data, such as inputs from numerical weather models \citep{larraondo2019data}.
We also bring up into the discussion the metrics and methodologies used to train and validate high-resolution, gridded precipitation data. Metrics such as MSE suffer from the double penalty effect, penalising models that detect the right amounts of precipitation but fail to place it in the right location \citep{anthes1983regional,mass2002does}. This is especially relevant when combining data from different sensors where pixels are not necessarily acquired at the same time -- this is the case in this work. The proposed U-net models are trained using the MSE loss function, which penalises differences on a pixel per pixel basis. Other loss functions based on optimal transport metrics that account for spatial displacements, such as the Wasserstein distance, have been proposed in the literature \citep{farchi2016using} and might be better suited for this problem. This is a promising area and something we want to explore in the future.
Another avenue for improving the results presented in this work is to use models that can incorporate the temporal dimension in the data. The atmosphere and precipitation evolve through time following well-defined physical laws. From a purely data-driven consideration, successive images show high correlations which add valuable information to a model for generating estimates that are accurate and consistent through time. There are several approaches that allow incorporating the temporal dimension into neural networks, such as recurrent architectures or 3-dimensional convolutions. The application of these and other architectures and their capacity to incorporate the temporal dimension into the model is something that remains to be studied.
\section{Conclusion} %% \conclusions[modified heading if necessary]
This work explores the application of U-net, a deep learning CNN, to estimate precipitation using Himawari 8 retrievals. We propose using the IMERG ``Final'' precipitation data to train and evaluate the models at estimating precipitation. The models are evaluated considering three locations in Australia and a period of four months between 2018 and 2019.
Traditional methodologies for rainfall estimation from geostationary satellites are mainly based on capturing cloud physical processes characterized by the differential absorption of IR radiation from water vapor and the atmospheric window bands. This work provides a purely data-driven method for assessing these relationships and perhaps suggests alternative groups of bands. U-nets, and CNNs in general, are shown to be capable of capturing spatial relationships between pixels that are relevant for identifying cloud structures and precipitation patterns.
In summary, we demonstrate that U-nets offer an advantage at estimating quantitative precipitation from geostationary satellite data when compared to other machine learning and physics based methodologies.
\section{Code availability}
The code used to run the experiments presented in this manuscript is available at: \href{https://github.com/prl900/unet_him8_gpm}{https://github.com/prl900/unet\_him8\_gpm} under an Apache License 2.0.
\section{Acknowledgements}
The authors acknowledge the financial support given by the Earth Systems Science Organization, Ministry of Earth Science, Government of India (Grant No.IITM/MM-II/ANU/2018/INT-8) to conduct this research under the Monsoon Mission.
%% The Appendices part is started with the command \appendix;
%% appendix sections are then done as normal sections
%% \appendix
%% \section{}
%% \label{}
%% References
%%
%% Following citation commands can be used in the body text:
%% Usage of \cite is as follows:
%% \cite{key} ==>> [#]
%% \cite[chap. 2]{key} ==>> [#, chap. 2]
%%
%% References with BibTeX database:
\bibliographystyle{elsarticle-num}
\bibliography{references.bib}
%% Authors are advised to use a BibTeX database file for their reference list.
%% The provided style file elsarticle-num.bst formats references in the required Procedia style
%% For references without a BibTeX database:
% \begin{thebibliography}{00}
%% \bibitem must have the following form:
%% \bibitem{key}...
%%
% \bibitem{}
% \end{thebibliography}
\end{document}
%%
%% End of file `ecrc-template.tex'.