This repository provides the preprocessed datasets used in the study "Temperature forecasting by deep learning methods" by Gong et al. (2022). It allows users to reproduce the presented results without running the preprocessing chain on the raw ERA5 data.
Data description
The datasets used to train, validate, and test the deep neural networks are based on the ERA5 reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Five different datasets have been created. All incorporate data between the years 2007 and 2019, but they cover slightly varying domains over Central Europe and include different meteorological variables.
The datasets are made available as compressed tar archives (see Storage Location URL below). The file names encode some meta-information using the following naming convention:
ERA5-Y[yyyy]-[yyyy]M[mm]to[mm]-[nx]x[ny]-[nn.nn]N[ee.ee]E-[var1]_[var2]_[var3]
where
- Y[yyyy]-[yyyy]M[mm]to[mm] denotes the years and the months describing the data period,
- [nx]x[ny] is the number of grid points/pixels of the target domain in longitude and latitude direction,
- [nn.nn]N[ee.ee]E stands for the geographical coordinates (in degrees) of the target domain's south-west corner, and
- [var1]_[var2]_[var3] denotes the short names of the variables according to ECMWF's parameter database.
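As an illustration, the naming convention can be parsed programmatically. The following sketch (the regular expression and the returned field names are our own, not part of the dataset) extracts the encoded metadata from one of the archive names listed below; note that the coordinates are encoded without a decimal point, e.g. 3840N corresponds to 38.40°N:

```python
import re

# Pattern for the naming convention described above (our own construction).
PATTERN = re.compile(
    r"era5-Y(?P<y0>\d{4})-(?P<y1>\d{4})"
    r"M(?P<m0>\d{2})to(?P<m1>\d{2})"
    r"-(?P<nx>\d+)x(?P<ny>\d+)"
    r"-(?P<lat>\d{4})N(?P<lon>\d{4})E"
    r"-(?P<vars>.+)\.tar\.bz2"
)

def parse_archive_name(name):
    """Extract the metadata encoded in an archive file name."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized archive name: {name}")
    g = m.groupdict()
    return {
        "years": (int(g["y0"]), int(g["y1"])),
        "months": (int(g["m0"]), int(g["m1"])),
        "grid": (int(g["nx"]), int(g["ny"])),
        # Coordinates are encoded without the decimal point: 3840N -> 38.40 deg N.
        "sw_corner": (int(g["lat"]) / 100.0, int(g["lon"]) / 100.0),
        "variables": g["vars"],
    }

meta = parse_archive_name(
    "era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t850.tar.bz2"
)
print(meta["grid"], meta["sw_corner"])  # (92, 56) (38.4, 0.0)
```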
In particular, the following datasets are provided:
1) era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t850.tar.bz2: The target domain extends from 38.4°N to 54.9°N and 0.0°E to 27.3°E (92x56 grid points). The 2m-temperature (2t), the total cloud cover (tcc), and the 850 hPa temperature (t850) are included as variables. This data corresponds to Dataset IDs 1-3 in Table A1 of the manuscript.
2) era5-Y2007-2019M01to12-80x48-3960N0180E-2t_tcc_t850.tar.bz2: The target domain extends from 39.6°N to 53.7°N and 1.8°E to 25.5°E (80x48 grid points). The 2t, tcc, and t850 are included as variables. This data corresponds to Dataset ID 4 in Table A1 of the manuscript.
3) era5-Y2007-2019M01to12-72x44-4020N0300E-2t_tcc_t_850.tar.bz2: The target domain extends from 40.2°N to 53.1°N and 3.0°E to 24.3°E (72x44 grid points). The 2t, tcc, and t_850 are included as variables. This data corresponds to Dataset ID 5 in Table A1 of the manuscript.
4) era5-Y2007-2019M01to12-80x48-3960N0180E-2t_t850.tar.bz2: The target domain extends from 39.6°N to 53.7°N and 1.8°E to 25.5°E (80x48 grid points). The 2t and the t850 are the only variables included. This dataset is a subset of No. 2. This data corresponds to Dataset ID 6 in Table A1 of the manuscript.
5) era5-Y2007-2019M01to12-80x48-3960N0180E-2t.tar.bz2: The target domain extends from 39.6°N to 53.7°N and 1.8°E to 25.5°E (80x48 grid points). Only 2t is included. This dataset is also a subset of No. 2. This data corresponds to Dataset ID 7 in Table A1 of the manuscript.
Data creation
The original ERA5 data can be retrieved from the MARS archive. Once access is granted, the data can be downloaded by specifying a resolution of 0.3° in the retrieval script.
The datasets provided in this repository are the processed ERA5 data after the extraction step and the two preprocessing steps of the Atmospheric Machine learning Benchmarking System (AMBS) workflow tool (more details are provided in the README of the corresponding code repository). The data is available in the TFRecord format, which is used directly in the training step.
Data access and decompression
Data are stored in the archived and compressed tar.bz2 format and are available via:
https://datapub.fz-juelich.de/esde/esde-nfs/online_publication/2mT_by_DL/
After downloading, the compressed archives can be unpacked on Linux using
tar xjf [filename].tar.bz2
On Windows, decompression can be performed using WinZip.
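As a platform-independent alternative to the commands above, the archives can also be unpacked with Python's standard library. The sketch below builds a tiny throwaway bz2-compressed tar archive on the fly purely to demonstrate the extraction step; the file names inside it are placeholders, not the real dataset contents:

```python
import io
import tarfile
import tempfile
from pathlib import Path

def extract_archive(archive_path, target_dir):
    """Unpack a .tar.bz2 archive, equivalent to 'tar xjf <archive>'."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        tar.extractall(path=target_dir)

# Demonstration with a throwaway archive (not one of the ERA5 files).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "demo.tar.bz2"
    with tarfile.open(archive, mode="w:bz2") as tar:
        payload = b"dummy payload"
        info = tarfile.TarInfo(name="pickle/X_01.pkl")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    extract_archive(archive, tmp / "out")
    print((tmp / "out" / "pickle" / "X_01.pkl").read_bytes())
```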
Dataset content
After decompressing, the following subdirectory structure is created from each compressed tar-archive:
- tfrecords_seq_len_[sequence_length]: This folder holds the TFRecord files that are streamed to the deep neural networks during training and postprocessing. Each TFRecord file contains 10 samples, where each sample comprises a sequence over [sequence_length] hours.
- pickle: This folder contains the normalized hourly data saved in monthly pickle files (X_[month].pkl). The corresponding timestamps are included in T_[month].pkl. Furthermore, statistical information for each month is provided in the files stat_[month].json.
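The monthly pickle files can be inspected with Python's standard pickle module. The loader below is a sketch only: it assumes plain pickle serialization and a particular month-numbering in the file names ("X_01.pkl" etc.), neither of which is guaranteed by this description, and it demonstrates the round trip with synthetic stand-in data rather than the real ERA5 arrays:

```python
import pickle
import tempfile
from pathlib import Path

def load_month(data_dir, month):
    """Load normalized data and timestamps for one month.

    Assumes plain pickle files named X_<month>.pkl / T_<month>.pkl
    (the exact naming is an assumption, see the lead-in above).
    """
    data_dir = Path(data_dir)
    with open(data_dir / f"X_{month:02d}.pkl", "rb") as f:
        x = pickle.load(f)
    with open(data_dir / f"T_{month:02d}.pkl", "rb") as f:
        t = pickle.load(f)
    return x, t

# Demonstration with synthetic stand-in data (not the real ERA5 arrays).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    with open(tmp / "X_01.pkl", "wb") as f:
        pickle.dump([[0.1, 0.2], [0.3, 0.4]], f)
    with open(tmp / "T_01.pkl", "wb") as f:
        pickle.dump(["2007-01-01T00:00", "2007-01-01T01:00"], f)
    x, t = load_month(tmp, 1)
    print(len(x), len(t))  # 2 2
```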
- metadata.json: This file provides important meta-information, including the coordinates of the target domain, the included variables (e.g. 2t and t_850), and the origin of the processed data.
- statistic.json: This file includes the statistical information (maximum, minimum, and average values) used for normalizing the data. It also includes other information such as the total number of timestamps (nfiles) and the list of JSON files (stat_[month].json) used to compute the statistics.
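The exact normalization applied by the AMBS workflow is documented in the code repository. Purely as an illustration of how the maximum and minimum values from the statistics file could be used, the following sketches a min-max normalization (the function and its key assumptions are ours, not AMBS's implementation):

```python
def minmax_normalize(values, vmin, vmax):
    """Scale values to [0, 1] given min/max from a statistics file.

    Illustrative sketch only; AMBS's actual normalization may differ.
    """
    span = vmax - vmin
    if span == 0:
        raise ValueError("degenerate statistics: max == min")
    return [(v - vmin) / span for v in values]

# Example with 2m-temperature values in Kelvin (numbers are made up).
normalized = minmax_normalize([250.0, 275.0, 300.0], vmin=250.0, vmax=300.0)
print(normalized)  # [0.0, 0.5, 1.0]
```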
Data integrity and verification
The tar archives have been recursively checksummed with the md5 hash function. The generated checksum file is uploaded alongside the data to ensure the integrity of the files and that the dataset has not been altered. To verify the integrity of the downloaded data, run the following in the download directory:
find -type f -exec md5sum '{}' \; > md5sum.txt
This generates a single text file that should be identical to the checksum file provided in this entry.
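On systems without the md5sum utility, the same checksums can be computed with Python's hashlib. The helper below is a sketch; the demonstration hashes a throwaway file rather than one of the archives:

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 checksum of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demonstration with a throwaway file (not one of the archives).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello")
print(md5_of_file(path))  # 5d41402abc4b2a76b9719d911017c592
os.remove(path)
```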
License
Original data by ECMWF: "© 2022 European Centre for Medium-Range Weather Forecasts (ECMWF)". Source: www.ecmwf.int. This data is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license:
https://creativecommons.org/licenses/by/4.0/
Contact
Bing Gong (b.gong@fz-juelich.de)