Replication Data for: Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation (ICPSR doi:10.26165/JUELICH-DATA/KXDWII)

Document Description

Citation

Title:

Replication Data for: Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation

Identification Number:

doi:10.26165/JUELICH-DATA/KXDWII

Distributor:

Jülich DATA

Date of Distribution:

2025-07-02

Version:

2

Bibliographic Citation:

Sczyrba, Alexander; Belmann, Peter; Osterholz, Benedikt; Kleinbölting, Nils; Pühler, Alfred; Schlüter, Andreas, 2025, "Replication Data for: Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation", https://doi.org/10.26165/JUELICH-DATA/KXDWII, Jülich DATA, V2

Study Description

Citation

Title:

Replication Data for: Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation

Identification Number:

doi:10.26165/JUELICH-DATA/KXDWII

Authoring Entity:

Sczyrba, Alexander (Forschungszentrum Jülich - IBG-5)

Belmann, Peter (Forschungszentrum Jülich - IBG-5)

Osterholz, Benedikt (Forschungszentrum Jülich - IBG-5)

Kleinbölting, Nils (Forschungszentrum Jülich - IBG-5)

Pühler, Alfred (Bielefeld University)

Schlüter, Andreas (Bielefeld University)

Software used in Production:

Metagenomics-Toolkit

Distributor:

Jülich DATA

Access Authority:

Sczyrba, Alexander

Depositor:

Sczyrba, Alexander

Date of Deposit:

2025-06-27

Study Scope

Keywords:

Computer and Information Science, Earth and Environmental Sciences

Abstract:

The metagenome analysis of complex environments with thousands of datasets, such as those in the Sequence Read Archive, requires substantial computational resources to complete within a reasonable time frame. Efficient use of infrastructure is essential, and analyses must be fully reproducible with publicly available workflows to ensure transparency. Here, we introduce the Metagenomics-Toolkit, a scalable, data-agnostic workflow that automates the analysis of short and long metagenomic reads from Illumina and Oxford Nanopore Technologies devices, respectively. The Metagenomics-Toolkit provides standard features such as quality control, assembly, binning, and annotation, along with unique capabilities including plasmid identification, recovery of unassembled microbial community members, and discovery of microbial interdependencies through dereplication, co-occurrence, and genome-scale metabolic modeling. Additionally, the Metagenomics-Toolkit includes a machine learning-optimized assembly step that adjusts peak RAM usage to match actual requirements, reducing the need for high-memory hardware. It can be executed on user workstations and includes optimizations for efficient cloud-based cluster execution. We compare the Metagenomics-Toolkit with five widely used metagenomics workflows and demonstrate its capabilities on 757 sewage metagenome datasets to investigate a possible sewage core microbiome. The Metagenomics-Toolkit is open source and available at https://github.com/metagenomics/metagenomics-tk.

Notes:

S3 Endpoint: https://s3.bi.denbi.de; S3 Paths: s3://mgtk/data/; s3://mgtk/aggregated_data/10.26165/JUELICH-DATA/KXDWII/
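
The endpoint and paths in the note above can be turned into plain HTTPS listing requests. The following is a minimal sketch, not the depositors' documented access method: it assumes the bucket allows anonymous reads and that the endpoint accepts path-style addressing (neither is stated in this record), and the helper name list_url is hypothetical. It only builds the request URL for the standard S3 ListObjectsV2 call and does not contact the server.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint taken from the Notes field of this record.
ENDPOINT = "https://s3.bi.denbi.de"


def list_url(s3_path: str, endpoint: str = ENDPOINT, max_keys: int = 10) -> str:
    """Build a path-style S3 ListObjectsV2 URL for an s3://bucket/prefix path.

    Assumptions (not confirmed by the record): anonymous read access and
    path-style addressing, i.e. https://<endpoint>/<bucket>?...
    """
    parts = urlsplit(s3_path)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// path: {s3_path!r}")
    bucket = parts.netloc            # "mgtk" for s3://mgtk/data/
    prefix = parts.path.lstrip("/")  # "data/" for s3://mgtk/data/
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"{endpoint.rstrip('/')}/{bucket}?{query}"


if __name__ == "__main__":
    # Print the listing URL for the first path given in the Notes field.
    print(list_url("s3://mgtk/data/"))
```

Opening the printed URL (for example with urllib.request.urlopen) would return an XML listing of object keys if the assumptions hold. Any S3-compatible client pointed at the same endpoint should work equally well.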

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 Waiver

Other Study Description Materials

Related Materials

https://github.com/metagenomics/wastewater-study

https://github.com/metagenomics/metagenomics-tk

Related Studies

https://doi.org/10.1038/s41467-022-34312-7

Other Study-Related Materials

Label:

aggregated_data_access.txt

Notes:

text/plain

Other Study-Related Materials

Label:

sra_run_accession_access.txt

Notes:

text/plain

Other Study-Related Materials

Label:

SRA_RUN_ACCESSIONS.txt

Text:

Processed SRA run accessions

Notes:

text/plain

Other Study-Related Materials

Label:

wastewater-study-0.4.0.tar.gz

Notes:

application/gzip