Wiki-Quantities and Wiki-Measurements: Datasets of Quantities and their Measurement Context from Wikipedia (ICPSR doi:10.26165/JUELICH-DATA/ABTNID)

Document Description

Citation

Title:

Wiki-Quantities and Wiki-Measurements: Datasets of Quantities and their Measurement Context from Wikipedia

Identification Number:

doi:10.26165/JUELICH-DATA/ABTNID

Distributor:

Jülich DATA

Date of Distribution:

2026-01-26

Version:

1

Bibliographic Citation:

Göpfert, Jan; Kuckertz, Patrick; Weinand, Jann M.; Stolten, Detlef, 2026, "Wiki-Quantities and Wiki-Measurements: Datasets of Quantities and their Measurement Context from Wikipedia", https://doi.org/10.26165/JUELICH-DATA/ABTNID, Jülich DATA, V1

Study Description

Citation

Title:

Wiki-Quantities and Wiki-Measurements: Datasets of Quantities and their Measurement Context from Wikipedia

Identification Number:

doi:10.26165/JUELICH-DATA/ABTNID

Authoring Entity:

Göpfert, Jan (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems, Jülich Systems Analysis; RWTH Aachen University, Faculty of Mechanical Engineering)

Kuckertz, Patrick (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems, Jülich Systems Analysis)

Weinand, Jann M. (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems, Jülich Systems Analysis)

Stolten, Detlef (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems, Jülich Systems Analysis; RWTH Aachen University, Faculty of Mechanical Engineering)

Distributor:

Jülich DATA

Access Authority:

Göpfert, Jan

Depositor:

Göpfert, Jan

Date of Deposit:

2026-01-23

Study Scope

Keywords:

Computer and Information Science, Engineering, NLP, information extraction, quantitative information extraction, measurement extraction, Wikipedia, Wikidata

Abstract:

The task of extracting quantitative information from text is typically approached as a pipeline, in which quantities are identified before their individual measurement context is extracted. To support the development and evaluation of systems for measurement extraction, we present two large datasets that correspond to the two tasks: Wiki-Quantities, a dataset for identifying quantities, and Wiki-Measurements, a dataset for extracting the measurement context for given quantities. The datasets are heuristically generated from Wikipedia articles and Wikidata facts.

Methodology and Processing

Sources Statement

Data Access

Notes:

CC BY-SA 4.0, except for Wikidata facts, which are CC0 1.0

Other Study Description Materials

Related Publications

Citation

Identification Number:

10.1038/s41597-025-05499-3

Bibliographic Citation:

Göpfert, J., Kuckertz, P., Weinand, J.M., and Stolten, D. Wiki-Quantities and Wiki-Measurements: Datasets of quantities and their measurement context from Wikipedia. Sci Data 12, 1277 (2025). https://doi.org/10.1038/s41597-025-05499-3