NoSta-D / GermEval 2014
German Named Entity Recognition Data
Source: This example was kindly contributed by Darina Benikova, Language Technology Lab, University of Duisburg-Essen, Germany
WebAnno was used to build the NoSta-D dataset, which contains German named entity annotations with the following properties:
- The data was sampled from German Wikipedia and News Corpora as a collection of citations.
- The dataset covers over 31,000 sentences corresponding to over 590,000 tokens.
- The NER annotation uses the NoSta-D guidelines, which extend the Tübingen Treebank guidelines, using four main NER categories with sub-structure, and annotating embeddings among NEs such as [ORG FC Kickers [LOC Darmstadt]].
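The nested example above can be read as two spans over three tokens: an outer ORG span and an inner LOC span. The following is a minimal sketch that parses only the bracket notation shown on this page into token-offset spans. The function name `parse_bracketed` is hypothetical, and the format of the downloadable files is not described here and may differ from this notation.

```python
def parse_bracketed(text):
    """Parse '[LABEL tok tok [LABEL tok]]' into (tokens, spans).

    Each span is (label, start, end) in token offsets, end exclusive.
    Hypothetical helper for the bracket notation used on this page only.
    """
    # Pad brackets with spaces so a whitespace split isolates them.
    pieces = text.replace("[", " [ ").replace("]", " ] ").split()
    tokens, spans, stack = [], [], []
    expect_label = False  # True right after an opening bracket
    for piece in pieces:
        if piece == "[":
            expect_label = True
        elif piece == "]":
            if expect_label or not stack:
                raise ValueError("unbalanced or empty bracket")
            label, start = stack.pop()
            spans.append((label, start, len(tokens)))
        elif expect_label:
            # The first item after '[' is the entity label, not a token.
            stack.append((piece, len(tokens)))
            expect_label = False
        else:
            tokens.append(piece)
    if stack or expect_label:
        raise ValueError("unbalanced opening bracket")
    # List outer spans before the spans embedded in them.
    spans.sort(key=lambda s: (s[1], -(s[2] - s[1])))
    return tokens, spans


tokens, spans = parse_bracketed("[ORG FC Kickers [LOC Darmstadt]]")
print(tokens)  # ['FC', 'Kickers', 'Darmstadt']
print(spans)   # [('ORG', 0, 3), ('LOC', 2, 3)]
```

Keeping spans as offsets rather than per-token tags lets the embedded LOC span coexist with the ORG span that contains it, which a single flat tag per token could not express.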
The GermEval 2014 NER Shared Task built on NoSta-D.
The dataset is available for download below and is distributed under a CC-BY license.
Publications
- Darina Benikova, Chris Biemann, Marc Reznicek (2014): NoSta-D Named Entity Annotation for German: Guidelines and Dataset. In Proceedings of LREC 2014, Reykjavik, Iceland [PDF] [BIB]
- Darina Benikova, Chris Biemann, Max Kisselew, Sebastian Pado (2014): GermEval 2014 Named Entity Recognition Shared Task: Companion Paper. In Proceedings of the KONVENS GermEval Shared Task on Named Entity Recognition, Hildesheim, Germany [PDF]