WebAnno - NoSta-D / GermEval 2014

NoSta-D / GermEval 2014

German Named Entity Recognition Data

Source: This example was kindly contributed by Darina Benikova, Language Technology Lab, University of Duisburg-Essen, Germany

WebAnno was used to build the NoSta-D dataset, which contains German Named Entity annotations with the following properties:

  • The data was sampled from German Wikipedia and News Corpora as a collection of citations.
  • The dataset covers over 31,000 sentences corresponding to over 590,000 tokens.
  • The NER annotation follows the NoSta-D guidelines, which extend the Tübingen Treebank guidelines. They use four main NER categories with sub-structure and annotate NEs embedded in other NEs, such as [ORG FC Kickers [LOC Darmstadt]].
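
The nested bracket notation in the example above can be unpacked into plain tokens plus labelled spans. The following is a minimal sketch, not part of WebAnno or the NoSta-D tooling: the function name `parse_nested` and the span layout `(label, start, end, depth)` are assumptions made for illustration, and the sketch handles only the bracket notation shown here, not the distributed data files.

```python
def parse_nested(text):
    """Parse '[LABEL tok tok [LABEL tok]]' into (tokens, spans).

    Hypothetical helper for illustration only.
    Each span is (label, start, end, depth): token offsets with an
    exclusive end, and depth 0 for an outermost entity.
    """
    tokens = []
    spans = []
    stack = []  # currently open entities as (label, start_token_index)

    # Pad brackets with spaces so they split off as separate pieces.
    pieces = text.replace("[", " [ ").replace("]", " ] ").split()

    i = 0
    while i < len(pieces):
        piece = pieces[i]
        if piece == "[":
            # The piece right after an opening bracket is the label.
            if i + 1 >= len(pieces):
                raise ValueError("opening bracket without a label")
            stack.append((pieces[i + 1], len(tokens)))
            i += 2
        elif piece == "]":
            if not stack:
                raise ValueError("closing bracket without an opening one")
            label, start = stack.pop()
            # After the pop, len(stack) is the number of enclosing entities.
            spans.append((label, start, len(tokens), len(stack)))
            i += 1
        else:
            tokens.append(piece)
            i += 1

    if stack:
        raise ValueError("unclosed bracket")

    # List outer entities before the entities nested inside them.
    spans.sort(key=lambda s: (s[3], s[1]))
    return tokens, spans


tokens, spans = parse_nested("[ORG FC Kickers [LOC Darmstadt]]")
print(tokens)  # ['FC', 'Kickers', 'Darmstadt']
print(spans)   # [('ORG', 0, 3, 0), ('LOC', 2, 3, 1)]
```

Here the LOC span at depth 1 lies inside the ORG span at depth 0, which is the kind of embedding the guidelines describe.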

The GermEval 2014 NER Shared Task built on NoSta-D.

The dataset is available for download below and is distributed under a CC-BY license.

  • Darina Benikova, Chris Biemann, Marc Reznicek (2014): NoSta-D Named Entity Annotation for German: Guidelines and Dataset. In Proceedings of LREC 2014, Reykjavik, Iceland

  • Darina Benikova, Chris Biemann, Max Kisselew, Sebastian Pado (2014): GermEval 2014 Named Entity Recognition Shared Task: Companion Paper. In Proceedings of the KONVENS GermEval Shared Task on Named Entity Recognition, Hildesheim, Germany