Coreferential Relations in Basque: The Annotation Process |
| |
Authors: | Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza, Ines Garcia-Azkoaga |
| |
Affiliation: | 1. IXA Group, Faculty of Informatics, UPV-EHU, Donostia, Spain; 2. IXA Group, Department of Catalan Philology and General Linguistics, Universitat de Barcelona, Barcelona, Spain; 3. Department of Basque Language and Communication, UPV-EHU, Vitoria-Gasteiz, Spain |
| |
Abstract: | In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon that depends heavily on the situational context, it also shows patterns that vary with the features of each language. Because Basque is not an Indo-European language, its grammar differs considerably from that of the languages spoken in surrounding areas. We explain these features and the decisions made in each case. After describing the criteria defined for coreferential tagging in Basque, we explain the annotation process. Our annotation is based on a morphologically and syntactically annotated corpus, which provides a manageable environment in which the specific structures that form part of a reference chain can be identified more easily. Part of the corpus was tagged by two annotators who marked up the same text independently, and by a third annotator who acted as judge and resolved cases of disagreement. The whole process has been automated, building on previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has given us a better working environment and made it possible to build a first significant corpus for the later computational treatment of automatic coreference resolution. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|