Coreferential Relations in Basque: The Annotation Process
Authors: Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza, Ines Garcia-Azkoaga
Affiliation: 1. IXA Group, Faculty of Informatics, UPV-EHU, Donostia, Spain; 2. IXA Group, Department of Catalan Philology and General Linguistics, Universitat de Barcelona, Barcelona, Spain; 3. Department of Basque Language and Communication, UPV-EHU, Vitoria-Gasteiz, Spain
Abstract: In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon that depends heavily on the situational context, it shows language-specific patterns that vary according to the features of each language. Because Basque is not an Indo-European language, its grammar differs considerably from that of the languages spoken in surrounding areas. We explain these features and the decisions made in each case. After describing the criteria defined for coreferential tagging in Basque, we explain the annotation process. Our annotation builds on a morphologically and syntactically annotated corpus, which provides a manageable environment in which the specific structures that form part of a reference chain can be identified more easily. Part of the corpus was tagged by two annotators who marked up the same text independently, and by a third annotator who acted as judge, resolving cases of disagreement. This whole process has been automated as a result of previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has given us a better working environment and made it possible to build a first significant corpus for the later computational treatment of automatic coreference resolution.
This article is indexed in SpringerLink and other databases.