Information Density and Syntactic Repetition |
| |
Authors: | David Temperley, Daniel Gildea |
| |
Affiliation: | 1. Eastman School of Music, University of Rochester; 2. Computer Science Department, University of Rochester |
| |
Abstract: | In noun phrase (NP) coordinate constructions (e.g., NP and NP), there is a strong tendency for the syntactic structure of the second conjunct to match that of the first; the second conjunct in such constructions is therefore low in syntactic information. The theory of uniform information density predicts that low‐information syntactic constructions will be counterbalanced by high information in other aspects of that part of the sentence, and high‐information constructions will be counterbalanced by other low‐information components. Three predictions follow: (a) lexical probabilities (measured by N‐gram probabilities and head‐dependent probabilities) will be lower in second conjuncts than in first conjuncts; (b) lexical probabilities will be lower in matching second conjuncts (those whose syntactic expansions match the first conjunct) than in nonmatching ones; and (c) syntactic repetition will be especially common for low‐frequency NP expansions. Corpus analysis supports all three predictions. |
| |
Keywords: | Information; Syntax; Coordination; Language production; Probabilistic models |
|