Abstract: | The human sentence processor is able to make rapid predictions about upcoming linguistic input. For example, upon hearing the verb eat, anticipatory eye‐movements are launched toward edible objects in a visual scene (Altmann & Kamide, 1999). However, the cognitive mechanisms that underlie anticipation remain to be elucidated in ecologically valid contexts. Previous research has, in fact, mainly used clip‐art scenes and object arrays, raising the possibility that anticipatory eye‐movements are limited to displays containing a small number of objects in a visually impoverished context. In Experiment 1, we confirm that anticipation effects occur in real‐world scenes and investigate the mechanisms that underlie such anticipation. In particular, we demonstrate that real‐world scenes provide contextual information that anticipation can draw on: When the target object is not present in the scene, participants infer and fixate regions that are contextually appropriate (e.g., a table upon hearing eat). Experiment 2 investigates whether such contextual inference requires the co‐presence of the scene, or whether memory representations can be utilized instead. The same real‐world scenes as in Experiment 1 are presented to participants, but each scene disappears before the sentence is heard. We find that anticipation occurs even when the screen is blank, including when contextual inference is required. We conclude that anticipatory language processing is able to draw upon global scene representations (such as scene type) to make contextual inferences. These findings are compatible with theories assuming contextual guidance, but pose a challenge for theories assuming object‐based visual indices. |