1.
    
Interpreting a seemingly simple function word like “or,” “behind,” or “more” can require logical, numerical, and relational reasoning. How are such words learned by children? Prior acquisition theories have often relied on positing a foundation of innate knowledge. Yet recent neural-network-based visual question answering models apparently can learn to use function words as part of answering questions about complex visual scenes. In this paper, we study what these models learn about function words, in the hope of better understanding how the meanings of these words can be learned by both models and children. We show that recurrent models trained on visually grounded language learn gradient semantics for function words requiring spatial and numerical reasoning. Furthermore, we find that these models can learn the meanings of the logical connectives “and” and “or” without any prior knowledge of logical reasoning, and we find early evidence that they are sensitive to alternative expressions when interpreting language. Finally, we show that word learning difficulty depends on the frequency of the models' input. Our findings offer proof-of-concept evidence that the nuanced interpretations of function words can be learned in a visually grounded context by non-symbolic, general statistical learning algorithms, without any prior knowledge of linguistic meaning.
2.
Emotional expression and how it is lateralized across the two sides of the face may influence how we detect audiovisual speech. To investigate how these components interact we conducted experiments comparing the perception of sentences expressed with happy, sad, and neutral emotions. In addition we isolated the facial asymmetries for affective and speech processing by independently testing the two sides of a talker's face. These asymmetrical differences were exaggerated using dynamic facial chimeras in which left- or right-face halves were paired with their mirror image during speech production. Results suggest that there are facial asymmetries in audiovisual speech such that the right side of the face and right-facial chimeras supported better speech perception than their left-face counterparts. Affective information was also found to be critical in that happy expressions tended to improve speech performance on both sides of the face relative to all other emotions, whereas sad emotions generally inhibited visual speech information, particularly from the left side of the face. The results suggest that approach information may facilitate visual and auditory speech detection.
3.
    
The driver of a conditionally automated vehicle equivalent to level 3 of the SAE is obligated to accept a takeover request (TOR) issued by the vehicle. Considerable research has been conducted on the TOR, especially in terms of the effectiveness of multimodal methods. Therefore, in this study, the effectiveness of various multimodalities was compared and analyzed. Thirty-six volunteers were recruited to compare the effects of the multimodalities, and vehicle and physiological data were obtained using a driving simulator. Eight combinations of TOR warnings, including those implemented through LED lights on the A-pillar, earcon, speech message, or vibrations in the back support and seat pan, were analyzed to clarify the corresponding effects. When LED lights were implemented on the A-pillar, the driver reaction was faster (p = 0.022) and the steering deviation was larger (p = 0.024) than when no LED lights were used. The speech message resulted in a larger steering deviation than the earcon (p = 0.044). When vibrations were provided through the haptic seat, the reaction time was faster (p < 0.001) and the steering deviation was larger (p = 0.001) than without vibration. An interaction effect was noted between the visual and auditory modalities; notably, the earcon resulted in a small steering deviation and skin conductance response amplitude (SCR amplitude) when implemented with LED lights on the A-pillar, whereas the speech message led to a small steering deviation and SCR amplitude without the LED lights. In the design of a multimodal warning to be used to issue a TOR, the effects of each individual modality and corresponding interaction effects must be considered. These effects must be evaluated through application to various takeover situations.
4.
    
Visual information conveyed by iconic hand gestures and visible speech can enhance speech comprehension under adverse listening conditions for both native and non‐native listeners. However, how a listener allocates visual attention to these articulators during speech comprehension is unknown. We used eye‐tracking to investigate whether and how native and highly proficient non‐native listeners of Dutch allocated overt eye gaze to visible speech and gestures during clear and degraded speech comprehension. Participants watched video clips of an actress uttering a clear or degraded (6‐band noise‐vocoded) action verb while performing a gesture or not, and were asked to indicate the word they heard in a cued‐recall task. Gestural enhancement was the largest (i.e., a relative reduction in reaction time cost) when speech was degraded for all listeners, but it was stronger for native listeners. Both native and non‐native listeners mostly gazed at the face during comprehension, but non‐native listeners gazed more often at gestures than native listeners. However, only native but not non‐native listeners' gaze allocation to gestures predicted gestural benefit during degraded speech comprehension. We conclude that non‐native listeners might gaze at gesture more as it might be more challenging for non‐native listeners to resolve the degraded auditory cues and couple those cues to phonological information that is conveyed by visible speech. This diminished phonological knowledge might hinder the use of semantic information that is conveyed by gestures for non‐native compared to native listeners. Our results demonstrate that the degree of language experience impacts overt visual attention to visual articulators, resulting in different visual benefits for native versus non‐native listeners.
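The degraded-speech condition above relies on noise vocoding. As a rough illustration of what "6-band noise-vocoded" means, the sketch below replaces the fine spectral structure of a speech signal with band-limited noise while preserving each band's amplitude envelope. The band edges, filter orders, and envelope cutoff here are illustrative assumptions, not the stimulus parameters used in the study.

```python
# Minimal sketch of n-band noise vocoding (assumed parameters, e.g. fs = 44100 Hz).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=6, f_lo=100.0, f_hi=8000.0, env_cutoff=30.0):
    """Keep only each band's amplitude envelope; carry it on band-limited noise."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)          # log-spaced band edges
    noise = np.random.default_rng(0).standard_normal(len(signal))
    out = np.zeros(len(signal), dtype=float)
    env_sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)                # band-pass the speech
        env = np.maximum(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0)  # envelope
        carrier = sosfiltfilt(band_sos, noise)              # band-limited noise carrier
        out += env * carrier                                # modulate noise by envelope
    return out / (np.max(np.abs(out)) + 1e-12)              # normalize

# Usage (hypothetical): vocoded = noise_vocode(speech_samples, fs=44100)
```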
5.
    
Previous studies have mainly focused on tailoring message content to match individual characteristics and preferences. This study investigates the effect of a website tailored to individual preferences for the mode of information presentation, compared to 4 nontailored websites on younger and older adults' attention and recall of information, employing a 5 (condition: tailored vs. text, text with illustrations, audiovisual, combination) × 2 (age: younger [25–45] vs. older [≥65] adults) design (N = 559). The mode‐tailored condition (relative to nontailored conditions) improved attention to the website and, consequently, recall in older adults, but not in younger adults. Younger adults recalled more from nontailored information such as text only or text with illustrations, relative to tailored information.
6.
7.
    
Representing a world or a physical/social environment in an agent’s cognitive system is essential for creating human-like artificial intelligence. This study takes a story-centered approach to this issue. In this context, a story refers to an internal representation involving a narrative structure, which is assumed to be a common form of organizing past, present, future, and fictional events and situations. In the artificial intelligence field, a story or narrative is traditionally treated as a symbolic representation. However, a symbolic story representation is limited in its representational power to construct a rich world. For example, a symbolic story representation is unfit to handle the sensory/bodily dimension of a world. In search of a computational theory for narrative-based world representation, this study proposes the conceptual framework of a Cogmic Space for a comic strip-like representation of a world. In the proposed framework, a story is positioned as a mid-level representation, in which the conceptual and sensory/bodily dimensions of a world are unified. The events and their background situations that constitute a story are unified into a sequence of panels. Based on this structure, a representation (i.e., a story) and the represented environment are connected via an isomorphism of their temporal, spatial, and relational structures. Furthermore, the framework of a Cogmic Space is associated with the generative aspect of representations, which is conceptualized in terms of unconscious- and conscious-level processes/representations. Finally, a proof-of-concept implementation is presented to provide a concrete account of the proposed framework.
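To make the panel-based description above more concrete, here is a hypothetical sketch of a story organized as a sequence of panels, each unifying events with their background situation. The class names and fields (Story, Panel, Event, Situation, and their attributes) are assumptions for illustration only, not the authors' implementation of the Cogmic Space framework.

```python
# Hypothetical data-structure sketch: a story as a temporally ordered panel sequence.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Event:
    predicate: str                              # conceptual content, e.g. "give"
    participants: List[str]                     # entities involved in the event

@dataclass
class Situation:
    location: str                               # spatial background of the panel
    entities: Dict[str, Tuple[float, float]]    # entity -> position in the scene

@dataclass
class Panel:
    time_index: int                             # position in the story's temporal order
    events: List[Event]                         # foreground events of this panel
    background: Situation                       # background situation they occur in

@dataclass
class Story:
    panels: List[Panel] = field(default_factory=list)

    def timeline(self) -> List[int]:
        # Temporal structure intended to be isomorphic to the represented environment's.
        return [p.time_index for p in self.panels]
```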
8.
    
Recent studies of naturalistic face‐to‐face communication have demonstrated coordination patterns such as the temporal matching of verbal and non‐verbal behavior, which provides evidence for the proposal that verbal and non‐verbal communicative control derives from one system. In this study, we argue that the observed relationship between verbal and non‐verbal behaviors depends on the level of analysis. In a reanalysis of a corpus of naturalistic multimodal communication (Louwerse, Dale, Bard, & Jeuniaux, 2012), we focus on measuring the temporal patterns of specific communicative behaviors in terms of their burstiness. We examined burstiness estimates across different roles of the speaker and different communicative modalities. We observed more burstiness for verbal versus non‐verbal channels, and for more versus less informative language subchannels. Using this new method for analyzing temporal patterns in communicative behaviors, we show that there is a complex relationship between verbal and non‐verbal channels. We propose a “temporal heterogeneity” hypothesis to explain how the language system adapts to the demands of dialog.
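The burstiness analysis above can be illustrated with a short sketch. The abstract does not state which estimator the authors used; the version below assumes the common burstiness coefficient B = (σ − μ) / (σ + μ) over inter-event intervals (Goh & Barabási, 2008), where B ≈ −1 indicates a periodic event train, B ≈ 0 a Poisson-like one, and B approaching 1 a highly bursty one.

```python
# Sketch of a burstiness estimate from event timestamps (assumed estimator, see lead-in).
import numpy as np

def burstiness(event_times):
    """Return B in [-1, 1]: near -1 periodic, near 0 Poisson-like, toward 1 bursty."""
    intervals = np.diff(np.sort(np.asarray(event_times, dtype=float)))
    if len(intervals) < 2:
        raise ValueError("need at least three events to estimate burstiness")
    mu, sigma = intervals.mean(), intervals.std()
    return (sigma - mu) / (sigma + mu)

# Example: clustered (bursty) onsets vs. evenly spaced onsets.
bursty_onsets = [0.10, 0.20, 0.25, 5.00, 5.10, 5.15, 12.00, 12.05]
regular_onsets = np.arange(0, 10, 1.0)
print(burstiness(bursty_onsets), burstiness(regular_onsets))  # positive vs. -1.0
```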
9.
Classification and Distribution of Sentence Stress in Mandarin Chinese (汉语语句重音的分类和分布)
王韫佳  初敏  贺琳 《心理学报》2003,35(6):734-742
The classification and distribution of sentence stress in Mandarin Chinese were explored in two independent stress-annotation experiments. Experiment 1 was a perception experiment on the prominence of syllable stress, with 60 ordinary participants. Experiment 2 was a stress-category annotation experiment carried out by the three authors of this paper, in which sentence stress was divided into rhythmic stress and semantic stress. The categorical annotation of sentence stress in Experiment 2 was supported by the ordinary participants' perception of syllable-stress prominence in Experiment 1, indicating that people can indeed perceive two distinct types of stress. The results also show that rhythmic stress tends to fall on the final syllable of the final prosodic word within a larger prosodic unit and co-occurs with an appropriate pause or lengthening, whereas the distribution of semantic stress has little to do with the prosodic structure of the sentence.
10.
仲晓波  吕士楠 《心理学报》2003,35(3):333-339
Acoustic feature analyses and perception experiments were conducted on recordings from 10 speakers of material containing two pairs of Mandarin homophones, "qu(4)shi(4)" (趣事 "amusing anecdote" or 去世 "to pass away") and "sheng(1)xue(2)" (声学 "acoustics" or 升学 "to enter a higher school"). The results showed that what disambiguates these two homophone pairs is not word stress but sentence stress. This finding indicates that sentence stress in Chinese can disambiguate homophones by selectively emphasizing one of their syllables.