Similar literature
 Found 20 similar documents; search took 31 ms
1.
A lack of human interaction and environmental control in Internet-based data collection have been suggested as possible antecedents of careless responding, which occurs when participants respond to survey items without regard for item content. To address these possible antecedents, this study investigated whether survey proctoring deterred careless response in an undergraduate sample by reducing environmental distractions. The study randomly assigned respondents to one of three proctoring conditions: remote online un-proctored, remote online virtually proctored, and in-person classroom proctored. Data quality was examined via nine careless response indicators. Analyses indicated that proctor presence had effects on a small number of careless response indicators. Virtually proctored participants performed better than un-proctored participants on one of nine careless response indicators, and in-person proctored participants performed better on two careless response indicators compared to un-proctored participants. Environmental distraction fully mediated the relationship between in-person proctor presence and self-reported diligence. Implications for survey administration are discussed.

2.
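In cognitive diagnostic assessment, evaluating the fit between the cognitive model and the response data is very important. The existing hierarchy consistency index (HCI) can only be used to evaluate model-data fit under the conjunctive rule, so consistency indices under the disjunctive rule need to be studied. The HCI assumes that, given a correct response on an item, an incorrect response on any of its sub-items indicates misfit. Because responses are random, an item consistency index based on hypothesis testing is proposed. The index can be used to distinguish response data generated under conjunctive and disjunctive rules, to evaluate the quality of the Q-matrix, and to gauge noise in the response data; it can also inform the evaluation of cognitive models and the selection of cognitive diagnosis models.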

3.
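汪文义 (Wang Wenyi), 宋丽红 (Song Lihong), 丁树良 (Ding Shuliang). 《心理学报》 (Acta Psychologica Sinica), 2016, 48(12): 1612-1624
This paper introduces classification accuracy and classification consistency indices under multidimensional item response theory models, computes the indices under complex decision rules using the Monte Carlo method, and proves mathematically that the two types of estimators of the classification accuracy index converge in probability to the same true value under a uniform prior and the same decision rule. The results show that: the classification accuracy index evaluates the accuracy of classification results fairly accurately; the classification consistency index evaluates the test-retest consistency of classification results well; under certain conditions, indices based on the ability scale outperform those based on raw total scores; estimation precision remains good even as the number of test dimensions increases; and classification accuracy and classification consistency are higher as test length and the correlation between dimensions increase. The indices can be used to evaluate classification reliability and validity under a variety of decision rules in criterion-referenced tests or computerized classification tests.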

4.
Medical research has extensively dealt with the estimation of the accuracy (sensitivity and specificity) of a diagnostic test for screening individuals. In this paper we apply the biometric latent class model with random effects by Qu, Tan, and Kutner [(1996). Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics, 52, 797-810] to estimate the response error (careless error and lucky guess) probabilities for dichotomous test items in the psychometric theory of knowledge spaces. The approach is illustrated with simulated data. In particular, we extend this approach to give a generalization of the basic local independence model in knowledge space theory. This allows for local dependence among the indicators given the knowledge state of an examinee and/or for the incorporation of covariates.

5.
This study investigated various measures commonly employed to assess the person reliability of an individual Minnesota Multiphasic Personality Inventory (MMPI) protocol. Specifically, relationships among indices of person reliability and the standard MMPI validity scales were examined using the responses of 82 subjects who completed the MMPI on two occasions separated by 1 week. Person reliability indices were based on within-occasion responses to identical and to psychologically similar items, and on three across-occasion response consistency measures. The validity scales, namely, the L, F, K, and Cannot Say scales, showed higher test-retest stability than the within-occasion person reliability indices. Further, the validity scales and person reliability indices appeared to reflect multiple facets of dependable responding. Interestingly, an individual's tendency to change responses to MMPI items from the test to the retest was significantly predictable. Clinical implications of these findings were derived.

6.
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.

7.
8.
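Using the rule space model to identify cognitive errors in problem solving   Total citations: 13, self-citations: 1, citations by others: 12
余嘉元 (Yu Jiayuan). 《心理学报》 (Acta Psychologica Sinica), 1995, 28(2): 196-203
The rule space model is used to identify cognitive errors in students' problem solving. The model combines cognitive psychology, item response theory, and the algebraic theory of databases: it maps an examinee's binary response pattern to a set of ordered pairs in a Cartesian product space, computes the Mahalanobis distance between the examinee and the representative positions of erroneous rules in that space, and classifies the examinee's error type using a Bayesian decision rule. Based on the responses of 644 secondary school students to 30 mathematics items, 86% of the students could be classified into 18 types of cognitive errors.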
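The distance step described in this abstract can be illustrated with a minimal sketch. It assumes each examinee and each erroneous rule is represented by a single two-dimensional point and that one shared 2x2 covariance matrix applies; the function names, the coordinates, and the covariance values are hypothetical, and the Bayesian decision step is simplified to picking the nearest rule, which is only equivalent under equal priors and a common covariance. It is not the authors' implementation.

```python
def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance between two 2-D points for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def nearest_rule(point, rule_centers, cov):
    """Return the name of the rule whose center is closest to the point."""
    return min(rule_centers, key=lambda name: mahalanobis_sq(point, rule_centers[name], cov))


# Hypothetical rule centers and an identity covariance, for illustration only.
rule_centers = {"rule_A": (0.0, 0.0), "rule_B": (3.0, 3.0)}
identity = ((1.0, 0.0), (0.0, 1.0))
closest = nearest_rule((2.5, 2.0), rule_centers, identity)
```

With an identity covariance the measure reduces to squared Euclidean distance, which makes the toy values easy to check by hand.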

9.
This study reports a large-scale survey of citizens' attitudes and beliefs toward energy use and conservation in the southwestern United States. A probability sample of 1,000 Texas residents responded to a 10-item telephone survey. Questions concerned issues such as thermal comfort and health, economic benefits of conservation, efficacy of individual efforts, and perceived causes of the current U.S. energy situation. Confirmatory factor analysis replicated previous work by Seligman et al. (1979) and Becker et al. (1981) by identifying the same four principal dimensions underlying energy use attitudes and beliefs: 1) comfort and health, 2) high effort-low payoff, 3) role of individual consumer, and 4) legitimacy of energy problem. In addition, several demographic characteristics were found to moderate consumers' responses to the survey items. The results of this study reinforce the conclusion that future energy conservation campaigns should be sensitive to consumers' concerns about comfort and health. New directions for future research on energy attitudes and conservation behavior are discussed.

10.
Little research has been conducted on the psychometrics of the very short scale (36 items) of the Children's Behavior Questionnaire, and no one-item temperament scale has been tested for use in applied work. In this study, 237 United States caregivers completed a survey to define their child's behavioral patterns (i.e., Surgency, Negative Affectivity, Effortful Control) using both scales. Psychometrics of the 36-item Children's Behavior Questionnaire were examined using classical test theory, principal factor analysis, and item response modeling. Classical test theory analysis demonstrated adequate internal consistency and factor analysis confirmed a three-factor structure. Potential improvements to the measure were identified using item response modeling. A one-item (three response categories) temperament scale was validated against the three temperament factors of the 36-item scale. The temperament response categories correlated with the temperament factors of the 36-item scale, as expected. The one-item temperament scale may be applicable for clinical use.

11.
Infrequency scales are becoming a popular mode of data screening, due to their availability and ease of implementation. Recent research has indicated that the interpretation and functioning of infrequency items may not be as straightforward as had previously been thought (Curran & Hauser, 2015), yet there are no empirically based guidelines for implementing cutoffs using these items. In the present study, we compared two methods of detecting random responding with infrequency items: a zero-tolerance threshold versus a threshold that balances classification error rates. The results showed that a traditional zero-tolerance approach, on average, screens data that are less indicative of careless responding than those screened by the error-balancing approach. Thus, the de facto standard of applying a “zero-tolerance” approach when screening participants with infrequency scales may be too stringent, so that meaningful responses may also be removed from analyses. Recommendations and future directions are discussed.
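The contrast between a zero-tolerance cutoff and a more lenient cutoff can be sketched in a few lines. This is only an illustration under stated assumptions: the per-respondent counts are made-up data, the function name is hypothetical, and the lenient cutoff of 2 is an arbitrary value, not the error-balancing threshold derived in the study (the abstract does not give it).

```python
def flag_respondents(endorsed_counts, cutoff):
    """Return indices of respondents whose infrequency-item count reaches the cutoff."""
    return [i for i, count in enumerate(endorsed_counts) if count >= cutoff]


# Hypothetical data: number of infrequency items each respondent endorsed.
endorsed_counts = [0, 1, 3, 0, 2]

# Zero tolerance: a single endorsed infrequency item is enough to be screened out.
zero_tolerance = flag_respondents(endorsed_counts, cutoff=1)

# A more lenient cutoff (2 is an arbitrary illustration) screens out fewer respondents.
lenient = flag_respondents(endorsed_counts, cutoff=2)
```

In this toy data the respondent with exactly one endorsed item is removed only under zero tolerance, which is the kind of case the abstract suggests may still carry meaningful responses.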

12.
A Web-based coding application was designed to improve coding efficiency and to provide a systematic means of evaluating responses to open-ended assessments. The system was developed for use by multiple raters to assign open-ended responses to predetermined categories. The application provides a software environment for efficiently supervising the work of coders and evaluating the quality of the coding by (1) systematically presenting open-ended responses to coders, (2) tracking each coder’s categorized responses, and (3) assessing interrater consistency at any time in order to identify coders in need of further training. In addition, the application can be set to automatically assign repeated responses to categories previously identified as appropriate for those responses. To evaluate the efficacy of the coding application and to determine the statistical reliability of coding open-ended data within this application, we examined data from two empirical studies. The results demonstrated substantial interrater agreement on items assigned to various categories across free and controlled association tasks. Overall, this new coding application provides a feasible method of reliably coding open-ended data and makes the task of coding these data more manageable.

13.
Purpose. Responses provided by unmotivated survey participants in a careless, haphazard, or random fashion can threaten the quality of data in psychological and organizational research. The purpose of this study was to summarize existing approaches to detect insufficient effort responding (IER) to low-stakes surveys and to comprehensively evaluate these approaches.

14.
In this study we tested the hypothesis that groups of NEO Personality Inventory-Revised (NEO-PI-R; Costa & McCrae, 1992a) protocols identified as potentially invalid by an inconsistency scale (INC; Schinka, Kinder, & Kremer, 1997) would show reduced reliability and validity according to a series of psychometric tests. Data were obtained from 2 undergraduate student samples, a self-report group (n = 132) who provided NEO-PI-R self-ratings on 2 occasions separated by a 7- to 14-day interval and an informant group (n = 109) who provided ratings of well-known friends or relatives on 2 occasions separated by a 6 month interval. INC scores from the Time 1 protocols were used to divide these samples into low, moderate, and elevated inconsistency groups. In both samples, these 3 groups showed equivalent levels of reliability and validity as measured by: contingency coefficients for the 20 INC item responses across occasions; test-retest intraclass correlations of NEO-PI-R domain scores; convergent correlations with Goldberg's (1992) Bipolar Adjective Scale scores; and discriminant correlations between the 5 NEO-PI-R domain scores. The similarity of results across self-report and informant assessment contexts provides additional evidence that semantic consistency approaches to assessing protocol validity may overestimate the prevalence of random or careless response behavior in standard administration conditions. Several theories are discussed that accommodate the existence of valid inconsistency in structured personality assessment.

15.
This study describes the process of developing a scale to measure the leadership capacity of players in sports teams. Research into sports leadership has focused almost exclusively on the formal leadership of the coach, in which the studies by Chelladurai, with his five-factor model, have become an essential point of reference. Nevertheless, hardly any research has been carried out into the leadership that certain players exercise over the other team members. For this purpose, a sample of 143 male basketball players was used; these participants were asked to evaluate the characteristics of the sports leader over a total of 54 indicators. Firstly, exploratory factor analysis was performed with participants' responses, using principal axis and oblique rotation methods. The factor structure obtained was then subjected to confirmatory factor analysis, enabling us to propose a Sports Leader Evaluation Scale (EELD, in Spanish) with 18 items grouped into 3 factors, denominated empathy and responsibility, assertiveness, and impulsiveness. Satisfactory fit indices were obtained for the model, for the reliability of items and for the internal consistency of factors.

16.
Classical methods for detecting outliers deal with continuous variables. These methods are not readily applicable to categorical data, such as incorrect/correct scores (0/1) and ordered rating scale scores (e.g., 0, …, 4) typical of multi-item tests and questionnaires. This study proposes two definitions of outlier scores suited for categorical data. One definition combines information on outliers from scores on all the items in the test, and the other definition combines information from all pairs of item scores. For a particular item-score vector, an outlier score expresses the degree to which the item-score vector is unusual. For ten real-data sets, the distribution of each of the two outlier scores is inspected by means of Tukey's fences and the extreme studentized deviate procedure. It is investigated whether the outliers that are identified are influential with respect to the statistical analysis performed on these data. Recommendations are given for outlier identification and accommodation in test and questionnaire data.
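Tukey's fences, one of the two inspection procedures named in this abstract, can be sketched as follows. The sketch assumes the per-respondent outlier scores have already been computed; how the study defines those scores is not given in the abstract, so the input list here is hypothetical. The multiplier k = 1.5 is the conventional choice, and flagging both tails is an assumption (the study may use only the upper fence).

```python
import statistics


def tukey_fences(scores, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(scores, k=1.5):
    """Return indices of scores that fall outside the Tukey fences."""
    lower, upper = tukey_fences(scores, k)
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


# Hypothetical outlier scores for ten respondents; the last one is unusually large.
outlier_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
flagged = flag_outliers(outlier_scores)
```

Quartile conventions differ between software packages, so the exact fence values depend on the `method` argument passed to `statistics.quantiles`.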

17.
Background. Interview‐based research has shown that students in higher education hold a number of different conceptions of learning and of themselves as learners. There is debate about whether these conceptions constitute a developmental hierarchy. Aims. This study evaluated the Mental Models section of Vermunt and van Rijswijk's (1988) Inventory of Learning Styles (ILS) as a measure of students' conceptions of learning and sought to identify conceptions of learning as qualitatively different patterns of scores. Sample. A random sample of 1,000 students who were taking courses by distance learning with the Open University in the UK. Method. A translated and adapted version of the Mental Models section of the ILS was administered in a postal survey. Complete data were obtained from 441 students and were subjected to principal component analysis, cluster analysis and discriminant analysis. Results. The five scales in the Mental Models section of the ILS were homogeneous and achieved a satisfactory level of internal consistency, but two of the five scales could not be differentiated from each other in the students' responses. A cluster analysis identified four subgroups of students who had different patterns of scores on two discriminant functions. Conclusion. The four mental models identified in this study were broadly similar to those identified by Vermunt (1996) in an interview‐based study. However, these do not seem to constitute a developmental hierarchy, and, following Vermunt, it is suggested that they are better interpreted as aspects of four over‐arching ‘learning styles’ or ‘learning patterns’.

18.
A Bayesian random effects model for testlets   Total citations: 4, self-citations: 0, citations by others: 4
Standard item response theory (IRT) models fit to dichotomous examination responses ignore the fact that sets of items (testlets) often come from a single common stimulus (e.g., a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences such as prematurely ending an examination in which the stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). To model examinations that may be a mixture of independent items and testlets, we modified one standard IRT model to include an additional random effect for items nested within the same testlet. We use a Bayesian framework to facilitate posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner & Wong, 1987). The modified and standard IRT models are both applied to a data set from a disclosed form of the SAT. We also provide simulation results that indicate that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design. The authors wish to thank Robert Mislevy, Andrew Gelman and Donald B. Rubin for their helpful suggestions and comments, Ida Lawrence and Miriam Feigenbaum for providing us with the SAT data analyzed in section 5, and the two anonymous referees for their careful reading and thoughtful suggestions on an earlier draft. We are also grateful to the Educational Testing Service for providing the resources to do this research.

19.
The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Applying a recently proposed semantic theory of survey response, we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesized that in many cases, information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. We developed a procedure to separate the semantic influence from attitude strength in individual response patterns, and compared these results to, respectively, the observed sample correlation matrices and the semantic similarity structures arising from text analysis algorithms. This was done with four datasets, comprising a total of 7,787 subjects and 27,461,502 observed item pair responses. As we argued, attitude strength seemed to account for much information about the individual respondents. However, this information did not seem to carry over into the observed sample correlation matrices, which instead converged around the semantic structures offered by the survey items. This is potentially disturbing for the traditional understanding of what survey data represent. We argue that this approach contributes to a better understanding of the cognitive processes involved in survey responses. In turn, this could help us make better use of the data that such methods provide.

20.
Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to identify outliers in MSA. This adaptation involves choices with respect to the algorithm's objective function, selection of items from samples without outliers, and scalability criteria to be used in the forward search algorithm. The application of the adapted forward search algorithm for MSA is demonstrated using real data. Recommendations are given for its use in practical scale analysis.
