Similar Literature
 Found 20 similar articles (search time: 15 ms)
1.
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) in terms of the number and sources of items identified as showing DIF, using data from an international reading assessment. The latent class approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.

2.
Personality constructs, attitudes and other non-cognitive variables are often measured using rating or Likert-type scales, which is not without problems. Especially in low-stakes assessments, respondents may produce biased responses due to response styles (RS) that reduce the validity and comparability of the measurement. Detecting and correcting RS is not always straightforward because not all respondents show RS, and those who do may not do so to the same extent or in the same direction. The present study proposes the combination of a multidimensional IRTree model with a mixture distribution item response theory model and illustrates the application of the approach using data from the Programme for the International Assessment of Adult Competencies (PIAAC). This joint approach makes it possible to differentiate between latent classes of respondents who show different RS behaviours, and between respondents who show RS and respondents who give (largely) unbiased responses. We illustrate the application of the approach by examining extreme RS and show how the resulting latent classes can be further examined using external variables and process data from computer-based assessments to develop a better understanding of response behaviour and RS.

3.
Many educational and psychological assessments focus on multidimensional latent traits that often have a hierarchical structure to provide both overall-level information and fine-grained diagnostic information. A test will usually have either separate time limits for each subtest or an overall time limit for administrative convenience and test fairness. In order to complete the items within the allocated time, examinees frequently adopt different test-taking behaviours during the test, such as solution behaviour and rapid guessing behaviour. In this paper we propose a new mixture model for responses and response times with a hierarchical ability structure, which incorporates auxiliary information from other subtests and the correlation structure of the abilities to detect rapid guessing behaviour. A Markov chain Monte Carlo method is proposed for model estimation. Simulation studies reveal that all model parameters could be recovered well, and the parameter estimates had smaller absolute bias and mean squared error than those from the mixture unidimensional item response theory (UIRT) model. Moreover, the true positive rate of detecting rapid guessing behaviour is also higher than when using the mixture UIRT model separately for each subscale, whereas the false detection rate is much lower than with the mixture UIRT model. The deviance information criterion and the logarithm of the pseudo-marginal likelihood are employed to evaluate the model fit. Finally, a real data analysis is presented to demonstrate the practical value of the proposed model.

4.
In real testing, examinees may manifest different types of test-taking behaviours. In this paper we focus on two types that appear to be among the more frequently occurring behaviours: solution behaviour and rapid guessing behaviour. Rapid guessing usually happens in high-stakes tests when there is insufficient time, and in low-stakes tests when there is a lack of effort. These two qualitatively different test-taking behaviours, if ignored, will lead to violation of the local independence assumption and, as a result, yield biased item/person parameter estimation. We propose a mixture hierarchical model to account for differences among item responses and response time patterns arising from these two behaviours. The model is also able to identify the specific behaviour an examinee engages in when answering an item. A Monte Carlo expectation maximization algorithm is proposed for model calibration. A simulation study shows that the new model yields more accurate item and person parameter estimates than a non-mixture model when the data indeed come from two types of behaviour. The model also fits real, high-stakes test data better than a non-mixture model, and therefore the new model can better identify the underlying test-taking behaviour an examinee engages in on a certain item.
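
As a rough illustration of how a mixture of the two behaviours can flag individual responses, the sketch below computes the posterior probability that a single response time came from rapid guessing. It is not the hierarchical model proposed in the paper: it assumes a plain two-component lognormal mixture on response times only, and the function names and all parameter values (`pi_guess`, `mu_guess`, `sigma_guess`, `mu_solve`, `sigma_solve`) are hypothetical.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Density of a lognormal distribution at response time t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def rapid_guess_posterior(t, pi_guess, mu_guess, sigma_guess, mu_solve, sigma_solve):
    """Posterior probability that response time t came from the rapid-guessing component.

    pi_guess is the assumed prior proportion of rapid guesses; the remaining
    1 - pi_guess is assigned to solution behaviour (Bayes' rule on two components).
    """
    guess = pi_guess * lognormal_pdf(t, mu_guess, sigma_guess)
    solve = (1.0 - pi_guess) * lognormal_pdf(t, mu_solve, sigma_solve)
    return guess / (guess + solve)


if __name__ == "__main__":
    # Hypothetical setting: guesses centre on about 2 s, solutions on about 40 s.
    for seconds in (2.0, 10.0, 40.0):
        p = rapid_guess_posterior(seconds, 0.2, math.log(2.0), 0.5, math.log(40.0), 0.5)
        print(f"t = {seconds:5.1f} s  P(rapid guess | t) = {p:.4f}")
```

A full treatment along the lines of the abstract would model the item response jointly with the response time and estimate the mixture parameters from data rather than fixing them by hand.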

5.
J. O. Ramsay. Psychometrika, 1995, 60(3): 323-339
The probability that an examinee chooses a particular option within an item is estimated by averaging over the responses to that item of examinees with similar response patterns for the whole test. The approach does not presume any latent variable structure or any dimensionality. But simulated and actual data analyses are presented to show that when the responses are determined by a latent ability variable, this similarity-based smoothing procedure can reveal the dimensionality of ability very satisfactorily. The author wishes to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada through grant A320, and to thank Educational Testing Service for making the data on the Advanced Placement Chemistry Exam available.

6.
The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees’ latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box–Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested.

7.
In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic classification models (DCMs). DCMs are a newer class of psychometric models that are designed to classify examinees according to levels of categorical latent traits. We examined the invariance property for general DCMs using the log-linear cognitive diagnosis model (LCDM) framework. We conducted a simulation study to examine the degree to which theoretical invariance of LCDM classifications and item parameter estimates can be observed under various sample and test characteristics. Results illustrated that LCDM classifications and item parameter estimates show clear invariance when adequate model-data fit is present. To demonstrate the implications of this important property, we conducted additional analyses to show that using pre-calibrated tests to classify examinees provided consistent classifications across calibration samples with varying mastery profile distributions and across tests with varying difficulties.

8.
Disengaged test taking tends to be most prevalent with low-stakes tests. This has led to questions about the validity of aggregated scores from large-scale international assessments such as PISA and TIMSS, as previous research has found a meaningful correlation between the mean engagement and mean performance of countries. The current study, using data from the computer-based version of the PISA-Based Test for Schools, examined the distortive effects of differential engagement on aggregated school-level scores. The results showed that, although there was considerable differential engagement among schools, the school means were highly stable due to two factors. First, any distortive effects of disengagement in a school were diluted by a high proportion of the students exhibiting no non-effortful behavior. Second, and most interestingly, disengagement produced both positive and negative distortion of individual student scores, which tended to cancel out much of the net distortive effect on the school’s mean.

9.
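A testing scenario with known item parameters and examinee scores was designed, and ability was estimated under two-, three-, and four-parameter logistic weighted models. The ability steps between examinee score levels were found to be evenly spaced, and examinee scores reflected the score-weighting effect of polytomous scoring well. Under the two-parameter logistic weighted model, examinee parameter estimates were perturbed: guessing led to overestimation of ability, and slipping led to underestimation. Under the c-type three-parameter logistic weighted model, overestimation of ability did not occur or was not pronounced; under the γ-type three-parameter logistic weighted model, underestimation of ability did not occur or was not pronounced; under the four-parameter logistic weighted model, neither overestimation nor underestimation occurred or was pronounced. The four-parameter logistic weighted model is therefore a good method for robust estimation of examinee ability.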
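
For orientation, the standard dichotomous four-parameter logistic model is written out below. This is the textbook form only; the weighted, polytomous variant studied in the paper may differ in detail. The symbols a (discrimination), b (difficulty), and the item and person subscripts follow common IRT usage and are assumptions here, since the abstract names only c and γ.

```latex
P(X_{ij} = 1 \mid \theta_j)
  = c_i + (\gamma_i - c_i)\,
    \frac{1}{1 + \exp\!\left[-a_i(\theta_j - b_i)\right]}
```

Fixing \gamma_i = 1 leaves a three-parameter model with a lower asymptote c_i for guessing, fixing c_i = 0 leaves one with an upper asymptote \gamma_i for slipping, and fixing both recovers the two-parameter model. These two special cases presumably correspond to the c-type and γ-type models named in the abstract.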

10.
For adequate modeling of missing responses, a thorough understanding of the nonresponse mechanisms is vital. As a large number of major testing programs are moving or have already moved to computer-based assessment, a rich body of additional data on examinee behavior becomes easily accessible. These additional data may contain valuable information on the processes associated with nonresponse. Bringing together research on item omissions with approaches for modeling response time data, we propose a framework for simultaneously modeling response behavior and omission behavior utilizing timing information for both. As such, the proposed model makes it possible (a) to gain a deeper understanding of response and nonresponse behavior in general and, in particular, of the processes underlying item omissions in large-scale assessments (LSAs), (b) to model the processes determining the time examinees require to generate a response or to omit an item, and (c) to account for nonignorable item omissions. Parameter recovery of the proposed model is examined in a simulation study. An illustration of the model by means of an application to real data is provided.

11.
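Parameter estimation currently relies mostly on statistical methods, which are time-consuming and require large examinee samples and many items. This paper combines a BP (backpropagation) neural network with a dimension-reduction method to estimate the item parameters and examinee ability parameters of the graded response model (GRM). Monte Carlo simulation results show that (1) parameter estimation accuracy under this network design is high whether there are many examinees and few items or many items and few examinees; (2) the approach can be applied to parameter estimation for items with differing numbers of score categories, and accuracy remains high even for item parameters with more than 15 categories, which other estimation methods cannot match; and (3) run time is greatly reduced compared with statistical estimation methods.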

12.
We tested the hypothesis that “good feelings,” the central element of subjective well-being, are associated with interdependence and interpersonal engagement of the self in Japan, but with independence and interpersonal disengagement of the self in the United States. Japanese and American college students (total N = 913) reported how frequently they experienced various emotional states in daily life. In support of the hypothesis, the reported frequency of general positive emotions (e.g., calm, elated) was most closely associated with the reported frequency of interpersonally engaged positive emotions (e.g., friendly feelings) in Japan, but with the reported frequency of interpersonally disengaged positive emotions (e.g., pride) in the United States. Further, for Americans the reported frequency of experience was considerably higher for positive emotions than for negative emotions, but for Japanese it was higher for engaged emotions than for disengaged emotions. Implications for cultural constructions of emotion in general and subjective well-being in particular are discussed.

13.
The Don’t Know (DK) response, which takes the form of an omitted or not-reached response at the end of a cognitive test or is explicitly presented as a response option in a social survey, contains important information that is often overlooked. Direct psychometric modeling efforts for DK responses are few and far between. In this article, the linear logistic test model (LLTM) is proposed for delineating the impacts of cognitive operations for a test that contains DK responses. We assume that the DK response is a valid response. The assumption is reasonable for many situations, including low-stakes cognitive tests and attitudinal assessments. By extracting information embedded in the DK response, the method shows how DK can inform the latent construct of interest and the cognitive operations underlying the response to stimuli. Using a proven recoding scheme, the LLTM could be implemented through commonly used programs such as PROC GLIMMIX. Two simulation experiments were conducted to evaluate how well the parameters can be recovered. In addition, two real data examples, from a noncognitive test of health belief assessment and a cognitive test of knowledge in diabetes, are presented as case studies to illustrate the LLTM for DK responses.

14.
Methods of cognitive diagnostic computerized adaptive testing (CD-CAT) under higher-order cognitive diagnosis models have been developed to simultaneously provide estimates of the attribute mastery statuses of examinees for formative assessment and estimates of a latent continuous trait for overall summative evaluation. In a typical CD-CAT environment, examinees are often subject to a time limit, and the examinees’ response times (RTs) for specific test items can be routinely recorded by custom-made programs. Because examinees are individually administered tailored sets of test items from the item pool, they may experience different levels of speededness during testing and different levels of risk of running out of time. In this study, RTs were considered during the item-selection procedure to control the test speededness, and the RTs were treated as useful information for improving latent trait estimation in CD-CAT under the higher-order deterministic input, noisy ‘and’ gate (DINA) model. A modified posterior-weighted Kullback–Leibler (PWKL) method that maximizes the item information per time unit and a shadow-test method that assembles a provisional test subject to a specified time constraint were developed. Two simulation studies were conducted to assess the effects of the proposed methods on the quality of CD-CAT for fixed- and variable-length exams. The results show that, compared with the traditional PWKL method, the proposed methods maintain a lower risk of running out of time while ensuring satisfactory attribute estimation and providing more accurate estimates of the latent trait and speed parameters. Finally, several suggestions for future research are proposed.
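
The idea of maximizing information per time unit can be sketched as follows. This is a simplified stand-in, not the modified PWKL method from the paper: it assumes each candidate item already has a scalar information value and an expected response time for the current examinee, and the function name `select_item` and its arguments are hypothetical.

```python
from typing import Sequence, Set


def select_item(info: Sequence[float],
                expected_time: Sequence[float],
                administered: Set[int]) -> int:
    """Return the index of the unused item with the largest information per time unit."""
    if len(info) != len(expected_time):
        raise ValueError("info and expected_time must have the same length")
    # Only items that have not been administered yet are candidates.
    candidates = [i for i in range(len(info)) if i not in administered]
    if not candidates:
        raise ValueError("no items left to administer")
    # Ratio criterion: information gained per unit of expected response time.
    return max(candidates, key=lambda i: info[i] / expected_time[i])
```

Under this criterion a highly informative but slow item can lose to a slightly less informative item that is answered much faster, which is how the ratio helps limit the risk of running out of time.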

15.
Students' conceptions of their academic futures, such as completing secondary school, have been found to play a significant role in their current behavior. Indeed, research regarding future time perspectives (FTP) indicates that students with extended FTPs are likely to be more engaged and less disengaged over time. Extended FTPs comprise two critical motivating elements: the cognitive (i.e., importance value) and the dynamic (i.e., school completion aspirations). Although these elements are hypothetically reciprocally related and without temporal limitation to their motivational effects, these claims have largely gone untested. These claims were examined via longitudinal structural equation modelling with cross-lagged panel analysis and invariance testing in a sample of 1327 Australian secondary school students. Findings indicated that importance value is directionally salient over school completion aspirations (such that it may precede school completion aspirations), both are associated with higher engagement and lower disengagement over time, and evidence of temporal limitations on the motivational benefits of the elements of extended FTPs was not found. School-based interventions that focus on improving importance value and school completion aspirations are discussed.

16.
王璞珏 (Wang Pujue), 刘红云 (Liu Hongyun). 心理学报 (Acta Psychologica Sinica), 2019, 51(9): 1057-1067
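Drawing on the idea of collaborative filtering in recommender systems, this paper proposes two CAT item-selection strategies that can make use of data from previous examinees: direct examinee-based recommendation (DEBR) and indirect examinee-based recommendation (IEBR). Two simulation studies, using different item banks and test lengths, compared the two recommendation strategies with two traditional item-selection strategies (FMI and BAS) in terms of measurement precision and item exposure control, and examined the factors that affect the performance of the recommendation strategies. The results showed that both recommendation strategies controlled item exposure better than the two traditional strategies, with measurement precision no worse than that of the BAS method; DEBR emphasizes selection precision, whereas IEBR controls item exposure best. The characteristics and quality of the existing examinee data are the main factors affecting the performance of the recommendation strategies.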

17.
The current study was designed to gain insights into shifting school culture by examining perceived peer group norms and social values across elementary and middle school grades. Perceived norms were assessed by asking participants (N = 605) to estimate how many grade mates were academically engaged, disengaged, and antisocial. To capture social values, peer nominations were used to assess “coolness” associated with these behaviors. Perceived norms became gradually more negative from fall to spring and across grades four to eight. Whereas academic engagement was socially valued in elementary school, negative social and academic behaviors were valued in middle school. Additionally, improved social status was associated with increased academic engagement in fifth grade, disengagement in seventh and eighth grades, and antisocial behavior in sixth grade. The findings suggest that differences between elementary and middle school cultural norms and values may shed light on negative behavior changes associated with the transition to middle school.

18.
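This paper proposes a person-fit statistic, R, for polytomously scored items, examines its performance in detecting six common aberrant response patterns (cheating, guessing, random responding, carelessness, creative responding, and mixed aberrance), and compares it with the standardized log-likelihood statistic lzp. The results show that (1) when the proportion of aberrant responses is low and the aberrance type is cheating or guessing, the detection rate of R is significantly higher than that of lzp; (2) the detection rates of both statistics increase with test length and with the degree of examinee aberrance; and (3) under some conditions, R and lzp perform similarly. An empirical data analysis further illustrates how the R statistic is used, and the results also indicate that it has good prospects for application.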

19.
The present study explored the interactive effects of self-efficacy and increasing/decreasing task difficulty upon engagement and disengagement within a cusp-catastrophe model framework. Using a closed motor skill aiming task, participants (N = 60) were required to compete in conditions where task difficulty increased and then decreased (or vice versa) and where they were rewarded for good performance but penalized for bad. Participants who reported low levels of self-efficacy disengaged at an earlier level of task difficulty than their high self-efficacy counterparts. Furthermore, this group did not re-engage with the task until task difficulty had significantly decreased. Although task disengagement occurred with high difficulty in the high self-efficacy group, this group re-engaged in a manner similar to that in which they disengaged. Findings support and extend those of previous tests of catastrophe models by directly allowing for task disengagement.

20.
The identifiability of item response models with nonparametrically specified item characteristic curves is considered. Strict identifiability is achieved, with a fixed latent trait distribution, when only a single set of item characteristic curves can possibly generate the manifest distribution of the item responses. When item characteristic curves belong to a very general class, this property cannot be achieved. However, for assessments with many items, it is shown that all models for the manifest distribution have item characteristic curves that are very near one another, and pointwise differences between them converge to zero at all values of the latent trait as the number of items increases. An upper bound for the rate at which this convergence takes place is given. The main result provides theoretical support to the practice of nonparametric item response modeling, by showing that models for long assessments have the property of asymptotic identifiability. The research was partially supported by the National Institutes of Health grant R01 CA81068-01.
