Similar literature
20 similar documents found
1.
The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi‐parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non‐parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non‐parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees’ latent speeds; whereas the non‐parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box–Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two‐stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non‐parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested.

2.
The estimation of sensitivity and bias from data collected in a yes/no detection‐theoretic experiment is complicated by the possibility of proportions of 0 or 1 appearing in the resulting contingency table. Inverse normal transforms of these probabilities result in mathematically intractable infinities. Typically, some transformation of the data must be applied prior to parameter estimation. Several transformations have been reviewed in the literature, in terms of both the bias and the variance of the estimates they produce. We propose three generalized transformations, which contain the two most reported transformations as special cases, and consider their performance in terms of the mean square error of the estimates they produce. Results indicate that the ‘1/N’ and the adaptive log‐linear transformations outperform the others. Guidelines for the application of these transformations are presented.
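
A minimal Python sketch of the problem described in this abstract, assuming the commonly reported log‐linear correction (add 0.5 to each cell count before computing rates). It is not the generalized transformations the abstract proposes; the function name `dprime_loglinear` and the example counts are hypothetical.

```python
from statistics import NormalDist


def dprime_loglinear(hits, misses, false_alarms, correct_rejections):
    """Return (d', criterion c) using the log-linear correction.

    Assumption: 0.5 is added to each cell, so neither rate can be
    exactly 0 or 1 and the inverse normal transform stays finite.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical perfect performance: the raw rates would be 1 and 0,
# whose inverse normal transforms are undefined without a correction.
d, c = dprime_loglinear(hits=20, misses=0, false_alarms=0, correct_rejections=20)
```

With the correction, the perfect-performance table yields a finite d' and a criterion of about zero, because the corrected hit and false-alarm rates are symmetric around 0.5.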

3.
Background: Although there have been numerous studies conducted on the psychometric properties of Biggs' Learning Process Questionnaire (LPQ), these have involved the use of traditional omnibus measures of scale quality such as corrected item total correlations, internal consistency estimates of reliability, and factor analysis. However, these omnibus measures of scale quality are sample dependent and fail to model item responses as a function of trait level. And since the item trait relationship is typically nonlinear, traditional factor analytic methods are inappropriate. Aims: The purpose of this study was to identify a unidimensional subset of LPQ items and examine the effectiveness of these items and their options in discriminating between changes in the underlying trait level. In addition to assessing item quality, we were interested in assessing overall scale quality with non‐sample dependent measures. Method: The sample was split into two nearly equal halves, and a unidimensional subset of items was identified in one of these samples and cross‐validated in the other. The nonlinear relationship between the probability of endorsing an item option and the underlying trait level was modelled using a nonparametric latent trait technique known as kernel smoothing and implemented with the program TestGraf. After item and scale quality were established, maximum likelihood estimates of participants' trait level were obtained and used to examine grade and gender differences. Results: A unidimensional subset of 16 deep and achieving items was identified. Slightly more than half of these items needed some of their options combined so that the probability of endorsing an item option as a function of increasing trait level corresponded to the ideal rank ordering of the item options. With this adjustment, scale quality as measured by the information function and standard error function was found to be good. However, no statistically significant gender differences were observed and, although statistically significant grade differences were observed, they were not substantively meaningful. Conclusions: The use of nonparametric kernel‐smoothing techniques is advocated over parametric latent trait methods for the analysis of attitudinal and psychological measures involving polychotomous ordered‐response categories. It is also suggested that latent trait methods are more appropriate than traditional test‐based measures for studying differential item functioning both within and between cultures. Nonparametric kernel‐smoothing techniques hold particular promise in identifying and understanding cross‐cultural differences in student approaches to learning at both the item and scale level.

4.
The simultaneous and nonparametric estimation of latent abilities and item characteristic curves is considered. The asymptotic properties of ordinal ability estimation and kernel smoothed nonparametric item characteristic curve estimation are investigated under very general assumptions on the underlying item response theory model as both the test length and the sample size increase. A large deviation probability inequality is stated for ordinal ability estimation. The mean squared error of kernel smoothed item characteristic curve estimates is studied and a strong consistency result is obtained showing that the worst case error in the item characteristic curve estimates over all items and ability levels converges to zero with probability equal to one.
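
The two ingredients named in this abstract, ordinal ability estimation and kernel smoothing of item characteristic curves, can be illustrated with a rough Python sketch. Everything specific here is an assumption rather than the paper's procedure: abilities are taken as normal quantiles of total-score ranks, the smoother is a Nadaraya-Watson estimator with a Gaussian kernel, and the function name `kernel_icc`, the bandwidth, and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def kernel_icc(responses, item, grid, bandwidth=0.4):
    """Kernel-smoothed estimate of P(correct | ability) for one item.

    responses: list of 0/1 rows, one row per examinee.
    Assumption: ability is estimated ordinally from total-score ranks
    (ties are broken by row order) and mapped to normal quantiles.
    """
    n = len(responses)
    totals = [sum(row) for row in responses]
    order = sorted(range(n), key=lambda i: totals[i])
    z = NormalDist().inv_cdf
    theta = [0.0] * n
    for rank, i in enumerate(order, start=1):
        theta[i] = z(rank / (n + 1))
    curve = []
    for t in grid:
        # Gaussian kernel weights centred on the evaluation point t.
        weights = [math.exp(-0.5 * ((t - th) / bandwidth) ** 2) for th in theta]
        numerator = sum(w * responses[i][item] for i, w in enumerate(weights))
        curve.append(numerator / sum(weights))
    return curve


# Hypothetical toy data: six examinees, three items.
responses = [
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
curve = kernel_icc(responses, item=0, grid=[-1.0, 0.0, 1.0])
```

In this toy data the first item is answered correctly only by the higher-ranked examinees, so the estimated curve rises with ability.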

5.
In the design of common-item equating, two groups of examinees are administered separate test forms, and each test form contains a common subset of items. We consider test equating under this situation as an incomplete data problem: examinees have observed scores on one test form and missing scores on the other. Through the use of statistical data-imputation techniques, the missing scores can be replaced by reasonable estimates, and consequently the forms may be directly equated as if both forms were administered to both groups. In this paper we discuss different data-imputation techniques that are useful for equipercentile equating; we also use empirical data to evaluate the accuracy of these techniques as compared with chained equipercentile equating. A paper presented at the European Meeting of the Psychometric Society, Barcelona, Spain, July 1993.

6.
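Using the graded response multilevel facets model proposed by Kang Chunhua, Sun Xiaojian, and Zeng Pingfei (康春花、孙小坚和曾平飞, 2016), this study examined how the number of raters and the number of items affect the accuracy of examinee ability estimates. The simulation results showed that: (1) as the number of items increased, the correlation between estimated and true values also increased; (2) the main effects of the number of raters and the number of items on mean absolute bias (MAB) and root mean square error (RMSE) were both significant, as was their interaction; (3) simple effects analysis showed that with few items, ability estimation was most accurate with three raters, whereas as the number of items increased, the estimation error with four raters dropped rapidly and that condition became the most accurate.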
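
The two accuracy criteria named in this abstract can be sketched in Python under their usual definitions. The abstract does not give formulas, so treating MAB as the mean of the absolute differences between estimated and true abilities is an assumption, and the function names and example values are hypothetical.

```python
import math


def mab(estimates, truths):
    """Mean absolute bias: average of |estimate - true value| (assumed definition)."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


def rmse(estimates, truths):
    """Root mean square error between estimates and true values."""
    squared = sum((e - t) ** 2 for e, t in zip(estimates, truths))
    return math.sqrt(squared / len(truths))


# Hypothetical true abilities and their estimates for three examinees.
true_theta = [-1.0, 0.0, 1.0]
est_theta = [-0.5, 0.0, 1.5]
mab_value = mab(est_theta, true_theta)
rmse_value = rmse(est_theta, true_theta)
```

RMSE penalizes large individual errors more heavily than MAB, so the two criteria can rank simulation conditions differently.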

7.
A linking design typically consists of a data collection procedure together with an item linking procedure that places item parameters calibrated from multiple test forms onto a common scale. This study considered 2 potentially useful item response theory linking designs. The first one is characterized by selecting a single set of common items across all multiple test forms, the precalibrated item parameters of which are kept fixed while the unknown parameters of the other items are being estimated. This linking design will be referred to as the fixed common-precalibrated item parameter design. However, data collected under this design could also be analyzed by the characteristic curve method, which constituted an alternative linking procedure. In this study, the relative merits of the 2 linking designs were examined with respect to their robustness against 3 manipulated conditions, namely when the common items have imprecise estimates, when there is a noticeable difference in the average item difficulty between the common and the noncommon items, and when the examinees are heterogeneous in terms of their abilities. A parameter recovery study was conducted to achieve this purpose. The results indicated that both linking designs were capable of producing accurate linking of items and equivalent estimation of ability parameters under the 3 conditions. When the 2 designs were actually utilized in the development of an item bank, it was found that both linking designs produced quite consistent solutions despite minor differences on some item and ability estimates. Conditions under which one linking design is preferred over the other are also provided in the Discussion section of this article.

8.
9.
Fifty items from Goldberg's International Personality Item Pool were compiled to form a public‐domain measure of personality, the Australian Personality Inventory (API). Data from a random community sample (N = 7615) and a university‐based sample (N = 271) were used to explore psychometric properties of this 50‐item measure of the five‐factor model of personality (FFM). In both samples, internal reliabilities were adequate. In the university‐based sample an appropriate pattern of convergent and divergent relationship was found between scale scores and domain scores from the NEO Five‐Factor Inventory. After adjusting for an apparent response set (mean response across items), exploratory factor analyses clearly retrieved the FFM in both samples. It is provisionally concluded that raw scale scores from the API provide reliable estimates of the FFM, but adjustment for mean response across the 50 items might clarify the five‐factor structure, especially in less educated samples.

10.
Research on constructing alternate forms of assessment center exercises is very scarce. This study examines the effectiveness of a cloning procedure (incident isomorphic approach) for developing alternate forms of a computerized in‐basket. In this approach, original and alternate items are essentially similar (they are based on the same critical incident), while being superficially different (they are situated in a different context). Results showed there was no significant difference between the overall in‐basket score across the alternate forms. In addition, these overall scores correlated .66, with projected estimates for the full in‐basket approaching .80. Implications and limitations of the use of cloning in designing alternate assessment center exercises are discussed.
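
The abstract does not say how the projection from .66 to about .80 was obtained. A minimal Python sketch, assuming the Spearman-Brown prophecy formula and a hypothetical doubling of test length (k = 2), reproduces a value of that size; both the formula choice and the lengthening factor are assumptions, not details taken from the study.

```python
def spearman_brown(r, k):
    """Projected reliability when a test is lengthened by a factor k.

    r: observed correlation (reliability) of the shorter forms.
    k: lengthening factor (hypothetical value used below).
    """
    return k * r / (1.0 + (k - 1.0) * r)


# Observed alternate-form correlation from the abstract, doubled length assumed.
projected = spearman_brown(0.66, 2)
print(round(projected, 3))  # 0.795
```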

11.
It has often been hypothesized that speakers store regularly inflected forms as separate entries in the lexicon. If this hypothesis is true, high-frequency lexical items will have lower error rates on their inflections than will low-frequency lexical items. This is shown to be the case for errors on irregular inflected forms in naturally occurring speech errors. High-frequency regularly inflected forms exhibit a small (but nonsignificant) advantage in naturally occurring errors, and a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular verbs. These data are best explained by assuming that high frequency inflected forms are stored as separate entries in the lexicon. Consequences of this finding for theories of language production and language learning are discussed.

12.
We analytically derive the fixed‐effects estimates in unconditional linear growth curve models by typical linear mixed‐effects modelling (TLME) and by a pattern‐mixture (PM) approach with random‐slope‐dependent two‐missing‐pattern missing not at random (MNAR) longitudinal data. Results showed that when the missingness mechanism is random‐slope‐dependent MNAR, TLME estimates of both the mean intercept and mean slope are biased because of incorrect weights used in the estimation. More specifically, the estimate of the mean slope is biased towards the mean slope for completers, whereas the estimate of the mean intercept is biased towards the opposite direction as compared to the estimate of the mean slope. We also discuss why the PM approach can provide unbiased fixed‐effects estimates for random‐coefficients‐dependent MNAR data but does not work well for missing at random or outcome‐dependent MNAR data. A small simulation study was conducted to illustrate the results and to compare results from TLME and PM. Results from an empirical data analysis showed that the conceptual finding can be generalized to other real conditions even when some assumptions for the analytical derivation cannot be met. Implications from the analytical and empirical results were discussed and sensitivity analysis was suggested for longitudinal data analysis with missing data.

13.
Cross‐classified random effects modelling (CCREM) is a special case of multi‐level modelling where the units of one level are nested within two cross‐classified factors. Typically, CCREM analyses omit the random interaction effect of the cross‐classified factors. We investigate the impact of the omission of the interaction effect on parameter estimates and standard errors. Results from a Monte Carlo simulation study indicate that, for fixed effects, both the coefficient estimates and the accompanying standard error estimates are not biased. For random effects, results are affected at level 2 but not at level 1 by the presence of an interaction variance and/or a correlation between the residuals of the level 2 factors. Results from the analysis of the Early Childhood Longitudinal Study and the National Educational Longitudinal Study agree with those obtained from simulated data. We recommend that researchers attempt to include interaction effects of cross‐classified factors in their models.

14.
The problem of predicting universe scores for samples of examinees based on their responses to samples of items is treated. A general measurement procedure is described in which multiple test forms are developed from a table of specifications and each form is administered to a different sample of examinees. The measurement model categorizes items according to the cells of such a table, and the linear function derived for minimizing error variance in prediction uses responses to these categories. In addition, some distinctions are drawn between aspects of the approach taken here and the familiar regressed score estimates. The author thanks Robert L. Brennan, Michael J. Kolen, and Richard Sawyer for helpful comments and corrections, and anonymous reviewers for suggested improvements.

15.
Exposure to a few task-relevant numerical facts (seed facts) often improves subsequent numerical estimates. We performed two experiments to investigate the mechanism that produces these seeding effects. In Experiment 1, participants estimated national populations; in Experiment 2, they estimated between-city distances. In both, items were selected so that the actual value of the seed facts (SA) was, on average, below participants' initial estimates for those items (S1) and above the initial estimates for the transfer items (T1). Given this configuration, the anchoring position predicts that the postseeding transfer estimates should be greater than the preseeding transfer estimates (T2 > T1), whereas the feedback/induction position predicts the opposite (T2 < T1). In both experiments, the latter pattern of results emerged, supporting the conclusion that seeds aren't anchors.

16.
In a lexical decision task (LDT) in which list composition is manipulated, a typical finding to date has been a slowdown for easy items (e.g., high-frequency words) but little speedup for hard items (e.g., low-frequency words) when they are mixed together. This asymmetric frequency-blocking effect contrasts with the symmetric pattern (both a speedup for hard items and a slowdown for easy items when they are mixed together) observed with the naming task. In the present study, we investigated the mechanism responsible for the asymmetric blocking effect in the LDT within a model of blocking effect proposed by Mozer, Kinoshita, and Davis (2003), termed the adaptation-to-the-statistics-of-the-environment (ASE) model. Experiments 1A and 1B showed that when the same high- and low-frequency words were used, consistent with the existing literature, an asymmetric blocking effect was found in the LDT and a symmetric blocking effect was found in the naming task. Within the ASE model, a symmetric versus asymmetric blocking effect can be explained in terms of different asymptotic rates in subjective estimates of error probability. Experiments 2 and 3 tested and confirmed a prediction of the model based on this assumption that a speedup of hard items would be observed in an LDT with hard items whose subjective error probability asymptotes near zero (low-frequency words with high familiarity ratings that subjects could be certain were words). Implications of the model for task differences in reaction times are discussed.

17.
The purpose of this study was to examine the extent to which domain‐specific components, such as content and type of task, influence divergent thinking and creativity by comparing the performance of 112 ninth‐grade students on two parallel divergent‐thinking tests. The Verbal Forms of the Torrance Tests of Creative Thinking (TTCT) represented the domain‐independent measures, while two forms of a Creativity in History Test (CHT), whose items corresponded closely to those of the TTCT, served as the content‐specific measures. The results indicated that both content‐specific and task‐specific factors have significant effects on divergent‐thinking and creative performance. Implications concerning the definition of creativity as a construct and its measurement are discussed.

18.
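Examination reform and large-scale assessment practice, both in China and abroad, increasingly emphasize the role of subjectively scored items, so rater reliability has once again become a topic of considerable interest. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposes and examines a graded response multilevel facets model. The results show that under both experimental conditions, fixed rater effects and random rater effects, the means and standard deviations of the bias values were small. This indicates that, under the current experimental conditions, the parameter estimates show good recovery and robustness and the model can detect rater effects. Factors that influence rater effects could therefore be added in later work, developing the approach into a complete model that detects rater effects and their influencing factors simultaneously.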

19.
This paper reports on a simulation study that evaluated the performance of five structural equation model test statistics appropriate for categorical data. Both Type I error rate and power were investigated. Different model sizes, sample sizes, numbers of categories, and threshold distributions were considered. Statistics associated with both the diagonally weighted least squares (cat‐DWLS) estimator and with the unweighted least squares (cat‐ULS) estimator were studied. Recent research suggests that cat‐ULS parameter estimates and robust standard errors slightly outperform cat‐DWLS estimates and robust standard errors (Forero, Maydeu‐Olivares, & Gallardo‐Pujol, 2009). The findings of the present research suggest that the mean‐ and variance‐adjusted test statistic associated with the cat‐ULS estimator performs best overall. A new version of this statistic now exists that does not require a degrees‐of‐freedom adjustment (Asparouhov & Muthén, 2010), and this statistic is recommended. Overall, the cat‐ULS estimator is recommended over cat‐DWLS, particularly in small to medium sample sizes.

20.
The secondary distinctiveness effect means that items that are unusual compared to one's general knowledge stored in permanent memory are remembered better than common items. This research studied two forms of secondary‐distinctiveness‐based effects in conjunction: the bizarreness effect and the orthographic distinctiveness (OD) effect. More specifically, an experiment investigated in young adults a possible additive effect of bizarreness and OD effects in free recall performance. Results revealed that in young adults these two secondary‐distinctiveness‐based effects appear to be largely independent and can complement each other to enhance performance. Findings are discussed in light of current distinctiveness theory.
