1.
Computerized Classification Testing (CCT) classifies examinees efficiently and is widely used in qualification testing and clinical psychology. As a key component of CCT, the termination rule determines when the test stops and into which category an examinee is ultimately placed, and thus directly affects test efficiency and classification accuracy. The three existing families of termination rules (likelihood-ratio rules, Bayesian decision-theoretic rules, and confidence-interval rules) rest, respectively, on constructing a hypothesis test, designing a loss function, and comparing the relative position of a confidence interval. Across different testing contexts, CCT termination rules have also developed distinct concrete forms. Future research could further develop Bayesian rules, address multidimensional and multi-category settings, and incorporate response times and machine-learning algorithms. In terms of practical testing needs, all three families of termination rules show application potential for qualification testing, whereas clinical questionnaires tend to favor Bayesian rules.
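As a rough illustration of the likelihood-ratio family of rules described above, the following sketch implements a basic SPRT termination check for a two-category (pass/fail) CCT under a 2PL IRT model. The item parameters, cut score, indifference region, and error rates are hypothetical, and real CCT implementations add truncation and exposure control on top of this core logic.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def sprt_decision(responses, a, b, theta_c=0.0, delta=0.3,
                  alpha=0.05, beta=0.05):
    """SPRT stopping check for pass/fail classification.

    Tests H0: theta = theta_c - delta against H1: theta = theta_c + delta
    using the log likelihood ratio of the responses observed so far.
    """
    x = np.asarray(responses, dtype=float)
    p1 = p_2pl(theta_c + delta, a, b)   # response probabilities under H1
    p0 = p_2pl(theta_c - delta, a, b)   # response probabilities under H0
    llr = np.sum(x * np.log(p1 / p0) + (1 - x) * np.log((1 - p1) / (1 - p0)))
    if llr >= np.log((1 - beta) / alpha):   # strong evidence for H1
        return "pass"
    if llr <= np.log(beta / (1 - alpha)):   # strong evidence for H0
        return "fail"
    return "continue"

# Ten administered items with hypothetical parameters
rng = np.random.default_rng(1)
a = rng.uniform(0.8, 2.0, 10)    # discriminations
b = rng.uniform(-1.0, 1.0, 10)   # difficulties
print(sprt_decision(rng.integers(0, 2, 10), a, b))
```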
2.
Because it performs classification, Computerized Classification Testing (CCT) is widely used in tests whose purpose is classification, such as professional qualification examinations and health and nursing questionnaires. As a key component of CCT, the termination rule not only determines the conditions under which the test stops but also directly affects classification accuracy and test efficiency. However, few studies have explored termination rules for multidimensional CCT (MCCT). To address the shortcomings of existing MCCT termination rules, two new rules are proposed: a Mahalanobis-distance-based multidimensional sequential probability ratio rule (Mahalanobis-SPRT) and a multidimensional stochastically curtailed generalized likelihood ratio rule (M-SCGLR). Simulation studies examined their performance under different conditions (e.g., item-bank structure, correlation between ability dimensions, and classification bound functions). Results show that (1) under compensatory classification bound functions, the Mahalanobis-SPRT rule achieves high classification accuracy with test lengths similar to comparable methods, and (2) under almost all conditions, the M-SCGLR rule not only far outperforms existing multidimensional stochastic curtailment rules in measurement precision but also yields shorter tests.
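The abstract does not reproduce the Mahalanobis-SPRT formula, but its central ingredient, the Mahalanobis distance between a multidimensional ability estimate and a point on the classification boundary, is easy to sketch; all quantities below (estimate, boundary point, covariance matrix) are hypothetical:

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance between x and mu under covariance cov."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

theta_hat = np.array([0.6, -0.2])   # current two-dimensional ability estimate
boundary = np.array([0.0, 0.0])     # nearest point on the cut boundary
cov = np.array([[0.30, 0.12],       # estimation-error covariance;
                [0.12, 0.25]])      # dimensions moderately correlated
print(mahalanobis(theta_hat, boundary, cov))
```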
3.
Drawing on ideas from traditional computerized adaptive testing and the characteristics of cognitive diagnosis, this study proposes four termination rules for variable-length CD-CAT within the cognitive diagnosis framework: the standard error of attributes method (SEA), the difference between adjacent posterior probabilities method (DAPP), the halving algorithm (HA), and a hybrid method (HM). They were compared with the HSU and KL methods both without exposure control and under different exposure-control conditions. Results show: (1) The stricter the termination criterion, the longer the average test length, the larger the percentage of tests terminated at the maximum test length, and the higher the pattern correct-classification rate. (2) Without exposure control, the four new termination rules performed well, coming very close to the HSU method. As the preset maximum posterior probability increased or e decreased, the pattern correct-classification rate rose and the average test length grew, but item-bank usage was poor under all rules. (3) With item-exposure control, item-bank usage under all six variable-length termination rules improved greatly while high pattern correct-classification rates were maintained, and different exposure-control methods affected the termination rules differently; the relative-standard termination rule was especially sensitive to the exposure-control method. (4) Overall, the SEA, HM, and HA methods performed essentially on par with the HSU method on all indices, followed by the KL and DAPP methods.
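The exact formulations of the four rules are not given in the abstract, but a DAPP-style check can be sketched as follows: stop when the largest posterior probability over attribute patterns is high enough, or when its gap over the runner-up exceeds a threshold. The thresholds and the rule's precise form are assumptions for illustration only.

```python
import numpy as np

def dapp_style_stop(posterior, gap=0.5, max_post=0.8):
    """Stop if the leading attribute pattern is probable enough or is
    clearly separated from the second-best (hypothetical thresholds)."""
    p = np.sort(np.asarray(posterior))[::-1]
    return p[0] >= max_post or (p[0] - p[1]) >= gap

# Posterior over the 2^K attribute patterns (K = 2, so 4 patterns)
posterior = np.array([0.70, 0.18, 0.08, 0.04])
print(dapp_style_stop(posterior))   # True: leading pattern well separated
```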
4.
This article is compiled from a lecture by Professor Zhang Huahua on the theory, techniques, and methods of computerized testing, and on the lessons that the U.S. National Assessment of Educational Progress (NAEP) offers for educational assessment in China. It provides a useful reference for readers seeking a fuller and deeper understanding of computerized testing.
5.
Scott B. Morris Michael Bass Elizabeth Howard Richard E. Neapolitan 《International Journal of Testing》2020,20(2):146-168
The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the Patient-Reported Outcomes Measurement Information System Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule was found to outperform the SE stopping rule overall, particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.
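A minimal sketch of the contrast between the SE rule and a PSER-style check, under simplified assumptions (the thresholds and the one-step prediction below are illustrative, not the authors' implementation): stop when the SE is small enough, or when even the most informative remaining item would barely reduce it.

```python
import numpy as np

def should_stop(se, best_remaining_info, se_threshold=0.3,
                min_reduction=0.01):
    """SE rule plus a PSER-style futility check (simplified).

    se                  : current standard error of the trait estimate
    best_remaining_info : Fisher information of the best unused item
                          at the current trait estimate
    """
    if se <= se_threshold:          # plain SE rule satisfied
        return True
    # One-step prediction: information adds, SE = 1 / sqrt(information).
    info_now = 1.0 / se**2
    predicted_se = 1.0 / np.sqrt(info_now + best_remaining_info)
    # Stop anyway if the best remaining item barely helps
    # (the bank lacks depth at this trait level).
    return (se - predicted_se) < min_reduction

print(should_stop(se=0.55, best_remaining_info=0.02))  # True: futile to go on
print(should_stop(se=0.55, best_remaining_info=0.90))  # False: keep testing
```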
6.
Lennart Schneider R. Philip Chalmers Rudolf Debelak Edgar C. Merkle 《Multivariate behavioral research》2020,55(5):664-684
In this paper, we apply Vuong's general approach to model selection to the comparison of nested and non-nested unidimensional and multidimensional item response theory (IRT) models. Vuong's approach is useful because it allows for formal statistical tests of both nested and non-nested models. However, only the test of non-nested models has been applied in the context of IRT models to date. After summarizing the statistical theory underlying the tests, we investigate the performance of all three distinct Vuong tests in the context of IRT models using simulation studies and real data. In the non-nested case, we observed that the tests can reliably distinguish between the graded response model and the generalized partial credit model. In the nested case, we observed that the tests typically perform as well as, or sometimes better than, the traditional likelihood ratio test. Based on these results, we argue that Vuong's approach provides a useful set of tools for researchers and practitioners to effectively compare competing nested and non-nested IRT models.
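The core of Vuong's non-nested test can be sketched from casewise log-likelihoods of the two competing models; a minimal illustration (simulated log-likelihoods, and omitting the variance test and the nested-case adjustments the paper also treats):

```python
import numpy as np
from scipy import stats

def vuong_nonnested(ll1, ll2):
    """Vuong z-statistic from casewise log-likelihoods of two
    non-nested models; a positive z favors model 1."""
    d = np.asarray(ll1) - np.asarray(ll2)
    z = np.sqrt(d.size) * d.mean() / d.std(ddof=1)
    p = 2 * stats.norm.sf(abs(z))    # two-sided p-value
    return z, p

# Hypothetical casewise log-likelihoods for 500 respondents
rng = np.random.default_rng(0)
ll_grm = rng.normal(-2.0, 0.5, 500)              # e.g., graded response model
ll_gpcm = ll_grm - rng.normal(0.05, 0.2, 500)    # slightly worse-fitting rival
print(vuong_nonnested(ll_grm, ll_gpcm))
```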
7.
Indrani Basak 《Journal of Multi-Criteria Decision Analysis》2015,22(3-4):161-166
In the analytic hierarchy process (AHP), a ratio scale (π1, π2, ⋯, πt) for the priorities of the alternatives {T1, T2, ⋯, Tt} is used for a decision problem, where πi/πj quantifies the ratio of the priority of Ti to that of Tj. In a group decision‐making setup, subjective estimates of πi/πj are obtained as entries of a pairwise comparison matrix for each member of the group. On the basis of these pairwise comparison matrices, one topic of interest in some situations is the total rank ordering of the priorities of the alternatives. In this article, a statistical method is proposed for testing a specific total rank ordering of the priorities of the alternatives. The method is then illustrated using numerical examples.
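For context, the priorities in AHP are commonly recovered from a pairwise comparison matrix via its principal eigenvector. The sketch below shows that standard recovery step with a hypothetical matrix; it is background for the setup, not the paper's rank-ordering test itself.

```python
import numpy as np

def ahp_priorities(A):
    """Priority vector from a pairwise comparison matrix A, where
    A[i][j] estimates pi_i / pi_j: the principal eigenvector of A,
    normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(np.asarray(A, dtype=float))
    k = np.argmax(eigvals.real)          # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    return w / w.sum()

# Hypothetical reciprocal comparison matrix for three alternatives
A = [[1.0, 3.0, 5.0],
     [1/3, 1.0, 2.0],
     [1/5, 1/2, 1.0]]
print(ahp_priorities(A))   # approximate ratio-scale priorities
```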
8.
Xiao Li Jinming Zhang Hua-hua Chang 《The British journal of mathematical and statistical psychology》2020,73(1):88-108
Content balancing is one of the most important issues in computerized classification testing. To adapt to variable-length forms, special treatments are needed to successfully control content constraints without knowledge of test length during the test. To this end, we propose the notions of ‘look-ahead’ and ‘step size’ to adaptively control content constraints in each item selection step. The step size gives a prediction of the number of items to be selected at the current stage, that is, how far we will look ahead. Two look-ahead content balancing (LA-CB) methods, one with a constant step size and another with an adaptive step size, are proposed as feasible solutions to balancing content areas in variable-length computerized classification testing. The proposed LA-CB methods are compared with conventional item selection methods in variable-length tests and are examined with different classification methods. Simulation results show that, integrated with heuristic item selection methods, the proposed LA-CB methods result in fewer constraint violations and can maintain higher classification accuracy. In addition, the LA-CB method with an adaptive step size outperforms that with a constant step size in content management. Furthermore, the LA-CB methods generate higher test efficiency while using the sequential probability ratio test classification method.
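A rough sketch of the ‘look-ahead’ idea with a constant step size (the authors' heuristics are more involved; the function and numbers here are assumptions): project each content area's target share over the next `step` items and favor the area with the largest shortfall.

```python
def most_needed_area(targets, counts, step):
    """Content area furthest below its target share when projected
    over the next `step` items (constant-step-size sketch).

    targets : dict mapping area -> target proportion (sums to 1)
    counts  : dict mapping area -> items administered so far
    """
    horizon = sum(counts.values()) + step
    # Shortfall = items the area still "owes" within the horizon.
    shortfall = {a: targets[a] * horizon - counts[a] for a in targets}
    return max(shortfall, key=shortfall.get)

targets = {"algebra": 0.5, "geometry": 0.3, "statistics": 0.2}
counts = {"algebra": 6, "geometry": 2, "statistics": 2}
print(most_needed_area(targets, counts, step=5))   # 'geometry'
```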
9.
Existing test statistics for assessing whether incomplete data represent a missing completely at random sample from a single population are based on a normal likelihood rationale and effectively test for homogeneity of means and covariances across missing data patterns. The likelihood approach cannot be implemented adequately if a pattern of missing data contains very few subjects. A generalized least squares rationale is used to develop parallel tests that are expected to be more stable in small samples. Three factors were varied for a simulation: number of variables, percent missing completely at random, and sample size. One thousand data sets were simulated for each condition. The generalized least squares test of homogeneity of means performed close to an ideal Type I error rate for most of the conditions. The generalized least squares test of homogeneity of covariance matrices and a combined test performed quite well also.
10.
The response options of multiple-choice items carry additional diagnostic information. To exploit this information fully, this study proposes two nonparametric item selection strategies and variable-length termination rules for handling multiple-choice option information in cognitive diagnostic computerized adaptive testing (CD-CAT). Simulation results show: (1) under fixed-length conditions, the two nonparametric item selection strategies achieve higher overall classification accuracy than parametric strategies; (2) the two nonparametric strategies yield more balanced item-bank usage than parametric strategies; (3) the nonparametric strategies attain higher classification accuracy under the two new variable-length termination rules; and (4) both nonparametric strategies are suitable for multiple-choice CD-CAT settings, and users may choose either for test analysis.
11.
Two-group classification in latent trait theory: Scores with monotone likelihood ratio
D. A. Grayson 《Psychometrika》1988,53(3):383-392
This paper deals with two-group classification when a unidimensional latent trait, θ, is appropriate for explaining the data, X. It is shown that if X has monotone likelihood ratio then optimal allocation rules can be based on its magnitude when allocation must be made to one of two groups related to θ. These groups may relate to θ probabilistically via a non-decreasing function p(θ), or may be defined by all subjects above or below a selected value on θ. In the case where the data arise from dichotomous items, the assumption that the items have nondecreasing item characteristic functions is alone enough to ensure that the unweighted sum of responses (the number-right score or raw score) possesses this fundamental monotone likelihood ratio property.
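The monotone likelihood ratio property of the raw score can be checked numerically: with nondecreasing item characteristic functions, P(X = x | θ_high) / P(X = x | θ_low) is nondecreasing in the raw score x. A small sketch with hypothetical 2PL items, using the standard convolution recursion for the raw-score distribution:

```python
import numpy as np

def rawscore_dist(theta, a, b):
    """P(raw score = x | theta) for dichotomous 2PL items, computed
    with the usual convolution (Lord-Wingersky) recursion."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    dist = np.array([1.0])
    for pj in p:
        dist = np.convolve(dist, [1.0 - pj, pj])
    return dist

a = np.array([1.0, 1.5, 0.8, 1.2])    # hypothetical discriminations
b = np.array([-0.5, 0.0, 0.5, 1.0])   # hypothetical difficulties
ratio = rawscore_dist(1.0, a, b) / rawscore_dist(-1.0, a, b)
print(np.all(np.diff(ratio) >= 0))    # True: ratio is monotone in x
```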
12.
Klaus D. Kubinger 《Acta Psychologica Sinica》2009,41(10):1024-1036
At present, most personality tests, particularly those used in China, are essentially personality questionnaires; objective personality tests, behavioral assessments grounded in experiments, are rarely applied, although they have recently shown signs of revival in German-speaking countries. This paper therefore reviews the characteristics and advantages of such objective tests over personality questionnaires, for instance, that examinees find it difficult to fake responses on them. Several tests developed by the Vienna research group are introduced, and their psychometric properties and shortcomings are discussed. Finally, practical applications of these tests are presented.
13.
Two new methods (MABI and RTA) are proposed for balancing attribute convergence in cognitive diagnostic computerized adaptive testing. A simulation study systematically examined and compared them with existing methods (ABI, IABI, and RABI). Results show: (1) the new methods achieve higher accuracy and more balanced item usage than methods that ignore attribute convergence; (2) compared with ABI and RABI, the new methods are slightly less accurate but have more balanced item usage; (3) relative to IABI, the new methods' accuracy and item usage each show advantages under different item selection strategies. Overall, the two new methods strike a good balance among measurement accuracy, item usage, and item-bank exposure.
14.
This paper examines a new approach to measuring mobile phone dependence, computerized adaptive testing, and compares it with the original paper-and-pencil scale to see how much the new approach improves measurement precision and reliability at the same test length. Two studies were conducted. Study 1 developed and constructed a computerized adaptive test of mobile phone dependence (CAT-MPD) and examined its measurement characteristics and performance. Study 2 ran simulated CAT sessions on real response data, using the paper-and-pencil scale from which CAT-MPD was derived as the comparison, to assess how much CAT-MPD improves on the original scale's precision and reliability. Results show that CAT-MPD achieves satisfactory measurement precision and reliability while markedly reducing the number of items administered, and that under the same conditions it substantially outperforms the paper-and-pencil scale in both precision and reliability. Overall, this paper offers new technical support for the practical measurement of mobile phone dependence.
15.
Suzanne B. Shu 《Journal of Behavioral Decision Making》2008,21(4):352-377
Decision‐makers with ideal candidates already in mind often extend search beyond optimal endpoints when searching for the best option among a sequential list of alternatives. Extended search is investigated here using three laboratory experiments; individuals in these tasks exhibit future‐bias, delaying choice beyond normative benchmarks. Searchers' behavior is consistent with setting high thresholds based on a focal ideal outcome without full attention to its probability or the value of second‐best alternatives; the behavior is partially debiased by manipulating which outcomes are in the searchers' focal set. Documenting future‐bias in sequential search tasks offers new insights for understanding self‐control and intertemporal choice by providing a situation in which thresholds may be set too high and myopic behavior does not prevail.
16.
Bhargab Chattopadhyay 《Multivariate behavioral research》2016,51(5):627-648
The coefficient of variation is an effect size measure with many potential uses in psychology and related disciplines. We propose a general theory for the sequential estimation of the population coefficient of variation that considers both the sampling error and the study cost, importantly without specific distributional assumptions. Fixed sample size planning methods, commonly used in psychology and related fields, cannot simultaneously minimize both the sampling error and the study cost. The procedure we develop is the first sequential sampling procedure for estimating the coefficient of variation. We first present a method of planning a pilot sample size after the research goals are specified by the researcher. Then, after collecting a sample as large as the estimated pilot sample size, a check is performed to assess whether the conditions necessary to stop the data collection have been satisfied. If not, an additional observation is collected and the check is performed again. This process continues, sequentially, until a stopping rule involving a risk function is satisfied. Our method ensures that the sampling error and the study costs are considered simultaneously so that the cost is not higher than necessary for the tolerable sampling error. We also demonstrate a variety of properties of the distribution of the final sample size for five different distributions under a variety of conditions with a Monte Carlo simulation study. In addition, we provide freely available functions via the MBESS package in R to implement the methods discussed.
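A schematic of the sequential flow just described, with a placeholder risk function (the paper's actual risk function and the MBESS implementation are not reproduced here): collect a pilot sample, then add one observation at a time until taking another observation would no longer reduce the estimated risk.

```python
import numpy as np

def sample_cv(x):
    """Sample coefficient of variation."""
    return np.std(x, ddof=1) / np.mean(x)

def sequential_cv(draw, pilot_n=20, cost_per_obs=1e-5, max_n=5000):
    """Sequential estimation loop (schematic only; the risk function
    below is a placeholder, not the one derived in the paper).

    draw : callable returning one new observation
    """
    x = [draw() for _ in range(pilot_n)]
    while len(x) < max_n:
        n = len(x)
        cv = sample_cv(np.array(x))
        # Placeholder risk: rough sampling-error term (~ cv^2 / n)
        # plus a linear data-collection cost.
        risk_now = cv**2 / n + cost_per_obs * n
        risk_next = cv**2 / (n + 1) + cost_per_obs * (n + 1)
        if risk_next >= risk_now:   # another observation no longer pays
            break
        x.append(draw())
    return sample_cv(np.array(x)), len(x)

rng = np.random.default_rng(42)
cv_hat, n_final = sequential_cv(lambda: rng.gamma(4.0, 2.0))  # true CV = 0.5
print(cv_hat, n_final)
```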
17.
When some of the observed variates do not conform to the model under consideration, they can seriously affect the results of statistical analysis. In factor analysis, a model with inconsistent variates may result in improper solutions. In this article, a useful method for identifying a variate as inconsistent is proposed in factor analysis. The procedure is based on the likelihood principle. Several statistical properties, such as the effect of misspecified hypotheses, the problem of multiple comparisons, and robustness to violation of distributional assumptions, are investigated. The procedure is illustrated by some examples.
18.
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.
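One widely used family of stopping rules in this literature is the variance-ratio type (e.g., the Calinski-Harabasz criterion). As a minimal illustration of how such a rule selects the number of clusters (synthetic data via scikit-learn; the specific settings are arbitrary):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Synthetic data with 4 distinct, well-separated clusters
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

# Evaluate the variance-ratio criterion across candidate cluster counts
scores = {}
for k in range(2, 8):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)

print(max(scores, key=scores.get))   # expected: 4
```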
19.
Elwood RW 《Neuropsychology review》2001,11(2):89-100
MicroCog: Assessment of Cognitive Functioning version 2.1 (Powell, D. H., Kaplan, E. F., Whitla, D., Catlin, R., and Funkenstein, H. H. (1993). The Psychological Corporation, San Antonio, TX.) is one of the first computerized assessment batteries commercially developed to detect early signs of cognitive impairment. This paper reviews its psychometric characteristics and relates them to its clinical utility. It concludes that MicroCog provides an accurate, cost-effective screen for early dementia among elderly subjects living in the community and that it can distinguish dementia from depression. Its ability to detect cognitive decline at other ages or to discriminate dementia from other mental disorders has not been established. MicroCog measures different constructs than do traditional neuropsychological tests, making it difficult to relate test performance to current models of cognitive functioning. The review recommends further development of MicroCog and discusses its implications for the future of computer-based neuropsychological assessment.