1.
Sandip Sinharay 《Psychometrika》2016,81(4):992-1013
The \(l_z\) statistic (Drasgow et al. in Br J Math Stat Psychol 38:67–86, 1985) is one of the most popular person-fit statistics (Armstrong et al. in Pract Assess Res Eval 12(16):1–10, 2007). Snijders (Psychometrika 66:331–342, 2001) derived the asymptotic null distribution of \(l_z\) when the examinee ability parameter is estimated. He also suggested the \(l^*_z\) statistic, which is the asymptotically correct standardized version of \(l_z\). However, Snijders (Psychometrika 66:331–342, 2001) only considered tests with dichotomous items. In this paper, the asymptotic null distribution of \(l_z\) is derived for mixed-format tests (those that include both dichotomous and polytomous items). The asymptotically correct standardized version of \(l_z\), which can be considered as the extension of \(l^*_z\) to such tests, is suggested. The Type I error rate and power of the suggested statistic are examined from several simulated datasets. The suggested statistic is computed using a real dataset. The suggested statistic appears to be a satisfactory tool for assessing person fit for mixed-format tests.
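For orientation, the classical \(l_z\) statistic for dichotomous items can be sketched as below. This is a minimal illustration of the textbook form (the observed log-likelihood, centered by its expectation and scaled by its standard deviation), assuming the item success probabilities are already known. It is not the \(l^*_z\) correction for an estimated ability, and it is not the mixed-format extension proposed in the paper. The function name `lz_statistic` and its arguments are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Classical l_z person-fit statistic for dichotomous items.

    responses: list of 0/1 item scores for one examinee.
    probs: model probabilities of a correct response to each item
           (assumed known here; in practice they depend on an ability estimate).
    """
    if len(responses) != len(probs):
        raise ValueError("responses and probs must have the same length")

    log_lik = 0.0   # observed log-likelihood l_0
    expected = 0.0  # E[l_0] under the model
    variance = 0.0  # Var[l_0] under the model

    for u, p in zip(responses, probs):
        if not 0.0 < p < 1.0:
            raise ValueError("each probability must lie strictly between 0 and 1")
        log_p = math.log(p)
        log_q = math.log(1.0 - p)
        log_lik += u * log_p + (1 - u) * log_q
        expected += p * log_p + (1.0 - p) * log_q
        variance += p * (1.0 - p) * (log_p - log_q) ** 2

    if variance == 0.0:
        # Happens only when every probability equals 0.5.
        raise ValueError("variance is zero; l_z is undefined")

    return (log_lik - expected) / math.sqrt(variance)
```

Large negative values indicate a response pattern that is unlikely under the model. For example, `lz_statistic([0, 0, 1], [0.8, 0.6, 0.3])` is negative because the examinee missed the easy items and answered the hardest one correctly.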
2.
We present an interface connecting the ACL2 theorem prover with external deduction tools. The ACL2 logic contains several mechanisms for proof structuring, which are important to the construction of industrial-scale proofs. The complexity induced by these mechanisms makes the design of the interface challenging. We discuss some of the challenges, and develop a precise specification of the requirements on the external tools for a sound connection with ACL2. We also develop constructs within ACL2 to enable the developers of external tools to satisfy our specifications. The interface is available with the ACL2 theorem prover starting from Version 3.2, and we describe several applications of the interface.
3.
Sandip Sinharay, Matthew S. Johnson 《The British journal of mathematical and statistical psychology》2020,73(3):397-419
According to Wollack and Schoenig (2018, The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage, 260), benefiting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect examinees who may have benefited from item preknowledge for the case when the set of compromised items is known. The asymptotic distribution of the new statistic under no preknowledge is proved to be a simple mixture of two \(\chi^2\) distributions. We perform a detailed simulation study to show that the Type I error rate of the new statistic is very close to the nominal level and that the power of the new statistic is satisfactory in comparison to that of the existing statistics for detecting item preknowledge based on both item scores and response times. We also include a real data example to demonstrate the usefulness of the suggested statistic.
4.
Shelby J. Haberman, Lili Yao, Sandip Sinharay 《The British journal of mathematical and statistical psychology》2015,68(2):363-385
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e‐rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.
5.
Sandip Ghosh Chowdhury, Pankaj Kumar, Arpan Das, S.K. Das, B. Mahato, P.K. De 《Philosophical Magazine Letters》2013,93(6):407-414
The present article deals with the analysis of grain-boundary character distribution (GBCD) and microstructural characteristics after iterative processing of austenitic stainless steel, AISI 316L. The steel was subjected to iterative cold reduction and subsequent annealings. After an initial decrease in the fraction of Σ3 boundaries, the number of these increases in subsequent steps. The results relate the importance of iterative processing and the mechanism of obtaining a higher fraction of Σ3 boundaries.
6.
Tatsuoka suggested several extended caution indices and their standardized versions, and these have been used as person-fit statistics by various researchers. However, these indices are only defined for tests with dichotomous items. This paper extends two of the popular standardized extended caution indices for use with polytomous items and mixed-format tests. Two additional new person-fit statistics are obtained by applying the asymptotic standardization of person-fit statistics for mixed-format tests. Detailed simulations are then performed to compute the Type I error rate and power of the four new person-fit statistics. Two real data illustrations follow. The new person-fit statistics appear to be satisfactory tools for assessing person fit for polytomous items and mixed-format tests.
7.
Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups (for example, those based on gender or ethnicity) on subtests. Several researchers found that the difference in performance between the gender-based subgroups varied over the different subtests. In this article, we examine whether the added values of the subscores vary between subgroups using data from several operational tests, including an international English proficiency test. For these data sets, the added values of the subscores occasionally vary over the subgroups, but the added values of the augmented subscores are invariant over the subgroups.
8.
Psychometrika - In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of...
9.
10.
Shelby Haberman, Dr Sandip Sinharay, Gautam Puhan 《The British journal of mathematical and statistical psychology》2009,62(1):79-95
Recently, there has been an increasing level of interest in reporting subscores for components of larger assessments. This paper examines the issue of reporting subscores at an aggregate level, especially at the level of institutions to which the examinees belong. A new statistical approach based on classical test theory is proposed to assess when subscores at the institutional level have any added value over the total scores. The methods are applied to two operational data sets. For the data under study, the observed results provide little support in favour of reporting subscores for either examinees or institutions.