570 results in total; search took 31 ms
561.
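Computer-based tests can record the time examinees spend answering each item (response time, RT). As an important source of auxiliary information, RT is valuable for test development and administration, particularly in computerized adaptive testing (CAT). This paper briefly introduces and comments on applications of RT to item selection in CAT and analyzes how feasible these techniques are in practice. Finally, it discusses the current problems in applying RT to CAT item selection and directions for further research.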
562.
Sonya K. Sterba 《Multivariate behavioral research》2019,54(2):264-287
In structural equation modeling applications, parcels—averages or sums of subsets of item scores—are often used as indicators of latent constructs. Parcel-allocation variability (PAV) is variability in results that arises within sample across alternative item-to-parcel allocations. PAV can manifest in all results of a parcel-level model (e.g., model fit, parameter estimates, standard errors, and inferential decisions). It is a source of uncertainty in parcel-level model results that can be investigated, reported, and accounted for. Failing to do so raises representativeness and replicability concerns. However, in recent methodological literature (Cole, Perkins, & Zelkowitz, 2016; Little, Rhemtulla, Gibson, & Schoemann, 2013; Marsh, Lüdtke, Nagengast, Morin, & von Davier, 2013; Rhemtulla, 2016) parceling has been justified and recommended in several situations without quantifying or accounting for PAV. In this article, we explain and demonstrate problems with these rationales. Overall, we find that: (1) using a purposive parceling algorithm for a multidimensional construct does not avoid PAV; (2) passing a test of unidimensionality of the item-level model need not avoid PAV; and (3) a desire to improve power for detecting structural misspecification does not warrant parceling without addressing PAV; we show how to simultaneously avoid PAV and obtain even higher power by comparing item-level models differing in structural constraints. Implications for practice are discussed.
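The idea of parcel-allocation variability can be illustrated with a minimal sketch. It is not the procedure used in the article: a real PAV analysis would refit the parcel-level structural equation model for every allocation, whereas this sketch uses the correlation between the first two parcels as a stand-in statistic. The function names, the made-up response data, and the choice of 6 items in 3 parcels are all assumptions for illustration.

```python
import random

def allocate_parcels(n_items, n_parcels, rng):
    """Randomly assign item indices to equally sized parcels."""
    items = list(range(n_items))
    rng.shuffle(items)
    return [sorted(items[k::n_parcels]) for k in range(n_parcels)]

def parcel_scores(responses, allocation):
    """Average each person's item scores within each parcel."""
    return [
        [sum(person[i] for i in parcel) / len(parcel) for parcel in allocation]
        for person in responses
    ]

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom > 0 else 0.0

rng = random.Random(0)
# Made-up data: 50 persons answering 6 items on a 1-5 scale.
responses = [[rng.randint(1, 5) for _ in range(6)] for _ in range(50)]

stats = []
for _ in range(20):  # 20 alternative item-to-parcel allocations, same sample
    allocation = allocate_parcels(6, 3, rng)
    scores = parcel_scores(responses, allocation)
    stats.append(pearson([s[0] for s in scores], [s[1] for s in scores]))

# The spread of the statistic across allocations is the quantity of interest.
pav_range = max(stats) - min(stats)
print(f"min={min(stats):.3f} max={max(stats):.3f} range={pav_range:.3f}")
```

The sample is held fixed throughout, so any spread in the printed statistic comes only from how items were assigned to parcels.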
563.
564.
Esther Ulitzsch, Matthias von Davier, Steffi Pohl 《Multivariate behavioral research》2020,55(3):425-453
For adequate modeling of missing responses, a thorough understanding of the nonresponse mechanisms is vital. As a large number of major testing programs are moving, or have already moved, to computer-based assessment, a rich body of additional data on examinee behavior becomes easily accessible. These additional data may contain valuable information on the processes associated with nonresponse. Bringing together research on item omissions with approaches for modeling response time data, we propose a framework for simultaneously modeling response behavior and omission behavior utilizing timing information for both. As such, the proposed model makes it possible (a) to gain a deeper understanding of response and nonresponse behavior in general and, in particular, of the processes underlying item omissions in large-scale assessments (LSAs), (b) to model the processes determining the time examinees require to generate a response or to omit an item, and (c) to account for nonignorable item omissions. Parameter recovery of the proposed model is studied within a simulation study. An illustration of the model by means of an application to real data is provided.
565.
Michalis P. Michaelides, Militsa Ivanova, Christiana Nicolaou 《International Journal of Testing》2020,20(3):187-205
The study examined the relationship between examinees’ test-taking effort and their accuracy rate on items from the PISA 2015 assessment. The 10% normative threshold method was applied on Science multiple-choice items in the Cyprus sample to detect rapid guessing behavior. Results showed that the extent of rapid guessing across simple and complex multiple-choice items was on average less than 6% per item. Rapid guessers were identified, and for most items their accuracy was lower than the accuracy for students engaging in solution-based behavior. A number of plausible explanations were graphically evaluated for items for which accuracy was higher for the rapid guessing subgroup. Overall, this empirical investigation presents original evidence on test-taking effort as measured by response time in PISA items and tests propositions of Wise’s (2017) Test-Taking Theory.
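A normative threshold rule of this kind can be sketched as follows, assuming the common formulation in which an item's threshold is 10% of the mean response time observed for that item. The abstract does not spell out the computation, so the optional cap, the function names, and the response times below are illustrative assumptions rather than the study's implementation.

```python
from statistics import mean

def nt_threshold(item_times, fraction=0.10, cap_seconds=None):
    """Threshold = fraction of the item's mean response time.

    cap_seconds is an optional upper bound (an assumption of this sketch).
    """
    threshold = fraction * mean(item_times)
    if cap_seconds is not None:
        threshold = min(threshold, cap_seconds)
    return threshold

def flag_rapid_guesses(item_times, fraction=0.10, cap_seconds=None):
    """Flag responses faster than the item's threshold as rapid guesses."""
    threshold = nt_threshold(item_times, fraction, cap_seconds)
    return [rt < threshold for rt in item_times]

# Made-up response times (seconds) of five examinees on one item.
times = [2.0, 40.0, 55.0, 63.0, 40.0]
flags = flag_rapid_guesses(times)
rapid_rate = sum(flags) / len(flags)
print(nt_threshold(times), flags, rapid_rate)
```

Accuracy can then be compared between flagged and unflagged responses item by item, which is the kind of contrast the abstract reports.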
566.
Yanyan Fu, Tyler Strachan, Edward H. Ip, John T. Willse, Shyh-Huei Chen, Terry Ackerman 《International Journal of Testing》2020,20(2):169-186
This research examined correlation estimates between latent abilities when using the two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that the recovery of the latent correlation was best when the test contained 100% of simple structure items for all models and conditions. When a test measured weakly discriminated dimensions, it became harder to recover the latent correlation. Results also showed that increasing the sample size, test length, or using simpler models (i.e., two-parameter logistic rather than three-parameter logistic, compensatory rather than noncompensatory) could improve the recovery of latent correlation.
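The difference between compensatory and noncompensatory models can be made concrete with a small sketch using the standard two-parameter logistic forms. In the compensatory model the dimensions combine inside a single logistic function; in the noncompensatory model the probability is a product of per-dimension logistic terms. The parameter names (`a`, `d`, `b`) and the numeric values are assumptions for illustration and are not taken from the study.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_compensatory(theta, a, d):
    """Compensatory 2PL: a weighted sum of abilities enters one logistic."""
    return logistic(sum(ak * tk for ak, tk in zip(a, theta)) + d)

def p_noncompensatory(theta, a, b):
    """Noncompensatory 2PL: product of one logistic term per dimension."""
    p = 1.0
    for ak, tk, bk in zip(a, theta, b):
        p *= logistic(ak * (tk - bk))
    return p

# An examinee who is very strong on dimension 1 and very weak on dimension 2.
theta = (3.0, -3.0)
a = (1.0, 1.0)

comp = p_compensatory(theta, a, d=0.0)
noncomp = p_noncompensatory(theta, a, b=(0.0, 0.0))
print(f"compensatory={comp:.3f} noncompensatory={noncomp:.3f}")
```

With these values the strength on one dimension offsets the weakness on the other in the compensatory model, while the noncompensatory probability stays low because the weak dimension caps the product.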
567.
William C. M. Belzak 《Multivariate behavioral research》2020,55(5):722-747
Differential item functioning (DIF) is a pernicious statistical issue that can mask true group differences on a target latent construct. A considerable amount of research has focused on evaluating methods for testing DIF, such as using likelihood ratio tests in item response theory (IRT). Most of this research has focused on the asymptotic properties of DIF testing, in part because many latent variable methods require large samples to obtain stable parameter estimates. Much less research has evaluated these methods in small sample sizes despite the fact that many social and behavioral scientists frequently encounter small samples in practice. In this article, we examine the extent to which model complexity—the number of model parameters estimated simultaneously—affects the recovery of DIF in small samples. We compare three models that vary in complexity: logistic regression with sum scores, the 1-parameter logistic IRT model, and the 2-parameter logistic IRT model. We expected that logistic regression with sum scores and the 1-parameter logistic IRT model would more accurately estimate DIF because these models yielded more stable estimates despite being misspecified. Indeed, a simulation study and empirical example of adolescent substance use show that, even when data are generated from / assumed to be a 2-parameter logistic IRT, using parsimonious models in small samples leads to more powerful tests of DIF while adequately controlling for Type I error. We also provide evidence for minimum sample sizes needed to detect DIF, and we evaluate whether applying corrections for multiple testing is advisable. Finally, we provide recommendations for applied researchers who conduct DIF analyses in small samples.
568.
Personality development research heavily relies on the comparison of scale means across age. This approach implicitly assumes that the scales are strictly measurement invariant across age. We questioned this assumption by examining whether appropriate personality indicators change over the lifespan. Moreover, we identified which types of items (e.g. dispositions, behaviours, and interests) are particularly prone to age effects. We reanalyzed the German Revised NEO Personality Inventory normative sample (N = 11,724) and applied a genetic algorithm to select short scales that yield acceptable model fit and reliability across locally weighted samples ranging from 16 to 66 years of age. We then examined how the item selection changes across age points and item types. Emotion-type items seemed to be interchangeable and generally applicable to people of all ages. Specific interests, attitudes, and social effect items—most prevalent within the domains of Extraversion, Agreeableness, and Openness—seemed to be more prone to measurement variations over age. A large proportion of items were systematically discarded by the item-selection procedure, indicating that, independent of age, many items are problematic measures of the underlying traits. The implications for personality assessment and personality development research are discussed. © 2019 European Association of Personality Psychology
569.
Jung Aa Moon, Sandip Sinharay, Madeleine Keehner, Irvin R. Katz 《International Journal of Testing》2020,20(2):122-145
The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of answer options. The results of an item response theory analysis were consistent with the hypothesized cognitive processes in alternative item formats. The findings suggest that seemingly subtle variations of item design could substantially affect test-taker cognition and psychometric outcomes, emphasizing the need for investigating item format effects at a fine-grained level.
570.
Scott B. Morris, Michael Bass, Elizabeth Howard, Richard E. Neapolitan 《International Journal of Testing》2020,20(2):146-168
The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative symptoms), and the bank may lack depth for many individuals. In such cases, the predicted standard error reduction (PSER) stopping rule will stop the CAT even if the SE threshold has not been reached and can avoid administering excessive questions that provide little additional information. By tuning the parameters of the PSER algorithm, a practitioner can specify a desired tradeoff between accuracy and efficiency. Using simulated data for the Patient-Reported Outcomes Measurement Information System Anxiety and Physical Function banks, we demonstrate that these parameters can substantially impact CAT performance. When the parameters were optimally tuned, the PSER stopping rule was found to outperform the SE stopping rule overall, particularly for individuals not targeted by the bank, and presented roughly the same number of items across the trait continuum. Therefore, the PSER stopping rule provides an effective method for balancing the precision and efficiency of a CAT.
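The contrast between the two stopping rules can be sketched in a simplified form. This is not the authors' PSER algorithm or its tuned parameters: the sketch assumes dichotomous 2PL items, a standard normal prior contributing one unit of information, a single `min_reduction` parameter, and evaluation at the current trait estimate. All names and values are hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def stopping_decision(theta, administered, remaining,
                      se_threshold=0.3, min_reduction=0.01, prior_info=1.0):
    """Return 'se', 'pser', 'exhausted', or 'continue'.

    administered / remaining are lists of (a, b) item parameters.
    """
    info = prior_info + sum(item_information(theta, a, b) for a, b in administered)
    se = 1.0 / math.sqrt(info)
    if se <= se_threshold:
        return "se"          # classic SE rule: precision target reached
    if not remaining:
        return "exhausted"   # nothing left to administer
    best = max(item_information(theta, a, b) for a, b in remaining)
    predicted_se = 1.0 / math.sqrt(info + best)
    if se - predicted_se < min_reduction:
        return "pser"        # best remaining item would barely reduce the SE
    return "continue"

# Examinee far from where the bank is informative: stop early despite high SE.
off_target = stopping_decision(-2.0, [(1.5, 2.0), (1.2, 2.5)], [(1.0, 3.0)])
# Well-targeted examinee with a useful item still available: keep testing.
on_target = stopping_decision(2.0, [(1.5, 2.0)], [(1.2, 2.5)])
# Well-targeted examinee after many informative items: SE rule fires.
precise = stopping_decision(2.0, [(2.0, 2.0)] * 12, [(1.2, 2.5)])
print(off_target, on_target, precise)
```

Raising `min_reduction` makes the test shorter at the cost of precision, which is the accuracy versus efficiency tradeoff the abstract describes.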