Abstract: | Creativity has been well studied in the past several decades, and numerous measures have been developed to assess creativity. However, validity evidence associated with each measure is often mixed. In particular, the social consequence aspect of validity has received little attention. This is partly due to the difficulty of testing for differential item functioning (DIF) within the traditional classical test theory framework, which still remains the most popular approach to assessing creativity. Hence, this study provides an example of examining differential item functioning using multilevel explanatory item response theory models. The Creative Thinking Scale was tested for DIF in a sample of 1043 10th–12th graders. Results revealed significant uniform and non-uniform DIF for some items. Differentially functioning items are able to produce measurement bias and should be either deleted or modeled. The detailed implications for researchers and practitioners are discussed. |