Affiliation: | 1. School of Economics and Social Sciences, Helmut Schmidt University, Hamburg, Germany; 2. Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands. Contribution: Conceptualization, Methodology, Writing - review & editing; 3. School of Economics and Social Sciences, Helmut Schmidt University, Hamburg, Germany, and Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA. Contribution: Conceptualization, Funding acquisition, Methodology, Project administration, Software, Supervision, Writing - review & editing |
Abstract: | Ordinal data occur frequently in the social sciences. When principal component analysis (PCA) is applied, however, those data are often treated as numeric, which implies linear relationships between the variables at hand; alternatively, non-linear PCA is applied, but the resulting quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components (PCs) is maximized. We propose a penalized version of non-linear PCA for ordinal variables that is a smoothed intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects and offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than unpenalized non-linear PCA and/or standard linear PCA. In particular, we provide an application of penalized optimal scaling to ordinal data from the International Classification of Functioning, Disability and Health (ICF). |
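
To make the criterion described in the abstract concrete, the following is a minimal sketch of how a penalized optimal-scaling objective could be evaluated for a given set of category scores. It is an illustration under stated assumptions, not the authors' implementation: the function names (`scale_scores`, `explained_variance`, `roughness`, `penalized_criterion`), the use of squared second-order differences of adjacent category scores as the smoothness penalty, and the standardization of the quantified variables are all assumptions introduced here. The sketch only evaluates the objective; it does not optimize over the scores.

```python
import numpy as np

def scale_scores(X, scores):
    """Rescale each variable's category scores so that the quantified
    variable has mean 0 and variance 1 (an assumed normalization)."""
    scaled = []
    for j, s in enumerate(scores):
        z = s[X[:, j] - 1]              # categories are coded 1..k_j
        scaled.append((s - z.mean()) / z.std())
    return scaled

def explained_variance(X, scores, m):
    """Proportion of variance of the quantified variables that is
    explained by the first m principal components."""
    Z = np.column_stack(
        [s[X[:, j] - 1] for j, s in enumerate(scale_scores(X, scores))]
    )
    eig = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))  # ascending order
    return eig[-m:].sum() / eig.sum()

def roughness(X, scores):
    """Assumed penalty: squared second-order differences of the scaled
    scores; it is zero when the scores are linear in the category labels."""
    return sum(
        float(np.sum(np.diff(s, n=2) ** 2)) for s in scale_scores(X, scores)
    )

def penalized_criterion(X, scores, m, lam):
    """Explained variance minus lam times the roughness penalty."""
    return explained_variance(X, scores, m) - lam * roughness(X, scores)
```

Under these assumptions, `lam = 0` corresponds to the unpenalized non-linear PCA criterion, while a very large `lam` favors scores that are linear in the category labels, which corresponds to standard PCA on the labels. Intermediate values give the smoothed compromise the abstract refers to.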