Imputation of missing categorical data by maximizing internal consistency |
| |
Authors: | Stef van Buuren, Jan L. A. van Rijckevorsel |
| |
Affiliation: | (1) TNO Institute of Preventive Health Care, PO Box 124, 2300 AC Leiden, The Netherlands |
| |
Abstract: | This paper suggests a method to supplant missing categorical data by reasonable replacements. These replacements will maximize the consistency of the completed data as measured by Guttman's squared correlation ratio. The text outlines a solution of the optimization problem, describes relationships with the relevant psychometric theory, and studies some properties of the method in detail. The main result is that the average correlation should be at least 0.50 before the method becomes practical. At that point, the technique gives reasonable results up to 10–15% missing data. We thank Anneke Bloemhoff of NIPG-TNO for compiling and making the Dutch Life Style Survey data available to use, and Chantal Houée and Thérèse Bardaine, IUT, Vannes, France, exchange students under the COMETT program of the EC, for computational assistance. We also thank Donald Rubin, the Editors and several anonymous reviewers for constructive suggestions. |
| |
Keywords: | missing data; correlation ratio; optimal scaling |
This article is indexed in SpringerLink and other databases. |
|
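The idea stated in the abstract, choosing replacements so that the completed data are as internally consistent as possible, can be illustrated with a small sketch. It is not the paper's solution of the optimization problem: it assumes that the category scores are fixed numbers (the keyword "optimal scaling" suggests the paper also treats the scaling itself), it takes the squared correlation ratio to be the between-row sum of squares divided by the total sum of squares after centering each variable, and it searches all fills exhaustively, which is only feasible for a handful of missing cells. The names `correlation_ratio` and `impute_by_consistency` are hypothetical.

```python
import itertools

import numpy as np


def correlation_ratio(scores):
    """Between-row sum of squares over total sum of squares (columns centered).

    Assumes at least one column is not constant, so the total is nonzero.
    """
    q = np.asarray(scores, dtype=float)
    q = q - q.mean(axis=0)                      # center each variable
    m = q.shape[1]                              # number of variables
    between = m * np.sum(q.mean(axis=1) ** 2)   # variation of the row means
    total = np.sum(q ** 2)                      # total variation
    return between / total


def impute_by_consistency(data, categories):
    """Try every combination of category scores in the NaN cells.

    Returns the completed array with the largest correlation ratio and
    that ratio. Illustrative brute force, not the authors' algorithm.
    """
    data = np.asarray(data, dtype=float)
    missing = np.argwhere(np.isnan(data))       # (row, column) of each gap
    best_fill, best_eta = None, -np.inf
    for combo in itertools.product(categories, repeat=len(missing)):
        trial = data.copy()
        trial[missing[:, 0], missing[:, 1]] = combo
        eta = correlation_ratio(trial)
        if eta > best_eta:
            best_fill, best_eta = trial, eta
    return best_fill, best_eta


# Toy data: four respondents, three items scored 1-3, two missing answers.
toy = [[1, 1, 1],
       [2, 2, np.nan],
       [3, 3, 3],
       [1, np.nan, 1]]
completed, eta_squared = impute_by_consistency(toy, categories=(1, 2, 3))
```

In this toy case the three items agree perfectly wherever they are observed, so the most consistent fill repeats each respondent's other answers. With real data the items agree only partly, which is where the abstract's condition on the average correlation becomes relevant.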