Package:
missMDA
Category:
Single and multiple imputation
Use-Cases:
Imputation of incomplete continuous, categorical or mixed datasets, …
Popularity:
Description:
Imputation of incomplete continuous or categorical datasets; Missing values are imputed with a principal component analysis (PCA), a multiple correspondence analysis (MCA) model or a multiple factor analysis (MFA) model; Perform multiple imputation with and in PCA or MCA.
Last update:
Algorithms:
- Continuous data (multiple) imputation (with Principal Components Analysis)
- Contingency table imputation (with Correspondence Analysis)
- Mixed data (multiple) imputation (with Factorial Analysis of Mixed Data)
- Categorical data (multiple) imputation (with Multiple Correspondence Analysis)
- Structured data imputation (with Multiple Factor Analysis)
- Multilevel mixed data imputation (with Multilevel Factorial Analysis for Mixed Data)
- Overimputation diagnostic
Datasets:
- gene
- geno
- orange
- ozone
- snorena
- TitanicNA
- vnf
Further Information:
- Josse, J. & Husson, F. (2012). Handling missing values in exploratory multivariate data analysis methods. Journal de la SFdS, 153(2), pp. 79-99. PDF (on HAL)
- Julie Josse, Francois Husson (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1-31. doi:10.18637/jss.v070.i01. PDF (on HAL)
- Audigier, V., Husson, F., and Josse, J. (2016). Multiple imputation for continuous variables using a bayesian principal component analysis. Journal of Statistical Computation and Simulation, 86(11):2140-2156. PDF (on arXiv)
- Audigier, V., Husson, F., and Josse, J. (2016). A principal component method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1):5-26. PDF (on arXiv)
- Audigier, V., Husson, F., and Josse, J. (2017). Mimca: multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing, 27(2):501-518. PDF (on arXiv)
- Some videos (on Youtube)
Input:
data.frame, mids
Example:
library(missMDA)
data(orange)
print("print data set with NAs")
print(head(orange))
## First the number of components has to be chosen
## (for the imputation step)
## nb <- estim_ncpPCA(orange,ncp.max=5) ## Time consuming, nb = 2
## Imputation
res.comp <- imputePCA(orange,ncp=2)
print("print data set with the imputations")
print(head(res.comp$completeObs))
Here you can have a interactive look at the example:
https://rdrr.io/snippets/embedding/