Package:
mice
Authors:
Stef van Buuren [aut, cre], Karin Groothuis-Oudshoorn [aut], Gerko Vink [ctb], Rianne Schouten [ctb], Alexander Robitzsch [ctb], Patrick Rockenschaub [ctb], Lisa Doove [ctb], Shahab Jolani [ctb], Margarita Moreno-Betancur [ctb], Ian White [ctb], Philipp Gaffert [ctb], Florian Meinfelder [ctb], Bernie Gray [ctb], Vincent Arel-Bundock [ctb], Mingyang Cai [ctb], Thom Volker [ctb], Edoardo Costantini [ctb], Caspar van Lissa [ctb], Hanne Oberman [ctb], Stephen Wade [ctb]
Category:
Multiple Imputation
Use-Cases:
Multiple imputation for mixes of continuous, binary, unordered categorical and ordered categorical data; inspecting missing data; generating simulated incomplete data
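The last two use cases correspond to two mice helpers: `md.pattern()` tabulates the missing-data patterns of a data.frame, and `ampute()` makes a complete dataset incomplete. The following is a minimal sketch, assuming mice 3.x. Using R's built-in `mtcars` as the complete source data is our own choice for illustration.

```r
library("mice")

# Inspect: tabulate the missing-data patterns of nhanes.
# Each row is a pattern (1 = observed, 0 = missing); the row names give how
# many cases follow it, and the last row counts missing values per variable.
pat <- md.pattern(nhanes, plot = FALSE)
print(pat)

# Simulate: start from complete numeric data (mtcars, chosen for illustration)
# and make about half of the rows incomplete under a MAR mechanism.
set.seed(2024)
amp <- ampute(mtcars, prop = 0.5, mech = "MAR")
incomplete <- amp$amp           # the amputed data.frame
print(colSums(is.na(incomplete)))  # missing values per column
```

The bottom-right cell of `pat` is the total number of missing values, so it is a quick sanity check on the raw data.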
Description:
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) doi:10.18637/jss.v045.i03. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
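Passive imputation is easiest to see on the `boys` data, where `bmi` is a function of height (`hgt`, cm) and weight (`wgt`, kg). The sketch below follows the approach in Van Buuren and Groothuis-Oudshoorn (2011). It assumes mice 3.x, where `make.method()` and `make.predictorMatrix()` build the default per-variable specification. The small `m` and `maxit` values are our choice to keep it fast, not recommendations.

```r
library("mice")

# One imputation model per column: start from the defaults mice proposes
meth <- make.method(boys)

# Passive imputation: compute bmi from the current imputations of hgt (cm)
# and wgt (kg) instead of imputing it with its own model
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Stop bmi from predicting its own components (avoids circular feedback)
pred <- make.predictorMatrix(boys)
pred[c("hgt", "wgt"), "bmi"] <- 0

imp <- mice(boys, method = meth, predictorMatrix = pred,
            m = 2, maxit = 2, seed = 1, print = FALSE)

# The imputed bmi values agree with the formula applied to imputed hgt and wgt
cmp <- complete(imp, 1)
miss <- is.na(boys$bmi)
all.equal(cmp$bmi[miss], cmp$wgt[miss] / (cmp$hgt[miss] / 100)^2)
```

Only cases with missing `bmi` are checked, because the observed `bmi` values in `boys` were recorded with rounding.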
Algorithms:
pmm: Predictive mean matching
midastouch: Weighted predictive mean matching
sample: Random sample from observed values
cart: Classification and regression trees
rf: Random forest imputations
mean: Unconditional mean imputation
norm: Bayesian linear regression
norm.nob: Linear regression ignoring model error
norm.boot: Linear regression using bootstrap
norm.predict: Linear regression, predicted values
quadratic: Imputation of quadratic terms
ri: Random indicator for nonignorable data
logreg: Logistic regression
logreg.boot: Logistic regression with bootstrap
polr: Proportional odds model
polyreg: Polytomous logistic regression
lda: Linear discriminant analysis
2l.norm: Level-1 normal heteroscedastic
2l.lmer: Level-1 normal homoscedastic, lmer
2l.pan: Level-1 normal homoscedastic, pan
2l.bin: Level-1 logistic, glmer
2lonly.mean: Level-2 class mean
2lonly.norm: Level-2 class normal
2lonly.pmm: Level-2 class predictive mean matching
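The method names above are passed to `mice()` through its `method` argument, one entry per column. The following is a minimal sketch, assuming mice 3.x. It uses `nhanes2`, a factor-coded version of `nhanes` shipped with mice, and the specific overrides are arbitrary illustrations.

```r
library("mice")

# Defaults depend on variable type: pmm for numeric columns, logreg for
# binary factors, polyreg/polr for other factors, "" for complete columns
meth <- make.method(nhanes2)
print(meth)

# Override individual variables with names from the list above
meth["bmi"] <- "norm"  # Bayesian linear regression
meth["chl"] <- "cart"  # classification and regression trees

imp <- mice(nhanes2, method = meth, m = 2, maxit = 2, seed = 1, print = FALSE)
print(imp$method)      # the methods actually used, per variable
```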
Datasets:
boys (Growth of Dutch boys)
brandsma (Brandsma school data, Snijders and Bosker, 2012)
employee (Employee selection data)
fdd (SE Fireworks disaster data)
fdgs (Fifth Dutch growth study, 2009)
leiden85 (Leiden 85+ study)
mammalsleep (Mammal sleep data)
mgg (Self-reported and measured BMI)
nhanes (NHANES example - all variables numerical)
pattern, pattern1, pattern2, pattern3, pattern4 (Datasets with various missing data patterns)
popmis (Hox pupil popularity data with missing popularity scores)
pops (Project on preterm and small for gestational age infants)
potthoffroy (Potthoff-Roy data)
selfreport (Self-reported and measured BMI)
sleep (Mammal sleep data)
tbc (Terneuzen birth cohort)
walking (Walking disability data)
windspeed (Subset of Irish wind speed data)
Further Information:
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67.
Van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
https://datascienceplus.com/imputing-missing-data-with-r-mice-package/
Input:
data.frame
Example:
# Classic MICE / multiple imputation workflow
library("mice")
# Impute: create m = 2 completed datasets using 2 iterations
# (kept small for speed; the defaults are m = 5 and maxit = 5)
imp <- mice(nhanes, m = 2, maxit = 2, seed = 123)
# Analyze: fit a linear model on each completed dataset
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
# Pool: combine the m sets of estimates using Rubin's rules
summary(pool(fit))
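The diagnostic plots mentioned in the description can be applied to the same kind of `mids` object. This is a minimal sketch, assuming mice 3.x, whose lattice-based plot methods return trellis objects. Loading lattice explicitly is a precaution on our part, and the larger `maxit` only gives the trace plot more iterations to show.

```r
library("mice")
library("lattice")  # supplies the densityplot/stripplot generics

imp <- mice(nhanes, m = 5, maxit = 10, seed = 123, print = FALSE)

# Convergence: mean and SD of the imputed values per iteration; the m
# streams should mix freely without a visible trend
p_trace <- plot(imp)
print(p_trace)

# Plausibility: observed (blue) versus imputed (red) values
p_dens <- densityplot(imp)
print(p_dens)
print(stripplot(imp, pch = 20, cex = 1.2))
```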