Package:
mice
Authors:
Stef van Buuren [aut, cre], Karin Groothuis-Oudshoorn [aut], Gerko Vink [ctb], Rianne Schouten [ctb], Alexander Robitzsch [ctb], Patrick Rockenschaub [ctb], Lisa Doove [ctb], Shahab Jolani [ctb], Margarita Moreno-Betancur [ctb], Ian White [ctb], Philipp Gaffert [ctb], Florian Meinfelder [ctb], Bernie Gray [ctb], Vincent Arel-Bundock [ctb], Mingyang Cai [ctb], Thom Volker [ctb], Edoardo Costantini [ctb], Caspar van Lissa [ctb], Hanne Oberman [ctb], Stephen Wade [ctb]
Category:
Multiple Imputation
Use-Cases:
Multiple imputation for mixes of continuous, binary, unordered categorical and ordered categorical data, Inspect the missing data, Generate simulated incomplete data
Popularity:
Description:
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) doi:10.18637/jss.v045.i03. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
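As an illustration of per-variable models and passive imputation, here is a minimal sketch. It assumes the `boys` data bundled with mice and uses the package's `make.method()`, `make.predictorMatrix()` and `~ I(...)` passive-imputation syntax. The column subset and the small `m` and `maxit` values are illustrative choices, not recommendations.

```r
library(mice)

# Work on a few columns of the bundled boys data (illustrative subset).
dat <- boys[, c("age", "hgt", "wgt", "bmi")]

# Start from the default method for each variable.
meth <- make.method(dat)

# Passive imputation: recompute bmi from the imputed hgt and wgt so the
# three variables stay consistent within each imputed dataset.
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Do not use bmi as a predictor of hgt or wgt, to avoid circularity.
pred <- make.predictorMatrix(dat)
pred[c("hgt", "wgt"), "bmi"] <- 0

imp <- mice(dat, method = meth, predictorMatrix = pred,
            m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# Compare imputed bmi with the value implied by hgt and wgt,
# restricted to rows where bmi was originally missing.
done <- complete(imp, 1)
miss <- is.na(dat$bmi)
implied <- done$wgt / (done$hgt / 100)^2
all.equal(as.numeric(done$bmi[miss]), as.numeric(implied[miss]))
```

Rows in the predictor matrix are the variables being imputed and columns are their predictors, so zeroing the `bmi` column for `hgt` and `wgt` removes the derived variable from their models.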
Last update:
Algorithms:
pmm: Predictive mean matching
midastouch: Weighted predictive mean matching
sample: Random sample from observed values
cart: Classification and regression trees
rf: Random forest imputations
mean: Unconditional mean imputation
norm: Bayesian linear regression
norm.nob: Linear regression ignoring model error
norm.boot: Linear regression using bootstrap
norm.predict: Linear regression, predicted values
quadratic: Imputation of quadratic terms
ri: Random indicator for nonignorable data
logreg: Logistic regression
logreg.boot: Logistic regression with bootstrap
polr: Proportional odds model
polyreg: Polytomous logistic regression
lda: Linear discriminant analysis
2l.norm: Level-1 normal heteroscedastic
2l.lmer: Level-1 normal homoscedastic, lmer
2l.pan: Level-1 normal homoscedastic, pan
2l.bin: Level-1 logistic, glmer
2lonly.mean: Level-2 class mean
2lonly.norm: Level-2 class normal
2lonly.pmm: Level-2 class predictive mean matching
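The method names above are passed to `mice()` through its `method` argument. The following sketch, which assumes the bundled `nhanes` data, looks up the defaults with `make.method()` and overrides one of them. The choice of `norm` for `chl` is only an example.

```r
library(mice)

# Default method chosen for each column of nhanes; complete columns get "".
meth <- make.method(nhanes)
print(meth)

# Override the default for one variable: Bayesian linear regression for chl.
meth["chl"] <- "norm"

imp <- mice(nhanes, method = meth, m = 2, maxit = 2,
            seed = 1, printFlag = FALSE)

# The methods actually used are stored on the resulting mids object.
imp$method
```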
Datasets:
boys (Growth of Dutch boys)
brandsma (Brandsma school data, Snijders and Bosker, 2012)
employee (Employee selection data)
fdd (SE Fireworks disaster data)
fdgs (Fifth Dutch growth study, 2009)
leiden85 (Leiden 85+ study)
mammalsleep (Mammal sleep data)
mgg (Self-reported and measured BMI)
nhanes (NHANES example - all variables numerical)
pattern (Datasets with various missing data patterns)
pattern1 (Datasets with various missing data patterns)
pattern2 (Datasets with various missing data patterns)
pattern3 (Datasets with various missing data patterns)
pattern4 (Datasets with various missing data patterns)
popmis (Hox pupil popularity data with missing popularity scores)
pops (Project on preterm and small for gestational age infants)
potthoffroy (Potthoff-Roy data)
selfreport (Self-reported and measured BMI)
sleep (Mammal sleep data)
tbc (Terneuzen birth cohort)
walking (Walking disability data)
windspeed (Subset of Irish wind speed data)
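The other two use cases, inspecting missing data and generating incomplete data, can be sketched with `md.pattern()` and `ampute()`. The example assumes the bundled `nhanes` data for inspection. The simulated data frame `complete_df` and the settings `prop = 0.3` and `mech = "MAR"` are illustrative assumptions.

```r
library(mice)

# Tabulate the missing-data patterns in the bundled nhanes data.
# plot = FALSE returns the pattern matrix without drawing it.
pat <- md.pattern(nhanes, plot = FALSE)
print(pat)

# Count missing values per column with base R.
colSums(is.na(nhanes))

# Generate incomplete data from a complete, simulated data frame.
set.seed(1)
complete_df <- data.frame(x = rnorm(200), y = rnorm(200), z = rnorm(200))
amp <- ampute(complete_df, prop = 0.3, mech = "MAR")

# The amputed data frame is stored in the $amp element.
colSums(is.na(amp$amp))
```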
Further Information:
van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67.
Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.
https://datascienceplus.com/imputing-missing-data-with-r-mice-package/
Input:
data.frame
Example:
# Classic MICE multiple-imputation workflow
library(mice)
# Impute: create m = 2 imputed datasets (small m and maxit keep the example fast)
imp <- mice(nhanes, maxit = 2, m = 2)
# Fit a linear model on each imputed dataset
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
# Pool the per-dataset estimates using Rubin's rules
summary(pool(fit))
You can try the example interactively here:
https://rdrr.io/snippets/embedding/
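Beyond pooling, the imputed values themselves are often inspected. The sketch below assumes the same `nhanes` data and small `m` and `maxit` values as the example. It extracts completed datasets with `complete()` and draws the trace plot provided by the `plot()` method for `mids` objects.

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# Extract the first completed dataset (observed plus imputed values).
first <- complete(imp, 1)

# Stack all imputed datasets in long format; .imp and .id columns
# identify the imputation number and the original row.
long <- complete(imp, action = "long")
head(long)

# Trace plot of the mean and standard deviation of the imputed values
# per iteration, one line per imputation, as a rough convergence check.
plot(imp)
```

With only two iterations the trace lines say little about convergence; a real analysis would use a larger `maxit`.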