| Airquality | Multivariate, Time Series | Real | 153 | 6 | 7 | No | 1973 |
This data set contains daily air quality measurements in New York (May to September 1973) and has missing values in two variables (<code>Ozone</code> and <code>Solar.R</code>). It can be loaded in R by calling <code>data(airquality)</code>.
<br>
<br><a href="https://stat.ethz.ch/R-manual/R-devel/RHOME/library/datasets/html/airquality.html" target="_blank">More information on the dataset</a>.
<br>
<br>
Tutorials illustrating methods on this data:
<ul>
<li> Nick Tierney's <code>naniar</code> <a href="https://cran.r-project.org/web/packages/naniar/vignettes/naniar-visualisation.html" target="_blank">vignette</a> for missing data visualization.</li>
</ul>
<br>
</div>
</td>
|
| chorizonDL | Multivariate | Integer, Real | 606 | 110 | 15 | Yes | 1998 |
From the <code>mvoutlier</code> package description: "The Kola Data were collected in the Kola Project (1993-1998, Geological Surveys of Finland (GTK) and Norway (NGU) and Central Kola Expedition (CKE), Russia). More than 600 samples in five different layers were analysed, this dataset contains the C-horizon."
<br>
<br><a href="https://cran.r-project.org/web/packages/mvoutlier/mvoutlier.pdf" target="_blank">More information on the dataset</a>.
<br>
<br> In the <a href="https://cran.r-project.org/web/packages/VIM/VIM.pdf" target="_blank">VIM</a> package, all values below the detection limit have been recoded as NA. The data set can be loaded by calling <code>data(chorizonDL)</code>.
<br>
</div>
</td>
|
| Health Nutrition And Population Statistics | Multivariate, Time Series | Integer, Real | 15,022 | 397 | 54 | No | 2017 |
"Health Nutrition and Population Statistics database provides key health, nutrition and population statistics gathered from a variety of international and national sources. Themes include global surgery, health financing, HIV/AIDS, immunization, infectious diseases, medical resources and usage, noncommunicable diseases, nutrition, population dynamics, reproductive health, universal health coverage, and water and sanitation." (Data website of the World Bank Group, January 23rd, 2019)
<br>
<br>The data have been gathered from 259 countries over the last 58 years.
<br><a href="https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics" target="_blank">More information on the dataset</a> on the World Bank Group website.
<br>
<br><a href="http://user2019.r-project.org/datathon/">R Datathon</a> on this dataset organized by the useR! 2019 conference.
<br>
</div>
</td>
|
| NHANES | Multivariate | Categorical, Integer, Real | 10,000 | 75 | 37 | No | 2012 |
The R package <a href="https://cran.r-project.org/web/packages/NHANES/" target="_blank">NHANES</a> contains data from the US National Health and Nutrition Examination Survey (NHANES, 1999-2004 and 2009-2012, <a href="http://www.cdc.gov/nchs/nhanes.htm" target="_blank">more details on the survey</a>). The data comprise body shape and related measurements.
<br>
<br>
Tutorials illustrating methods on this data:
<ul>
<li> Stef van Buuren's <a href="https://www.gerkovink.com/miceVignettes/Ad_hoc_and_mice/Ad_hoc_methods.html" target="_blank">vignette</a> for ad hoc methods and <code>mice</code>.</li>
<li> Jerry Reiter's <a href="/tutorials/Reiter_course_MultipleImputationOverview_2018/Reiter_script_MultipleImputationMICE_2018.html" target="_blank">course</a> on multiple imputation.</li>
</ul>
<br>
</div>
</td>
|
| oceanbuoys | Multivariate, Time Series | Real | 736 | 8 | 3 | No | 1997 |
West Pacific Tropical Atmosphere Ocean Data. The data is collected by the Tropical Atmosphere Ocean project and contains real-time data from moored ocean buoys. It can be found in R in the <a href="https://cran.r-project.org/web/packages/naniar/index.html" target="_blank"><code>naniar</code></a> package and is loaded by calling <code> data(oceanbuoys)</code>.
<br>
<br><a href="https://www.pmel.noaa.gov/tao/drupal/disdel/" target="_blank">More information on the collected data</a> on the website of the Pacific Marine Environmental Laboratory.
<br>
</div>
</td>
|
| Ozone | Multivariate | Categorical, Integer, Real | 366 | 13 | 6 | No | 1976 |
Los Angeles Ozone Pollution Data, 1976. This data set contains daily measurements of ozone concentration and meteorological quantities. It can be found in R in the <a href="https://cran.r-project.org/web/packages/mlbench/index.html" target="_blank"><code>mlbench</code></a> package and is loaded by calling <code> data(Ozone)</code>.
<br>
<br><a href="https://www.rdocumentation.org/packages/mlbench/versions/2.1-1/topics/Ozone" target="_blank">More information on the dataset</a>.
<br>
<br>
Tutorials illustrating methods on this data:
<ul>
<li> Julie Josse's <a href="/tutorials/Josse_slides_imputation_PCA_2018.pdf" target="_blank">course</a> on missing values imputation using PC methods.</li>
<li> Julie Josse and Nick Tierney's tutorial on handling missing values. Download the data set from this tutorial: <a href="/tutorials/ozoneNA.csv">ozoneNA.csv</a>.</li>
<li> Nick Tierney's <code>naniar</code> <a href="https://cran.r-project.org/web/packages/naniar/vignettes/naniar-visualisation.html" target="_blank">vignette</a> for missing data visualization.</li>
</ul>
<br>
</div>
</td>
|
| pedestrian | Multivariate, Time Series | Categorical, Integer | 37,700 | 9 | 2 | No | 2016 |
This data set contains hourly counts of pedestrians from 4 sensors around Melbourne in 2016. It can be found in R in the <a href="https://cran.r-project.org/web/packages/naniar/index.html" target="_blank"><code>naniar</code></a> package and is loaded by calling <code> data(pedestrian)</code>.
<br>
<br><a href="https://data.melbourne.vic.gov.au/Transport-Movement/Pedestrian-volume-updated-monthly-/b2ak-trbp" target="_blank">More information on the collected data</a> on the public data website of the City of Melbourne.
<br>
</div>
</td>
|
| riskfactors | Multivariate | Categorical, Integer, Real | 245 | 34 | 14 | No | 2009 |
The data is a subset of the 2009 survey from the Behavioral Risk Factor Surveillance System designed to measure behavioral risk factors for the adult population living in households. It can be found in R in the <a href="https://cran.r-project.org/web/packages/naniar/index.html" target="_blank"><code>naniar</code></a> package and is loaded by calling <code> data(riskfactors)</code>.
<br>
<br><a href="https://www.cdc.gov/brfss/data_documentation/index.htm" target="_blank">More information on the survey</a> on the website of the Centers for Disease Control and Prevention.
<br>
</div>
</td>
|
| SBS5242 | Multivariate | Real | 262 | 9 | 2 | No | 2016 |
The data contains a synthetic subset of the Austrian structural business statistics (SBS) data; more specifically, it contains data on 9 variables of NACE 52.42 (retail sale of clothing). A non-confidential, close-to-reality synthetic data set was generated from the confidential raw data of the original Austrian SBS data set. It can be found in R in the <a href="https://cran.r-project.org/web/packages/VIM/index.html" target="_blank"><code>VIM</code></a> package and is loaded by calling <code>data(SBS5242)</code>.
<br>
<br><a href="http://statistik.at/web_en/statistics/Economy/enterprises/structural_business_statistics/index.html" target="_blank">More information on the initial SBS data</a> on the website of Statistik Austria.
<br>
</div>
</td>
|
| sleep | Multivariate | Integer, Real | 62 | 10 | 6 | No | 1976 |
The data contains sleep data for 62 mammal species. It can be found in R in the <a href="https://cran.r-project.org/web/packages/VIM/index.html" target="_blank"><code>VIM</code></a> package and is loaded by calling <code>data(sleep)</code>.
<br>
<br><a href="https://www.semanticscholar.org/paper/Sleep-in-mammals%3A-ecological-and-constitutional-Allison-Cicchetti/8d4f202354bf0fd1bd445792340e16acc042ec6d" target="_blank">More information about the collected data</a> in Allison, T. and Cicchetti, D. (1976) Sleep in mammals: ecological and constitutional correlates. <i>Science</i> <b>194 (4266)</b>, 732-734.
<br>
</div>
</td>
|
| tsAirgap | Time Series | Integer | 144 | 1 | 9 | Yes | 1960 |
The data contains monthly totals of international airline passengers between 1949 and 1960. It can be found in R in the <a href="https://cran.r-project.org/web/packages/imputeTS/index.html" target="_blank"><code>imputeTS</code></a> package and is loaded by calling <code> data(tsAirgap)</code>.
<br>
<br><a href="https://www.wiley.com/en-us/Time+Series+Analysis%3A+Forecasting+and+Control%2C+5th+Edition-p-9781118674918" target="_blank">More information on the data</a> in the work from Box & Jenkins.
<br>
</div>
</td>
|
| tsHeating | Time Series | Real | 606,837 | 1 | 9 | Yes | 2015 |
The data contains a time series of a heating system's supply temperature, measured from 18.11.2013 - 05:12:00 to 13.01.2015 - 15:08:00 in 1-minute steps. It can be found in R in the <a href="https://cran.r-project.org/web/packages/imputeTS/index.html" target="_blank"><code>imputeTS</code></a> package and is loaded by calling <code>data(tsHeating)</code>. The data comes from the GECCO Industrial Challenge 2015.
<br>
<br><a href="http://www.spotseven.de/gecco/gecco-challenge/gecco-challenge-2015/" target="_blank">More information about the challenge</a> on the website of SPOTSeven Lab.
<br>
</div>
</td>
|
| tsNH4 | Time Series | Real | 4,552 | 1 | 9 | Yes | 2014 |
The data contains a time series of an NH4 concentration in a wastewater system, measured from 30.11.2010 - 16:10 to 01.01.2011 - 06:40 in 10-minute steps. It can be found in R in the <a href="https://cran.r-project.org/web/packages/imputeTS/index.html" target="_blank"><code>imputeTS</code></a> package and is loaded by calling <code>data(tsNH4)</code>. The data comes from the GECCO Industrial Challenge 2014.
<br>
<br><a href="http://www.spotseven.de/gecco/gecco-challenge/gecco-challenge-2014/" target="_blank">More information about the challenge</a> on the website of SPOTSeven Lab.
<br>
</div>
</td>
|