
Readings on data and statistical services for a liberal arts college

I’m beginning to do some scenario planning on what the data and statistical services offered by the library will look like in 10-15 years. As part of that activity, I’m compiling a list of articles, websites, and presentations that will help inform that perspective.

Many of these come from the article “Teaching the next generation of statistics students to ‘think with data’: special issue on statistics and the undergraduate curriculum” by Nicholas Horton and Johanna Hardin. That article has a nice section of key articles on statistics in the undergraduate curriculum, from which I’ve made some selections below.

Setting the stage for data science: integration of data management skills in introductory and second courses in statistics. Horton, Baumer, Wickham. 2015. (pdf)

Identifies 5 key elements that deserve greater emphasis in the undergrad curriculum:

  1. “Thinking creatively, but constructively, about data”…data cleaning, data storage
  2. Working with data sets of varying sizes and understanding scalability issues…querying databases
  3. Command-line skills. The authors mention R and Python; I would also include Unix. Command-driven environments “provide freedom from the un-reproducible point-and-click application paradigm”.
  4. “Experience wrestling with large, messy, complex, challenging data sets…these data are more similar to what analysts actually see in the wild.”
  5. “An ethos of reproducibility”

The article goes on to illustrate examples of using these 5 elements in coursework.
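
To make a few of these elements concrete, here is a small sketch of my own (it is not taken from the article) that cleans a messy data set in code, loads it into a database, and queries it, all from a script that can be rerun. The sample data and the function names `clean_rows` and `mean_gpa_by_major` are hypothetical, and dropping rows with a missing GPA is just one possible cleaning choice.

```python
import csv
import io
import sqlite3

# Hypothetical messy input: stray whitespace, inconsistent casing, a missing value.
RAW_CSV = """student,major,gpa
 Ana ,statistics,3.7
Ben,Statistics ,
Cy,HISTORY,3.1
Dee,history,3.5
"""


def clean_rows(text):
    """Return cleaned (student, major, gpa) tuples, dropping rows with no GPA."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        gpa = (rec["gpa"] or "").strip()
        if not gpa:
            # Assumption: incomplete records are dropped. A real analysis
            # should document and justify this choice.
            continue
        rows.append((rec["student"].strip(), rec["major"].strip().lower(), float(gpa)))
    return rows


def mean_gpa_by_major(rows):
    """Load rows into an in-memory SQLite database and query mean GPA per major."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE students (student TEXT, major TEXT, gpa REAL)")
    con.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    result = con.execute(
        "SELECT major, ROUND(AVG(gpa), 2) FROM students GROUP BY major ORDER BY major"
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(mean_gpa_by_major(clean_rows(RAW_CSV)))
```

Because every cleaning decision lives in the script rather than in a series of mouse clicks, anyone can rerun it on the raw file and get the same summary, which is the point of the reproducibility ethos.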


Tidy Data – slides of presentation by Wickham

Data acquisition and preprocessing in studies on humans: what is not taught in statistics classes?

Statistics and Science: A Report of the London Workshop on the Future of the Statistical Sciences 2014 (pdf)

Humanities Data in R

Implications of the Data Revolution for Statistics Education (pdf) 2015. Calls for more emphasis on big data, data visualization, and developing an “aesthetic for data handling and modeling based on solving practical problems”.

A data science course for undergraduates: thinking with data (pdf)

A cognitive interpretation of data analysis

Teaching and learning data visualization: ideas and assignments (pdf) 2015

Meeting Student Needs for Multivariate Data Analysis: A Case Study in Teaching a Multivariate Data Analysis Course with No Pre-requisites. Amy Wagaman, Amherst College

Curriculum Guidelines for Undergraduate Programs in Statistical Science