Stat 159/259 - Reproducible and Collaborative Data Science

All materials for this course are available on GitHub.

The class syllabus will be updated during the first couple of weeks of class.


  • Due Monday October 9, 2017:

Stuart is an ethnographer at BIDS and will guest lecture in the class on October 10th, talking about this work. You only need to submit a written report on the above paper. Further information about the lecture follows. Spend some time exploring the links provided below to make the most of Stuart’s presentation.

This lecture will be an inside look into a reproducible research project (repo), which studied conflict between automated software agents (or bots) in Wikipedia. The paper is a bit long and goes into a lot of detail, so feel free to skim. Also spend a little bit of time exploring Wikipedia’s public but behind-the-scenes spaces to get a general background on what bots do and how the community governs them. Try to also get some familiarity with Wikipedia as a version control system, because these commit histories are the primary data analyzed in the study.
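For a sense of what these commit histories look like as data, here is a minimal Python sketch. It assumes the public MediaWiki API endpoint for English Wikipedia and its standard `action=query` / `prop=revisions` parameters. It is not the code used in the study. The function names and the sample response are made up for illustration, and the sample is hand-written rather than real data.

```python
from urllib.parse import urlencode

# English Wikipedia's MediaWiki API endpoint
API_URL = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title, limit=5):
    """Build a URL asking for the most recent revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def summarize_revisions(response):
    """Flatten a parsed JSON response into (timestamp, user, comment) rows."""
    rows = []
    for page in response["query"]["pages"].values():
        for rev in page.get("revisions", []):
            rows.append((rev["timestamp"], rev["user"], rev.get("comment", "")))
    return rows


# Hypothetical response in the shape the API returns (invented values).
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "title": "Example",
                "revisions": [
                    {"revid": 2, "parentid": 1, "user": "ExampleBot",
                     "timestamp": "2017-10-01T12:00:00Z",
                     "comment": "Fixing interwiki links"},
                    {"revid": 1, "parentid": 0, "user": "ExampleEditor",
                     "timestamp": "2017-09-30T08:30:00Z",
                     "comment": "Created page"},
                ],
            }
        }
    }
}

for timestamp, user, comment in summarize_revisions(sample_response):
    print(timestamp, user, comment)
```

To look at live data, open the URL returned by `revision_query_url("Earth")` in a browser, or pass it to any HTTP client and feed the parsed JSON to `summarize_revisions`.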

  • Main materials:
    • The paper above, also available here
    • Introduction to Wikipedia Bots
      • This page contains many links to pages you might find interesting, including a list of bots
    • Recent bot edits to English Wikipedia articles. Note that every page on Wikipedia (including discussion pages) is a flat text file in a version control system.
      • Click “diff” on each line to get a diff of what the bot changed
      • Click “hist” to see the history of edits to that article
      • Click the bot’s username to get their profile
      • Click “talk” for messages left to the bot’s operator
      • Click “contribs” for all the bot’s edits
  • Some additional materials if you are interested further: