Readings on data and statistical services for a liberal arts college


I’m beginning to do some scenario planning on what the data and statistical services offered by the library will look like in 10-15 years. As part of that activity, I’m compiling a list of articles, websites, and presentations that will help inform that perspective.

Many of these come from the article Teaching the next generation of statistics students to ‘think with data’: special issue on statistics and the undergraduate curriculum by Nicholas Horton and Johanna Hardin. That article has a nice section of key articles on statistics in the undergraduate curriculum, from which I’ve made some selections below.

Setting the stage for data science: integration of data management skills in introductory and second courses in statistics. Horton, Baumer, Wickham. 2015. (pdf)

Identifies 5 key elements that deserve greater emphasis in the undergrad curriculum:

  1. “Thinking creatively, but constructively, about data”…data cleaning, data storage
  2. working with data sets of varying sizes and understanding scalability issues…querying databases
  3. command-line skills. The authors mention R, Python. I also would include Unix. The command-driven environment “provide freedom from the un-reproducible point-and-click application paradigm”.
  4. “Experience wrestling with large, messy, complex, challenging data sets…these data are more similar to what analysts actually see in the wild.”
  5. “An ethos of reproducibility”

The article goes on to illustrate examples of using these 5 elements in coursework.

 

Tidy Data – slides of presentation by Wickham

Data acquisition and preprocessing in studies on humans: what is not taught in statistics classes?

Statistics and Science: A Report of the London Workshop on the Future of the Statistical Sciences 2014 (pdf)

Humanities Data in R

Implications of the Data Revolution for Statistics Education (pdf) 2015 calls for more emphasis on big data, data visualization, and developing an “aesthetic for data handling and modeling based on solving practical problems”.

A data science course for undergraduates: thinking with data (pdf)

A cognitive interpretation of data analysis

Teaching and learning data visualization: ideas and assignments (pdf) 2015

Meeting Student Needs for Multivariate Data Analysis: A Case Study in Teaching a Multivariate Data Analysis Course with No Pre-requisites Amy Wagaman, Amherst College

Curriculum Guidelines for Undergraduate Programs in Statistical Science

 

Linked Open Data & Literary Networks


We had the first meeting today of what we’re calling our Linked Open Data Working Group. In addition to myself, group members are Mackenzie Brooks (Digital Humanities Librarian), Jeff Knudson (Senior Technology Architect/ITS), and Brandon Walsh (Mellon Digital Humanities Fellow).

What is it we want to accomplish through this group?

  • Develop a better understanding of Linked Open Data (LOD) and how it might apply to projects at W&L.

We want to think about LOD in the context of our specific DH projects in order to avoid talking about it in the abstract. But we also want to make sure that we clearly identify what we want to accomplish with those projects instead of having a solution (e.g., LOD) that is looking for a problem to solve. In other words, we’re going to develop the vision for the project and then work backwards.

We have several potential projects but the easiest to get started on is literary networks. This project evolved out of archival research relating to the Shenandoah literary magazine published by W&L. While Shenandoah is partially indexed in MLA, a full index of Shenandoah has never been produced. (Also, the contents are not in JSTOR.)  A student worker in the library has compiled an index as part of her job in Special Collections and Archives. Our DH Librarian (previously our Metadata Librarian) identified the necessary fields and created the spreadsheet for the data entry. The Shenandoah index has over 6,400 entries.

Our first task in this project doesn’t actually involve LOD: creating a web-based index to Shenandoah. But we want to keep LOD principles in mind as we develop the index. The Shenandoah data set provides research material that goes far beyond merely an index to a journal.

While the research agenda of the literary networks project is the topic of a future post, the essence is that I want to examine the relationships and connections among authors and editors.

 

Top-level functionality

Here are the primary features involved in a LOD approach to this data set. Each feature is a different stage, or layer, of the project. A use case scenario describes the functionality enabled by each stage.

  • a web-based index to Shenandoah

Use case: Queries based on editor. From these queries we can form network graphs based on relationships among authors and editors and issues.

  • expose the index/relationship data as LOD

Use case: Exposing this data set as a LOD triple store (with possibilities of generating query results in json & csv) allows for the data to be analyzed in a variety of tools such as Palladio, R, Gephi. Plus, it provides the ability for other projects to integrate this data.

  • the web index incorporates additional data about authors

Use case: the web interface shows brief biographical information and publishing history about each author. Instead of gathering this data manually and entering it directly into the data set, we want to explore creating a process that connects external information about these authors with this data by consuming LOD.

  • expanding the Shenandoah data set by including relationship information identified by archival research

Use case: literary networks are influenced by friendships and social contacts. Publishing decisions are often made by brokers (e.g., Ezra Pound), whose influence is not formally represented in the index data. There’s a complex challenge in figuring out how to represent this information.

  • expanding the data set with data of other literary journals in order to create a broader data set of authors publishing in the mid-century

Use case: While the data set starts with one specific literary journal (Shenandoah), it’s the connections among authors and editors publishing in a larger set of journals during the same time period that are more interesting. Authors do not simply write for one publication. Ultimately, the literary networks project will create a data set of authors (and editors) who published in mid-century literary journals.

By starting with the Shenandoah data set, we gain experience with the process of using LOD. What we learn through this initiative can be applied to other projects, particularly those with biographical data.
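To make the first two stages more concrete, here is a minimal sketch in Python of how rows from the index spreadsheet could be turned into subject-predicate-object triples, and how author-editor relationships could then be pulled out as an edge list for a tool such as Gephi. Everything specific here is an assumption for illustration: the column names, the sample rows, the example.org namespace, and the predicate names are invented, and a plain Python list stands in for a real triple store. It also assumes one editor per issue.

```python
import csv
import io
import json

# Hypothetical rows; the real spreadsheet's columns and values will differ.
ROWS = [
    {"issue": "v1n1", "editor": "Editor A", "author": "Author X", "title": "Poem One"},
    {"issue": "v1n1", "editor": "Editor A", "author": "Author Y", "title": "Story Two"},
    {"issue": "v2n3", "editor": "Editor B", "author": "Author X", "title": "Essay Three"},
]

BASE = "http://example.org/shenandoah/"  # placeholder namespace, not a real one


def slug(text):
    """Turn a label into a URL-friendly identifier."""
    return text.lower().replace(" ", "-")


def row_to_triples(row, number):
    """Express one index entry as (subject, predicate, object) triples."""
    work = f"{BASE}work/{number}"
    issue = f"{BASE}issue/{slug(row['issue'])}"
    return [
        (work, "title", row["title"]),
        (work, "author", f"{BASE}person/{slug(row['author'])}"),
        (work, "publishedIn", issue),
        (issue, "editor", f"{BASE}person/{slug(row['editor'])}"),
    ]


def build_triples(rows):
    """Collect triples for all rows, skipping exact duplicates."""
    triples = []
    for number, row in enumerate(rows, start=1):
        for triple in row_to_triples(row, number):
            if triple not in triples:
                triples.append(triple)
    return triples


def author_editor_edges(triples):
    """Link each author to the editor of the issue the work appeared in."""
    editor_of = {s: o for s, p, o in triples if p == "editor"}
    issue_of = {s: o for s, p, o in triples if p == "publishedIn"}
    edges = set()
    for s, p, o in triples:
        if p == "author":
            edges.add((o, editor_of[issue_of[s]]))
    return sorted(edges)


def edges_to_csv(edges):
    """Write an edge list with Source and Target columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


triples = build_triples(ROWS)
edges = author_editor_edges(triples)
print(json.dumps(edges, indent=2))  # the same result as JSON
print(edges_to_csv(edges))          # and as CSV
```

A real implementation would more likely use an RDF library and established vocabularies rather than ad hoc predicate names. The sketch is only meant to show that the same relationship data can feed both a network graph and a JSON or CSV export.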

Process of the LOD working group

As a group we will meet once a month. We will use a Slack channel in the W&L DHAT team for communication.

DH as a Trojan horse for information literacy


The digital humanities (DH) represent an academic library’s greatest opportunity for strengthening its role in the curriculum and research. DH provides a frame for understanding the creative process of scholarship.

The methods and tools within DH are not unique to disciplines within the humanities. The core activities of DH reflect the fundamental hallmarks of information literacy as expressed in the ACRL Framework for Information Literacy. DH does not exist without information literacy. Yet, a separation exists within the profession of librarianship between DH and information literacy. Our academic libraries are organized so that information literacy is the domain of subject liaisons/instructional librarians and DH emerges from R&D-type efforts. In most libraries these are entirely separate departments. In some universities DH is entirely separated from the library, even when a DH center is physically located within the library building.

DH and information literacy are on a collision path, fighting for resources. Yet DH can be a vehicle for strengthening the reach of information literacy in the curriculum. Opportunities exist for collaborative initiatives that bring the two together rather than leaving them siloed within organizational boundaries. Librarians must advocate for integrating these practices instead of competing for resources. Here are some steps for moving forward:

First, increase the dialogue between instruction librarians and DH specialists in order to move beyond the barrier of the term DH and recognize its essence as applicable to all disciplines.

Second, redefine the role of the subject liaison/specialist to incorporate a range of digital practices so that all academic librarians are digital scholarship librarians.

Third, take action and demonstrate through example by jumping in with both feet to figure out what works at a particular institution.

DH has the power to enhance an information literacy program. Within the confluence of DH and information literacy, a university can find the capacity to sustain digital scholarship.

First-year writing courses & DH


We’re half-a-year into our 4-year Mellon DH grant. On my way back from DLF in Vancouver at the end of October, I got stranded in the Chicago airport for most of the day. Those “opportunities” provide plenty of time to think. For a couple of years W&L has been issuing an open call to faculty to submit proposals for incentive grants in DH. As I was sitting in the airport, I started to reflect on how we could take a more systematic approach to ensuring that the grant money contributed to structural changes in the curriculum. In other words, what is it that we’re trying to incentivize?

One of our goals is to introduce more first-year students to DH. Students encountering digital methods early in their academic careers are better equipped for handling DH assignments and projects appropriate for upper-level courses. Our students are challenged to grasp the implications of a world mediated through technology. The digital environment is not in opposition to the critical thinking nurtured through the processes of close reading and composition. Rather, through software we find tools suited to enhancing our understanding of the world around us and to presenting new forms of expression.

Our students have the opportunity in their lifetimes to creatively define how technology impacts not only their future but also that of succeeding generations.

As their careers progress into the mid-century, our graduates’ entrepreneurial instincts and leadership will identify solutions that can only be met through their critical understanding of digital information and technology.

The foundation for that digital mindset of addressing humanistic concerns is laid in the first year of college.

Our initiative is to collaborate with faculty teaching the first-year writing courses and seminars to craft an introductory set of DH assignments. These assignments would relate the core concepts of those courses to analytical and creative methods within DH and establish a baseline of the critical understanding needed for thriving in a digital society.

What type of DH assignments are suitable for first-year writing courses? We’re not yet sure. It’s not appropriate for librarians and technologists to say, “This is what you should do.” Over the course of the coming year we want to define that with the faculty teaching those courses. We’re going to do that through a series of conversations. Plus, we’ll seek the advice of faculty at other institutions that have explored the concepts and are further along that path.

Readings on electronic literature, or conversations on digital narrative


I initially prepared the following list for a guest lecture in an upcoming creative writing (fiction) course that will introduce students to ways of telling a story in digital media that take forms other than linear prose.

First, why the term electronic literature (e-lit)? That’s a stiff sounding term that is a throwback to an earlier time before digital became commonplace. But e-lit is the term that seemingly has gained the most traction to refer to narratives that make innovative use of digital media. I have a slight discomfort with the term electronic literature (and also with digital literature) but my unease with the terminology is a topic for another post.

Any discussion of e-lit must involve the Electronic Literature Organization (ELO) at eliterature.org. That website is quickly overwhelming but the summary page What is E-Lit? is a good place to start.

The foundations of e-lit

An essay by Katherine Hayles offers a broad survey of e-lit (up to 2007). Hayles is an important figure in media studies and this essay is a good opportunity to introduce students to her works. Hayles includes references to landmark writers such as Kittler and Manovich and positions e-lit in a larger framework of the modern digital society.

This essay by Hayles is the first chapter in the book Electronic Literature: New Horizons for the Literary, which includes a companion website. The book attempts to establish a canon of early e-lit. But Janet Murray, another key scholar in the field of media studies, points out that these early works “are useful experiments, necessary failures, and limited successes, full of interesting mistakes that if appropriately acknowledged can push practice forward.” (Murray, Janet H. Hayles, N. Katherine. Electronic literature: new horizons for the literary. Modern fiction studies 55.2 01 Jan 2009: 407. Johns Hopkins University Press.)

A good summary of Hayles’s book on e-lit is provided on The Quarterly Conversation site in an article subtitled How Electronic Literature Makes Printed Literature Richer. Anyone who finds Hayles even slightly interesting should read her book How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics.

Pathfinders: Documenting the Experience of Early Digital Literature is the best source for understanding where e-lit comes from through an examination of pre-web hypertext literature in the years between 1986 and 1995. The scholarly literature on e-lit often refers to seminal works such as Shelley Jackson’s Patchwork Girl (1995) and other hypermedia texts created through tools developed by Eastgate Systems. However, the technology needed to actually read those works of e-lit today is inaccessible to most people. While Pathfinders does not provide a simulation of Patchwork Girl, it offers an intriguing methodology for showcasing how Shelley Jackson and readers interact with Patchwork Girl.

The present state of e-lit

There’s something odd about e-lit: it appears to be mostly discussed within academia and it’s difficult to find good examples on the web of what is called e-lit. How could that be?

Writer Paul La Farge provides a great comment:

“I actually don’t think digital literature is suffering from a lack of theory at this point; if anything, it suffers from a lack of practice. We need more writers! And a more diverse and robust way of getting their work into the world: not just more competent critics (we have some), but more kinds of competent critics, and more places where conversations about digital literature can happen, and more avenues by which digital lit can reach readers. All of this will surely happen in time. What I think the medium needs now is encouragement, and perhaps rescue from the forbiddingly technical language in which it has been theorized. It depresses me to think of digital literature as being exclusively an academic specialty: it’s as if Film Studies departments had sprung into existence all over the world, before anyone had made any movies.”

This quote is from an excellent series of posts by author Illya Szilak that appeared on the Huffington Post.

Note: I will be updating this post with new readings.

 

E-lit postings by Illya Szilak


Over the course of a year (late 2012 – late 2013) author Illya Szilak wrote a series of articles on Huffington Post about electronic literature that are worth reading for anyone interested in the topic. Szilak is the author of Queerskins – A Novel and Reconstructing Mayakovsky – A Novel of the Future.

Unlike most people who write about e-lit, Szilak is a physician and not an academic. As a creator of contemporary e-lit she brings a perspective that is often absent from the conversation on this topic.

Due to the navigational features of the Huffington Post it isn’t easy to read her articles in the order they were written. So, I arranged the following links to each article in chronological order.

The Death of the Novel: How E-Lit Revolutionizes Fiction 11/08/2012

Video in the House of the Word: How e-Lit Intersects With Cinema 11/20/2012

What Does a Polar Bear Do in a Jungle? How E-Lit Expands the Habitat of Literature 12/11/2012

The Death of the Author: E-lit and Collective Creativity 12/27/2012

It’s Got a Good Beat and You Can Dance to It: E-lit Plays With Time 1/17/2013

New Wor(l)d Order: E-lit Plays With Language 2/7/2013

It’s All Fun Until Someone Loses: E-lit Plays Games 3/7/2013

Just Playing Around: Why E-lit Matters 3/15/2013

Killing the Literary: The Death of E-lit 3/19/2013

Books That Nobody Reads: E-lit at the Library of Congress 4/24/2013

Fleshly Data: E-lit and the Post-Human 5/10/2013

Remembering the Human: E-lit and the Art of Memory 5/15/2013

Reorienting Narrative: E-lit as Psychogeography 6/11/2013

The Silent History: E-lit Looks to the Future 7/1/2013

A Book Itself Is a Little Machine: Emily Short’s Interactive Fiction 10/30/2013

A Book Itself Is a Little Machine: Emily Short’s Interactive Fiction, pt 2 11/4/2013

Electronic Literature, Digital Humanities, & Creative Writing


This morning we met with a professor teaching Fiction Writing who wanted to incorporate a DH assignment into the course that required the students to tell a story through a new technology of their choice.

The students will start by completing a 3-page writing assignment with pen and paper. Then they will be asked to translate that story into a digital medium. The keyword here is translate. Is chopping a thousand words of prose into 50 tweets a translation of prose into a digital medium? I would say not. As with any translation, how does the language of the digital medium impact the text? How does the language of the digital medium provide different capabilities (or affordances) that inspire new forms of creativity?

As an assignment, students will have to grapple with the technological platform (the language) that they have chosen for their translation. Students must learn that every platform choice comes with limitations and constraints that have significant impacts on determining the structure of their narrative. While these restrictions appear to be determined by the technology, students should grasp that the limitations are the functions of the underlying software. The code behind the platform reflects the dictates of software developers.

Many examples of electronic literature try to work within or around those constraints. That likely is a simple reflection of the coding limitations of the authors.

I would prefer that students start their translation process not with the choice of platform but with creativity, and again with pen and paper. Approach the digital with a blank slate and not with the limitations of an imposed piece of software. Sketch out in pencil the ways this narrative could be represented if the choice of digital medium were wide open. Have the students describe the capabilities that the software would provide to tell the story. Have the students create a storyboard, a flow of the narrative, with the technologies of their imagination. These tools may not yet even exist. But if they can be imagined, then the algorithms can be created.

What’s important is not that the students learn to tell a story in Prezi or Twitter, but that they learn that digital media is software that has capabilities and limitations defined by its creator. The process of software development is a creative act. While students in a fiction writing course will not become coders overnight, this exercise could inspire them to see the linkage between the creative process and the process of developing the tools that we all use to tell stories in digital media.

Remarks to VICULA on DH


The following is a draft of remarks I gave today to a meeting of VICULA (Virginia Independent College and University Library Association) at Washington and Lee. I don’t read my talks directly from a script, so what I actually said varies, but this represents the heart of it.

I’m going to give an overview of our digital humanities initiative and some of the issues we’re facing. And my colleagues are going to go into more specifics about the aspects that they are working on.

First, about that term digital humanities: DH. It’s a problematic term in an undergraduate college. It’s very hard to define and is more often associated with graduate education and faculty research. I do like this definition: DH is a set of “convergent practices that explore a universe in which print is no longer the primary medium in which knowledge is produced and disseminated” (Burdick et al. 2012, 122) [pdf] But what does that really mean in practice? What does it mean for undergraduate teaching and learning?

Many liberal arts colleges have adopted different terms, such as the digital liberal arts or digital studies. DH is more than the humanities. These techniques can apply to many disciplines. There’s really little distinction between DH and digital scholarship. My advice is not to focus too much on trying to define the term because that can lead to a lot of non-productive conversations as you pull in people from different disciplines.

At W&L we have a very practical reason for using DH: the dean of our college. She likes the term DH. She started the conversation here about DH about 3 years ago and she is the champion for our DH initiatives. So, here at W&L the term DH has developed a certain brand, a recognition, that works here. I encourage you to find the term that works at your institution. But spend more time talking about what you do and not about a definition.

We do have a lot of momentum going about DH. If you’re interested in the background, a group of us wrote a case study. I’m going to talk about where we are now. Collaboration is the key to everything we’re doing. And we’re very fortunate in having strong leadership, not only from our dean but also our faculty. Again, the case study describes how our DH activities are organized.

As we were writing that case study, we started examining what many other liberal arts colleges were doing with DH. And we noticed that many of the colleges had large grants from the Mellon Foundation. And we started wondering: how can we get some of that funding? So one day I picked up the phone and called the grants officer in our development office. We are very fortunate in that we have a wonderful grants officer. And he told me the process for applying for a Mellon grant.

I want to talk about this process a bit since some of you might be interested in pursuing this funding source. The Mellon Foundation has an unusual process. It does not accept unsolicited applications, but there is a way in. Even though Mellon has a reputation for being somewhat exclusive, there are clear indicators that Mellon is expanding the range of institutions that it funds. And it’s important to remember that Mellon is a humanities foundation. The process is simple. Your provost or president sends a brief email describing the concept to the appropriate program officer at Mellon.

We were shocked that Mellon replied that day, within hours, that they were interested and wanted to see a draft proposal. We then had a conference call with the program officer: on the call was our provost, the two faculty most involved in DH, myself, and our grants officer. We got clarification as to what Mellon liked and did not like. They are very clear in what they do and do not want to see. So we pulled together a draft, about 5 pages, and submitted that. We were even more surprised that Mellon responded again the same day saying that they would fund the project. Of course, that was a tentative approval. It still had to be approved by the board of Mellon.

I encourage you not to be shy about approaching Mellon, but the contact, at least for the liberal arts program, should come from the college’s senior administration. Many people are aware that research libraries often get significant Mellon funding for digital library and digital scholarship initiatives. But that is through a separate process, with a different program officer at Mellon. So it’s important to know which division of Mellon you are targeting. And Mellon is also very interested in collaborative endeavors and multi-institution initiatives.

If your institution has not received Mellon funding before, you probably will want to start with a small concept, perhaps a planning grant. In our initial contact, we did not specify a dollar amount. Mellon will tell you the amount that they will fund.

So what are we doing? Our grant proposal is titled DH Studio: a pedagogical innovation. We wanted to anchor our initiative around the curriculum, particularly a series of one-credit courses in DH that serve as labs for humanities courses. Mackenzie Brooks will speak more about the DH Studio courses.

Our grant is divided into multiple areas: staffing for the library to support DH Studio, incentive grants to faculty, summer research grants, professional development, a speaker series, student workers, and funding to send students to conferences. The about page of our DH website has more details on these initiatives.

So everything looks really great. We have the senior administration fully behind DH, great leadership from the faculty, very positive buzz among the faculty, the library is excited about DH, and we have a large grant. What could go wrong? One very important thing: lack of student interest.

We thought students would be excited to learn this stuff. But our students are very practical, very career oriented. They could not make the connection between DH and their careers. Part of the problem is that term DH. It means nothing to students.

And building interest among students is a major focus of our initiative this year. [At this point in the talk, I spoke about some of our curricular activities. A focus was on the courses we’re teaching and the enrollment issues we encountered. I’m going to be doing an upcoming talk at DLF on the specifics of that issue. I’ll post those comments when they’re available.]

Digital history course project


In the second class of the digital history course, we focused on working with data, which is the largest part of the course project.

We started by reviewing two readings involving our case study of the Legacies of British Slave-ownership. Appendices 1 & 2 of the book about that project provide insights into prosopography and database development. As we reviewed those readings, I explained terms that were not yet familiar to the students. And we explored concepts around collecting and organizing data about individuals. The course project page provides further background on the project and a few sample maps.
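For students who have never seen a database, a tiny example can make “collecting and organizing data about individuals” less abstract. The sketch below uses Python’s built-in sqlite3 module with two tables: one for people and one for sourced statements about them. The table names, column names, and sample values are all invented for illustration. This is not the schema of the Legacies of British Slave-ownership database.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: each person has many records, and each record
# notes what kind of fact it is and which source it came from.
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE record (
    record_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    kind      TEXT NOT NULL,
    value     TEXT NOT NULL,
    source    TEXT NOT NULL
);
""")

# Invented sample data.
conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Person A')")
conn.executemany(
    "INSERT INTO record (person_id, kind, value, source) VALUES (?, ?, ?, ?)",
    [
        (1, "residence", "Place One", "Source 1"),
        (1, "occupation", "Merchant", "Source 2"),
    ],
)


def profile(connection, person_id):
    """Return every recorded fact about one person, sorted by kind."""
    return connection.execute(
        "SELECT kind, value, source FROM record WHERE person_id = ? ORDER BY kind",
        (person_id,),
    ).fetchall()


print(profile(conn, 1))
```

Keeping the source alongside each fact is the design point worth discussing in class: in prosopography, the same individual often appears in several documents that may disagree.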

 

Creativity and Code


The third class of multimedia storytelling design is focused on what I call creativity and code. Continuing the focus on Snow Fall, the class reading for today is How We Made Snow Fall. That last link is a really good article, worth reading closely.

Creative thoughts from the How We Made Snow Fall article:

  • Making a single story out of the digital assets… “to weave things together so that text, video, photography and graphics could all be consumed in a way that was similar to reading–a different kind of reading.” Talk about the origin of the word text…Latin…to weave (see Latin dictionary)
  • The design process: initial prototyping, collaboration, integration of graphics and video into “the narrative experience”
  • “a lot of trial and error and experimenting”
  • “design revisions and tweaks”
  • filmmaking techniques, but reading centric
  • “We focused on the pacing, narrative tension, and story arc–all while ensuring that each element gave the user a different experience of the story.”
  • moving images…pauses at “critical moments in the text”
  • “Having a tight edit that slowly built the tension of the narrative was the overall goal.”
  • Color palettes
  • Biggest challenge: “managing the path we wanted the reader to follow”

 

Much of this class period will be getting students to do HTML hands-on. We’ve talked a lot about HTML in the first class and looked at a lot of HTML code, but I want to get a sense as to how far the students are coming along with actually doing something with HTML. I suspect not far.

Some thoughts on topics to cover in today’s class:

  • Have the students open up the W3schools site. A good reference source for tags. But, also, for those new to HTML: the try it yourself online editor is an extremely simple way of seeing how tags work.
  • Tell students about lynda.com. Not required for this class but a great resource for learning on your own. In future versions of this course, I might think about requiring lynda.com access rather than a book.
  • What is a plain text file? Contrast with binary files using machine code and with WYSIWYG document editors. What’s a text editor? A code editor? Try opening a binary file in a text editor.
  • File extensions. Nice to show those extensions in Mac OS X (Finder>Preferences>Advanced>Show all filename extensions).

I had a student email me with a problem in getting an HTML document to display in the browser as a Web page. Instead, the page simply showed the HTML code. The problem was that the student followed the instructions in the Duckett book and used TextEdit on the Mac to create the page. TextEdit, being a rather confusing software program for writing HTML, saved the document as RTF (even though the student had correctly added the .html extension). There’s a non-intuitive option within TextEdit for saving as plain text.
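A quick way to diagnose this kind of problem is to look at the first few bytes of the file rather than at its extension. RTF documents begin with the characters {\rtf, while an HTML page normally begins with a doctype or an html tag. Here is a minimal sketch in Python. The function name and the sample files are made up for the demonstration, and the check is deliberately crude.

```python
import os
import tempfile

RTF_SIGNATURE = b"{\\rtf"  # the bytes that open every RTF document


def describe_file(path):
    """Guess the real format of a file from its first bytes."""
    with open(path, "rb") as handle:
        start = handle.read(512).lstrip()
    if start.startswith(RTF_SIGNATURE):
        return "rtf"
    lowered = start.lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    return "unknown"


# Two sample files with the same .html extension but different contents.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.html")
    bad = os.path.join(folder, "bad.html")
    with open(good, "w", encoding="utf-8") as handle:
        handle.write("<!DOCTYPE html>\n<html><body><h1>Hello</h1></body></html>\n")
    with open(bad, "w", encoding="utf-8") as handle:
        # Roughly what a rich-text save looks like: RTF wrapping the typed tags.
        handle.write("{\\rtf1\\ansi <html><body><h1>Hello</h1></body></html>}")
    results = {"good.html": describe_file(good), "bad.html": describe_file(bad)}

print(results)
```

The point for students is that the extension is only a label. What the browser receives is whatever bytes the editor actually wrote.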

For this class I’m recommending the use of Atom, which is already installed on the iMacs in the journalism lab. But make sure that students know how to install this on their own laptops.

  • Do students understand that a website is a directory (i.e., folder) on a server?
  • Create a basic web page in HTML. Perhaps by just copying over what they had done in the w3schools try it editor.
  • Save the page as index.html…as a way of introducing the importance of index.html as a filename in web site directories. Stress, though, that not just any directory can serve web pages. You need a directory that is associated with web server software. In the next class we will look at actually uploading the pages to a server.
  • Play around with HTML. Go over again the head and body. Title. H# tags.
  • Introduce lists and links.
  • Relative and absolute URLs.
  • Talk about text formatting (non-CSS).
  • Take questions.
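Two items in this list, index.html as the default file in a directory and relative versus absolute URLs, can be demonstrated in a few lines of Python. In this sketch, urljoin comes from Python’s standard library and resolves a link the way a browser would. The file_for function is my own simplified illustration of how a web server maps a URL path onto a folder. Real servers do much more (security checks, configurable default filenames), and the example.org address and /var/www/site folder are placeholders.

```python
import posixpath
from urllib.parse import urljoin

# The page the student is currently viewing (placeholder address).
page = "http://example.org/course/week3/index.html"

# Relative URLs are resolved against the current page.
same_folder = urljoin(page, "images/snow.jpg")
parent_folder = urljoin(page, "../syllabus.html")
site_root = urljoin(page, "/about.html")

# An absolute URL ignores the current page entirely.
absolute = urljoin(page, "https://www.w3schools.com/")


def file_for(document_root, url_path):
    """Simplified: map a URL path to a file inside the site's directory."""
    if url_path.endswith("/"):
        url_path += "index.html"  # the conventional default filename
    return posixpath.join(document_root, url_path.lstrip("/"))


print(same_folder)    # http://example.org/course/week3/images/snow.jpg
print(parent_folder)  # http://example.org/course/syllabus.html
print(site_root)      # http://example.org/about.html
print(absolute)       # https://www.w3schools.com/
print(file_for("/var/www/site", "/week3/"))  # /var/www/site/week3/index.html
```

Seeing the four results side by side usually helps students understand why a link that works on their laptop can break once the pages move to a server.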