Today we held the first meeting of what we’re calling our Linked Open Data Working Group. In addition to myself, group members are Mackenzie Brooks (Digital Humanities Librarian), Jeff Knudson (Senior Technology Architect/ITS), and Brandon Walsh (Mellon Digital Humanities Fellow).
What is it we want to accomplish through this group?
- Develop a better understanding of Linked Open Data (LOD) and how it might apply to projects at W&L.
We want to think about LOD in the context of our specific DH projects in order to avoid talking about it in the abstract. But we also want to make sure that we clearly identify what we want to accomplish with those projects instead of having a solution (e.g., LOD) that is looking for a problem to solve. In other words, we’re going to develop the vision for the project and then work backwards.
We have several potential projects but the easiest to get started on is literary networks. This project evolved out of archival research relating to the Shenandoah literary magazine published by W&L. While Shenandoah is partially indexed in MLA, a full index of Shenandoah has never been produced. (Also, the contents are not in JSTOR.) A student worker in the library has compiled an index as part of her job in Special Collections and Archives. Our DH Librarian (previously our Metadata Librarian) identified the necessary fields and created the spreadsheet for the data entry. The Shenandoah index has over 6,400 entries.
Our first task in this project doesn’t actually involve LOD: creating a web-based index to Shenandoah. But we want to keep LOD principles in mind as we develop the index. The Shenandoah data set provides research material that goes far beyond merely an index to a journal.
While the research agenda of the literary networks project is the topic of a future post, the essence is that I want to examine the relationships and connections among authors and editors.
Here are the primary features involved in an LOD approach to this data set. Each feature is a different stage, or layer, of the project, and a use case describes the functionality that each stage enables.
- a web-based index to Shenandoah
Use case: Queries based on editor. From these queries we can form network graphs based on relationships among authors, editors, and issues.
- expose the index/relationship data as LOD
Use case: Exposing this data set as an LOD triple store (with the possibility of generating query results in JSON and CSV) allows the data to be analyzed in a variety of tools such as Palladio, R, and Gephi. It also gives other projects the ability to integrate this data.
- the web index incorporates additional data about authors
Use case: the web interface shows brief biographical information and publishing history about each author. Instead of gathering this data manually and entering it directly into the data set, we want to explore creating a process that connects external information about these authors with this data by consuming LOD.
- expanding the Shenandoah data set by including relationship information identified by archival research
Use case: literary networks are influenced by friendships and social contacts. Publishing decisions are often made by brokers (e.g., Ezra Pound), whose influence is not formally represented in the index data. There’s a complex challenge in figuring out how to represent this information.
- expanding the data set with data of other literary journals in order to create a broader data set of authors publishing in the mid-century
Use case: While the data set starts with one specific literary journal (Shenandoah), it’s the connections among authors and editors publishing in a larger set of journals during the same time period that are more interesting. Authors do not simply write for one publication. Ultimately, the literary networks project will create a data set of authors (and editors) who published in mid-century literary journals.
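To make the first two stages a little more concrete, here is a minimal sketch in Python of how rows from an index spreadsheet might become both an editor-author edge list (for a tool like Gephi or Palladio) and a handful of RDF triples in N-Triples syntax. Everything specific in it is an assumption for illustration: the column names (`author`, `title`, `editor`, `volume`, `issue`), the placeholder rows, the `http://example.org/shenandoah/` namespace, and the `editedBy` property are all hypothetical, since the actual fields and URIs for the Shenandoah index haven’t been decided. Only the Dublin Core `title` and `creator` properties are existing vocabulary terms.

```python
import csv
import io
import sys
from collections import Counter
from urllib.parse import quote

# Hypothetical placeholder rows; the real spreadsheet's columns may differ.
SAMPLE_CSV = """author,title,editor,volume,issue
Author A,Poem One,Editor X,1,1
Author B,Story Two,Editor X,1,1
Author A,Essay Three,Editor Y,2,1
"""

BASE = "http://example.org/shenandoah/"   # placeholder namespace (assumption)
DCTERMS = "http://purl.org/dc/terms/"     # Dublin Core terms namespace


def slug(text):
    """Turn a name into a URI-safe path segment, e.g. 'Author A' -> 'author-a'."""
    return quote(text.strip().lower().replace(" ", "-"))


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def editor_author_edges(rows):
    """Count how many indexed pieces connect each (editor, author) pair."""
    return Counter((row["editor"], row["author"]) for row in rows)


def to_ntriples(rows):
    """Emit three N-Triples statements per index entry: title, creator, editor."""
    lines = []
    for n, row in enumerate(rows, start=1):
        work = f"<{BASE}work/{n}>"
        author = f"<{BASE}person/{slug(row['author'])}>"
        editor = f"<{BASE}person/{slug(row['editor'])}>"
        # Escape backslashes and double quotes inside the literal.
        title = row["title"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{work} <{DCTERMS}title> "{title}" .')
        lines.append(f"{work} <{DCTERMS}creator> {author} .")
        # 'editedBy' is a made-up property in the placeholder namespace.
        lines.append(f"{work} <{BASE}vocab/editedBy> {editor} .")
    return lines


rows = read_rows(SAMPLE_CSV)
edges = editor_author_edges(rows)
triples = to_ntriples(rows)

# Edge list as CSV, one row per editor-author pair with a weight.
writer = csv.writer(sys.stdout)
writer.writerow(["Source", "Target", "Weight"])
for (editor, author), weight in sorted(edges.items()):
    writer.writerow([editor, author, weight])

# The same rows expressed as triples.
print("\n".join(triples))
```

The point of the sketch is that the same spreadsheet rows feed both outputs: the edge list serves the network-graph use case directly, while the triples are the kind of statements a triple store would hold. A real version would need stable identifiers for people (ideally ones that match external authority data) rather than URIs derived from name strings.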
By starting with the Shenandoah data set, we gain experience with the process of using LOD. What we learn through this initiative can be applied to other projects, particularly those with biographical data.
Process of the LOD working group
As a group we will meet once a month. We will use a Slack channel in the W&L DHAT team for communication.