Computing for the Humanities - Sara Woodbury

Semester breaks are for rest and relaxation, but they’re also a great time to learn new skills through workshops and other opportunities. Such was the case with the Digital Humanities bootcamp I took last week, which I’ll be talking about today.

Image courtesy of https://www.anaconda.com/.

This intensive workshop met from 9 to 4, Monday through Friday. Each day focused on a different activity, including visualizations, optical character recognition, and social media analysis. For the actual coding, we used Python, one of the more accessible programming languages out there, installed through Anaconda, an open-source Python distribution. To help concretize what we were learning, we used examples from our own research, which helped each of us see how these methods could benefit our work.

A simple test code, in this case producing a list of words.

One of the techniques that particularly excited me was machine reading, known as Optical Character Recognition (OCR). As I’ve mentioned in previous posts, I would like to expand on my work with the Roswell Museum and its WPA documents as part of my ongoing research with community art centers. In addition to scanning and uploading every scrap of paper from that archive, we would need to produce a transcription for each text so that readers would not have to rely exclusively on the original, faded documents for content. As you can imagine, typing out a transcription for every document would take years for a human like myself, but a Python program can enable a computer to read and produce a preliminary transcription in a fraction of the time.

As an example, let’s take a look at the checklist above for an exhibition selection of textile patterns shown in October 1938. To get the computer to recognize and register the individual characters, you have to heighten the contrast between the text and the page as much as possible. To do this, you first have the Python program convert the photograph to grayscale. Once you’ve done this, you further convert it to black and white. While you’re doing this, you can add filters that help adjust for shadows and other imperfections that interfere with the clarity of the text. This is especially useful for photographs taken on smartphones, as was the case with this checklist.

The actual code for turning the photograph into a black-and-white image.
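For anyone who wants to try the idea in miniature, here is a simplified sketch of the two steps. It is not the code from the workshop: instead of a real photograph it uses a tiny, made-up grid of (R, G, B) pixels, and the function names and the threshold of 128 are my own assumptions. A real project would more likely use an imaging library and a smarter, shadow-aware threshold.

```python
# A simplified sketch, not the workshop's code. Pixels are plain (R, G, B)
# tuples in a nested list rather than a real photograph.

def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples to rows of 0-255 gray values."""
    # Common luma weights: green counts the most, blue the least.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in pixels
    ]

def to_black_and_white(gray, threshold=128):
    """Turn every gray value into pure black (0) or pure white (255)."""
    # The threshold is an assumed cut-off; lowering it keeps more of a
    # shadowed page white, raising it turns faint ink black.
    return [[255 if value >= threshold else 0 for value in row] for row in gray]

# A made-up "photo": dark ink down the middle of a yellowed page,
# with a slightly shadowed pixel in the lower-left corner.
photo = [
    [(230, 220, 190), (40, 35, 30), (225, 215, 185)],
    [(150, 140, 120), (35, 30, 25), (220, 210, 180)],
]

gray = to_grayscale(photo)
bw = to_black_and_white(gray, threshold=128)

for row in bw:
    # Print "#" for ink and "." for page so the result is easy to eyeball.
    print("".join("#" if value == 0 else "." for value in row))
```

The single fixed threshold is the crudest possible choice. It works here because the shadowed pixel is still lighter than the ink, but an unevenly lit smartphone photo usually needs a threshold that adapts to each region of the page.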

Once you’ve got your black-and-white image, you can have the computer read it. Here’s the transcription that I generated for the checklist:

CALIFORNIA PRIFT!D TExtires

Received Cctoper 4, 1938

Yaxichilan on Linen
Costa Rica |
History of Music Hall
Guatamalna –

Playboy |

Figures with Still Life

Family Print

Carnival
Kinston
From & Dorming Garden

Kitchen Print

ixhibited October 10

While there are a lot of typos in this document, it’s reading at more than 50% accuracy, which is great for character recognition. Since computers can’t intuit (yet at least, take what you will from that), they process things exactly as they see them. When it comes to historical documents, then, it’s a given that you’re going to have to clean up the text, but the rough transcription is there. Once I had the program working, it only took a few seconds to generate this transcription, something that would have taken me a couple of minutes, at least. Multiply that by the hundreds or even thousands of written pages in an archive, and you’re saving yourself a substantial number of hours. Transcribing an archive will still take a long time, but Optical Character Recognition makes it a lot more manageable.
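If you want a rough number for how close a machine transcription is, Python's built-in difflib module can compare it with a corrected version. The sketch below is my own illustration rather than something from the workshop. The corrected line is my reading of the original document, and the score is a general similarity ratio, not a formal character error rate.

```python
from difflib import SequenceMatcher

def similarity(ocr_text, corrected_text):
    """Return a score from 0 to 1 for how closely two strings match."""
    return SequenceMatcher(None, ocr_text, corrected_text).ratio()

# One line from the machine transcription, and my own corrected reading of it.
ocr_line = "Received Cctoper 4, 1938"
corrected_line = "Received October 4, 1938"

score = similarity(ocr_line, corrected_line)
print(f"{score:.0%} similar")
```

Running the same comparison over every line of a page, once you have cleaned it up by hand, gives a quick sense of which documents the computer handles well and which need more attention.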

We also explored visualizations such as graphs and charts. I generated the bar graph below using a sample data sheet of opinions about commas. In addition to presenting the information itself, we learned how to change the palette, here to correspond with William and Mary’s Pantone colors.

I’ve made graphs before in Excel, but Python programs allow you to create a much wider array of visual explorations. A particularly exciting one we all liked was the heat map below, with more saturated colors corresponding to higher concentrations of numbers. By altering the data within the various columns of the program, we could change the layout to emphasize different sets of information.
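The core idea behind a heat map, that bigger numbers get more saturated colors, can be sketched with nothing but Python's standard library. This is my own simplified illustration, not the plotting code we used: the numbers are made up, the function name is hypothetical, and a plotting library would normally handle all of this for you.

```python
import colorsys

def heat_colors(grid, hue=0.0):
    """Map each number to a hex color whose saturation grows with its value."""
    flat = [value for row in grid for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1  # avoid dividing by zero if every value is equal
    colors = []
    for row in grid:
        color_row = []
        for value in row:
            saturation = (value - low) / span  # 0 = palest, 1 = most vivid
            r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
            color_row.append(
                "#{:02x}{:02x}{:02x}".format(
                    round(r * 255), round(g * 255), round(b * 255)
                )
            )
        colors.append(color_row)
    return colors

# Made-up counts: the smallest value comes out white, the largest pure red.
counts = [[0, 5], [10, 20]]
colors = heat_colors(counts)
for row in colors:
    print(row)
```

Changing the hue argument (a number between 0 and 1) swaps the red for another color, which is roughly what choosing a different palette does.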

The last day explored social media scraping and mapping. For scholars working on contemporary topics, Twitter, in particular, is a vast repository of information and opinions but can be overwhelming to process alone. Python, however, enables you to create programs that organize tweets. While I personally don’t foresee myself using this program that much, it’s still good to know about.

The code for plotting pop-up pins on a map. I found this exercise particularly useful, as it could be an effective way to plot out different art centers.

I was more interested in the mapping exercise. Using Python, we were able to not only map specific locations and mark them with pins, but also add captions, descriptions, photographs, and other information. I ended up making a test map featuring community art centers in New Mexico:

This simple pop-up map shows only minimal information, as I wanted to see if I could program it. In the long term, I’d like to create a map that not only shows all the known art center sites, but includes more detailed information about their histories and their collaborative networks.
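To give a sense of what the data behind such a map can look like, here is a minimal sketch that stores pins as GeoJSON, a common format that many mapping tools can read. It is not the workshop's code and does not draw the map itself. The function names are hypothetical, and the Roswell coordinates are approximate, city-level values I've added for illustration.

```python
import json

def make_pin(name, lat, lon, description=""):
    """Build one GeoJSON point; the properties can feed a pin's pop-up text."""
    return {
        "type": "Feature",
        # GeoJSON lists longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "description": description},
    }

def make_map_data(pins):
    """Bundle a list of pins into a single GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(pins)}

# Approximate coordinates for Roswell, New Mexico (illustrative only).
centers = [
    make_pin(
        "Roswell Museum",
        33.39,
        -104.52,
        "Community art center in Roswell, New Mexico",
    ),
]

geojson_text = json.dumps(make_map_data(centers), indent=2)
print(geojson_text)
```

Because each pin is just a small dictionary, adding a new art center, or extra fields such as dates and photograph links, means appending one more entry to the list.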

What excites me about this program is that it can develop alongside my research. As I find out about new art centers, I can add photographs, dates, and other information. Beyond simply plotting locations, I learned I can also potentially use Python to draw the travel routes of different exhibitions and begin plotting the networks these art centers shared with one another. Python can help me expand my focus beyond Roswell and start to incorporate that research into a broader national framework.

I’ll be the first to admit that I’m not a programmer, and my knowledge of Python remains fairly rudimentary. Fortunately, William and Mary has a computer science department that is more than eager to collaborate, so I’ll have plenty of help as I continue to expand and develop my skills. If I’ve learned anything about Digital Humanities, it’s that collaboration is key to every project, so I’m happy to know that there are plenty of other students who would be able to help me.

In short, I had a terrific experience. I not only learned a lot of new things, but also have access to resources that will let me continue to grow and develop my DH skills.
