Semester breaks are for rest and relaxation, but they’re also a great time to learn new skills through workshops and other opportunities. Such was the case with the Digital Humanities
This intensive workshop met from 9 to 4, Monday through Friday. Each day focused on a different activity, including visualizations, optical character recognition, and social media analysis. For the actual coding, we used Python, one of the more accessible programming languages out there, through the open-source Anaconda distribution. To help concretize what we
One of the techniques that particularly excited me was machine reading, known as Optical Character Recognition. As I’ve mentioned in previous posts, I would like to expand on my work with the Roswell Museum and its WPA documents
As an example, let’s take a look at the checklist above for an exhibition of selected textile patterns shown in October 1938. To get the computer to recognize the individual characters, you need to heighten the contrast between the text and the page as much as possible. To do this, you first have the Python program convert the photograph to grayscale, and then convert it further to black and white. Along the way, you can apply filters that correct for shadows and other imperfections that obscure the text. This is especially useful for photographs taken on smartphones, as this checklist was.
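The post doesn’t show the workshop’s code, so here is a minimal, dependency-free sketch of the two conversions plus one common way to compensate for shadows: adaptive (local) thresholding. It is an illustration under stated assumptions, not the workshop’s program: images are plain nested lists of pixel values rather than photo files, and the function names `to_grayscale`, `binarize`, and `adaptive_binarize` are hypothetical.

```python
# Minimal, dependency-free sketch of the preprocessing steps described above.
# Images are represented as nested lists (rows of pixels); real projects would
# normally load photos with a library such as Pillow or OpenCV instead.

def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to 0-255 gray values using ITU-R 601-2 luma weights."""
    return [[(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
            for row in rgb_rows]


def binarize(gray_rows, threshold=128):
    """Global threshold: pixels at or above `threshold` become white (255), the rest black (0)."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray_rows]


def adaptive_binarize(gray_rows, radius=1, offset=10):
    """Local (adaptive) threshold: compare each pixel with the mean of its neighborhood.

    A pixel turns black only if it is noticeably darker (by `offset`) than its
    surroundings, so a uniformly shadowed patch of paper stays white.
    """
    height, width = len(gray_rows), len(gray_rows[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [gray_rows[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            local_mean = sum(window) / len(window)
            row.append(0 if gray_rows[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


# Usage: a dark "ink" pixel on light paper, and a patch of paper in shadow.
page = [[200, 200, 200], [200, 40, 200], [200, 200, 200]]
shadow = [[90, 90, 90], [90, 90, 90], [90, 90, 90]]
print(adaptive_binarize(page))    # ink pixel -> 0, paper -> 255
print(binarize(shadow))           # global threshold turns the whole shadow black
print(adaptive_binarize(shadow))  # adaptive threshold keeps it white
```

In practice, Pillow’s `Image.convert("L")` handles the grayscale step and OpenCV’s `cv2.adaptiveThreshold` offers a fast version of the local threshold; the pure-Python loops here only make the logic visible.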
Once you’ve got your black and white image, you can have the computer read it. Here’s the transcription that I generated for the checklist:
CALIFORNIA PRIFT!D TExtires
Received Cctoper 4, 1938
Yaxichilan on Linen
Costa Rica |
History of Music Hall
Figures with Still Life
From & Dorming Garden
While there are a lot of typos in this document, it’s reading at more than 50% accuracy, which is great for character recognition. Since computers can’t intuit (yet at least, take what you will from that), they process things exactly as they see them. When it comes to historical documents, then, it’s a given that you’re going to have to clean up the text, but the rough transcription is there. Once I had the program working, it only took a few seconds to generate this transcription, something that would have taken me a couple of minutes, at least. Multiply that by the hundreds or even thousands of written pages in an archive, and you’re saving yourself a substantial number of hours. Transcribing an archive will still take a long time, but Optical Character Recognition makes it a lot more manageable.
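The post doesn’t say how the accuracy figure was measured. One common approach, sketched here as an assumption, is character accuracy: one minus the Levenshtein edit distance between the OCR output and a hand-corrected transcription, divided by the length of the correction. The example compares the line “Received Cctoper 4, 1938” with a correction based on the October 1938 date mentioned earlier; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, reference: str) -> float:
    """1 - character error rate, clamped at 0 (errors counted against the reference length)."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - edit_distance(ocr_text, reference) / len(reference))


ocr_line = "Received Cctoper 4, 1938"
corrected = "Received October 4, 1938"   # hand correction, assumed for illustration
print(edit_distance(ocr_line, corrected))                 # 2 substitutions: C->O, p->b
print(round(character_accuracy(ocr_line, corrected), 3))  # 0.917
```

Scores for single lines vary widely, so averaging over a whole page (or a sample of pages) gives a fairer estimate of how much cleanup an archive will need.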
We also explored visualizations such as graphs and charts. I generated the bar graph below from a sample dataset, in this case opinions about commas. Beyond presenting the information itself, we learned how to change the color palette, here to match William and Mary’s Pantone colors.
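The post doesn’t name the plotting library used in the workshop (matplotlib is the usual choice in Python). As a dependency-free sketch of the same idea, the function below writes a bar chart as SVG and takes the palette as a parameter, so swapping colors restyles the whole chart. The comma-opinion counts, the hex values standing in for William and Mary’s green and gold, and the function name `bar_chart_svg` are all hypothetical placeholders.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, palette, bar_width=60, gap=20, plot_height=200):
    """Return an SVG bar chart as a string; bar colors cycle through `palette`."""
    max_value = max(data.values())
    width = gap + len(data) * (bar_width + gap)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{plot_height + 30}">']
    for i, (label, value) in enumerate(data.items()):
        bar_height = value / max_value * plot_height if max_value else 0
        x = gap + i * (bar_width + gap)
        color = palette[i % len(palette)]  # swap the palette to restyle the whole chart
        parts.append(f'<rect x="{x}" y="{plot_height - bar_height:.1f}" '
                     f'width="{bar_width}" height="{bar_height:.1f}" fill="{color}"/>')
        parts.append(f'<text x="{x + bar_width / 2}" y="{plot_height + 20}" '
                     f'text-anchor="middle" font-size="12">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical placeholder counts, not the workshop's sample data.
comma_opinions = {"Pro Oxford comma": 12, "Anti Oxford comma": 7, "No opinion": 3}
# Assumed hex values for William & Mary green and gold; check the official brand guide.
wm_palette = ["#115740", "#B9975B"]

svg = bar_chart_svg(comma_opinions, wm_palette)
```

Saving `svg` to a `.svg` file lets any web browser display it; with matplotlib, the equivalent step would be passing the palette to the `color` argument of `plt.bar`.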
I’ve made graphs in Excel before, but Python lets you create a much wider array of visualizations. One we all particularly liked was the heat map below, in which more saturated colors correspond to higher values. By changing the values in the data’s columns, we could change the layout to emphasize different sets of information.
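Plotting libraries such as matplotlib (`imshow`) or seaborn (`heatmap`) handle heat maps for you, but the core idea is a color scale: each number is placed between the grid’s minimum and maximum and mapped to a color that gets more saturated as the value rises. This sketch blends linearly from white to a single base color; the base hex value (a stand-in for William and Mary green), the sample grid, and the function names are hypothetical.

```python
def hex_to_rgb(color):
    """'#RRGGBB' -> (r, g, b) integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def heat_color(value, low, high, base="#115740"):
    """Blend linearly from white (value == low) to `base` (value == high)."""
    t = 0.0 if high == low else (value - low) / (high - low)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values

    def blend(channel):
        return round(255 + (channel - 255) * t)

    r, g, b = hex_to_rgb(base)
    return "#{:02x}{:02x}{:02x}".format(blend(r), blend(g), blend(b))


def heat_map_colors(grid, base="#115740"):
    """Color every cell of a 2-D grid relative to the grid's own min and max."""
    values = [v for row in grid for v in row]
    low, high = min(values), max(values)
    return [[heat_color(v, low, high, base) for v in row] for row in grid]


# Hypothetical counts, e.g. rows = survey questions, columns = answer choices.
counts = [[0, 4, 10], [2, 10, 6]]
for row in heat_map_colors(counts):
    print(row)
```

Because colors are scaled to the grid’s own range, editing any column’s values can shift which cells stand out, which is one way the layout changes as the data changes.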
The last day explored social media scraping and mapping. For scholars working on contemporary topics, Twitter in particular is a vast repository of information and opinions, but it can be overwhelming to process by hand. Python, however, lets you write programs that collect and organize tweets. While I don’t foresee using this technique much myself, it’s still good to know about.
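The post doesn’t show the workshop’s scraping code, and collecting tweets depends on API access not covered here, so this sketch starts from tweets already saved as plain strings. It organizes them by hashtag using a regular expression and `collections.Counter`; the sample tweets and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Lower-cased hashtags in a tweet, without duplicates, in order of appearance."""
    return list(dict.fromkeys(tag.lower() for tag in HASHTAG.findall(text)))


def group_by_hashtag(tweets):
    """Map each hashtag to the tweets that use it."""
    groups = defaultdict(list)
    for text in tweets:
        for tag in hashtags(text):
            groups[tag].append(text)
    return dict(groups)


def top_hashtags(tweets, n=3):
    """The n most frequent hashtags, counting each tweet at most once per tag."""
    return Counter(tag for text in tweets for tag in hashtags(text)).most_common(n)


# Hypothetical tweets; collecting real ones requires API access not shown here.
sample = [
    "Spent the day with #WPA murals in #Roswell",
    "New Deal art everywhere #wpa #NewMexico",
    "Day 5 of the #DH workshop",
]
print(top_hashtags(sample, 2))
```

Lower-casing the tags means `#WPA` and `#wpa` land in the same group, which is usually what you want when sorting opinions by topic.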
I was more interested in the mapping exercise. Using Python, we could not only mark specific locations with pins but also add captions, descriptions, photographs, and other information. I ended up making a test map featuring community art centers in New Mexico.
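The post doesn’t name the mapping library (folium is a common choice in Python). As a library-neutral sketch, this builds GeoJSON, the open format defined in RFC 7946, where each pin is a Point feature and captions, descriptions, and photo links live in its `properties`; folium’s `GeoJson` layer and many web-map tools can load files like this. The coordinates are approximate, and the photo URL, caption text, and function names are hypothetical placeholders.

```python
import json


def art_center(name, lat, lon, description="", photo_url=None):
    """Build one GeoJSON Point feature; popups/captions come from `properties`."""
    properties = {"name": name, "description": description}
    if photo_url:
        properties["photo"] = photo_url
    return {
        "type": "Feature",
        # GeoJSON (RFC 7946) stores coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features):
    """Wrap features in a FeatureCollection, the usual top-level GeoJSON object."""
    return {"type": "FeatureCollection", "features": list(features)}


centers = feature_collection([
    art_center(
        "Roswell Museum",
        33.39, -104.52,  # approximate coordinates for Roswell, NM
        description="Caption text goes here",         # placeholder caption
        photo_url="https://example.org/roswell.jpg",  # placeholder URL
    ),
])
geojson_text = json.dumps(centers, indent=2)
```

Keeping the data in a plain file like this means new centers, dates, and photos can be appended as research turns them up, independent of whichever library draws the map.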
What excites me about this program is that it can develop alongside my research. As I find out about new art centers, I can add photographs, dates, and other information. Beyond simply plotting locations, I learned that I could also use Python to draw the travel routes of different exhibitions and begin mapping the networks these art centers shared with one another. Python can help me expand my focus beyond Roswell and start to incorporate that research into a broader national framework.
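One way to start on those networks, sketched here under stated assumptions, is to record each traveling exhibition’s itinerary as an ordered list of venues, treat consecutive stops as links between venues, and count how often each pair is linked. The exhibition titles, venue names, and function names below are hypothetical, and links are counted as undirected.

```python
from collections import Counter


def venue_links(itineraries):
    """Count how often each pair of venues appears as consecutive stops on a touring exhibition."""
    links = Counter()
    for stops in itineraries.values():
        for a, b in zip(stops, stops[1:]):
            if a != b:
                links[tuple(sorted((a, b)))] += 1  # undirected: A-B same as B-A
    return links


def venue_degree(links):
    """Number of distinct venues each venue is directly linked to."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical itineraries: exhibition -> ordered list of venues.
itineraries = {
    "Exhibition 1": ["Center A", "Center B", "Center C"],
    "Exhibition 2": ["Center B", "Center A"],
}
links = venue_links(itineraries)
print(links)
print(venue_degree(links))
```

The same ordered itineraries could also become GeoJSON `LineString` features to draw the routes on a map, and a graph library such as NetworkX can take these pairs as edges for layouts or centrality measures.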
I’ll be the first to admit that I’m not a programmer, and my knowledge of Python remains fairly rudimentary. Fortunately, William and Mary
In short, I had a terrific experience. I not only learned a lot, but I also now have access to resources that will let me continue to grow and develop my DH skills.