
Methodology of a visualization

Introduction

Visual representations of data offer a quick way to express a lot of information. As the old adage goes, a picture is worth a thousand words. One of the facets of digital humanities research is providing information in the form of visuals: graphs, maps, charts, etc.

I was already writing up some notes on a visualization I was creating for the dissertation when I read this excellent blog post by Fred Gibbs (a version of a presentation at the AHA 2015). In this essay, I think Fred accurately identifies the digital humanities field as one that needs to step up to the next level. It is no longer enough to present visuals as humanities research; it is time to start critiquing what is presented, and for researchers to explicitly explain the choices that went into creating those visualizations.

With those thoughts in mind, I present the methodology, the decisions, and the visualization of over 200 deaths at the KZ Porta Westfalica-Barkhausen during a one-year period.

A change is happening (at least for me) in how data is analyzed. I have a spreadsheet of over 200 deaths, with various information: death date, location, nationality, etc. The desire to create a visualization came from wanting to understand the data and see the commonalities and differences. The first question I had was how many nationalities are represented, and which countries. The second question was how the deaths are distributed by month.

The following is how I came to a visualization that answers the first question.

Data Compilation

Data is taken from two locations and merged.

  • The first set of data is a large spreadsheet obtained from the KZ Neuengamme Archiv containing all of their data on the prisoners who died and who were at KZ Neuengamme or one of its satellite camps. This file contains 23,393 individuals.
  • The second data set is another set of files from the KZ Neuengamme Archiv, but is derived from a list compiled by French authorities. It is available online at: http://www.bddm.org/liv/index_liv.php. The files were split into three sections listing the dead from Barkhausen, Porta Westfalica, and Lerbeck. These files contained a total of 177 individuals.

Combining just those individuals from both data sets who were in a Porta Westfalica KZ left around 280 individuals.
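As a rough illustration of that merge (not the exact steps I followed), the sketch below assumes the two exports were saved as CSV files; the file names and column names are hypothetical.

```python
import pandas as pd

# Full list of deaths from the KZ Neuengamme Archiv (23,393 individuals).
neuengamme = pd.read_csv("neuengamme_deaths.csv")

# The three French-derived files (177 individuals total).
french = pd.concat([
    pd.read_csv("barkhausen.csv"),
    pd.read_csv("porta_westfalica.csv"),
    pd.read_csv("lerbeck.csv"),
], ignore_index=True)

# Keep only the Neuengamme records that mention one of the Porta Westfalica camps.
porta = neuengamme[neuengamme["camps"].str.contains("Porta Westfalica", na=False)]

# Stack the two sets and drop obvious duplicates (same name and birth date).
combined = pd.concat([porta, french], ignore_index=True)
combined = combined.drop_duplicates(subset=["last_name", "first_name", "birth_date"])

print(len(combined))  # roughly 280 individuals
```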

Data Cleaning

A number of steps were needed in order to get useful information from the data.

  • First of all, the data from the French archive was highly abbreviated. For example, the column containing the locations of internment held two- or three-letter abbreviations of location names. Elie Barioz, for example, had the locations “Wil, Ng (Po, Bar)”, which translate to “Wilhelmshaven, Neuengamme (Porta Westfalica, Porta Westfalica-Barkhausen)”.
    • The process of translating the abbreviations was quite labor intensive. First, I had to search on the French site for an individual: http://www.bddm.org/liv/recherche.php
    • Search for ‘Barioz’. [image: searching the site] Note: The Chrome web browser can automatically translate the pages on this site.
    • The correct individual can be determined by comparing the full name and the birth date. The citation to the location in the book is a hyperlink to that record (e.g., Part III, No. 14 list (III.14.)). [image: list of matches]
    • The abbreviations for this individual’s internment locations are hyperlinks to more information, part of which is the full name of the location. Clicking on ‘Wil’ opens a pop-up window describing the KZ at Wilhelmshaven and information about the city.
      [image: location pop-up window]
    • After determining that ‘Wil’ meant ‘Wilhelmshaven’, all occurrences of ‘Wil’ in that column can be changed to ‘Wilhelmshaven’. This process is repeated until all of the abbreviations have been translated (a script sketch for doing these replacements in bulk follows this list).
  • Remove extraneous asterisks. It was quite frustrating to note that the French site did not include information on what the asterisks and other odd symbols mean. (Another odd notation is the numbers in parentheses after the birth location.) I simply deleted the asterisks, losing any possible meaning they might have had.
  • Combine duplicates. Keep as much information from both records as possible.
  • Fix dates. They should all be in the same format. This is tricky, in that Europe keeps dates in the format DD-MM-YYYY. For clarity’s sake, it would be best to use “Month DD, YYYY”. I left them as is for now. Editing 280 dates is not fun…
  • Fix nationality. The Tableau software references current nations, while the data in the spreadsheets uses the nations that existed when the records were created. For example, some individuals were noted with the nationality ‘Soviet Union (Ukraine)’. These needed to be brought to the present as ‘Ukraine’. More problematic were the individuals from ‘Czechoslovakia’. Presently, there are the Czech Republic and Slovakia, and the question is which present-day nationality to pick. There is a column for birth place that could potentially solve the issue, but this field only records where the individual was born, as seen in the case of Jan Siminski. He was born in the Polish town of Obersitz (the German name), so his birth place cannot clarify whether his nationality should be Czech or Slovak.
  • This brings up another issue: the translation of place names. City names in German, especially during the Third Reich, are different from the current German names for the cities, which are different from the English names, which are different from what each nation calls the city. I need to standardize the names, probably on the English versions. Tableau seemed to have no problem with the local city names or the German versions, so I left them as is.
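Much of this cleaning is repetitive find-and-replace work, so a short script can do the bulk of it once the abbreviations and nationality mappings have been worked out by hand. The sketch below is only illustrative: the file name, the column names, and the partial abbreviation dictionary are assumptions, not my actual cleaning script.

```python
import pandas as pd

df = pd.read_csv("porta_westfalica_combined.csv")

# Expand the French location abbreviations identified on bddm.org.
# This dictionary is deliberately partial; it only covers the examples above.
abbreviations = {
    "Wil": "Wilhelmshaven",
    "Ng": "Neuengamme",
    "Po": "Porta Westfalica",
    "Bar": "Porta Westfalica-Barkhausen",
}
for abbr, full in abbreviations.items():
    df["internment_locations"] = df["internment_locations"].str.replace(
        rf"\b{abbr}\b", full, regex=True
    )

# Strip the unexplained asterisks.
df["internment_locations"] = df["internment_locations"].str.replace("*", "", regex=False)

# Bring historical nationalities up to the present where the mapping is unambiguous.
df["nationality"] = df["nationality"].replace({"Soviet Union (Ukraine)": "Ukraine"})

# Parse the European day-month-year dates and write them out as "Month DD, YYYY".
df["death_date"] = pd.to_datetime(df["death_date"], dayfirst=True).dt.strftime("%B %d, %Y")

df.to_csv("porta_westfalica_cleaned.csv", index=False)
```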


Choosing a Tool

I used the free program Tableau Public: http://www.tableau.com/

It allows for very quick visuals and a very easy process. The website has a number of free tutorials to get you started: http://www.tableau.com/learn/training

Map

The first visualization I wanted to make was a map showing where the prisoners were from: their nationality. The map would also show the number of prisoners from each country. (This is not a tutorial on how to use Tableau, but a walk-through of the pertinent choices I made to make sense of the data; it is methodology, not tech support. 🙂 )

Using the default settings (basically, just double clicking on the Nationality field to create the map) results in a dot on each country represented in the data.

[image: map with a dot on each country]
This can be transformed into a polygon highlight of each country by selecting a “Filled Map”.

[image: filled map]
The next step was to apply shading to the filled map: the larger the number of prisoners who died from that country, the darker the fill color.
[image: shaded filled map]
The default color was shades of green. I wanted a duller color to fit the theme of the visualization, “death”. I picked a light-orange-to-brown default gradient, separated into 13 steps (there are 13 countries represented).

Table

While just a filled map with gradient-colored countries is helpful, the information would be more complete, and more fully understandable, with a legend. This can be created using a plain table listing the countries and the number of dead from each country. Each row is color-coordinated with the map by using the same color scheme and number of steps as the map.

[image: table of countries and number of dead]
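Tableau calculates these counts itself, but as a quick sanity check outside of Tableau (and to confirm that 13 color steps is the right number), something like the following works, again assuming the hypothetical cleaned file and column names from the earlier sketches.

```python
import pandas as pd

df = pd.read_csv("porta_westfalica_cleaned.csv")

# Number of dead per country, sorted from most to fewest.
counts = df["nationality"].value_counts()
print(counts)

# The number of distinct countries determines the number of color steps.
print(len(counts))  # 13
```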


Dashboard

In Tableau, you create a dashboard to combine the different worksheets: maps, tables, graphs, etc. In this case, a full-page map with the table overlaid completes the visualization.

Result

The result is a very simple map, created in about ten minutes (after a few video tutorials to refresh my memory on how to create the effects I wanted).

(See a fully functioning result below this image.)
[image: final result]


Benefits of Tableau

Tableau has some limitations. The results are hosted on their servers, which carries the potential for lock-in. The code and applications are proprietary and closed source.

But there are many benefits. The default visualizations look great. It is very easy to create simple and powerful visualizations. The product is capable of producing very sophisticated statistical representations, and it can make use of the free and open-source stats program R. The visualizations are embeddable in any website using JavaScript.

The biggest benefit of using Tableau is the automatic link back to the original data source. I think the most needed shift in the humanities (particularly the history profession), and the biggest benefit of “digital” capabilities for the humanities, is the ability to link to the source material. This makes it infinitely easier for readers and other scholars to follow the source trail in order to provide better and more accurate feedback (read: critique and support).

To see the underlying data in this visualization, click on a country in the map or the table. A pop-up window appears with minimal data.

[image: pop-up with minimal data]

Click on the “View Data” icon.

[image: View Data icon]

Select the “Underlying” tab and check the “Show all columns” box. Voilà!

[image: underlying data table]

Behold the intoxicating power of being able to view the underlying data for a visualization!

Digital Humanities Improvement Idea

Imagine, if you will, the typical journal article or book, with footnotes or endnotes referencing some primary document or a page in another book or article. With digital media, that footnote turns into a hyperlink: a link to a digital copy of the primary document at the archive’s site, or at the author’s own personal archive site. Or it links to a Google Books page with the relevant page of the book or journal displayed. Now you have the whole document, or at least a whole page of text, to provide appropriate context to the citation.

Way too often I have been met with a dead end in following citations, especially references to documents in an archive. It does not happen often, but archives change catalog formats, documents move within an archive or are no longer available to researchers, etc. It would be so much easier to have a link to what some researcher has already spent time finding. Let’s build on each other’s shoulders, rather than make each scholar waste time redoing archival research that has already been done.

I think it is incumbent upon all researchers to provide more than a dead-text citation to their sources. In this digital age, it is becoming more and more trivial to set up a repository of the sources used in research, and the skills needed to provide a link to an item in a repository are less and less demanding. Here are some ideas on how to accomplish this already:

  • Set up a free, hosted version of Omeka at http://omeka.net. Add all of your source material to Omeka. Provide a link to the document in Omeka along with your citation in the footnote or endnote.
  • Create a free WordPress account at http://wordpress.com. Add a post for each source document. Provide a link to that post in your citation.
  • Most universities have a free faculty or student web hosting environment (something like http://univ.edu/~usrname/). Dump all of your digital copies of your documents in that space (nicely organized in descriptive folders and with descriptive file names; no spaces in the names, of course). Now provide a link to that resource in your citation.
  • Set up a free Zotero account at http://zotero.org. Set up a Group Library as Public and publish all of your sources to this library.

I intend to take my own advice. I already have an Omeka repository set up, with a few resources in it: the NaziTunnels Document Repository. Once I start publishing the text of my dissertation, there will be links in the footnotes back to the primary documents.

I would love to see this type of digital citation become as ubiquitous as the present-day dead-text citation.

I have not addressed copyright issues here. Copyright restrictions will severely limit the resources that can be used in an online sources repository, but there are certainly ways to work around this.

If hosting the sources on your own, one quick fix would be to put the digital citation sources behind a password (made available in the book or journal text). Another option might be to get permission from the archive if only low-quality reproductions are offered.

End

Let me know if you find the live-text or digital citation idea viable. Do you have other ideas for providing a repository of your sources?

Drop me a note if you want more detail on how I created the map in Tableau. I’m by no means proficient, nor in any way technical support for Tableau, but I’ll do what I can to guide and advise.

The archive is live

Part of my dissertation is to create an online archive of the documents I find. Thanks to the Hist 698 Digital History Techne class I had with Fred Gibbs this semester, the technical work of this part of the dissertation is now done. I used Omeka with the Scripto plugin (which is really a bridge to a MediaWiki installation) for the archive, and an Exhibit from MIT’s Simile project for a quick and dirty display of the data and a map plotting several of the tunnel locations.

Also part of the course is giving a brief presentation about the final project, which is taken from this post.

Goals

I had two goals for this course.

  1. Create a quick and easy way to display the location of and information about some of the tunnel sites using a selection of documents.
  2. Create an online archive that would allow me and others to transcribe and translate the documents.

Part 1

My plan was to use the Exhibit tool to complete the first goal. Setup was a bit more difficult than planned. I had an Exhibit working for a different project, and was finally able to massage the data into a copy of that code and integrate it into the website using a WordPress template.

Map showing the location of tunnel projects in the A and B groups.

This allowed me to display the data in three different views. First is the map, as seen above. I was able to show the tunnels in the two different groups identified in the documents. The A projects were existing tunnels, caves, or mines that were to be retrofitted and improved before factories could be moved in. B projects were to be completely new underground spaces.

The Exhibit also has a table view, showing all of the items with select information for easy comparison or information retrieval at a glance. For each view, the right-hand side provides options for filtering the data. Exhibit uses JavaScript, so all of the data is already present in the page; filters and changes are applied instantly, without page reloads or slow data retrieval from the server.

A third view shows all of the items separately, with all of the available data.

Ideally, this information would be stored in a Google Spreadsheet to make updating and adding a cinch, but I was not able to get that working, so the data is in a JSON file instead. It would also have been neat to pull the information from the archive. Perhaps that can be built later.
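In the meantime, regenerating the JSON file from a spreadsheet export keeps updates manageable. The sketch below is a rough illustration rather than the script behind the site: the file name, the column names, and the exact property names the Exhibit views are configured to read (beyond the standard top-level "items" list and the "label" display property) are assumptions.

```python
import csv
import json

# Convert a spreadsheet export of the tunnel sites into Exhibit's JSON shape:
# a top-level "items" list, where each item has at least a "label".
items = []
with open("tunnel_sites.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        items.append({
            "label": row["name"],                    # display name in Exhibit
            "type": "Tunnel",
            "group": row["group"],                   # "A" (existing space) or "B" (new construction)
            "latlng": f"{row['lat']},{row['lng']}",  # coordinate property for the map view
        })

with open("tunnels.json", "w", encoding="utf-8") as f:
    json.dump({"items": items}, f, ensure_ascii=False, indent=2)
```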

Part 2

I also set up an Omeka install to host the images I had previously digitized from the United States Holocaust Memorial Museum. I not only want an archive, but also a way to have others transcribe and translate the documents, so I installed the Scripto plugin, which depends on a MediaWiki install as well.

The ability to transcribe and translate is also an integral part of my dissertation. I want to argue, and show, that historical work cannot and should not be done alone. One way to do this is to get help from the undergraduates in the German Language program here at George Mason University. The German Language director at GMU is fully on board to have some of her upper-level students take on translation as part of their coursework. This not only helps me, but also helps them learn German by looking at interesting historical documents (and hopefully gets them interested in history), and it helps future researchers search for and find documents more easily.

Transcribing and translating made possible by Scripto and MediaWiki.

Historical Questions

This was the hardest part of the course. I’m really good at creating digital stuff because that is what I do all day, but I’m a little rusty on historical interpretation and asking questions. What also makes this hard is not yet knowing completely what data I have.

Part of the problem with coming up with good, probing questions is that I haven’t had a lot of time to look at the documents to see what is there. Also, there is not much written on this topic, so I’m kind of figuring out the story as I go. It’s a lot easier to read secondary works and ask new questions, or ask old questions in different ways. But there are no questions yet, except what happened.

The bigger questions, in light of this course, should be about how the technology we learned helps us understand the history, or helps generate new questions. Will displaying the data in different ways help me make connections and inspire ideas that I would not otherwise have made or thought of? Do the digital tools allow me to process more data than I could non-digitally?

Another stumbling block (or is it a building block? It’s all about perspective, right?) comes from my recent trip to Germany for research. While there I met with Dr. Ulrich Herbert at the University of Freiburg. He is something of an authority on slave labor, and has kept up to date on the writings regarding the underground dispersal projects. His wise suggestion for my dissertation was to focus on a single tunnel site, rather than trying to write about the organization responsible for all of the dispersal projects. Such an undertaking would take a lifetime, he said. So now I need to focus on just one tunnel, rather than all of them. Fortunately, Dr. Herbert put me in contact with the Director of the Mittelbau-Dora Concentration Camp Memorial, Dr. Jens-Christian Wagner. With his help, I may be able to find a specific tunnel to focus on and make my trip in July 2013 that much more profitable.