Author: ammon

Irony Juxtaposed

Here’s an interesting history I ran across in my research. It’s a little bit of situational irony, something that happens to everyone. This woman was able to notice it and find humor in it later in life despite all of the tragedy. This event is juxtaposed with another event that shows how interesting humanity can be. Even though she had just survived months of the most brutal displays of humanity in the concentration camps, she was able to show compassion to her “enemy”.

Györgyné (Zsuzsa) Papp was born in Budapest, Hungary, in 1921. Her family was Jewish, but not religious. She was arrested and sent to Auschwitz. Later she was selected to go to a labor camp. She was transferred between several labor camps and ended up in Salzwedel.

One morning she woke up and all of her captors were gone. It was April 1945. The Americans were advancing quickly, so the German soldiers had fled. Zsuzsa and her sister went into the town of Salzwedel to search for food. The town was nearly deserted as well. They went into stores already looted by other prisoners, then entered homes to find any food they could. In one home they found a loaf of bread on the table. As they went to get it they heard sobs from a woman who told them that was all the food she had left for herself and her four children. Even though they were starving and had been abused and mistreated for months by this woman’s nation, they took pity on her and left the bread.

Zsuzsa tells how one of the things she was most fearful of while in the concentration camps was cleaning the latrines. They were just too awful for her to contemplate. She felt so fortunate to have escaped the dreaded latrine duty in all her months in the concentration camps. As she was walking around Salzwedel she somehow fell into a ditch used for a latrine and found herself covered in waste.

A Map of KZ Porta Westfalica

I needed to get the latitude and longitude of several places for the GIS project. I used Google Maps to get the data. Just click on a point on the map and the info box shows you the lat/long.

 

While playing with this, I figured I’d make a more permanent map showing some of the important locations. That map is found here:

https://www.google.com/maps/d/edit?mid=zuMewFpePmAg.kUArQ9EReKT4

I was able to find the locations with the help of a couple of maps I found in archives.

 

The nice thing about the Google map is that it can attach photos to the points I marked (as seen in the first, featured, image).

Converting PDFs to PNGs & My Workflow

I’ve posted about combining a bunch of images into one PDF, but how about going the other way?

This site has a great tutorial for using GhostScript to convert a PDF into PNGs suitable for OCR. It does a great job explaining the different GhostScript flags and gives some tips for getting the best resolution for the PNGs. The one step it doesn’t show is how to get each page of a PDF into a separate PNG (so a 10-page PDF makes 10 PNGs).

Here’s how to do that:

In the output image name, add: %03d

This will insert an automatically incremented number, zero-padded to three digits. That means the first number will be 001, then 002, then 003, and so forth. This is really helpful in keeping the files in alphabetical and numerical order. Otherwise you’ll get a file ending in 11 coming before 2.
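To see why the padding matters, here is a small shell sketch that builds file names with and without %03d and sorts them the way a file listing would. The “page-” prefix and the page numbers are made up for the illustration; only the %03d pattern comes from the GhostScript command.

```shell
# Build a file name with a zero-padded page number, the same pattern
# GhostScript expands in the -o output name.
# "page-" is a hypothetical prefix used only for this illustration.
padded_name() {
  printf 'page-%03d.png\n' "$1"
}

# Without padding, a plain alphabetical sort puts 11 before 2.
printf 'page-%d.png\n' 1 2 11 | LC_ALL=C sort

# With padding, alphabetical order and numerical order agree.
for n in 1 2 11; do padded_name "$n"; done | LC_ALL=C sort
```

The first listing comes out as page-1.png, page-11.png, page-2.png, while the second comes out as page-001.png, page-002.png, page-011.png.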

Here is the complete command I have been using:

gs -dSAFER -sDEVICE=png16m -dINTERPOLATE -dNumRenderingThreads=8 -r300 -o Zsuzsa_Polgar-%03d.png -c 30000000 setvmthreshold -f Polgar_Zsuzsa-1574-10.03.1992.pdf

So my workflow has been like this:

1. If I have a scanned copy of files in PDF form I run the above GhostScript command. This results in a folder of PNG images.

2. I run a new watermark/OCR tool on the folder of images. It is a Ruby script which utilizes ImageMagick for creating a watermark and Tesseract for running OCR on the images. You can find this program here:

https://github.com/mossiso/cowl

This creates a folder called ‘output’ with a PDF of all the images (kind of redundant when starting with a PDF, but now the pages have the watermark on them), and two sub-folders, one with the OCR files, and one with the watermarked copies.

3. Now I can get rid of the PNGs that were created with the GhostScript command.

Now that I have each page OCRed, I can do searches on these files, where otherwise I had to read through the entire PDF page by page. For example, today I’m looking through a 40+ page PDF transcript of a survivor interview to find the parts where she talks about her experiences at the Porta Westfalica camp. While I’ll read through each page, to get a sense of where I should be looking I can now do a search on the OCRed pages to find out where the term ‘Porta’ is found.

Now I know that pages 47 and 48, at least, contain some description of her time in Porta Westfalica.
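A search like this can be done with grep. Here is a minimal sketch, assuming the OCR text files end in .txt and sit together in one folder. The folder name in the example is only a guess, so point it at wherever your OCR output ended up.

```shell
# List the OCR text files that mention a search term, ignoring case.
# Usage: find_term TERM DIRECTORY
# The directory layout is an assumption; adjust it to your own output folder.
find_term() {
  grep -il -- "$1" "$2"/*.txt | sort
}

# Example (hypothetical folder name):
# find_term Porta output/ocr
```

Because the page number is part of each file name, the list of matching files doubles as a list of pages to read first.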

Copying Files From Mac to Linux With Umlauts

I ran into an issue today where I wanted to copy some files from my laptop to the web server.

Usually, I just run the scp command like so:

scp -r /path/to/files/on/laptop/ user@server.com:/path/to/put/files/

This will copy all of the files without problems.

The problem is that there were nearly 300 files to copy, so I left the laptop to do the copy. In the meantime it went to sleep, which stopped the copy. Scp is not smart enough to copy only the files that didn’t make it across; it starts over and copies all of the nearly 300 files again.

There is a program that has this intelligence, though… rsync!

Run this command like so:

rsync -avz /path/to/files/on/laptop/ -e ssh user@server.com:/path/to/put/files/

 

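This usually works great… except when there are umlauts in the file names. Macs and Linux store accented characters differently in UTF-8: the Mac file system breaks ü into a plain u followed by a combining diaeresis, while Linux normally uses the single precomposed character, so the two sides see different file names.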

The default Mac version of rsync is woefully out of date, though, and doesn’t support an option to fix this issue.

The solution!

You’ll need to have Homebrew installed in order to update to the latest version of rsync, so install it first if you don’t have it already.

Then it’s a simple install command:

brew install rsync

And now you can do the rsync command again:

rsync -avz --iconv=UTF8-MAC,UTF-8 /path/to/files/on/laptop/ -e ssh user@server.com:/path/to/put/files/

 

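The --iconv option converts the file names between the Mac’s flavor of UTF-8 (UTF8-MAC, listed first because it is the local side) and the standard UTF-8 used on the Linux server, so both sides see the same names.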
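To make the mismatch concrete, the sketch below prints the raw bytes of ü in the precomposed form Linux normally uses and in the decomposed form the Mac uses for file names. The variable and function names are mine, chosen only for the illustration.

```shell
# The same visible character, ü, encoded two ways in UTF-8.
# Precomposed form (typical on Linux): one code point, U+00FC.
composed=$(printf '\303\274')
# Decomposed form (how the Mac stores file names): a plain u followed
# by U+0308, the combining diaeresis.
decomposed=$(printf 'u\314\210')

# Print the raw bytes of a string as hex.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

hex_bytes "$composed"; echo      # c3bc
hex_bytes "$decomposed"; echo    # 75cc88
```

The two strings look identical on screen but differ byte for byte, which is why a tool comparing names literally treats them as different files.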

Special thanks to Janak Singh for the rsync option and detailed information on the issue.

 

Update: December 9, 2014.

There were some issues with the umlauts on the Linux server, and with the names of the files as I put them into Omeka, so I decided to do away with the special characters altogether. But how to change all of the file names? Easy, use the rename command.

On the Linux server it was as easy as:

rename ü ue *.png

On the Mac I needed to install the rename command with Homebrew first:

brew install rename

The syntax is a little bit different on the Mac:

rename -s ü ue *.png

 

You can also do a dry run to make sure the command doesn’t do something you don’t like.

rename -n -s ü ue *.png

 

That takes care of the special characters issue.
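If more than one special character needs replacing, a plain shell loop can do it without the rename command. This is a sketch of my own rather than what I actually ran. The replacement table (ä to ae, ö to oe, ü to ue, ß to ss) is the usual German transliteration, and it only matches precomposed characters, so names still in the Mac’s decomposed form would need converting first.

```shell
# Replace German special characters in a single name.
# Only precomposed characters are matched (an assumption about the input).
transliterate() {
  printf '%s' "$1" | sed -e 's/ä/ae/g' -e 's/ö/oe/g' -e 's/ü/ue/g' -e 's/ß/ss/g'
}

# Rename every PNG in the given directory whose name changes,
# printing each rename so it can be checked.
rename_umlauts() {
  for f in "$1"/*.png; do
    [ -e "$f" ] || continue    # no PNG files: skip the unexpanded pattern
    base=${f##*/}
    new=$(transliterate "$base")
    if [ "$base" != "$new" ]; then
      echo "$base -> $new"
      mv -- "$f" "$1/$new"
    fi
  done
}

# Example: rename_umlauts .
```

Running transliterate on a few names first gives the same kind of safety check as the dry run above.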

Watermarking and OCRing your images

I have accumulated nearly 2000 images, all scans of documents, relating to the dissertation. One goal of the project is to make these documents open and available in an Omeka database. In order to more correctly attribute these documents to the archives where I got them, I need to place a watermark on each image.

I also need the content of the documents in a format to make it easy to search and copy/paste.

The tools to do each of those steps are readily available, and easy to use, but I needed a script to put them together so I can run them on a handful of images at a time, or even hundreds at a time.

To lay out the solution, I’ll walk through the problem and how I solved it.

When at the Neuengamme Concentration Camp Memorial Archive near Hamburg in the summer of 2013, I found about 25 testimonials of former inmates. In most cases I took a picture of the written testimonial (the next day I realized I could use their copier/scanner and make nicer copies). So I ended up with quite a number of folders, each containing a number of images.

So the goal became to watermark each of the images, and then to run an OCR program on them to grab the contents into plain text.

Watermark

There are many options for watermarking images. I chose to use the incredibly powerful ImageMagick tool. The ImageMagick website has a pretty good tutorial on adding watermarks to single images. I chose to add a smoky gray rectangle to the bottom of the image with the copyright text in white.

The image watermark command by itself goes like this:

width=$(identify -format %w "/path/to/copies/filename.png"); \
s=$((width/2)); \
convert -background '#00000080' -fill white -size "$s" \
-font "/path/to/font/file/font.ttf" label:"Copyright ©2014 Ammon" miff:- | \
composite -gravity south -geometry +0+3 - \
"/path/to/copies/filename.png" "/path/to/marked/filename.png"

This command can actually be run on the command line as is (replacing the paths to the images, the font file, and the copyright text, of course). I’ll explain the command below.

The first line gets the width of the image to be watermarked and sets it to the variable “width”. The second line gets half the value of the width, and sets it to the variable “s”.

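The third line starts the ImageMagick command (and is broken onto several lines using the \ to denote that the command continues). The code from ‘convert’ to the pipe ‘|’ creates the watermark label: white text on a semi-transparent dark gray rectangle half as wide as the image. The ‘composite’ command after the pipe then lays that label over the original image, centered along the bottom edge, and saves the result as a new file.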
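To run the same steps on a whole folder, the command can be wrapped in a loop. The following is a simplified sketch, not the script itself: the function names are mine, the -font option is left out so ImageMagick falls back to its default font, and only PNG files are handled.

```shell
# Half the image width, as in the single-image command.
label_width() {
  echo $(( $1 / 2 ))
}

# Watermark every PNG in a source folder and write the results to a
# destination folder. Needs ImageMagick (identify, convert, composite).
# Usage: watermark_all SOURCE_DIR DEST_DIR "label text"
watermark_all() {
  src=$1; dest=$2; label=$3
  mkdir -p "$dest"
  for img in "$src"/*.png; do
    [ -e "$img" ] || continue    # no PNG files: skip the unexpanded pattern
    width=$(identify -format %w "$img")
    convert -background '#00000080' -fill white \
      -size "$(label_width "$width")" label:"$label" miff:- |
      composite -gravity south -geometry +0+3 - "$img" "$dest/${img##*/}"
  done
}

# Example: watermark_all copies marked "Copyright ©2014 Ammon"
```

Each watermarked file keeps the name of its original, so the two folders stay easy to compare.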

OCR

Most of the images I have are of typed up documents, so they are good candidates for OCR (Optical Character Recognition), or grabbing the text out of the image.

OCR is done using a program called tesseract.

The tesseract command is relatively simple. Give it an input file name, an output file name, and an optional language.

tesseract "/path/to/input/file.png" "/path/to/output/file" -l deu

This will OCR file.png and create a file named file.txt. The -l (lowercase letter L) option sets the language to German (deu is the three-letter code for Deutsch).
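The same command can be run over a folder of images with a short loop. This is a sketch under a few assumptions: the function names are mine, only PNG files are handled, and German is the default language, as in the command above.

```shell
# Work out the output name tesseract should get for one image:
# the destination folder plus the file name without its extension
# (tesseract adds ".txt" itself).
ocr_output_base() {
  name=${1##*/}
  echo "$2/${name%.*}"
}

# OCR every PNG in a source folder, one text file per image.
# Usage: ocr_all SOURCE_DIR DEST_DIR [LANGUAGE]
ocr_all() {
  src=$1; dest=$2; lang=${3:-deu}
  mkdir -p "$dest"
  for img in "$src"/*.png; do
    [ -e "$img" ] || continue    # no PNG files: skip the unexpanded pattern
    tesseract "$img" "$(ocr_output_base "$img" "$dest")" -l "$lang"
  done
}

# Example: ocr_all copies ocr deu
```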

The Script

The script is available at my GitHub repo: https://github.com/mossiso/ocr-watermark

Here is how to use the script.

Download the ocrwm file and put it in the directory that has the image files.

Open the file with a text editor and set the default label to use in the watermark. If desired, you can also specify a font file to use.

On the command line (the terminal), simply type:

bash ocrwm

At its most basic, this will make a “copies” directory and put a copy of each image file in it (it will find images of the format JPG, GIF, TIF, and PNG in the directory where you run the command).

 

To OCR and Watermark the images do:

bash ocrwm -ow

This will make the copies as above, but will also create a directory named “ocr” and a directory named “marked” and add respective files therein.

 

You can also create a single pdf file from the images in the directory like so:

bash ocrwm -pow

 

Adding the l (lowercase letter L) option allows you to set the text in the watermark.

bash ocrwm -powl "Copyright ©2014 Me"

 

There is an option to not copy the files. This is useful if the files have been copied using this script previously (say you ran the script but only did watermarks and not OCR; to do just the OCR you can run the script again without having to copy the files again).

bash ocrwm -co

 

 

Gotchas

Here are things to look out for when running the script.

By default, the script runs the OCR program, tesseract, with German as the language. You can change that to English by deleting the “-l deu” part on the line that calls tesseract. The available languages and their abbreviations are listed in the tesseract manual, which you can open from the command line by typing:

man tesseract

PDFs

A few times I had PDFs as the original format to work with. In most cases these were multi-page PDFs. In order to use the script with these, I first needed to break out each page of the PDF and convert it to a PNG format. See here for a reason to choose PNG over other formats.

The ImageMagick command ‘convert’ will take care of that:

convert -density 600 -quality 100 original.pdf newfile.png

Depending on how many pages are in the PDF, the command can take quite a while to run. For a 30 page PDF, it took my laptop about 5 minutes. The end result is a PNG image for each page incrementally numbered beginning with zero. If the PDF above had four pages, I would end up with the following PNGs: newfile-0.png, newfile-1.png, newfile-2.png, newfile-3.png

Now I could run the ocrwm script in the directory and get OCR’ed and watermarked images. In this case I could leave off the ‘p’ option because I began with a PDF with all pages combined.

bash ocrwm -ow

 

Feel free to download the script, make changes or improvements, and send them back to me (via the github page).

 

Das war die Hölle (“That Was Hell”)

While reading through the survivor accounts that I gathered from the Neuengamme Concentration Camp Memorial last summer, I found a unique report. Apparently at one time either the Danish government, the National Museum in Copenhagen, or the Freedom Museum in Copenhagen put out a survey to former concentration camp inmates.

Axel Christian Hansen was one such inmate. Born in 1899, he was captured in Denmark as a political dissident on September 30, 1944. Sent first to Neuengamme, he was then sent to Porta Westfalica on October 3. His answers are terse, yet convey much; as do the questions left unanswered. Here are a few of the questions and his answers. The survey was conducted in Danish on an unspecified date, and translated into German in 1990.

The first section deals with his transportation from Neuengamme (near Hamburg) to Porta Westfalica.

Type of transportation: Cattle car / passenger car / automobile / ship; open / closed

How many in each car: 50 men

Was there straw or carpet or other? No

Did you receive any rations during the trip? bread-jam-meat? No

How much?

Did you receive anything to drink? No

How did you relieve yourself? In the corner of the car.

Were there air raids? Yes

Did you stay in the cattle car? Yes

Was it locked? Yes

Where were the guards? In the first car.

Were there any dead or wounded? No

Were there any escape attempts? No

Was there any mistreatment? No

Further comments regarding the transportation and description of exceptional experiences.

There was no time to sleep in the train car because there were too many of us. When we were shipped to Porta, we were given a little bit of water and a little bit to eat from a guard.

 

The second part deals with the arrival in Porta Westfalica.

What did you have remaining of your things upon arrival? A belt.

Was your face or head shaved? Yes

Was your body shaved? Yes

Were you shaved in another way? Yes, with a reverse mohawk

How often did you get a reverse mohawk (Autobahn)? 3 times

When were you allowed to grow your hair? Never

 

Section three deals with daily life.

How often did you receive a change of clothes (approximate date received)? The prisoner clothes were never changed.

What was exchanged? Shirt and underpants were changed every third week.

Was there any opportunity to wash or receive washed clothing? No

What kind of shoes? Wooden shoes (clogs)

Condition of the shoes? bad

List your other personal belongings (toothbrush, soap, tissue, toilet paper, etc, and how long you had them)

How many roll calls were there per day? about 4-5

When? Mornings, evenings, middle of the night

How long did they normally last? from 1 to 3 hours

How long did the longest last? 3 hours

 

There is much more to be found in the document. It will be available in the document repository I am building with Omeka, where it can be translated and transcribed by anyone who wants.

Much about the camp life is known because of memoirs of the Danish political prisoners. Following are a couple of books by Danish survivors:

Kieler, Jørgen. Resistance Fighter: A Personal History of the Danish Resistance Movement, 1940-1945. Jerusalem, Israel; Lynbrook, NY: Gefen Publishing House, 2007.
Madsen, Benedicte, and Søren Willert. Survival in the Organization: Gunnar Hjelholt Looks Back at the Concentration Camp from an Organizational Perspective. Aarhus [Denmark]; Oakville, Conn.: Aarhus University Press, 1996.

 

History and Maps

I have been reading up on Geographical Information Systems/Sciences. There seem to be a number of flavors of combining history with maps and geographical data and methodologies. The various terms I have run across are Historical GIS, historical geography, cultural geography, spatial history or spatial humanities.

Here is the list of books I would like to tackle, with the first two being the most important for my research.

  • Knowles, Anne Kelly, Tim Cole, and Alberto Giordano. Geographies of the Holocaust. Bloomington: Indiana University Press, 2014.
  • Gregory, Ian, and Paul S Ell. Historical GIS: Technologies, Methodologies, and Scholarship. Cambridge; New York: Cambridge University Press, 2007.
  • Black, Jeremy. Maps and History: Constructing Images of the Past. New Haven: Yale University Press, 1997.
  • Bodenhamer, David J., John Corrigan, and Trevor M. Harris. The Spatial Humanities: GIS and the Future of Humanities Scholarship. Bloomington: Indiana University Press, 2010.
  • Daniels, Stephen, Dydia DeLyser, J. Nicholas Entrikin, and Doug Richardson. Envisioning Landscapes, Making Worlds: Geography and the Humanities. Milton Park, Abingdon, Oxon ; New York: Routledge, 2011.
  • Dear, Michael J. Geohumanities: Art, History and Text at the Edge of Place. London: Routledge, 2011.
  • Gaddis, John Lewis. The Landscape of History: How Historians Map the Past. Oxford: Oxford University Press, 2002.
  • Hillier, Amy, and Anne Kelly Knowles. Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship. Redlands, Calif: ESRI Press, 2008.
  • Pickles, John. A History of Spaces: Cartographic Reason, Mapping, and the Geo-Coded World. London: Routledge, 2004.

I also started working through some of the interviews to pull out locations. This required me to first figure out what I was looking for. I decided to look for eight specific “locations” and plot those on a map of the camp. My goal is to see whether where these events happened influenced why they happened, and whether any correlations or outliers show up once the data is mapped.

I had a neat idea while reading one book: visualize what people knew geographically during a certain time. So the map represents the lands and info that a person/people knew about. Say a person from 1820s England. Their “map” of the world would include Europe, eastern US, maybe a few other countries, but exclude Antarctica?, the Arabian Peninsula?, Madagascar?, Thailand?, India?

GSA 2014 – DH Panel

I recently participated in the annual German Studies Association Conference. (On a side note, my last professor at ASU, Dr. Gerald Kleinfeld started the GSA.)

I was delighted to be on one of two DH panels at the GSA. Since I don’t have anything noteworthy finished or started with my dissertation, I spoke about how the humanities can and should learn from the Open Source community. Specifically, the humanities can learn three things from the communities that created the Internet and open source software. The three points I talked about in the paper were:

  • Freedom of information, ability to share and collaborate on research in open and unrestricted ways
  • Ability to find more sources
  • Ability to allow a wide range of interested individuals to participate

Here is an HTML version of my presentation (unfortunately I used boring PowerPoint…):

[Presentation in OpenDocumentPresentation (LibreOffice)]

[Paper in Doc]

[Speaking Notes in PDF]

In the spirit of Open Source, I should have posted these much earlier before the conference and let interested individuals make comments, corrections and additions. But, as is all too common with me, I waited until the last minute and didn’t even finish until the day before. Nobody else does that, right? 🙂

The experience was great. I met some people, got some ideas, learned some things… Most importantly I got some ideas from Paul Jaskot about how to do the “digital aspect” of my dissertation. Details to follow in another post.

The Final Stretch

But not really. I’m no further than before, but the end is in sight, because I have to finish by January 2016 no matter what. I also have the opportunity to write up my dissertation work for a journal article, and that is due in summer 2015. So I have to be done by then. I will also be presenting my work at a conference in April 2015, so most of the work and write up has to be done by then.

So, you see what I’ve done? I’ve given myself strict deadlines by promising to have the dissertation done by a certain time. I’m holding myself accountable!

So, even though this is the final stretch, it’s still the whole race, practically.

I made a list of all the things that need to happen by the end of November.

  • Read through DH, GeoHistory, historical GIS books.
  • Fix the Omeka/Wiki/Scripto installation
  • Watermark all images
  • Import all images into Omeka
  • Contact Natalia Dudnik (GMU) about learning module for using Scripto
  • Contact UVA German department about learning module for using Scripto
  • Go through 5+ survivor testimonies and pick out all the uses of “place”
  • Figure out database schema to use while going through survivor testimonies
  • Finish chapter 1 about Germany getting ready for moving factories underground
    • Summary (numbers) of bombing raids (on Germany and Britain): total killed, total bombs, etc
    • bombing raid example of Britain
    • Bombing raid examples for Germany
    • Whole section on bombing’s goal of decreasing war production
      • plans, goals for Allied bombing of which factories, which areas of production to target
      • A couple of examples of factories being bombed, the view from the workers, the owners, the gov. Details about destruction, loss of production, work required to rebuild.
    • Big Week section needs to be fleshed out.
      • Why is Big Week important for this dissertation? (It was the kick in the Nazi pants that got them seriously working towards moving factories underground.)
      • goals
      • where they bombed and why
      • stats on bombing outcome
      • Flesh out big week experience as told by US pilot (better would be to have German perspective, or both)
    • More on Jägerstab
    • Rüstungsstab
    • SS building programs
    • Dispersal plans
    • Slave labor usage

I have also committed to blog more. Each week. Whatever I’ve done. I’ll write about that, or just post what I have written.