German Federal Archives, Crowdsourcing & the Wikimedia Commons

I spotted the New York Times article “Historical Photos in Web Archives Gain Vivid New Lives” via Dan Cohen’s Twitter feed. The article is a nice treatment of the difference between the Library of Congress’s contribution of 50 photos a week to the Flickr Commons and the German Federal Archives’ contribution of 100,000 images to the Wikimedia Commons (described as “the virtual archive for material used in Wikipedia articles”).

I took a look at the details of this project – starting with the project’s homepage, Commons:Bundesarchiv, on the Wikimedia Commons. This passage explains one of the goals of the Bundesarchiv Gallery:

Very old photographs have become public domain, and events and persons of today can be photographed by Wikipedians with their digital cameras. But for the time between there is a huge gap in Wikipedia articles. The donation of Federal Archive is important to close that gap, and it is to hope that it can serve as a model to other institutions in Germany or elsewhere.

Also, each individual photo includes this disclaimer:

For documentary purposes the German Federal Archive often retained the original image captions, which may be erroneous, biased, obsolete or politically extreme. Factual corrections and alternative descriptions are encouraged separately from the original description.

There is a special category to call out instances of these types of descriptions – BArch images with biased descriptions. In my exploration, I found only a few images whose original captions had been translated into English. One example is the photo of a single-room home for a family of eleven.

In contrast to the Library of Congress’s addition of 50 photos a week, the German Federal Archive plans to add “a few thousand images a month”. The Commons:Bundesarchiv To Do list is also interesting reading. The To Do page includes tasks in both German and English (though the wiki discussion page is all in German). I love having the opportunity to read about issues confronting those working on this sort of project. For example – there is a discussion about how to determine if an image should remain Uncategorized. What if only one person out of three is tagged? Does it still ‘deserve’ to remain marked as ‘uncategorized’?

New categories created for use in this project need to use a special template so that they show up properly within the sub-categories of the Category:Images from the German Federal Archive page. For example – the page which sorts images by country has 64 sub-categories at the time of this post. A new country added using this template approach would immediately show up on the images by country sub-category page.

I will say that the learning curve for classifying images within the Wikimedia Commons in general, and the Bundesarchiv project in particular, is much steeper than for tagging images in the Flickr Commons. There is a handy CommonSense tool (available via the ‘find categories’ tab on any image) that will suggest categories based on keywords, but even that is a bit overwhelming for a beginner.

As an example, let’s look at the image I chose for this post of two boys finishing their ice cream in 1949. Here are the categories currently assigned:

Let’s take a look at the wiki text that sets these categories. First there is the special template for the project, which specifies the year and location. I believe that these are attributes uploaded with the original photograph. This gives us the first two categories in our list (emphasis mine):

{{BArch-License|
|signature=Bild 183 1984-0202-506
|batch=Bild 183
|year=1949
|month=
|location=Berlin
|PD=
}}
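
As a rough illustration of how those attributes could be read programmatically, here is a minimal Python sketch. The function name parse_template_params is my own invention for this example and not part of any MediaWiki tooling, and it assumes a flat template call with no nested templates or links containing pipe characters.

```python
def parse_template_params(wikitext):
    """Return (template_name, params) for one flat {{...}} template call.

    Hypothetical helper for illustration only: it does not handle nested
    templates or wiki links that contain pipe characters.
    """
    body = wikitext.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    parts = body[2:-2].split("|")
    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # skip empty or positional parameters
        key, value = part.split("=", 1)
        params[key.strip()] = value.strip()
    return name, params


# The template call exactly as it appears on the image page.
SAMPLE_TEMPLATE = """{{BArch-License|
|signature=Bild 183 1984-0202-506
|batch=Bild 183
|year=1949
|month=
|location=Berlin
|PD=
}}"""

name, params = parse_template_params(SAMPLE_TEMPLATE)
print(name)
print(params["year"], params["location"])
```

The year and location values pulled out here are the two attributes that the template turns into categories; how the template does that on the Commons side is not something this sketch tries to reproduce.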

Then we get to the standard Wikimedia Commons categories. These are the categories most akin to tags in Flickr, and they are the ones that will promote discovery of these images alongside images from other sources across the Wikimedia Commons:

[[Category:History of Germany]]
[[Category:Ice cream]]
[[Category:Black and white photographs of children]]
[[Category:Black and white photographs of Germany]]
[[Category:Standing males]]
[[Category:Photographs by Brenner]]

These categories were clearly added by hand, since the original caption reads (by my rough translation) At the beach: “Is it already gone?”. I suppose I could go in and add [[Category:Beaches]], but I am honestly not sure if there is enough beach in the photo to warrant such a classification.
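
For anyone who wants to poke at category assignments outside the wiki editor, here is a small Python sketch of how the category links above could be listed and how a new one could be appended. Both function names (extract_categories and add_category) are hypothetical, and the regular expression is my own simplification of the category link syntax rather than anything taken from the MediaWiki parser.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sort key]], capturing Name.
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_categories(wikitext):
    """Return the category names found in the wiki text, in order."""
    return [name.strip() for name in CATEGORY_LINK.findall(wikitext)]


def add_category(wikitext, name):
    """Append a category link unless that category is already present."""
    if name in extract_categories(wikitext):
        return wikitext
    return wikitext.rstrip("\n") + f"\n[[Category:{name}]]\n"


# The hand-assigned categories from the ice cream photo.
SAMPLE_CATEGORIES = """[[Category:History of Germany]]
[[Category:Ice cream]]
[[Category:Black and white photographs of children]]
[[Category:Black and white photographs of Germany]]
[[Category:Standing males]]
[[Category:Photographs by Brenner]]
"""

for category in extract_categories(SAMPLE_CATEGORIES):
    print(category)

# Adding the beach category I was unsure about.
updated = add_category(SAMPLE_CATEGORIES, "Beaches")
print(extract_categories(updated)[-1])
```

Whether the beach category belongs on this photo is still an editorial call; the sketch only shows that the mechanics of adding it are a single extra line of wiki text.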

I am very curious to see comparison stats of the assignment of categories/tags to images in both the Flickr & Wikimedia Commons a year from now. How will we measure success? How will we grade the accuracy of metadata assigned by the public? Which images will get more public views and usage – those added to the Flickr Commons or those added to the Wikimedia Commons?

For now, I am happy to set aside all these thorny questions. I am just so pleased to see a new and ambitious experiment in crowdsourcing image metadata.

Comments

Gary
January 28, 2009 at 6:17 am

Interesting article, thanks! I was not aware of either the German Federal Archives or the Library of Congress contributions to freely available photos.
Mark
December 28, 2010 at 2:46 pm

A few thousand images per month? That’s no small undertaking, but I can’t imagine how 50 photos from the Library of Congress is acceptable. Overworked staff members, sure... but come on guys, that should only take an hour or so. Certainly we have a couple of interns who can help.
