Category: digitization

Jewish New Year 5769: Images and Words from the Past

Flickr LOC: Praying on the Brooklyn Bridge

The Jewish year of 5769 began at sunset on September 29th, 2008. The Jewish New Year (Rosh Hashanah) is a very reflective holiday, one in which individuals are encouraged to consider their own actions from the past year. It made me wonder what materials are available online to let us glimpse the celebration of Rosh Hashanahs long past.

A search in the Flickr Commons yielded this lovely Library of Congress image of women praying on the Brooklyn Bridge (likely participating in the ritual of Tashlikh).

The United States Holocaust Memorial Museum’s Collections & Archives has photos about Rosh Hashanah – including this optimistic card depicting a couple from the Fuerth displaced persons camp flying to Tel Aviv.

Yad Vashem has pulled together selections relating to Rosh Hashanah in an online collection called Marking the New Year.

I also found an assortment of treasures on the Internet Archive:

Rosh Hashanah Poem (1898)

 

These examples only scratch the surface of the archives and collections that include Jewish records. If this has piqued your interest, here are a few other websites to explore:

Know of others I missed – please add them in the comments below!

These sites are from suggestions in the comments:

Flickr Terms of Service, Unwritten Guidelines and Safety Levels

Flickr: Free Click by fikra (Sami Ben Gharbia)

As more cultural heritage institutions add photos to Flickr, such as these sets added by the Smithsonian, an AP article discussing freedom of expression in online public spaces identifies some issues that deserve attention. In ‘Public’ online spaces don’t carry speech, rights, Anick Jesdanun highlights a number of scenarios in which service providers (such as the Yahoo!-owned Flickr) clash with their users, including this one (italics my own):

Dutch photographer Maarten Dors met the limits of free speech at Yahoo Inc.’s photo-sharing service, Flickr, when he posted an image of an early-adolescent boy with disheveled hair and a ragged T-shirt, staring blankly with a lit cigarette in his mouth.

Without prior notice, Yahoo deleted the photo on grounds it violated an unwritten ban on depicting children smoking. Dors eventually convinced a Yahoo manager that – far from promoting smoking – the photo had value as a statement on poverty and street life in Romania. Yet another employee deleted it again a few months later.

This image on Flickr gives more details about the photo being removed – and this is the reinstated photo in question. The article points out “Service providers write their own rules for users worldwide and set foreign policy when they cooperate with regimes like China. They serve as prosecutor, judge and jury in handling disputes behind closed doors.” It makes me wonder if the ‘unwritten guidelines’ are applied evenly across Flickr. With the creation of The Commons area, it would be easy to create two standards – one for the general public and another for ‘blessed’ institutions. Images that are acceptable from the Brooklyn Museum (consider this set of Behind The Scenes photos of the Ron Mueck exhibition) might not be accepted from the average person. In my research I discovered a set of Public Domain photos from the National Archives. Some of the photos included in this set are historically valuable images that I would not necessarily want a child to see. Does this mean they shouldn’t be on Flickr? I don’t think so, but that certainly isn’t up to me.

Here are the relevant passages of the Yahoo! Terms of Service:

You agree to not use the Service to:

  1. upload, post, email, transmit or otherwise make available any Content that is unlawful, harmful, threatening, abusive, harassing, tortious, defamatory, vulgar, obscene, libelous, invasive of another’s privacy, hateful, or racially, ethnically or otherwise objectionable;
  2. harm minors in any way;

You acknowledge that Yahoo! may or may not pre-screen Content, but that Yahoo! and its designees shall have the right (but not the obligation) in their sole discretion to pre-screen, refuse, or remove any Content that is available via the Service. Without limiting the foregoing, Yahoo! and its designees shall have the right to remove any Content that violates the TOS or is otherwise objectionable.

That bit about ‘otherwise objectionable’ could be used to cover removal of anything. Being subject to the terms of service of Internet service providers is nothing new, but as archives, libraries and other cultural heritage institutions look for ways to increase their revenue streams and explore innovative ways to bring more eyes to their materials, it will become more important to understand these guidelines.

I understand (as the author of the article that inspired this post also points out) that Yahoo! is a business. Their priorities are not always going to be the same as those of the National Archives or the Brooklyn Museum. There are definitely images from history and the world of art that are only appropriate for adults, but isn’t that what Flickr’s content filter feature, named SafeSearch, is all about? These are the three ‘safety levels’ available on Flickr:

  • Safe – Content suitable for a global, public audience
  • Moderate – If you’re not sure whether your content is suitable for a global, public audience but you think that it doesn’t need to be restricted per se, this category is for you
  • Restricted – This is content you probably wouldn’t show to your mum, and definitely shouldn’t be seen by kids

It is interesting that Flickr has its own separate list of Community Guidelines, independent of Yahoo!’s terms of service. This is the passage from these guidelines about filtering content:

Take the opportunity to filter your content responsibly. If you would hesitate to show your photos or videos to a child, your mum, or Uncle Bob, that means it needs to be filtered. So, ask yourself that question as you upload your content and moderate accordingly. If you don’t, it’s likely that one of two things will happen. Your account will be reviewed then either moderated or terminated by Flickr staff.

I am still not sure what safety level I would use for a photo showing rows of dead in a concentration camp. I guess given the choices, ‘restricted’ is the best option – but that still doesn’t sit right with me somehow. I did an advanced Flickr search for ‘concentration camp’ with SafeSearch on – and those photos are not currently being marked as restricted. Who is it that we expect to be protecting using SafeSearch? From Flickr’s definition above it is supposed to at least be kids (and maybe your mom and Uncle Bob).

I think the question of the moment is how to know which images are appropriate to upload if some of the guidelines are unwritten. Flickr is a community and understanding the community is essential to success within that community. Once you believe your images are appropriate to include, then you must decide the right ‘safety level’. It is not clear to me how to tell the difference between an image that is not appropriate to be uploaded to Flickr and an image that is okay but needs to be marked with a safety level of ‘restricted’. I am very interested to see how this category of ‘appropriate but restricted’ evolves. For now, I am going to keep a watch on how the Flickr Commons grows and what range of content is included. The final answer for some of these images may be to only provide them via the institutions’ web sites rather than via service providers such as Flickr.

Image credit: Free Click by fikra (Sami Ben Gharbia) via Flickr

THATCamp 2008: Text Mining and the Persian Carpet Effect

alarch: Drift of Harrachov mine (Flickr)

I attended a THATCamp session on Text Mining. There were between 15 and 20 people in attendance. I have done my best to attribute ideas to their originators wherever possible – but please forgive the fact that I did not catch the names of everyone who was part of this session.

What Is Text Mining?

Text mining is an umbrella phrase that covers many different techniques and types of tools.

The CHNM NEH-funded text mining initiative defined text mining as needing to support these three research functions:

  • Locating or finding: improving on search
  • Extraction: once you find a set of interesting documents, how do you extract information in new (and hopefully faster) ways? How do you pull data from unstructured bulk into structured sets?
  • Analysis: support analyzing the data, discovery of patterns, answering questions

The group discussed that there were both macro and micro aspects to text mining. Sometimes you are trying to explore a collection. Sometimes you are trying to examine a single document in great detail. Still other situations call for using text mining to generate automated classification of content using established vocabularies. Different kinds of tools will be important during different phases of research.

Projects, Tools, Examples & Cool Ideas

Andrea Eastman-Mullins, from Alexander Street Press, mentioned the University of Chicago’s ARTFL Project and these two tools:

  • PhiloLogic: An XML/SGML based full-text search, retrieval and analysis tool
  • PhiloMine: an extension being developed for PhiloLogic to provide support for “a variety of machine learning, text mining, and document clustering tasks”.

Dan Cohen directed us to his post about Mapping What Americans Did on September 11 and to Twistori which text mines Twitter.

Other Projects & Examples:

Some neat ideas that were mentioned for ways text mining could be used (lots of other great ideas were discussed – these are the two that made it into my notes):

  • Train a tool with collections of content from individual time periods, then use the tool to assist in identification of originating time period for new documents. Also could use this same setup to identify shifts in patterns in text by comparing large data sets from specific date ranges
  • If you have a tool that has learned how to classify certain types of content well… then watch for when it breaks – this can give you interesting trails to things to investigate.
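The first idea above, training on text from known time periods and then guessing the period of a new document, can be sketched in a few lines. This is only a toy illustration, not anything presented at the session: it assumes a simple word-count model with add-one smoothing (a naive Bayes style score), and the period labels, sample sentences and function names (`tokenize`, `train`, `guess_period`) are all made up for the example. A real project would need far more text per period.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples_by_period):
    """Count word occurrences for each period (dict of period -> list of texts)."""
    return {
        period: Counter(tok for text in texts for tok in tokenize(text))
        for period, texts in samples_by_period.items()
    }


def guess_period(model, text):
    """Return the period whose word counts best explain the text."""
    vocab = set().union(*model.values())
    scores = {}
    for period, counts in model.items():
        total = sum(counts.values())
        # Add-one smoothing so unseen words do not zero out a period.
        scores[period] = sum(
            math.log((counts[tok] + 1) / (total + len(vocab)))
            for tok in tokenize(text)
        )
    return max(scores, key=scores.get)


# Hypothetical training snippets, invented purely for illustration.
samples = {
    "1860s": [
        "the regiment marched at dawn and the telegraph brought news of the battle",
        "our company received rations of hardtack and coffee",
    ],
    "1990s": [
        "please send me an email about the website",
        "the modem connected to the internet after several tries",
    ],
}
model = train(samples)

new_document = "she read the news on a website and replied by email"
print(guess_period(model, new_document))
```

The second idea, watching for where a trained tool "breaks", would amount to looking at documents where the best score is low or where two periods score almost the same.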

Barriers to Text Mining

All of the following were touched upon as being barriers or challenges to text mining:

  • access to raw text in gated collections (i.e., collections which require payment to permit access to resources) such as JSTOR, Project MUSE and others
  • tools that are too difficult for non-programmers to use
  • questions relating to the validity of text mining as a technique for drawing legitimate conclusions

Next Steps

These ideas were ones put forward as important to move forward the field of text mining in the humanities:

  • develop and share best practices for use when cultural heritage institutions make digitization and transcription deals with corporate entities
  • create frameworks that enable individuals to reproduce the work of others and provide transparency into the assumptions behind the research
  • create tools and techniques that smooth the path from digitization to transcription
  • develop focused, easy-to-use tools that bridge the gap between computer programmers and humanities researchers

My thoughts

During the session I drew a parallel between text mining and aerial archaeology, where information can be gleaned from the air that cannot be realized on the ground. I discovered that this phenomenon has a name:

“Archaeologists call it the Persian carpet effect. Imagine you’re a mouse running across an elaborately decorated rug. The ground would merely be a blur of shapes and colors. You could spend your life going back and forth, studying an inch at a time, and never see the patterns. Like a mouse on a carpet, an archaeologist painstakingly excavating a site might easily miss the whole for the parts.” from Airborne Archaeology, Smithsonian magazine, December 2005 (emphasis mine)

While I don’t see any coffee table books in the near future of text mining (such as The Past from Above: Aerial Photographs of Archaeological Sites), I do think that this idea captures the promise that we have before us in the form of the text mining tools. Everyone in our session seemed to agree that these tools will empower people to do things that no individual could have done in a lifetime by hand. The digital world is producing terabytes of text. We will need text mining tools just to find our way in this blizzard of content. It is all well and good to know that each snowflake is unique – but tell that to the 21st century historian soon to be buried under the weight of blogs, tweets, wikis and all other manner of web content.

Image credit: Drift of Harrachov Mine by alarch via flickr

As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

Of Pirates, Treasure Chests and Keys: Improving Access to Digitized Materials

Key to Anything by Stoker Studios (flickr)

Dan Cohen posted yesterday about what he calls The Pirate Problem. Basically the Pirate Problem can be summed up as “there are ways of acting and thinking that we can’t understand or anticipate.” Why is that a ‘Pirate Problem’? Because a pirate pub opened near his home and rather than folding shortly thereafter due to lack of interest from the ‘very serious professionals’ who populate DC suburbs – the pub was a rousing success due to the pirate aficionados who came out of the woodwork to sing sea shanties and drink grog. This surprising turn of events highlighted for him the fact that there are many ways of acting and thinking (some people even know all the words to sea shanties without needing sheet music).

Dan recently delivered the keynote speech at a workshop at the University of North Carolina at Chapel Hill. The workshop brought together dozens of historians to talk about how the 16 million archival documents of the Southern Historical Collection (SHC) should be put online. He devoted his keynote “to prodding the attendees into recognizing that the future of archives and research might not be like the past” and goes on in his post to explain:

The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like “a crab being lowered into the warm water of the pot.” Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.

This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC.

Much of the stress of Dan’s article is on fear of new techniques of analysis. The choppy waters of text mining and pattern recognition threaten to wash away traditional methods of actually reading individual pages and “most historians just want to do their research they way they’ve always done it, by taking one letter out of the box at a time”.

I certainly like the idea of new technologically based ways of analyzing large sets of cultural heritage materials, but I also believe that reading individual letters will always be important. The trick is finding the right letter!

And of course – we still need the context. It isn’t as if when we digitize major collections like the SHC that we are going to scan and OCR each page without regard to which box it came out of. We can’t slice and dice archival records and manuscripts into their component parts to feed into text analysis with no way back to the originals.

I like to imagine the combination of all the new technology (be it digitization, cross collection searching, text mining or pattern recognition) as creating keys to different treasure chests. Humanities scholars are treasure hunters. Some will find their gems through careful reading of individual passages. Others will discover patterns spread across materials now co-existing virtually that before digitization would have been widely separated by space and time. Both methods will benefit from the digitization of materials and the creation of innovative search and text analysis tools. Both still require an understanding of a material’s origin. The importance of context isn’t going anywhere – we still need to know which box the letter came from (and in a perfect world, which page came before and which came after). I want scholars to still be able to read one page from the box – I just want them to be able to do it from home in the middle of the night if they are so inclined with their travel budget no worse for wear.

Dan ties his post together by pointing out that:

… in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars.

In my opinion, the core message should be that we just found more locked treasure chests – and for those who are interested, we have some new keys that just might open those locks. I enjoyed the Pirate metaphor (obviously) and I appreciate that there are real issues here relating to strong discomfort with the fast changing landscape of technology, but I have to believe that if we do something that prevents historians from being able to read one letter at a time we are abandoning the treasure chests that are already open for the new ones for which we haven’t yet found the right keys. I am greedy. I want all the treasure!

Image credit: key to anything by Stoker Studios via flickr

ISSUU: Interesting Platform for Online Publishing

Issuu, with the tag line ‘Read the world. Publish the world.’ and pronounced ‘issue’, gives anyone the ability to upload a PDF document and publish it as an online magazine. I am intrigued by the possibilities of using this service to publish digitized archival records – especially those that would lend themselves to a ‘book’ style presentation (thinking here of a ledger or equivalent).

I am not sure I totally understand the implications of the Issuu Terms of service… especially this part:

By distributing or disseminating Uploader Submissions through the Issuu Service, you hereby grant to Issuu a worldwide, non-exclusive, transferable, assignable, fully paid-up, royalty-free, license to host, transfer, display, perform, reproduce, distribute, and otherwise exploit your Uploader Submissions, in any media forms or formats, and through any media channels, now known or hereafter devised, including without limitation, RSS feeds, embeddable functionality, and syndication arrangements in order to distribute, promote or advertise your Uploader Submissions through the Issuu Service.

If I am following that properly, all the rights you are granting to the Issuu Service are only for the purposes of their distribution of your uploaded PDF.

Issuu has a special Copyright FAQ, which in combination with Peter Hirtle‘s page on Copyright Term and the Public Domain in the United States, should support those trying to figure out if they can upload what they want to upload without getting into copyright related hot water.

So how is it different from a plain old PDF? Take a look at the embedded Issuu viewer below showing a 1908 copy of The Colonial Book of The Towle Manufacturing Company Silversmiths.

I don’t think this would ever be the way you would want to give online access to digitized records in general – but I do think that this could be a great way to highlight a particularly impressive set or volume of documents. If an archives featured one of these a month on their homepage – would people subscribe to their RSS feed just to see the new one? On the actual page on which I found the above document, Issuu makes it easy to subscribe to the RSS feed for the Issuu author ‘silverlibrary’.

I don’t know why Issuu has decided that I must create an account before I may view document author silverlibrary’s user profile. I would hope that there was an elegant way for visitors to see a group of Issuu documents created by the same author without having to create an account first (or ever).

Want to know what others think? Take a look at Finally, a Web-based PDF Viewer That Does Not Suck (Issuu) over on TechCrunch. One interesting tidbit I picked up from that review is that Issuu is based in Denmark. I wonder what impact that has on which copyright rules apply to the documents uploaded into Issuu.

Want to read more about their vision? Of course they have a press release in the form of an Issuu publication and I have embedded it below. I think my favorite line is that Issuu is intended to be ‘YouTube for Publications’.

I would love to see a highlighted section created for ‘cultural heritage materials’ (or something like that anyway). Take a look around Issuu and let me know what you think. Is this a viable tool for an archives or manuscript collection to use to highlight parts of their collection?

LOC + Flickr equals Crowdsourced Tagging

Flickr/LOC: Lily Smith between 1910 and 1915 (LC-B2- 2350-8)

It is no surprise that the Library of Congress announcing the publication of images on Flickr is news both in mainstream news outlets and in the blogosphere. From librarian.net‘s short and cheery LoC goes 2.0! post to ArchivesNext‘s pondering Is Flickr “legitimate” for archives now that LOC is there?, I have seen a lot of discussion of LoC and Flickr in my RSS feeds.

What is it all about?

In case you have missed the details, the Library of Congress has published two photo collections on Flickr in a new subsection of the website called The Commons. The two collections are:

  • 1930s-40s in Color: 1615 photos taken by photographers working for the US government’s Farm Security Administration (FSA) and the Office of War Information (OWI) and covering “rural areas and farm labor, as well as aspects of World War II mobilization, including factories, railroads, aviation training, and women working between 1939 and 1944.”
  • News in the 1910s: 1500 photos taken by photographers who worked for the Bain News Service. Topics include “sports events, theater, celebrities, crime, strikes, disasters, and political activities, with a special emphasis on life in New York City.”

I enjoyed reading Flickr’s own blog post on the subject, Many hands make light work. It gave me a glimpse of their vision. For them, these two collections from the Library of Congress make up a pilot project – this is just the first step.

On their page for The Commons they first talk about their goals for the project:

Back in June of 2007, we began our first collaboration with a civic institution to facilitate giving people a voice in describing the content of a publicly-held photography collection.

The key goals of this pilot project are to firstly give you a taste of the hidden treasures in the huge Library of Congress collection, and secondly to show how your input of a tag or two can make the collection even richer.

On the homepage for the Library of Congress Flickr pilot I found this introduction:

The Library of Congress invites you to explore history visually by looking at interesting photos from our collections. Please add tags and comments, too! More words are needed to help more people find and use these pictures.

So, here we have a project between two large and well known organizations, with their goals carefully aligned. Let’s get more people looking at the amazing photos from the Library of Congress. Let’s also harness the curiosity and enthusiasm of those who want to be more involved and want to tag content. I love it!

Considering the Tags

So then I started looking at photos and the tags they have. I wish (being my database geek self) that I could see the groupings in which tags were added (i.e., that one person added tags 3 through 10). They don’t seem to be displayed alphabetically – but rather in the order in which they were added to the photo.

I considered this photo from the 1930s-40s in Color collection:

LOC Woman Airplane Photo

The list below shows all the tags that were assigned to it, in the order in which the tags are displayed beside the photo above on Flickr (listed separated by commas to preserve space). The ‘Library of Congress’ tag has already been assigned to every photo in the collections upon upload, and therefore always appears first:

Library of Congress, Long Beach, california, 1942, october, WW2, USA, aircraft, douglas, Palmer, WWII, women, manufacturing, yellow, stripes, overalls, engine, Douglas Aircraft, engine installation, military aviation, World War II, women at work, historical photographs, slide film, 4×5, large format, LF, transparency, transparencies, world war 2, technology

In a world with no controlled vocabulary, there seems to be a theory at work of covering all your bases. Even though someone had already tagged this photo ‘WW2’, it was later also tagged with ‘WWII’, ‘World War II’ and ‘world war 2’. On another photo in the collection I know I saw the tag ‘wwii’. As long as there is no ‘official’ version for this tag, I see the wisdom in tagging it with all of them – just to be sure.
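To make the ‘covering all your bases’ problem concrete, here is a minimal sketch of how variant tags could be folded into one canonical form after the fact. It is only an illustration: the synonym table `CANONICAL` and the functions `normalize_tag` and `group_tags` are hypothetical names I made up, and nothing here reflects how Flickr or the Library of Congress actually handle tags.

```python
import re

# Hypothetical synonym table: every known variant maps to one canonical tag.
CANONICAL = {
    "ww2": "world war ii",
    "wwii": "world war ii",
    "world war 2": "world war ii",
    "world war ii": "world war ii",
}


def normalize_tag(tag):
    """Lowercase, collapse whitespace, then look up a canonical form if one exists."""
    key = re.sub(r"\s+", " ", tag.strip().lower())
    return CANONICAL.get(key, key)


def group_tags(tags):
    """Group the original tags under their canonical form, preserving tag order."""
    groups = {}
    for tag in tags:
        groups.setdefault(normalize_tag(tag), []).append(tag)
    return groups


tags = ["WW2", "WWII", "World War II", "world war 2", "wwii", "women at work"]
groups = group_tags(tags)
for canonical, variants in groups.items():
    print(canonical, "<-", variants)
```

With a table like this, a search for any one variant could return photos tagged with the others, while the original tags stay exactly as their authors typed them.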

The official description of the photo is: “Women are trained to do precise and vital engine installation detail in Douglas Aircraft Company plants, Long Beach, Calif. (1942 Oct)”. The metadata provided by the Library of Congress also includes information about the format of the film itself.

These are the subject headings assigned by the Library of Congress catalogers:

  • Douglas Aircraft Company
  • Airplane industry
  • Women–Employment
  • World War, 1939-1945
  • Assembly-line methods
  • United States–California–Long Beach

It is interesting to note that the main things the independent taggers have captured that the professional catalogers haven’t are non-topical aspects of the image (‘yellow’ and ‘overalls’) and broader, more general ideas (‘military aviation’ and ‘technology’).

Does the tag ‘women at work’ tell you more than the LOC subject heading ‘Women–Employment’? Maybe, maybe not – but if you view all the images tagged ‘women at work’ across Flickr, now you can see these women from the 1940s at work beside photos such as three vendors and Bozo village life. Now this is something different. This is knitting threads from the ivory tower of libraries and archives into the communal tapestry that is Flickr. Not only might the addition of the ‘women at work’ tag make these images more accessible to the average person looking for Library of Congress photos – but it also puts these photos in the everyday path of many more people. It brings us firmly back to Flickr’s goal stated above of giving more people a “taste of the hidden treasures in the huge Library of Congress collection”.

Copyright

Flickr has this to say on The Commons’ home page about copyright:

These beautiful, historic pictures from the Library represent materials for which the Library is not the intellectual property owner. Flickr is working with the Library of Congress to provide an appropriate statement for these materials. It’s called “no known copyright restrictions.”

Hopefully, this pilot can be used as a model that other cultural institutions would pick up, to share and redistribute the myriad collections held by cultural heritage institutions all over the world.

I am with ArchivesNext in hoping that this move by the Library of Congress will give archivists and librarians on the ground in other institutions a bit more ammunition with which to fight for posting their images on Flickr. Copyright is one of the issues that seems to give so many organizations pause – so it is interesting to see this new category having been created specifically for cultural institutions. I like that they link back to the Library of Congress’s official answer about what it means if the catalog record notes ‘No known restrictions on publication’. Flickr also explicitly mentions that “If the pilot works – or, when it works! – we’ll look to allow other interested cultural institutions the opportunity to extend the application of “no known restrictions” to their catalogues.” So clearly “no known copyright restrictions” has been created with cultural institutions in mind.

Final Thoughts

I am intrigued to see how this progresses. If nothing else is accomplished, more people will certainly see images from the Library of Congress collections than they would have had none of these photos been published on Flickr. Some will even surf back to the Library of Congress website to learn more about their photo collections. For the example photo I selected above, there were already subject headings assigned – but for most of the Bain News Service photos all that is available are bits of “unverified data provided by the Bain News Service on the negatives or caption cards”. Every tag that is added improves the chances that an interested party may find the photo they need.

I have posted before about the potential of crowdsourcing. I am in favor of it. Yes, all the tags won’t be perfect. Yes, there will be seven different ways of tagging for World War II. But when all is said and done, more people will find more photos. More eyes will see the treasures that once were only available to those who could get inside temperature and humidity controlled vaults. And more people will have the opportunity to learn a tiny bit more about why cultural institutions like the Library of Congress are great!

Using WWI Draft Registration Cards for Research: NARA Records Provide Crucial Data

NARA: World War I photograph, 1918 (ARC Identifier: 285374)

In the HealthDay article Having Lots of Kids Helps Dads Live to 100, a recent study was described that examined what increased the chances of a man living past 100.

A young, trim farmer with four or more children: According to a new study, that’s the ideal profile for American men hoping to reach 100 years of age. The research, based largely on data from World War I draft cards, suggests that keeping off excess weight in youth, farming and fathering a large number of offspring all help men live past a century.

The article mentions that this research was “spurred by the fact that a treasure trove of information about 20th-century American males has now been put online”. The study was based out of the University of Chicago’s Center on Aging. The paper, New Findings on Human Longevity Predictors, includes the following reference:

Banks, R. (2000). World War I Civilian Draft Registrations. [database on-line]. Provo, UT, Ancestry.com.

With an account on Ancestry.com, you too could examine the online database of World War I Draft Registration Cards. This Ancestry.com page notes the source of the original data as:

United States, Selective Service System. World War I Selective Service System Draft Registration Cards, 1917-1918. Washington, D.C.: National Archives and Records Administration. M1509, 4,582 rolls

NARA’s page for the World War I Selective Service System Draft Registration Cards, M1509 includes similar background information to what can be found on the Ancestry.com page, but of course – no access to the actual records.

It is frustrating to see a study based on archival records making the news without it being made clear to the reader that archival records were the source for the research. As I discussed at length in my post Epidemiological Research and Archival Records: Source of Records Used for Research Fails to Make the News, I feel that it is very important to take every opportunity to help the general public understand how archival records are supporting research that impacts our understanding of the world around us. I appreciate that partnering with 3rd parties to get government records digitized is often the only option – but I want people to be clear about why those records still exist in the first place.

Photo Credit: US. National Archives, World War I Photographs, 1918. Army photographs. Battle of St. Mihiel-American Engineers returning from the front; tank going over the top; group photo of the 129th Machine gun Battalion, 35th Division before leaving for the front; views of headquarters of the 89th Division next to destroyed bridge; Company E, 314th Engineers, 89th Division, and making rolling barbed wire entanglements. NAIL Control Number: NRE-75-HAS(PHO)-65

SAA2007: Archives and E-Commerce, Three Case Studies (Session 404)

Diane Kaplan, of Yale University Library’s Manuscripts and Archives unit, started off Session 404 (officially titled Exploring the Headwaters of the Revenue Stream) by thanking everyone for showing up for the last session of the day. This was a one-hour session that examined ways to generate new funds through e-commerce. Three different e-commerce case studies were presented, followed by a short question and answer period.

University of Wyoming’s American Heritage Center

Mark Shelstad’s presentation, “Show Me the Money: Or: How Do We Pay for This?”, detailed the approach taken by the University of Wyoming’s American Heritage Center (AHC) to find alternate revenue streams. After completing a digitization project in the fall of 2004, the AHC had to figure out how to continue their project after their original grant money ran out.

Since they didn’t have a lot of in-house resources, they chose Zazzle.com for their effort to profit from their existing high resolution images. They can earn up to 17% from the sales through a combination of affiliate sales and profits from the sale of products featuring American Heritage Center images.

They had a lot of good reasons for choosing Zazzle.com. Zazzle.com already had an existing ‘special collections’ area, meaning that their images would have a better chance of being found by those interested in their offerings (for example – take a look at the Library of Congress Vintage Photos store). Zazzle.com also did not require an exclusive license to the images. The American Heritage Center Zazzle on-line store opened in 2005.

Currently they are making about $30 a month in royalties from 200 images. Mark pointed out that everyone needs to keep in mind that the major photo provider, Corbis, has yet to turn a profit in online photo sales. He also mentioned a website called Cogteeth.com that lets you click on any image and use it on t-shirts, mugs, etc.

Near the end of his talk, Mark shared an amazing idea to create a non-profit that would be a joint organization for featuring and selling products using archival images. I love it! It is easy to see that many archives are small and don’t have the infrastructure to create and run their own e-commerce websites. At the same time, general sites that let anyone set up a store to sell items with custom images on them threaten to lose the special nature of historical images in the shuffle. Even the special collections section of Zazzle lumps the American Heritage Center and the Library of Congress collections with Disney and Star Wars. I would love to see this idea grow!

Minnesota Historical Society

Kathryn Otto of the Minnesota Historical Society (MHS) spoke next. She first gave an overview of traditional services provided by MHS for a fee, such as photocopies, reader-printer copies, microfilm sales, media sales, inter-library loan fees, classes and photograph sales. MHS also earned income via standard use fees and research services.

The first e-commerce initiative at MHS was the sale of Minnesota State Death Certificates from 1904 – 2001. Made available via the Minnesota Death Certificate Index, they provide the same data as Ancestry.com, but the MHS index provides a better search interface. They have had users tell them that they couldn’t find something on Ancestry.com – but that they were able to find what they needed on the MHS site.

To their existing Visual Resources Database, MHS also added a buy button for most images. Extra steps were added into the standard buy process to deal with the addition of a use fee depending on how the purchaser claims the image will ultimately be used. One approach that did not work for them was to offer expensively printed pre-selected images. The historical society sells classes online and can handle member vs non-member rates. The Veterans Graves Registration Index is a tiny database that was created by reusing the interface used for the death certificates.

The Birth Certificate Index provides “single, non-certified copies of individual birth certificates reproduced from the originals” via the website, while “[o]fficial, certified copies of these birth certificates are available through the Minnesota Department of Health.” The MHS site provides much faster and easier service than the Department of Health, as can be seen from this page detailing how to order a non-certified copy of a birth record from the DOH – which requires printing, filling out and either faxing or snail mailing a form.

Features to keep in mind as you branch into e-commerce:

  • Statistics – Consider the types of statistics you want. Their system just gave them info about orders – not how much they made.
  • Sales tax – Figure out how it is handled.
  • Postage/Handling fees – Look at the details! The MHS Library-Archives was stuck with the Museum Store’s postage rates because the e-commerce system could not handle different fees for different types of objects.
  • Can’t afford credit card fees? Consider PayPal.
  • Advertise what you are selling on your own website.

Godfrey Memorial Library, Middletown, CT

The final panelist was Richard Black, Director of the Godfrey Memorial Library in Middletown, Connecticut. The Godfrey is a small, non-profit, genealogical research library with approximately 120,000 genealogical items. They currently have 5 full time staff and 60 volunteers.

Services they provide:

About 3 years ago they had exhausted all of their endowment money and faced the strong possibility of closing the doors. They were down to one full time librarian and a few volunteers and were dependent mostly on donations and some minor income from other sources/services.

They had only a few options open to them:

  • find more money from other sources
  • merge with another library
  • close the doors
  • sell some of the content
  • others??

The first approach to raise funds was to create a subscription website. The Godfrey acquired Heritage Quest census records and added other databases as resources allowed. Subscriptions were sold for $35 a year. The board thought they might be lucky to get 100 subscriptions, but they actually got approximately 14,000!

Now the portal provides access to sites for which a premium has been paid (so that subscribers don’t have to pay), sites that are available free on the Internet (but made easier to find) and sites unique to Godfrey, including digitized material in the library and other material that has been made available to them. They just added 95,000 Jewish grave-sites – brought to them by a local rabbi. Another recent addition was a set of transcriptions of a grave-site made as an Eagle Scout project. They also negotiated to have their books digitized for them for free. The company performing the digitization will pay a royalty to Godfrey as the books are used.

The costs to acquire data for the portal include $60,000 a year for access to premium sites, the cost to digitize and transcribe unique content (there are opportunities to partner and reduce costs) and the cost to acquire patrons. The efforts of the Godfrey staff and volunteers are ‘free’ – but cost time.

The Godfrey subsequently lost access to the Heritage Quest material. This was like taking the anchor store out of the corner of a mall. It forced them to diversify their revenue streams and watch for new opportunities.

Current revenue source distribution:

  • online portal 45%
  • annual appeal 10%
  • patron requests 5%
  • contract services 35% (OCLC analytical cataloging that they do)
  • misc 5%

The endowment funds have been restored and the Godfrey’s staff is now growing again.

Questions

Question: Did you meet resistance in your institutions?
Answer: No. Minnesota said they had such success that the 2 questions they hear now are A) What do we put online next? B) How long can they protect their income from the rest of the institution?

Question: (From someone from a NJ archives) Is there a way to do e-commerce with government records and not have the money ‘stolen’ from them?
Answer: Minnesota – The department of health was happy for the death and birth certificate business to go away. They do worry about the future when they might try to make a marriage index – because that territory is already ‘owned’ by a group that wants to keep that income.

Question: When you charge for use fees – are there people who don’t pay them?
Answer: Minnesota: Probably – no way to really know.
Mark (American Heritage Center): Our images are public domain – they can do what they like with them.

Question: Do you brand your images?
Answer: Mark: Yes, a logo and URL go with the images.

My Thoughts

I was particularly impressed by how much information was conveyed in the course of the one-hour session. My personal highlights were:

  • As I mentioned above, I want Mark’s idea for a non-profit to sell co-located products based on archival images to gain support and momentum.
  • I was pleased by the point that the MHS makes money from their Minnesota Death Certificate Index partly due to their improved and powerful search interface. The data is available elsewhere – but they made it easier to find information, so they will become the destination of choice for that information.
  • The Godfrey’s story is inspirational. In an age when we hear more and more often about archives and libraries being forced to cut back services due to funding shortfalls, it is great to hear about a small archives that pulled themselves back from the brink of disaster by brave experimentation.

These three case studies gave a great glimpse of some of the ways that archives can get on the e-commerce bandwagon. There is no magic here – just the willingness to dig in, figure out what can be done and try it. That said – there is definitely lots of room to learn from others’ successes and mistakes. The more real-world success and failure stories archives share with the archival community about how to ‘do’ e-commerce, the easier it will be for each subsequent project to be a success.

As is the case with all my session summaries from SAA2007, please accept my apologies in advance for any cases in which I misquote, oversimplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.