
Category: access

ISSUU: Interesting Platform for Online Publishing

Issuu, with the tag line ‘Read the world. Publish the world.’ and pronounced ‘issue’, gives anyone the ability to upload a PDF document and publish it as an online magazine. I am intrigued by the possibilities of using this service to publish digitized archival records – especially those that would lend themselves to a ‘book’ style presentation (thinking here of a ledger or equivalent).

I am not sure I totally understand the implications of the Issuu Terms of service… especially this part:

By distributing or disseminating Uploader Submissions through the Issuu Service, you hereby grant to Issuu a worldwide, non-exclusive, transferable, assignable, fully paid-up, royalty-free, license to host, transfer, display, perform, reproduce, distribute, and otherwise exploit your Uploader Submissions, in any media forms or formats, and through any media channels, now known or hereafter devised, including without limitation, RSS feeds, embeddable functionality, and syndication arrangements in order to distribute, promote or advertise your Uploader Submissions through the Issuu Service.

If I am following that properly, all the rights you are granting to the Issuu Service are only for the purposes of their distribution of your uploaded PDF.

Issuu has a special Copyright FAQ which, in combination with Peter Hirtle’s page on Copyright Term and the Public Domain in the United States, should support those trying to figure out if they can upload what they want to upload without getting into copyright-related hot water.

So how is it different from a plain old PDF? Take a look at the embedded Issuu viewer below showing a 1908 copy of The Colonial Book of The Towle Manufacturing Company Silversmiths.

I don’t think this would ever be the way you would want to give online access to digitized records in general – but I do think that this could be a great way to highlight a particularly impressive set or volume of documents. If an archives featured one of these a month on their homepage – would people subscribe to their RSS feed just to see the new one? On the actual page on which I found the above document, Issuu makes it easy to subscribe to the RSS feed for the Issuu author ‘silverlibrary’.

I don’t know why Issuu has decided that I must create an account before I may view document author silverlibrary’s user profile. I would hope that there was an elegant way for visitors to see a group of Issuu documents created by the same author without having to create an account first (or ever).

Want to know what others think? Take a look at Finally, a Web-based PDF Viewer That Does Not Suck (Issuu) over on TechCrunch. One interesting tidbit I picked up from that review is that Issuu is based in Denmark. I wonder what impact that has on which copyright rules apply to the documents uploaded into Issuu.

Want to read more about their vision? Of course they have a press release in the form of an Issuu publication and I have embedded it below. I think my favorite line is that Issuu is intended to be ‘YouTube for Publications’.

I would love to see a highlighted section created for ‘cultural heritage materials’ (or something like that anyway). Take a look around Issuu and let me know what you think. Is this a viable tool for an archives or manuscript collection to use to highlight parts of their collection?

LOC + Flickr equals Crowdsourced Tagging

Flickr/LOC: Lily Smith between 1910 and 1915 (LC-B2- 2350-8)

It is no surprise that the Library of Congress announcing the publication of images on Flickr is news both in mainstream news outlets and in the blogosphere. From librarian.net’s short and cheery LoC goes 2.0! post to ArchivesNext’s pondering Is Flickr “legitimate” for archives now that LOC is there?, I have seen a lot of discussion of LoC and Flickr in my RSS feeds.

What is it all about?

In case you have missed the details, the Library of Congress has published two photo collections on Flickr in a new subsection of the website called The Commons. The two collections are:

  • 1930s-40s in Color: 1615 photos taken by photographers working for the US government’s Farm Security Administration (FSA) and the Office of War Information (OWI) and covering “rural areas and farm labor, as well as aspects of World War II mobilization, including factories, railroads, aviation training, and women working between 1939 and 1944.”
  • News in the 1910s: 1500 photos taken by photographers who worked for the Bain News Service. Topics include “sports events, theater, celebrities, crime, strikes, disasters, and political activities, with a special emphasis on life in New York City.”

I enjoyed reading Flickr’s own blog post on the subject, Many hands make light work. It gave me a glimpse of their vision. For them, these two collections from the Library of Congress make up a pilot project – this is just the first step.

On their page for The Commons they first talk about their goals for the project:

Back in June of 2007, we began our first collaboration with a civic institution to facilitate giving people a voice in describing the content of a publicly-held photography collection.

The key goals of this pilot project are to firstly give you a taste of the hidden treasures in the huge Library of Congress collection, and secondly to show how your input of a tag or two can make the collection even richer.

On the homepage for the Library of Congress Flickr pilot I found this introduction:

The Library of Congress invites you to explore history visually by looking at interesting photos from our collections. Please add tags and comments, too! More words are needed to help more people find and use these pictures.

So, here we have a project between two large and well known organizations, with their goals carefully aligned. Let’s get more people looking at the amazing photos from the Library of Congress. Let’s also harness the curiosity and enthusiasm of those who want to be more involved and want to tag content. I love it!

Considering the Tags

So then I started looking at photos and the tags they have. I wish (being my database geek self) that I could see the groupings in which tags were added (i.e., that one person added tags 3 through 10). They don’t seem to be displayed alphabetically – but rather in the order in which they were added to the photo.

I considered this photo from the 1930s-40s in Color collection:

LOC Woman Airplane Photo

The list below shows all the tags that were assigned to it, in the order in which the tags are displayed beside the photo above on Flickr (listed separated by commas to preserve space). The ‘Library of Congress’ tag has already been assigned to every photo in the collections upon upload, and therefore always appears first:

Library of Congress, Long Beach, california, 1942, october, WW2, USA, aircraft, douglas, Palmer, WWII, women, manufacturing, yellow, stripes, overalls, engine, Douglas Aircraft, engine installation, military aviation, World War II, women at work, historical photographs, slide film, 4×5, large format, LF, transparency, transparencies, world war 2, technology

In a world with no controlled vocabulary, there seems to be a theory at work of covering all your bases. Even though someone had already tagged this photo ‘WW2’, it was later also tagged with ‘WWII’, ‘World War II’ and ‘world war 2’. On another photo in the collection I know I saw the tag ‘wwii’. As long as there is no ‘official’ version for this tag, I see the wisdom in tagging it with all of them – just to be sure.
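To make the ‘covering all your bases’ problem concrete, here is a minimal sketch of how variant tags could be folded into one preferred form. This is not anything Flickr or the Library of Congress does, as far as I know: the mapping table, the choice of ‘World War II’ as the preferred form, and the function names are all my own assumptions for illustration.

```python
import re

# Hypothetical lookup table: normalized variant -> preferred tag.
# "World War II" is an illustrative choice, not an official vocabulary term.
CANONICAL = {
    "ww2": "World War II",
    "wwii": "World War II",
    "world war ii": "World War II",
    "world war 2": "World War II",
}


def normalize(tag):
    """Lowercase a tag and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", tag.strip().lower())


def canonical_tags(tags):
    """Return tags in first-seen order, with known variants merged."""
    seen = set()
    result = []
    for tag in tags:
        # Use the preferred form if the variant is known, else keep the tag.
        preferred = CANONICAL.get(normalize(tag), tag.strip())
        key = normalize(preferred)
        if key not in seen:
            seen.add(key)
            result.append(preferred)
    return result


# A few of the tags from the photo discussed above.
photo_tags = ["Library of Congress", "WW2", "USA", "WWII",
              "women", "World War II", "world war 2"]
merged = canonical_tags(photo_tags)
```

With a table like this, a search for any one variant could be pointed at the same set of photos, without asking taggers to agree on a spelling up front.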

The official description of the photo is: “Women are trained to do precise and vital engine installation detail in Douglas Aircraft Company plants, Long Beach, Calif. (1942 Oct)”. The metadata provided by the Library of Congress also includes information about the format of the film itself.

These are the subject headings assigned by the Library of Congress catalogers:

  • Douglas Aircraft Company

  • Airplane industry

  • Women–Employment

  • World War, 1939-1945

  • Assembly-line methods

  • United States–California–Long Beach

It is interesting to note that the main things that the independent taggers have captured that the professional catalogers haven’t are either non-topical aspects of the image (‘yellow’ and ‘overalls’) or broader, more general ideas (‘military aviation’ and ‘technology’).

Does the tag ‘women at work’ tell you more than the LOC subject heading ‘Women–Employment’? Maybe, maybe not – but if you view all the images tagged ‘women at work’ across Flickr, now you can see these women from the 1940s at work beside photos such as three vendors and Bozo village life. Now this is something different. This is knitting threads from the ivory tower of libraries and archives into the communal tapestry that is Flickr. Not only might the addition of the ‘women at work’ tag make these images more accessible to the average person looking for Library of Congress photos – but it also puts these photos in the everyday path of many more people. It brings us firmly back to Flickr’s goal stated above of giving more people a “taste of the hidden treasures in the huge Library of Congress collection”.

Copyright

Flickr has this to say on The Commons’ home page about copyright:

These beautiful, historic pictures from the Library represent materials for which the Library is not the intellectual property owner. Flickr is working with the Library of Congress to provide an appropriate statement for these materials. It’s called “no known copyright restrictions.”

Hopefully, this pilot can be used as a model that other cultural institutions would pick up, to share and redistribute the myriad collections held by cultural heritage institutions all over the world.

I am with ArchivesNext in hoping that this move by the Library of Congress will give archivists and librarians on the ground in other institutions a bit more ammunition with which to fight for posting their images on Flickr. Copyright is one of the issues that seems to give so many organizations pause – so it is interesting to see this new category having been created specifically for cultural institutions. I like that they link back to the Library of Congress’s official answer about what it means if the catalog record notes ‘No known restrictions on publication’. Flickr also explicitly mentions that “If the pilot works – or, when it works! – we’ll look to allow other interested cultural institutions the opportunity to extend the application of “no known restrictions” to their catalogues.” So clearly “no known copyright restrictions” has been created with cultural institutions in mind.

Final Thoughts

I am intrigued to see how this progresses. If nothing else is accomplished, more people will certainly see images from the Library of Congress collections than they would have had none of these photos been published on Flickr. Some will even surf back to the Library of Congress website to learn more about their photo collections. For the example photo I selected above, there were already subject headings assigned – but for most of the Bain News Service photos all that is available are bits of “unverified data provided by the Bain News Service on the negatives or caption cards”. Every tag that is added improves the chances that an interested party may find the photo they need.

I have posted before about the potential of crowdsourcing. I am in favor of it. Yes, not all the tags will be perfect. Yes, there will be seven different ways of tagging for World War II. But when all is said and done, more people will find more photos. More eyes will see the treasures that once were only available to those who could get inside temperature and humidity controlled vaults. And more people will have the opportunity to learn a tiny bit more about why cultural institutions like the Library of Congress are great!

SAA2008 Here I Come! After the Revolution: Unleashing the Power of EAD

I got the word just before the holidays – the panel proposal of which I was a part has been accepted for SAA 2008 in San Francisco. The title of the panel is ‘After the Revolution: Unleashing the Power of EAD’ and the working title for my paper/presentation is ‘Visualizing Archival Collections: Leveraging the Power of EAD’.

My co-presenters are Max Evans (currently of the NHPRC, soon to be of the LDS Church Historical Department) and Elizabeth Yakel (of University of Michigan, School of Information). Jodi Allison-Bunnell from Northwest Digital Archives, Orbis Cascade Alliance is our panel Chair.

This is the description of our panel that we submitted with our proposal:

Encoded Archival Description (EAD) was created in 1995 to increase uniformity and interoperability of data about archival collections to facilitate discovery. It has yet to realize that goal: most online finding aids merely recreate paper documents. Speakers will demonstrate how the structured, standardized nature of EAD can form the basis of user-friendly interfaces and finding aids that can accommodate multiple perspectives and utilize graphical and visual interfaces–while faithfully recording and presenting the context, structure, and content of the collection. Panelists will also address the challenges of unleashing the power of EAD, including normalizing XML, the lack of standard values for cross-institutional aggregation of data, and different approaches to subject terms, with a discussion of the technological and practical issues that surround them. The session relates to the SAA strategic priorities of technology and public awareness and engages elemental questions of revolutionary and evolutionary change.
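As a rough illustration of what the ‘structured, standardized nature of EAD’ makes possible, here is a small sketch that pulls a few fields out of a finding aid with Python’s standard XML parser. The sample record is invented, and it is deliberately stripped down: a real EAD 2002 file also carries an eadheader and may use an XML namespace, both of which this sketch ignores. The function name and the choice of fields are my own assumptions, not part of any panelist’s project.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified finding aid using EAD 2002 element names.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate normal="1900/1950">1900-1950</unitdate>
      <physdesc><extent>12 linear feet</extent></physdesc>
    </did>
    <controlaccess>
      <subject>Agriculture</subject>
      <subject>World War, 1939-1945</subject>
    </controlaccess>
  </archdesc>
</ead>
"""


def summarize(ead_xml):
    """Extract title, normalized dates, extent and subjects from one record."""
    root = ET.fromstring(ead_xml)
    did = root.find("./archdesc/did")
    return {
        "title": did.findtext("unittitle"),
        "dates": did.find("unitdate").get("normal"),
        "extent": did.findtext("physdesc/extent"),
        "subjects": [s.text for s in root.iter("subject")],
    }


summary = summarize(EAD_SAMPLE)
```

Once fields like these are extracted from many finding aids, they can feed an interface other than a recreated paper document, which is also where the challenges named in the description (normalizing XML, inconsistent values across institutions) start to bite.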

My portion of the panel will focus on my ArchivesZ information visualization project. I will be discussing both the power of this type of graphical interface to archival collections as well as addressing the roadblocks to their practical implementation. My plan is to continue the work I started last Spring over the course of this Spring and Summer – and show off a new version of ArchivesZ in San Francisco (as well as online here of course!).

Here are the descriptions of Max, Elizabeth and Jodi’s planned contributions (cribbed from our proposal submission):

  • Max Evans will explore the fundamental purposes of finding aids and explore what can be done to leverage EAD’s structure to render graphical, informative, and elegant finding aids online.
  • Elizabeth Yakel will discuss usability test findings and how these were incorporated into the EAD-based Polar Bear Expedition Digital Collections to allow communities to engage with collections in new ways.
  • Jodi Allison-Bunnell brings a lively interest in user-centered presentations of finding aids that emerge from her work as manager of a five-state EAD consortium.

I am so pleased and excited. So – who is planning on going to San Francisco in August? I hope to see you there.

Image Credit: Society of American Archivists, ARCHIVES 2008: Archival R/Evolution & Identities web page.

Digital Preservation via Emulation – Dioscuri and the Prevention of Digital Black Holes

Available Online posted about the open source emulator project Dioscuri back in late September. In the course of researching Thoughts on Digital Preservation, Validation and Community I learned a bit about the Microsoft Virtual PC software. Virtual PC permits users to run multiple operating systems on the same physical computer and can therefore facilitate access to old software that won’t run on your current operating system. That emulator approach pales in comparison with what the folks over at Dioscuri are planning and building.

On the Digital Preservation page of the Dioscuri website I found this paragraph on their goals:

To prevent a digital black hole, the Koninklijke Bibliotheek (KB), National Library of the Netherlands, and the Nationaal Archief of the Netherlands started a joint project to research and develop a solution. Both institutions have a large amount of traditional documents and are very familiar with preservation over the long term. However, the amount of digital material (publications, archival records, etc.) is increasing with a rapid pace. To manage them is already a challenge. But as cultural heritage organisations, more has to be done to keep those documents safe for hundreds of years at least.

They are nothing if not ambitious… they go on to state:

Although many people recognise the importance of having a digital preservation strategy based on emulation, it has never been taken into practice. Of course, many emulators already exist and showed the usefulness and advantages it offer. But none of them have been designed to be digital preservation proof. For this reason the National Library and Nationaal Archief of the Netherlands started a joint project on emulation.

The aim of the emulation project is to develop a new preservation strategy based on emulation.

Dioscuri is part of Planets (Preservation and Long-term Access via NETworked Services) – run by the Planets consortium and coordinated by the British Library. The Dioscuri team has created an open source emulator that can be ported to any hardware that can run a Java Virtual Machine (JVM). Individual hardware components are implemented via separate modules. These modules should make it possible to mimic many different hardware configurations without creating separate programs for every possible combination.
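The modular idea is easier to see in a toy example. The sketch below is not Dioscuri’s actual design or API (Dioscuri is written in Java, and I have not studied its code); the class names and the tiny interface are hypothetical. It only shows how one emulator shell can be assembled from interchangeable component modules, so that a different hardware configuration is a different list of modules instead of a different program.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """One emulated hardware component (hypothetical interface)."""

    @property
    @abstractmethod
    def name(self):
        """Short identifier used to look the module up."""

    @abstractmethod
    def reset(self):
        """Return the component to its power-on state."""


class Memory(Module):
    def __init__(self, size):
        self.size = size
        self.cells = bytearray(size)

    @property
    def name(self):
        return "memory"

    def reset(self):
        self.cells = bytearray(self.size)


class Keyboard(Module):
    def __init__(self):
        self.buffer = []

    @property
    def name(self):
        return "keyboard"

    def reset(self):
        self.buffer.clear()


class Emulator:
    """Shell that knows nothing about specific hardware, only modules."""

    def __init__(self, modules):
        self.modules = {m.name: m for m in modules}

    def reset(self):
        for module in self.modules.values():
            module.reset()


# Two hardware configurations built from the same parts.
small_pc = Emulator([Memory(1024), Keyboard()])
large_pc = Emulator([Memory(4096), Keyboard()])
```

Supporting a new piece of hardware in this arrangement means writing one new module; the shell and the existing modules stay as they are.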

You can get a taste of the big thinking that is going into this work by reviewing the program overview and slide presentations from the first Emulation Expert Meeting (EEM) on digital preservation that took place on October 20th, 2006.

In the presentation given by Geoffrey Brown from Indiana University titled Virtualizing the CIC Floppy Disk Project: An Experiment in Preservation Using Emulation I found the following simple answer to the question ‘Why not just migrate?’:

  • Loss of information — e.g. word edits

  • Loss of fidelity — e.g. WordPerfect to Word isn’t very good

  • Loss of authenticity — users of migrated document need access to original to verify authenticity

  • Not always possible — closed proprietary formats

  • Not always feasible — costs may be too high

  • Emulation may [be] necessary to enable migration

After reading through Emulation at the German National Library, presented by Tobias Steinke, I found my way to the kopal website. With their great tagline ‘Data into the future’, they state their goal is “…to develop a technological and organizational solution to ensure the long-term availability of electronic publications.” The real gem for me on that site is what they call the kopal demonstrator. This is a well thought out Flash application that explains the kopal project’s ‘procedures for archiving and accessing materials’ within the OAIS Reference Model framework. But it is more than that – if you are looking for a great way to get your (or someone else’s) head around digital archiving, software and related processes – definitely take a look. They even include a full Glossary.

I liked what I saw in Defining a preservation policy for a multimedia and software heritage collection, a pragmatic attempt from the Bibliothèque nationale de France, a presentation by Grégory Miura, but felt like I was missing some of the guts by just looking at the slides. I was pleased to discover what appears to be a related paper on the same topic presented at IFLA 2006 in Seoul titled: Pushing the boundaries of traditional heritage policy: Maintaining long-term access to multimedia content by introducing emulation and contextualization instead of accepting inevitable loss. Hurrah for NOT ‘accepting inevitable loss’.

Vincent Joguin’s presentation, Emulating emulators for long-term digital objects preservation: the need for a universal machine, discussed a virtual machine project named Olonys. If I understood the slides correctly, the idea behind Olonys is to create a “portable and efficient virtual processor”. This would provide an environment in which to run programs such as emulators, but isolate the programs running within it from the disparities between the original hardware and the actual current hardware. Another benefit to this approach is that only the virtual processor need be ported to new platforms rather than each individual program or emulator.

Hilde van Wijngaarden presented an Introduction to Planets at EEM. I also found another introductory level presentation that was given by Jeffrey van der Hoeven at wePreserve in September of 2007 titled Dioscuri: emulation for digital preservation.

The wePreserve site is a gold mine for presentations on these topics. They bill themselves as “the window on the synergistic activities of DigitalPreservationEurope (DPE), Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval (CASPAR), and Preservation and Long-term Access through NETworked Services (PLANETS).” If you have time and curiosity on the subject of digital preservation, take a glance down their home page and click through to view some of the presentations.

On the site of The International Journal of Digital Curation there is a nice ten page paper that explains the most recent results of the Dioscuri project. Emulation for Digital Preservation in Practice: The Results was published in December 2007. I like being able to see slides from presentations (as linked to above), but without the notes or audio to go with them I am often left staring at really nice diagrams wondering what the author’s main point was. The paper is thorough and provides lots of great links to other reading, background and related projects.

There is a lot to dig into here. It is enough to make me wish I had a month (maybe a year?) to spend just following up on this topic alone. I found my struggle to interpret many of the PowerPoint slide decks that have no notes or audio very ironic. Here I was hunting for information about the preservation of born digital records and I kept finding that the records of the research provided didn’t give me the full picture. With no context beyond the text and images on the slides themselves, I was left to my own interpretation of their intended message. While I know that these presentations are not meant to be the official records of this research, I think that the effort obviously put into collecting and posting them makes it clear that others are as anxious as I am to see this information.

The best digital preservation model in the world will only preserve what we choose to save. I know the famous claim on the web is that ‘content is king’ – but I would hazard to suggest that in the cultural heritage community ‘context is king’.

What does this have to do with Dioscuri and emulators? Just that as we solve the technical problems related to preservation and access, I believe that we will circle back around to realize that digital records need the same careful attention to appraisal, selection and preservation of context as ‘traditional’ records. I would like to believe that the huge hurdles we now face on the technical and process side of things will fade over time due to the immense efforts of dedicated and brilliant individuals. The next big hurdle is the same old hurdle – making sure the records we fight to preserve have enough context that they will mean anything to those in the future. We could end up with just as severe a ‘digital black hole’ due to poorly selected or poorly documented records as we could due to records that are trapped in a format we can no longer access. We need both sides of the coin to succeed in digital preservation.

Did I mention the part about ‘Hurray for open source emulator projects with ambitious goals for digital preservation’? Right. I just wanted to be clear about that.

Image Credit: The image included at the top of this post was taken from a screen shot of Dioscuri itself, the original version of which may be seen here.

Will Crashed Hard Drives Ever Equal Unlabeled Cardboard Boxes?

Photo of Crashed Hard Drive - wonderferret on Flickr

How many of us have an old hard drive hanging around? I am talking about the one you were told was unfixable. The one that has 3 bad sectors. The one they replaced and handed to you in one of those distinctive anti-static bags. You know the ones I mean – the steely grey translucent plastic ones that look like they should contain space food.

I have more than one ‘dead’ hard drive. I can’t quite bring myself to throw them out – but I have no immediate plans to try and reclaim their files.

I know that there are services and techniques for pulling data off otherwise inaccessible hard drives. You hear about it in court cases and see it on TV shows. A quick Google search on hard drive rescue turns up businesses like Disk Data Recovery.

Do archivists already make it a policy to hunt not just for computers, but for discarded and broken hard drives lurking in filing cabinets and desk drawers? Compare this to a carton of documents that needs special treatment to permit access to the records it contains and yet is appraised as valuable. If the treatment required were within budgetary and time constraints – it would be performed. Mold, bugs, rusty staples, photos that are stuck together… archivists generally know where to get the answers they need to tackle these sorts of problems. I suspect that a hard drive advertised or discovered to be broken would be treated more like an empty box than a moldy box.

For now I would stack this challenge near the bottom of the list, below archiving digital records that we can access easily but that depend on old hardware or software. Still, I can imagine a time when standard hard drive rescue techniques will need to be a tool for the average archivist.

Using WWI Draft Registration Cards for Research: NARA Records Provide Crucial Data

NARA: World War I photograph, 1918 (ARC Identifier: 285374)

The HealthDay article Having Lots of Kids Helps Dads Live to 100 describes a recent study that examined what increases a man’s chances of living past 100.

A young, trim farmer with four or more children: According to a new study, that’s the ideal profile for American men hoping to reach 100 years of age. The research, based largely on data from World War I draft cards, suggests that keeping off excess weight in youth, farming and fathering a large number of offspring all help men live past a century.

The article mentions that this research was “spurred by the fact that a treasure trove of information about 20th-century American males has now been put online”. The study was based out of the University of Chicago’s Center on Aging. The paper, New Findings on Human Longevity Predictors, includes the following reference:

Banks, R. (2000). World War I Civilian Draft Registrations. [database on-line]. Provo, UT, Ancestry.com.

With an account on Ancestry.com, you too could examine the online database of World War I Draft Registration Cards. This Ancestry.com page notes the source of the original data as:

United States, Selective Service System. World War I Selective Service System Draft Registration Cards, 1917-1918. Washington, D.C.: National Archives and Records Administration. M1509, 4,582 rolls

NARA’s page for the World War I Selective Service System Draft Registration Cards, M1509 includes similar background information to what can be found on the Ancestry.com page, but of course – no access to the actual records.

It is frustrating to see a study based on archival records that is making the news, but that does not make it clear to the reader that archival records were the source for the research. As I discussed at length in my post Epidemiological Research and Archival Records: Source of Records Used for Research Fails to Make the News, I feel that it is very important to take every opportunity to help the general public understand how archival records are supporting research that impacts our understanding of the world around us. I appreciate that partnering with third parties to get government records digitized is often the only option – but I want people to be clear about why those records still exist in the first place.

Photo Credit: US. National Archives, World War I Photographs, 1918. Army photographs. Battle of St. Mihiel-American Engineers returning from the front; tank going over the top; group photo of the 129th Machine gun Battalion, 35th Division before leaving for the front; views of headquarters of the 89th Division next to destroyed bridge; Company E, 314th Engineers, 89th Division, and making rolling barbed wire entanglements. NAIL Control Number: NRE-75-HAS(PHO)-65