access | Spellbound Blog

Political Campaign Ads from the NBC News Archives Find New Audience on Hulu.com

October 28, 2008 2 Comments

Thinking about politics, but waxing nostalgic for the good old days of movie stars and snappy jingles? Surf over to Hulu.com’s new gallery of Historic Campaign Ads. These are from iCue, which bills itself as “A fun, innovative learning environment built around the video from the NBC News Archives”.

And what would a political video blog post be without a political video? If you don’t see the video below, you can click through to view the I Like Ike ad from 1954 I chose for your viewing pleasure.

This is a great example of finding new audiences for material from archives. In this case, I had to dig for a while to discover that these were from the NBC News Archives. The Hulu iCue network/studio home page doesn’t really tell me anything – but you can imagine using a page like this to supply more information if you wanted to stress the archival origin of a set of videos.

Celebrating NASA’s 50th with NASA Images

September 25, 2008 1 Comment

October 1st, 2008 marks the 50th anniversary of the National Aeronautics and Space Administration (NASA). It is interesting to contrast the feel of the NASA 50th Anniversary Web Site with the 15 NARA/NASA videos currently posted on Google Video, but my favorite site for celebrating NASA’s 50 year journey is the amazing NASA Images website.

I learned about this site on a tour of the Internet Archive while out in California for SAA 2008. It contains still and moving images from across NASA. For the first time these visual materials have been pulled together and methodically assigned appropriate metadata. This means that you can do really nice advanced searches and faceted browsing of search results. Before this effort, there was no standardized set of attributes describing these visual materials being created across NASA.

The NASA Images about page explains:

NASA Images is a service of Internet Archive ( www.archive.org ), a non-profit library, to offer public access to NASA’s images, videos and audio collections. NASA Images is constantly growing with the addition of current media from NASA as well as newly digitized media from the archives of the NASA Centers.

The goal of NASA Images is to increase our understanding of the earth, our solar system and the universe beyond in order to benefit humanity.

It goes on to explain that the Internet Archive receives no financial support from NASA and that NASA Images is currently funded through a grant from the Kahle-Austin Foundation. They are currently looking for new grants and sponsorships to fund upcoming projects.

Also, according to their published Terms and Conditions, they have made an effort to only include non-copyrighted images (though they don’t guarantee it). This is an amazing wealth of images and movies available for public use. The terms state that “You may use this NASA imagery for educational or informational purposes, including photo collections, textbooks, public exhibits and Internet Web pages (personal or otherwise).”

I have embedded below footage of plumes of hot gas shooting across the surface of the sun. Check out this photo of the original 7 astronauts in their very shiny spacesuits from 1968. Happy Golden Anniversary NASA!

Image Credit: NASA 50th Anniversary logo designed by Crabtree + Company. Read more about Crabtree + Company’s thoughts behind the creation of the logo.

SAA2008: Revealing Archival Collections at the Web’s Surface (Session 102)

September 2, 2008 2 Comments

The official title of Session 102 was We’re Not the Destination, We’re the Journey: Revealing Archival Collections at the Web’s Surface. If you attended this session or don’t want to read through the details, you can skip to the end and just read my thoughts on this session.

California Digital Library

The first presentation was by Lena Zentall of the California Digital Library (CDL). I believe it was titled something like “Untitled <snappy name here>”. CDL is increasing visibility of primary sources by targeting primary sources to specific audiences. Lena described how they view the URL as a line to reel in new audiences. She started with an overview of how archival content traditionally makes its way online.

Start with a box -> described by finding aids -> digital copies of finding aids put online and cherry picked individual items are digitized to be featured online.

Two Audiences, Two Sites

CDL has taken a new approach. They have two sites for two very different audiences:

Online Archive of California (OAC): presents both finding aids and digitized primary sources and targets archivists, historians & researchers
Calisphere – only takes primary sources (for now) and targets k-12 teachers, lifelong learners, and undergraduates

Collections can have a home in several places. For example, the items about the Chinese in California can be found in:

OAC: Guide to the Chinese in California Virtual Collection
Calisphere: As a subset of the California Cultures: Asian Americans collection, including the Chinese Exclusion Act
Library of Congress American Memory: The Chinese in California, 1850-1925

Calisphere has created themed collections to highlight superstar digital objects. They pull images out of the finding aids and rearrange them for the target audience. These images are hand picked and associated with an essay. They pick striking objects with good metadata. This is what their audience wants – the teachers asked for it. Another example themed collection is the Goldrush Murder & Mayhem collection which includes this photo of the “old time San Francisco pickpocket” Jennie Hastings.

Hidden Gems: Untitled and No Metadata

The next part of the presentation discussed what happens to items that are untitled and associated with no metadata. Lena showed us the results of searching the OAC images for untitled. I found 12,315 items when I did this search. They really only live in the context of the finding aid. Of course the challenge is that people use words to find images. These hidden gems can be helped by inheriting the metadata of their parent container (such as collection level information) when there is nothing else.
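To make the inheritance idea concrete, here is a minimal Python sketch. It is not how OAC or Calisphere actually work; the field names, the sample records and the rule that a title of "Untitled" counts as empty are all my own assumptions for illustration.

```python
def is_empty(value):
    """Treat missing, blank, or 'Untitled' values as having no metadata (an assumption)."""
    return value is None or str(value).strip().lower() in ("", "untitled")


def with_inherited_metadata(item, collection):
    """Copy collection-level values into any item field that is empty."""
    result = dict(item)
    inherited = []
    for field, value in collection.items():
        if is_empty(item.get(field)):
            result[field] = value
            inherited.append(field)
    # Record which fields came from the parent so they are not mistaken
    # for item-level description later.
    result["inherited_fields"] = sorted(inherited)
    return result


# Hypothetical records, invented for this example.
collection = {
    "title": "Example County Photograph Collection",
    "subject": "Gold mining",
    "date": "1850-1870",
}
item = {"id": "img-001", "title": "Untitled", "subject": ""}

record = with_inherited_metadata(item, collection)
print(record["subject"])           # Gold mining
print(record["inherited_fields"])  # ['date', 'subject', 'title']
```

Keeping a list of the inherited fields means a search index can match the item on collection-level words while a display page can still show that the item itself has no title.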

3 Approaches

Digitize and release content to the web: low effort (after infrastructure is set up), very high return on investment. Over 40% of Calisphere traffic is generated by Google searches… but when users follow the link from Google, they then find the rich context.
Align with other aggregators: low/medium effort, medium return. Calisphere content is also being pulled into aggregators. They can also pull back new data that is added by 3rd party partners – such as reading level added on a teacher site. These are three examples of Murder and Mayhem content in three different partner sites:
- CLRN: Murder and Mayhem – (California Learning Resources Network)
- EdZone: Murder & Mayhem
- OER: Murder and Mayhem (Open Educational Resources) lets users add tags and search by keywords
Cherry-picking the best items: high effort, promising returns – but it is also harder to measure the returns

Finding New Audiences and New Volunteers

The next step is to reach beyond standard cultural and education venues and move into different areas of the internet. For example, the CDL added links to Wikipedia. The perception of those involved with this effort was that it was a very convoluted process with lots of mysterious rules. They were unsure if the links would remain in place. It sometimes seemed like a lot of work when the links might just be removed. They added 33 links and found 53 links made by others not affiliated with the CDL. On the plus side, links like this put the digital objects in a very specific context. Traffic initiated from these Wikipedia entries is almost certainly individuals seeking detailed information in the specific topic they are researching.

The next frontier involves blogs. CDL digital items are now featured in blogs, but soon CDL will be creating a blog for Calisphere to tell the story behind individual pictures. The final stop for this talk was an inspirational blog: Mustaches of the Nineteenth Century. This blog was presented as a way to achieve the fame that primary sources dream about.

Library of Congress

The second presentation, by Helena Zinkham from the Library of Congress Prints and Photographs division, was titled “The New Friends for Old Photos – putting pictures in your path with the Flickr commons and Web 2.0”. This talk focused on the pilot project of putting Library of Congress photos on Flickr in the new Flickr Commons.

People who want photos don’t think of libraries or archives. They go to museums and stock photo agencies. Helena wants to help people realize that archives are a great source of images.

There has been increasing progress with hidden collections. Lots of digitization and work with metadata has been done to help items make their way online. But this raises the question of whether we are just creating new hidden collections in corners of the Internet that the average person will never come in contact with: collections like ArchiveGrid, DLF Aquifer, and OAC. The descriptions need to get out of the catalogs. Most people find content on the web, so we need to put the images on the web in the path of the users.

The Flickr commons satisfied Helena’s desire to pull people in from Flickr back to discover the catalog world of archives. Flickr can be considered a virtual reading room and platform for a virtual volunteer corps. Helena showed the example of the image Weavers at Work. The comments on this photo included:

information that photo is of blind women weaving rugs
the photographer’s great grandchild identified the photographer as Percy Byron
the start of a discussion about what the cabinet or instrument might be shown to the far right of the photo

These commenters are new friends worth making!

Pros of Web 2.0

make collection available
gain information about collections – participatory description
increase the visibility of specific photos
win support for cultural heritage organizations

Risks of Web 2.0

disrespect for collections (smart aleck chat)
loss of meaning
reduce revenue from photo sales
excludes undigitized collections
higher costs (more money and time)
less chance for us to have fun as history detectives – other people are doing ‘our’ work

Helena will post info about the nuts and bolts on the SAA site, but she also directed the audience to Powerhouse Museum’s Commons on Flickr First 3 Months Report, which describes their experience.

Flickr Basics

Helena asked the session attendees who was familiar with Flickr. Most of the room raised their hands. Who has accounts? Still a good number. Who is adding archival content? A sprinkling of hands were raised.

Helena then explored Flickr basics and showed off the following neat search examples:

A search for germany schaefer in Google finds Flickr photos (as well as Flickr photo comments). The LOC Germany Schaefer photo was returned 4th on my list when I did the search when writing up this post.
A search for houston house within Flickr commingles old and current photos

Logistics and Statistics

The LOC liked Flickr and felt it was a good fit because photographs are the main focus of the site. They did need one big change. Because LOC is not the owner or photographer (unlike most photo contributors), they needed a way to express that clearly. Flickr responded by creating The Commons. They also created a new rights statement of ‘no known copyright restrictions’ for members of The Commons to use. This is different from public domain. Flickr also appears (based on my hunt through the links) to permit each institution in The Commons to link to their own explanation about what they mean by ‘no known copyright restrictions’. LOC deep links to a specific section of their Copyright and Other Restrictions page for Prints & Photographs. George Eastman House has a special George Eastman House & The Commons on Flickr page about copyright, as does the Brooklyn Museum.

Statistics from the first 6 months on Flickr:

3,500 LOC photos posted
8 million views
30,000 favorites for 80% of the photos
14,000 Flickr members made LOC a contact
5,000 comments (3,300 people)
12,500 unique tags (59,000 total)
500 catalog records updated – Helena indicated that this could be considered a new kind of backlog, “but a backlog you can come to like”
20% increased traffic to p&p online catalog

There are 30,000 more photos from Bain News Service on the way, but they are only adding fifty photos a week. This number was recommended by Flickr as the largest they would want to push at any one time. This goes back to the tolerance of people who have Flickr in their friend photo stream. Fifty photos is about as many as people want to get at any one time. More than that and you increase the likelihood that people would remove you from their stream rather than be overwhelmed. They would have no chance to really look at more than that.
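It is worth doing the arithmetic on that pace. The Python sketch below uses the two numbers from the talk (30,000 photos, fifty a week) and a hypothetical helper for cutting a backlog into weekly batches. It is only an illustration of the pacing, not a description of how LOC schedules its uploads.

```python
import math


def weeks_to_publish(total_photos, per_week):
    """Number of weekly batches needed to release every photo."""
    return math.ceil(total_photos / per_week)


def weekly_batches(photo_ids, per_week):
    """Yield successive batches of at most per_week photo ids."""
    for start in range(0, len(photo_ids), per_week):
        yield photo_ids[start:start + per_week]


weeks = weeks_to_publish(30_000, 50)
years = weeks / 52
print(weeks)            # 600
print(round(years, 1))  # 11.5
```

At fifty a week the Bain backlog alone would take 600 weeks, or roughly eleven and a half years, which shows how much the tolerance of the audience shapes the release schedule.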

Contributors to The Commons can choose which features to enable. For example, the Portrait of Hine as small child standing by drum shows how George Eastman House chooses to send people back to their institution for prints.

How much does it cost?

a Flickr pro account costs $24.95 a year
digitization costs
time: daily moderation on the account – LOC checks every day for uncivil discourse which takes about 10 minutes
15-20 hours a week to pull data from comments to update metadata

Flickr Comments

One of the greatest parts of this presentation was the examination of ways in which Flickr users contributed through comments. Here are some examples:

Auto Polo – the comments include a link to an auto polo thread on the Jalopy Journal’s message board which includes newspaper images and an extended discussion.
Sylvia Sweets Tea Room – includes a very extensive history of the business added by the daughter of the original proprietor
Negro boy near Cincinnati, Ohio – the comments include a deep conversation about the title of the photo and the context of this title at the time it was taken (1942 or 1943).
Jones Barn where dynamite was found – Flickr members found the context and news article to go with this photo
Al Palzer – this photo’s original title was Al Palser – but the misspelling was pointed out in the comments. The comments also include a response from the LOC noting that the boxer’s name would be updated in the original catalog record.

Other Promotion Approaches

The Library of Congress has now started linking out from the LOC catalog entries to the Flickr image so that it is easy for users to discover any conversations associated with the Flickr version. Powerhouse museum has a Photo of the Day blog to highlight images from their collection. The Brooklyn Museum encourages people to upload photos of things happening in Brooklyn. Then and now photos can be taken – in this case see factory buildings in Lowell, Massachusetts in December 1940/January 1941 and then again in January of 2008.

The key to 2.0 is frequent, new content and interaction from archival staff. Helena is open to new ideas about how to use Flickr and closed with saying that Web 2.0 is right in our path.

Questions and Answers

Question: What is their view of the accuracy/inaccuracy of user generated tags and comments?

Answer: A study done in the past compared the accuracy of official cataloging to comments – even if people make mistakes, others will correct them. LOC has a ‘hands off’ policy to not delete/change stuff unless it is defamatory or spam. Only 3 instances of this so far. LOC is citing the source as ‘Flickr commons’ and also includes commenters’ sources – which are actually a lot more varied than you might expect (like the Jalopy Journal).

Question: Are you worried about an increased demand on staff time as you add more photos?

Answer: Yes.. there will be an increase in demand.. but the Flickr comments are there and since LOC is adding links back out to those records they are available for researchers even if they are not added to the original catalog record. Maybe they need more staff? depends on goals. Could work with expert teams and look for ‘formal trusted’ volunteers. A great example was the baseball history association who took photos and contributed expert information in a spreadsheet (if I heard correctly they gave LOC a spreadsheet identifying team, game, date and opponent for more than 3000 photos).

Question: Isn’t the link from the LOC catalog record to Flickr enough? Why update the LOC catalog records at all?

Answer: They are really only updating when it is a mistake (like Palser’s name mistake). Flickr also provides APIs and LOC pulls all the comments and tags into external database so that LOC can choose how to use the information over time.

Question: What are your thoughts and concerns about the longevity of Flickr as a platform?

Answer: What grows fast can die fast. Their perspective: Flickr is a copy.. and LOC has an extract of all the tags and comments – nothing lost if it disappears.

Question: Calisphere: how do they work with teachers to learn their needs and their satisfaction with the work that is done?

Answer: They hired Berkeley experts to talk to teachers about what they wanted. They used interviews and created personas to capture the audience needs. Targeting the K-12 audience was about succeeding by being clear about who their audience was. Teachers used to print out images, but now they do more with PowerPoint and iPods plugged into TV in the classroom. The teachers say they are happy with the theme collections and they want more. They have an advisory board with teachers.. they use surveys and watch the bboards.

Question: Is there a crossover between Calisphere and OAC users?

Answer: They almost didn’t cross link to the finding aids from within Calisphere.. but they decided the information was so important. Reason for the upcoming blog – want to tell the story behind the photos.

Question: Do they have analytics/evidence of pulling people back to their sites?

Answer: Yes.. they can see increases in usage from everything they have done.

Question: When you download the comments – are they dated so you can only look at the new ones? How hard was it to change the title in your catalog?

Answer: Everything is time/date stamped when you pull info out of Flickr. Quick and easy to update.. 10 minutes per picture to do the updates.. Flickr members are doing a great job with citations.
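Since everything pulled from Flickr carries a date stamp, picking out only the new material can be as simple as comparing against the time of the last pull. Here is a small Python sketch of that idea. The record layout, the placeholder comments and the `new_since` helper are my own inventions for illustration. They are not Flickr's API response format and not LOC's actual database.

```python
from datetime import datetime, timezone


def new_since(records, last_pull):
    """Return records created after last_pull, oldest first."""
    fresh = [r for r in records if r["created"] > last_pull]
    return sorted(fresh, key=lambda r: r["created"])


# Hypothetical extract of comments and tags, already saved locally.
pulled = [
    {"photo": "photo-1", "kind": "comment", "text": "placeholder comment A",
     "created": datetime(2008, 8, 1, 9, 0, tzinfo=timezone.utc)},
    {"photo": "photo-2", "kind": "tag", "text": "placeholder-tag",
     "created": datetime(2008, 8, 20, 15, 30, tzinfo=timezone.utc)},
    {"photo": "photo-1", "kind": "comment", "text": "placeholder comment B",
     "created": datetime(2008, 8, 12, 12, 0, tzinfo=timezone.utc)},
]

last_pull = datetime(2008, 8, 10, tzinfo=timezone.utc)
to_review = new_since(pulled, last_pull)
for record in to_review:
    print(record["created"].date(), record["kind"], record["text"])
```

Only the two records stamped after August 10 come back, in date order, so staff reviewing comments never have to reread what they have already seen.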

Question: Do you have advice about how to get historical society folks who are concerned about losing the admission fee for people coming in to do research on board with these web 2.0 approaches?

Answer: You show them alternative revenue streams. In the museum world .. they realized that they weren’t making money from reproductions and a change is in process to let people use images for publishing.. all about improving the brand recognition. Helena: I would love ideas from people using Flickr.. and to hear from people who are dealing with multiple audiences.

Question: Have you had complaints? Any specifically from copyright holders?

Answer: Yes.. they have had complaints.. one “Why haven’t you cleaned up the photos?” LOC position is to provide the version they have.. and it is up to others to cleanup and do what they like with the photos. They also point out that instead of perfecting photos, they are spending money on providing access to more photos.

Question: Expectations of service. Are people expecting that if they ask a question about a photo that they will get an answer from a LOC representative?

Answer: Do you have to respond to everyone who asked to be a contact? No.. perhaps different expectations for institutions. They currently add a comment when they are updating the original catalog records. Might acknowledge big contributors (more than 10 photos) at the end of the pilot via a direct e-mail to individuals.

Question: Have people complained about rights – that is my grandmother.. don’t put it on the web?

Answer: No. They do have a policy in place. Most people are ‘pleased as punch’ to learn that their family heritage is alive and well. OAC: They haven’t had anyone ask to take the content down. In the case that people provide feedback for updates – since OAC is an aggregation of items from so many institutions – they have to pass corrections info along to original keeper of the metadata and leave it in their hands to do updates.

Question: Is there a fear that interest will decrease as more photos are added to the commons?

Answer: Bloggers on the web were in love with the idea that the photos would go into Flickr. There was a big peak at the start – but views and comments are still steady (but smaller). The more additions.. more communities that will be touched. The Powerhouse Museum experienced a tripling of their traffic after posting images in the Flickr Commons.

Question: Have people come into the reading room because of the Flickr pilot?

Answer: Maybe? We don’t know. Lena said she did!

Question: Are we teaching the teachers how to teach with photos?

Answer: Calisphere has provided links to info about using primary sources and analysis tools.. resources for teachers. (Follow-up: Are they clicking those links? Good question!)

Question: Are you contacting the people who post negative comments?

Answer: Yes.. and most of them were more spam.

My Thoughts

Culture of Online Communities

There are a few different ideas I wanted to share related to the material from this presentation. First, I noticed that the online cultures of both Flickr and Wikipedia were called out as having a clear impact. They are in fact two very different communities. In the case of the LOC and Flickr we heard that part of what seemed to keep the comments constructive and friendly was that Flickr’s users strive to keep a ‘play nice’ atmosphere in place. In contrast, we heard that Wikipedia was perceived as confusing and unpredictable when the CDL staff was updating pages to add links back to their primary sources. They never felt certain that the links they were working so hard to add wouldn’t be removed the next day.

These are just two examples of ways in which the archival community is beginning to bump into various online communities. We need to really understand the cultural rules for each of the communities in which we want to participate. Another excellent example of this was the revelation that LOC should only upload 50 new images a week into Flickr because of the way in which users view new images uploaded by their friends. It would be unfortunate for LOC to lose many of its Flickr friends because it overwhelmed their Flickr feeds with 1,000 images.

Personas: Targeting Real People

I was also very pleased to hear Lena discuss the creation of personas to define and target the audiences they want to serve. If you want to listen to a great presentation on personas – give a listen to the IA Summit 2008 presentation Data driven design research personas (2nd podcast down on the page) while going through the presentation slides up on slideshare. I promise it is a very accessible talk (i.e., low on jargon and tech – high on real life examples) and very worth your time. It was one of the best sessions I saw at that conference.

Finding Images Without Words

While today it is generally true that people must use words to find images – someday people will be able to use images to find images. An example of this work in progress is an experimental service named retrievr. You can already use this tool to search for Flickr images either by uploading an image or by creating a sketch you want to match. Another interesting image search interface is found over on Xcavator.net. You pick a photo as your starting point – and then you can even trace a subsection of the image to be used for subsequent image matching. We are not there yet – but we will be someday. I can only imagine the number of Untitled images that will finally be found!

Vigilance

Your reward for reading this far is discovering my rationale for using the image I included at the top of this post. I think that many people are worried that we must be like the San Jose Vigilance Committee of 1906 – on our guard to stop people from stealing images from cultural heritage institutions when they are posted online. I would argue that the two projects described in this session show the benefits of a more open attitude. The Internet isn’t the wild west anymore. We should stop treating it that way. We don’t need Vigilance Committees online – we need ambassadors, interpreters and brave pioneers like Lena, Helena and the amazing teams of people who made the projects they described come to life.

Image credit: History San Jose Research Library via Calisphere.

As is the case with all my session summaries from SAA2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

Freebase Parallax Search Interface: Exploring Olympic Games Facts

August 16, 2008 3 Comments

Well-formed data posted about a new Freebase project named Parallax. This new search interface takes faceted browsing another step – in this case making it easy to jump sideways from one dataset to another related dataset. Parallax still includes filters on the left side – but the twist comes from the opportunity to select what are called ‘Connections’ from the list in the upper right hand corner of the search results page.

This sort of thing makes the most sense when you can see examples. The creator of Parallax has published a great little video tour, but I also wanted to show you some neat data sets that were very easy to discover and embed in my blog. Since so many people are thinking about the Olympics right now, I thought I would start by exploring the Olympic Games Collection from Freebase. Below I have two data sets. On the left you will see a list of Olympic Games – and on the right you will see a list of Olympic event venues. (NOTE: to those reading this through a feed reader – you will likely have to click through to view the lists)

Now let’s take a real sidestep and pull up a list of sports teams who use a former Olympic facility as a venue. This is the sort of question that you could figure out on your own, but it would be a pain in the neck to do by hand. See the list on the left below which took just as long to create as it took me to spot that Team (venue) was on the list of ‘more connections’ when my list of Olympic Venues was being displayed. The frame on the right below displays the one Olympic Venue that Freebase knows to have won an award (in this case the Structural Special Award).

Of course the lists above are only as good as the data behind them, but you can see how interesting it could be to use Parallax to explore connected information. Now take this idea to the world of archives and libraries, OPACs and finding aids and imagine the sorts of questions you can start asking. Yes – it does depend on the data being connected, but that is happening more and more all the time. The promise of the semantic web is structured data everywhere we turn.
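For those who think better in code, the sideways jump from one set to a related set can be sketched in a few lines of Python. This is only a toy illustration of the idea. It is not Parallax's implementation or the Freebase API, and the venue, team and award names are placeholders rather than real data.

```python
def pivot(items, connections, relation):
    """Follow one named connection from every item in a set and
    return the combined set of related things."""
    related = set()
    for item in items:
        related.update(connections.get((item, relation), ()))
    return sorted(related)


# Placeholder data: (thing, connection name) -> related things.
connections = {
    ("Venue A", "team"): ["Team X", "Team Y"],
    ("Venue B", "team"): ["Team Y"],
    ("Venue C", "award"): ["Award Z"],
}

venues = ["Venue A", "Venue B", "Venue C"]
print(pivot(venues, connections, "team"))   # ['Team X', 'Team Y']
print(pivot(venues, connections, "award"))  # ['Award Z']
```

The point is that the whole result set moves at once: you start with a list of venues and land on a list of teams, without looking up each venue by hand. The answer is still only as complete as the connections in the data.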

Go play with Parallax. Look at Venture Funded Companies and then look at all the Games Developed by those companies. Examine the list of Bird Species and then see what schools have bird mascots… and THEN see a list of famous people who went to schools that have bird mascots.

Put in your own search from the Parallax homepage and play with the available connections. Map and timeline views are also available – though they only work if your data includes location and temporal data, respectively. If you find a great sequence of data sets – please share them!

Dipity: Easy Hosted Timelines

July 20, 2008 3 Comments

I discovered Dipity via the Reuters article An open-source timeline of the virtual world. The article discusses the creation of a Virtual Worlds Timeline on the Dipity website. Dipity lets anyone create an account and start building timelines. In the case of the Virtual Worlds Timeline, the creator chose to permit others to collaborate on the timeline. Dipity also provides four ways of viewing any timeline: a classic left to right scrolling view, a flipbook, a list and a map.

I chose to experiment by creating a timeline for Spellbound Blog. Dipity made this very easy – I just selected WordPress and provided my blog’s URL. This was supposed to grab my 20 most recent posts – but it seems to have taken 10 instead. I tried to provide a username/password so that Dipity could pull ‘more’ of my posts (they didn’t say how many – maybe all of them?). I couldn’t get it to work as of this writing – but if I figure it out you will see many more than 10 posts.

I particularly like the way they use the images I include in my posts in the various views. I also appreciate that you can read the full posts in-place without leaving the timeline interface. I assume this is because I publish my full articles to my RSS feed. It was also interesting to note that posts that mentioned a specific location put a marker on a map – both within the single post ‘event’ as well as the full map view.

Dipity also supports the streamlined addition of many other sources such as Flickr, Picasa, YouTube, Vimeo, Blogger, Tumblr, Pandora, Twitter and any RSS feed. They have also created some neat mashups. TimeTube uses your supplied phrase to query YouTube and generates a timeline based on the video creation dates. Tickr lets you generate an interactive timeline based on a keyword or user search of Flickr.

Why should archivists care? I always perk up anytime a new web service appears that makes it easy to present time and location sensitive information. I wrote a while ago about MIT’s SIMILE project and I like their Timeline software, but in some ways hosted services like Dipity throw the net wider. I particularly appreciate the opportunity for virtual collaboration that Dipity provides. Imagine if every online archives exhibit included a Dipity timeline? Dipity provides embed code for all the timelines. This means that it should be easy to both feature the timeline within an online exhibit and use the timeline as a way to attract a broader audience to your website.

There has been discussion in the past about creating custom GoogleMaps to show off archival records in a new and different way. During THATCamp there was a lot of enthusiasm for timelines and maps as being two of the most accessible types of visualizations. Anchoring information in time and/or location gives people a way to approach new information in a predictable way.

Most of my initial thoughts about how archives could use Dipity related to individual collections and exhibits – but what if an archive created one of these timelines and added an entry for every one of their collections? The map could be used if individual collections were from a single location. The timeline could let users see at a glance what time periods were the focus of collections within that archives. A link could be provided in each entry pointing to the online finding aid for each collection or record group.

Dipity is still working out the kinks of some of their services, but if this sounds at all interesting I encourage you to go take a look at a few fun examples:

The 100 Most Influential Americans: The Atlantic recently asked ten historians to compose their own lists of the 100 most influential Americans.
Johnny Cash Recorded Appearances: Click on a few of these and you will see the amount of detail that has been added is amazing – video clips, map locations and set lists are included for most of these
Civil Rights Movement – apparently created by students in “Taft’s thrilling third period American history class at USM”

And finally I have embedded the Internet Memes timeline below to give you a feel of what this looks like. Try clicking on any of the events that include a little film icon at the bottom edge and see how you can view the video right in place:

Image Credit: I found and ‘borrowed’ the Dipity image above from Dipity’s About page.

Flickr Terms of Service, Unwritten Guidelines and Safety Levels

July 6, 2008 1 Comment

As more cultural heritage institutions add photos to Flickr, such as these sets added by the Smithsonian, an AP article discussing freedom of expression in online public spaces identifies some issues that deserve attention. In ‘Public’ online spaces don’t carry speech, rights, Anick Jesdanun highlights a number of scenarios in which service providers (such as the Yahoo! owned Flickr) clash with their users, including this one (italics my own):

Dutch photographer Maarten Dors met the limits of free speech at Yahoo Inc.’s photo-sharing service, Flickr, when he posted an image of an early-adolescent boy with disheveled hair and a ragged T-shirt, staring blankly with a lit cigarette in his mouth.

Without prior notice, Yahoo deleted the photo on grounds it violated an unwritten ban on depicting children smoking. Dors eventually convinced a Yahoo manager that – far from promoting smoking – the photo had value as a statement on poverty and street life in Romania. Yet another employee deleted it again a few months later.

This image on Flickr gives more details about the photo being removed – and this is the reinstated photo in question. The article points out “Service providers write their own rules for users worldwide and set foreign policy when they cooperate with regimes like China. They serve as prosecutor, judge and jury in handling disputes behind closed doors.” It makes me wonder if the ‘unwritten guidelines’ are applied evenly across Flickr. With the creation of The Commons area, it would be easy to create two standards – one for the general public and another for ‘blessed’ institutions. Images that are acceptable from the Brooklyn Museum (consider this set of Behind The Scenes photos of the Ron Mueck exhibition) might not be accepted from the average person. In my research I discovered a set of Public Domain photos from the National Archives. Some of the photos included in this set are historically valuable images that I would not necessarily want a child to see. Does this mean they shouldn’t be on Flickr? I don’t think so, but that certainly isn’t up to me.

Here are the relevant passages of the Yahoo! Terms of Service:

You agree to not use the Service to:

upload, post, email, transmit or otherwise make available any Content that is unlawful, harmful, threatening, abusive, harassing, tortious, defamatory, vulgar, obscene, libelous, invasive of another’s privacy, hateful, or racially, ethnically or otherwise objectionable;

harm minors in any way;

You acknowledge that Yahoo! may or may not pre-screen Content, but that Yahoo! and its designees shall have the right (but not the obligation) in their sole discretion to pre-screen, refuse, or remove any Content that is available via the Service. Without limiting the foregoing, Yahoo! and its designees shall have the right to remove any Content that violates the TOS or is otherwise objectionable.

That bit about ‘otherwise objectionable’ could be used to cover removal of anything. Being subject to the terms of service of Internet service providers is nothing new, but as archives, libraries and other cultural heritage institutions look for ways to increase their revenue streams and explore innovative ways to bring more eyes to their materials it will become more important to understand these guidelines.

I understand (as the author of the article that inspired this post also points out) that Yahoo! is a business. Their priorities are not always going to be the same as those of the National Archives or the Brooklyn Museum. There are definitely images from history and the world of art that are only appropriate for adults, but isn’t that what Flickr’s content filter feature, named SafeSearch, is all about? These are the three ‘safety levels’ available on Flickr:

Safe – Content suitable for a global, public audience
Moderate – If you’re not sure whether your content is suitable for a global, public audience but you think that it doesn’t need to be restricted per se, this category is for you
Restricted – This is content you probably wouldn’t show to your mum, and definitely shouldn’t be seen by kids

It is interesting that Flickr has its own separate list of Community Guidelines, independent of Yahoo!’s terms of service. This is the passage from these guidelines about filtering content:

Take the opportunity to filter your content responsibly. If you would hesitate to show your photos or videos to a child, your mum, or Uncle Bob, that means it needs to be filtered. So, ask yourself that question as you upload your content and moderate accordingly. If you don’t, it’s likely that one of two things will happen. Your account will be reviewed then either moderated or terminated by Flickr staff.

I am still not sure what safety level I would use for a photo showing rows of dead in a concentration camp. I guess given the choices, ‘restricted’ is the best option – but that still doesn’t sit right with me somehow. I did an advanced Flickr search for ‘concentration camp’ with SafeSearch on – and those photos are not currently being marked as restricted. Who is it that we expect to be protecting using SafeSearch? From Flickr’s definition above it is supposed to at least be kids (and maybe your mom and Uncle Bob).

I think the question of the moment is how to know which images are appropriate to upload if some of the guidelines are unwritten. Flickr is a community and understanding the community is essential to success within that community. Once you believe your images are appropriate to include, then you must decide the right ‘safety level’. It is not clear to me how to tell the difference between an image that is not appropriate to be uploaded to Flickr and an image that is okay but needs to be marked with a safety level of ‘restricted’. I am very interested to see how this category of ‘appropriate but restricted’ evolves. For now, I am going to keep a watch on how the Flickr Commons grows and what range of content is included. The final answer for some of these images may be to only provide them via the institutions’ web sites rather than via service providers such as Flickr.

Image credit: Free Click by fikra (Sami Ben Gharbia) via Flickr

Clustering Data: Generating Organization from the Ground Up

May 14, 2008 2 Comments

My trip to the 2008 Information Architecture Summit (IA Summit) down in Miami has me thinking a lot about helping people find information. In this post I am going to examine clustering data.

Flickr Tag Clusters
Tag clusters are not new on Flickr – they were announced way back in August of 2005. The best way to understand tag clusters is to look at a few. Some of my favorites are the water clusters (shown in the image above). From this page you can view the reflection/nature/green cluster, the sky/lake/river cluster, the blue/beach/sun cluster or the sea/sand/waves cluster.

So what is going on here? Basically Flickr is analyzing groupings of tags assigned to Flickr images and identifying common clusters of tags. In our water example above – they found four different sets of tags that occurred together and distinctly apart from other sets of tags. The proof is in the pudding – the groupings make sense. They get at very subtle differences even though the mass of data being analyzed is from many different individuals with many different perspectives.
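
To make the idea concrete, here is a minimal sketch of clustering by tag co-occurrence. It is only an illustration: the sample photos, the `tag_clusters` function and the `min_pairs` threshold are all made up for this example, and it should not be read as a description of how Flickr actually computes its clusters.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each set is the tags on one photo.
PHOTOS = [
    {"water", "reflection", "nature", "green"},
    {"water", "reflection", "nature"},
    {"water", "nature", "green"},
    {"water", "sea", "sand", "waves"},
    {"water", "sea", "waves"},
    {"water", "sand", "waves"},
    {"water", "sky", "lake"},
    {"water", "sky", "lake", "river"},
]


def tag_clusters(photos, seed, min_pairs=2):
    """Group the tags that appear alongside `seed` into clusters.

    Two tags are linked when they appear together on at least `min_pairs`
    photos that also carry the seed tag. Each cluster is a connected
    group of linked tags.
    """
    pair_counts = Counter()
    for tags in photos:
        if seed in tags:
            others = sorted(tags - {seed})
            pair_counts.update(combinations(others, 2))

    # Build an adjacency map from the pairs that meet the threshold.
    links = {}
    for (a, b), count in pair_counts.items():
        if count >= min_pairs:
            links.setdefault(a, set()).add(b)
            links.setdefault(b, set()).add(a)

    # Walk the links to collect connected groups of tags.
    clusters, seen = [], set()
    for start in sorted(links):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag not in group:
                group.add(tag)
                stack.extend(links[tag] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


for cluster in tag_clusters(PHOTOS, "water"):
    print("/".join(cluster))
```

In this toy data, "river" appears only once, so it never meets the threshold and is left out of every cluster. A real tag set would need far more photos before groupings like these became meaningful.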

Tag clusters are very powerful and quite different from tag clouds. Tag clouds, by their nature, are a blunt instrument. They only show you the most popular tags. Take a look at the tag cloud for the Library of Congress photostream on Flickr. I do learn something from this. I get a sense of the broad brush topics, time periods and locations. But if you look at the full list of Library of Congress Flickr tags you see what a small percentage the top 150 really are (and yes.. that page does take a while to load). Who else is now itching to ask Flickr to generate clusters within the LOC tag set?

Steve.Museum
Another example of cultural heritage images being tagged is the Steve Museum Art Museum Social Tagging Project which lets individuals tag objects from museums via Steve Tagger. It resembles the Library of Congress on Flickr project in that it includes existing metadata with each image and permits users to add any tags they deem appropriate. I think it would be fascinating to contrast the traffic of image taggers on Steve.Museum vs Flickr for a common set of images. Is it better to build a custom interface that users must seek out but where you have complete control over the user experience and collected data? Or is it better to put images in the already existing path of users familiar with tagging images? I have no answers of course. All I know is I wish I could see the tag clusters one could generate off the Steve.Museum tag database. Perhaps someday we will!

Del.icio.us Tags
Del.icio.us, a web service for storing and tagging your bookmarks online, supports what they call ‘related tags’ and ‘tag bundles’. If you view the page for the tag ‘archives’ – you will see to the far right a list of related tags like those shown in the image here. What is interesting is that if I look at my own personal tag page for archives I see a much longer list of related tags (big surprise that I have a lot of links tagged archives!) and I am given the option of selecting additional tags to filter my list of links via a combination of tags.

Del.icio.us’s ‘tag bundles’ let me create my own named groupings of tags – but I must assemble these groups manually rather than have them generated or suggested. On the plus side, Del.icio.us is very open about publishing its data via APIs and therefore supporting third party tools. I think my favorite off that list for now has to be MySQLicious which mirrors your del.icio.us bookmarks into a MySQL database. Once those tags are in a database, all I need are the right queries to generate the clusters I want to see.
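
Here is a rough sketch of the kind of query I have in mind. Everything in it is an assumption made for illustration: the `bookmark_tags` table is a hypothetical one-row-per-tag layout, not the schema MySQLicious actually creates, and an in-memory SQLite database stands in for MySQL so the example can run on its own.

```python
import sqlite3

# Hypothetical schema: one row per (bookmark, tag) pair. This is an
# assumption for illustration, not the layout MySQLicious creates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmark_tags (url TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO bookmark_tags VALUES (?, ?)",
    [
        ("http://example.org/a", "archives"),
        ("http://example.org/a", "digitization"),
        ("http://example.org/a", "metadata"),
        ("http://example.org/b", "archives"),
        ("http://example.org/b", "digitization"),
        ("http://example.org/c", "archives"),
        ("http://example.org/c", "flickr"),
        ("http://example.org/d", "flickr"),
        ("http://example.org/d", "photos"),
    ],
)


def related_tags(conn, tag):
    """Return (other_tag, shared_bookmarks) pairs, most frequent first."""
    query = """
        SELECT b.tag, COUNT(*) AS shared
        FROM bookmark_tags AS a
        JOIN bookmark_tags AS b ON a.url = b.url AND a.tag <> b.tag
        WHERE a.tag = ?
        GROUP BY b.tag
        ORDER BY shared DESC, b.tag ASC
    """
    return conn.execute(query, (tag,)).fetchall()


print(related_tags(conn, "archives"))
```

The self-join counts how many bookmarks carry both tags, which is the raw material for a ‘related tags’ list. Clusters would take a further step of grouping the related tags by how often they occur with each other.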

Clusty: Clustered Search Results
An example of what this might look like for search results can be seen via the search engine Clusty.com from the folks over at Vivisimo. For example – try a search on the term archives. This is one of those search terms for which general web searching is usually just infuriating. Clusty starts us with the same top 2 results as a search for archives on Google does, but it also gives us a list of clusters on the left sidebar. You can click on any of those clusters to filter the search results.

Those groups don’t look good to you? Click the ‘remix’ link in the upper right hand corner of the cluster list and you get a new list of clusters. In a blog post titled Introducing Clustering 2.0 Vivisimo CEO Raul Valdes-Perez explains what happens when you click remix:

With a single click, remix clustering answers the question: What other, subtler topics are there? It works by clustering again the same search results, but with an added input: ignore the topics that the user just saw. Typically, the user will then see new major topics that didn’t quite make the final cut at the last round, but may still be interesting.
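
A toy version of that "cluster again, but ignore what the user just saw" step might look like the sketch below. It is not Vivisimo's algorithm: simple keyword counting stands in for real clustering, and the sample results and the `top_topics` function are invented for this example.

```python
from collections import Counter

# Hypothetical search results for "archives", reduced to title keywords.
RESULTS = [
    {"national", "records", "government"},
    {"national", "government", "history"},
    {"internet", "web", "history"},
    {"internet", "web", "wayback"},
    {"music", "records", "history"},
    {"national", "records", "military"},
]


def top_topics(results, ignore=frozenset(), limit=2):
    """Pick the `limit` most common keywords, skipping any in `ignore`."""
    counts = Counter(
        word for words in results for word in words if word not in ignore
    )
    # Sort by count (descending), then alphabetically, for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


first = top_topics(RESULTS)
remixed = top_topics(RESULTS, ignore=set(first))
print(first)
print(remixed)
```

The second call sees the same results as the first, but the topics already shown are excluded, so the runners-up from the first round move to the top.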

I played for a while.. clicking remix over and over. It was as if it was slicing and dicing the facets for me – picking new common threads to highlight. I liked that I wasn’t stuck with what someone else thought was the right way to group things. It gave me the control to explore other groupings.

Ontology is Overrated
Clay Shirky’s talk Ontology is Overrated: Categories, Links and Tags from the spring of 2005 ties a lot of these ideas together in a way that makes a lot of sense to me. I highly recommend you go read it through – but I am going to give away the conclusion here:

It’s all dependent on human context. This is what we’re starting to see with del.icio.us, with Flickr, with systems that are allowing for and aggregating tags. The signal benefit of these systems is that they don’t recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we’re dealing with a significant break — by letting users tag URLs and then aggregating those tags, we’re going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

I currently spend my days working with controlled vocabularies for websites, so please don’t think I am suggesting we throw it all away. And yes, you do need a lot of information to reach the critical mass needed to support the generation of useful clusters. But there is something here that can have a real and positive impact on users of cultural heritage materials actually finding and exploring information. We can’t know how everyone will approach our records. We can’t know what aspects of them they will find interesting.

There Is No Box
Archivists already know that much of the value of records is in the picture they paint as a group. A group of records shares a context that gives the individual records meaning. Librarians and catalogers have long lived in a world of shelves. A book must be assigned a single physical location. Much has been made (both in the Clay Shirky talk and elsewhere) of the idea that on the web there is no shelf.

What if we take the analogy a step further and say that for an online archives there is no box? Of course, just as with books, we still need our metadata telling us who created this record originally (and when and why and which record comes before it and after it) – but picture a world where a single record can be virtually grouped many times over. Computer programs are only going to get better at generating clusters, be they of user assigned tags or search results or other metadata. From where I sit, the opportunity for leveraging clustering to do interesting things with archival records seems very high indeed.

Of Pirates, Treasure Chests and Keys: Improving Access to Digitized Materials

April 23, 2008 1 Comment

Dan Cohen posted yesterday about what he calls The Pirate Problem. Basically the Pirate Problem can be summed up as “there are ways of acting and thinking that we can’t understand or anticipate.” Why is that a ‘Pirate Problem’? Because a pirate pub opened near his home and rather than folding shortly thereafter due to lack of interest from the ‘very serious professionals’ who populate DC suburbs – the pub was a rousing success due to the pirate aficionados who came out of the woodwork to sing sea shanties and drink grog. This surprising turn of events highlighted for him the fact that there are many ways of acting and thinking (some people even know all the words to sea shanties without needing sheet music).

Dan recently delivered the keynote speech at a workshop at the University of North Carolina at Chapel Hill. The workshop brought together dozens of historians to talk about how the 16 million archival documents of the Southern Historical Collection (SHC) should be put online. He devoted his keynote “to prodding the attendees into recognizing that the future of archives and research might not be like the past” and goes on in his post to explain:

The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like “a crab being lowered into the warm water of the pot.” Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.

This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC.

Much of the stress of Dan’s article is on fear of new techniques of analysis. The choppy waters of text mining and pattern recognition threaten to wash away traditional methods of actually reading individual pages and “most historians just want to do their research they way they’ve always done it, by taking one letter out of the box at a time”.

I certainly like the idea of new technologically based ways of analyzing large sets of cultural heritage materials, but I also believe that reading individual letters will always be important. The trick is finding the right letter!

And of course – we still need the context. It isn’t as if when we digitize major collections like the SHC that we are going to scan and OCR each page without regard to which box it came out of. We can’t slice and dice archival records and manuscripts into their component parts to feed into text analysis with no way back to the originals.

I like to imagine the combination of all the new technology (be it digitization, cross collection searching, text mining or pattern recognition) as creating keys to different treasure chests. Humanities scholars are treasure hunters. Some will find their gems through careful reading of individual passages. Others will discover patterns spread across materials now co-existing virtually that before digitization would have been widely separated by space and time. Both methods will benefit from the digitization of materials and the creation of innovative search and text analysis tools. Both still require an understanding of a material’s origin. The importance of context isn’t going anywhere – we still need to know which box the letter came from (and in a perfect world, which page came before and which came after). I want scholars to still be able to read one page from the box – I just want them to be able to do it from home in the middle of the night if they are so inclined with their travel budget no worse for wear.

Dan ties his post together by pointing out that:

… in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars.

In my opinion, the core message should be that we just found more locked treasure chests – and for those who are interested, we have some new keys that just might open those locks. I enjoyed the Pirate metaphor (obviously) and I appreciate that there are real issues here relating to strong discomfort with the fast changing landscape of technology, but I have to believe that if we do something that prevents historians from being able to read one letter at a time we are abandoning the treasure chests that are already open for the new ones for which we haven’t yet found the right keys. I am greedy. I want all the treasure!

Image credit: key to anything by Stoker Studios via flickr

SAA2008: PDFs of Conference Presentations

March 23, 2008

I found another reason recently to be excited about the progress of SAA’s online presence. Buried in the ARCHIVES 2008: Archival R/Evolution & Identities Checklist for Presenters are the first tidbits of a plan to provide access to PDF versions of conference presentations on the SAA website.

Send an Electronic Copy of Your Presentation to SAA. The conference organizers would like to offer meeting attendees the opportunity to view presentations after the conference on the SAA 2008 Annual Meeting website (www.archivists.org). If you’ll supply a copy of your presentation, we’ll convert it to a PDF and post it. Please note that by sending SAA a copy of your presentation in electronic format, you grant permission for your presentation to be viewed by all SAA 2008 Annual Meeting attendees.

I am so pleased! I have always wanted access to the presentations – both for those sessions I attend and those I cannot. I have often been that person hovering at the edge of the stage after a panel, waiting to request a soft copy of the presentation.

I do wonder what they mean when they say that the presentations will be “viewable by meeting attendees”. In my heart of hearts I hope they go a step further and let the speakers sign off on these presentations being shared with the world (or at least with all of SAA). I haven’t gone through every Session Page on the SAA 2007 Un-Official Wiki, but I believe that not very many presenters took the opportunity to provide links to soft copies of their presentations. I hope that SAA is more successful on this front.

No matter the choices made relating to immediate access – I see this as a big step forward in the commitment to using technology. I think one of the best ways to learn is through getting your hands dirty. Technology is listed as one of SAA’s strategic priorities. Every choice that SAA makes that encourages their membership to become more tech-savvy is a step towards supporting that priority.

Big Digital Step For SAA: American Archivist Online

March 15, 2008

The Society of American Archivists has officially launched American Archivist Online (also available via the Members Only page once you login to archivists.org).

Here are a few key points that caught my eye from the FAQ :

Content is available as PDF files with embedded searchable text (one file per article or section of the journal)
It is hosted by MetaPress
The online version will be produced in parallel with the print version

What issues are online?

Fall/Winter 2000 (Volume 63 – Number 2) through the most recent issue – Fall/Winter 2007. The FAQ reports that additional back issues will be digitized over time.

How is it structured?

Each journal article is a separate PDF file. Talk about a boon to graduate students and archives professors everywhere! Even the front matter is there separated out – perfect for printing and attaching to your article printouts for future reference. Of course, if you are feeling green (and better at reading on screen than I am) you can bookmark them or save them locally for future reference.

Who can access it?

Officially, only members of SAA and individual or institutional subscribers to the journal can access all available issues. In reality, it appears most of the issues are available to everyone. Currently only the Fall/Winter issues of 2005, 2006 & 2007 restrict access to all the content. Even for these issues there is access to some of the articles – such as the Book Reviews section in both the 2005 and 2007 Fall/Winter issues.

The FAQ claims that non-subscribers must pay a fee to print an article – but I don’t see how they will enforce that. When viewing a PDF of an article from the most recent issue I was able to save it to my local desktop and print it without a problem. Not sure if that is a bug or how it will remain – or if maybe they are talking about official reprints that are sent through the mail?

Other features

Try the handy Article Category search links – like this one that shows all the Presidential Addresses.
Mark or save articles to your own private lists (if you are logged in)
Search the full text – either across the journal or within an individual issue.
Subscribe to the RSS feed (which I spotted on the All Issues page). The feed includes the article abstract, category, author and source issue information. Be the first archivist on your block to know the instant the new issue is published online!

Final Thoughts

I think that everyone who heard President Adkins announce at SAA in Chicago that the American Archivist was going online was excited (well.. there was lots of clapping – that is for sure). That announcement was a strong indication to me of SAA’s commitment to improving their online offerings.

Finally seeing it available online is even better – action speaks louder than words.

Image Credit: SAA Logo from http://archivists.org/

Category: access

My Thoughts