SAA 2006 Session 103: “X” Marks the Spot: Archiving GIS Databases – Part II

Richard Marciano of the SALT interdisciplinary lab (Sustainable Archives & Library Technologies) at the San Diego Supercomputer Center delivered a presentation titled “Research Issues Related to Preservation of Geospatial Electronic Records” – the 2nd topic in the ‘X’ Marks the Spot session.

His work focuses on research issues related to the preservation of geospatial electronic records. While not an archivist, he is a member of SAA. As a person coming to archival studies with a strong background in software development, I took great comfort in his discussion of there being a great future for IT and archivists to work together on topics such as this.

Richard gave us a great overview of the most recent work being done in this field, along with a snapshot of the latest up-and-coming projects on the horizon. If I had to pick one main point to emphasize, it would be that IT can provide the infrastructure to automate much of what is now being done by hand – but there is a long way to go to achieve this dream, and it will require extensive collaboration between archivists (with the experience of how things should be done) and the IT community (with the technical expertise to build the systems needed). His presentation was definitely more organized than my laundry list below – please do not take my notes as an indication of the flow of his talk.

NHPRC Electronic Records/GIS projects:

  • CIESIN www.ciesin.columbia.edu/ger at Columbia University
  • Maine GeoArchives www.maine.gov/geoarch/index.htm Maine State Archives (see Part III of the Session 103 posts for details on the Maine GeoArchives)
  • eLegacy (State of California & SDSC) – archival appraisal, accessioning and preservation of California’s geospatial records. Starting in 2006
  • InterPARES VanMAP (2005) – presentation of the City of Vancouver GIS Database

More IT related projects:

  • Archivists’ Workbench (2000) www.sdsc.edu/NHPRCS Methodologies for the long-term preservation of and access to software-dependent electronic records. Includes tools for GIS
  • ICAP (2003) www.sdsc.edu/ICAP change management
  • PAT (2004) www.sdsc.edu/PAT persistent archives testbed and the Michigan precinct voting records, spatial data ingestion

SDSC has a goal of infrastructure independence – they want to keep data and be able to move it easily over time. Their current preferred approach uses Data Grids (see American Archivist Journal, volume 69, number 1: “Building Preservation Environments with Data Grid Technology” by Reagan W. Moore), which depend on the dual goals of data virtualization and trust virtualization. He recommended the SAA Electronic Records Section on Friday from 12 to 2 for good related presentations.

CIESIN www.ciesin.columbia.edu/ger at Columbia University
Common types of data loss:

  • loss of non-archived data
  • historical versions of data

North Carolina Geospatial Data Archiving Project (www.lib.ncsu.edu/ncgdap), Steve Morris – Instead of solving problems, it actually introduces further complications. Complex databases can be difficult to manage over time because of complex data models and the challenges of proprietary database models. A single database can have MANY levels of individual datasets or data layers.

e-Legacy – working from the California State Archives
July 2006 – July 2008
The staff is a mix of California State Archives staff and members of SDSC. They are using data grid technology to build a distributed community grid. Distributed storage permits addition of storage arbitrarily and in multiple locations.
Infrastructure is being deployed across multiple offices and the SDSC.

InterPARES VanMAP (University of British Columbia)
A big city centralized enterprise GIS system
Questions of the case study: What are the records? Where are the records? What do they look like – from the point of view of the city users?
What infrastructure would you need to do a historical query – to see what the city looked like on a specific date in the past? Current enterprise systems are meant to be a snapshot of the present, with nothing in place to support storage of past records.

How did they approach this? They got representative data sets and put all the historical data layers into a ‘dark archive’ repository. Then they built a proof of concept: put in a date request, and the correct layers are brought back from the archive system and rendered on the fly to show the closest version of the historical map possible.
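To make the date request idea concrete for myself, here is a minimal sketch of how the layer lookup might work. Everything in it is my own assumption, not something shown in the talk: the `ARCHIVE` index, the layer and file names, and the choice of "most recent snapshot on or before the requested date" as the meaning of "closest version".

```python
from bisect import bisect_right
from datetime import date

# Hypothetical archive index (not from the talk): layer name -> list of
# (snapshot date, stored file), kept sorted by snapshot date.
ARCHIVE = {
    "streets": [(date(1995, 1, 1), "streets_1995.shp"),
                (date(2001, 6, 1), "streets_2001.shp")],
    "zoning": [(date(1998, 3, 15), "zoning_1998.shp"),
               (date(2004, 9, 1), "zoning_2004.shp")],
}

def layers_as_of(archive, when):
    """Return, per layer, the latest snapshot taken on or before `when`."""
    selected = {}
    for name, versions in archive.items():
        dates = [snapshot_date for snapshot_date, _ in versions]
        i = bisect_right(dates, when)  # number of snapshots dated <= when
        if i:  # skip layers that did not exist yet at that date
            selected[name] = versions[i - 1][1]
    return selected

print(layers_as_of(ARCHIVE, date(2000, 1, 1)))
```

A real system would then hand the selected files to a map renderer; this sketch stops at choosing which version of each layer to pull from the archive.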

There is a list of 30 or so questions that is part of evaluating the system.

ICAP: preserving and using temporal and multi-versions of records
Keep track of versions of records. Being aware of a timeline of records and being able to ask significant historical questions of those records.

They took multiple time slices of data and automatically created an XML database from the records in those slices. The XML database can then be used for spatial querying.
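As a rough illustration only (the actual ICAP schema and tooling were not described in my notes), time slices could be turned into one XML document and queried spatially along these lines. The element names, attribute names, and sample records are all hypothetical, and a plain bounding-box test stands in for real spatial querying.

```python
import xml.etree.ElementTree as ET

# Hypothetical time slices: year -> records, each with a point location.
slices = {
    2000: [{"id": "p1", "x": 1.0, "y": 2.0}],
    2005: [{"id": "p1", "x": 1.0, "y": 2.5},
           {"id": "p2", "x": 8.0, "y": 9.0}],
}

def build_xml(slices):
    """Build one <archive> tree with a <slice> element per time slice."""
    root = ET.Element("archive")
    for year in sorted(slices):
        slice_el = ET.SubElement(root, "slice", year=str(year))
        for rec in slices[year]:
            ET.SubElement(slice_el, "record", id=rec["id"],
                          x=str(rec["x"]), y=str(rec["y"]))
    return root

def in_bbox(root, year, xmin, ymin, xmax, ymax):
    """IDs of records in one slice whose point lies inside the bounding box."""
    hits = []
    for rec in root.findall(f"./slice[@year='{year}']/record"):
        x, y = float(rec.get("x")), float(rec.get("y"))
        if xmin <= x <= xmax and ymin <= y <= ymax:
            hits.append(rec.get("id"))
    return hits

root = build_xml(slices)
print(in_bbox(root, 2005, 0, 0, 5, 5))
```

Keeping every slice in the same document is what makes it possible to ask the same spatial question of different years and compare the answers.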

PAT Testbed
Creating a joint consortium model for managing records across state boundaries. Distributed framework with local ‘Grid Block’ at each location. Local Storage Resources manage and populate their local resources.
Goal: how do we automate archival processes?

Michigan Department of Community – preserving and accessing Michigan historical voting records. They created a MySQL database with the records and did automatic scrubbing and validation of the records based on rules. The use of GIS permits viewing maps with the data shown – red/blue voting statistics by county. The viewer permits looking at maps by election year.
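The rules themselves were not covered in the talk, so the following is only a sketch of what rule-based scrubbing and validation could look like. The field names, the three rules, and the red/blue classification are my own assumptions, and plain Python dictionaries stand in for rows of the MySQL database.

```python
# Hypothetical validation rules: (description, check function).
RULES = [
    ("year is a plausible election year", lambda r: 1800 <= r["year"] <= 2006),
    ("county name is present", lambda r: bool(r["county"].strip())),
    ("vote counts are non-negative", lambda r: r["dem"] >= 0 and r["rep"] >= 0),
]

def scrub(record):
    """Return a copy with whitespace and capitalisation of the county fixed."""
    cleaned = dict(record)
    cleaned["county"] = " ".join(record["county"].split()).title()
    return cleaned

def validate(record):
    """Return the descriptions of every rule the record fails."""
    return [desc for desc, check in RULES if not check(record)]

def colour(record):
    """Assumed map classification: red if rep leads, blue if dem leads."""
    if record["rep"] == record["dem"]:
        return "tie"
    return "red" if record["rep"] > record["dem"] else "blue"

raw = {"county": "  wayne   county ", "year": 1996, "dem": 500, "rep": 300}
record = scrub(raw)
print(record["county"], validate(record), colour(record))
```

Writing each rule as a description plus a check keeps the rule list readable by archivists while still letting the software apply it to every record automatically.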

In response to a question, he talked about a part of the PAT project that looked at 401 Certification permits (related to water). They digitized all the historical records within a watershed and delivered them back to the state agency. Integrating all the government processes permits the agency to ask good questions about the permits and the related locations (upstream or downstream).
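To show what an upstream or downstream question might look like once permits are tied to locations, here is a small sketch. The stream network, segment names, and permit IDs are invented for illustration, and the network is assumed to have no loops; nothing here reflects how the PAT project actually modelled the watershed.

```python
# Hypothetical stream network: each segment flows into at most one
# downstream segment (None marks the outlet). Assumed to contain no loops.
FLOWS_INTO = {
    "creek_a": "river_1",
    "creek_b": "river_1",
    "river_1": "river_2",
    "river_2": None,
}

# Hypothetical permits, each attached to the segment it affects.
PERMITS = {"P-001": "creek_a", "P-002": "river_1",
           "P-003": "creek_b", "P-004": "river_2"}

def downstream_segments(segment):
    """Segments reached by following the flow from `segment` (excluding it)."""
    reached = []
    nxt = FLOWS_INTO.get(segment)
    while nxt is not None:
        reached.append(nxt)
        nxt = FLOWS_INTO.get(nxt)
    return reached

def permits_downstream_of(permit_id):
    """Permits located on any segment downstream of the given permit."""
    segments = set(downstream_segments(PERMITS[permit_id]))
    return sorted(p for p, s in PERMITS.items() if s in segments)

def permits_upstream_of(permit_id):
    """Permits whose segment eventually drains into the given permit's segment."""
    target = PERMITS[permit_id]
    return sorted(p for p, s in PERMITS.items()
                  if target in downstream_segments(s))

print(permits_downstream_of("P-001"))
print(permits_upstream_of("P-002"))
```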