This paper comprises the Table of Contents, Preface, Introduction and Endnotes parts of a book by Charles Dollar. The book may be purchased through the publisher, Cohasset Associates. This summary is published here with the kind permission of the author and Cohasset Associates.
AUTHENTIC ELECTRONIC RECORDS: STRATEGIES FOR LONG-TERM ACCESS
by Charles M. Dollar,
University of British Columbia, Vancouver.
June, 1998
TABLE OF CONTENTS
PREFACE
INTRODUCTION
1.0 CONCEPTUAL FOUNDATIONS
2.0 OPTIONS AND ALTERNATIVES FOR ACCESS OVER TIME TO AUTHENTIC ELECTRONIC RECORDS
3.0 BEST PRACTICES, RECOMMENDATIONS, AND GUIDELINES
4.0 AGENDA FOR ACTION
ENDNOTES
BIBLIOGRAPHY
APPENDIX 1: TECHNOLOGY PRIMER FOR ARCHIVISTS AND RECORDS MANAGERS: RECORD REPRESENTATION, STORAGE, RETRIEVAL, AND PORTABILITY
APPENDIX 2: NORDIC COUNCIL MEDIA SELECTION STUDY
APPENDIX 3: MEDIA STORAGE COSTS, NATIONAL ARCHIVES OF CANADA
APPENDIX 4: NATIONAL ARCHIVES OF THE UNITED STATES ELECTRONIC RECORDS PRESERVATION PROGRAM COSTS
APPENDIX 5: NATIONAL MEDIA LABORATORY MEDIA LIFE EXPECTANCY DISPOSITION CHARTS
APPENDIX 6: ARCHIVAL PRESERVATION SYSTEM
PREFACE
This study began in July of 1994 when the Image Technology Committee of the International Council on Archives (ICA) adopted a work initiative calling for an issues paper about the impact of information technology obsolescence on access to electronic records, especially those in a digital image format. I agreed to take on this task for several reasons.
First, I had a long-standing interest in information technology obsolescence that dated to the 1970s when I was the Director of the Machine-Readable Archives Division of the National Archives and Records Administration and to the 1980s and early 1990s when I was a member of the Archival Research and Evaluation Staff of the United States National Archives and Records Administration (1983 - 1994) and explored information technology standards and optical media technologies. Second, a number of archivists and records managers were beginning to ask for practical guidance about how to ensure long-term access to electronic records, and the time seemed right for a systematic and comprehensive approach to the problem.
In November of 1994 the Social Sciences and Humanities Research Council of Canada awarded me a small grant to examine the impact of information technology on access to electronic records with a Canadian viewpoint. With this funding, and support from the School of Library, Archival and Information Studies of the University of British Columbia, the project began. As the project unfolded, however, it became clear that information technology obsolescence and electronic records should be addressed from an international perspective and should not be confined to digital images. This was the focus of a status report on the project that I gave to the Image Technology Committee at its 1995 meeting in Seville, Spain, and a paper I delivered in June 1995 at the Annual Meeting of the Association of Canadian Archivists in Regina, Saskatchewan.
In the summer of 1995 the Italian National Archives and the University of Macerata agreed to support a meeting of experts to exchange views about critical issues in the preservation of electronic records and to consider recommendations and guidelines for storage repositories whose mandate is to ensure long-term access to electronic records. The plan for the experts meeting called for me to expand the status report I had prepared for the Image Technology Committee into an issues paper that would frame the discussion during the meeting. Funding constraints meant that the number of experts participating in the meeting would have to be limited to eleven individuals. After consultation with the National Archives of Italy and the University of Macerata the eleven individuals listed below were invited to participate in the experts meeting.
These individuals were invited to participate in the conference because of their special expertise, knowledge, or experience. The information technology perspective was broadly represented by Barbara von Halle, P. C. Hariharan, Seamus Ross, and John W.C. Van Bogart. The archives and records management communities were represented by Maria Guercio, Margaret Hedstrom, Pia Maria Mariani, Greg O’Shea, Deborah Skaggs, and Robert Williams. Each participant was requested to review the issues paper and to identify those areas where issues were not adequately explored, to identify new areas that should be covered, and to comment in detail on certain assigned topics.
During the course of the meeting, the issues paper provided the frame of reference for the participants to explore a wide range of issues that affect long-term access to electronic records. Among the issues discussed were diplomatics and archival science, definitions of copying, reformatting, and migrating electronic records, current electronic information practices and projected trends, the role of standards, database architecture, the selection of electronic storage media, and guidelines and recommendations that would be generalizable to a variety of storage repositories in both the public and private sectors that provide long-term access to electronic records. Participants agreed that a publication that pulled this discussion together would be very helpful at the national and international levels.
Over the next year or so I drafted a report that attempted to reflect the views and general conclusions participants had reached. During this same period of time, several participants revised the draft studies that they had prepared for the meeting and subsequent discussion and presented them as conference papers. I continued to refine my own views on long-term access to electronic records and in so doing changed my views on several critical points. In this regard the book by Michael Brodie and Michael Stonebraker entitled Migrating Legacy Systems: Gateways, Interfaces, & the Incremental Approach (1995) greatly influenced my views on migration. I am indebted to Barbara von Halle for bringing the Brodie and Stonebraker book to my attention.
The text of this report clearly is rooted in the discussions and exchanges that occurred in the experts meeting. However, as noted above, my own views have matured and the report reflects this. The report also incorporates research findings that were not available at the time of the experts meeting. Nonetheless, the overall focus of the report, which is to further an understanding of key issues associated with providing long-term access to electronic records, is consistent with the original purpose of the experts meeting.
I am extremely grateful to the experts for their participation in the meeting and for their assistance in helping to establish both the intellectual and the technological framework of this study. The draft of the report was circulated to them for their comments and suggestions, but this is not to say that the report reflects the viewpoints of each participant. In the final analysis, I am accountable for any errors of fact or misinterpretation.
Readers familiar with the growing body of literature on electronic records in general and the preservation of electronic records in particular will recognize how much this study draws upon previous work. The various sources cited in the report bear witness to this indebtedness. I am particularly grateful to Bruce Ambacher (U.S. National Archives and Records Administration), Ulf Andersson (ASTRA Company), Wolf Buchman (Bundesarchiv of the Federal Republic of Germany), Borje Justrell (National Archives of Sweden), John McDonald (National Archives of Canada), Aaron D. Hagler (Muller Media Conversion, Inc.), and Bruce Walton (National Archives of Canada) for providing me with detailed information about specific topics. I am pleased to acknowledge the support of Ken Haycock, Director of the School of Library, Archival and Information Studies at the University of British Columbia. In addition, I wish to thank a number of individuals who read the report in draft form and provided useful comments and suggestions. 
They are Marie Allen (National Archives of the United States), Bruce Ambacher (National Archives of the United States), Ulf Andersson (ASTRA Corporation, Stockholm, Sweden), Rick Barry (Barry Associates), Brant Bady (British Columbia Information Management Services), Mikael Dahlin (ASTRA Corporation, Stockholm, Sweden), Mark Giguere (National Archives of the United States), Susan Hamman (Boeing Company), James Henderson (Maine State Archives), Greg Hunter (Long Island University), Eric Ketelaar (National Archives of the Netherlands and Universities of Leiden and Amsterdam), John McDonald (National Archives of Canada), Larry McCrank (ITT Technical Institute), Keith Parrot (Australian National Archives), Doug Taylor-Munro (National Archives of Canada), Bruce Walton (National Archives of Canada), and Amelia Winstead (Alabama Department of Archives and History). The thoughtful and insightful comments and suggestions they provided saved me from many errors and confusion. Whatever faults remain are entirely my own.
Finally, there are three persons whose support and assistance were invaluable and merit special recognition. First, Pia Maria Mariani of the National Archives of Italy and Professor Oddo Bucci of the University of Macerata were responsible for organizing the conference and securing the funding for it. In particular, without the support of Pia Maria Mariani the meeting would never have occurred. Professor Bucci extended the warm hospitality of the University of Macerata to the participants and ensured that the logistical arrangements were first-rate. The third person whose support and assistance were invaluable is my wife, Deborah Skaggs. She took on the formidable task of reading very carefully the entire manuscript several times. Her probing questions and suggestions invariably greatly improved the readability and accuracy of the report. Only she and I know the full extent of her contribution.
INTRODUCTION
Future historians are likely to view the last three decades of the twentieth century as a watershed where the convergence of digital technologies reshaped the information landscape and thereby fundamentally altered how people communicate, create, retrieve, use, and view information. This convergence is particularly evident in the telecommunication industries where audio, traditional print, still pictures, motion pictures, and telephone signals increasingly are being stored and retrieved in a common digital base.
The traditional distinction between information objects such as letters, books, audio recordings, maps, photographs, movies, video, and telephony based upon the means of transmission or carrier of the information that has supported separate technologies, disciplines, professions, and industries is being eroded. The magnitude of this transformation and its long-term implications for society are barely recognized, much less understood, although many contemporary observers believe that the transformation is similar to what happened with the introduction of writing three millennia ago.
Although few contemporaries fully understand the magnitude of this transformation, several generalizations can be offered. First, every indication is that reliance on digital information will increase in virtually every segment of society but especially in businesses and government. One indication of this is the estimated volume of information in digital form. One estimate asserts that this volume is increasing between twenty and fifty percent annually, and that by the year 2000 between 600 and 1,000 Petabytes (PB) will have been accumulated. This is the equivalent of information conveyed in print contained in thirty-six billion to sixty billion 500 page books. Even assuming that this estimate is off by a factor of one hundred, the volume of information in digital form is enormous and will continue to grow, giving rise to an environment of digital information and requiring an infrastructure to support it. It is likely, therefore, that the penetration of digital technologies into the fabric of society and life of individuals will exceed that of the telephone. Ironically, such ubiquity will be complete when we view and can use computers and digital technologies as easily as we use telephones and telecommunications.
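The scale of this estimate can be rechecked with a short calculation. The following is a minimal sketch and not part of the original study: it assumes decimal units (one Petabyte taken as 10^15 bytes) and uses the figure given in the endnotes of about 1.5 million bytes per 500 page book; the constant and function names are illustrative only. Under those assumptions the calculation gives roughly 400 billion to 667 billion books for 600 to 1,000 Petabytes, about ten times the range quoted above, which only reinforces the point that the volume is enormous.

```python
# Assumption taken from the endnotes: one 500 page book is about 1.5 million bytes.
BYTES_PER_BOOK = 1_500_000

# Assumption: decimal units. The text does not say whether decimal or binary units are meant.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15


def books_per_unit(unit_bytes: int) -> int:
    """Whole number of 500 page books that fit in one unit of storage."""
    return unit_bytes // BYTES_PER_BOOK


def books_for_petabytes(petabytes: int) -> int:
    """Whole number of 500 page books equivalent to the given number of Petabytes."""
    return petabytes * PETABYTE // BYTES_PER_BOOK


if __name__ == "__main__":
    for label, size in (("Gigabyte", GIGABYTE), ("Terabyte", TERABYTE), ("Petabyte", PETABYTE)):
        print(f"One {label}: about {books_per_unit(size):,} books")
    for volume in (600, 1_000):
        print(f"{volume:,} Petabytes: about {books_for_petabytes(volume):,} books")
```

Keeping the per-book size as a single named constant makes it easy to see how sensitive the book-equivalent figures are to that one assumption.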
This observation leads to the challenge of ensuring on-going access to digital material as digital technology changes. Recently, computer scientist Jeff Rothenberg addressed this question in "Ensuring the Longevity of Digital Objects," in the January 1995 issue of Scientific American. Rothenberg argued that the limited life expectancy of storage media for digital information along with the inevitable obsolescence of hardware and software means that the current generation of digital documents is in jeopardy of being lost to future generations because these documents will become unreadable. The most effective strategy, he asserted, is to transfer the bit stream of digital documents to new media as necessary and to encapsulate these bit streams with specifications about the software used to create and use the digital records. The ubiquity of software today, he suggested, makes it likely that the software itself and its specifications will be widely available in the future. Hardware emulators ("programs that mimic the behavior of computers") could be developed that would run the obsolete software and encapsulated bit streams of the digital documents and then display them as they were originally viewed by their creators and users. Absent a systematic and significant effort to develop tools and techniques that substantially mitigate the consequences of limited media life expectancy and hardware and software obsolescence, "… we risk substantial practical loss, as well as the condemnation of our progeny for thoughtlessly consigning to oblivion a unique historical legacy."
At the time of Rothenberg’s article, several major projects were underway to identify critical issues associated with electronic records in general, or to address the question of how to mitigate the effects of technological obsolescence and "ensure technological compatibility, flexibility, and migratability." These projects, which are reviewed in some detail in chapter 1 in order to establish the conceptual foundations for this study, contribute significantly to our understanding of key challenges that electronic records pose for storage repositories with the responsibility for providing long-term access to them. None of these projects or studies, however, seems to address all of the relevant issues and challenges involving digital technology obsolescence and long-term access to electronic records. Therefore, a more comprehensive study is in order.
Six primary considerations shaped the scope of this current study. The first is a focus on electronic records no longer required for use in an operational environment that have been set aside for future use. Although the study addresses the conditions that give rise to reliable and trustworthy electronic records, it is assumed that these conditions prevailed at the moment of their creation and existed during their use in a production or operational environment. Therefore, a full consideration of a generic process map for the entire continuum of electronic records is beyond the scope of this study.
Secondly, this study does not view archival description as a primary means of preserving the integrity of electronic records by "freezing" them in time in relation to other electronic records as Luciana Duranti and Heather MacNeil suggest. Instead, this study views the preservation of the context of creation, use, and transmission of individual electronic records as the most effective way of ensuring their integrity.
Thirdly, this study delineates an access strategy that differentiates between the maintenance of the processability of electronic records and their migration.
Electronic records are processable when they can be read and correctly interpreted by current computer hardware and software and can be easily transferred to a new technology platform using import/export software functionality. Maintaining the processability of electronic records involves reformatting, copying, and conversion activities. In contrast, migration involves the transfer of electronic records that can only be read and correctly interpreted by legacy computer hardware and software to a new technology platform. This transfer requires the design of gateways from the legacy system to the new technology platform and writing special purpose code or programs to transfer the records and the software functionality. Typically, the migration of electronic records involves a number of complex issues, is very costly, and requires more time to complete than is projected.
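The distinction drawn here can be expressed as a simple decision rule. The sketch below is illustrative only and is not drawn from the study: the class, field, and function names are hypothetical, the sample holdings are invented, and it makes the simplifying assumption that any set of records failing either processability test is treated as a candidate for migration.

```python
from dataclasses import dataclass
from enum import Enum


class AccessActivity(Enum):
    """The two kinds of activity the access strategy distinguishes."""
    MAINTAIN_PROCESSABILITY = "reformat, copy, or convert"
    MIGRATE = "design gateways and write special purpose transfer code"


@dataclass(frozen=True)
class RecordSet:
    """Hypothetical description of a body of electronic records."""
    name: str
    readable_by_current_technology: bool  # read and correctly interpreted today
    has_import_export_path: bool          # easily transferred via import/export


def is_processable(records: RecordSet) -> bool:
    """Both conditions in the definition of processability must hold."""
    return records.readable_by_current_technology and records.has_import_export_path


def choose_activity(records: RecordSet) -> AccessActivity:
    """Simplifying assumption: records that are not processable need migration."""
    if is_processable(records):
        return AccessActivity.MAINTAIN_PROCESSABILITY
    return AccessActivity.MIGRATE


if __name__ == "__main__":
    holdings = [
        RecordSet("correspondence files", True, True),
        RecordSet("legacy case-tracking database", False, False),
    ]
    for records in holdings:
        print(f"{records.name}: {choose_activity(records).value}")
```

In practice the assessment would rest on a technical appraisal of each system rather than two flags, but the rule shows why migration is reserved for records tied to legacy hardware and software.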
Implicit in this differentiation between maintaining the processability of authentic electronic records and the migration of authentic electronic records are two key factors. The first is that, because migration is so complex, difficult, and costly, the term should be employed with a narrow and precise meaning. The other factor is that too much attention has been devoted to ensuring access to electronic records fifty or one hundred years from now, when we have no way of forecasting what kinds of technology will be available then. Instead, we should do two things. First, we should focus on a much shorter time frame, perhaps on the order of ten to twenty years or so, during which time information technologies are likely to be relatively stable. Second, we should ensure that the way we use digital technologies to support access to electronic records over time minimizes the likelihood of creating intractable problems for their future custodians and users.
The fourth issue involves the question of the custodianship of electronic records. This report recognizes that the costs of maintaining processable electronic records and then migrating them to new technologies are likely to be substantial and exceed the human and financial resources available to many storage repositories. Consequently, new organizations that are not traditional centralized storage repositories may need to be created to provide this service. One possible model is outsourcing as exemplified by the Northeast Document Conservation Center, which is a regional preservation facility that provides preservation services on a cost recovery basis to archives that do not have the financial resources to support a preservation staff and conservation lab. This study, therefore, employs the concept of a storage repository that is a trusted third party whose mission is to store inactive records and adhere to best archival practices to protect the records from corruption, alteration, or loss. Best archival practices means, among other things, creating a wall between the records and those individuals and organizational units who created, used, and maintained the records while the records were operational. A trusted third party can be any designated organizational unit, including the organization’s archives, a public archives, or a local or regional service bureau. References in the text to a storage repository should be understood as referring to a trusted third party.
The fifth consideration is an emphasis on technical issues and problems associated with ensuring long-term access to authentic electronic records, especially non-proprietary information technology standards, and practical guidelines for media selection and storage. Non-proprietary information technology standards are especially important because they help support open systems, applications connectivity, and document portability, which in the long run may significantly enhance the prospects of long-term access to electronic records. Only those international or non-proprietary technology standards having a substantial marketplace implementation merit consideration for inclusion in the standards recommended for incorporation into a long-term access strategy for electronic records. Vendors have a vested interest in retaining their customer base and therefore are likely to develop software products that are compatible with existing ones. This requirement precludes, for example, general consideration of Abstract Syntax Notation One (ASN.1) in a long-term access strategy for electronic records.
The sixth consideration involves a focus on products, tools, and techniques that have an established commercial presence. This excludes research and laboratory products with great potential, such as High Density Read Only Memory (HD-ROM), that have not been established as viable commercial products. In the future HD-ROM along with other products may in fact become huge commercial successes but it is imprudent at this time to base a long-term access strategy on unproven products, tools, and techniques.
These issues and problems are covered in four chapters. Chapter 1 lays out the conceptual foundations of the study, including a review of six research projects and a discussion from an archival science perspective of nine concepts: document, record, authenticity, archive, copy, reformat, convert, migrate, and technology obsolescence. The concept of migrate is especially important because it is viewed as part of an access strategy that addresses a specific facet of digital technology obsolescence rather than as a macro strategy of the kind that the Task Force Report on Archiving Digital Information advocates.
Chapter 2 discusses an access strategy for electronic records over time but with the proviso that no "one-size fits all" strategy will accommodate all formats of digital materials or all circumstances under which access can be supported. Therefore, this chapter reviews alternative approaches from which an organization may select the methods appropriate for its requirements and resources, which are as effective as possible for the particular formats of materials under consideration. The context for this review is a set of more particular issues taking into account the concepts and general issues reviewed in chapter 1. Many of the general observations in this chapter are rephrased in chapter 3 in the context of best practices. Chapter 2 concludes with a long-term access logical process model and a summary of data standards that support access for a variety of formats. The logical process model is especially useful because it summarizes the functions and activities that can help achieve the ultimate objective -- providing long-term access to authentic electronic records.
Best practices and recommended guidelines are the focus of chapter 3. Many of the "best practices" draw upon the experience of several national archives in mounting electronic records programs. The chapter offers a number of recommendations and guidelines for organizations in formulating how they will ensure long-term access to authentic electronic records. These recommendations and guidelines are intended to be generalizable and applicable to a wide variety of organizational settings and circumstances.
Chapter 4 offers a brief action agenda that identifies areas where additional research and study are needed to further our understanding of how to manage effectively the preservation of electronic records. Included in this agenda for action are low-cost environmentally controlled storage environments, auditing procedures, the Open Archival Information System (OAIS), and a continuing education program for archivists, records managers, and other information professionals.
The study concludes with six appendices, the first of which is an information technology primer for archivists and records managers interested in detailed explanations of certain digital technology issues or who may find it useful as a reference source to terms used in the body of the study. The primer examines technical problems of electronic records in five contexts: (1) data representation of records, (2) the structure of records, (3) the storage of records, (4) the retrieval of records, and (5) the portability of records. This primer also reviews a number of technology issues associated with each of the problems listed above, especially the identification of international, national, and industry standards to minimize certain impediments to long-term access to electronic records. The primer may be used as a stand-alone document or it may be used as a reference source. Explanations of terms and concepts in the text that are in bolded italics can be found in this primer.
Appendices 2 through 6 contain information that elaborates upon key points made in the body of the study. Appendix 2 consists of an excerpt from a study conducted by the Nordic Council that summarizes the criteria and evaluation of selected digital storage media along with recommendations for storage media for electronic records. Appendix 3 is an Executive Summary and cost data in Canadian dollars for various electronic records storage media taken from a study prepared by the National Archives of Canada. Cost data for the preservation of electronic records derived from the experience of the National Archives of the United States comprise Appendix 4. Appendix 5 consists of a series of media disposition life expectancy charts based upon studies conducted by the National Media Laboratory. Appendix 6 incorporates excerpts from the system manual for the Archival Preservation System, a software package in use at the U.S. National Archives and Records Administration.
ENDNOTES
Janet H. Murray offers a provocative perspective in her book, Hamlet on the Holodeck: The Future of Narrative in Cyberspace (New York: Simon & Schuster, 1997).
Linda Kempster and Mark Kempster, Advanced Storage Requirements and Capabilities. A White Paper Prepared for the Association for Information and Image Management International (Silver Spring, Md.: Association for Information and Image Management, 1997): 2. I am grateful to Mark Kempster for sharing his insights regarding digital technology trends. His paper on "Storage Trends into the 21st Century" [Electronic file] is accessible on http://www2.ari.net/thic/.
This estimate is based upon the following: one 500 page book is equal to about 1.5 million bytes. A Gigabyte is equal to 600 500 page books while a Terabyte is roughly equivalent to 60,000 500 page books. A single Petabyte, which is 1,000 Terabytes, is the informational equivalent of 60,000,000 500 page books.
Rothenberg has a revised version of this paper that may be obtained from him. He can be reached by e-mail at jeff@rand.org.
Rothenberg, "Ensuring the Longevity of Digital Information," Scientific American (January 23, 1995): 15.
Ibid.: 17.
Richard Kesner, "Teaching Archivists About Technology Concepts: A Needs Assessment," American Archivist, 56 (Summer 1993): 435.
I am indebted to Ulf Andersson of the ASTRA Corporation for sharing with me a preliminary report on this topic prepared by a work group of the European Union DLM-Forum.
See Luciana Duranti and Heather MacNeil, "The Protection of the Integrity of Electronic Records; An Overview of the UBC-MAS Research Project," Archivaria 42 (Fall 1996): 57. Heather MacNeil argues that archival arrangement and description also are "a means of preserving, perpetuating, and authenticating the network of documentary and administrative relationships that have shaped the records over time." See her article, "Implications of the UBC Research Results for Archival Description in General and the Canadian Rules for Archival Description in Particular," Archivi & Computer VI (3,4, 1996): 245.
Much of the subsequent discussion of maintaining processability and migrating electronic records is foreshadowed in Charles Dollar and Robert Williams, "A New Strategy for Migrating Long-Term Electronic Records: Meeting Operational Needs with Less Risk at Lower Cost," in Proceedings of the 1995 Managing Electronic Records Conference (Chicago: Cohasset Associates, 1995) and Charles Dollar, "Electronic memory and the Redefinition of Preservation" in Gregoria M. Morelli (ed.) L’Eclisse Della Memoria (Laterza and Figli: Gius, 1994).
This definition has its roots in a seminar I gave at OCLC in Columbus, Ohio in 1986 when I defined preservation of electronic records as maintaining access over time and in "continuous processing," a term that Douglas van Houweling of the University of Michigan proposed in 1990.
It is conceivable, of course, that technology then will be vastly "smarter" than today’s technology and therefore problems that are now intractable will be dealt with routinely.
Information technologists such as Mark Kempster argue that technology is changing so rapidly that a four to five year time frame is the maximum life of any digital technology. So far as archives and records management programs are concerned, my sense is that there is considerably more stability in technology applications at a practical level. This point is elaborated upon in chapter 1.
ASN.1 is an international standard that defines a high level syntax (i.e., a data definition language) that is used primarily for describing protocols and information that are intended to be exchanged between systems through a gateway or interface.
Nicholas Maftei, a consultant in archival automation based in Vancouver, British Columbia, makes the same point in "Technology and Standards for Archives Automation: A Status Report" (November 1997): 5.
Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, Commissioned by the Commission on Preservation and Access and the Research Libraries Group, INC (Washington: Commission on Preservation and Access, May 1996), 5.