WWW vs. Recordkeeping

    Richard E. Barry, Barry Associates, "Factoring Web Technologies into the Knowledge Management Equation...for the Record," keynote presentation to the Records Management Association of Australia, March 1999.


    Richard E. Barry, Barry Associates, "Catching Up with the Last Technology Train at the Next Station." This paper is an update of one that originally appeared in the September 1996 issue of The Record, a publication of the U.S. National Archives and Records Administration. It reflects significant changes in technology and in the use of technology since it was first written in the summer of 1996.

    F. Boudrez, "<XML/> and electronic recordkeeping"

    F. Boudrez and S. Van den Eynde, "Archiving websites" 

    Timo Burkard, "Herodotus: A Peer-to-Peer Web Archival System," submitted to the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science, May 2002. Like the Wayback Machine web archive, Herodotus periodically crawls the World Wide Web and stores copies of all downloaded web content. However, Herodotus does not rely on a centralized server farm. Rather, many individual nodes across the Internet collaboratively perform the task of crawling and storing the content, allowing a large group of contributors' idle computer resources to jointly achieve the goal of creating an Internet archive. Herodotus uses replication to ensure the persistence of data as nodes join and leave.

    Chandra Chekuri, Michael H. Goldwasser, Computer Science Department, Stanford University, Prabhakar Raghavan, Eli Upfal, IBM Almaden Research Center, "Web Search Using Automatic Classification." Currently available search tools suffer either from poor precision (i.e., too many irrelevant documents) or from poor recall (i.e., too little of the Web is covered by well-categorized directories). We address this by developing a search interface that relies on the automatic classification of Web pages. Our classification builds on the Yahoo! taxonomy, but differs in that it is automatic and thus capable of covering the whole Web substantially faster than the (human-generated) Yahoo! taxonomy.

    Chief Information Officers Council, "Securing Electronic Government," the report of the Council's Security, Privacy, and Critical Infrastructure Committee, January 19, 2001.

    Patricia Daukantas, "What on Web merits saving?" Webmasters agree that not everything is archive-worthy. Exactly which parts of an agency’s Web site constitute federal records, subject to rules governing retention and disposition, depends on the agency in question.

    "Guidelines for State Government websites," Government of Western Australia Department of Industry and Technology (DoIT). Governments around the world have recognised the need for a consistent approach to online service delivery. DoIT has also recognised this need and has released a set of Guidelines for State Government Web Sites (July 2002), which were approved by State Cabinet in June 2002.

    Jonathan Lazar, Dr. Charles R. McClure and Dr. J. Timothy Sprehe, "Solving Electronic Records Management (ERM) Issues for Government Websites: Policies, Practices, and Strategies: Conference Report on Questionnaire and Participant Discussion," April 22, 1998.

    William LeFurgy, "Records and Archival Management of World Wide Web Sites," April 2001. By now, virtually all organizations have set up web sites to provide information and conduct business. As web sites grow, so does dependency on them for accountability, evidence, and other purposes that require recorded documentation. Organizations must take steps to manage content on web sites as information resources and, in some cases, as records. This is an enormous challenge. 

    Susan S. Lukesh, "E-mail and Potential Loss to Future Archives and Scholarship or The Dog that Didn't Bark," First Monday, Peer-Reviewed Journal on the Internet, Volume 4, Number 9, September 6th 1999. "A pattern has emerged in starting presentations on the preservation of electronic materials: Disaster! In 1975, the U.S. Census Bureau discovered that only two computers on earth can still read the 1960 census. The computerized index to a million Vietnam War records was entered on a hybrid motion picture film carrier that cannot be read. The bulk of the National Aeronautics and Space Administration's research since 1958 is threatened because of poor storage. These tales are akin to Jorge Luis Borges's short story in which the knowledge of the world is concentrated in one mammoth computer - and the key is lost. The essential question for the Information Age may well be how to save the electronic memory (Stielow: 333)."

    Charles R. McClure and J. Timothy Sprehe, consultants to NHPRC. Final Report developed as part of an NHPRC grant project. "Analysis and Development of Model Quality Guidelines for Electronic Records Management on State and Federal Websites: Final Report."

    Charles R. McClure and J. Timothy Sprehe, consultants to NHPRC. This accompanies the Final Report above, developed as part of an NHPRC grant project. "Guidelines for Electronic Records Management on State and Federal Agency Websites."

    Nigel McFarlane, "XML simply the best," an excellent review of the current status of XML, looking both forward and backward, Sydney Morning Herald, December 10, 2002.

    Sarah Mitchell, "LOC to save data 'born digital,'" February 21, 2003, on the Library of Congress plan for preserving Web sites, CDs, electronic journals and other digital information as part of the National Digital Information Infrastructure and Preservation Program and how archivists face the daunting task of figuring out just how to save information that was 'born digital.'

    Glyn Moody, "A New Dawn" in New Scientist, 30 May 1998, a lay article on HTML, XML, XSL, RDF and other emerging metalanguage standards.

    Public Record Office, UK. "Management of electronic records on websites and intranets: an ERM toolkit" (pdf). "Website management has often been seen as the preserve of IT specialists, press/communications functions and librarians. In government, it also needs rigorous records management input. This is a point that has often been overlooked. The primary intended audience for this toolkit is records managers in government, web project managers or IT and information managers with information and records management responsibilities. Some aspects may be of assistance to business managers. It assumes a reasonable level of general IT and information literacy but is not written from a technical IT perspective."

    Brian Robinson, "A voice from the near future," FCW, March 18, 2002, on recently developed VoiceXML specifications. "VoiceXML is a variation of Extensible Markup Language, which serves as something of a universal translator, tagging data so that different computers know how to process or present it. In the case of VoiceXML, it's a matter of translating dialogue between humans and computers."

    Thomas J. Ruller, "Open All Night: Using the Internet to Improve Access to Archives: A Case Study of the New York State Archives and Records Administration." This is an excellent commentary on how access to archival assets (not only indices) can be of enormous value not only to the client community but also to the archival organization making use of this technology.

    MacKenzie Smith, Associate Director for Technology, MIT Libraries, "DSpace: An Open Source Dynamic Digital Repository." In March 2000, Hewlett-Packard Company (HP) awarded $1.8 million to the MIT Libraries for an 18-month collaboration to build a Durable Digital Depository, DSpace™, a dynamic repository for the intellectual output in digital formats of multi-disciplinary research organizations. As an open source system, DSpace is now freely available to other institutions to run as-is, or to modify and extend as they require to meet local needs. From the outset, HP and MIT designed the system to be run by institutions other than MIT, and to support federation among its adopters, in both the technical and the social sense. Links for downloading the free open source software are located in the Special Resources section.

    Smithsonian Institution Archives Website Archives Project. Below are three reports on the Smithsonian Institution Archives Website Archives Pilot Project. While designed to meet the dynamic needs of the Smithsonian Institution, the project has considerable relevance to most organizations facing the challenge of archiving enterprise websites.

    1) Charles Dollar, Dollar Consulting, "Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices," July 20, 2001. This paper has been substituted for Dollar's slide presentation on this subject to the 2001 Society of American Archivists Annual Conference, previously located here. It summarizes Dollar's white paper for the Smithsonian Institution on issues and options relating to the management of websites for both regulated and unregulated organizations in an emerging content management environment.

    2) Charles Dollar, Dollar Consulting, "Archival Preservation of Web Resources: HTML to XHTML Migration Test Considerations, Evaluation, and Recommendations," July 1, 2002. "This report presents the results of a study undertaken by Dollar Consulting for the Smithsonian Institution Archives (SIA) as part of a larger effort to test and evaluate the feasibility of preserving Web sites and HTML pages in an accessible, usable and trustworthy form for as far into the future as is necessary."


    3) Smithsonian Institution Archives Records Management Team, "Archiving Smithsonian Websites: An Evaluation and Recommendation for a Smithsonian Institution Archives Pilot Project," May 20, 2003. "Over the past eight years, the Smithsonian has greatly expanded its presence on the web, using its website to display virtual exhibits, expeditions, and field trips; provide primary and secondary research information and educational tools; and promote involvement in Smithsonian programs and commerce through business ventures, development, and museum shop sales. The presentation of this information and much of the information itself is often unique to the Smithsonian website...[It] is now apparent that historical documentation of such information will be lost if not captured electronically....The SIA Records Management (RM) Team reviewed the requirements outlined in Dollar Consulting's reports, and sought advice from Thomas J. Ruller (Independent Consultant for archivists and records managers working with records in electronic form), to evaluate the feasibility and requirements needed for implementing a project incorporating those recommendations." See Dollar's reports of July 20, 2001 and July 1, 2002.

    Adam K. Watts, "XML Briefing for Managers" in Government Technology, August 2001, on how XML differs from HTML: is it something new that will sweep away all the hard work on your portal, or can XML and HTML co-exist?

    WEB ARCHIVES ON LINE: Visit three web archive sites to get a feel for how this dynamic information is being captured on the World Wide Web. See Special Resources section. 
