WWW vs. Recordkeeping
Richard E. Barry, Barry Associates, "Factoring Web Technologies into the
Knowledge Management Equation...for the Record," keynote presentation to the
Records Management Association of Australia, March 1999.
Richard E. Barry, Barry Associates, "Catching Up with the Last Technology Train
at the Next Station." This paper is an update of one that originally appeared in
the September 1996 issue of The Record, a publication of the U.S. National
Archives and Records Administration. It reflects significant changes in
technology and in the use of technology since it was first written in the summer
of 1996.
F. Boudrez, "<XML/> and electronic recordkeeping"
F. Boudrez and S. Van den Eynde, "Archiving websites"
Timo Burkard, "Herodotus: A Peer-to-Peer Web Archival System," thesis submitted
to the Department of Electrical Engineering and Computer Science in partial
fulfillment of the requirements for the degree of Master of Engineering in
Electrical Engineering and Computer Science at the Massachusetts Institute of
Technology, May 2002. © Timo Burkard, MMII. All rights reserved. Like the
Wayback Machine web archive, Herodotus periodically crawls the world wide
web and stores copies of all downloaded web content. However, Herodotus does not
rely on a centralized server farm. Rather, many individual nodes across the
Internet collaboratively perform the task of crawling and storing the content,
allowing a large group of contributors' idle computer resources to jointly
achieve the goal of creating an Internet archive. Herodotus uses replication to
ensure the persistence of data as nodes join and leave.
Chandra Chekuri, Michael H. Goldwasser, Computer Science Department, Stanford
University, Prabhakar Raghavan, Eli Upfal, IBM Almaden Research Center, "Web
Search Using Automatic Classification." Currently available search
tools suffer either from poor precision (i.e., too many irrelevant
documents) or from poor recall (i.e., too little of the Web is covered
by well-categorized directories). We address this by developing a search
interface that relies on the automatic classification of Web pages. Our
classification builds on the Yahoo! taxonomy,
but differs in that it is automatic and thus capable of covering the whole Web
substantially faster than the (human-generated) Yahoo! taxonomy.
Chief Information Officers Council,
"Securing Electronic
Government," the report of the Council's Security, Privacy, and Critical Infrastructure Committee, January 19, 2001.
Patricia Daukantas, "What on Web merits saving?" Webmasters agree
that not everything is archive-worthy. Exactly which parts of an
agency’s Web site constitute federal records, subject to rules governing
retention and disposition, depends on the agency in question.
"Guidelines
for State Government websites," Government of Western
Australia Department of Industry and Technology (DoIT). Governments around the
world have recognised the need for a consistent approach to online service
delivery. DoIT has also recognised this need and has released a set of Guidelines
for State Government Web Sites (July 2002), which were approved by State
Cabinet in June 2002.
Jonathan Lazar,
Dr. Charles R. McClure and Dr. J. Timothy Sprehe. "Solving Electronic Records Management
(ERM) Issues for Government
Websites: Policies, Practices, and Strategies: Conference Report on
Questionnaire and Participant Discussion, April 22, 1998."
William LeFurgy, "Records and Archival Management of World Wide Web Sites,"
April 2001. By
now, virtually all organizations have set up web sites to provide information
and conduct business. As web sites grow, so does dependency on them for
accountability, evidence, and other purposes that require recorded
documentation. Organizations must take steps to manage content on web sites as
information resources and, in some cases, as records. This is an enormous
challenge.
Susan S. Lukesh, "E-mail and Potential Loss to Future Archives and Scholarship
or The Dog that Didn't Bark," First Monday, Peer-Reviewed Journal on the
Internet, Volume 4, Number 9, September 6th 1999. "A pattern
has emerged in starting presentations on the preservation of electronic
materials: Disaster! In 1975, the U.S. Census Bureau discovered that only two
computers on earth can still read the 1960 census. The computerized index to a
million Vietnam War records was entered on a hybrid motion picture film carrier
that cannot be read. The bulk of the National Aeronautics and Space
Administration's research since 1958 is threatened because of poor storage.
These tales are akin to Jorge Luis Borges's short story in which the knowledge
of the world is concentrated in one mammoth computer - and the key is lost. The essential question for the Information Age may well
be how to save the electronic memory (Stielow: 333)."
Charles R. McClure and J. Timothy Sprehe, consultants to NHPRC. Final Report
developed as part of an NHPRC grant project. "Analysis and Development of Model
Quality Guidelines for Electronic Records Management on State and Federal
Websites: Final Report."
Charles R. McClure and J. Timothy Sprehe, consultants to NHPRC. This accompanies
the Final Report above, developed as part of an NHPRC grant project. "Guidelines
for Electronic Records Management on State and Federal Agency Websites."
Nigel McFarlane, "XML simply the best," an excellent review of the current
status of XML, looking both forward and backward, Sydney Morning Herald,
December 10, 2002.
Sarah Mitchell, "LOC to save data 'born digital'," February 21, 2003, on the
Library of Congress plan for preserving Web sites, CDs, electronic journals and
other digital information as part of the National Digital Information
Infrastructure and Preservation Program, and on how archivists face the daunting
task of figuring out just how to save information that was 'born digital.'
Glyn Moody, "A New
Dawn" in New Scientist, 30 May 1998, a lay article on
HTML, XML, XSL, RDF and other emerging metalanguage standards.
Public Record Office, UK. "Management of electronic records on websites and
intranets: an ERM toolkit" (pdf)
"Website management has often been
seen as the preserve of IT specialists, press/communications functions and
librarians. In government, it also needs rigorous records management input. This
is a point that has often been overlooked. The primary intended audience for
this toolkit is records managers in government, web project managers or IT and
information managers with information and records management responsibilities.
Some aspects may be of assistance to business managers. It assumes a reasonable
level of general IT and information literacy but is not written from a technical
IT perspective."
"A voice from the near future,"
March 18, 2002, on recently developed VoiceXML specifications. "VoiceXML
is a variation of Extensible Markup Language, which serves as something of a
universal translator, tagging data so that different computers know how to
process or present it. In the case of VoiceXML, it's a matter of translating
dialogue between humans and computers."
Thomas J. Ruller, "Open All Night: Using the Internet to Improve Access to
Archives: A Case Study of the New York State Archives and Records
Administration." This is an excellent commentary on how access to archival
assets (not only indices) can be of enormous value not only to the client
community but also to the archival organization making use of this technology.
MacKenzie Smith, Associate Director for Technology, MIT Libraries, "DSpace: An
Open Source Dynamic Digital Repository."
In March 2000, Hewlett-Packard Company (HP) awarded $1.8 million to the
MIT Libraries for an 18-month collaboration to build a Durable Digital
Depository, DSpace™, a dynamic repository for the intellectual output in
digital formats of multi-disciplinary research organizations. As an open source
system, DSpace is now freely available to other institutions to run as-is, or to
modify and extend as they require to meet local needs. From the outset, HP and
MIT designed the system to be run by institutions other than MIT, and to support
federation among its adopters, in both the technical and the social sense. Links
for downloading the free open source software are located in the Special
Resources section.
Smithsonian Institution Archives
Website Archives Project. Below are three reports on the Smithsonian Institution
Archives Website Archives Pilot Project. While designed to meet the dynamic
needs of the Smithsonian Institution, the project has considerable relevance to
most organizations facing the challenge of archiving enterprise websites.
1) Charles Dollar, Dollar Consulting, "Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices," July 20, 2001. This paper has been substituted for Dollar's slide presentation on this subject to the 2001 Society of American Archivists Annual Conference, previously located here. It summarizes Dollar's white paper for the Smithsonian Institution on issues and options relating to the management of websites for both regulated and unregulated organizations in an emerging content management environment.
2) Charles Dollar, Dollar Consulting, "Archival Preservation of Web Resources: HTML to XHTML Migration Test Considerations, Evaluation, and Recommendations," July 1, 2002. "This report presents the results of a study undertaken by Dollar Consulting for the Smithsonian Institution Archives (SIA) as part of a larger effort to test and evaluate the feasibility of preserving Web sites and HTML pages in an accessible, usable and trustworthy form for as far into the future as is necessary."
3) Smithsonian Institution Archives Records Management Team, “Archiving Smithsonian Websites: An Evaluation and Recommendation for a Smithsonian Institution Archives Pilot Project,”
Adam K. Watts, "XML Briefing for Managers" in Government Technology, August
2001, on questions such as: How does XML differ from HTML? Is it something new
that will sweep away all the hard work on your portal, or can XML and HTML
co-exist?
WEB ARCHIVES ON LINE: Visit three web archive sites to get a feel for
how this dynamic information is being captured on the World Wide Web. See Special
Resources section.