by Barbara Reed
[NB: This paper was
originally published in the (
Barbara Reed is a practicing
consultant in the field of records, information and archives management as a
Director of Recordkeeping Innovation Pty Ltd, an Australian based company
delivering recordkeeping consulting and training services in
The digital world introduces new challenges to recordkeeping professionals. The initial response has been to transfer our traditional recordkeeping systems to automated solutions. Increasingly these are being challenged for fit in dynamic organizational environments. Web services as a building block for next generation software applications are growing in acceptance both in governments and innovative product offerings. This article outlines the concepts of web services architectures and begins an exploration of the uses that recordkeeping professionals may define for such a potentially radical change to the way we deliver recordkeeping functionality. It challenges recordkeeping professionals to ensure that they are very firmly grounded in best professional practice in recordkeeping in order to grasp such technology opportunity.
We are living in a world of rapid technological change. Organisations are demanding more functionality from integrated applications; new computing techniques to exploit, reuse and repurpose legacy data; and quicker deployment of technology to suit rapidly changing structures and business focus. Some of these drivers actually contain contradictions – we want it simple, but it has to be compliant with a raft of complex requirements; we want it quickly, but it has to be comprehensive; we want it designed to meet the future, but don’t forget the valuable data that we’ve accumulated in the past. Some fundamentally new ways of thinking about technology are emerging to manage this complex set of requirements – using existing technologies but putting them together quite differently. As recordkeeping professionals we need to be monitoring the shifts in both organizational expectations and trends in technology to ensure that we are able to make our own discipline specific contributions and that recordkeeping continues to be delivered in ways that suit the environment we are presented with.
To some outside our profession there is a perception that we are narrowly focused and somewhat obsessed with implementing recordkeeping according to a prescriptive framework which users don’t like. Our methods and the technologies they are implemented on are sometimes seen as expensive and difficult to implement, maintain and sustain. These perceptions are not always the whole truth, but undoubtedly there are some grains of truth in them.
Why? What are the problems?
Implementing electronic document and records management is not simple. It needs to be deployed organization wide. It involves a significant change to embedded organizational and technology practices. Costs and scope of projects are difficult to estimate and benefits are often intangible and hard to quantify. Organisations take fright when the implications of implementing electronic document and records management systems begin to evolve into whole-scale change management exercises.
The existing application market is complex. Packages that we know and trust to deliver recordkeeping outcomes are being swallowed by bigger enterprise suites (Oracle taking over Stellent, IBM taking over FileNet, Vignette taking over Tower Technology, Interwoven taking over iManage etc.). Because these suites must serve many information management purposes in addition to recordkeeping, configuring the systems to serve all of those purposes becomes the product priority. It also means that we cannot assume that an organization implementing one of these types of products will automatically have reliable and authentic recordkeeping capacity. They can, of course, do this, but typically they require significant configuration. Are we all as skilled as we might be, or need to be, to guarantee the appropriate configuration to deliver reliable records?
And are we as recordkeeping professionals flexible enough to be able to appreciate that doing things differently may achieve the same end, even though it might look really different to the traditional ways of achieving the outcome? Some of our rules are paper based hangovers – it is what we know and where we came from. Some however need to be rearticulated into the electronic world (and quite a few of our tools are inherently paper based – eg disposal authorities!).
Another big problem is expense. What used to be a hidden cost of staff time to run recordkeeping is now exposed as an organizational cost. Electronic document and records management systems need to be rolled out to all employees and the models for pricing are correspondingly high. The cost of the purchase, licence fees, individual seats and maintenance is regarded as very high and often represents the major cause of reluctance to proceed to implementation.
For recordkeeping professionals, and increasingly on the radar screen of organizations, are problems associated with the proprietary nature of most of the EDRMS or enterprise content systems on the market. They use tightly locked protocols and processes which are not, generally, open to public scrutiny and innovative re-use. They tend to be quite large pieces of software, written as proprietary code, and based on closed architectures, which makes integration across multiple business systems very challenging indeed. Professionally we’ve been working on metadata strategies to counter this problem (see ISO 23081 Parts 1 and 2). With consistent application and use, the impact of standards on the vendor market and on the consumer market will increase.
We have done a great job in espousing the notion that recordkeeping and the creation/capture of records is everyone’s business. But it is difficult to be complacent about this when we can’t deliver affordable and effective solutions that help everyone manage their voluminous email and documents electronically.
Some exemplar implementations have achieved the roll out of recordkeeping applications to everyone’s desktop, but even here we’ve had to force users to use our recordkeeping protocols – a common criticism of records application packages is that they are designed for recordkeeping professionals but rolled out to all users. Even if deployment of recordkeeping applications to all desktops is achieved, will this really achieve the goal of comprehensive recordkeeping? EDRMS applications typically only manage the ‘office’ type documents, rather than the business systems which create the bulk of records in most organizations. To achieve the goal of comprehensive recordkeeping we will need records embedded in work processes – to happen wherever business takes place.
The business systems deployed in organizations are constantly changing and shifting shape. In the last few years we have seen some successful examples of integration between business systems and EDRMS systems – records automatically created in the business system being captured into EDRMS, using exposed APIs (Application Programming Interfaces). APIs are defined by Wikipedia as ‘a source code interface that a computer application, operating system or library provides to support requests for services to be made of it by a computer program.’ So, APIs are a part of the complex environment needed to support system to system integration. But at the moment all of the integrations are hard wired – this system to that system using the data, protocols and interfaces specific to the two (or however many) programs that are tightly bound or coupled together. If you change one or the other, the interface programming will not work. There is a very real problem of maintaining the sustainability of these hardwired implementations.
In outlining the problems above I am not trying to throw brickbats at vendors who do an admirable job trying to meet conflicting demands in different markets at various layers of sophistication. The problems I have identified are generic and a result of a competitive marketplace. New product offerings are coming along all the time. Two of the most interesting initiatives currently offered are open source software and software as a service offerings (previously commonly known as application service providers).
The technological environment
If that is a broad view of the challenges that we face in managing appropriate deployment of records software in the electronic environment, what is the technical environment in which organizations are working?
Monolithic legacy business systems are the norm in almost all organizations. They cannot just be ditched: the cost of replacement is simply too high, the value of the data stored in them is too great to contemplate losing access, and migration may be too costly. Such systems have grown like Topsy, with bits and pieces bolted on as new enhancements are made or deployed.
On the other hand, the business drivers are towards electronic delivery of service, requiring agile programming to support rapid deployment of technology for new services and often requirements to work across previously rigid organizational boundaries in new ways. The desirable traits for software development are commonly identified as being “agile,” “nimble,” “flexible,” and “adaptive.” In that environment, new start up businesses have a significant advantage because they can adopt new, more effective business systems without the drag of legacy systems that need to be converted. The legacy systems with their silo specific and purpose built business requirements are impeding the development of fast responses and data sharing.
The web has spawned quite new ways of doing things – organisations that have been designed for working in the web environment are often doing so in ways that completely challenge traditional thinking. Think, for example, of Netscape and Google giving away browser software; Amazon with its success through referential linking; eBay (or Trade Me); YouTube; or Al Gore’s interactive cable TV channel. In this world, traditional notions of intellectual property, ownership, copyright, digital rights management and asset valuation and use are fundamentally challenged and often ignored. Our kids (or perhaps you) are expert in social networking, mashing and blogging. This is the emerging world of Web 2.0, a concept surrounded by more hype than the ‘information superhighway’ of the 1990s, and about as developed as that very early expression was. But at base, the developments and innovations are possible at least partly because they are being reinvented from nothing – not dependent on legacy systems or legacy thinking.
The web environment, electronic commerce and internet applications are heavily dependent on a set of protocols known as web services. ‘Web services’ is one of those tricky phrases that mean different things to different people. And at its heart it is not all that new. At base, it refers to a set of code that involves sending a message or request for information or an action to an independent service and receiving an answer from that independent service. It also involves establishing a directory, or a place that people lodge services and others find them. The formal explanatory diagrams illustrate web services like this:
This basic architecture has been around for some time (at least since CORBA in the 1990s) and is the basis for distributed systems. However, what has changed is that it is now a formally accepted way of constructing programmes and is firmly based on standard protocols – as you see in the diagram, SOAP (Simple Object Access Protocol) messages and WSDL (Web Services Description Language) as a formal language for describing the services themselves.
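To make the message exchange a little more concrete, here is a minimal Python sketch of how a request to a service might be wrapped in a SOAP-style envelope and then read by the receiving service. Only the SOAP 1.1 envelope namespace is standard; the service namespace, the `CaptureRecord` operation and its parameters are hypothetical names invented for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/records"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call and its parameters in a SOAP-style envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_operation(message: bytes) -> tuple:
    """What a receiving service does first: find the requested operation."""
    root = ET.fromstring(message)
    call = root.find(f"{{{SOAP_NS}}}Body")[0]  # first child of the Body element
    name = call.tag.split("}", 1)[1]  # strip the namespace prefix
    args = {child.tag.split("}", 1)[1]: child.text for child in call}
    return name, args


if __name__ == "__main__":
    msg = build_request("CaptureRecord", {"title": "Minutes", "creator": "J. Citizen"})
    print(msg.decode("utf-8"))
    print(read_operation(msg))
```

In a real deployment the operations and message shapes would be declared in the service's WSDL description rather than agreed informally as they are here.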
Web services are now an accepted part of the computing environment. If we asked our EDRMS vendors about web services, they would indicate with perfect truth that they use web services extensively. They use them because the basic protocols are what make web based front ends and the distributed load of most EDRMS systems work at present. But is this really the end of the utility of web services?
Web services are moving beyond being merely a computing technique used within an application programme. More and more they are being seen as products in themselves. They’re components of larger programmes that perform specific functionality and can stand alone. They can be broadly defined as reusable components which operate independently but which can be used by many applications seeking to do the same thing. Instead of tight integration with specific applications (like the current integrations between business systems and EDRMS using APIs) web services are being defined so that they provide “a very loose coupling between an application that uses the Web service and the Web service itself.” This allows either piece to change without negatively affecting the other, “as long as the interface remains unchanged.” This flexibility allows software to be built by assembling individual components into more complete process flows.
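The loose coupling described here can be illustrated with a small Python sketch. It is an in-process stand-in rather than a real web service, and all of the names (`FormatConverter`, `InHouseConverter`, `ExternalConverter`, `publish`) are hypothetical. The point it shows is that the calling application depends only on the interface, so the implementation behind it can be swapped without the caller changing.

```python
from typing import Protocol


class FormatConverter(Protocol):
    """The published interface: the only thing a caller depends on."""

    def convert(self, content: bytes, target_format: str) -> bytes: ...


class InHouseConverter:
    """One hypothetical implementation, maintained inside the organisation."""

    def convert(self, content: bytes, target_format: str) -> bytes:
        return b"[" + target_format.encode() + b"]" + content


class ExternalConverter:
    """A replacement implementation: different internals, same interface."""

    def convert(self, content: bytes, target_format: str) -> bytes:
        header = f"[{target_format}]".encode()
        return header + content


def publish(document: bytes, converter: FormatConverter) -> bytes:
    """Business application code, unaware of which implementation it was given."""
    return converter.convert(document, "pdf")


if __name__ == "__main__":
    print(publish(b"annual report", InHouseConverter()))
    print(publish(b"annual report", ExternalConverter()))
```

If the `convert` signature itself changed, every caller would break, which is the "as long as the interface remains unchanged" condition in the quotation.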
Web services are either public (registered in service registries on the internet) or private (maintained within an organisation’s own firewalled space). At the moment, because of issues of security and the need to have messages cross organizational firewalls, most enterprises would favour bringing the services within their own walls. However, public services are increasing in number. They are registered with UDDI (universal description, discovery and integration) registries, and those seeking services to reuse or build upon query registries to find the services they require. Initially these registries were experimental, but as the take up of services is growing, independent companies such as xManage or very recently Sourceforge, the open source software listing site, are including web services listings.
In reality it is just the beginning – and we can only see pretty early instances of how this might really work to enable us to construct things differently. A web service is really only a very small part of a bigger picture. But it is an important part. To construct web services, they have to be defined at an appropriate level of granularity – that is, what will stand alone and be a useful level of service, what will other people want to reuse, what needs to be bundled together for it to operate in a consistent way, so that regardless of the technical implementation environment, they will produce a consistent and predictable outcome. Web services also need to be constructed so that they can be consumed in any technological environment – in technical speak, loosely coupled – able to be linked in using their definitions, but also able to stand alone.
Published web services by themselves are not a lot of use at present, but the business utility of these services is rapidly increasing. Possibly the most useful services to us at the moment are those that perform conversions between software formats – from ppt to pdf or Microsoft Office to OpenDocument. Others exist as small utilities that we might use as individuals – for example currency converters. Such services are increasingly being used to deliver componentized functionality serving business ends and available for re-use in any environment.
Opinions differ on what web services are and how they can be used. This is partly because the granularity at which they operate is variable, and partly because of a difference in interpretation: are we talking just about the technique which uses messaging protocols in a particular way, or are we talking about the notion of reusable components?
To make web services work together as components which together deliver a predictable business outcome, we need to string them into sequences which actually perform specified functions. We need to link them to business processes. The linking protocols that have been defined to do this are currently known as orchestration or choreography (depending on what level and complexity we are talking about). Both look very similar to work process or business process flows. So, a process or work flow needs to be defined, and services mapped into the specific steps in the process, which together combine to deliver the business outcome required. This area, too, is increasingly standardized, using formalized standard-based languages to document the flow, such as BPEL (business process execution language for web services) or WSCI (web service choreography interface). The diagram below offers an example of an orchestration.
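In code rather than diagram form, the idea of an orchestration can be sketched as an ordered list of service calls, each receiving the output of the one before. This is only an illustration in Python: the three steps, the `amount` threshold and the case structure are hypothetical, the "services" are local functions rather than remote web services, and a production orchestration would be expressed in a language such as BPEL.

```python
def receive_application(case: dict) -> dict:
    """Hypothetical service: acknowledge receipt of the application."""
    return {**case, "status": "received"}


def assess_application(case: dict) -> dict:
    """Hypothetical service: approve small amounts, refer larger ones."""
    status = "approved" if case["amount"] <= 1000 else "referred"
    return {**case, "status": status}


def notify_applicant(case: dict) -> dict:
    """Hypothetical service: note that the applicant has been told the outcome."""
    return {**case, "notified": True}


def orchestrate(steps: list, case: dict) -> dict:
    """Run each service in order, passing the output of one step to the next."""
    for step in steps:
        case = step(case)
    return case


if __name__ == "__main__":
    process = [receive_application, assess_application, notify_applicant]
    print(orchestrate(process, {"amount": 500}))
```

Because the process is just a list of steps, a further service can be added to the sequence without altering the services already in it.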
Service oriented architectures
It is fairly clear that reconceptualising business as a set of reusable components, either sourced from within the organization, or from the internet, needs a clear framework which will enable the management of the technology and the components used.
The notion of information architecture or enterprise architecture frameworks is not particularly new, having antecedents dating back into the 1980s.
Increasingly Enterprise Architecture is being linked to ‘service oriented architectures’ – that is a business model that involves organizations using a specific reference model of a framework to map and manage the uptake of many services which can potentially be defined. This view of organizations seeks to find common elements across application environments and business lines which define components that can be re-used from one application to another. Service oriented architecture is far more than the capacity to use web services (as defined above). It is a complete rethink of the way organizations conceptualise and structure their information systems. It is not a short term project – rather one that involves a significant organizational commitment, establishing revised priorities for infrastructure replacement and deployment, a new governance model and a significant lead time. Only a few organizations have really committed to this as a whole-scale enterprise architecture.
However, the stakes are
getting higher. In the
To transform the Federal government to one that is citizen-centered, results-oriented, and market-based, the Office of Management and Budget (OMB) is developing the Federal Enterprise Architecture (FEA), a business-based framework for government-wide improvement.
The Australian Government Information Management Office (AGIMO) has recently (June 2007) released version 1 of its own Architecture Reference Models, which are described as being ‘a very lightly customised version of the OMB’s FEA Consolidated Reference Model Document Version 2.1, taking into account the Australian context within which it exists’. The aim is to establish a framework for development that:
• provides a common language for agencies involved in the delivery of cross-agency services
• supports the identification of duplicate, re-usable and sharable services
• provides a basis for the objective review of ICT investment by government
• enables more cost-effective and timely delivery of ICT services through a repository of standards, principles and templates that assist in the design and delivery of ICT capability and, in turn, business services to citizens.
While not mandated in
Four implication arenas for recordkeeping
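Where does recordkeeping come back into this emerging picture? There are at least four areas where recordkeeping is, or should/could be, involved in these initiatives. They are:
• services as a document-centric technology
• orchestrations for business processes using services
• delivering recordkeeping functionality as a set of web services
• interim strategies using specific web services.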
Services as a document-centric technology
Web services were explored in the Monash Clever Recordkeeping Metadata Project. Web services are based on documents: they are a message based technology that works by exchanging messages in XML format. These messages are at a very low level of transactional performance. But they involve a quite formalized protocol of communication. Where a publicly available web service is used, it involves a number of sub processes and quite a lot of message based transactions. There is:
Each of these separate component parts of executing a service is based on messages being exchanged automatically between various parts of the technology.
This is probably no different to the myriad of technology calls that go into operating an email system – negotiation of the network pathways, the acceptance, validation and receipt of the email. As end users we just see what we get in our proprietary based email in-box. But underneath that there is a conversation about just what constitutes a dispatched and received email. What constitutes a valid transmission? What will trigger the automatic generation of the somewhat ubiquitous ‘Mail delivery failed: returning message to sender’?
We as recordkeeping professionals are definitely not in the conversation about what constitutes a record in the network to network communication world – but should we be? If our organizations start using public web services (or even private web services) to do some of their business, don’t we want to know some things about those services, the agreements implicit or explicit in using the service, the versions of the software and what constitutes an acceptable assurance of quality or validity? In email some of these calls, routes etc are available in the internet headers, able to be queried in the native software application – but when we save email out of the native software (outlook etc), these headers are not saved with the message – or at least not by your average user. In the world of web services, some of these transmission messages will be essential to be able to assert authenticity – who is making decisions about what is required for authenticity? Are they transitory or should some of them be regarded as a part of the record?
In the web services world these messages are what makes the action take place. They are very small and happen at phenomenal speed over networks. What is not transparent is the decision process of what should be retained, for how long and what should be persistently linked to the results. This is an appraisal process built into the operation of a web service. There are likely to be different requirements for different types of service. Such decisions need to be documented and made explicit in the use of the service.
Orchestrations for business processes using services
Delivering business through web services within a truly service oriented environment will involve development of orchestrations to suit each business process, stringing together and coordinating the calls on each web service which together meet the business outcome required. The definition of the business or work process, and the granularity of what is strung together to provide a business service, will differ according to what the specific business process is designed to achieve. But some of the service components can definitely be about capturing records. Here is another great opportunity for recordkeeping to proactively become involved in defining recordkeeping requirements in conjunction with newly defined business processes, and building the records processes into the business processes – just what we want!
The potential for adding records services as web services into business process orchestrations has been identified and recognized, it seems, by only one government archival authority – and the work of the National Archives and Records Administration (NARA) provides a model for us all.
This was a project well ahead of the rest of us. It has been working since 2005 on refining and developing articulation of records management services, culminating in September 2006 with the issue of ‘Functional Requirements, Attributes and Unified Modeling Language Class Diagrams for Records Management Services’.
The 7 services defined are:
The definitions of the services occur within the US Federal Government context – and the drivers behind the model of records management in that sphere are:
US models for records management are not totally compatible with the Australasian articulation of recordkeeping. We don’t hold some of the same concepts – such as ‘putting aside’ and managing at the end of the current business. Rather, we are more likely to want recordkeeping integrated from the commencement of the business. The records service initiative acknowledges that implementing services ‘will allow the management of records to begin much earlier in their life cycle than is currently practicable’ but the components being defined are still alien enough from our practice to make it difficult to see how our more proactive processes would use these service definitions. Some of the discussion required is about the granularity issue – what is appropriate for packaging as a service.
We need some thinking about what services we would
find most useful at the point of integration into business processes. In
Each service would have a built in requirement to document the specific event in recordkeeping metadata. There are probably many others that we could think about. For example, we might define some type of monitoring service which would pick up alterations to the business process orchestration, and trigger an alert for action.
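The built-in requirement to document each event could be met by wrapping every records service so that calling it always writes an entry to an event log. The Python sketch below is one possible shape for this, not a description of any existing product: `with_event_log`, `register_record` and the metadata element names are all hypothetical.

```python
from datetime import datetime, timezone


def with_event_log(service_name: str, service, event_log: list):
    """Wrap a service so every invocation is documented as a recordkeeping event."""

    def wrapped(record: dict) -> dict:
        result = service(record)
        event_log.append(
            {
                "event": service_name,
                "record_id": record["id"],
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )
        return result

    return wrapped


def register_record(record: dict) -> dict:
    """Hypothetical records service: mark a record as registered."""
    return {**record, "registered": True}


if __name__ == "__main__":
    log = []
    register = with_event_log("register", register_record, log)
    print(register({"id": "R-001", "title": "Contract"}))
    print(log)
```

A monitoring service of the kind mentioned above could be built the same way, by wrapping changes to the orchestration itself rather than changes to individual records.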
The pioneering work of
Delivering recordkeeping functionality as a set of web services
But what if, in the future, we could define recordkeeping functionality totally as web services – not just the integration of recordkeeping into business but the whole gamut of recordkeeping? We might not need applications in that future. We might find that clever people write and publish orchestrations which link specific services reflecting records processes together from multiple sources and sell those orchestrations. Perhaps we would just buy specific services in once a year – for example, let’s check the validity of formats or run checksum integrity checks for digital records, every year on September 1. Perhaps all we would need is a repository or storage place somewhere – and as we’ve seen in debates over custody, this doesn’t even have to be within the boundaries of the creating organization.
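The annual checksum integrity check mentioned above is a good example of a small, self-contained piece of functionality. A minimal Python sketch follows, assuming that a SHA-256 checksum was registered for each record at capture. The function names and the dictionary-based "repository" are hypothetical simplifications.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Compute a SHA-256 checksum for a record's content."""
    return hashlib.sha256(content).hexdigest()


def integrity_report(holdings: dict, registered: dict) -> dict:
    """Compare current checksums against those registered at capture."""
    report = {}
    for record_id, content in holdings.items():
        expected = registered.get(record_id)
        if expected is None:
            report[record_id] = "no registered checksum"
        elif checksum(content) == expected:
            report[record_id] = "intact"
        else:
            report[record_id] = "altered"
    return report


if __name__ == "__main__":
    registered = {"R-001": checksum(b"original content")}
    holdings = {"R-001": b"original content", "R-002": b"never registered"}
    print(integrity_report(holdings, registered))
```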
This is a very long way from the current environment.
We need a whole lot of infrastructure to make this even an idea existing as a twinkle in the eye. We have recognized some of the building blocks. We are working on some of the very basic building blocks, prime amongst them being metadata standards. To make this world a reality, we would need very clear articulation of various services that make up recordkeeping functionality. We’re closer than we were, but we are quite a long way from that reality.
Some leads are available to us through vendors offering products in the software as a service environment. But most of these are still cradle to grave offerings that look very like traditional applications, differing only in outsourcing the storage of documents and records and in licensing the technology in different ways. There is little as yet that might enable us to buy or license a specific component to undertake a specific module or part of record functionality to plug into another application.
This type of world would need recordkeepers who really know their business – individual orchestrations for processes would be needed in every business, along with the availability of really good, replicable and reliable services. We are a long way from this reality and it might take years to get there, if we ever do.
Interim strategies using specific web services
But, in the meantime, there are many instances of things that would potentially find immediate utility if we packaged them as web services. And here are a few examples:
The Monash Clever Recordkeeping Metadata project delivered a working prototype of a service to map and convert metadata from one scheme to another. This would find immediate use in organisations which need to capture records from business systems. Rather than hardwiring the interfaces, this type of service would sit external to specific systems, and enable business system metadata to be converted to recordkeeping metadata. To make this work we need XML schemas for the relevant systems (business system and records at an appropriate degree of granularity). These would be registered in a metadata registry probably (at this time) maintained within an organisation’s own boundaries. When invoked, a conversion process has been demonstrated to work, converting and transforming the metadata into different schemes. This has the immediate and distinct advantage that when any one of the technical applications changes, the only thing that needs to be updated is the mapping of the metadata scheme – done once and replicable across multiple instances.
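The core of such a conversion can be sketched very simply in Python. This is not the Monash prototype, whose internals are not described here. It is a simplified illustration in which the registered mapping is a plain dictionary and the element names (`cust_name`, `agent` and so on) are hypothetical.

```python
# Hypothetical registered mapping from a business system's element names
# to the element names of a recordkeeping metadata scheme.
CRM_TO_RECORDS = {
    "cust_name": "agent",
    "created_on": "date_created",
    "subject": "title",
}


def convert_metadata(source: dict, mapping: dict) -> tuple:
    """Translate element names from one scheme to another.

    Returns the converted metadata and a list of source elements that had
    no mapping, so that nothing is dropped silently.
    """
    converted, unmapped = {}, []
    for element, value in source.items():
        target = mapping.get(element)
        if target is None:
            unmapped.append(element)
        else:
            converted[target] = value
    return converted, unmapped


if __name__ == "__main__":
    crm_record = {"cust_name": "J. Citizen", "subject": "Complaint", "priority": "high"}
    print(convert_metadata(crm_record, CRM_TO_RECORDS))
```

When the business system changes its element names, only the mapping needs to be updated. The conversion function and the systems on either side are untouched.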
On a local and micro level, disposal authorities are reissued to suit changing environments. In the NSW jurisdiction, State Records NSW has recently issued a new general disposal authority for administrative records – the new GDA 28. This replaces the older GDA 2. The requirement is that all records previously sentenced with GDA 2 now have to be resentenced into GDA 28 – and there is not always a one to one correspondence – some disposal classes have been amalgamated, some have been defined in greater detail, that is one previous class is now two or more current classes. Every organization that has allocated disposal classes on creation now has the problem of having to update its existing sentencing. Wouldn’t it be great if, in addition to issuing the flat documents, State Records was in a position to also publish a little service that could be plugged into the x most commonly used software packages which would extract the GDA 2 disposal metadata, replace it with GDA 28 metadata, create a report for human resolution where the resolution of conflicting classes needed human judgement, write an event in the event log to note the updating of the disposal provisions applicable to every record changed and provide an exception report? Now that would be useful.
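The logic of such a resentencing service might look something like the following Python sketch. It is purely illustrative: the class codes in `CLASS_MAP` are invented and are not taken from GDA 2 or GDA 28, and the record structure and function name are hypothetical.

```python
# Hypothetical mapping of old disposal classes to new ones.
# These codes are invented for illustration, not taken from GDA 2 or GDA 28.
CLASS_MAP = {
    "2.1.1": ["28.3.1"],            # one-to-one replacement
    "2.1.2": ["28.3.1"],            # amalgamated into the same new class
    "2.4.0": ["28.7.1", "28.7.2"],  # split in two: needs human judgement
}


def resentence(records: list, class_map: dict) -> tuple:
    """Replace old disposal classes where the mapping is unambiguous.

    Returns the updated records, a report of records needing human
    resolution, and an event log entry for every record changed.
    """
    updated, needs_review, event_log = [], [], []
    for record in records:
        old_class = record["disposal_class"]
        candidates = class_map.get(old_class, [])
        if len(candidates) == 1:
            updated.append({**record, "disposal_class": candidates[0]})
            event_log.append(
                {"record_id": record["id"], "event": "resentenced",
                 "from": old_class, "to": candidates[0]}
            )
        else:
            # Split classes and unmapped classes both go to a person to resolve.
            needs_review.append(
                {"record_id": record["id"], "old_class": old_class,
                 "candidates": candidates}
            )
    return updated, needs_review, event_log


if __name__ == "__main__":
    sample = [
        {"id": "R-001", "disposal_class": "2.1.1"},
        {"id": "R-002", "disposal_class": "2.4.0"},
    ]
    for part in resentence(sample, CLASS_MAP):
        print(part)
```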
In the digital preservation space, many projects are thinking about using services – it is a logical space for innovation as there is not much established practice to have to integrate. At the moment most of the available tools are packaged as executable programs, but some of them are crying out for conversion to services. DROID, for example, is The National Archives of UK’s format checking tool. It works by executing a client side program which issues calls back to TNA’s PRONOM file registry to report if a file format is still current. While not currently packaged as a service, it would not be a big step to make this a service.
PLANETS, Preservation and Long Term Access through Networked Services, a collaborative program funded under the European Union’s 6th Framework Programme, will be developing practical services and tools to help ensure long term access to digital culture and scientific assets. While nothing is yet available as outputs from this new-ish project, the intention is clearly to publish the tools as web services.
Protocols like the Open Archives Initiative’s metadata harvesting protocol have, again, demonstrated the potential impact of small bits of functionality disseminated as web services. Tools like the collaborative National Library of NZ and British Library ‘web curator’ could also be packaged as services. In fact these types of tools could be generalized to work in organizational environments as well as internet environments, and therefore immediately meet recordkeeping ends in addition to their primary aim.
It is likely that processes such as transfer of digital records to archival custody will be managed by services. Again, this is a new process, a completely electronic one, and therefore suited to the use of new techniques.
The emerging world of service oriented architectures built on reusable web services provides recordkeeping professionals with new opportunities. Web services are with us now - that is the technology capacity is available. But we are really at the very beginning of working out how to implement them within truly service oriented architectures that are sustainable for our organizations. The reality of the service oriented vision is probably at least 5-10 years away. The technology is available, but the integration, governance and business thinking is not yet ready for it. But thinking into the future offers recordkeeping professionals an opportunity to tool ourselves up to move away from some of the inherently paper based practices that we have persisted with, and possibly offers us a chance to deliver recordkeeping quite differently.
 This paper derives from research work undertaken as
a part of
 For access to the ISO standards, use the ISO product catalogue http://www.iso.org/iso/iso_catalogue.htm
 Quoted from US Center for Digital Government ‘ Service-Oriented Architecture: Making Collaborative Government Work’ http://www.centerdigitalgov.com/publications.php?pub_id=39
 Examples are http://www.lettos.com/index/, http://www.artofsolving.com/opensource/jodconverter. With thanks to Andrew Wilson and Michael Carden of National Archives of Australia for these examples.
 What the basic technologies do not give us is the rich behavioural detail that describes the role the service plays as part of a larger, more complex collaboration. When these collaborations are collections of activities designed to accomplish a given business objective, they are known as a business process. A business process may extend across one or more organisations. The description of the sequence of activities that make up a business process is called an orchestration.
Other terms such as choreography, flow composition, and workflow have been applied to this area and all, essentially, describe the same thing – the way in which separate Web Services can be brought together in a consistent manner to provide a higher value service. Orchestration includes the management of the transactions between the individual services, including any necessary error handling, as well as describing the overall process.
Orchestration can therefore be considered as a construct between an automated process and the individual services which enact the steps in the process.
 The terms orchestration and choreography describe two aspects of emerging standards for creating business processes from multiple Web services. The two terms overlap somewhat, but orchestration refers to an executable business process that can interact with both internal and external Web services. Orchestration always represents control from one party's perspective. This distinguishes it from choreography, which is more collaborative and allows each involved party to describe its part in the interaction. Proposed orchestration and choreography standards must meet several technical requirements that address the language for describing the process workflow and the supporting infrastructure. Chris Peltz, "Web Services Orchestration and Choreography," Computer, vol. 36, no. 10, pp. 46-52, Oct. 2003
 AGIMO, Australian Government Architecture Reference Models Version 1.0, June 2007, page 1 http://www.agimo.gov.au/__data/assets/pdf_file/57517/AGA_Reference_Models_Version_1.0.pdf
 Available from http://www.archives.gov/era/rms/rms-documents.html
 ‘Functional Requirements, Attributes and UML Class Diagrams for Records Management Services’, September 2006, p viii http://www.archives.gov/era/rms/rms-documents.html