Technical Considerations When Storing Public Records in
Digital Format
On July 6, 2000, Illinois Governor George H. Ryan signed an amended Local Records
Act (50 ILCS 205). This bill allows local governments to reproduce public records
in either microfilm or digitized electronic formats. The new law stipulates
that if the local government keeps a public record in an electronic format,
the method must be a “trustworthy manner so that the records, and the information
contained in the records, are accessible and usable for subsequent reference
at all times while the information must be retained.” This practice is only
allowable if the electronic records are reproduced on a “durable medium that
accurately and legibly reproduces the original record in all details,” and “does
not permit additions, deletions, or changes to the original document images.”
Each agency is also under the obligation to file a Records Disposal Certificate
with the appropriate Local Records Commission before any original record may
be disposed of and before the reproduced digital record is disposed of.
When indicating his support for the bill, Governor Ryan noted that there are
no universal standards for the creation and storage of electronic documents
and called on local governments to “be cautious in the way in which they maintain
public records and protect the public interest.” He urged local governments
to use the months between the signing of the bill and its effective date of
January 1, 2001, to carefully develop strategy and methodologies for implementing
digitized electronic document creation and storage.
Unfortunately, you cannot simply buy a scanner and a CD burner and hope to
effectively convert paper records into reliable electronic records. The process
of converting records to electronic format is complex and requires careful planning
and vendor selection in order to be effective.
The challenge with digital records storage is to ensure that records remain
in usable form regardless of changes in technology or obsolescence of particular
file formats and storage media. Another significant challenge is selecting
an appropriate storage medium (CD-ROM, DVD, etc.) that will be sufficiently
dependable for the entire life of a record. At present, there is confusion
and uncertainty over the long-term dependability and viability of certain media
types.
Contrary to popular belief, magnetic and optical media do not last forever,
and may only be relied upon for a few years before the records stored on a particular
medium must be transferred in order to avoid the inevitable electronic file corruption
that physical degradation of media will bring. Guidelines for digital media
are always changing, and media quality varies widely from one manufacturer
to another. As an example, a CD-ROM may have an expected
usable life ranging from only a couple of years to more than 20 years depending
on the manufacturing quality and the storage conditions.
The same is true of other magnetic and optical media types. Quality of storage
and manufacture are very significant in determining the overall usable life
of the media. Obviously, care in choosing a reliable media source is essential
to ensure data reliability.
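One way to confirm that records survive a transfer to new media intact is to compare a checksum of each file before and after the copy. The following is a minimal sketch in Python, assuming the SHA-256 hash; the function names are hypothetical and are not drawn from any particular records product.

```python
import hashlib
import shutil


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy a record to new media and verify the copy matches the source.

    Hypothetical helper: returns the checksum on success and raises
    ValueError if the copy differs from the original.
    """
    before = file_checksum(source)
    shutil.copyfile(source, destination)
    after = file_checksum(destination)
    if before != after:
        raise ValueError(f"Copy of {source} does not match the original")
    return after
```

Storing the returned checksum alongside the record would also allow the same file to be rechecked later for signs of media degradation.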
Standards and File Formats
While there are no formal standards in place to guide an agency considering
implementation of an electronic records storage system, there are a number of
accepted industry and even proprietary standards that can increase the likelihood
of long-term data accessibility. Industry standards that have gained common
acceptance are GIF, TIFF and JPEG for images, ASCII and RTF for text documents,
and HTML and XML for documents that are to be displayed on the Internet. GIF,
a graphics format developed by CompuServe, may be a more problematic standard
than TIFF or JPEG because of past attempts to levy royalty fees on creators
of GIF files.
It may also be desirable to store text documents both as searchable text and
as an image. Documents could be stored just as image files, but one of the
primary advantages of digital document storage is the ability to search for
particular documents based on contents. This can only be done through a full-text
search which requires the conversion of the image to text.
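To illustrate how a full-text search works once text has been extracted from a scanned image, the sketch below builds a simple word index in Python. It assumes the text has already been produced, for example by optical character recognition; the file names and sample text are invented for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs containing it.

    `documents` is a dict of {document_id: extracted_text}.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented sample data: image file names paired with their extracted text.
documents = {
    "scan-001.tif": "Records Disposal Certificate filed with the commission",
    "scan-002.tif": "Notice of hearing for the records commission",
}
index = build_index(documents)
```

Here the image file remains the record of what the page looked like, while the extracted text exists only to make the image findable.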
The Adobe Portable Document Format (PDF) is a proprietary standard that has
become a de facto standard for publishing fully formatted documents on
the Web. In order to read these documents, however, a user must have a copy
of the Adobe Acrobat Reader, available free from Adobe, installed on their computer.
The creator of the document must have a copy of the full Adobe Acrobat installed
on her PC. Adobe charges for the full version of Adobe Acrobat.
Adobe PDF format has the advantage of being able to display the exact format
of the original document with all graphics and text formatting intact, but it
has some disadvantages. The most significant of these is the inability to revise
the document without the full version of the software, though this may be considered
an advantage under certain circumstances.
It generally makes more sense to store word processing documents as searchable
text rather than PDF, but the question is whether or not to store the document
in the native word processing format, or store it in a more universally accessible
format like ASCII or RTF. RTF has the advantage of retaining most formatting
and spacing; ASCII will not retain formatting. Regardless of the format chosen
for storage, the document should be ultimately reducible to ASCII.
Compatibility and File Conversion
Those who have had experience with various word processors over the years know
that compatibility between versions is not always guaranteed. While vendors
try to maintain backward compatibility, forward compatibility is almost impossible
to maintain. As an example of this, old versions of Microsoft Word cannot access
files created in newer versions of Word. The same is true of older and newer
versions of WordPerfect.
Conversion of documents between word processors like Word and WordPerfect is
frequently problematic. Many users who have attempted migration of documents
between the two packages discovered that conversion to a format such as RTF
or ASCII that is compatible with both word processors is more dependable than
direct conversion, particularly when a document must be converted back and forth
several times. RTF can be used when transferring documents between most versions
of Microsoft Word and WordPerfect or any other current word processing package.
RTF is also the default file format for certain programs like Microsoft’s Outlook
e-mail software.
Browser-based Formats
Hypertext Markup Language (HTML) can also be used as a medium of exchange between
word processing programs and other software programs that display text. Most
word processing programs today automatically detect HTML files and display them
appropriately.
One emerging development is the storage and exchange of documents in eXtensible
Markup Language (XML) format. Microsoft seems to be moving in the direction
of making XML its choice as a common medium of exchange between its various
programs, as well as between Microsoft software programs and the programs of
other software companies.
XML shows much promise as a standard for transferring data from one system
to another. As an example of how XML can work, imagine an everyday business
letter, which includes an inside address, a salutation, a date, a body and a
closing. In XML, each of these elements would be enclosed by “tags” identifying
each of the elements. When transferred to a different system, the receiving
system would always correctly identify each element and display it appropriately.
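The business letter described above might be tagged as in the following sketch, which uses Python's standard xml.etree.ElementTree module to read the elements back. The tag names and the letter's contents are invented for illustration; a real exchange would follow a schema agreed on by both systems.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and made-up letter contents.
LETTER_XML = """\
<letter>
  <date>July 6, 2000</date>
  <inside_address>Records Clerk, 100 Main Street, Anytown, Illinois</inside_address>
  <salutation>Dear Records Clerk:</salutation>
  <body>Please find enclosed the documents you requested.</body>
  <closing>Sincerely,</closing>
</letter>
"""


def parse_letter(xml_text):
    """Return a dict mapping each element name in the letter to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


letter = parse_letter(LETTER_XML)
```

Because each part of the letter is labeled, the receiving system can look up `letter["salutation"]` or `letter["date"]` directly rather than guessing from the layout of the page.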
Risks Associated with Native File Formats
Another factor in deciding whether to store a document in the native word processor
format (the native format is the proprietary default format used by the creator
of the word processor) is the long-term viability of a particular manufacturer.
Few market sectors are as volatile as information technology, and the continued
availability of a particular software package cannot be taken for granted.
In recent years, we have seen a number of software companies quickly lose market
share to competitors and find it unprofitable to continue supporting software.
As a result, several previously common word processor packages, such as WordStar,
PFS Write, and Volkswriter, have become unavailable. This indicates that long-term
storage of documents in native formats may be risky.
Records Management Systems and Vendor Selection
No matter which media or file format is chosen for storage of electronic documents,
images and other digital objects, a system is needed that can organize and reliably
retrieve the objects. Such a system is called various things by different groups
and vendors, but for the purpose of this article it will be called a Records
Management System (RMS).
Choosing the right RMS—and a reliable vendor to provide support for the system—may
be the most important decision you make concerning electronic records retention.
The RMS must dependably catalog and index all of the documents and images, and
quickly retrieve those objects. This may seem to be a trivial matter, but think
about all of the files and documents your agency deals with every day. Documents
need to be accurately classified and organized into files, and files must be
grouped into larger subject areas. As with paper files, if you electronically
misfile the document, you will have difficulty ever finding it again.
To retrieve individual documents, each must be uniquely identified and indexed
in a database. The identifier could be a system-generated sequential number,
but a better choice for the identifier would be an existing number like a case
number. Remember, though, that any number of individual documents must be “filed”
under this one case number, so you may also need individual document numbers
in addition to the case number. The combination of a case number and the document
number could make up the unique key number assigned to each document. In addition
to retrieving a document through the key number, you may also wish to search
for documents based on key words or individuals associated with the file or
document.
In order to perform these searches, all of the identifiers must be stored in the
database. Once the database itself and its search indexes are constructed,
the RMS can quickly retrieve the digitized files.
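The key scheme described above, a case number combined with a document number, plus a keyword index, can be sketched with Python's built-in sqlite3 module. The table names, column names, case numbers and file paths below are all hypothetical; an actual RMS would define its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        case_number     TEXT NOT NULL,
        document_number INTEGER NOT NULL,
        file_path       TEXT NOT NULL,
        PRIMARY KEY (case_number, document_number)
    );
    CREATE TABLE keywords (
        case_number     TEXT NOT NULL,
        document_number INTEGER NOT NULL,
        keyword         TEXT NOT NULL,
        FOREIGN KEY (case_number, document_number)
            REFERENCES documents (case_number, document_number)
    );
    CREATE INDEX idx_keywords_keyword ON keywords (keyword);
""")

# Made-up sample rows: two documents filed under one case, one under another.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("2000-CF-0101", 1, "images/2000-CF-0101/0001.tif"),
        ("2000-CF-0101", 2, "images/2000-CF-0101/0002.tif"),
        ("2000-CF-0202", 1, "images/2000-CF-0202/0001.tif"),
    ],
)
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?)",
    [
        ("2000-CF-0101", 1, "complaint"),
        ("2000-CF-0101", 2, "subpoena"),
        ("2000-CF-0202", 1, "complaint"),
    ],
)


def find_by_key(case_number, document_number):
    """Retrieve one document's file path by its composite key."""
    row = conn.execute(
        "SELECT file_path FROM documents "
        "WHERE case_number = ? AND document_number = ?",
        (case_number, document_number),
    ).fetchone()
    return row[0] if row else None


def find_by_keyword(keyword):
    """Return (case_number, document_number) pairs tagged with a keyword."""
    return conn.execute(
        "SELECT case_number, document_number FROM keywords "
        "WHERE keyword = ? ORDER BY case_number, document_number",
        (keyword,),
    ).fetchall()
```

The composite primary key lets the database itself reject a second document filed under the same case number and document number, which guards against one kind of electronic misfiling.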
Records do not have to be kept on-line at all times, but can be stored off-line
in such a way that the RMS can locate the records and then restore them to the
online system so that they can be retrieved. Such a process could take just
minutes, but depending on your system, it could take up to several hours to
load and read tapes or other media containing archived records. Obviously,
a mix of on-line and off-line storage would be suitable for different needs.
On-line storage is best for frequently accessed records, and off-line is generally
adequate for rarely accessed records.
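A rule for deciding which records stay on-line could be as simple as the following Python sketch. The one-year window is an arbitrary threshold chosen for illustration, and the function name is hypothetical; each agency would set its own policy based on how its records are actually used.

```python
from datetime import date, timedelta


def choose_storage_tier(last_accessed, today, online_window_days=365):
    """Return "online" for recently used records, otherwise "offline".

    The 365-day default is an illustrative assumption, not a recommendation.
    """
    if today - last_accessed <= timedelta(days=online_window_days):
        return "online"
    return "offline"
```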
It would be advantageous to integrate the RMS with an existing case management
system so that your staff will only have to use one system to access case information.
However, integration of two systems can be tricky, especially when integrating
legacy systems with newer relational database driven systems. Having an experienced
vendor is key to successful integration.
Choosing the Right Vendor
The importance of selecting a highly qualified vendor when planning and implementing
an RMS cannot be over-emphasized. The complexity and expense of such systems
make vendor selection perhaps the most important single decision you will make
in the process of acquiring electronic document capabilities.
There are no hard and fast rules for selecting a vendor but there are a few
guidelines:
- Make sure the vendor has a track record of successful projects of the type
you wish to accomplish.
- Verify that the vendor can actually provide you with the people who have
worked on the projects they provide as references.
- Confirm that the vendor has the financial, human and other resources available
to finish your project.
You should prepare for vendor evaluation and selection by learning as much
as you can about electronic records management before starting the process.
Most agencies must issue a Request for Proposals (RFP) in order to procure
vendor services, so it would be prudent to begin this learning process before
creating the RFP.
Steve Prisoc is Associate Director of the Illinois Criminal Justice Information
Authority and can be reached at 312-793-8550 or by e-mail at sprisoc@icjia.state.il.us
Glossary
ASCII (American Standard Code for Information Interchange) - The ASCII
character set is the basic set of characters that can be displayed on a PC screen.
When text documents are converted from their native word processor file formats
to ASCII they lose all special formatting like bold, italics, underline and
more.
CD-ROM (Compact Disc-Read Only Memory) - A CD-based storage
medium.
ERMS (Electronic Records Management Systems) - A system for managing
electronic records.
GIF (Graphics Interchange Format) - A graphics format primarily used
in creating small images for the Web.
HTML (Hypertext Markup Language) - A tag-based language used to display
text on the Web.
JPEG (Joint Photographic Experts Group) - A graphics file format that
reduces large image files into smaller files that can be more easily stored
and transferred. JPEG is primarily used for display of photographs on the Web
and also for digital photography.
Migration of Digital Data - Limits problems associated with continuing
to store documents in old file formats by migrating to more current formats.
PDF (Portable Document Format) - A proprietary format that has become
a de facto standard for documents displayed on the Web.
Refreshing Digital Data - Periodically transferring files from an older
physical storage medium to a newer medium to avoid physical decay or obsolescence.
RMS (Records Management System) - A system for indexing and organizing
documents and other records.
RTF (Rich Text Format) - This text file format preserves special formatting
like italics, underlining and spacing.
TIFF (Tagged Image File Format) - A raster-based (bitmapped) graphics
file format which maintains high resolution.