Information Technology for Economics and Management
ISSN 1643-8949 ITEM, 1 (2001), e-journal
Silesian Technical University, POLAND
SPECIFIC PROBLEMS OF ELECTRONIC DOCUMENT
Summary: The specific problems related to information in the enterprise environment are discussed, especially from the point of view of a modern information medium, the electronic document. Topics related to the coding and representation of national characters are also covered.
The first chapter gives general definitions of data, document and information, with emphasis on the importance of information in business and production practice. The so-called information anomaly is also discussed in this chapter.
Chapter two discusses different aspects of contemporary electronic documents, e.g. format, usage, data migration and automation of the input process. Some information about Optical Character Recognition software is given.
Chapter three analyses the coding of national characters and the influence of existing coding methods on different parts of information systems. A short description of the contemporary approach to software structure is also given.
The conclusion discusses the pros and cons of electronic documents.
1. Data, documents and information
In everyday enterprise activity we meet different terms related to the acquisition, processing, storage and visualization of information. The basic term is data. Data can be facts, observations, measurements and assertions; it is the raw material for documents and business. Documents are used as vehicles for the storage and communication of a package of data.
Data by itself is meaningless. We can use data only after it has been converted to information.
Information is data that has been put together in such a way as to give it meaning. This means that proper data classification (categorization) has been applied, together with suitable relations and conditions.
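As a small illustration of this step, the sketch below takes a few bare values and gives them meaning by applying a classification (named fields), a relation (readings grouped by part) and a condition (a reorder threshold). The part numbers, quantities, field names and the threshold are all invented for the example and do not come from any real system.

```python
from dataclasses import dataclass

# Raw data: bare values with no stated meaning (all values are invented).
raw_data = [
    ("A-17", 412, "2001-03-05"),
    ("B-02", 35, "2001-03-05"),
    ("A-17", 388, "2001-03-06"),
]


@dataclass
class StockReading:
    """Classification: every value is assigned to a named category."""
    part_id: str
    quantity: int
    date: str


REORDER_LEVEL = 50  # hypothetical condition applied to the data

readings = [StockReading(*row) for row in raw_data]

# Relation: readings belong to a part; keep the latest reading per part.
latest = {}
for reading in sorted(readings, key=lambda r: r.date):
    latest[reading.part_id] = reading

# Information: which parts have to be reordered now?
to_reorder = sorted(p for p, r in latest.items() if r.quantity < REORDER_LEVEL)
print(to_reorder)
```

The tuples alone answer no question; only after the fields, the grouping and the threshold are applied does the result become usable for a decision.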
Information is the raw material for the smallest transaction as well as for a strategic decision. It is one of the five key business resources:
• Capital
• Materials
• Plant
• People
• Information
Information empowers people, but only when it is useful. The key to useful information is its effective selection, which is possible when information is stored in a structured form. The main reason for structuring data is to assure access:
• to the right information,
• for the right person,
• at the right time.
Using computer aids to meet this objective is the new element and offers many opportunities to establish both competitive and commercial strengths.
But according to Xerox, "The Document Company", only 10% of data is structured and held as computer data; the remaining 90% of a company’s data is held within paper documents. This produces a specific business information anomaly (Table 1).
Table 1. Business information anomaly
Parameters                               Unstructured data   Structured data
Amount of company data                   90 %                10 %
Amount of money spent for IT resources   10 %                90 %
Percentage of used data                  85 %                15 %
This is because only 10% of IT spending actually goes on managing a company’s unstructured information, and only 15% of the workforce actually handle the structured data. The remaining 85% of the workforce handle the company’s document-based information to make decisions and carry out their job functions. In general, a lot of time is lost in the process of finding information.
The only way to resolve this "anomaly" is to store as much data as possible in computer databases.
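The imbalance in Table 1 can be condensed into one number. The sketch below divides the share of IT spending by the share of data for each kind of data. This "spending per unit of data" measure is an illustrative assumption introduced here, not a figure from the Xerox report.

```python
# Shares taken from Table 1 (percent).
data_share = {"unstructured": 90, "structured": 10}
it_spend_share = {"unstructured": 10, "structured": 90}


def spend_per_data_unit(kind: str) -> float:
    """IT spending share divided by data share for one kind of data."""
    return it_spend_share[kind] / data_share[kind]


# How many times more is spent on a unit of structured data?
ratio = spend_per_data_unit("structured") / spend_per_data_unit("unstructured")

print(f"structured:   {spend_per_data_unit('structured'):.2f}")
print(f"unstructured: {spend_per_data_unit('unstructured'):.2f}")
print(f"ratio:        {ratio:.0f}")
```

Under these shares, a unit of structured data attracts about 81 times more IT spending than a unit of unstructured data, since (90/10) / (10/90) = 81.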
2. Electronic Document
Investigations by the Gartner Group show that the creation and management of documents may cost a company between 6% and 15% of its gross revenue. It therefore makes commercial sense to improve the efficiency and standards of creating, using and managing a company’s document-based information. An efficient solution to this problem is the electronic document.
The most important feature of the electronic document (in general, a computer file) is its portability. This means that a document should have the same form and content regardless of its place of origin (source) and of use (utilization).
At present there are no legal standards in this matter. Most often, specific de facto standards are used, based on popular and widely used applications.
We can name several such standards of electronic documents' formats:
• Hyper Text Markup Language - htm(l)
• Portable Document Format - pdf
• Envoy Document - evy
• Word/WordPerfect Document - doc
• ASCII Text File - txt
• Microsoft Book Reader - lit
The use of a specific format depends on many factors, e.g. the application(s) used, the problems solved, habits and tradition.
Another question is the input of electronic documents into the computer system. This input depends on the form of the information and its structure. In general we can divide data into two groups directly related to the input forms:
• data in electronic form: direct input;
• data on traditional media: special devices and technologies must be used:
- keyboard for manual text input;
- scanner (for texts, pictures and drawings);
- digitizer (for pictures and drawings);
- voice recognition (for text data).
The last three techniques are used for automatic input of data stored on traditional media, and scanning is used most often. This process requires not only a scanner as the input device, but also special optical character recognition (OCR) software to deal with text. Among the most popular OCR applications are:
• Recognita™ (by Recognita®)
• Omnipage™ (by Caere®)
• Readiris™ (by Iris®)
When an automatic text recognition process is used, a new and very important factor must be taken into consideration, one that determines both the effectiveness of the process and the authenticity of the data: the national characters problem.
3. The National Characters Problem
National characters, i.e. characters specific to a particular language other than English, have been a big problem in computer technology for many years because of the lack of a fixed and exact coding. For example, in everyday activity we most often meet the following three coding arrangements:
• ISO (8 bits) → code page, e.g. 437, 852 (DOS applications)
• ANSI (8 bits) → code page, e.g. 1252, 1250 (Windows 3.x/95 applications)
• UNICODE (16 bits) → alphabet, e.g. Western, Eastern Europe (Windows 9x applications)
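The practical effect of these arrangements is that one and the same text is stored as different bytes. The sketch below encodes a short Polish word under code page 852, code page 1250 and 16-bit Unicode. The sample word is chosen only for illustration, and `cp852`, `cp1250` and `utf-16-le` are the names of the corresponding codecs in Python's standard library.

```python
text = "Łódź"  # Polish city name with three national characters

codecs_to_compare = {
    "DOS code page 852": "cp852",
    "Windows code page 1250": "cp1250",
    "Unicode, 16 bits": "utf-16-le",
}

# Encode the same text once per coding arrangement.
encoded = {label: text.encode(codec) for label, codec in codecs_to_compare.items()}

for label, data in encoded.items():
    print(f"{label:24} {len(data)} bytes: {data.hex(' ')}")
```

Only the plain letter "d" keeps the same byte value in both 8-bit code pages; each national character is stored differently, and the 16-bit form uses two bytes per character.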
Because the applications that generate specific document types can operate in different software environments, documents of the same type can use different codings of national characters. For example:
• *.txt → DOS or Windows 3.x/95 document?
• *.doc → Word 6.0 or Word 2000 (maybe WordPerfect) document?
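Since the file extension does not reveal the coding, software has to guess it. The sketch below is a deliberately naive heuristic, assuming the text is Polish and that only code pages 852 and 1250 are possible: it decodes the bytes under each candidate and keeps the one that yields the most Polish letters. The function name `guess_codepage` and the sample sentence are hypothetical.

```python
POLISH_LETTERS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def guess_codepage(data: bytes, candidates=("cp852", "cp1250")) -> str:
    """Return the candidate whose decoding yields the most Polish letters.

    A naive heuristic for illustration only: short or mostly-ASCII
    texts give no reliable signal.
    """
    def score(codec: str) -> int:
        decoded = data.decode(codec, errors="replace")
        return sum(ch in POLISH_LETTERS for ch in decoded)

    return max(candidates, key=score)


sample = "Zażółć gęślą jaźń"
dos_file = sample.encode("cp852")        # as written by a DOS application
windows_file = sample.encode("cp1250")   # as written by a Windows application

# The same bytes read under the wrong assumption give wrong characters.
print(dos_file.decode("cp1250", errors="replace"))

print(guess_codepage(dos_file), guess_codepage(windows_file))
```

A production system would need a far more robust method; the point here is only that the coding must be recovered from the content, because the document type alone does not determine it.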
National characters influence different aspects of computer technology usage. The main areas are as follows:
• hardware:
- printers,
- modems and fax-modems,
- video terminals;
• software:
- operating systems,
- applications;
• Internet:
- html documents,
- e-mail.
Because of different codings and some additional aspects, the influence of national characters can show up:
• in a predictable manner, e.g.:
- no national characters at all,
- wrong characters present;
• in an unpredictable manner, e.g.:
- characters missing on the printout although they are present on the screen,
- different font on the printout,
- modifications of application configuration files,
- other...
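The two predictable outcomes can be reproduced in a few lines. The sketch assumes a hypothetical output channel that accepts only 7-bit ASCII, for instance an old printer or mail gateway, and uses the standard `errors` handlers of Python's `str.encode` to show both effects. The sample sentence is chosen only for illustration.

```python
text = "Zażółć gęślą jaźń"  # Polish sample text

# Outcome 1: no national characters at all (they are silently dropped).
dropped = text.encode("ascii", errors="ignore").decode("ascii")

# Outcome 2: wrong characters present (each one becomes a placeholder).
replaced = text.encode("ascii", errors="replace").decode("ascii")

print(dropped)   # Za gl ja
print(replaced)  # Za???? g??l? ja??
```

Both results are predictable in the sense that they follow directly from the coding rules of the channel, unlike the printer and configuration effects listed as unpredictable.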
[Figure: blocks labelled OS or Application; User Language Module 2; User Language Module 1; Processing Module; Data]
Fig. 1. Contemporary software structure.
The integrated application Microsoft® Office 2000 is a good example of the approach mentioned.
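One way to read the structure in Fig. 1 is that all user-visible text lives in interchangeable language modules, while a single processing module works on the data without knowing the user's language. The sketch below follows that reading. The module contents, the function names and the page-counting task are hypothetical and are not taken from Microsoft Office.

```python
# User language modules: only user-visible text lives here.
LANGUAGE_MODULES = {
    "en": {"saved": "Document saved", "pages": "Pages"},
    "pl": {"saved": "Dokument zapisany", "pages": "Strony"},
}


def count_pages(data: str, lines_per_page: int = 50) -> int:
    """Processing module: works on the data, independent of language."""
    lines = data.count("\n") + 1
    return -(-lines // lines_per_page)  # ceiling division


def report(data: str, lang: str) -> str:
    """Application layer: combines the processing result with one language module."""
    messages = LANGUAGE_MODULES[lang]
    return f"{messages['saved']}. {messages['pages']}: {count_pages(data)}"


document = "first line\nsecond line"
print(report(document, "en"))
print(report(document, "pl"))
```

Adding a further language then means adding one more entry to `LANGUAGE_MODULES`, with no change to the processing code.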
Analyzing the processes of implementing and using electronic documents, we can see several positives and a few negatives.
The positive features include:
• faster information circulation,
• an embedded proper route (hierarchy) for the document, e.g. preparation → verification,
• elimination of unnecessary copies (a group works with one and the same document, not with different copies),
• easier editing tasks.
On the other side, there are also negative features:
• increased paper consumption (paradoxical but true, because of some legal requirements and because documents are proofread on printouts),
• negative feelings related to a higher risk than in the traditional environment (the Y2K problem may be a good example),
• degradation of smaller nations (weaker or with a small population),
• dependency on computer systems.
A comparison of all the pros and cons shows that the positive factors are stronger. In practice it is not possible to find another way of managing information without computer technology and the electronic document. It is nevertheless necessary to take all the negatives into consideration and to keep them in mind.
Business Information in Practice, “Xerox Report”, Xerox and Aetra Life, Palo Alto 1997.
Culver E., Jump Start Your Relational Database Design, Filemaker Pro Advisor, Escondido, January 1998.
Electronic Document Management, Pros and Cons, Gartner Group, 1996/97.
Managing Information, “Color In Color Out” CD, Epson, 1996.
Michalski A.M., Multi-layered Information Structure in Industrial Transformation Process in Upper Silesia Region, paper on International Conference and Exhibition “GIS Croatia 98”, Osijek 1998.
Michalski A., Specific Aspects of Electronic Document Management, paper of International Conference and Exhibition “Spatial Information Management in the New Millennium”, SILGIS, Kraków-Katowice, 1999.
The New Microsoft Office 2000, http://shop.microsoft.com/description/office2000.html, Microsoft, 1999.
Workgroup Software that Works the Way You Do,