Files & Application Programs Archiving Files
Prof. R. A. Mihajlović 2013‐2014‐2021
Topics
• Why word “file”?
• File structure (organization)
• Applications & file structure
– Application programs interpret (understand) file structure
– File structure is defined by code/commands for the application program
• Editor application program & text data file
• OS loader & executable programs
• Unarchive application programs & archive files
• Document render application programs & document files
– HTML document
– MS Word document
Office File and Computing File Parallel
• The idea of the office file carries over to computing: it is used to organize data bits on the storage media.
Computing File as a Collection of Bits
• Behind a file name, behind a simple abstract label, there exists a file/array of bits and bytes.
– All files are made of bits.
– File bits (Bytes) are produced by some data encoder program built into the application related to the given file.
File Name: fileX File No: 1078
File Bits
• A data bit in any file may be either:
– Data code bit representing some data unit/object, or
– Program code bit representing an instruction for some processor.
File Bits
• Bit organization/structure is built in by encoder programs and interpreted by decoder programs.
– Bits of the same file may be organized (interpreted) in many ways (by different decoders), but only one is right (the one matching the encoder).
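The point that the same bits can be decoded in several ways can be illustrated with a minimal Python sketch. The four sample bytes and the three chosen decoders (ASCII text, 32-bit integer, 32-bit float) are assumptions made for illustration only; they are not taken from the slides.

```python
import struct

# Four bytes as they might sit in a file (hypothetical sample data).
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Three different "decoders" applied to the very same bits.
as_text = raw.decode("ascii")            # four ASCII characters
as_uint = struct.unpack(">I", raw)[0]    # one big-endian 32-bit unsigned integer
as_float = struct.unpack(">f", raw)[0]   # one big-endian 32-bit floating point number

print(as_text)
print(as_uint)
print(as_float)
```

All three results are valid decodings, but only the one that matches the encoder that produced the bytes carries the intended meaning.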
Question: Operating System & Files
• Does operating system interpret internal file structure?
a) Yes
b) No
Answer: Operating System & Files
• Does operating system interpret internal file structure?
a) Yes
b) No
• Yes - Operating systems trivially interpret files as collections of standard data blocks, which is good for the file creation, copy and deletion jobs of OS internal modules.
• Yes/No - Operating system programs use their standard simplified view of file structure regardless of the original encoders that have produced the file bits.
• No - The OS applies no decoder program to the file bits that it processes or handles.
Answer: Operating System & Files
• Different OS programs/modules look at bits of files that they process differently.
– High level OS file system (OS‐FS) module looks at files as blocks of 8b Bytes.
– OS memory manager looks at file bits as blocks of 32b words.
– OS low level storage manager looks at files as physical blocks or sectors of 1 sector = 512 B = 0.5 kB in size.
– OS low level file system looks at files as logical blocks or clusters of physical blocks (e.g., 1 cluster = 8 sectors = 4 kB).
– OS storage I/O manager looks at files as I/O buffer blocks of several logical blocks or clusters.
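The block units listed above can be turned into a small arithmetic sketch. The sector and cluster sizes come from the slide; the function name `block_views` and the 10,000-byte sample file size are assumptions made for illustration.

```python
import math

SECTOR_BYTES = 512         # 1 sector = 512 B (from the slide)
SECTORS_PER_CLUSTER = 8    # 1 cluster = 8 sectors = 4 kB (from the slide)
WORD_BYTES = 4             # one 32-bit word

def block_views(size_bytes):
    """Return how many units of each kind are needed to hold the file."""
    cluster_bytes = SECTOR_BYTES * SECTORS_PER_CLUSTER
    return {
        "bytes": size_bytes,
        "words": math.ceil(size_bytes / WORD_BYTES),
        "sectors": math.ceil(size_bytes / SECTOR_BYTES),
        "clusters": math.ceil(size_bytes / cluster_bytes),
    }

# Hypothetical file of 10,000 bytes.
views = block_views(10_000)
print(views)
```

The counts are rounded up because a partly filled sector or cluster is still allocated as a whole unit.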
Operating System as a Program & Files
• Different parts of OS use different bit block units.
• All files to OS programs are simple arrays/strings of bit‐blocks.
– No other meaning is assigned to file bits.
[Figure: Operating system modules and the file unit each one uses: UI shell programs - file; high level OS file system - bytes; low level OS file system - cluster; storage management - sector; I/O management - buffer; memory management - word.]
Internal File Structure
• Internal file structure (the organization of parts) depends upon the particular application that uses such a file and is mostly beyond the general operating system (OS) view of files.
– OS programs view all files as just a sequence of bytes or storage blocks.
Question: Operating System & Files
• Does operating system perform complex interpretation and decoding of internal file structure?
a) Yes
b) No
Answer: Operating System & Files
• Does operating system perform complex interpretation and decoding of internal file structure?
a) Yes
b) No
• Complete or semantic interpretation of “meaning” of bits in a file is performed by the specific and appropriate application program, with matching decoders and data processing software.
Internal File Structure
• From the application structure point of view there are many sorts of files
– Variable length array of records (e.g., text files)
– Fixed size data base record sequence
– Fixed size data base record tree
– Text files made of encoded characters
– Graphics files made of encoded pixels, etc.
– . . .
Question: System Commands and Data
• Do some system commands in CLI shell program have data decoder programs built in?
a) Yes
b) No
Answer: System Commands and Data
• Do some system commands in CLI shell program have data decoder programs built in?
a) Yes
b) No
• All commands for inspection/rendering of the file content have appropriate data decoders (e.g., the shell type command).
Question: Text File Structure
• How are text lines defined? What code is used to encode line structure in a stream of bits in a file that is assumed to be just a plain, vanilla text file?
1101010101100001100110010100010110101 . . . 11110001100101010110001110100 . . .
a) Line length
b) Line end character
c) Carriage return and new line feed character
d) New line character
Answer: Text File Structure
• How are text lines defined? What code is used to encode line structure in a stream of bits in a file that is assumed to be just a plain, vanilla text file?
a) Line length
b) Line end character
c) Carriage return and new line feed character (CR‐LF character)
d) New line character (‘\n’ character in C/C++ and Java)
1101010101100001100110010100010110101 . . . 11110001100101010110001110100 . . .
[Figure: the bit stream with <CR‐LF> codes marking the end of each text line.]
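As a rough sketch of how a decoder recovers line structure, the Python snippet below splits a byte stream at each line terminator code. The helper name `split_lines` and the sample streams are hypothetical; the second call assumes the single new line character convention instead of CR‐LF.

```python
def split_lines(data, terminator=b"\r\n"):
    """Split a byte stream into lines at each line terminator code."""
    lines = data.split(terminator)
    # A stream that ends with a terminator yields one empty trailing piece.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines

# Hypothetical streams: one using CR-LF, one using a single new line character.
stream = b"first line\r\nsecond line\r\nthird line\r\n"
crlf_lines = split_lines(stream)
lf_lines = split_lines(b"one\ntwo\n", terminator=b"\n")

print(crlf_lines)
print(lf_lines)
```

Nothing in the bits themselves says which convention applies; the decoder has to assume the one that the encoder used.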
CR‐LF Character
• The most popular text data code is ASCII code.
• ASCII is a standard code with 7/8 bits per one American character symbol.
• Example: the new line code CR‐LF is a pair of ASCII characters. CR (carriage return) has the hexadecimal shorthand code 0D for bits 0000 1101, or decimal order number 13. LF (line feed) has the hexadecimal shorthand code 0A for bits 0000 1010, or decimal order number 10.
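The ASCII codes of the line control characters can be checked with a short Python sketch. The helper name `ascii_info` is an assumption made for illustration; `ord` is the standard built-in that returns a character's code number.

```python
def ascii_info(ch):
    """Return (decimal, hexadecimal, bits) for one ASCII character."""
    code = ord(ch)
    return code, f"{code:02X}", f"{code:08b}"

# CR and LF are the two control characters of the CR-LF pair; "A" is a printable example.
for name, ch in [("CR", "\r"), ("LF", "\n"), ("A", "A")]:
    decimal, hexa, bits = ascii_info(ch)
    print(f"{name}: decimal {decimal}, hex {hexa}, bits {bits}")
```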
Question: Plain Text File
• Does plain ASCII text file documentX.txt have any structure?
a) Yes
b) No
Answer: Plain Text File
• Does plain ASCII text file documentX.txt have any structure?
a) Yes
b) No
1101010101100001100110010100010110101 . . . 11110001100101010110001110100 . . .
[Figure: the bit stream with <CR‐LF> codes marking the end of each text line.]
• Editors recognize the New‐Line (CR‐LF) character as a command to display the following character on a new line.
Archive File – Collection of Files in One File
• A compound file, i.e., a file that is made of several files with the table of contents in the header, is known as an archive file.
• Example:
– UNIX tar files
– Java jar files
– Windows ZIP or RAR files
[Figure: Archive file made of a table of contents followed by fileA, fileB and fileC.]
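A minimal sketch of the "several files plus a table of contents" idea, using Python's standard `zipfile` module on an in-memory buffer. The member names mirror the figure labels; their contents are invented for illustration. Note that the ZIP format in particular keeps its directory at the end of the file rather than in a header.

```python
import io
import zipfile

# Build a small archive in memory (member contents are hypothetical).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("fileA", "contents of A")
    archive.writestr("fileB", "contents of B, a bit longer")
    archive.writestr("fileC", "C")

# Read back the table of contents without extracting any member.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    sizes = {info.filename: info.file_size for info in archive.infolist()}

print(names)
print(sizes)
```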
Archive Files & Archiving Tools
• Special applications are used to create archive files, to add files to the archive set of files, or to extract individual files.
• Example:
– UNIX /sbin/tar utility
• Uncompressed archive – Simple but not secure
– Java JAR.EXE utility
• Compressed archive used mostly by Java programmers – Archive can be digital signature protected.
– Windows PKZIP.EXE and PKUNZIP.EXE programs
• Standard Windows OS utility/systems‐tool – Archive can be password protected.
– Windows WINRAR.EXE and RAR.EXE programs
• Popular Windows OS utility/systems‐tool – Archive can be password protected.
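The three archiving operations named above (create, add, extract) can be sketched with Python's standard `tarfile` module standing in for a command line tool. The helper `add_member` and the member names and contents are hypothetical, and the archive lives in memory so that no files are touched.

```python
import io
import tarfile

def add_member(archive, name, payload):
    """Add one file, given as bytes, to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# Create an uncompressed tar archive and add two files to it.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    add_member(archive, "fileA", b"alpha")
    add_member(archive, "fileB", b"beta")

# Reopen the archive and extract one individual file.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    member_names = archive.getnames()
    extracted = archive.extractfile("fileB").read()

print(member_names)
print(extracted)
```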
Question: Archive Files
• Do archive files have structure?
a) Yes b) No
Answer: Archive Files
• Do archive files have structure?
a) Yes b) No
• To have structure means to have parts and an organized arrangement of those parts. The parts of an archive file are files.
Answer: Archive Files
• Do archive files have structure?
a) Yes b) No
• A double click on an archive file opens the unarchive application program, which decodes the structure code and shows the parts.
Executable Files
• Executable files containing binary encoded object code (the target object is the CPU) have a special structure interpreted by the loader program.
• In order to create the process image in memory, the loader must follow the instructions in the header of the executable file.
• Since different systems have different loaders, executable file structure is OS dependent.
• Example: UNIX
[Figure: Executable file made of a File Header (Loading Data) followed by Object Code.]
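As a hedged illustration of OS dependent executable structure, the sketch below looks only at the first bytes of a header, where the ELF format (used by Linux and many UNIX systems) and the Windows executable format store well known signatures. The function name `executable_format` and the sample headers are assumptions; a real loader reads far more of the header than this.

```python
# Leading signature bytes of two common executable formats.
EXECUTABLE_SIGNATURES = {
    b"\x7fELF": "ELF (Linux and many UNIX systems)",
    b"MZ": "PE/DOS (Windows)",
}

def executable_format(header):
    """Guess the executable format from the first bytes of the file header."""
    for signature, name in EXECUTABLE_SIGNATURES.items():
        if header.startswith(signature):
            return name
    return "unknown"

# Hypothetical header fragments, not read from real files.
elf_guess = executable_format(b"\x7fELF\x02\x01\x01\x00")
pe_guess = executable_format(b"MZ\x90\x00\x03\x00")
other_guess = executable_format(b"plain text, not a program")

print(elf_guess)
print(pe_guess)
print(other_guess)
```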
Question: Executable Program File
• Which program “understands” the internal organization of the executable program?
a) Program editor
b) Executable program loader
c) Program viewer
d) Program printer
[Figure: Executable file made of a File Header (Loading Data) followed by Object Code.]
Answer: Executable Program File
• Which program “understands” the internal organization of the executable program?
a) Program editor
b) Executable program loader
c) Program viewer
d) Program printer
[Figure: Executable file made of a File Header (Loading Data) followed by Object Code.]
• When you start a program using some UI shell, the loader program copies the program bits into memory.
Question: Executable/Program files
• What are the bits in the main body of the executable file representing?
a) Metadata instructions for loader program
b) Executable instruction bits
c) Program configuration bits
d) Program resource bits
[Figure: File made of a File Header followed by the main bits of a file (file body).]
Answer: Executable/Program files
• What are the bits in the main body of the executable file representing?
a) Metadata instructions for loader program
b) Executable instruction bits
c) Program configuration bits
d) Program resource bits
[Figure: The File Header holds metadata; the main bits of a file hold instruction code bits.]
Question: Executable/Program files
• What are the bits in the header of the executable file representing?
a) Metadata instructions for loader program
b) Executable instruction bits
c) Program configuration bits
d) File type identification
[Figure: File made of a File Header followed by the main bits of a file (file body).]
Answer: Executable/Program files
• What are the bits in the header of the executable file representing?
a) Metadata instructions for loader program
b) Executable instruction bits
c) Program configuration bits
d) File type identification
[Figure: The File Header holds metadata (loading instructions); the main bits of a file hold instruction code bits.]
File Content and Related Applications
• The association (matching) between file content structure (the data code used) and the application program with the appropriate encoder/decoder can be implemented by the operating system:
– Externally on the UI shell level (used in Windows), and
– Internally in the file header metadata section of the file itself (used in UNIX/Linux/MacOS).
[Figure: Externally, the file bit‐code identifier is attached to the file name; internally, it is stored as metadata in the file header, ahead of the data bits of the file.]
Windows: File Content and Data Decoders
• Externally on the UI shell level files can be recognized as belonging to different applications by the:
• CLI file name extension,
– fileX.pdf
– fileY.html
– fileZ.doc
• GUI file icon appearance.
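The difference between external and internal identification can be sketched in Python. The extension table, the application labels and both function names are hypothetical; the two content signatures (`%PDF` for PDF documents and `PK\x03\x04` for ZIP archives) are the leading bytes those formats normally carry.

```python
import os

# External identification: file name extension mapped to an application (hypothetical table).
EXTENSION_APPS = {
    ".pdf": "PDF viewer",
    ".html": "web browser",
    ".doc": "word processor",
}

# Internal identification: leading bytes stored inside the file itself.
CONTENT_SIGNATURES = {
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}

def app_by_extension(file_name):
    """Pick an application from the file name alone."""
    extension = os.path.splitext(file_name)[1].lower()
    return EXTENSION_APPS.get(extension, "unknown")

def type_by_content(data):
    """Pick a file type from the first bytes of the content."""
    for signature, kind in CONTENT_SIGNATURES.items():
        if data.startswith(signature):
            return kind
    return "unknown"

# A PDF file that was renamed to fileX.doc: the two methods disagree.
external_guess = app_by_extension("fileX.doc")
internal_guess = type_by_content(b"%PDF-1.7\n...")

print(external_guess)
print(internal_guess)
```

The renamed file shows the trade-off: the external label is easy to read but easy to change, while the internal identifier stays with the file bits.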
Summary
• Electronic or soft documents are easier to organize and manage than paper documents.
• Not all files are created equal!
The End