structure
I.8 Data section
A Data section is always present in a Packed Object, except in the case of a Directory
Packed Object or Directory Addendum Packed Object (which encode no data elements),
the case of a Data Addendum Packed Object containing only Delete operations, and the
case of a Packed Object that uses No-directory compaction (see I.7.1). When a Data
section is present, it follows the Object Info section (and the Secondary ID and Aux
Format sections, if present). Depending on the characteristics of the encoded IDs and
data strings, the Data section may include one or both of two subsections in the following
order: a Known-Length Numerics subsection, and an AlphaNumerics subsection. The
following paragraphs provide detailed descriptions of each of these Data Section
subsections. If all of the subsections of the Data section are utilized in a Packed Object,
then the layout of the Data section is as shown in Figure I 8-1.
Figure I 8-1: Maximum Structure of a Packed Objects Data section
Known-Length Numeric subsection
• Binary Data Segments: 1st KLN Binary; 2nd KLN Binary; …; Last KLN Binary
AlphaNumeric subsection
• A/N Header Bits: Non-Num Base Bit(s); Prefix Bit, Prefix Run(s); Suffix Bit, Suffix Run(s); Char Map
• Binary Data Segments: Ext’d Num Binary; Ext’d Non-Num Binary; Base 10 Binary; Non-Num Binary
I.8.1 Known-length-Numerics subsection of the Data Section
For always-numeric data strings, the ID table may indicate a fixed number of digits (this
fixed-length information is not encoded in the Packed Object) and/or a variable number
of digits (in which case the string’s length was encoded in the Aux Format section, as
described above). When a single data item is specified in the FormatString column
(see J.2.3) as containing a fixed-length numeric string followed by a variable-length
string, the numeric string is encoded in the Known-length-numerics subsection and the
alphanumeric string in the Alphanumeric subsection.
The summation of fixed-length information (derived directly from the ID table) plus
variable-length information (derived from encoded bits as just described) results in a
“known-length entry” for each of the always-numeric strings encoded in the current
Packed Object. Each all-numeric data string in a Packed Object (if described as
all-numeric in the ID Table) is encoded by converting the digit string into a single Binary
number (up to 160 bits, representing a binary value between 0 and (10^48 - 1)). Figure K-1
in Annex K shows the number of bits required to represent a given number of digits. If
an all-numeric string contains more than 48 digits, then the first 48 are encoded as one
160-bit group, followed by the next group of up to 48 digits, and so on. Finally, the
Binary values for each all-numeric data string in the Object are themselves concatenated
to form the Known-length-Numerics subsection.
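The conversion described above can be sketched in Python as follows. This is a minimal illustration, not the normative procedure: the function names are hypothetical, and the bit width is computed as the smallest width that holds the largest n-digit value, which is assumed to agree with Figure K-1 (it yields 160 bits for 48 digits, matching the text).

```python
MAX_GROUP_DIGITS = 48  # per the text: at most 48 digits per binary group


def bits_for_digits(n: int) -> int:
    """Smallest bit width that can hold any n-digit decimal value.

    Assumption: this agrees with the table in Annex K (Figure K-1).
    """
    return (10 ** n - 1).bit_length()


def encode_numeric_string(digits: str) -> str:
    """Return the bit string ('0'/'1' characters) for one all-numeric string."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    groups = []
    for start in range(0, len(digits), MAX_GROUP_DIGITS):
        group = digits[start:start + MAX_GROUP_DIGITS]
        width = bits_for_digits(len(group))
        # Fixed width keeps leading zero digits recoverable by the decoder.
        groups.append(format(int(group), f"0{width}b"))
    return "".join(groups)


def encode_kln_subsection(strings: list[str]) -> str:
    """Concatenate the binary values of all known-length numeric strings."""
    return "".join(encode_numeric_string(s) for s in strings)
```

For instance, `encode_numeric_string("123")` produces a 10-bit field, and a 50-digit string produces one 160-bit group followed by a 7-bit group for the remaining two digits.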
I.8.2 Alphanumeric subsection of the Data section
The Alphanumeric (A/N) subsection, if present, encodes all of the Packed Object’s data
from any data strings that were not already encoded in the Known-length Numerics
subsection. If there are no alphanumeric characters to encode, the entire A/N subsection
is omitted. The Alphanumeric subsection can encode any mix of digits and non-digit
ASCII characters, or eight-bit data. The digit characters within this data are encoded
separately, at an average efficiency of 4.322 bits per digit or better, depending on the
character sequence. The non-digit characters are independently encoded at an average
efficiency that ranges from 5.91 bits per character or better (all uppercase letters) to a
worst-case limit of 9 bits per character (if the character mix requires Base 256 encoding
of non-numeric characters).
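One way to read these efficiency figures is as log2(base) bits per encoded value plus one character-map bit per value. This reading is an interpretation, not something the text states explicitly, and the helper name below is hypothetical; the sketch only checks that the arithmetic reproduces the quoted numbers.

```python
import math


def bits_per_char(base: int) -> float:
    """Assumed cost per character: log2(base) data bits plus 1 character-map bit."""
    return math.log2(base) + 1


print(round(bits_per_char(10), 3))   # digits
print(round(bits_per_char(30), 2))   # Base 30 non-numerics
print(bits_per_char(256))            # Base 256 non-numerics
```

Under this reading, the "or better" qualifier would correspond to cases where Prefix or Suffix Runs replace part of the character map.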
An Alphanumeric subsection consists of a series of A/N Header bits (see I.8.2.1),
followed by one to four Binary segments (each segment representing data encoded
in a single numerical Base, such as Base 10 or Base 30, see I.8.2.4), padded if necessary
to complete the final byte (see I.8.2.5).
I.8.2.1 A/N Header Bits
The A/N Header Bits are defined as follows:
• One or two Non-Numeric Base bits, as follows:
• ‘0’ indicates that Base 30 was chosen for the non-numeric Base;
• ‘10’ indicates that Base 74 was chosen for the non-numeric Base;
• ‘11’ indicates that Base 256 was chosen for the non-numeric Base.
• Either a single ‘0’ bit (indicating that no Character Map Prefix is encoded), or a ‘1’
bit followed by one or more “Runs” of six Prefix bits as defined in I.8.2.3.
• Either a single ‘0’ bit (indicating that no Character Map Suffix is encoded), or a ‘1’
bit followed by one or more “Runs” of six Suffix bits as defined in I.8.2.3.
• A variable-length “Character Map” bit pattern (see I.8.2.2), representing the base of
each of the data characters, if any, that were not accounted for by a Prefix or Suffix.
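The Non-Numeric Base bits at the start of the header form a small prefix code, which a decoder could read as sketched below. The function name and the bit-string representation (a Python string of '0' and '1' characters plus a read position) are assumptions made for illustration.

```python
def read_non_numeric_base(bits: str, pos: int = 0) -> tuple[int, int]:
    """Read the one or two Non-Numeric Base bits starting at `pos`.

    Returns (selected non-numeric base, position of the next unread bit).
    """
    if bits[pos] == "0":
        return 30, pos + 1           # '0'  -> Base 30
    if bits[pos + 1] == "0":
        return 74, pos + 2           # '10' -> Base 74
    return 256, pos + 2              # '11' -> Base 256
```

The returned position points at the Prefix-present bit that follows in the header.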
I.8.2.2 Dual-base Character-map encoding
Compaction of the ordered list of alphanumeric data strings (excluding those data strings
already encoded in the Known-Length Numerics subsection) is achieved by first
concatenating the data characters into a single data string (the individual string lengths
have already been recorded in the Aux Format section). Each of the data characters is
classified as either Base 10 (for numeric digits), Base 30 non-numerics (primarily
uppercase A-Z), Base 74 non-numerics (which includes both uppercase and lowercase
alphas, and other ASCII characters), or Base 256 characters. These character sets are
fully defined in Annex K. All characters from the Base 74 set are also accessible from
Base 30 via the use of an extra “shift” value (as are most of the lower 128 characters in
the Base 256 set). Depending on the relative percentage of “native” Base 30 values vs.
other values in the data string, one of those bases is selected as the more efficient choice
for a non-numeric base.
Next, the precise sequence of numeric and non-numeric characters is recorded and
encoded, using a variable-length bit pattern, called a “character map,” where each ‘0’
represents a Base 10 value (encoding a digit) and each ‘1’ represents a value for a
non-numeric character (in the selected base). Note that if, for example, Base 30 encoding
was selected, each non-digit data character other than an uppercase letter or the space
character needs to be represented by a pair of base-30 values (a shift value followed by
the shifted value), and thus each such data character is represented by a pair of ‘1’ bits
in the character map.
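The character-map construction for the Base 30 case can be sketched as follows. This is a simplified illustration: the set of "native" Base 30 characters is approximated here as uppercase A-Z plus space, based only on the wording above (the authoritative set is defined in Annex K), and the names are hypothetical.

```python
import string

# Assumption: native Base 30 characters, approximated from the text;
# the full character set is defined in Annex K.
BASE30_NATIVE = set(string.ascii_uppercase) | {" "}


def character_map_base30(data: str) -> str:
    """Build the character map for `data` when Base 30 is the non-numeric base."""
    bits = []
    for ch in data:
        if ch in string.digits:
            bits.append("0")    # one Base 10 value
        elif ch in BASE30_NATIVE:
            bits.append("1")    # one native Base 30 value
        else:
            bits.append("11")   # shift value plus shifted value
    return "".join(bits)
```

For the input `"A1b"`, the map is `1` for `A`, `0` for `1`, and `11` for the shifted lowercase `b`.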
I.8.2.3 Prefix and Suffix Run-Length encoding
For improved efficiency in cases where the concatenated sequence includes runs of six or
more values from the same base, provision is made for optional run-length
representations of one or more Prefix or Suffix “Runs” (single-base character sequences),
which can replace the first and/or last portions of the character map. The encoder shall
not create a Run that separates a Shift value from its next (shifted) value, and thus a Run
always represents an integral number of source characters.
An optional Prefix Representation, if present, consists of one or more occurrences of a
Prefix Run. Each Prefix Run consists of one Run Position bit, followed by two Basis
Bits, then followed by three Run Length bits, defined as follows:
• The Run Position bit, if ‘0’, indicates that at least one more Prefix Run is encoded
following this one (representing another set of source characters to the right of the
current set). The Run Position bit, if ‘1’, indicates that the current Prefix Run is the
last (rightmost) Prefix Run of the A/N subsection.
• The first basis bit indicates a choice of numeric vs. non-numeric base, and the second
basis bit, if ‘1’, indicates that the chosen base is extended to include characters from
the “opposite” base. Thus, ‘00’ indicates a run-length-encoded sequence of base 10
values; ‘01’ indicates a sequence that is primarily (but not entirely) digits, encoded in
Base 13; ‘10’ indicates a sequence of values from the non-numeric base
that was selected earlier in the A/N header; and ‘11’ indicates a sequence of values
primarily from that non-numeric base, but extended to include digit characters as
well. Note an exception: if the non-numeric base that was selected in the A/N header
is Base 256, then the “extended” version is defined to be Base 40.
• The 3-bit Run Length value assumes a minimum useable run of six same-base
characters, and the length value is further divided by 2. Thus, the possible 3-bit Run
Length values of 0, 1, 2, … 7 indicate a Run of 6, 8, 10, … 20 characters from the
same base. Note that a trailing “odd” character value at the end of a same-base
sequence must be represented by adding a bit to the Character Map.
An optional Suffix Representation, if present, is a series of one or more Suffix Runs, each
identical in format to the Prefix Run just described. Consistent with that description, note
that the Run Position bit, if ‘1’, indicates that the current Suffix Run is the last
(rightmost) Suffix Run of the A/N subsection, and thus any preceding Suffix Runs
represent source characters to the left of this final Suffix Run.
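The six-bit Run format described above can be decoded as in the following sketch. The names are hypothetical. The mapping from the selected non-numeric base to its extended base is partly inferred: Base 256 to Base 40 is stated above, while Base 30 to Base 40 and Base 74 to Base 84 are assumed from the mention of Base 40 and Base 84 segments in I.8.2.4.

```python
from typing import NamedTuple


class Run(NamedTuple):
    is_last: bool   # Run Position bit: True if this is the last (rightmost) Run
    base: int       # base of the values covered by this Run
    length: int     # number of values covered by this Run


# Assumed extended bases (only the 256 -> 40 case is stated explicitly above).
EXTENDED_BASE = {30: 40, 74: 84, 256: 40}


def decode_run(bits: str, non_numeric_base: int) -> Run:
    """Decode one 6-bit Prefix or Suffix Run given as a '0'/'1' string."""
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError("a Run is exactly six bits")
    is_last = bits[0] == "1"
    basis = bits[1:3]
    if basis == "00":
        base = 10                               # digits only
    elif basis == "01":
        base = 13                               # primarily digits
    elif basis == "10":
        base = non_numeric_base                 # base chosen in the A/N header
    else:
        base = EXTENDED_BASE[non_numeric_base]  # non-numeric base plus digits
    length = 6 + 2 * int(bits[3:], 2)           # 0..7 -> 6, 8, ..., 20
    return Run(is_last, base, length)
```

A decoder would call this repeatedly, stopping after the first Run whose `is_last` field is true.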
I.8.2.4 Encoding into Binary Segments
Immediately after the last bit of the Character Map, up to four binary numbers are
encoded, each representing all of the characters that were encoded in a single base
system. First, a base-13 bit sequence is encoded (if one or more Prefix or Suffix Runs
called for base-13 encoding). If present, this bit sequence directly represents the binary
number resulting from encoding the combined sequence of all Prefix and Suffix
characters (in that order) classified as Base 13 (ignoring any intervening characters not
thus classified) as a single value, or in other words, applying a base 13 to Binary
conversion. The number of bits to encode in this sequence is directly determined from
the number of base-13 values being represented, as called for by the sum of the Prefix
and Suffix Run lengths for base 13 sequences. The number of bits, for a given number of
Base 13 values, is determined from the Figure in Annex K. Next, an
Extended-NonNumeric Base segment (either Base 40 or Base 84) is similarly encoded (if any
Prefix or Suffix Runs called for Extended-NonNumeric encoding).
Next, a Base-10 Binary segment is encoded that directly represents the binary number
resulting from encoding the sequence of the digits in the Prefix and/or character map
and/or Suffix (ignoring any intervening non-digit characters) as a single value, or in other
words, applying a base 10 to Binary conversion. The number of bits to encode in this
sequence is directly determined from the number of digits being represented, as shown in
Annex K.
Immediately after the last bit of the Base-10 bit sequence (if any), a non-numeric (Base
30, Base 74, or Base 256) bit sequence is encoded (if the character map indicates at least
one non-numeric character). This bit sequence represents the binary number resulting
from a base-30 to Binary conversion (or a Base-74 to Binary conversion, or a direct
transfer of Base-256 values) of the sequence of non-digit characters in the data (ignoring
any intervening digits). Again, the number of encoded bits is directly determined from
the number of non-numeric values being represented, as shown in Annex K. Note that if
Base 256 was selected as the non-Numeric base, then the encoder is free to classify and
encode each digit either as Base 10 or as Base 256 (Base 10 will be more efficient, unless
outweighed by the ability to take advantage of a long Prefix or Suffix).
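The base-to-Binary conversions described in this subsection share one pattern, sketched below. This is an illustration under stated assumptions: the function names are hypothetical, the first value is assumed to be the most significant, and the segment width is computed as the smallest width that holds the largest possible value for the given count, which is assumed to match the tables in Annex K. The mapping from characters to Base 13, 30, 40, 74, or 84 values is defined in Annex K and is not reproduced here.

```python
def values_to_bits(values: list[int], base: int) -> str:
    """Encode a sequence of same-base values as one binary number."""
    if not values:
        return ""  # the segment is omitted when there is nothing to encode
    number = 0
    for v in values:
        if not 0 <= v < base:
            raise ValueError(f"value {v} is out of range for base {base}")
        number = number * base + v      # first value is most significant
    width = (base ** len(values) - 1).bit_length()
    return format(number, f"0{width}b")


def base10_segment(data: str) -> str:
    """Collect the digits of `data`, ignoring non-digit characters, and encode them."""
    return values_to_bits([int(ch) for ch in data if ch in "0123456789"], 10)
```

With `base=256`, the computed width is always eight bits per value, so the result equals the direct transfer of the byte values mentioned above.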
Note that an Alphanumeric subsection ends with several variable-length bit fields (the
character map, and one or more Binary sections representing the numeric and
non-numeric Binary values). Note further that none of the lengths of these three
variable-length bit fields are explicitly encoded (although one or two Extended-Base Binary
segments may also be present, these have known lengths, determined from Prefix and/or
Suffix runs). In order to determine the boundaries between these three variable-length
fields, the decoder needs to implement a procedure, using knowledge of the remaining
number of data bits, in order to correctly parse the Alphanumeric subsection. An
example of such a procedure is described in Annex M.
I.8.2.5 Padding the last Byte
The last (least-significant) bit of the final Binary segment is also the last significant bit of
the Packed Object. If there are any remaining bit positions in the last byte to be filled
with pad bits, then the most significant pad bit shall be set to ‘1’, and any remaining
less-significant pad bits shall be set to ‘0’. The decoder can determine the total number of
non-pad bits in a Packed Object by examining the Length Section of the Packed Object
(and if the Pad Indicator bit of that section is ‘1’, by also examining the last byte of the
Packed Object).
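The padding rule can be sketched as follows, again using '0'/'1' strings written most-significant bit first. The function names are hypothetical, and the Pad Indicator is passed in as a plain boolean rather than being read from the Length Section, whose layout is outside this subsection.

```python
def pad_to_byte(bits: str) -> str:
    """Append pad bits so that the total length is a multiple of 8."""
    remainder = len(bits) % 8
    if remainder == 0:
        return bits                              # no pad bits needed
    pad_len = 8 - remainder
    return bits + "1" + "0" * (pad_len - 1)      # first pad bit '1', rest '0'


def strip_padding(bits: str, pad_indicator: bool) -> str:
    """Remove pad bits, given the Pad Indicator from the Length Section."""
    if not pad_indicator:
        return bits
    # The last '1' in the object is the most significant pad bit.
    return bits[:bits.rindex("1")]
```

Because the pad pattern is a single '1' followed only by '0' bits, scanning backward for the last '1' is enough to locate the boundary once the Pad Indicator says padding is present.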