
Data section

In document GS1 EPC Tag Data Standard 1.6 (Page 180-184)


I.8 Data section


A Data section is always present in a Packed Object, except in the case of a Directory Packed Object or Directory Addendum Packed Object (which encode no data elements), the case of a Data Addendum Packed Object containing only Delete operations, and the case of a Packed Object that uses No-directory compaction (see I.7.1). When a Data section is present, it follows the Object Info section (and the Secondary ID and Aux Format sections, if present). Depending on the characteristics of the encoded IDs and data strings, the Data section may include one or both of two subsections in the following order: a Known-Length Numerics subsection, and an AlphaNumerics subsection. The following paragraphs provide detailed descriptions of each of these Data Section subsections. If all of the subsections of the Data section are utilized in a Packed Object, then the layout of the Data section is as shown in Figure I 8-1.

Figure I 8-1: Maximum Structure of a Packed Objects Data section

Known-Length Numeric subsection:
    1st KLN Binary | 2nd KLN Binary | … | Last KLN Binary

AlphaNumeric subsection:
    A/N Header Bits: Non-Num Base Bit(s) | Prefix Bit, Prefix Run(s) | Suffix Bit, Suffix Run(s) | Char Map
    Binary Data Segments: Ext’d. Num Binary | Ext’d Non-Num Binary | Base 10 Binary | Non-Num Binary

I.8.1 Known-length-Numerics subsection of the Data Section


For always-numeric data strings, the ID table may indicate a fixed number of digits (this fixed-length information is not encoded in the Packed Object) and/or a variable number of digits (in which case the string’s length was encoded in the Aux Format section, as described above). When a single data item is specified in the FormatString column (see J.2.3) as containing a fixed-length numeric string followed by a variable-length string, the numeric string is encoded in the Known-length-numerics subsection and the alphanumeric string in the Alphanumeric subsection.

The summation of fixed-length information (derived directly from the ID table) plus variable-length information (derived from encoded bits as just described) results in a “known-length entry” for each of the always-numeric strings encoded in the current Packed Object. Each all-numeric data string in a Packed Object (if described as all-numeric in the ID Table) is encoded by converting the digit string into a single Binary number (up to 160 bits, representing a binary value between 0 and (10^48 - 1)). Figure K-1 in Annex K shows the number of bits required to represent a given number of digits. If an all-numeric string contains more than 48 digits, then the first 48 are encoded as one 160-bit group, followed by the next group of up to 48 digits, and so on. Finally, the Binary values for each all-numeric data string in the Object are themselves concatenated to form the Known-length-Numerics subsection.
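The conversion just described can be sketched in Python as follows. This is a minimal illustration, not the normative procedure: the function names are hypothetical, and the bit count is assumed to be the smallest width that can hold any n-digit value (which gives 160 bits for 48 digits, consistent with the text); Figure K-1 remains the authoritative source for bit counts.

```python
MAX_GROUP_DIGITS = 48  # the text encodes at most 48 digits per binary number


def bits_for_digits(n: int) -> int:
    """Smallest bit width holding any n-digit value (assumed to match Figure K-1)."""
    return (10 ** n - 1).bit_length()


def encode_known_length_numeric(digits: str) -> str:
    """Encode one all-numeric string as a string of '0'/'1' characters."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    groups = []
    for start in range(0, len(digits), MAX_GROUP_DIGITS):
        group = digits[start:start + MAX_GROUP_DIGITS]
        width = bits_for_digits(len(group))
        # Fixed width keeps leading zeros, so the digit count is recoverable.
        groups.append(format(int(group), "b").zfill(width))
    return "".join(groups)


def encode_kln_subsection(numeric_strings) -> str:
    """Concatenate the binary values of every all-numeric string, in order."""
    return "".join(encode_known_length_numeric(s) for s in numeric_strings)
```

Because the width depends only on the known digit count, a decoder that knows each entry's length can split the subsection back into its binary values without any delimiters.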

I.8.2 Alphanumeric subsection of the Data section


The Alphanumeric (A/N) subsection, if present, encodes all of the Packed Object’s data from any data strings that were not already encoded in the Known-length Numerics subsection. If there are no alphanumeric characters to encode, the entire A/N subsection is omitted. The Alphanumeric subsection can encode any mix of digits and non-digit ASCII characters, or eight-bit data. The digit characters within this data are encoded separately, at an average efficiency of 4.322 bits per digit or better, depending on the character sequence. The non-digit characters are independently encoded at an average efficiency that ranges from 5.91 bits per character or better (all uppercase letters) to a worst-case limit of 9 bits per character (if the character mix requires Base 256 encoding of non-numeric characters).

An Alphanumeric subsection consists of a series of A/N Header bits (see I.8.2.1), followed by from one to four Binary segments (each segment representing data encoded in a single numerical Base, such as Base 10 or Base 30, see I.8.2.4), padded if necessary to complete the final byte (see I.8.2.5).

I.8.2.1 A/N Header Bits

The A/N Header Bits are defined as follows:

• One or two Non-Numeric Base bits, as follows:

    • ‘0’ indicates that Base 30 was chosen for the non-numeric Base;

    • ‘10’ indicates that Base 74 was chosen for the non-numeric Base;

    • ‘11’ indicates that Base 256 was chosen for the non-numeric Base.

• Either a single ‘0’ bit (indicating that no Character Map Prefix is encoded), or a ‘1’ bit followed by one or more “Runs” of six Prefix bits as defined in I.8.2.3.

• Either a single ‘0’ bit (indicating that no Character Map Suffix is encoded), or a ‘1’ bit followed by one or more “Runs” of six Suffix bits as defined in I.8.2.3.

• A variable-length “Character Map” bit pattern (see I.8.2.2), representing the base of each of the data characters, if any, that were not accounted for by a Prefix or Suffix.
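As an illustration of the first header field, the sketch below reads the one or two Non-Numeric Base bits from a bit string. The function name and the use of a '0'/'1' character string are assumptions made for readability; the Prefix, Suffix, and Character Map fields that follow are not parsed here.

```python
from typing import Tuple

# Codes for the Non-Numeric Base bits, as listed in the header definition.
NON_NUMERIC_BASE_CODES = {"0": 30, "10": 74, "11": 256}


def read_non_numeric_base(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Return (base, next_pos) after consuming the one or two base bits."""
    if pos >= len(bits):
        raise ValueError("no header bits left to read")
    if bits[pos] == "0":
        return NON_NUMERIC_BASE_CODES["0"], pos + 1
    code = bits[pos:pos + 2]
    if len(code) < 2:
        raise ValueError("truncated Non-Numeric Base field")
    return NON_NUMERIC_BASE_CODES[code], pos + 2
```

The returned position points at the Prefix bit, which is the next field in the header.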

I.8.2.2 Dual-base Character-map encoding

Compaction of the ordered list of alphanumeric data strings (excluding those data strings already encoded in the Known-Length Numerics subsection) is achieved by first concatenating the data characters into a single data string (the individual string lengths have already been recorded in the Aux Format section). Each of the data characters is classified as either Base 10 (for numeric digits), Base 30 non-numerics (primarily uppercase A-Z), Base 74 non-numerics (which includes both uppercase and lowercase alphas, and other ASCII characters), or Base 256 characters. These character sets are fully defined in Annex K. All characters from the Base 74 set are also accessible from Base 30 via the use of an extra “shift” value (as are most of the lower 128 characters in the Base 256 set). Depending on the relative percentage of “native” Base 30 values vs. other values in the data string, one of those bases is selected as the more efficient choice for a non-numeric base.

Next, the precise sequence of numeric and non-numeric characters is recorded and encoded, using a variable-length bit pattern, called a “character map,” where each ‘0’ represents a Base 10 value (encoding a digit) and each ‘1’ represents a value for a non-numeric character (in the selected base). Note that if, for example, Base 30 encoding was selected, each data character other than a digit, an uppercase letter, or the space character needs to be represented by a pair of base-30 values, and thus each such data character is represented by a pair of ‘1’ bits in the character map.
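A character map for the Base 30 case might be built as in the following sketch. It assumes, based only on the paragraph above, that uppercase letters and the space character are the "native" Base 30 characters and that every other non-digit costs two Base 30 values (a shift plus a shifted value); the exact sets are defined in Annex K. The function name is hypothetical, and the sketch ignores Prefix and Suffix Runs.

```python
import string

# Assumption taken from the text: A-Z and space need a single Base 30 value.
BASE30_NATIVE = set(string.ascii_uppercase + " ")
DIGITS = set("0123456789")


def character_map_base30(data: str) -> str:
    """Build the character map bits for data when Base 30 is the selected base."""
    bits = []
    for ch in data:
        if ch in DIGITS:
            bits.append("0")    # one Base 10 value
        elif ch in BASE30_NATIVE:
            bits.append("1")    # one Base 30 value
        else:
            bits.append("11")   # shift value plus shifted value
    return "".join(bits)
```

For example, the string "AB12c" yields the map 110011: two native letters, two digits, and one shifted character.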

I.8.2.3 Prefix and Suffix Run-Length encoding

For improved efficiency in cases where the concatenated sequence includes runs of six or more values from the same base, provision is made for optional run-length representations of one or more Prefix or Suffix “Runs” (single-base character sequences), which can replace the first and/or last portions of the character map. The encoder shall not create a Run that separates a Shift value from its next (shifted) value, and thus a Run always represents an integral number of source characters.

An optional Prefix Representation, if present, consists of one or more occurrences of a Prefix Run. Each Prefix Run consists of one Run Position bit, followed by two Basis Bits, then followed by three Run Length bits, defined as follows:

• The Run Position bit, if ‘0’, indicates that at least one more Prefix Run is encoded following this one (representing another set of source characters to the right of the current set). The Run Position bit, if ‘1’, indicates that the current Prefix Run is the last (rightmost) Prefix Run of the A/N subsection.

• The first basis bit indicates a choice of numeric vs. non-numeric base, and the second basis bit, if ‘1’, indicates that the chosen base is extended to include characters from the “opposite” base. Thus, ‘00’ indicates a run-length-encoded sequence of base 10 values; ‘01’ indicates a sequence that is primarily (but not entirely) digits, encoded in Base 13; ‘10’ indicates a sequence of values from the non-numeric base that was selected earlier in the A/N header; and ‘11’ indicates a sequence of values primarily from that non-numeric base, but extended to include digit characters as well. Note an exception: if the non-numeric base that was selected in the A/N header is Base 256, then the “extended” version is defined to be Base 40.

• The 3-bit Run Length value assumes a minimum useable run of six same-base characters, and the length value is further divided by 2. Thus, the possible 3-bit Run Length values of 0, 1, 2, … 7 indicate a Run of 6, 8, 10, … 20 characters from the same base. Note that a trailing “odd” character value at the end of a same-base sequence must be represented by adding a bit to the Character Map.

An optional Suffix Representation, if present, is a series of one or more Suffix Runs, each identical in format to the Prefix Run just described. Consistent with that description, note that the Run Position bit, if ‘1’, indicates that the current Suffix Run is the last (rightmost) Suffix Run of the A/N subsection, and thus any preceding Suffix Runs represented source characters to the left of this final Suffix Run.
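The six-bit Run format shared by Prefix and Suffix Runs can be sketched as below. The function names and argument order are hypothetical; the field layout (one Run Position bit, two Basis Bits, three Run Length bits) and the length mapping (value v means 6 + 2v characters) follow the description above.

```python
from typing import Tuple


def encode_run(basis_bits: str, length: int, is_last: bool) -> str:
    """Encode one Prefix or Suffix Run as six '0'/'1' characters."""
    if basis_bits not in ("00", "01", "10", "11"):
        raise ValueError("basis_bits must be two bits")
    if length < 6 or length > 20 or length % 2 != 0:
        raise ValueError("a Run covers an even count of 6 to 20 values")
    position = "1" if is_last else "0"
    return position + basis_bits + format((length - 6) // 2, "03b")


def decode_run(run: str) -> Tuple[str, int, bool]:
    """Return (basis_bits, length, is_last) for a six-bit Run."""
    if len(run) != 6 or any(bit not in "01" for bit in run):
        raise ValueError("a Run is exactly six bits")
    is_last = run[0] == "1"
    length = 6 + 2 * int(run[3:6], 2)
    return run[1:3], length, is_last
```

A same-base sequence of 13 digits, for instance, could be covered by one Run of length 12, with the remaining "odd" digit represented by a bit in the Character Map.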

I.8.2.4 Encoding into Binary Segments

Immediately after the last bit of the Character Map, up to four binary numbers are encoded, each representing all of the characters that were encoded in a single base system. First, a base-13 bit sequence is encoded (if one or more Prefix or Suffix Runs called for base-13 encoding). If present, this bit sequence directly represents the binary number resulting from encoding the combined sequence of all Prefix and Suffix characters (in that order) classified as Base 13 (ignoring any intervening characters not thus classified) as a single value, or in other words, applying a base 13 to Binary conversion. The number of bits to encode in this sequence is directly determined from the number of base-13 values being represented, as called for by the sum of the Prefix and Suffix Run lengths for base 13 sequences. The number of bits, for a given number of Base 13 values, is determined from the Figure in Annex K. Next, an Extended-NonNumeric Base segment (either Base-40 or Base 84) is similarly encoded (if any Prefix or Suffix Runs called for Extended-NonNumeric encoding).

Next, a Base-10 Binary segment is encoded that directly represents the binary number resulting from encoding the sequence of the digits in the Prefix and/or character map and/or Suffix (ignoring any intervening non-digit characters) as a single value, or in other words, applying a base 10 to Binary conversion. The number of bits to encode in this sequence is directly determined from the number of digits being represented, as shown in Annex K.

Immediately after the last bit of the Base-10 bit sequence (if any), a non-numeric (Base 30, Base 74, or Base 256) bit sequence is encoded (if the character map indicates at least one non-numeric character). This bit sequence represents the binary number resulting from a base-30 to Binary conversion (or a Base-74 to Binary conversion, or a direct transfer of Base-256 values) of the sequence of non-digit characters in the data (ignoring any intervening digits). Again, the number of encoded bits is directly determined from the number of non-numeric values being represented, as shown in Annex K. Note that if Base 256 was selected as the non-Numeric base, then the encoder is free to classify and encode each digit either as Base 10 or as Base 256 (Base 10 will be more efficient, unless outweighed by the ability to take advantage of a long Prefix or Suffix).
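The base-to-Binary conversions described in this subsection share one pattern, sketched below for a list of already-classified values. The helper names are hypothetical, and the bit count is assumed to be the smallest width able to hold any sequence of that many values in the given base; the figure in Annex K is the authoritative source. Mapping characters to Base 13, 30, 40, 74, or 84 values requires the Annex K tables and is not attempted here.

```python
def bits_for_values(count: int, base: int) -> int:
    """Smallest bit width for `count` values in `base` (assumed to match Annex K)."""
    return (base ** count - 1).bit_length()


def values_to_bits(values, base: int) -> str:
    """Treat values as digits of one base-`base` number and emit it in binary."""
    if not values:
        return ""  # a segment with no values is simply absent
    number = 0
    for value in values:
        if not 0 <= value < base:
            raise ValueError("value out of range for base")
        number = number * base + value
    return format(number, "b").zfill(bits_for_values(len(values), base))


def split_digits(data: str):
    """Separate digits from non-digits, each keeping its original order."""
    digits = [int(ch) for ch in data if ch in "0123456789"]
    others = [ch for ch in data if ch not in "0123456789"]
    return digits, others
```

Under this assumption, Base 256 values come out at exactly eight bits each, which matches the "direct transfer" wording above.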

Note that an Alphanumeric subsection ends with several variable-length bit fields (the character map, and one or more Binary sections representing the numeric and non-numeric Binary values). Note further that none of the lengths of these three variable-length bit fields are explicitly encoded (although one or two Extended-Base Binary segments may also be present, these have known lengths, determined from Prefix and/or Suffix runs). In order to determine the boundaries between these three variable-length fields, the decoder needs to implement a procedure, using knowledge of the remaining number of data bits, in order to correctly parse the Alphanumeric subsection. An example of such a procedure is described in Annex M.

I.8.2.5 Padding the last Byte

The last (least-significant) bit of the final Binary segment is also the last significant bit of the Packed Object. If there are any remaining bit positions in the last byte to be filled with pad bits, then the most significant pad bit shall be set to ‘1’, and any remaining less-significant pad bits shall be set to ‘0’. The decoder can determine the total number of non-pad bits in a Packed Object by examining the Length Section of the Packed Object (and if the Pad Indicator bit of that section is ‘1’, by also examining the last byte of the Packed Object).
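The padding rule can be sketched as a small helper operating on a '0'/'1' string. The function name is hypothetical, and the sketch covers only the encoder side; how the decoder combines the Length Section with the Pad Indicator bit is not modelled.

```python
def pad_to_byte(bits: str) -> str:
    """Append pad bits so the total length is a whole number of bytes."""
    remainder = len(bits) % 8
    if remainder == 0:
        return bits  # already byte-aligned, no pad bits needed
    pad_len = 8 - remainder
    # Most significant pad bit is '1'; any remaining pad bits are '0'.
    return bits + "1" + "0" * (pad_len - 1)
```

For example, a Packed Object ending in the five bits 10101 would have its last byte completed as 10101100.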
