

In document The Guile 100 Programs Project (Page 40-48)

3.4 Problem 4: tar file archives

3.4.2 The rustar File Format

First, I will describe our restricted ustar file format, which I’m going to dub rustar (for restricted ustar), just so that we’re clear that I’m talking about something more specific than the ustar format.

File Structure

A rustar file contains a set of logical records. Each logical record represents the contents of a file plus its metadata. The logical records appear sequentially in the file, one after another, and there is no global header in the file. At the end of the file is a footer.

Logical Records

Each logical record consists of two parts: a header segment, and the contents of the file, a.k.a. the data segment. Of these, only the header requires a detailed explanation.

Header

The header segment is a 512-byte block that contains metadata for a file. The block is broken up into 17 fields of fixed length. Each field contains data in one of three types.

Header Types

Here we describe the three types that can appear in a header. Each type has the annotation [N]. The N indicates that the field has a fixed size of N bytes.

1. rustar-string[N] is a fixed-width string that contains only the codepoints listed below. It is stored in the ASCII encoding and, if necessary, is right-padded with NULL bytes to ensure it occupies the whole of its N bytes. NULL bytes can only appear at the end of the string. The string need not end with NULL bytes if it fills the whole of its fixed width.

The list of allowed codepoints is

• U+20 to U+22

• U+25 to U+3F

• U+41 to U+5A

• U+5F

• U+61 to U+7A

• and U+00, but U+00 can only be followed by more U+00.

2. rustar-0string[N] (note the ‘0’) is a fixed-width string with the same format and restrictions as a rustar-string[N], but with an additional restriction: it must end with at least one NULL byte.

3. rustar-number[N] is an unsigned integer stored as a fixed-width string. The string contains the text representation of the integer in octal format. The last byte (and only the last byte) of the string must be NULL. The string is left-padded with the ‘0’ character to ensure the number occupies the whole of its fixed-width buffer.

For example, a rustar-number[8] field for the integer 10 will be the string “0000012” followed by one byte of NULL. 12 octal equals 10 decimal.
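The rustar-number rule can be sketched in a few lines of Guile. This is only an illustration of the encoding described above, not the code used by the archive script; the name encode-rustar-number is a hypothetical one chosen for this sketch.

```scheme
;; Sketch: encode a non-negative integer as a rustar-number[N] string.
;; `encode-rustar-number` is a hypothetical name used for illustration.
(define (encode-rustar-number n width)
  (let* ((digits (number->string n 8))                  ; octal text, 10 -> "12"
         (padding (- width (string-length digits) 1)))  ; keep 1 byte for NULL
    (when (< padding 0)
      (error "number does not fit in field" n width))
    (string-append (make-string padding #\0)            ; left-pad with '0'
                   digits
                   (string (integer->char 0)))))         ; trailing NULL byte

(encode-rustar-number 10 8)   ; "0000012" followed by one NULL byte
```

Reserving the final byte for NULL means an N-byte field holds at most N - 1 octal digits, so a rustar-number[8] field tops out at 7777777 octal.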

Header Fields

The 17 fields in the 512-byte header block of a logical record are listed below. In the Format column, string, 0string, and number are short for rustar-string, rustar-0string, and rustar-number.

Field      Format       Description

Name       string[100]  The filename by itself, with no directory information. The path separator character (U+2F) is not allowed.
Mode       number[8]    A bitfield of the permissions. See below.
UID        number[8]    The User ID of the file.
GID        number[8]    The Group ID of the file.
Size       number[12]   The length of the file in bytes.
mtime      number[12]   The 32-bit integer modification time of the file.
Checksum   number[8]    256 + the sum of all the bytes in this header except the checksum field.
Typeflag   string[1]    Always “0”.
Link name  string[100]  Always 100 bytes of NULL.
Magic      0string[6]   The string “ustar” plus a NULL.
Version    string[2]    The string “00”.
uname      0string[32]  The uname of the file.
gname      0string[32]  The gname of the file.
Dev-Major  number[8]    Always zero.
Dev-Minor  number[8]    Always zero.
Prefix     string[155]  Path information for this file. If this file has no additional path information, this is all NULL. Directory separation is represented by ‘/’ (forward slash). The slash at the end is assumed, and should not be included explicitly. [1]
Padding    0string[12]  12 bytes of NULL.
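As a sanity check on the table, the field widths can be added up, and the checksum rule can be written out directly. The following is a sketch under the assumption that the header is already available as a 512-byte bytevector; the names field-widths, checksum-offset, and header-checksum are hypothetical and do not come from the archive script.

```scheme
(use-modules (rnrs bytevectors))

;; Field widths in header order, taken from the table above.
(define field-widths '(100 8 8 8 12 12 8 1 100 6 2 32 32 8 8 155 12))

(apply + field-widths)   ; 512, the size of one header block

;; The checksum field follows Name, Mode, UID, GID, Size and mtime,
;; so it starts at byte 100 + 8 + 8 + 8 + 12 + 12 = 148.
(define checksum-offset 148)
(define checksum-width 8)

;; Sum every byte outside the checksum field, then add 256.
(define (header-checksum header)
  (let loop ((i 0) (sum 256))
    (cond ((= i (bytevector-length header)) sum)
          ((and (>= i checksum-offset)
                (< i (+ checksum-offset checksum-width)))
           (loop (+ i 1) sum))                       ; skip the checksum field
          (else
           (loop (+ i 1) (+ sum (bytevector-u8-ref header i)))))))
```

The constant 256 is what the eight checksum bytes contribute if each is treated as an ASCII space (8 × 32), which is a convenient way to compute the same value: fill the field with spaces first, then sum the whole block.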

The mode bitfield is a standard permissions bitfield:

• 0x001 execute permission for ’other’

• 0x002 write permission for ’other’

• 0x004 read permission for ’other’

• 0x008 execute permission for ’group’

• 0x010 write permission for ’group’

• 0x020 read permission for ’group’

• 0x040 execute permission for ’owner’

• 0x080 write permission for ’owner’

• 0x100 read permission for ’owner’

• 0x200 (unused)

• 0x400 set if the file is setgid

• 0x800 set if the file is setuid
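To make the bit assignments concrete, here is a small sketch that turns the nine permission bits into the familiar rwx notation. The procedure name permission-string is hypothetical; it is only meant to illustrate the bit layout listed above.

```scheme
;; Sketch: render the nine permission bits of a mode as "rwxrwxrwx" text.
;; `permission-string` is a hypothetical helper, not part of the script.
(define (permission-string mode)
  (list->string
   (map (lambda (bit ch)
          ;; Show the letter when the bit is set, a dash otherwise.
          (if (zero? (logand mode bit)) #\- ch))
        '(#x100 #x080 #x040 #x020 #x010 #x008 #x004 #x002 #x001)
        (string->list "rwxrwxrwx"))))

(permission-string #o644)   ; "rw-r--r--"
(permission-string #o755)   ; "rwxr-xr-x"
```

Because each group of three bits lines up with one octal digit, a mode such as 644 octal is stored in the Mode field as the rustar-number[8] string “0000644” plus a NULL.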

Data

After the 512-byte header block, the binary contents of the file are stored. The data segment is NULL-padded so that it ends on a 512-byte block boundary.
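The amount of padding follows directly from the file size. A minimal sketch, with padding-length and record-length as hypothetical names:

```scheme
(define block-size 512)

;; Number of NULL bytes needed after `size` bytes of file data so that
;; the data segment ends on a block boundary.
(define (padding-length size)
  (modulo (- block-size (modulo size block-size)) block-size))

;; Total size of one logical record: header + data + padding.
(define (record-length size)
  (+ block-size size (padding-length size)))

(padding-length 1000)   ; 24
(padding-length 512)    ; 0, already on a block boundary
(padding-length 0)      ; 0, an empty file has no data blocks
(record-length 1000)    ; 1536, i.e. three 512-byte blocks
```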

Footer

The footer is 1024 bytes of NULL that appears at the end of the file.

[1] For example: prefix “foo” + name “bar” forms “foo/bar”. Prefix “foo/” + name “bar” forms “foo//bar”. Don’t do that.

The Archive Script

Jez Ng contributed a script that meets the above requirements quite nicely. One thing to note here is the use of the procedures cut and cute. These let you, in effect, pass a subset of the required parameters to a procedure. In a later call, you can add the remaining parameters to the procedure and then truly call it.
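For readers who haven’t met cut and cute before, here is a short sketch of how they behave. Both come from SRFI 26; the names add-ten, tagged, tick!, lazy, and eager are made up for this illustration.

```scheme
(use-modules (srfi srfi-26))

;; <> marks a single missing argument.
(define add-ten (cut + 10 <>))
(add-ten 5)                       ; 15

;; <...> marks "all remaining arguments".
(define tagged (cut list 'tag <...>))
(tagged 1 2 3)                    ; (tag 1 2 3)

;; cut re-evaluates its non-slot expressions on every call;
;; cute evaluates them once, when the procedure is created.
(define calls 0)
(define (tick!) (set! calls (+ calls 1)) calls)

(define lazy  (cut  list (tick!) <>))   ; (tick!) has not run yet
(define eager (cute list (tick!) <>))   ; (tick!) runs once, right here

(eager 'a)                        ; (1 a)
(eager 'b)                        ; (1 b)
(lazy 'a)                         ; (2 a)
```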

#! /usr/bin/env guile \
-e main -s
!#

(use-modules (rnrs bytevectors)
             (rnrs io ports)
             (srfi srfi-1)   ; map, reduce
             (srfi srfi-26)  ; cut, cute
             (ice-9 format))

(define write-bytevector (cut put-bytevector (current-output-port) <...>))
(define block-size 512)

;; Copy standard input to standard output one block at a time.  Each
;; block starts out zero-filled, so a short final block is NULL-padded.
(define (cat)
  (define bv (make-bytevector block-size 0))
  (let ((read-count (get-bytevector-n! (current-input-port) bv 0 block-size)))
    (unless (eof-object? read-count)
      (write-bytevector bv)
      (unless (< read-count block-size)
        (cat)))))

;; NOTE: the bodies of rustar-char-set and valid-rustar-char? are missing
;; from this copy of the text.  They are reconstructed here from the list
;; of allowed codepoints given earlier and may differ from the original.
(define rustar-char-set
  (char-set-union (ucs-range->char-set #x20 #x23)    ; upper bounds are exclusive
                  (ucs-range->char-set #x25 #x40)
                  (ucs-range->char-set #x41 #x5b)
                  (ucs-range->char-set #x5f #x60)
                  (ucs-range->char-set #x61 #x7b)))

(define (valid-rustar-char? c)
  (char-set-contains? rustar-char-set c))

;; Return a zero-filled bytevector of LENGTH bytes with STRING copied
;; into its start.
(define (make-fixed-string length string)
  (let ((bv (make-bytevector length 0)))
    (string-for-each-index
     (lambda (i)
       (let ((c (string-ref string i)))
         (unless (valid-rustar-char? c)
           (throw 'ustar-error "encountered invalid character"))
         (bytevector-u8-set! bv i (char->integer c))))
     string)
    bv))

(define (make-rustar-string length string)
  (if (<= (string-length string) length)
      (make-fixed-string length string)
      (throw 'ustar-error "'~a' is too long for tar header" string)))

;; A 0string needs room for at least one trailing NULL, hence < not <=.
(define (make-rustar-0string length string)
  (if (< (string-length string) length)
      (make-fixed-string length string)
      (throw 'ustar-error "'~a' is too long for tar header" string)))

(define (make-rustar-number length number)
  (let* ((num (number->string number 8))
         (padding (- length (string-length num) 1)))
    (if (>= padding 0)
        (make-fixed-string length (string-append (make-string padding #\0) num))
        (throw 'ustar-error "~a is too large for tar header" num))))

;; Unlike dirname, this doesn't return "." for files in the cwd.
;; NOTE: the body of raw-dirname is missing from this copy of the text;
;; this is a reconstruction based on the comment above.
(define (raw-dirname path)
  (let ((index (string-rindex path #\/)))
    (if index (substring path 0 index) "")))

;; NOTE: the opening line of this definition is also missing; the name
;; and parameter are taken from the call in `tar' below.
(define (write-file-header filename)
  (define st (lstat filename))
  (unless (eq? (stat:type st) 'regular)
    (throw 'ustar-error "Only regular files are supported"))
  (let* ((uid (stat:uid st))
         (gid (stat:gid st))
         ; We only really need an a-list for the purposes of modifying
         ; checksum in-place. The other keys are not used. However, they do
         ; serve as documentation.
         (header
          `((filename . ,(make-rustar-string 100 (basename filename)))
            (mode . ,(make-rustar-number 8 (stat:perms st)))
            (uid . ,(make-rustar-number 8 uid))
            (gid . ,(make-rustar-number 8 gid))
            (size . ,(make-rustar-number 12 (stat:size st)))
            (mtime . ,(make-rustar-number 12 (stat:mtime st)))
            (checksum . ,(make-bytevector 8 (char->integer #\space)))
            (typeflag . ,(make-rustar-string 1 "0"))
            (link-name . ,(make-rustar-string 100 ""))
            (magic . ,(make-rustar-0string 6 "ustar"))
            (version . ,(make-rustar-string 2 "00"))
            (uname . ,(make-rustar-0string 32 (passwd:name (getpwuid uid))))
            (gname . ,(make-rustar-0string 32 (group:name (getgrgid gid))))
            (dev-major . ,(make-rustar-number 8 0))
            (dev-minor . ,(make-rustar-number 8 0))
            (path . ,(make-rustar-string 155 (raw-dirname filename)))
            (padding . ,(make-rustar-0string 12 ""))))
         (sum (cut reduce + 0 <>))
         ; The checksum field holds 8 spaces at this point, which
         ; supplies the "256 +" part of the checksum.
         (checksum (sum (map (compose sum bytevector->u8-list cdr) header))))
    (set! header (assq-set! header 'checksum (make-rustar-number 8 checksum)))
    (for-each (compose write-bytevector cdr) header)))

(define (tar archive filenames)
  (with-output-to-file archive
    (lambda ()
      (for-each (lambda (filename)
                  (write-file-header filename)
                  (with-input-from-file filename cat #:binary #t))
                filenames)
      ; The footer: two blocks of NULL.
      (write-bytevector (make-bytevector (* block-size 2) 0)))
    #:binary #t))

(define (main args)
  (define perror (cut format (current-error-port) <...>))
  (define (system-error-handler . args)
    (perror "error: ~a~%" (strerror (system-error-errno args)))
    (exit 1))
  (define (ustar-error-handler . args)
    (perror "error: ")
    (apply perror (cdr args))
    (perror "~%")
    (exit 1))
  (catch 'ustar-error
    (lambda ()
      (catch 'system-error
        (cute tar (cadr args) (cddr args))
        system-error-handler))
    ustar-error-handler))

Later, Mark Weaver contributed a more featureful script that handles almost all of the capabilities of the ustar archive format. It handles directories and links as well as files. Also, he uses a very common hack to allow longer path names: he puts as much of the path as will fit into the 100-character filename field. You can find his script in the appendix; see Section A.1 [ustar Archives], page 48.

4 Theme 2: Web 1.0

The second theme in this project is “Web 1.0”, where we’ll talk about interacting with the Internet as it existed in the 1990s.

The 1990s began with the emergence of Gopher clients and servers. The Internet Gopher protocol visualized the world as a series of folders. The folders usually contained plain-text documents or media files like GIFs or AU audio. This was before both HTML and PDF, so mixing text and graphics in a single file wasn’t as common, and, if it did occur, it was in formats such as PostScript.

The HTTP-and-HTML-based internet is linked to the appearance of the NCSA Mosaic browser and the NCSA httpd server. There were precursors, but, as a practical matter, 1993 was the beginning of the HTTP/HTML web.

But, in those days, before AJAX or Flash, most of the content was static HTML content or dynamic content created by CGI scripts. In this context, before the concept of cookies was developed in 1994, personalization of content for different users was not practical.

JavaScript appeared in Netscape Navigator 2.0 in 1995 and in Internet Explorer 3.0 in late 1996, but with incompatibilities between the two implementations. Before 1996, almost all content was static and generated on the server side. This early Web had a more strongly defined separation between client and server.

The early Web pages had stylistic quirks that are less common today. Before CSS2, Web page layouts were often created by using tables. Blinking text, animated GIFs, and embedded MIDI tunes were common.

By the end of the decade, Linux, Apache, MySQL, and PHP were all quite functional. Those programs, in conjunction with Perl, which first appeared in the 80s, became the building blocks of the famous LAMP stack. This free, open software stack allowed for some of the common types of interactivity to which we have become accustomed.

PHP used a model that allowed for rapid generation of Web pages, where code was embedded within otherwise static HTML web pages. When those pages were requested, the embedded PHP code was run, and its output became HTML content.

So, in our second theme we’ll imagine what the world would have been like if GUILE were part of the ecosystem that made up the 1990s Internet experience. Specifically, we’ll take a look at using Guile for

• on-the-fly evaluation of code embedded within HTML documents

• the Internet Gopher protocol

• CGI scripting

• the Linux Apache GUILE MySQL stack

• and the animated GIF format.

And away we go.
