This work is licensed under the GNU Free Documentation License, version 1.3 or later (GFDL 1.3+).
Table of Contents
1 Preface
2 Acknowledgements
3 Theme 1: “/bin”
  3.1 Problem 1: Echo and Cat
    3.1.1 Echo
    3.1.2 Cat
  3.2 Problem 2: ‘ls’
    3.2.1 An Implementation of ‘ls’
  3.3 Problem 3: LZW Compression
  3.4 Problem 4: tar file archives
    3.4.1 ustar Script
    3.4.2 The rustar File Format
4 Theme 2: Web 1.0
  4.1 Problem 5: PHP-Style GUILE
  4.2 Problem 6: MySQL
  4.3 Problem 7: Animated GIF Badges
Appendix A Other Examples
  A.1 ustar Archives
5 References
1 Preface
This book aspires to be a useful set of examples of how one might use GNU Guile. One of the interesting things about the Scheme community is that its members are perhaps too clever. The depth and complexity of their thinking about computer languages is intense and wonderful.
And yet, sometimes you just want to do something mundane. Where are the resources for how to use Scheme, and specifically Guile, for quotidian tasks?
2 Acknowledgements
Thanks to the many people who have helped us develop this book.
• Chris K Jester-Young contributed the original version of the echo and cat scripts for Problem 1.
• Jez Ng contributed the original version of ls for Problem 2. He also contributed an example ustar generation script for Problem 4.
• Daniel Hartwig contributed the LZW compression routines for Problem 3.
3 Theme 1: “/bin”
Every project has to start somewhere, so we may as well begin at the beginning. Guile can be used as a scripting language: programs can be written as plain text files and then run from the command line with the Guile interpreter. Most scripts run from Unix-like shells therefore begin with a sha-bang (#!) line, and most scripts must start off with the same chores: parsing the command line, acting on the options, and finding the files whose names appeared in the command-line arguments.
To introduce these mundane concepts, our first theme is /bin, i.e., re-implementing some common Unix tools. This will get us warmed up.
These examples should demonstrate
• How to set up the sha-bang invocation for Guile scripts run from Unix shells
• How to handle command-line arguments
• How to map file names given as command-line arguments to their files
• How to search for files and directories
• How to open files, both as binary data and as encoded text data
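As a concrete starting point for the first two items, here is a minimal skeleton, not taken from any of the contributed scripts, that prints each of its arguments on its own line. The interpreter path /usr/bin/guile is an assumption. The backslash on the first line is Guile's meta-switch: the options on the second line (-e main -s) make Guile load the file and then call main with the whole command line, whose car is the script's own name.

```scheme
#!/usr/bin/guile \
-e main -s
!#
;; A skeleton Guile script.  The interpreter path above is an
;; assumption; adjust it to wherever guile is installed.

;; ARGS is the full command line: (car args) is the script name and
;; (cdr args) holds the user-supplied arguments.
(define (main args)
  (for-each (lambda (arg)
              (display arg)
              (newline))
            (cdr args)))
```

Saved as an executable file, running it with the arguments a and "b c" would print a and b c on separate lines.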
To demonstrate some of these concepts, the following sections present an echo script that prints out its own arguments; cat, which concatenates files or standard input to the standard output; and ls, which lists the files in a directory. There are also compress and uncompress, which perform LZW compression on a file. Lastly, there are scripts to generate tar-conformant archives.
And so, without further ado, here are the examples.
3.1 Problem 1: Echo and Cat
In this problem, two venerable Unix commands are re-implemented in Scheme: echo and cat. echo prints out the command-line arguments, and cat prints a file to the terminal.
In this problem, as in many of the problems, we’ll lay out the requirements for a program and then see how our volunteer implemented them. For the purpose of this exercise, the requirements for echo and cat will be drawn from the POSIX standard, with a couple of minor modifications. Since these commands are implemented in different ways on different systems, a specification is given for the versions implemented here.
3.1.1 Echo
The echo script writes its arguments to the standard output, followed by a <newline>. If there are no arguments, it just prints a <newline>.
echo has no command-line options. Even ‘--help’ and ‘--version’ are not treated as command-line options.
If any of the arguments contain the backslash character (\), the argument is modified: a backslash introduces an escape. Escapes are processed from left to right.
\a     Write an <alert> in place of \a.
\b     Write a <backspace> in place of \b.
\c     Suppress the <newline> that would otherwise be written after the command-line arguments. The \c is not written, any remaining characters in this argument are not written, and any remaining arguments are not written.
\f     Write a <form-feed> in place of \f.
\n     Write a <newline> in place of \n.
\r     Write a <carriage-return> in place of \r.
\t     Write a <tab> in place of \t.
\v     Write a <vertical-tab> in place of \v.
\\     Write a single backslash character in place of the pair of backslash characters.
\0num  Write an 8-bit character corresponding to num, an octal number between octal 0 and octal 377 (decimal 255) inclusive.
A backslash at the end of a command line argument will not be escaped. The backslash will be written. However, the exit value will be 1 in this case.
A backslash followed by any character not listed in the table will not be escaped: the backslash will be written, followed by that character. In this case, too, the exit value will be 1.
For the octal escape \0, it is important to note that the value is not an ISO-8859-x position or a Unicode code point, but rather a raw 8-bit byte sent unencoded to the standard output. It is up to the operator, not echo, to ensure that the resulting byte sequence is valid for the environment’s locale.
If a \0 escape is present but is not followed by a number, the raw byte zero is written. If a \0 escape is followed by an octal number of more than 3 digits, only the first 3 digits are interpreted as part of the escape.
If a \0 escape is present and its octal value is greater than 377, print nothing. In this case, the exit value will be 1.
An octal escape may not have unnecessary initial zeros. For example:
• \01 should output raw byte 1
• \001 should output raw byte zero followed by the string “01”
• \0001 should output raw byte zero followed by the string “001”
The digits 8 and 9 are not part of an octal escape. For example, the string \018 shall be output as the raw byte 1 followed by the character for the numeral 8.
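To make the octal rules concrete, here is a sketch of a hypothetical helper, parse-octal-escape, that looks only at the text following \0 and decides which byte to write and how many characters the escape consumes. It illustrates the rules above; it is not part of the contributed echo script.

```scheme
;; Value of an octal digit character, or #f if CH is not #\0 to #\7.
(define (octal-digit-value ch)
  (and (char>=? ch #\0)
       (char<=? ch #\7)
       (- (char->integer ch) (char->integer #\0))))

;; STR is the text that follows "\0" in an argument.  Returns two
;; values: the byte to write (#f when the value exceeds octal 377) and
;; how many characters of STR belong to the escape.
(define (parse-octal-escape str)
  (let ((len (string-length str)))
    (if (or (zero? len)
            (char=? (string-ref str 0) #\0)
            (not (octal-digit-value (string-ref str 0))))
        ;; Plain \0, or \0 followed by a redundant zero: raw byte zero.
        (values 0 0)
        ;; Read at most three octal digits; 8 and 9 end the escape.
        (let loop ((i 0) (value 0))
          (let ((digit (and (< i len)
                            (< i 3)
                            (octal-digit-value (string-ref str i)))))
            (if digit
                (loop (+ i 1) (+ (* value 8) digit))
                (values (and (<= value 255) value) i)))))))
```

For example, for the argument \0101 the helper sees "101" and returns 65 (the byte for ASCII ‘A’) and 3; for \018 it returns 1 and 1, leaving the ‘8’ to be written as an ordinary character.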
Remember that command-line arguments and file names may contain any character allowed by the current locale.
An implementation of ‘echo’
Chris K Jester-Young wrote the original solution to this problem.

#!/usr/bin/guile \
-e main -s
!#
(use-modules (ice-9 binary-ports))

;; The exit code for the program: #t == exit code 0, #f == exit code 1
(define status #t)

(define (main args)
  (setlocale LC_ALL "")
  ;; Recursively loop over the list of command-line arguments
  (let loop ((args (cdr args))
             (first-arg #t))
    (cond ((null? args)
           (newline)
           (quit status))
          (else
           (unless first-arg
             (write-char #\space))
           (let ((arg (car args)))
             ;; Take the current command-line argument and create a
             ;; port from that argument.  Pass that port as input to
             ;; the procedure 'initial'.
             (call-with-input-string arg initial)
             (loop (cdr args) #f))))))

;; 'initial' and 'echo' jointly form a recursive loop that reads
;; characters one-by-one from the port and writes them to stdout.
;; Backslash may introduce a string escape that needs special
;; processing.
(define (echo ch port)
  (write-char ch)
  (initial port))

(define (initial port)
  (define ch (read-char port))
  (cond ((eqv? ch #\\)
         (backslash port))
        ((not (eof-object? ch))
         (echo ch port))))

(define (backslash port)
  (define ch (read-char port))
  (case ch
    ((#\a) (echo #\alarm port))
    ((#\b) (echo #\backspace port))
    ((#\c) (quit status))
    ((#\f) (echo #\page port))
    ((#\n) (echo #\newline port))
    ((#\r) (echo #\return port))
    ((#\t) (echo #\tab port))
    ((#\v) (echo #\vtab port))
    ((#\\) (echo #\\ port))
    ((#\0) (let ((next (peek-char port)))
             (if (and (assv next octal-digits)
                      (not (char=? next #\0)))
                 (octal port)
                 (echo #\nul port))))
    (else (set! status #f)
          (write-char #\\)
          (unless (eof-object? ch)
            (unread-char ch port)
            (initial port)))))

;; Backslash 0 introduces the octal escape.  Zero to three octal
;; numbers are read and output as a raw (not locale encoded) byte.
(define (octal port)
  (let loop ((value 0) (waiting 3))
    (cond ((zero? waiting)
           (if (< value 256)
               (put-u8 (current-output-port) value)
               (set! status #f))
           (initial port))
          (else
           (let ((ch (read-char port)))
             (cond ((eof-object? ch)
                    (loop value 0))
                   ((assv ch octal-digits)
                    => (lambda (ass)
                         (loop (+ (* value 8) (cdr ass)) (1- waiting))))
                   (else
                    ;; Not an octal digit: push it back, end the escape.
                    (unread-char ch port)
                    (loop value 0))))))))

;; Alist mapping the characters #\0 to #\7 to their values.  (The end
;; of 'octal' and this table are reconstructed from how they are used.)
(define octal-digits
  (map (lambda (i)
         (cons (integer->char (+ i (char->integer #\0))) i))
       (iota 8)))
3.1.2 Cat
Again, since cat is implemented differently on different systems, a specification of what we were trying to accomplish is given here.
cat [OPTION]... [FILE]...
cat concatenates files, or standard input, and prints the result to the standard output.
This version of cat supports three command-line options, each with a short and a long form.
‘-u’, ‘--unbuffered’
Do not buffer. Write bytes from the input to the standard output without delay as each one is read.
‘-h’, ‘--help’
Print out command help.
‘-v’, ‘--version’
Print out the program name and version number.
After the command-line options, a list of file names is expected. The contents of the files are printed to standard output. No character encoding or decoding of the contents of the files should be performed: they should be transmitted unmodified.
If the special file name ‘-’ (hyphen) is given, at that point the contents of the standard input will be transmitted to the standard output.
If one of the files does not exist, or if it cannot be opened, the program will print a descriptive error message to the standard error and will return the exit code 1.
An implementation of cat
Chris K Jester-Young wrote the original solution for cat as well. One interesting thing to note in this example is the use of catch to catch system errors that may arise if files do not exist or cannot be opened.
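The same pattern can be tried in isolation. The sketch below, with the hypothetical name open-or-report, opens a file in binary mode (the "rb" mode of open-file) and, if Guile throws a system-error, prints a cat-style message built from strerror and returns #f instead of letting the error escape.

```scheme
(use-modules (ice-9 format))

;; Try to open FILE for binary reading.  On failure, print a
;; cat-style message to the standard error and return #f.
(define (open-or-report file)
  (catch 'system-error
    (lambda ()
      (open-file file "rb"))
    (lambda args
      ;; ARGS is the whole throw: the key followed by its arguments.
      ;; 'system-error-errno' digs the errno value out of it.
      (format (current-error-port) "cat: ~a: ~a~%"
              file (strerror (system-error-errno args)))
      #f)))
```

The handler receives the throw arguments as a list, which is why both this sketch and the script below pass them straight to system-error-errno.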
#!/usr/bin/guile \
-e main -s
!#
(use-modules (srfi srfi-1)
             (ice-9 binary-ports)
             (ice-9 format)
             (ice-9 getopt-long))

;; The exit code of the script: #t == exit code 0, #f == 1
(define status #t)

(define (main args)
  (define opts (getopt-long args (get-getopt-options)))
  ;; Handle the unbuffered flag
  (when (assq 'unbuffered opts)
    (setvbuf (current-output-port) _IONBF))
  (let ((files (assq-ref opts '())))
    (if (null? files)
        (cat)
        (for-each (lambda (file)
                    ;; If a filename is "-" get text from stdin
                    (if (string=? file "-")
                        (cat)
                        (cat file)))
                  files))
    (catch 'system-error force-output write-error-handler)
    (quit status)))

(define cat
  (case-lambda
    ;; When called with no arguments, get data from stdin
    (()
     (catch 'system-error cat-port (read-error-handler "stdin")))
    ;; When called with one argument, read data from a file
    ((file)
     (catch 'system-error
       (lambda () (call-with-input-file file cat-port))
       (read-error-handler file)))))

;; Copy everything from IN to OUT as raw bytes, with no decoding.
(define* (cat-port #:optional (in (current-input-port))
                   (out (current-output-port)))
  (define bv (get-bytevector-some in))
  (unless (eof-object? bv)
    (catch 'system-error
      (lambda () (put-bytevector out bv))
      write-error-handler)
    (cat-port in out)))

;; An error handler that catches system errors receives a list
;; containing the errno.
(define (read-error-handler label)
  (lambda args
    (perror label (system-error-errno args))
    (set! status #f)))

(define (write-error-handler . args)
  (perror "write error" (system-error-errno args))
  ;; Don't try to flush buffers at exit, since it'd obviously fail.
  (primitive-_exit 1))

(define (perror label errno)
  (format (current-error-port) "cat: ~a: ~a~%" label (strerror errno)))

(define (help _)
  (display "Usage: cat [OPTION]... [FILE]...\n")
  (display "Concatenate FILE(s), or standard input, to standard output.\n")
  (newline)
  (for-each (lambda (option)
              (format #t " -~a, --~16a ~a~%"
                      (cadr (assq 'single-char (cdr option)))
                      (car option)
                      (cadr (assq 'description (cdr option)))))
            getopt-options)
  (quit))

(define (version _)
  (display "cat 0.1, for Guile100\n")
  (quit))

(define (get-getopt-options)
  ;; getopt-long doesn't like extraneous option properties, so filter
  ;; out the descriptions.
  (map (lambda (option)
         (remove (lambda (prop)
                   (and (pair? prop) (eq? (car prop) 'description)))
                 option))
       getopt-options))

;; Here is a list of all the command-line options
(define getopt-options
  `((unbuffered (single-char #\u) (value #f)
                (description "do not buffer standard output"))
    (help (single-char #\h) (value #f) (predicate ,help)
          (description "display this help and exit"))
    (version (single-char #\v) (value #f) (predicate ,version)
             (description "print the program name and version number"))))
3.2 Problem 2: ‘ls’
In this section, we investigate the most famous Unix command of all time: ls. ls lists files or directories, and displays their properties.
However, ls has accumulated dozens of options over the past decades. A feature-complete ls would be too long to make a usable example. So, this script is constrained to the most important command-line options.
The command ls lists information about files, directories, and the contents of directories. For this challenge, the script should behave like a limited-functionality version of POSIX ls.
The Requirements for a Limited ls
This script only recognizes a limited set of command-line options:
• ‘-a’ - display all matching files, including those whose names begin with a period
• ‘-l’ - use the long output format
• ‘-R’ - recursively descend into subdirectories
Any other command-line arguments that begin with a hyphen should cause an “invalid option” error, and the program will be terminated with a non-zero exit code.
The command-line option ‘-R’ will recursively print the contents of any subdirectory encountered.
The command-line option ‘-l’ has two effects. One, information about the files will be printed in the long format. Two, when given a symbolic link to a directory, the command will print information about the symbolic link itself and not the file or directory to which it points.
Operands
If a command-line argument does not begin with a hyphen, it is treated as an operand. When called without operands, the contents of the current directory are printed. Operands must be either the names of files, directories, or symbolic links. When an operand that is not one of the above is encountered, the script should print a descriptive error and exit with a non-zero return code.
If an operand is a file, ls will print the name of the file. If an operand is a symbolic link to a file, the command will print the name of the link. If an operand is a directory, ls will print out the contents of that directory. If an operand is a symbolic link to a directory, ls will print the contents of that directory, unless ‘-l’ is given.
When printing the contents of a directory, files and directories that begin with <period> are usually not printed. If the command-line option ‘-a’ is given, files and directories that begin with <period> are printed.
Output
There are two output formats: the default format and the long format.
Within each directory, the files are sorted in case-insensitive alphabetical order according to the current locale.
In the default format, the filenames are output one per line. You can print them out in a columnar format if you like, though.
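A possible sketch of the hiding and ordering rules above uses string-locale-ci<? from (ice-9 i18n); the procedure name visible-sorted-names is hypothetical. Sorting only follows the user's locale after the script has called (setlocale LC_ALL ""), as the echo script does.

```scheme
(use-modules (ice-9 i18n))   ; string-locale-ci<?

;; Drop names beginning with a period unless ALL? is true (the '-a'
;; option), then sort case-insensitively using the current locale.
(define (visible-sorted-names names all?)
  (sort (filter (lambda (name)
                  (or all? (not (string-prefix? "." name))))
                names)
        string-locale-ci<?))
```

Where hidden files end up relative to the others with ‘-a’ depends on the locale's collation rules, since some locales ignore punctuation when comparing.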
In the long format, the file information is printed as follows:

Field           Length      Description
Type            1           ‘d’ for directory, ‘-’ for regular file, ‘b’ for block special file,
                            ‘l’ for symbolic link, ‘c’ for character special file, ‘p’ for fifo
User Read       1           ‘r’ if readable by the owner, ‘-’ otherwise
User Write      1           ‘w’ if writable by the owner, ‘-’ otherwise
User Execute    1           ‘S’ if the file is not executable and the set-user-ID mode is set
                            ‘s’ if the file is executable and the set-user-ID mode is set
                            ‘x’ if the file is executable or the directory is searchable by the owner
                            ‘-’ otherwise
Group Read      1           ‘r’ if readable by the group, ‘-’ otherwise
Group Write     1           ‘w’ if writable by the group, ‘-’ otherwise
Group Execute   1           ‘S’ if the file is not executable and the set-group-ID mode is set
                            ‘s’ if the file is executable and the set-group-ID mode is set
                            ‘x’ if the file is executable or the directory is searchable by members of this group
                            ‘-’ otherwise
Other Read      1           ‘r’ if readable by others, ‘-’ otherwise
Other Write     1           ‘w’ if writable by others, ‘-’ otherwise
Other Execute   1 + space   ‘T’ if the file is a directory, search permission is not granted to others, and the restricted deletion flag is set
                            ‘t’ if the file is a directory, search permission is granted to others, and the restricted deletion flag is set
                            ‘x’ if the file is executable or the directory is searchable by others
                            ‘-’ otherwise
Link Count                  For a directory, the number of immediate subdirectories it has, plus one for itself, plus one for its parent; for a file, one
Owner Name                  Name of the file's owner
Group Name                  Name of the file's group
File Size                   Size in bytes
Date & Time                 “month day hour:minute” if the file has been modified in the last six months, “month day year” otherwise
Pathname                    For non-links, the path; for links, “<link name> -> <path to linked file or directory>”

The exit code should be zero except in the error cases described above.
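The three execute columns are the fiddly part of the table. Here is a self-contained sketch, with the hypothetical name permission-field, that builds the nine permission characters from the integer returned by stat:perms, using the standard POSIX octal masks (#o400 for owner read through #o4000 for set-user-ID). For simplicity it applies the ‘t’/‘T’ rule to any file with the restricted deletion flag, not only to directories.

```scheme
;; Build the nine-character permission field of the long format from
;; PERMS, the integer returned by 'stat:perms'.
(define (permission-field perms)
  (define (set? mask) (not (zero? (logand perms mask))))
  (define (rwx mask letter) (if (set? mask) letter #\-))
  ;; Execute column: LOWER when both the execute bit and the special
  ;; bit are set, UPPER when only the special bit is set.
  (define (exec-column exec-mask special-mask lower upper)
    (cond ((and (set? exec-mask) (set? special-mask)) lower)
          ((set? special-mask) upper)
          ((set? exec-mask) #\x)
          (else #\-)))
  (string (rwx #o400 #\r) (rwx #o200 #\w)
          (exec-column #o100 #o4000 #\s #\S)    ; owner, set-user-ID
          (rwx #o040 #\r) (rwx #o020 #\w)
          (exec-column #o010 #o2000 #\s #\S)    ; group, set-group-ID
          (rwx #o004 #\r) (rwx #o002 #\w)
          (exec-column #o001 #o1000 #\t #\T)))  ; others, sticky bit
```

For instance, #o755 gives "rwxr-xr-x", #o4644 gives "rwSr--r--", and a world-writable directory with the sticky bit, #o1777, gives "rwxrwxrwt".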
3.2.1 An Implementation of ‘ls’
Jez Ng contributed a script to these specifications. It is an interesting solution.
One thing to note is how he has decided to truly minimize the scope of the procedures by declaring procedures within procedures.
Unsurprisingly, the majority of the script involves getting the format right for long output.
#!/usr/local/bin/guile -s
!#
;; A solution to Guile 100 Problem #2 'ls'
;; Contributed by Jez Ng.

(use-modules (srfi srfi-1)    ; fold, map etc
             (srfi srfi-26)   ; cut (partial application)
             (srfi srfi-37)   ; args-fold
             (ice-9 ftw)
             (ice-9 format)
             (ice-9 i18n))

;; Standard POSIX permission bits, as found in (stat:perms st).
;; (These definitions are missing from the printed listing.)
(define owner-read-bit  #o400)
(define owner-write-bit #o200)
(define owner-exec-bit  #o100)
(define group-read-bit  #o040)
(define group-write-bit #o020)
(define group-exec-bit  #o010)
(define other-read-bit  #o004)
(define other-write-bit #o002)
(define other-exec-bit  #o001)
(define setuid-bit      #o4000)
(define setgid-bit      #o2000)
(define sticky-bit      #o1000)

(define perror (cut format (current-error-port) <...>))

(define (default-printer path st . rest)
  (format #t "~a~%" (basename path)))

(define* (long-printer path st #:optional
                       (max-nlinks 0) (max-size 0)
                       (max-uname-length 0) (max-groupname-length 0))
  (let* ((bits-set?
          (lambda (bits . masks)
            (let ((mask (apply logior masks)))
              (= mask (logand bits mask)))))
         (permission-string
          (lambda (perms)
            (let* ((rwx-letter
                    (lambda (bit letter)
                      (if (bits-set? perms bit) letter #\-)))
                   ;; Lower case ('s', 't') when the execute bit is
                   ;; also set, upper case ('S', 'T') when it is not.
                   (setid-letter
                    (lambda (exec-bit setid-bit letter)
                      (cond ((bits-set? perms exec-bit setid-bit)
                             (char-downcase letter))
                            ((bits-set? perms setid-bit) letter)
                            (else (rwx-letter exec-bit #\x))))))
              (string (rwx-letter owner-read-bit #\r)
                      (rwx-letter owner-write-bit #\w)
                      (setid-letter owner-exec-bit setuid-bit #\S)
                      (rwx-letter group-read-bit #\r)
                      (rwx-letter group-write-bit #\w)
                      (setid-letter group-exec-bit setgid-bit #\S)
                      (rwx-letter other-read-bit #\r)
                      (rwx-letter other-write-bit #\w)
                      (setid-letter other-exec-bit sticky-bit #\T)))))
         (format-time
          (lambda (time)
            (if (and (<= time (current-time))
                     (< (- (current-time) time) (* 3600 24 30 6)))
                (strftime "%b %e %H:%M" (localtime time))
                (strftime "%b %e %_5Y" (localtime time)))))
         (type (case (stat:type st)
                 ((directory) #\d)
                 ((regular) #\-)
                 ((symlink) #\l)
                 ((block-special) #\b)
                 ((char-special) #\c)
                 ((fifo) #\p)
                 (else #\?)))
         ;; Number of decimal digits needed to print N.
         (digits (lambda (n) (string-length (number->string n)))))
    (format #t "~a~a ~vd ~va ~va ~vd ~a ~a\n"
            type
            (permission-string (stat:perms st))
            (digits max-nlinks) (stat:nlink st)
            max-uname-length (passwd:name (getpwuid (stat:uid st)))
            max-groupname-length (group:name (getgrgid (stat:gid st)))
            (digits max-size) (stat:size st)
            (format-time (stat:mtime st))
            (if (char=? type #\l)
                (format #f "~a -> ~a" path (readlink path))
                (basename path)))))

(define (ls-dir dir-name dir-stat recursive? all? print-header? printer)
  (let* ((not-hidden? (lambda (name) (not (string-prefix? "." name))))
         ;; Always enter the top-level directory; enter subdirectories
         ;; only when recursing.
         (enter? (lambda (path st)
                   (or recursive?
                       (= (stat:ino st) (stat:ino dir-stat))))))
    (let recurse ((tree (file-system-tree dir-name enter?))
                  (parent-path `(,(dirname dir-name)))
                  (top-level? #t))
      ;; 'file-system-tree' returns a structure of the form
      ;; (string basename, object stat, tree children)
      (let* ((path (cons (car tree) parent-path))
             (path-string (string-join (reverse path)
                                       file-name-separator-string))
             (children
              (filter
               (lambda (tree) (or all? (not-hidden? (car tree))))
               (sort (let ((current-dir-path (in-vicinity path-string "."))
                           (parent-dir-path (in-vicinity path-string "..")))
                       (cons (list "." (lstat current-dir-path))
                             (cons (list ".." (lstat parent-dir-path))
                                   (cddr tree))))
                     (lambda (a b) (string-locale-ci<? (car a) (car b))))))
             ;; 'max' throws an error if called without arguments;
             ;; 'max-above-0' just returns 0
             (max-above-0 (lambda args (apply max (cons 0 args))))
             (stats (map cadr children))
             (max-nlinks (apply max-above-0 (map stat:nlink stats)))
             (max-size (apply max-above-0 (map stat:size stats)))
             (max-uname-length
              (apply max-above-0
                     (map (compose string-length passwd:name getpwuid stat:uid)
                          stats)))
             (max-groupname-length
              (apply max-above-0
                     (map (compose string-length group:name getgrgid stat:gid)
                          stats))))
        (if (or (not top-level?) print-header?)
            (format #t "~a:~%" path-string))
        (for-each (lambda (child)
                    (printer (in-vicinity path-string (car child))
                             (cadr child)
                             max-nlinks max-size
                             max-uname-length max-groupname-length))
                  children)
        (if recursive?
            (for-each (lambda (child)
                        (if (and (eq? (stat:type (cadr child)) 'directory)
                                 (not (or (equal? (basename (car child)) ".")
                                          (equal? (basename (car child)) ".."))))
                            (recurse child path #f)))
                      children))))))

(let* ((program-name (car (program-arguments)))
       (make-bool-option
        (lambda (opt-name flag)
          (option `(,flag) #f #f
                  (lambda (opt name arg result)
                    (acons opt-name #t result)))))
       ;; 'getopt-long' requires the long option name to be provided,
       ;; but the real 'ls' does not use long names.  srfi-37 does not
       ;; have this restriction, so we use it instead.
       (args (args-fold
              (cdr (program-arguments))
              (map make-bool-option '(all? recursive? long?) '(#\a #\R #\l))
              (lambda (opt name arg result)
                (perror "~a: illegal option -- ~a~%" program-name name)
                (perror "usage: ~a [-alR] [file ...]~%" program-name)
                (exit 1))
              (lambda (operand result)
                (assq-set! result 'paths
                           (cons operand (assq-ref result 'paths))))
              ;; A freshly allocated seed, since 'assq-set!' mutates it
              ;; and quoted literals must not be modified.
              (list (list 'paths))))
       (paths (if (null? (assq-ref args 'paths))
                  '(".")
                  (assq-ref args 'paths)))
       (printer (if (assq-ref args 'long?) long-printer default-printer))
       (ls-dir-cut (cut ls-dir <> <>
                        (assq-ref args 'recursive?)
                        (assq-ref args 'all?)
                        (> (length paths) 1)
                        printer))
       (exit-code 0))
  (for-each
   (lambda (path)
     (catch 'system-error
       (lambda ()
         (let ((st (lstat path)))
           (case (stat:type st)
             ((directory) (ls-dir-cut path st))
             ((symlink)
              ;; With -l, or when the link points at a non-directory,
              ;; describe the link itself.
              (if (or (assq-ref args 'long?)
                      (not (eq? (stat:type (stat path)) 'directory)))
                  (printer path st)
                  (ls-dir-cut
                   (let ((linked-path (readlink path)))
                     (if (absolute-file-name? linked-path)
                         linked-path
                         (in-vicinity (dirname path) linked-path)))
                   (stat path))))
             (else (printer path st)))))
       (lambda args
         (perror "~a: ~a: ~a~%"
                 program-name path (strerror (system-error-errno args)))
         (set! exit-code 1))))
   paths)
  (exit exit-code))
3.3 Problem 3: LZW Compression
Good old LZW compression: a staple of undergraduate computer science classes. Lempel-Ziv-Welch compression is the basis of both the UNIX compress program and GIF encoding.
The only problem with LZW is that it doesn’t actually do a very good job at compression, but it has interesting logic and is familiar enough to make a good example.
This task has two parts.
• Write ‘compress’ and ‘uncompress’ procedures for LZW compression.
• Use them to make ‘compress’ and ‘uncompress’ scripts.
First up are the compression procedures.
lzw-compress and lzw-uncompress
[Guile Procedure]
lzw-compress input-bv #:key table-size dictionary
This procedure should take a bytevector presumed to contain 8-bit unsigned integers, and it should return a bytevector containing 16-bit unsigned integers in little-endian format.
input-bv is the input bytevector.
table-size is an optional parameter that indicates the maximum number of entries in the dictionary. This parameter is limited to the range 258 to 65536. The default value of table-size is 65536.
dictionary is an optional parameter that modifies the output. When true, the procedure shall return both the output 16-bit bytevector and the hash table, created by the compression routine, that maps byte sequences to codes.
Probably the best write-up on LZW compression is the one by Mark Nelson at http://marknelson.us/2011/11/08/lzw-revisited/. Refer to that article for details on LZW compression.
It is possible to fill up the dictionary. In that case, one continues to use the dictionary as it is, without adding new entries.
As I’ve noted, we’re focusing on the problem of encoding 8-bit binary data. Thus the first 256 entries in the dictionary (entries #0 to #255) are initialized to the single bytes 0 to 255. Entry #256 is not used in this example, but it is usually reserved for a special code that empties the dictionary. Entries #257 to #(table-size - 1) contain the multi-byte entries in the dictionary.
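To see how the dictionary grows, here is a small list-based sketch of the compression side under the conventions just described. The name lzw-codes is hypothetical, and unlike a complete implementation it emits no end-of-data code and does no packing into bytes; it only shows which codes come out.

```scheme
;; Encode BYTES (a list of integers 0-255) into a list of LZW codes.
;; Codes 0-255 stand for single bytes, 256 is left unused, and new
;; multi-byte entries are numbered from 257 until TABLE-SIZE is
;; reached; after that the dictionary is used as it is.
(define* (lzw-codes bytes #:key (table-size 65536))
  (let ((dict (make-hash-table)))         ; keys are lists of bytes
    (do ((i 0 (+ i 1)))
        ((= i 256))
      (hash-set! dict (list i) i))
    (let loop ((bytes bytes)              ; input still to read
               (w '())                    ; longest match so far
               (next 257)                 ; code for the next new entry
               (out '()))                 ; codes emitted, in reverse
      (if (null? bytes)
          (reverse (if (null? w) out (cons (hash-ref dict w) out)))
          (let ((wc (append w (list (car bytes)))))
            (if (hash-ref dict wc)
                ;; Still a known sequence: keep extending it.
                (loop (cdr bytes) wc next out)
                ;; Emit the code for W, remember WC if there is room,
                ;; and restart matching from the current byte.
                (let ((room? (< next table-size)))
                  (when room?
                    (hash-set! dict wc next))
                  (loop (cdr bytes)
                        (list (car bytes))
                        (if room? (+ next 1) next)
                        (cons (hash-ref dict w) out)))))))))
```

With the default table size, "ABABABA" (bytes 65 66 65 66 65 66 65) encodes to (65 66 257 259): entry 257 becomes "AB", 258 "BA" and 259 "ABA". With #:table-size 258 only entry 257 fits, so the same input gives (65 66 257 257 65).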
[Guile Procedure]
lzw-uncompress input-bv #:key table-size dictionary
Similarly, this procedure takes input-bv, the bytevector created by lzw-compress, and an optional table size, and returns a bytevector of the uncompressed 8-bit unsigned data. dictionary, when true, causes the procedure to also return its dictionary (hash table).
Daniel Hartwig contributed an implementation of these compression routines.
;; Copyright (C) 2013 Daniel Hartwig
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

(define-module (lzw)
  #:use-module (rnrs bytevectors)
  #:use-module (rnrs io ports)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-26)
  #:use-module (ice-9 receive)
  #:export (lzw-compress
            lzw-uncompress
            %lzw-compress
            %lzw-uncompress))

;; This procedure adapted from an example in the Guile Reference
;; Manual.  It returns START, START+1, ..., END-1, and then #f, so
;; every code fits in a table of END entries.
(define (make-serial-number-generator start end)
  (let ((current-serial-number (- start 1)))
    (lambda ()
      (and (< (+ current-serial-number 1) end)
           (set! current-serial-number (+ current-serial-number 1))
           current-serial-number))))

(define (put-u16 port k)
  ;; Little endian.
  (put-u8 port (logand k #xFF))
  (put-u8 port (logand (ash k -8) #xFF)))

(define (get-u16 port)
  ;; Little endian.  Order of evaluation is important, use 'let*'.
  (let* ((a (get-u8 port))
         (b (get-u8 port)))
    (if (any eof-object? (list a b))
        (eof-object)
        (logior a (ash b 8)))))

(define (%lzw-compress in out done? table-size)
  (let ((codes (make-hash-table table-size))
        (next-code (make-serial-number-generator 0 table-size))
        (universe (iota 256))
        (eof-code #f))
    ;; Populate the initial dictionary with all one-element strings
    ;; from the universe.
    (for-each (lambda (obj)
                (hash-set! codes (list obj) (next-code)))
              universe)
    (set! eof-code (next-code))
    (let loop ((cs '()))
      (let ((c (in)))
        (cond ((done? c)
               (unless (null? cs)
                 (out (hash-ref codes cs)))
               (out eof-code)
               (values codes))
              ((hash-ref codes (cons c cs))
               (loop (cons c cs)))
              (else
               (and=> (next-code)
                      (cut hash-set! codes (cons c cs) <>))
               (out (hash-ref codes cs))
               (loop (cons c '()))))))))

(define (ensure-bv-input-port bv-or-port)
  (cond ((port? bv-or-port) bv-or-port)
        ((bytevector? bv-or-port)
         (open-bytevector-input-port bv-or-port))
        (else
         (scm-error 'wrong-type-arg "ensure-bv-input-port"
                    "Wrong type argument in position ~a: ~s"
                    (list 1 bv-or-port) (list bv-or-port)))))

(define (for-each-right proc lst)
  (let loop ((lst lst))
    (unless (null? lst)
      (loop (cdr lst))
      (proc (car lst)))))

(define (open-bit-output-port bits-per-entry)
  (let ((current 0)
        (location 0))
    (call-with-values
        (lambda ()
          (open-bytevector-output-port))
      (lambda (port get-bytevector)
        (let ((write-to-bv
               (lambda (val)
                 ;; (format #t "Entering write-to-bv: current ~a location ~a val ~a bpe ~a~%" current location val bits-per-entry)
                 (set! current (logior current (ash val location)))
                 (set! location (+ location bits-per-entry))
                 (while (> location 8)
                   ;; (format #t "Writing ~a~%" (logand current #xff))
                   (put-u8 port (logand current #xff))
                   (set! current (ash current -8))
                   (set! location (- location 8)))
                 ;; (format #t "Leaving write-to-bv: current ~a location ~a~%" current location)
                 ))
              (get-bv
               (lambda ()
                 (put-u8 port current)
                 (get-bytevector))))
          (values write-to-bv get-bv))))))

(define (open-bit-input-port bv bits-per-entry)
  (let ((current 0) (location 0) (eof #f))
    (call-with-values
        (lambda () (open-bytevector-input-port bv))
      (lambda (port)
        ;; Return the read procedure, which begins here
        (lambda ()
          ;; (format #t "Entering read-from-bv: current ~x location ~a~%" current location)
          (let loop ((u8 (get-u8 port)))
            ;; (format #t "Read ~a~%" u8)
            (if (eof-object? u8)
                (if (> location 0)
                    (begin
                      (let ((output (bit-extract current 0 bits-per-entry)))
                        (set! current (ash current (- bits-per-entry)))
                        (set! location (- location bits-per-entry))
                        ;; (format #t "EOF Leaving read-from-bv: current ~x location ~a output ~x~%" current location output)
                        output))
                    (begin
                      ;; (format #t "EOF Leaving read-from-bv: <eof>~%")
                      (eof-object)))
                ;; else
                (begin
                  ;; Append the new byte above the bits already held.
                  (set! current (logior current (ash u8 location)))
                  (set! location (+ location 8))
                  (if (< location bits-per-entry)
                      (begin
                        ;; (format #t "Looping in read-from-bv: current ~x location ~a~%" current location)
                        (loop (get-u8 port)))
                      ;; else
                      (let ((output (bit-extract current 0 bits-per-entry)))
                        (set! current (ash current (- bits-per-entry)))
                        (set! location (- location bits-per-entry))
                        ;; (format #t "Leaving read-from-bv: current ~x location ~a output ~x~%" current location output)
                        output))))))))))

#!
(lambda ()
  (format #t "Entering read-from-bv: current ~x location ~a~%" current location)
  (if eof
      (eof-object)
      ;;else
      (begin
        (while (< location bits-per-entry)
          (format #t "Looping in read-from-bv: current ~x location ~a~%" current location)
          (let ((u8 (get-u8 port)))
            (format #t "Read ~a~%" u8)
            (if (eof-object? u8)
                (begin
                  (set! eof #t)
                  (break))
                ;; else
                (begin
                  (set! current (logior current (ash u8 location)))
                  (set! location (+ location 8))))))
        (format #t "After loop in read-from-bv: current ~x location ~a~%" current location)
        (let ((output (bit-extract current 0 bits-per-entry)))
          (set! current (ash current (- bits-per-entry)))
          (set! location (- location bits-per-entry))
          (format #t "Leaving read-from-bv: current ~x location ~a output ~x~%" current location output)
          output))))
!#

(define (%lzw-uncompress in out done? table-size)
  (let ((strings (make-hash-table table-size))
        (next-code (make-serial-number-generator 0 table-size))
        (universe (iota 256))
        (eof-code #f))
    ;; Populate the initial dictionary: code -> one-element string.
    (for-each (lambda (obj)
                (hash-set! strings (next-code) (list obj)))
              universe)
    (set! eof-code (next-code))
    (let loop ((previous-string '()))
      (let ((code (in)))
        (unless (or (done? code)
                    (= code eof-code))
          (unless (hash-ref strings code)
            (hash-set! strings
                       code
                       (cons (last previous-string) previous-string)))
          (for-each-right out
                          (hash-ref strings code))
          (let ((cs (hash-ref strings code)))
            (and=> (and (not (null? previous-string)) (next-code))
                   (cut hash-set! strings <> (cons (last cs)
                                                   previous-string)))
            (loop cs)))))
    ;; Return the dictionary for callers that ask for it.
    strings))

(define (lzw-compress-inner bv table-size dictionary)
  (call-with-values
      (lambda ()
        (open-bytevector-output-port))
    (lambda (output-port get-result)
      (let ((dict (%lzw-compress (cute get-u8 (ensure-bv-input-port bv))
                                 (cute put-u16 output-port <>)
                                 eof-object?
                                 table-size)))
        (if dictionary
            (values (get-result) dict)
            (get-result))))))

(define* (lzw-compress bv #:key (table-size 65536) dictionary)
  (let ((bv (lzw-compress-inner bv table-size dictionary)))
    (receive (write-to-bv get-bv)
        (open-bit-output-port (integer-length (1- table-size)))
      ;; (write (bytevector->uint-list bv (endianness little) 2)) (newline)
      (for-each write-to-bv (bytevector->uint-list bv (endianness little) 2))
      (get-bv))))

(define* (lzw-uncompress-inner bv table-size dictionary)
  ;; (format #t "lzw-uncompress: table-size ~a~%" table-size)
  (call-with-values
      (lambda ()
        (open-bytevector-output-port))
    (lambda (output-port get-result)
      (let ((dict (%lzw-uncompress (cute get-u16 (ensure-bv-input-port bv))
                                   (cute put-u8 output-port <>)
                                   eof-object?
                                   table-size)))
        (if dictionary
            (values (get-result) dict)
            (get-result))))))

(define* (lzw-uncompress bv #:key (table-size 65536) dictionary)
  (let* ((get-val (open-bit-input-port bv (integer-length (1- table-size))))
         (u16lst (let loop ((x (get-val))
                            (lst '()))
                   (if (eof-object? x)
                       lst
                       (loop (get-val) (append lst (list x)))))))
    ;; Repack the codes as little-endian 16-bit values and decode them.
    (lzw-uncompress-inner (uint-list->bytevector u16lst (endianness little) 2)
                          table-size
                          dictionary)))
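The decoding direction can also be sketched with plain lists. The hypothetical procedure lzw-bytes below turns a list of codes back into bytes under the conventions from the lzw-compress description (single bytes 0 to 255, entry 256 unused, new entries from 257 up to table-size). The one subtle case is a code the decoder has not defined yet: it can only be the entry being created at that moment, which is the previous string plus its own first byte.

```scheme
;; Decode a list of LZW CODES back into a list of bytes.  Codes 0-255
;; are single bytes, 256 is unused, and new entries are numbered from
;; 257 until TABLE-SIZE is reached, mirroring the compressor.
(define* (lzw-bytes codes #:key (table-size 65536))
  (let ((dict (make-hash-table)))          ; code -> list of bytes
    (do ((i 0 (+ i 1)))
        ((= i 256))
      (hash-set! dict i (list i)))
    (if (null? codes)
        '()
        (let ((first-entry (hash-ref dict (car codes))))
          (let loop ((codes (cdr codes))
                     (prev first-entry)     ; bytes of the previous code
                     (next 257)             ; code for the next new entry
                     (out (reverse first-entry)))
            (if (null? codes)
                (reverse out)
                (let* ((code (car codes))
                       ;; An undefined code must be the entry being
                       ;; defined right now: PREV plus its first byte.
                       (entry (or (hash-ref dict code)
                                  (append prev (list (car prev)))))
                       (room? (< next table-size)))
                  (when room?
                    (hash-set! dict next (append prev (list (car entry)))))
                  (loop (cdr codes)
                        entry
                        (if room? (+ next 1) next)
                        (append (reverse entry) out)))))))))
```

For example, (lzw-bytes '(65 66 257 259)) yields the bytes of "ABABABA"; code 259 arrives before the decoder has defined it, which exercises the special case.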
The ‘compress’ and ‘uncompress’ scripts
Once the procedures are working, it is a simple task to write scripts that use them. So we’ll write scripts that are simplified versions of the Unix commands ‘compress’ and ‘uncompress’. These scripts will manipulate files with the following format.
Each file will begin with a 3-byte header:
• Byte 1: #x1F
• Byte 2: #x9D
• Byte 3: Dictionary size, given as an 8-bit unsigned number between 9 and 16 inclusive. A value of n indicates a dictionary of 2^n entries, so the dictionary size ranges from 2^9 to 2^16.
The rest of the file is the LZW-compressed 16-bit binary data stored in little-endian format.
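To make the layout concrete, here is a minimal sketch that assembles such a file from a list of codes that have already been computed. The names encode-lzw-file and dictionary-size are hypothetical helpers (they are not part of the (lzw) module used by the scripts below), and the sketch performs no compression; it only shows where the header bytes and the little-endian 16-bit codes go.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: lay out a file in the format described above,
;; i.e. the 3-byte header (#x1F #x9D bits) followed by each code as a
;; 16-bit little-endian integer.  CODES is a list of already-computed
;; LZW codes; no compression happens here.
(define (encode-lzw-file bits codes)
  (unless (and (integer? bits) (exact? bits) (<= 9 bits 16))
    (error "bits must be between 9 and 16:" bits))
  (let ((bv (make-bytevector (+ 3 (* 2 (length codes))) 0)))
    ;; Header: two magic bytes, then the dictionary size in bits.
    (bytevector-u8-set! bv 0 #x1F)
    (bytevector-u8-set! bv 1 #x9D)
    (bytevector-u8-set! bv 2 bits)
    ;; Body: two bytes per code, least significant byte first.
    (let loop ((codes codes) (offset 3))
      (unless (null? codes)
        (bytevector-u16-set! bv offset (car codes) (endianness little))
        (loop (cdr codes) (+ offset 2))))
    bv))

;; Number of dictionary entries implied by the header's bits field.
(define (dictionary-size bits)
  (expt 2 bits))
```

For example, (bytevector->u8-list (encode-lzw-file 9 '(65 256))) gives (31 157 9 65 0 0 1): the code 256 is stored as the two bytes 0 and 1 because the low byte comes first.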
Note that this will not be compatible with your operating system’s version of compress. The compress file format is not consistent across platforms. Every current implementation of compress adds more functionality to squeeze more compression out of the vanilla LZW algorithm.
compress [-v] [-b bits] [name ...]
For each filename, compress will create an LZW-compressed version of the input file. The compressed file will have the same filename as the input file with the ".Z" extension appended to it. If the compression is successful and the output file is successfully written, the input file will be deleted.
If no filenames are given, compress will take the contents of stdin and send the compressed data to stdout.
The optional ‘-b’ bits parameter indicates the maximum size of the dictionary. If bits is given, it must be between 9 and 16, indicating a maximum dictionary size of 2^bits entries. If the optional ‘-v’ parameter is given, the script should print to stdout the compression ratio for each file processed. If no file was specified and this program is thus compressing stdin to stdout, this flag is ignored.
Compress should fail with appropriate error messages if any of the following problems occur:
• The command line has unknown options or is otherwise incorrect
• The command-line argument after a ‘-b’ is out of range, non-numeric, or missing
• The file associated with an input filename does not exist or is unreadable
• An input filename has a ".Z" suffix
• Writing the output file would overwrite a file that already exists
• Writing to disk fails for any reason
• Erasing the input file on completion fails for any reason
If an error occurs, the script should return the error code 1. Otherwise it returns the error code 0.
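The per-file checks and the exit-status rule can be sketched as follows. Both compress-output-name and exit-status-of are hypothetical helpers written for illustration; the actual compress script further below reports errors through its own error* procedure instead.

```scheme
;; Hypothetical helper: apply the per-file checks from the list above
;; and return the name of the file to write, or signal an error.
(define (compress-output-name infile)
  (let ((outfile (string-append infile ".Z")))
    (cond ((string-suffix? ".Z" infile)
           (error "input filename already has a .Z suffix:" infile))
          ((not (file-exists? infile))
           (error "no such file:" infile))
          ((not (access? infile R_OK))
           (error "file is unreadable:" infile))
          ((file-exists? outfile)
           (error "output file already exists:" outfile))
          (else outfile))))

;; Run THUNK and turn the outcome into the required exit status:
;; 0 on success, or 1 after reporting the error on stderr.
(define (exit-status-of thunk)
  (catch #t
    (lambda ()
      (thunk)
      0)
    (lambda (key . args)
      (let ((port (current-error-port)))
        (display "compress: " port)
        (display key port)
        (display " " port)
        (write args port)
        (newline port))
      1)))
```

A script built this way could end with (exit (exit-status-of (lambda () ...))), so that every failure path yields status 1 without each check having to call exit itself.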
uncompress [-v] [name ...]
For each filename, uncompress will create an uncompressed version of the input file. The uncompressed file will have the same filename as the input file with the ".Z" extension removed. If the uncompression is successful and the output file is successfully written, the input file will be deleted.
Also, like compress, if no filenames are given, uncompress takes the contents of stdin and uncompresses them to stdout.
If the optional ‘-v’ parameter is given, the script should print to stdout the compression ratio for each file processed. If no file was specified and thus this program is uncompressing stdin to stdout, this flag is ignored.
Uncompress should fail with appropriate error messages if any of the following problems occur:
• The command line has unknown options or is otherwise incorrect
• The file header is incorrect
• The bits parameter in the file header is out of range
• The file associated with the input filename does not exist or is unreadable
• The input compressed data is incorrect or corrupt, which can be detected by receiving an index that is not yet in the dictionary, by an index value that exceeds the number of entries in the dictionary as specified in the header, or by the last entry in the file not being a complete 16-bit integer
• The input file does not end in ".Z"
• The output file would overwrite a file that already exists
• Writing to disk fails for any reason
• Erasing the input file on completion fails for any reason
If an error occurs, the script should return the error code 1. Otherwise it returns the error code 0.
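Two of the corruption checks are purely structural and can be done before any decoding, assuming the data after the header is stored as 16-bit little-endian codes as the format above describes. The sketch below (payload->codes is a hypothetical name) performs only those two checks; detecting an index that is not yet in the dictionary depends on the decoder's state, so it is left to the decoding loop.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: turn the data after the 3-byte header into a
;; list of codes, checking two of the corruption conditions above:
;; a trailing incomplete 16-bit code, and a code outside a dictionary
;; of 2^BITS entries.
(define (payload->codes payload bits)
  (let ((len (bytevector-length payload))
        (limit (expt 2 bits)))
    (unless (even? len)
      (error "last entry is not a complete 16-bit integer"))
    ;; Walk backwards so that consing builds the list in file order.
    (let loop ((offset (- len 2)) (codes '()))
      (if (< offset 0)
          codes
          (let ((code (bytevector-u16-ref payload offset (endianness little))))
            (when (>= code limit)
              (error "code exceeds the dictionary size:" code))
            (loop (- offset 2) (cons code codes)))))))
```

With bits set to 9, the bytes (65 0 0 1) decode to the codes (65 256), while the two bytes (0 2) encode 512 and are rejected because a 9-bit dictionary has only 512 entries, numbered 0 to 511.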
compress and uncompress
Here’s compress:
#!/usr/bin/guile \
-L . -e main -s
!#
;; Copyright (C) 2013 Daniel Hartwig <[email protected]>
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

(use-modules (lzw) (ice-9 control) (ice-9 format) (ice-9 i18n)
             (rnrs bytevectors) (rnrs io ports) (srfi srfi-37))

(define *program-name* #f)
;; This form of 'gettext' is helpful for longer messages. A single
;; message id can be split and aligned across many lines, similar to
;; the common usage in C.
(define (_ msg . rest)
  (gettext (string-concatenate (cons msg rest)) "guile100-compress"))

(define (error* status msg . args)
  (force-output)
  (let ((port (current-error-port)))
    (when *program-name*
      (display *program-name* port)
      (display ": " port))
    (apply format port msg args)
    (newline port)
    (unless (zero? status)
      ;; specified status value. Similar to 'exit' but more
      ;; controlled, for example, when using the REPL to debug,
      ;; 'abort' will not cause the entire process to terminate.
      ;;
      ;; This is also handy to attempt processing every file, even
      ;; after an error has occurred. To do this, establish another
      ;; prompt at an interesting place inside 'main'.
      (abort (lambda (k) status)))))
(define (make-file-error-handler filename)
  (lambda args
    (error* 1 (_ "~a: ~a") filename
            (strerror (system-error-errno args)))))

(define (system-error-handler key subr msg args rest)
  (apply error* 1 msg args))

(define (compression-ratio nbytes-in nbytes-out)
  (exact->inexact (/ (- nbytes-in nbytes-out) nbytes-in)))

(define (write-lzw-header port bits)
  (put-bytevector port (u8-list->bytevector (list #x1F #x9D bits))))

(define (compress-port in out bits verbose?)
  #;
  (begin
    (write-lzw-header out bits)
    (%lzw-compress (cute get-u8 in)
                   (cute put-u16 out <>)
                   eof-object?
                   (expt 2 bits)))
  (let* ((in-bv (get-bytevector-all in))
         (out-bv (lzw-compress in-bv #:table-size (expt 2 bits))))
    (write-lzw-header out bits)
    (put-bytevector out out-bv)))
(define (compress-file infile bits verbose?)
  (catch 'system-error
    (lambda ()
      (let ((outfile (string-append infile ".Z")))
        (when (string-suffix? ".Z" infile)
          (error* 1 (_ "~a: already has .Z suffix") infile))
        (when (file-exists? outfile)
          (error* 1 (_ "~a: already exists") outfile))
        (let ((in (open-file infile "rb"))
              (out (open-file outfile "wb")))
          ;; TODO: Keep the original file's ownership, modes, and access
          ;; and modification times.
          (compress-port in out bits verbose?)
          (when verbose?
            (format #;(current-error-port) (current-output-port)
                    (_ "~a: compression: ~1,2h%\n") ; '~h' is localized '~f'.
                    infile
                    (* 100 (compression-ratio (port-position in)
                                              (port-position out)))))
          (for-each close-port (list in out))
          (delete-file infile))))
    system-error-handler))
(define (ensure-bits obj)
  (let ((n (or (and (integer? obj) obj)
               (and (string? obj)
                    (locale-string->integer obj))
               (error* 1 (_ "bits must be an integer -- ~a") obj))))
    (unless (<= 9 n 16)
      (error* 1 (_ "bits must be between 9 and 16 -- ~a") n))
    n))

(define (make-boolean-processor key)
  (lambda (opt name arg config . rest)
    (apply values (assq-set! config key #t) rest)))

(define (make-option-processor key parse)
  (lambda (opt name arg config . rest)
    (apply values (assq-set! config key (parse arg)) rest)))
(define (usage status)
  (format (current-error-port)
          (_ "Usage: ~a [-v] [-b bits] [FILE]...\n"
             "  -v, --verbose      show compression ratio\n"
             "  -b, --bits bits    maximum number of BITS per code [16]\n")
          *program-name*)
  (abort (lambda (k) status)))

(define options
  (list (option '(#\h "help") #f #f
                (lambda args
                  (usage 0)))
        (option '(#\v "verbose") #f #f
                (make-boolean-processor 'verbose?))
        (option '(#\b "bits") #t #f
                (make-option-processor 'bits ensure-bits))))

(define (main args)
  ;; Establishing this prompt ensures that any call to 'abort' will at
  ;; most escape to the continuation of '%' here. In effect, calling
  ;; 'abort' causes 'main' to stop what it was doing and continue with
  ;; the procedure passed to 'abort' instead.
  (% (call-with-values
         (lambda ()
           (args-fold (cdr args)
                      options
                      (lambda (opt name arg . rest)
                        (error* 0 (_ "invalid option -- '~a'") name)
                        (usage 1))
                      (lambda (arg config infiles)
                        (values config
                                (cons arg infiles)))
                      ;; First seed: config (with default values).
                      '((bits . 16)
                        (verbose? . #f))
                      ;; Second seed: infiles (initially empty list).
                      '()))
       (lambda (config infiles)
         (let ((bits (assq-ref config 'bits))
               (verbose? (assq-ref config 'verbose?)))
           (for-each (lambda (infile)
                       (cond ((string=? infile "-")
                              (compress-port (current-input-port)
                                             (current-output-port)
                                             bits verbose?))
                             (else (compress-file infile bits verbose?))))
                     (if (null? infiles)
                         ;; No arguments, use stdin.
                         '("-")
                         ;; Process the files in the order given on
                         ;; the command line.
                         (reverse infiles)))
           ;; Exit indicating success.
           0)))))

(when (batch-mode?)
  (setlocale LC_ALL "")
Here’s uncompress:
#!/usr/bin/guile \
-L . -e main -s
!#
;; Copyright (C) 2013 Daniel Hartwig <[email protected]>
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

(use-modules (lzw) (ice-9 control) (ice-9 format) (ice-9 i18n) (ice-9 match)
             (rnrs bytevectors) (rnrs io ports) (srfi srfi-37))

(define *program-name* #f)

(define (_ msg . rest)
  (gettext (string-concatenate (cons msg rest)) "guile100-compress"))

(define (error* status msg . args)
  (force-output)
  (let ((port (current-error-port)))
    (when *program-name*
      (display *program-name* port)
      (display ": " port))
    (apply format port msg args)
    (newline port)
    (unless (zero? status)
      (abort (lambda (k)
               status)))))
(define (make-file-error-handler filename)
  (lambda args
    (error* 1 (_ "~a: ~a") filename
            (strerror (system-error-errno args)))))

(define (system-error-handler key subr msg args rest)
  (apply error* 1 msg args))

(define (compression-ratio nbytes-in nbytes-out)
  (exact->inexact (/ (- nbytes-in nbytes-out) nbytes-in)))

(define (read-lzw-header port)
  (match (bytevector->u8-list (get-bytevector-n port 3))
    ((#x1F #x9D bits)
     (and (<= 9 bits 16) (values bits)))
    (x #f)))
(define (uncompress-port in out verbose?)
  (let ((bits (read-lzw-header in)))
    (unless bits
      (error* 1 (_ "incorrect header")))
    #;
    (%lzw-uncompress (cute get-u16 in)
                     (cute put-u8 out <>)
                     eof-object?
                     (expt 2 bits))
    (let* ((in-bv (get-bytevector-all in))
           (out-bv (lzw-uncompress in-bv #:table-size (expt 2 bits))))
      (put-bytevector out out-bv))))
(define (uncompress-file infile verbose?)
  (catch 'system-error
    (lambda ()
      (let ((outfile (string-drop-right infile 2)))
        (when (not (string-suffix? ".Z" infile))
          (error* 1 (_ "~a: does not have .Z suffix") infile))
        (when (file-exists? outfile)
          (error* 1 (_ "~a: already exists") outfile))
        (let ((in (open-file infile "rb"))
              (out (open-file outfile "wb")))
          (uncompress-port in out verbose?)
          (when verbose?
            (format #;(current-error-port) (current-output-port)
                    (_ "~a: compression: ~1,2h%\n")
                    infile
                    (* 100 (compression-ratio (port-position out)
                                              (port-position in)))))
          (for-each close-port (list in out))
          (delete-file infile))))
    system-error-handler))
(define (usage status)
  (format (current-error-port)
          (_ "Usage: ~a [-v] [FILE]...\n"
             "  -v, --verbose      show compression ratio\n")
          *program-name*)
  (abort (lambda (k) status)))

(define (make-boolean-processor key)
  (lambda (opt name arg config . rest)
    (apply values (assq-set! config key #t) rest)))
(define (main args)
  (% (call-with-values
         (lambda ()
           (args-fold (cdr args)
                      (list (option '(#\h "help") #f #f
                                    (lambda args
                                      (usage 0)))
                            (option '(#\v "verbose") #f #f
                                    (make-boolean-processor 'verbose?)))
                      (lambda (opt name arg . rest)
                        (error* 0 (_ "invalid option -- '~a'") name)
                        (usage 1))
                      (lambda (arg config infiles)
                        (values config
                                (cons arg infiles)))
                      ;; First seed: config (with default values).
                      '((verbose? . #f))
                      ;; Second seed: infiles (initially empty list).
                      '()))
       (lambda (config infiles)
         (let ((verbose? (assq-ref config 'verbose?)))
           (for-each (lambda (infile)
                       (cond ((string=? infile "-")
                              (uncompress-port (current-input-port)
                                               (current-output-port)
                                               verbose?))
                             (else
                              (uncompress-file infile
                                               verbose?))))
                     (if (null? infiles)
                         ;; No arguments, use stdin.
                         '("-")
                         ;; Process the files in the order given on
                         ;; the command line.
                         (reverse infiles)))
           ;; Exit indicating success.
           0)))))

(when (batch-mode?)
  (setlocale LC_ALL "")
3.4 Problem 4: tar file archives
This challenge is to create a script that takes a list of filenames and generates a ustar-format archive file. This archive file format is compatible with common POSIX tools. The ustar interchange format is one of the simpler formats used for archive files that contain multiple files along with their metadata.
To begin, we are going to create a script that creates ustar-format files. But, to keep things simple, we are only going to use a small subset of the functionality that ustar files can provide. The result should be readable by common tar and pax tools.
3.4.1 ustar Script
The ustar script will have a simple calling structure:
ustar archive file1 ... fileN
It will create a new archive containing the files indicated on the command line. The script will have to handle many error conditions, including but not limited to:
• filename contains characters not in the rustar-string character set
• file part of filename is longer than 100 characters
• path part of filename is longer than 155 characters
• file is a symbolic link, FIFO, directory, or any other type of non-regular file
• file’s uname or gname contains characters not in the rustar-string character set
• file’s uname or gname is longer than 31 characters
• file length is greater than 8,589,934,591 bytes (octal 77777777777)
• file’s UID or GID is greater than 2,097,151 (octal 7777777)
• system errors about inability to open, write, or close files.
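The two length limits come from how a path is split between the prefix field and the name field. Here is a minimal sketch of that split, assuming a relative path with '/' as the separator; split-rustar-path and join-rustar-path are hypothetical helpers, not part of the contributed script.

```scheme
;; Hypothetical helper: split PATH at its last '/' into the rustar
;; prefix and name fields, enforcing the length limits listed above.
;; Returns two values: prefix and name.
(define (split-rustar-path path)
  (let* ((pos (string-rindex path #\/))
         (prefix (if pos (substring path 0 pos) ""))
         (name (if pos (substring path (+ pos 1)) path)))
    (cond ((string-null? name)
           (error "path has no file part:" path))
          ((> (string-length name) 100)
           (error "file part is longer than 100 characters:" name))
          ((> (string-length prefix) 155)
           (error "path part is longer than 155 characters:" prefix))
          (else (values prefix name)))))

;; The inverse: a non-empty prefix and the name are joined with
;; exactly one '/'.
(define (join-rustar-path prefix name)
  (if (string-null? prefix)
      name
      (string-append prefix "/" name)))
```

For "docs/notes/todo.txt" the prefix is "docs/notes" and the name is "todo.txt"; the slash between them is not stored in either field.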
3.4.2 The rustar File Format
First, I will describe our restricted ustar file format, which I’m going to dub rustar (for restricted ustar), just so that we’re clear that I’m talking about something more specific than the ustar format.
File Structure
A rustar file contains a set of logical records. Each logical record represents the contents of a file plus its metadata. The logical records appear sequentially in the file, one after another, and there is no global header in the file. At the end of the file is a footer.
Logical Records
Each logical record consists of two parts: a header segment and the contents of the file, a.k.a. the data segment. Of these, only the header requires a detailed explanation.
Header
Header Types
Here we describe the three types that can appear in a header. Each type has the annotation [N], which indicates that the field has a fixed size of N bytes.
1. rustar-string[N] is a fixed-width string that contains only the codepoints listed below. It is stored in the ASCII encoding and, if necessary, is right-padded with NULL bytes to ensure it occupies the whole of its N bytes. NULL bytes can only appear at the end of the string. The string need not end with NULL bytes if it fills the whole of its fixed width.
The list of allowed codepoints is
• U+20 to U+22
• U+25 to U+3F
• U+41 to U+5A
• U+5F
• U+61 to U+7A
• and U+00, but U+00 can only be followed by more U+00.
2. rustar-0string[N] (note the ‘0’) is a fixed-width string with the same format and restrictions as a rustar-string[N], but with an additional restriction: it must end with at least one NULL byte.
3. rustar-number[N] is an unsigned integer stored as a fixed-width string. The string contains the text representation of the integer in octal format. The last byte (and only the last byte) of the string must be NULL. The string is left-padded with the ‘0’ character to ensure the number occupies the whole of its fixed-width buffer.
For example, a rustar-number[8] field for the integer 10 will be the string “0000012” followed by one byte of NULL. 12 octal equals 10 decimal.
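The three field types can be encoded with a few lines of Scheme. This is a minimal sketch with hypothetical names (rustar-chars, rustar-string, rustar-number); the contributed script below defines its own make-rustar-string, make-rustar-0string, and make-rustar-number for the same purpose. Here a single rustar-string procedure covers both string types through its zero-terminated? flag.

```scheme
(use-modules (rnrs bytevectors))

;; Characters allowed in rustar strings (the codepoint ranges listed
;; above; the upper bound of ucs-range->char-set is exclusive).
(define rustar-chars
  (char-set-union (ucs-range->char-set #x20 #x23)   ; U+20..U+22
                  (ucs-range->char-set #x25 #x40)   ; U+25..U+3F
                  (ucs-range->char-set #x41 #x5B)   ; U+41..U+5A
                  (char-set #\_)                    ; U+5F
                  (ucs-range->char-set #x61 #x7B))) ; U+61..U+7A

;; rustar-string[N]: ASCII text right-padded with NULL bytes.
;; With ZERO-TERMINATED? true it behaves as rustar-0string[N].
(define (rustar-string str width zero-terminated?)
  (unless (string-every rustar-chars str)
    (error "invalid character in:" str))
  (when (> (string-length str) (if zero-terminated? (- width 1) width))
    (error "string too long for field:" str))
  (let ((bv (make-bytevector width 0)))
    (bytevector-copy! (string->utf8 str) 0 bv 0 (string-length str))
    bv))

;; rustar-number[N]: octal digits left-padded with #\0 to N-1
;; characters, then exactly one NULL byte.
(define (rustar-number n width)
  (let* ((digits (number->string n 8))
         (pad (- width 1 (string-length digits))))
    (when (< pad 0)
      (error "number too large for field:" n))
    (string->utf8 (string-append (make-string pad #\0) digits (string #\nul)))))
```

(rustar-number 10 8) produces the bytes of "0000012" followed by one NULL, matching the example above.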
Header Fields
The 17 fields in the 512 byte header block of a logical record are
Field Format Description
Name string[100] The filename by itself, with no directory information. The path separator character (U+2F) is not allowed.
Mode number[8] A bitfield of the permissions. See below.
UID number[8] The User ID of the file
GID number[8] The Group ID of the file
Size number[12] The length of the file in bytes
mtime number[12] The 32-bit integer modification time of the file.
Checksum number[8] 256 + the sum of all the bytes in this header except the checksum field.
Typeflag string[1] Always “0”.
Link name string[100] Always 100 bytes of NULL.
Magic 0string[6] The string “ustar” followed by a NULL byte.
Version string[2] The string “00”.
uname 0string[32] The uname of the file.
gname 0string[32] The gname of the file
Dev-Major number[8] Always zero.
Dev-Minor number[8] Always zero.
Prefix string[155] Path information for this file. If this file has no additional path information, this is all NULL. Directory separation is represented by ‘/’ forward slash. The slash at the end is assumed and should not be included explicitly (see footnote 1).
Padding 0string[12] 12 bytes of NULL.
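The “256 +” in the checksum description comes from treating the checksum field itself as eight space characters (8 × 32 = 256). A minimal sketch of the computation over a complete 512-byte header follows; header-checksum is a hypothetical name. The checksum field starts at byte 148, because the name, mode, UID, GID, size, and mtime fields before it occupy 100 + 8 + 8 + 8 + 12 + 12 = 148 bytes.

```scheme
(use-modules (rnrs bytevectors))

;; Sketch: checksum of a 512-byte header block.  Bytes 148-155 hold
;; the checksum field itself and are counted as if they were spaces.
(define (header-checksum header)
  (let loop ((i 0) (sum 0))
    (if (= i 512)
        sum
        (loop (+ i 1)
              (+ sum (if (<= 148 i 155)
                         32
                         (bytevector-u8-ref header i)))))))
```

The result is then stored in the checksum field as a rustar-number[8], so an all-NULL header would carry the checksum 256, written as "0000400" plus a NULL.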
The mode bitfield is a standard permissions bitfield:
• 0x001 execute permission for ‘other’
• 0x002 write permission for ‘other’
• 0x004 read permission for ‘other’
• 0x008 execute permission for ‘group’
• 0x010 write permission for ‘group’
• 0x020 read permission for ‘group’
• 0x040 execute permission for ‘owner’
• 0x080 write permission for ‘owner’
• 0x100 read permission for ‘owner’
• 0x200 (unused)
• 0x400 setgid
• 0x800 setuid
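To see how the bits line up, here is a small sketch (mode->string, setuid?, and setgid? are hypothetical names) that renders the nine permission bits in the familiar ls -l order. Bit 8 is 0x100, the owner-read bit, and bit 0 is 0x001, the other-execute bit.

```scheme
(use-modules (srfi srfi-1))   ; iota

;; Sketch: render the nine permission bits from the list above as an
;; "ls -l"-style string.  Position 0 of the result corresponds to bit
;; #x100 (owner read) and position 8 to bit #x001 (other execute).
(define (mode->string mode)
  (let ((letters "rwxrwxrwx"))
    (list->string
     (map (lambda (i)
            (if (logbit? (- 8 i) mode) (string-ref letters i) #\-))
          (iota 9)))))

(define (setuid? mode) (logtest mode #x800))
(define (setgid? mode) (logtest mode #x400))
```

A typical file mode of #o644 renders as "rw-r--r--" and would be stored in the Mode field as the rustar-number[8] "0000644" followed by a NULL.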
Data
After the 512-byte header block, the binary contents of the file are stored. The data segment is NULL-padded so that it ends on a 512-byte block boundary.
Footer
The footer is 1024 bytes of NULL that appears at the end of the file.
1 For example: prefix “foo” + name “bar” forms “foo/bar”. Prefix “foo/” + name “bar” forms “foo//bar”.
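The size rules for the data segment and the footer determine exactly how large an archive will be. A minimal sketch, with hypothetical procedure names:

```scheme
;; Sketch of the size arithmetic described above: each logical record is
;; a 512-byte header plus the data NULL-padded to a 512-byte boundary,
;; and the archive ends with a 1024-byte footer.
(define block-size 512)

;; Round N up to the next multiple of the block size.
(define (padded-size n)
  (* block-size (quotient (+ n (- block-size 1)) block-size)))

(define (record-size file-size)
  (+ block-size (padded-size file-size)))

(define (archive-size file-sizes)
  (+ (apply + (map record-size file-sizes))
     (* 2 block-size)))
```

For files of 0, 1, 512, and 513 bytes, (archive-size '(0 1 512 513)) is 5120: an empty file needs only its header, a 1-byte file and a 512-byte file each need one data block, and a 513-byte file needs two.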
The Archive Script
Jez Ng contributed a script that meets the above requirements quite nicely. One thing to note here is the use of the cut and cute forms from SRFI 26. These let you, in effect, pass a subset of the required parameters to a procedure. In a later call, you can add the remaining parameters to the procedure and then truly call it.
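The difference between cut and cute is easy to miss, so here is a short sketch of both. The names add-10, tagged, counter, next!, with-cut, and with-cute are made up for illustration.

```scheme
(use-modules (srfi srfi-26))

;; 'cut' builds a procedure from a call with "slots": each <> is filled
;; by one argument later, and a final <...> takes any remaining ones.
(define add-10 (cut + 10 <>))          ; (add-10 5) => 15
(define tagged (cut list 'tag <...>))  ; (tagged 1 2) => (tag 1 2)

;; 'cute' is the same, except that the non-slot expressions are evaluated
;; once, when the procedure is created, instead of on every call.
(define counter 0)
(define (next!)
  (set! counter (+ counter 1))
  counter)

(define with-cut (cut list (next!) <>))    ; (next!) runs at each call
(define with-cute (cute list (next!) <>))  ; (next!) ran once, just now

;; Calling them in this order:
;; (with-cute 'a) => (1 a)
;; (with-cute 'b) => (1 b)
;; (with-cut 'a)  => (2 a)
;; (with-cut 'b)  => (3 b)
```

This is why write-bytevector in the script below is defined with cut. The expression (current-output-port) is evaluated on every call, so the bytes go wherever with-output-to-file has redirected output. With cute, the port would be fixed once, when write-bytevector is defined.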
#! /usr/bin/env guile \
-e main -s
!#
(use-modules (rnrs bytevectors)
             (rnrs io ports)
             (srfi srfi-1)   ; map, reduce
             (srfi srfi-26)  ; cut, cute
             (ice-9 format))
(define write-bytevector (cut put-bytevector (current-output-port) <...>))
(define block-size 512)

(define (cat)
  (define bv (make-bytevector block-size 0))
  (let ((read-count (get-bytevector-n! (current-input-port) bv 0 block-size)))
    (unless (eof-object? read-count)
      (write-bytevector bv)
      (unless (< read-count block-size)
        (cat)))))

(define rustar-char-set
  (char-set-union (ucs-range->char-set #x20 #x23)
                  (ucs-range->char-set #x25 #x40)
                  (ucs-range->char-set #x41 #x5B)
                  (char-set #\x5F)
                  (ucs-range->char-set #x61 #x7B)))

(define (valid-rustar-char? c)
  (char-set-contains? rustar-char-set c))

(define (make-fixed-string length string)
  (let ((bv (make-bytevector length 0)))
    (string-for-each-index
     (lambda (i)
       (let ((c (string-ref string i)))
         (unless (valid-rustar-char? c)
           (throw 'ustar-error "encountered invalid character"))
         (bytevector-u8-set! bv i (char->integer c))))
     string)
    bv))
(define (make-rustar-string length string)
  (if (<= (string-length string) length)
      (make-fixed-string length string)
      (throw 'ustar-error "'~a' is too long for tar header" string)))

(define (make-rustar-0string length string)
  (if (< (string-length string) length)
      (make-fixed-string length string)
      (throw 'ustar-error "'~a' is too long for tar header" string)))

(define (make-rustar-number length number)
  (let* ((num (number->string number 8))
         (padding (- length (string-length num) 1)))
    (if (>= padding 0)
        (make-fixed-string length (string-append (make-string padding #\0) num))
        (throw 'ustar-error "~a is too large for tar header" num))))
;; Unlike dirname, this doesn't return "." for files in the cwd.
(define (raw-dirname path)
  (let ((last-separator-pos (string-rindex path
                                           (string-ref file-name-separator-string 0))))
    (if last-separator-pos
        (string-take path last-separator-pos)
        "")))
(define (write-file-header filename)
  (define st (lstat filename))
  (unless (eq? (stat:type st) 'regular)
    (throw 'ustar-error "Only regular files are supported"))
  (let* ((uid (stat:uid st))
         (gid (stat:gid st))
         ;; We only really need an a-list for the purposes of modifying
         ;; checksum in-place. The other keys are not used. However, they
         ;; do serve as documentation.
         (header
          `((filename . ,(make-rustar-string 100 (basename filename)))
            (mode . ,(make-rustar-number 8 (stat:perms st)))
            (uid . ,(make-rustar-number 8 uid))
            (gid . ,(make-rustar-number 8 gid))
            (size . ,(make-rustar-number 12 (stat:size st)))
            (mtime . ,(make-rustar-number 12 (stat:mtime st)))
            (checksum . ,(make-bytevector 8 (char->integer #\space)))
            (typeflag . ,(make-rustar-string 1 "0"))
            (linkname . ,(make-rustar-string 100 ""))
            (magic . ,(make-rustar-0string 6 "ustar"))
            (version . ,(make-rustar-string 2 "00"))
            (uname . ,(make-rustar-0string 32 (passwd:name (getpwuid uid))))
            (gname . ,(make-rustar-0string 32 (group:name (getgrgid gid))))
            (dev-major . ,(make-rustar-number 8 0))
            (dev-minor . ,(make-rustar-number 8 0))
            (path . ,(make-rustar-string 155 (raw-dirname filename)))
            (padding . ,(make-rustar-0string 12 ""))))
         (sum (cut reduce + 0 <>))
         (checksum (sum (map (compose sum bytevector->u8-list cdr) header))))
    (set! header (assq-set! header 'checksum (make-rustar-number 8 checksum)))
    (for-each (compose write-bytevector cdr) header)))
(define (tar archive filenames)
  (with-output-to-file archive
    (lambda ()
      (for-each (lambda (filename)
                  (write-file-header filename)
                  (with-input-from-file filename cat #:binary #t))
                filenames)
      (write-bytevector (make-bytevector (* block-size 2) 0)))
    #:binary #t))
(define (main args)
  (define perror (cut format (current-error-port) <...>))
  (define (system-error-handler . args)
    (perror "error: ~a~%" (strerror (system-error-errno args)))
    (exit 1))
  (define (ustar-error-handler . args)
    (perror "error: ")
    (apply perror (cdr args))
    (perror "~%")
    (exit 1))
  (catch 'ustar-error
    (lambda ()
      (catch 'system-error
        (cute tar (cadr args) (cddr args))
        system-error-handler))
    ustar-error-handler))
4 Theme 2: Web 1.0
The second theme in this project is “Web 1.0”, where we’ll talk about interacting with the Internet as it existed in the 1990s.
The 1990s began with the emergence of Gopher clients and servers. The Internet Gopher protocol visualized the world as a series of folders. The folders usually contained plain-text documents or media files like GIFs or AU audio. This was before both HTML and PDF, so mixing text and graphics in a single file wasn’t as common, and, if it did occur, it was in formats such as PostScript.
The HTTP-and-HTML-based internet is linked to the appearance of the NCSA Mosaic browser and the NCSA httpd server. There were precursors, but, as a practical matter, 1993 was the beginning of the HTTP/HTML web.
But, in those days, before AJAX or Flash, most of the content was static HTML content or dynamic content created by CGI scripts. In this context, before the concept of cookies was developed in 1994, personalization of content for different users was not practical.
JavaScript appeared in Netscape Navigator 2.0 in 1995 and in Internet Explorer 3.0 in late 1996, but with incompatibilities between the two implementations. Before 1996, almost all content was static and generated on the server side. This early Web had a more strongly defined separation between client and server.
The early Web pages had stylistic quirks that are less common today. Before CSS2, Web page layouts were often created by using tables. Blinking text, animated GIFs, and embedded MIDI tunes were common.
By the end of the decade, Linux, Apache, MySQL, and PHP were all quite functional. Those programs, in conjunction with Perl, which first appeared in the 1980s, became the building blocks of the famous LAMP stack. This free, open software stack allowed for some of the common types of interactivity to which we have become accustomed.
PHP used a model that allowed for rapid generation of Web pages, where code was embedded within otherwise static HTML web pages. When those pages were requested, the embedded PHP code was run, and its output became HTML content.
So, in our second theme we’ll imagine what the world would have been like if GUILE were part of the ecosystem that made up the 1990s Internet experience. Specifically, we’ll take a look at using Guile for
• on-the-fly evaluation of code embedded within HTML documents
• the Internet Gopher protocol
• CGI scripting
• the Linux Apache GUILE MySQL stack
• and the animated GIF format.
4.1 Problem 5: PHP-Style GUILE
This challenge is to write a CGI script that
1. receives a filename as a parameter,
2. passes a file by that filename through a preprocessor called eguile,
3. and returns the output to the CGI client.
But why eguile? That script helps us mix HTML and Guile.
One of the programming paradigms of Web 1.0 was the PHP programming model, where code was embedded within HTML. The code was run when a client requested the file from the server, and any output printed by the execution of the code became embedded in the HTML when it was sent to the client. The code enclosed between the <?php and ?> tags is evaluated when the file is requested. Anything printed to stdout appears in the HTML document.
<!DOCTYPE html>
<html>
<body>
<?php
echo "My first PHP script!";
?>
</body>
</html>
When it first arrived on the scene, PHP was a CGI executable.
The side effect of today’s challenge is to re-create the PHP programming model in Guile, making something like the following possible.
<!DOCTYPE html>
<html>
<body>
<p>
<?scm
(display "A Guile Script!")
?>
</p>
<p>
<?scm:d "A string" ?>
</p>
</body>
</html>
Mixing HTML and Scheme
eguile does this by recognizing two new tags.
• ‘<?scm’ and ‘?>’ enclose Scheme code, which eguile will pass to Guile for evaluation.
• ‘<?scm:d’ and ‘?>’ also enclose Scheme code to be evaluated, just like the ‘<?scm’ tags. Additionally, eguile will display the value of the last expression using the display procedure.
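To make the two tags concrete, here is a hypothetical mini-processor in the spirit of eguile. It is not eguile’s own code and it handles no edge cases beyond an unterminated tag. It relies on eval-string from Guile’s (ice-9 eval-string) module, which evaluates each fragment in the current module.

```scheme
(use-modules (ice-9 eval-string))

;; A hypothetical mini-processor in the spirit of eguile (not eguile's
;; own code).  Text outside the tags is copied to the current output
;; port; code between "<?scm" and "?>" is evaluated; for "<?scm:d" the
;; value of the last expression is also shown with 'display'.
(define (process-template text)
  (let loop ((start 0))
    (let ((open (string-contains text "<?scm" start)))
      (if (not open)
          ;; No more tags: copy the rest of the text.
          (display (substring text start))
          (let* ((display? (string-prefix? "<?scm:d" text 0 7 open))
                 (code-start (+ open (if display? 7 5)))
                 (close (string-contains text "?>" code-start)))
            (unless close
              (error "unterminated <?scm tag at index" open))
            (display (substring text start open))
            (let ((value (eval-string (substring text code-start close))))
              (when display?
                (display value)))
            (loop (+ close 2)))))))
```

Capturing the output with with-output-to-string shows the effect: the template "<p><?scm (display \"hi\") ?></p><p><?scm:d (+ 1 2) ?></p>" becomes "<p>hi</p><p>3</p>".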
Making a CGI Script
eguile by itself is not a complete solution. It can run mixed HTML and Guile code through the Guile interpreter, but it doesn’t have any hooks to connect it to the webserver.
To make this happen, we can add some framework code to have eguile run as part of a CGI script.
The quickest way to make a CGI script is to use the functions provided by the Guile-WWW project. Guile-WWW has routines that provide CGI functionality.
Thus, we’ll be creating a Scheme script that uses Guile-WWW for CGI processing and that includes an updated version of eguile.
URL Parsing
We’ll call this script ‘ghp.cgi’. ghp is short for Guile HTML Processor. For any basic webserver, you can put the ‘ghp.cgi’ in the ‘cgi-bin’ directory, and run it by pointing your browser to something like http://localhost/cgi-bin/ghp.cgi.
But wait! We have to tell ‘ghp.cgi’ what HTML-and-Scheme file it needs to process and output. One way is to have ‘ghp.cgi’ parse any extra path information at the end of its URL.
The script can parse extra path information given after the script name, like so: http://localhost/cgi-bin/ghp.cgi/FILENAME
Any normal webserver should put the extra path information for the CGI script in the PATH_INFO environment variable.
The ‘ghp.cgi’ script should load an HTML-and-Guile file named FILENAME from some sensible default path, process it through Eguile, and then serve it back to the client.
Like any sane CGI script that processes a URL, ‘ghp.cgi’ should strip out any ‘/../’ in the path, or maybe just fail if there are ‘/../’ in the path.
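The “just fail” option is simple to implement. In the sketch below, path-info->filename is a hypothetical helper, and the document root passed to it (for example "/var/www/ghp") stands in for whatever default path you choose. It returns #f for an empty path or for any path containing a ".." component.

```scheme
(use-modules (srfi srfi-1))   ; filter

;; Hypothetical helper: map PATH_INFO (e.g. "/hello.html") to a file
;; under DOC-ROOT.  Returns #f when there is no filename or when any
;; path component is "..", so the caller can refuse the request.
(define (path-info->filename path-info doc-root)
  (let ((parts (filter (lambda (s) (not (string-null? s)))
                       (string-split (or path-info "") #\/))))
    (cond ((null? parts) #f)
          ((member ".." parts) #f)
          (else (string-append doc-root "/" (string-join parts "/"))))))
```

Inside the CGI script it would be called as (path-info->filename (getenv "PATH_INFO") "/var/www/ghp"). Because getenv returns #f when PATH_INFO is unset, that case is handled as well.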
The Task at Hand
The task is to write a CGI script that
1. inspects its PATH_INFO to see if an extra filename appears at the end of the URL used to call the script,
2. passes a file by that filename through the Eguile processing procedure,
3. and sends it back to the client.
If a file by that filename doesn’t exist, the script should return an HTTP 404 “Not Found” error.
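A CGI script reports a non-200 status through a “Status:” header line, and a blank line ends the headers. The sketch below shows both responses. Here respond is a hypothetical procedure, and render stands in for whatever procedure runs the file through Eguile and writes the page body to the current output port.

```scheme
;; Sketch of the response side of the task.  RENDER is a hypothetical
;; procedure that writes the processed page to the current output port.
(define (respond filename render)
  (if (and filename (file-exists? filename))
      (begin
        ;; Success: headers, a blank line, then the page itself.
        (display "Content-Type: text/html\n\n")
        (render filename))
      (begin
        ;; Missing file: the Status header carries the 404.
        (display "Status: 404 Not Found\n")
        (display "Content-Type: text/plain\n\n")
        (display "Not Found\n"))))
```

Passing #f as the filename, as a PATH_INFO check would do for a refused path, also produces the 404 response.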
I’m asking you to create. But, once the CGI script is in place, we can serve up mixed HTML and Scheme content just like PHP3 did way back in 1997.
You can find Guile-WWW at http://nongnu.org/guile-www. For the moment, you can find Eguile at
https://github.com/spk121/guile100/blob/master/code/eguile.scm
The original source of Eguile is at
http://woozle.org/~neale/src/eguile/. Remember that it is abandonware, so don’t bug the owner with questions. We’re going to find a new home and maintainer for Eguile in the near future.
Eguile itself was based on other predecessors, like Shiro Kawai’s ESCM. You can find ESCM at
http://practical-scheme.net/vault/escm.html.
4.2 Problem 6: MySQL
This challenge is to write one static HTML form page and one CGI script that will add data to a MySQL database table.
1. Create a static HTML page that has a form with a name text field and a male/female/other gender radio button set. The form, when posted, will call a Guile CGI script as its action, posting the name and gender fields.
2. Create a CGI script that receives the form’s name and gender post data and adds it to a MySQL / MariaDB database. The script will then display the entire contents of the database as a table in HTML.
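Apart from the database calls, the CGI script has two small jobs: decoding the posted form and printing rows as an HTML table. The sketch below covers only those two jobs. The helper names are hypothetical, database access through Guile-DBI is not shown, and uri-decode comes from Guile’s (web uri) module.

```scheme
(use-modules (web uri)        ; uri-decode
             (srfi srfi-1))   ; filter

;; In form data '+' stands for a space; uri-decode handles %XX escapes.
(define (form-decode s)
  (uri-decode (string-map (lambda (c) (if (char=? c #\+) #\space c)) s)))

;; "name=Ada+Lovelace&gender=female" => (("name" . "Ada Lovelace") ...)
(define (parse-form-data body)
  (map (lambda (pair)
         (let ((i (string-index pair #\=)))
           (if i
               (cons (form-decode (substring pair 0 i))
                     (form-decode (substring pair (+ i 1))))
               (cons (form-decode pair) ""))))
       (filter (lambda (s) (not (string-null? s)))
               (string-split body #\&))))

;; Escape the characters that are special in HTML text.
(define (html-escape s)
  (string-concatenate
   (map (lambda (c)
          (case c
            ((#\<) "&lt;")
            ((#\>) "&gt;")
            ((#\&) "&amp;")
            ((#\") "&quot;")
            (else (string c))))
        (string->list s))))

;; ROWS is a list of (name . gender) pairs, e.g. as read from MySQL.
(define (rows->html-table rows)
  (string-append
   "<table>\n<tr><th>Name</th><th>Gender</th></tr>\n"
   (string-concatenate
    (map (lambda (row)
           (string-append "<tr><td>" (html-escape (car row))
                          "</td><td>" (html-escape (cdr row))
                          "</td></tr>\n"))
         rows))
   "</table>\n"))
```

Escaping matters here because the table shows whatever visitors typed into the form. A name such as "<b>" is displayed literally instead of being interpreted as markup.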
You may find Guile-WWW useful when creating CGI scripts. You can find Guile-WWW at http://nongnu.org/guile-www.
Guile-DBI is probably the best way to access MySQL databases in Guile. You can find it at http://home.gna.org/guile-dbi/.
4.3 Problem 7: Animated GIF Badges
GIF badges were a very important part of the Web 1.0 experience. These 88 by 31 pixel images typically had bright colors on a grey background, with a border to give them a raised button effect. They had text announcing one’s loyalty to a brand of web browser, computer, or political philosophy, or were used as download buttons. They were usually animated.
To create our Web 1.0 experience, we need animated GIF badges. So this week’s task is to write a procedure that will create a GIF. The procedure will have to come in two versions: one for animated GIF and one for static GIF.
For the static GIF case, you should assume that your input data is the following:
• a filename for the output
• a palette of 256 24-bit RGB colors, perhaps stored as a vector of unsigned integers
• a two-dimensional array of unsigned 8-bit indices to the palette colors
For the animated GIF case, you should assume that your input data is
• a filename for the output
• a three-dimensional array of unsigned 8-bit indices
• and a variable containing the desired milliseconds per frame.
The actual specification for GIF, GIF89a, can be found at http://www.w3.org/Graphics/GIF/spec-gif89a.txt. This specification, however, contains a lot of fields and features that won’t be needed for this specific case. On the other end of the spectrum is the current Wikipedia page for GIF, https://en.wikipedia.org/wiki/Gif, which, at the time of this writeup, contains a very condensed and cryptic description of the file format and the fields contained therein. By merging information from the official specification and the condensed one, it should be possible to write a legible function that creates GIFs for the two cases described above.
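As a starting point, here is a sketch of the fixed-layout opening of a GIF89a file: the 6-byte header, the 7-byte Logical Screen Descriptor, and a 256-entry global color table. The name gif-preamble is hypothetical, and the palette is assumed to be a vector of 256 integers of the form #xRRGGBB. The image descriptors, the LZW-compressed image data, and the trailer are not shown. Note that GIF’s LZW variant uses variable-width codes with special clear and end-of-information codes, so the file format from Problem 3 does not carry over directly.

```scheme
(use-modules (rnrs bytevectors))

;; Sketch: the first 781 bytes of a GIF89a file with a 256-entry global
;; color table.  PALETTE is assumed to be a vector of 256 integers of
;; the form #xRRGGBB.  Image descriptors, LZW-compressed image data,
;; and the trailer byte (#x3B) would follow; they are not shown.
(define (gif-preamble width height palette)
  (let ((bv (make-bytevector (+ 6 7 (* 3 256)) 0)))
    ;; Header: signature and version.
    (bytevector-copy! (string->utf8 "GIF89a") 0 bv 0 6)
    ;; Logical Screen Descriptor: width and height as little-endian u16.
    (bytevector-u16-set! bv 6 width (endianness little))
    (bytevector-u16-set! bv 8 height (endianness little))
    ;; Packed byte #xF7: global color table present (bit 7), color
    ;; resolution 8 bits (bits 4-6 = 7), not sorted (bit 3), and table
    ;; size 2^(7+1) = 256 entries (bits 0-2 = 7).
    (bytevector-u8-set! bv 10 #xF7)
    (bytevector-u8-set! bv 11 0)        ; background color index
    (bytevector-u8-set! bv 12 0)        ; pixel aspect ratio: not given
    ;; Global color table: red, green, blue bytes for each entry.
    (do ((i 0 (+ i 1)))
        ((= i 256) bv)
      (let ((rgb (vector-ref palette i))
            (offset (+ 13 (* 3 i))))
        (bytevector-u8-set! bv offset (logand (ash rgb -16) #xFF))
        (bytevector-u8-set! bv (+ offset 1) (logand (ash rgb -8) #xFF))
        (bytevector-u8-set! bv (+ offset 2) (logand rgb #xFF))))))

;; The Graphic Control Extension stores a frame's delay in hundredths
;; of a second, so the milliseconds-per-frame input must be converted.
(define (ms->gif-delay ms)
  (inexact->exact (round (/ ms 10))))
```

For an 88 by 31 badge, the descriptor starts with the bytes 88 0 31 0, since both dimensions are stored low byte first. A frame rate of 100 milliseconds per frame becomes a delay value of 10.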
One of the trickier parts of the implementation is the LZW compression required. Fortunately, an implementation of LZW compression is handy; see Section 3.3 [Problem 3], page 18: LZW Compression.
These days, the giflib project is as close as we have to a canonical library for GIF reading and writing. It can be referenced to help understand the places in the specification that are obscure. It is at http://sourceforge.net/projects/giflib.
Appendix A Other Examples
Here are some other examples for you.
A.1 ustar Archives
Back in Section 3.4 [Problem 4], page 36, I defined a limited, reduced-functionality version of the ustar archive format. The limited version had just enough functionality to create a valid TAR file. After I received Jez’s solution, Mark Weaver sent an alternate script that handles almost all of the capabilities of the ustar file format, including links and longer path names. That script is below.
#!/usr/bin/guile \
-e main -s
!#
;;; Copyright (C) 2013 Mark H Weaver <[email protected]>
;;;
;;; This program is free software: you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation, either version 3 of the License, or
;;; (at your option) any later version.
;;;
;;; This program is distributed in the hope that it will be useful,
;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with this program. If not, see <http://www.gnu.org/licenses/>.

(use-modules (srfi srfi-1)
             (ice-9 match)
             (ice-9 receive)
             (rnrs bytevectors)
             (rnrs io ports))

;; 'file-name-separator-string' and 'file-name-separator?' are
;; included in Guile 2.0.9 and later.
(define file-name-separator-string "/")
(define (file-name-separator? c) (char=? c #\/))

(define (fmt-error fmt . args)
  (error (apply format #f fmt args)))

;; Like 'string-pad-right', but for bytevectors. However, unlike
;; 'string-pad-right', truncation is not allowed here.