Compaction of Sial - The BCPL Cintsys and Cintpos User Guide

The call cvfpg("KPG") reads the Sial statement knowing it is of the form: KPG Pk Gn). This outputs the statement as an assembly language comment after placing k and n in pval and gval, respectively. The three writef calls then output the three assembly language instructions for the KPG operation, and ENDCASE transfers control to where the next Sial statement is processed. All the other cases are equally simple.

The section name of the program, which must be present, compiles into a C callable function that initialises the BCPL global vector with the entry points defined within this module. To complete the 386 implementation, there is a short handwritten assem-bly language library natbcpl/sysasm/mlib.s that defines the BCPL callable func-tions sys, changeco and muldiv. The program must be linked the compiled ver-sions of the BCPL library modules BLIB and DLIB, and also clib whose source is in natbcpl/sysc/clib.c and a program typically called initprob.c that defines the function initsections to invoke all the global initialisation functions. The file initprog.cis normally created by a call such as:

makeinit prog.b to initprog.c The resulting initprog.c is typically:

// Initialisation file written by makeinit version 2.0

#include "bcpl.h"

WORD stackupb=50000;

WORD gvecupb=1000;

// BCPL sections

extern BLIB(WORD *); // file (run-time library)

extern DLIB(WORD *); // file (system dependent library) extern prog(WORD *); // file prog.b

void initsections(WORD *g) {

BLIB(g); // file (run-time library)

DLIB(g); // file (system dependent library) prog(g); // file prog.b

return;

}

10.3 Compaction of Sial

In order to transmit program to a device such as a mobile phone or space probe over a slow connection it is useful to have a compact representation of the code. Sial is both target machine independent and can be compacted with ease. This section gives a brief overview of an experimental compaction technique that seems to performs well.

Since the types of operands and their number depend only on the Sial operator, an Sial stream can be split into several streams of which the main one is the stream

of Sial operators. Others are streams holding global variable numbers, local variable offsets, program label numbers, data labels, integer constants, character codes and a some others. These streams can be separately compressed taking advantage of the special properties of each. Some ideas are given below.

Local variable offsets have a very skew distribution and so are susceptible to Huff-man (or possibly arithmetic) coding after some preprocessing to deal with large values and the implementation of a mechanism to take advantage of the observation that, if an offset is used once, the same offset is likely to be used again in the near future. This might suggest the use of move-to-front buffering.

Program labels have the property that, in any section, they are each only set once using a LAB or ENTRY statement. If they are systematically renumbered so that successive label setting statements take successive label numbers, there is no need for these statements to take a label argument. The remaining labels in the stream are typically nearly monotonic the compaction algorithm can take advantage of this.

The operation code stream often contains repeated patterns that are susceptible to the conventional techniques used to compress text, and the same applies to the stream of characters. It might be worth separating out the integers representing the character string lengths from other integers and place them either in a stream of their own or insert them into the stream of characters.

Some preliminary experiments on Sial compression can be found in the directory bcplprogs/sial in the standard BCPL distribution.

Chapter 11 The MC Package

This chapter describes the MC package which provides a machine independent way to generate and executing native machine code at runtime. The work on this package started in January 2008 and is still under development, however, it cur-rently works well enough to run the n-queens problem on i386 machines about 24 times faster than the normal Cintcode interpretive version. MC package de-velopment is performed in the directory BCPL/bcplprogs/mc/ and fairly stable versions are copied to BCPL/cintcode/g/mc.h, BCPL/cintcode/com/mci386.b and BCPL/cintcode/cin/mci386which can be used from any working directory. Currently the MC package does not have any floating point operations. This will be rectified in due course.

The package is based on a simple machine independent abstract machine code called MC which is easily translated into machine instructions for most architectures.

Although native code is generated by MC calls such as mcRDX(mc add, mc b, 20, mc d), MC has a corresponding assembly language to assist debugging. The assembly form of the instruction generated by the previous call is ADD B,20(D) meaning set register B to the sum of B and the contents of the memory location whose address is 20 plus the value of register D. MC instructions are fairly low level and typically translate into single native code instructions for most architectures. This example translates into the i386 GNU statement: addl 20(%edx),%ebx.

The first operand is the destination for any instruction that updates a regis-ter or memory location. Thus assignments are always from right to left as in most programming languages but unlike many assembly codes where, for instance, movl 20(%edx),%ebxupdates the second operand.

The MC machine has six registers A, B, C, D, E and F that are directly available to the programmer, and also a program counter, stack pointer, stack frame pointer and a condition code register, although these cannot be accessed explicitly.

11.1 MC Example

The following program is a simple demonstration of the i386 version of the MC package.

GET "libhdr"

181

GET "mc.h"

MANIFEST {

A=mc_a; B=mc_b; C=mc_c; D=mc_d; E=mc_e; F=mc_f a1=1; a2; a3

}

LET start() = VALOF

{ // Load the dynamic code generation package for i386 machines.

LET mcseg, mcb, n = globin(loadseg("mci386")), 0, 0 UNLESS mcseg DO

{ writef("Trouble with MC package: mci386*n") GOTO fin

}

// Create an MC instance for 10 functions with a data space // of 100 words and code space of 4000 words.

mcb := mcInit(10, 100, 4000) UNLESS mcb DO

{ writef("Unable to create an mci386 instance*n") GOTO fin

}

mc := 0 // Currently no selected MC instance.

mcSelect(mcb) // Select the new MC instance.

mcK(mc_debug, #b0011) // Trace comments and MC instructions.

mcKKK(mc_entry, 1, 3, 5) // Entry point for function 1

// having 3 arguments and 5 local variables mcK(mc_debug, #b1111) // Trace comments, MC instructions, target

// instructions and the compiled code.

mcRA(mc_mv, A, a1) // A := <arg 1>

mcJS(mc_jlt, n) // IF A<100 JMP Ln

mcK(mc_debug, #b0011) // Trace only comments and MC instructions.

mcF(mc_rtn) // Return from function 1 with result in A.

mcF(mc_endfn) // End of function 1 code.

mcF(mc_end) // End of dynamic code generation.

writef("*nF1(10, 20, 30) => %n*n", mcCall(1, 10, 20, 30)) fin:

IF mcseg DO unloadseg(mcseg) RESULTIS 0

}

When this program runs it outputs the following.

// ENTRY 1 3 5

11.1. MC EXAMPLE 183

The result of 117 (= 10+20+(30-1)*3) shows that the body of the loop was correctly executed three times.

The header file (mc.h) defines manifests (such as mc mv and mc add) and globals (such as mcK and mcRA) provided by the package. The package itself must be dynami-cally loaded (by globin(loadseg("mci386"))) and then selected (by mcSelect(mcb)).

MC instructions are compiled by calls such as mcRA(op,... or mcRK(op,... where op specifies the instruction or directive and the letters following mc (eg RA or RK) specify the sort of operands supplied.

A register operand is denoted by R and an integer operand by K. There are 9 possible kinds of memory operands denoted by A, V, G, M, L, D, DX, DXs and DXsB. A denotes an specified argument of the current function, V denotes a specified local variable of the current function, G denotes a specified BCPL global variable, M denotes a location in Cintcode memory specified by a BCPL pointer, L denotes the position within the data or code areas of the compiled code corresponding to a given label, D denotes a specified absolute machine address, DX denotes a location whose machine address is the sum of a given byte offset and register, DXs is similar to DX only the index register is scaled by a given factor of 1, 2, 4 or 8 and finally DXsB is like DXs but has a second specified register added into the effective address.

The following table summarises the MC code generation functions. The first ar-gument is always specifies the directive or instruction and the remaining arar-guments specify the operands. The destination of any instruction that updates a register or memory location is always the first operand.

Function Operands

mcF No operand

mcK One integer operand mcR One MC register operand

mcA One operand specifying an argument number mcV One operand specifying an local variable number mcG One operand specifying a global variable number

mcM One operand giving the word address of a location in Cintcode mem-ory

mcL One numeric label operand, defaulting to 32-bit relative mcD One operand giving an absolute machine address

mcDX One memory operand specified by an offset added to an index register mcDXs One memory operand specified by an offset added to an index register

scaled by s which must be 1, 2, 4 or 8

mcDXsB One memory operand specified by an offset added to a base register and an index register scaled by s which must be 1, 2, 4 or 8

mcJS Jump instructions with near relative destinations

mcJL Jump instructions with possibly distant relative destinations mcJR Jump instructions with destination given by resister

mcRA Two operands, R and A mcRV Two operands, R and V mcRG Two operands, R and G mcRM Two operands, R and M mcRL Two operands, R and L mcRD Two operands, R and D mcRDX Two operands, R and DX mcRDXs Two operands, R and DXs mcRDXsB Two operands, R and DXsB mcRR Two operands, R and R mcAR Two operands, A and R mcVR Two operands, V and R mcGR Two operands, G and R mcMR Two operands, M and R mcLR Two operands, L and R mcDR Two operands, D and R mcDXR Two operands, DX and R mcDXsR Two operands, DXs and R mcDXsBR Two operands, DXsB and R

In document The BCPL Cintsys and Cintpos User Guide (Page 187-193)