.de pg
.sp
..
.na
.tr `
.ce
REGENERATING SYSTEM SOFTWARE
.sp
This document discusses how to
assemble or compile various parts of the
UNIX system software.
This may be necessary because
a command or library is accidentally
deleted or otherwise
destroyed;
also, it may be desirable to install a modified
version of some command or library routine.
It should be noted that in the system as distributed,
there are quite a few commands that depend
to some degree on the current configuration
of the system;
thus, in any new system, modifications to some commands
are advisable.
Most of the likely modifications
relate to the standard disk devices contained
in the system.
For example, the df__ ("disk free")
command has built into it the names of
the standardly present disk storage drives
(e.g. "/dev/rf0", "/dev/rp0").
Df__ takes an argument to indicate which
disk to examine, but it is convenient
if its default argument is adjusted to
reflect the ordinarily present devices.
.pg
The companion document "Setting up UNIX"
discusses which commands are likely to require
changes.
.pg
The greater part of the
source files for commands resides
in several subdirectories
of the directory /usr/source.
These subdirectories, and a general
description of their contents, are
.sp
.in 8
.ti -4
s1``Source files for most commands with names beginning
with "a" through "l".
.sp
.ti -4
s2``Source files for most commands with names beginning
with "m" through "z".
.sp
.ti -4
s3``Source files for subroutines contained
in the standard system library, "/lib/liba.a"
(see below).
.sp
.ti -4
s4``Source files for the C library, "/lib/libc.a"
(see below).
.sp
.ti -4
s5``This directory is empty.
.sp
.ti -4
s6``This directory
is probably nonexistent or empty;
in our own system it contains certain administrative-type
commands which deal with old-style file systems.
.sp
.ti -4
s7``Contains the source files for all the text formatters
roff, nroff, and troff.
They are separate because they overloaded the s2 directory.
.sp
.in 0
To regenerate most commands in the s1 and s2 directories
is straightforward.
The appropriate directory will contain one or more source
files for the command.
These will all have the suffix ".s" if the command is written in assembler language,
or ".c" if it is written in C.
The first part of the name begins with
the name of the command.
If there are several source
files,
the command name will be followed
by a character which distinguishes the
several files; it is typically "1", "2", ...;
sometimes the last is "x".
For example,
the "bas" command has source files (in s1)
called "bas0.s", "bas1.s", ..., "bas4.s", "basx.s".
In all cases,
the lexicographical order of the distinguishing character
is the order in which the source files should be compiled
or assembled.
Thus, for example, the way to reassemble a new "bas"
is to say (in s1)
.sp
	as bas?.s
.sp
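The expansion order can be checked before anything is assembled.
The following is a minimal sketch for a present-day POSIX shell,
not part of the distributed system;
it assumes that "mktemp -d" is available,
and the scratch directory and the empty files in it
are stand-ins for the real sources.
.sp
.nf
```shell
# Sketch: show the order in which the shell expands "bas?.s".
# The files are created out of order on purpose; they are empty
# stand-ins, not real assembler sources.
dir=$(mktemp -d)
cd "$dir" || exit 1
for c in x 4 0 3 1 2
do
	: > "bas$c.s"
done
order=$(echo bas?.s)
echo "$order"
```
.fi
.sp
The digits sort ahead of "x", so "basx.s" is named last.
.sp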
Some of the assembly-language commands
are completely stand-alone and require no application
of the link editor ld__
(also loosely called the loader).
Unfortunately there is no
.ul
a priori
way of determining which
need to be loaded.
A simple
.ul
a posteriori
method is to assemble the
command as discussed above, then say
.sp
	nm -u a.out
.sp
which will list the undefined external symbols.
If any appear, the loader should be called
by saying
.sp
	ld a.out -l
.sp
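The check and the conditional load can be combined.
What follows is only a sketch for a present-day POSIX shell;
"needs_loader" is a hypothetical helper name,
and the sketch assumes that "nm -u" prints nothing at all
when no symbols are undefined.
.sp
.nf
```shell
# needs_loader: read the output of "nm -u a.out" on standard input
# and succeed if it names at least one undefined symbol.
# Assumption: no undefined symbols means no output lines at all.
needs_loader() {
	grep -q '[^[:space:]]'
}

# Canned input stands in for real nm output in this demonstration.
if echo _ctime | needs_loader
then
	verdict="load"
else
	verdict="stand-alone"
fi
echo "$verdict"
```
.fi
.sp
In use, the canned input would be replaced by "nm -u a.out",
and the "load" branch would run the loader command given above.
.sp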
A minority of the commands written in assembly language
are coded so that
their text portions are pure and they can be shared.
(The most important of these is "ed";
another is "write.")
Such commands may (and in the case of the editor,
should) be loaded (whether or not they need it for
picking up library routines) by
.sp
	ld -n a.out -l
.sp
One important command which needs slightly special
treatment is "tp" which has to be loaded
with the C library:
.pg
	as tp?.s
.br
	ld a.out -l -lc
.pg
because it calls the C-language ctime subroutine.
.pg
It is not particularly easy to
find out if an assembly-language command
has a pure text segment.
The simplest way is probably to look
at the source and see if there
are ".data" assembler operators surrounding
system calls with pluggable arguments.
It is probably
not a bad idea to ignore the whole question
except in the case of the editor,
where there are likely to be real
gains in text-segment sharing.
.sp
As it happens, there are no commands written in C
(except those described below)
which consist of more than one file.
The command "com.c" can therefore be recompiled
simply by saying
.sp
	cc -n com.c
.sp
Here as above the "-n" indicates the desire to produce
an object file which has a pure, sharable text segment.
Since C produces pure code and the C library is pure,
one might as well share.
.sp
Some of the most important commands
are considerably more complicated to
regenerate, and these are discussed
specifically below.
The contents of libraries are also
discussed.
.sp
AS__
.sp
The assembler consists of two executable files:
/bin/as and /etc/as2.
The first is the 0-th pass:
it reads the source program, converts it to
an intermediate form in a temporary file "/tmp/atm0?",
and estimates the final locations
of symbols.
It also makes two or three other temporary
files which contain the ordinary symbol table,
a table of temporary symbols (like
.ul
n:)
and possibly an overflow intermediate file.
The program /etc/as2
acts as an ordinary two-pass assembler
with input taken from the files produced by /bin/as.
.pg
The source files for /bin/as
are named "/usr/source/s1/as1?.s"
(there are 9 of them);
/etc/as2 is produced
from the source files
"/usr/source/s1/as2?.s";
they likewise are 9 in number.
Considerable care should be exercised
in replacing either component of the
assembler.
Remember that if the assembler is lost,
the only recourse is to replace it from some backup storage;
a broken assembler cannot assemble itself.
.pg
C_
.br
The C compiler consists of
three files:
"/bin/cc", which expands compiler control lines and
which calls the phases of the compiler proper,
the assembler, and the loader;
"/lib/c0", which is the first phase of the compiler;
and "/lib/c1", which is the second phase of the compiler.
The loss of the C compiler is as serious
as that of the assembler.
.pg
The source for /bin/cc
resides in "/usr/source/s1/cc.c".
Its loss alone is not fatal.
Provided that prog.c does not contain any
compiler control lines,
prog.c can be compiled by
.sp
	/lib/c0 prog.c temp0 temp1
.br
	/lib/c1 temp0 temp1 temp2
.br
	as - temp2
.br
	ld -n /lib/crt0.o a.out -lc -l
.sp
If /bin/cc is lost,
it can be recovered in this way,
since it contains no compiler control lines.
.pg
The source for the compiler proper is in the
directory /usr/c.
The first phase (c0)
is generated from the files c00.c, ..., c04.c,
which must be compiled by C;
c0t.s, which must be assembled;
and c0h.c, a header file which
should not be compiled by itself but is
included
by the C programs of the first phase.
The c0t.s program contains a parameter
"fpp" which determines whether C is to be
used on a machine which has PDP 11/45 floating-point
hardware; it should be set to 1 if so, 0 if not.
In the standard system fpp is 1.
To make a new /lib/c0,
assemble c0t.s, name the output c0t.o,
and
.sp
	cc -n c0t.o c0[0-4].c
.sp
Before installing the new c0, it is prudent to save the old one someplace.
.pg
The second phase of C (/lib/c1)
is generated from the C source files c10.c, ..., c13.c,
the assembly-language program c1t.s,
the include-file c1h.c, and a library
of object-code tables called tab.a.
To generate a new second phase,
assemble c1t.s, call it c1t.o, and
.sp
	cc -n c1t.o c1[0-3].c tab.a
.sp
It is likewise prudent to save c1 before
installing a new version.
In fact in general it is wise to save the
object files for the C compiler so that
if disaster strikes C can be reconstituted
without a working version of the compiler.
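.pg
The habit of saving the old phase can be made mechanical.
A minimal sketch for a present-day POSIX shell follows;
the function name "install_saving" and the ".old" suffix
are assumptions chosen for illustration,
and "mktemp -d" is assumed to be available.
.sp
.nf
```shell
# install_saving NEW TARGET
# Keep the current TARGET as TARGET.old, then copy NEW over it.
# Nothing is installed if the save fails.
install_saving() {
	if [ -f "$2" ]
	then
		cp "$2" "$2.old" || return 1
	fi
	cp "$1" "$2"
}

# Demonstration in a scratch directory; the files are stand-ins
# for the old /lib/c1 and a freshly built a.out.
dir=$(mktemp -d)
echo old > "$dir/c1"
echo new > "$dir/a.out"
install_saving "$dir/a.out" "$dir/c1"
saved=$(cat "$dir/c1.old")
current=$(cat "$dir/c1")
echo "$saved $current"
```
.fi
.sp
The same function serves for either phase,
with the installed file (for instance /lib/c0 or /lib/c1)
given as the second argument.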
.pg
The library of tables mentioned above
is generated from the files
regtab.s, sptab.s, cctab.s, and efftab.s.
The order is not important.
These ".s" files are not in fact assembler source;
they must be converted by use of the
.ul
cvopt
program, whose source and object are
located in the C directory.
For example:
.sp
	cvopt regtab.s temp
.br
	as temp
.br
	mv a.out regtab.o
.br
	ar r tab.a regtab.o
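.sp
The same four steps apply to each of the tables.
As a sketch only, a present-day POSIX shell loop can print
the full command list for inspection before anything is run;
the function name "tab_commands" is hypothetical.
.sp
.nf
```shell
# tab_commands: print, without running them, the commands that
# rebuild tab.a from the four table files named in the text.
tab_commands() {
	for t in regtab sptab cctab efftab
	do
		echo "cvopt $t.s temp"
		echo "as temp"
		echo "mv a.out $t.o"
		echo "ar r tab.a $t.o"
	done
}

# Four tables, four steps each.
count=$(tab_commands | wc -l)
echo "commands: $((count))"
```
.fi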
.sp
.ul
FORTRAN
.sp
Probably because it is a very large
subsystem written entirely in assembly language,
Fortran is quite complicated to regenerate.
On the other hand, Fortran is vital only to its own
users;
since neither the compiler nor any
important part of the run-time system is
written in Fortran,
both can be regenerated in case of loss.
.sp
The fc__ command itself is essentially
equivalent to a long shell command file;
for a single source program
.ul
prog.f,
it amounts to saying
.sp
	/usr/fort/fc1 prog.f
.br
	as - f.tmp1
.br
	ld /lib/fr0.o a.out /lib/filib.a -lf -l
.sp
Thus, /usr/fort/fc1 is the compiler proper;
fc1 leaves its output in the current directory in the
file "f.tmp1".
/lib/fr0.o is the runtime startoff.
Filib.a is the library of operators;
Fortran is essentially interpretive,
and operations such as "add floating variable to floating
variable"
are short routines loaded from the filib.a library.
.sp
/lib/libf.a (specified by the "-lf") is an archive file
containing the language builtin functions
plus a few others.
The standard assembly language library
(the "-l", or /lib/liba.a)
is referenced by certain of the
builtin functions
(for routines like sin___).
