------------------------------- Page    i -------------------------------

                         The M4 Macro Processor




                                                 Brian W. Kernighan

                                                 Dennis M. Ritchie



                                                 Edited for UTS

                                                 February 16, 1981

                            TABLE OF CONTENTS


Abstract

1.    Introduction

2.    Usage

3.    Defining Macros

4.    Quoting

5.    Arguments

6.    Arithmetic Built-ins

7.    File Manipulation

8.    System Command

9.    Conditionals

10.   String Manipulation

11.   Printing

Acknowledgements

References

Appendix A.    Summary of Built-ins




                             TABLE OF TABLES


Table 1.    Operators Processed by 'eval'

ABSTRACT

M4 is a macro processor available on UTS.  It has been used for languages
as diverse  as C  and Cobol.   M4 is particularly  suited for  functional
languages like C, Fortran, and PL/I since macros are specified in a func-
tional notation.

M4 provides features seldom found  even in much larger macro  processors,
including:

  *  arguments
  *  condition testing
  *  arithmetic capabilities
  *  string and substring functions
  *  file manipulation

This paper is a user's manual for M4.




1.    INTRODUCTION

A macro processor is a useful way  to enhance a programming language,  to
make it more palatable or more readable, or to tailor  it to a particular
application.  The #define statement in C and the .de command of nroff are
examples  of  the  basic  facility  provided  by  any  macro  processor--
replacement of text by other text.

The M4 macro processor  is an extension  of a macro  processor called  M3
that was written by  D. M. Ritchie for  the AP-3 minicomputer; M3 was  in
turn based on  a macro  processor described in  [1].  Readers  unfamiliar
with the basic  ideas of macro processing  may wish to  read some of  the
discussion there.

M4 is a suitable  front end for  C, and has  also been used  successfully
with Fortran and Cobol.   Besides the straightforward replacement of  one
string of text by another, it provides macros with arguments, conditional
macro expansion,  arithmetic,  file manipulation,  and  some  specialized
string processing functions.

The basic operation of  M4 is to copy  its input to  its output.  As  the
input is read,  however, each  alphanumeric 'token' (that  is, string  of
letters and digits) is checked.  If it is the  name of a macro, then  the
name of the  macro is replaced  by its defining  text, and the  resulting
string is pushed  back onto  the input to  be rescanned.   Macros may  be
called with arguments, in which case the arguments are collected and
substituted into the right places in the defining text before it is
rescanned.
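
The scan-and-rescan loop just described can be modeled in a few lines of
Python. This is a minimal sketch, not M4's implementation: it handles only
macros without arguments and ignores quotes and built-ins. The names
expand, NAME_OR_CHAR, and macros are hypothetical. Because the defining
text is pushed back onto the input, a macro whose definition is another
macro's name is expanded again, and a name such as NNN is a single token
that does not match N.

```python
import re

# A token is either a name (a letter or underscore followed by letters,
# digits, or underscores) or any other single character.
NAME_OR_CHAR = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|.", re.DOTALL)


def expand(text, macros, limit=10000):
    """Copy text to the output, replacing each macro name by its definition.

    The definition is pushed back onto the input so that it is rescanned;
    limit guards against definitions that expand forever.
    """
    pending = text  # input not yet read
    out = []        # text already copied to the output
    steps = 0
    while pending:
        token = NAME_OR_CHAR.match(pending).group(0)
        pending = pending[len(token):]
        if token in macros:
            steps += 1
            if steps > limit:
                raise RecursionError("macro expansion did not terminate")
            pending = macros[token] + pending  # push back for rescanning
        else:
            out.append(token)
    return "".join(out)


macros = {"N": "100", "M": "N"}
print(expand("if (i > N)", macros))    # if (i > 100)
print(expand("if (NNN > M)", macros))  # if (NNN > 100)
```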

M4 provides a collection of  about twenty built-in functions for  various
useful operations; in addition, the  user can define new macros.   Built-
ins and user-defined macros work exactly  the same way, except that  some
of the built-ins have side effects on the state of the process.




2.    USAGE

Usage is:

     m4 [file ...]

Each argument file is processed in order;  if there are no arguments,  or
if an argument  is '-', the standard  input is read  at that point.   The
processed text is written on the  standard output, which may be  captured
for later processing with

     m4 [file ...] > output_file




3.    DEFINING MACROS

The primary built-in function of M4 is define, which defines new  macros.
The input

     define(name, stuff)

causes the string name to be defined as stuff.  All later occurrences  of
name will be replaced by stuff.  name must be alphanumeric and must begin
with a letter (the underscore _ counts as  a letter).  stuff is any  text
that contains balanced parentheses; it may stretch over multiple lines.

Thus, as a typical example,

     define(N, 100)
      ...
     if (i > N)

defines N to be 100, and uses this 'symbolic constant' in a later if
statement.

The left parenthesis must immediately  follow the word define, to  signal
that define has arguments.  If  a macro or built-in name is not  followed
immediately by '(', it is assumed to have no arguments.  This explains  N
above, which is  really a macro with  no arguments, and  thus when it  is
used there need be no (...) following it.

You should also notice that a macro name is only recognized as such if it
appears surrounded by nonalphanumerics.  For example, in

     define(N, 100)
      ...
     if (NNN > 100)

the variable NNN is  absolutely unrelated  to the defined  macro N,  even
though it contains many N's.

Things may be defined using other things.  For example,

     define(N, 100)
     define(M, N)

defines both M and N to be 100.

What happens if N is redefined?  Or, to say it another way, is M  defined
as N  or as  100?  In  M4, the latter  is true--M  is 100, so  even if  N
changes later, M does not.

This behavior arises because M4  expands macro names into their  defining
text as soon as it possibly can.  Here, that means that when the string N
is seen as the arguments of define are being collected, it is immediately
replaced by 100, just as if you had said

     define(M, 100)

in the first place.

If this isn't what you  really want, there are two  ways out of it.   The
first, which is specific to this example, is to interchange  the order of
the definitions:

     define(M, N)
     define(N, 100)

Now M is defined to be the string N, so when you ask for M later,  you'll
always get the value of N at that time (because the M will be replaced by
N, which will be replaced by 100).
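
Both behaviors can be reproduced with a small Python model in which define
expands its unquoted second argument at the moment it is collected. The
functions resolve and define are hypothetical, and the model is simplified:
each definition is a single name or a literal, and the first argument is
assumed to be quoted, so it is never expanded.

```python
def resolve(text, macros, limit=100):
    """Expand text, treated as a single token, until it is no longer a
    macro name (each replacement is rescanned)."""
    for _ in range(limit):
        if text not in macros:
            return text
        text = macros[text]
    raise RecursionError("macro expansion did not terminate")


def define(name, stuff, macros):
    # The unquoted argument is expanded while it is being collected,
    # so whatever stuff means at this moment is what gets stored.
    macros[name] = resolve(stuff, macros)


# Original order: M captures the value 100 and ignores the later change.
m = {}
define("N", "100", m)
define("M", "N", m)
define("N", "200", m)
print(resolve("M", m))  # 100

# Interchanged order: M holds the string N and follows later changes.
m = {}
define("M", "N", m)
define("N", "100", m)
define("N", "200", m)
print(resolve("M", m))  # 200
```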

4.    QUOTING

The more general solution is to delay  the expansion of the arguments  of
define by quoting them. Any text surrounded by the quote characters ` and '
(a grave accent on the left, an apostrophe on the right) is not expanded
immediately, but has the quotes stripped off. If you say

     define(N, 100)
     define(M, `N')

the quotes around the N  are stripped off as the  argument is being  col-
lected, but they have  served their purpose, and  M is defined to be  the
string N, not 100.   The general rule  is that M4  always strips off  one
level of single  quotes whenever  it evaluates something.   This is  true
even outside macros.  If you want the word  define to appear in the  out-
put, you have to quote it in the input, as in

     `define' = 1;
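
The rule that one level of quotes is stripped, with nested quotes kept as
ordinary text, can be sketched in Python. The function strip_quotes is
hypothetical and models only the stripping; in M4 the quoted text is also
protected from macro expansion, which the sketch does not attempt. The
quote characters are parameters so that the effect of changequote,
described later in this section, can be shown as well.

```python
def strip_quotes(text, lq="`", rq="'"):
    """Remove one level of quotes from text.

    Quotes nest: an inner lq...rq pair is kept as ordinary text, and only
    the outermost quote characters are removed. A right quote that closes
    nothing is kept as well.
    """
    out = []
    depth = 0
    for ch in text:
        if ch == lq:
            if depth > 0:
                out.append(ch)  # nested left quote survives
            depth += 1
        elif ch == rq and depth > 0:
            depth -= 1
            if depth > 0:
                out.append(ch)  # nested right quote survives
        else:
            out.append(ch)
    return "".join(out)


print(strip_quotes("`define' = 1;"))  # define = 1;
print(strip_quotes("`a `b' c'"))      # a `b' c
print(strip_quotes("[N]", "[", "]"))  # N, as after changequote([, ])
```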

As another instance of the  same thing, which is  a bit more  surprising,
consider redefining N:

     define(N, 100)
      ...
     define(N, 200)

Perhaps regrettably, the N in the second definition is evaluated when  it
is seen; that is, it is replaced by 100, as if you had written

     define(100, 200)

This statement is ignored by  M4, since you can  only define things  that
look like names, but it obviously doesn't have the effect you wanted.  To
really redefine N, you must delay the evaluation by quoting:

     define(N, 100)
      ...
     define(`N', 200)

In M4, it is often wise to quote the first argument of a macro.

If ` and ' are not convenient for  some reason, the quote characters  can
be changed with the built-in changequote:

     changequote([, ])

makes the new  quote characters  the left  and right  brackets.  You  can
restore the original characters with just

     changequote

There are two additional built-ins  related to define.  undefine  removes
the definition of some macro or built-in:

     undefine(`N')

removes the definition of N.  (Why are the quotes absolutely necessary?)
Built-ins can be removed with undefine, as in

     undefine(`define')

but once you remove one, you can never get it back.

The built-in ifdef provides a way to determine whether a macro is
currently defined. ifdef permits three arguments: if the name is defined,
the value of ifdef is the second argument; if it is undefined, the value
is the third argument (null, if the third argument is omitted), as in

     ifdef(`mac', defined, not defined)
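
The bookkeeping done by define, undefine, and ifdef can be sketched in
Python. The class MacroTable and its methods are hypothetical and model
only the table of definitions, not macro expansion; removing a name that
is not defined is assumed to do nothing. The usage lines also show why
undefine(`N') needs its quotes: unquoted, the name N would already have
been replaced by 100 when undefine saw it.

```python
class MacroTable:
    """Hypothetical model of the table of definitions (not M4's code)."""

    def __init__(self):
        self.defs = {}

    def define(self, name, stuff):
        self.defs[name] = stuff

    def undefine(self, name):
        # Assumption: removing a name that is not defined does nothing.
        self.defs.pop(name, None)

    def ifdef(self, name, if_defined, if_undefined=""):
        # An omitted third argument behaves like a null string.
        return if_defined if name in self.defs else if_undefined


t = MacroTable()
t.define("N", "100")
# Unquoted, undefine(N) would receive 100, not N, so N would survive:
t.undefine(t.defs["N"])
print(t.ifdef("N", "defined", "not defined"))  # defined
t.undefine("N")  # quoted: the name itself reaches undefine
print(t.ifdef("N", "defined", "not defined"))  # not defined
```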




5.    ARGUMENTS

So far we have discussed the simplest form of macro processing--replacing
one string by another (fixed) string.  User-defined macros may also  have
arguments, so different invocations can  have different results.   Within
the replacement text for a macro (the second argument of  its define) any
occurrence of $n will be replaced by the  nth argument when the macro  is
used.  Thus, the macro bump, defined as

     define(bump, $1 = $1 + 1)

generates code to increment its argument by 1:

     bump(x)

is

     x = x + 1

A macro can have as many arguments as you  want, but only the first  nine
are accessible, through $1 to $9.  (The macro name itself is $0, although
that is  less  commonly  used.)   Arguments that  are  not  supplied  are
replaced by null strings, so  we can define a macro cat that simply  con-
catenates its arguments, like this:

     define(cat, $1$2$3$4$5$6$7$8$9)

Thus

     cat(x, y, z)

is equivalent to

     xyz

$4 through $9 are null, since no corresponding arguments were provided.
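
Argument substitution can be sketched in Python as a single pass over the
defining text. The function substitute is hypothetical: $0 becomes the
macro name, $1 through $9 the arguments, and references to missing
arguments become null strings, as described above. In this sketch a $
followed by two digits is read as a one-digit reference followed by an
ordinary digit.

```python
import re

ARG_REF = re.compile(r"\$([0-9])")  # $0 .. $9, one digit only


def substitute(body, name, args):
    """Replace $0..$9 in a macro's defining text."""
    actual = [name] + list(args)

    def replace(match):
        i = int(match.group(1))
        return actual[i] if i < len(actual) else ""  # missing: null string

    return ARG_REF.sub(replace, body)


print(substitute("$1 = $1 + 1", "bump", ["x"]))                  # x = x + 1
print(substitute("$1$2$3$4$5$6$7$8$9", "cat", ["x", "y", "z"]))  # xyz
```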

Leading unquoted blanks, tabs,  or new-lines that  occur during  argument
collection are discarded.  All other white space is retained.  Thus

     define(a,   b   c)

defines a to be b   c, with the three blanks between b and c retained.

Arguments are separated by commas, but parentheses are counted  properly,
so a comma 'protected' by parentheses does not delimit an argument.  That
is, in

     define(a, (b,c))

there are only two arguments; the second is (b,c).  And of course a  bare
comma or parenthesis can be inserted by quoting it.
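
Argument collection, with parenthesis counting and removal of leading
white space, can be sketched in Python. The function collect_args is
hypothetical; the sketch assumes the text between the outer parentheses
of the call has already been isolated and contains no quotes (in M4,
quoted commas and parentheses are not counted).

```python
def collect_args(text):
    """Split the text between a call's parentheses into arguments.

    A comma separates arguments only at parenthesis depth 0. Leading
    blanks, tabs, and newlines of each argument are dropped; all other
    white space is kept.
    """
    args, current, depth = [], [], 0
    for ch in text:
        if ch == "," and depth == 0:
            args.append("".join(current).lstrip(" \t\n"))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    args.append("".join(current).lstrip(" \t\n"))
    return args


print(collect_args("a,   b   c"))  # ['a', 'b   c']
print(collect_args("a, (b,c)"))    # ['a', '(b,c)']
```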




6.    ARITHMETIC BUILT-INS

M4 provides  two built-in  functions  for doing  arithmetic  on  integers
(only).  The simplest is incr,  which increments its numeric argument  by
1.  Thus to handle the common  programming problem of needing a  variable
defined as 'one more than N', write

     define(N, 100)
     define(N1, `incr(N)')

Then N1 is defined to be one more than the current value of N.

The more  general mechanism  for arithmetic  is a  built-in called  eval,
which is  capable of  arbitrary  arithmetic on  integers.  The  following
table lists the operators in decreasing order of precedence:

                Table 1.    Operators Processed by 'eval'

          Operator           Operation
          ---------------    ---------------------------------
          + -                unary plus, minus
          ** (or ^)          exponentiation
          * / %              multiplication, division, modulus
          + -                addition, subtraction
          == != < <= > >=    relationals
          !                  logical not
          & (or &&)          logical and
          | (or ||)          logical or


Parentheses can group  operations when  needed.  All the  operands of  an
expression given to eval must  ultimately be numeric.  The numeric  value
of a true relation  (like 1>0) is 1,  and false is  0.  The precision  in
eval is 32 bits.

As a simple example, suppose we want M to be 2**N+1.  Then

     define(N, 3)
     define(M, `eval(2**N+1)')
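
With N equal to 3, this gives M the value 9. A recursive-descent evaluator
for the operators of Table 1 can be sketched in Python, with one function
per precedence level from or_expr (lowest) down to unary (highest). The
name eval_m4 is hypothetical, and several details are assumptions rather
than documented behavior: results wrap around to signed 32-bit values,
division truncates toward zero and % takes the sign of the dividend (as in
C), ** is right-associative, and negative exponents are rejected.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|\*\*|==|!=|<=|>=|&&|\|\||[-+*/%^()<>!&|])")
RELOPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
          "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def wrap32(v):
    """Reduce v to a signed 32-bit value (assumed overflow behavior)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def eval_m4(expr):
    """Evaluate an integer expression with the precedence of Table 1."""
    expr, toks, pos = expr.rstrip(), [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError("bad input at " + repr(expr[pos:]))
        toks.append(m.group(1))
        pos = m.end()
    toks.append(None)  # end-of-input marker
    i = 0

    def peek():
        return toks[i]

    def take():
        nonlocal i
        i += 1
        return toks[i - 1]

    def binary(sub, ops, apply):
        # One left-associative level: sub (op sub)*
        v = sub()
        while peek() in ops:
            op = take()
            v = apply(op, v, sub())
        return v

    def or_expr():  # | ||
        return binary(and_expr, ("|", "||"), lambda op, a, b: int(bool(a or b)))
    def and_expr():  # & &&
        return binary(not_expr, ("&", "&&"), lambda op, a, b: int(bool(a and b)))

    def not_expr():  # ! binds more loosely than the relationals
        if peek() == "!":
            take()
            return int(not not_expr())
        return rel_expr()

    def rel_expr():  # == != < <= > >=
        return binary(add_expr, RELOPS, lambda op, a, b: int(RELOPS[op](a, b)))
    def add_expr():  # binary + -
        return binary(mul_expr, ("+", "-"),
                      lambda op, a, b: wrap32(a + b if op == "+" else a - b))

    def mul_expr():  # * / %; / and % truncate toward zero, as in C
        def apply(op, a, b):
            if op == "*":
                return wrap32(a * b)
            q = abs(a) // abs(b) * (-1 if (a < 0) != (b < 0) else 1)
            return wrap32(q if op == "/" else a - b * q)
        return binary(pow_expr, ("*", "/", "%"), apply)

    def pow_expr():  # ** ^, right-associative in this sketch
        base = unary()
        if peek() not in ("**", "^"):
            return base
        take()
        exp = pow_expr()
        if exp < 0:
            raise ValueError("negative exponent")
        return wrap32(pow(base, exp, 1 << 32))

    def unary():  # unary + -, parentheses, numbers: highest precedence
        tok = take()
        if tok in ("+", "-"):
            v = unary()
            return wrap32(-v) if tok == "-" else v
        if tok == "(":
            v = or_expr()
            if take() != ")":
                raise ValueError("missing )")
            return v
        if tok is not None and tok.isdigit():
            return wrap32(int(tok))
        raise ValueError("unexpected token " + repr(tok))

    result = or_expr()
    if peek() is not None:
        raise ValueError("unexpected token " + repr(peek()))
    return result
```

Note how the low precedence of ! plays out: eval_m4("!1==0") applies ! to
the result of 1==0 and so yields 1.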

As a matter of principle, it is advisable to quote the defining text  for
a macro unless  it is simple (say  just a number);  it usually gives  the
result you want, and is a good habit to get into.




7.    FILE MANIPULATION

You can include a new file in the input at any time by the built-in func-
tion include:

     include(file_name)

inserts the contents of file_name in place of the include command. The
contents of the file are often a set of definitions. The value of include
(that is, its replacement text) is the contents of the file; this can be
captured in definitions, etc.

It is a fatal error if the file named in include cannot be accessed.   To
get some control over this, the alternate form sinclude can be used; sin-
clude ('silent include') says  nothing and continues  if it can't  access
the file.
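
The difference between include and sinclude can be sketched in Python.
Both functions are hypothetical models that return the file's contents as
the replacement text; the fatal error is modeled as an OSError exception,
which sinclude catches and turns into a null string.

```python
def include(file_name):
    """Return the contents of file_name; failure to read it is an error."""
    with open(file_name) as f:
        return f.read()


def sinclude(file_name):
    """Silent include: the contents, or a null string if the file cannot
    be accessed."""
    try:
        return include(file_name)
    except OSError:
        return ""
```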

It is also possible to divert the output of M4 to temporary files  during
processing, and output the  collected material on command.  M4  maintains
nine of these diversions, numbered 1 through 9.  If you say

     divert(n)

all later output is put onto the end of  a temporary file referred to  as
n.  Diverting to this file is stopped by another divert  command; in par-
ticular, divert or divert(0) resumes the normal output process.

Diverted text is normally output  all at once at  the end of  processing,
with the diversions output in numeric order.  It is possible, however, to
bring back diversions at any time, that is, to append them to the current
diversion.

     undivert

brings back all diversions in numeric order, and undivert with  arguments
brings back  the selected  diversions  in the  order given.   The act  of
undiverting discards the diverted stuff, as does diverting into a  diver-
sion whose number is not between 0 and 9 inclusive.

The value  of  undivert is  not  the diverted  stuff.   Furthermore,  the
diverted material is not rescanned for macros.

The built-in divnum returns the number of the currently active diversion.
This is zero during normal processing.
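
The bookkeeping for diversions can be sketched in Python. The class
Diversions and its method names are hypothetical: emit stands for M4
producing output text, and finish for the end of processing. Undiverting
the current diversion into itself is skipped, which is an assumption, since
the text does not cover that case.

```python
class Diversions:
    """Hypothetical model of M4's nine diversions (not M4's code)."""

    def __init__(self):
        self.current = 0                            # what divnum reports
        self.output = []                            # the standard output
        self.saved = {n: [] for n in range(1, 10)}  # diversions 1..9

    def divert(self, n=0):
        self.current = n

    def divnum(self):
        return self.current

    def emit(self, text):
        """Send text to wherever output is currently directed."""
        if self.current == 0:
            self.output.append(text)
        elif self.current in self.saved:
            self.saved[self.current].append(text)
        # Any other diversion number: the text is discarded.

    def undivert(self, *numbers):
        """Append the named diversions (all, in numeric order, if none are
        named) to the current output and empty them. The text is not
        rescanned for macros."""
        for n in numbers or range(1, 10):
            if n in self.saved and n != self.current:
                text = "".join(self.saved[n])
                self.saved[n] = []
                self.emit(text)

    def finish(self):
        """End of input: remaining diversions are output in numeric order."""
        self.divert(0)
        self.undivert()
        return "".join(self.output)
```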




8.    SYSTEM COMMAND

You can run any  program in the  local operating system  with the  syscmd
built-in.  For example,

     syscmd(date)

runs the date command. Typically, syscmd is used to create a file that is
later read with include.

To simplify making unique file names, the built-in maketemp is  provided,
with specifications identical to the system function mktemp: a string  of
'XXXXXX' in the argument  is replaced by  the process ID  of the  current
process.
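
maketemp can be sketched in Python with os.getpid. The function is a
hypothetical model: zero-padding the process ID to six digits and
replacing only the first XXXXXX are assumptions, and the template
/tmp/m4XXXXXX is only an example. A file named this way could be written
by a command run with syscmd and read back with include.

```python
import os


def maketemp(template, pid=None):
    """Replace the first 'XXXXXX' in template with the process ID.

    Assumption: the ID is zero-padded to six digits. pid may be passed
    explicitly to make the result predictable.
    """
    if pid is None:
        pid = os.getpid()
    return template.replace("XXXXXX", "%06d" % pid, 1)


print(maketemp("/tmp/m4XXXXXX", pid=1234))  # /tmp/m4001234
```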

9.    CONDITIONALS

There is a built-in called ifelse that allows arbitrary conditional test-
ing.  In the simplest form,

     ifelse(a, b, c, d)

compares the two strings a and b.  If these are identical, ifelse returns
the string c; otherwise it returns d. Thus we might define a macro called
compare that compares two strings and returns 'yes' if they are the same
and 'no' if they are different.

     define(compare, `ifelse($1, $2, yes, no)')

Note the quotes, which prevent premature evaluation of ifelse.

If the fourth argument is missing, it is treated as empty.

ifelse can have any number of arguments, and thus provides a limited form
of a multiway conditional.  In the input

     ifelse(a, b, c, d, e, f, g)

if the string a matches the string b, the  result is c.  Otherwise, if  d
is the same as e,  the result is f.  Otherwise  the result is g.  If  the
final argument is omitted, the result is null, so

     ifelse(a, b, c)

is c if a matches b, and null otherwise.
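
The multiway form of ifelse amounts to trying the arguments three at a
time. The Python function ifelse below is a hypothetical model of that
rule, and compare corresponds to the macro defined above. Two leftover
arguments, a case the text does not cover, also give the null string
here; that is an assumption.

```python
def ifelse(*args):
    """Multiway comparison in the style of M4's ifelse.

    Arguments are taken three at a time: if args[0] equals args[1], the
    result is args[2]; otherwise the remaining arguments are tried in the
    same way. A single leftover argument is the default result; with no
    default the result is the null string.
    """
    while len(args) >= 3:
        if args[0] == args[1]:
            return args[2]
        args = args[3:]
    return args[0] if len(args) == 1 else ""


def compare(a, b):
    return ifelse(a, b, "yes", "no")


print(compare("x", "x"), compare("x", "y"))  # yes no
```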




10.   STRING MANIPULATION

The built-in len returns the length of the string that makes up its argu-
ment.  Thus

     len(abcdef)

is 6, and len((a,b)) is 5.

The built-in substr can produce substrings of strings. substr(s, i, n)
returns the substring of s that starts at the ith position (origin zero),
and is n characters  long.  If n is  omitted, the rest  of the string  is
returned, so

     substr(`now is the time', 1)

is

     ow is the time

If i or n are out of range, various sensible things happen.

index(s1, s2) returns the index (position) in s1 where the string s2
occurs, or  -1 if  it  doesn't occur.   As with  substr,  the origin  for
strings is 0.
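
The origin-zero conventions of len, substr, and index can be sketched with
Python slicing and str.find. The names m4_len, substr, and index are
hypothetical, and clamping out-of-range values is only one guess at the
'sensible things' M4 does.

```python
def m4_len(s):
    """Length of the string (named m4_len to avoid shadowing Python's len)."""
    return len(s)


def substr(s, i, n=None):
    """n characters of s starting at position i (origin zero); the rest of
    s if n is omitted. Out-of-range i or n are clamped (an assumption)."""
    i = max(i, 0)
    if n is None:
        return s[i:]
    return s[i:i + max(n, 0)]


def index(s1, s2):
    """Origin-zero position of the first occurrence of s2 in s1, or -1."""
    return s1.find(s2)


print(m4_len("abcdef"), m4_len("(a,b)"))  # 6 5
print(substr("now is the time", 1))       # ow is the time
print(index("now is the time", "the"))    # 7
```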

The built-in translit does character transliteration.

     translit(s, f, t)

modifies s by replacing  any character  found in f  by the  corresponding
character of t.  That is,

     translit(s, aeiou, 12345)

replaces the vowels by the corresponding digits.  If t is shorter than f,
characters that don't have an entry in t are deleted; as a limiting case,
if t is not present at all, characters from f are deleted from s.  So

     translit(s, aeiou)

deletes vowels from s.
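
translit can be sketched in Python with a character mapping built from f
and t. The function below is a hypothetical model; when a character
appears more than once in f, the sketch assumes its first occurrence
decides the replacement.

```python
def translit(s, f, t=""):
    """Replace each character of s that occurs in f by the character at the
    same position in t; characters of f beyond the end of t are deleted."""
    mapping = {}
    for pos, ch in enumerate(f):
        if ch not in mapping:  # assumption: the first occurrence in f wins
            mapping[ch] = t[pos] if pos < len(t) else None
    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)           # not in f: copied unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # transliterated
        # otherwise ch has no partner in t and is deleted
    return "".join(out)


print(translit("education", "aeiou", "12345"))  # 2d5c1t34n
print(translit("education", "aeiou"))           # dctn
```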

There is also a built-in called dnl that deletes all characters that fol-
low it up  to and including the  next new-line; it  is useful mainly  for
throwing away empty lines that  otherwise tend to  clutter up M4  output.
For example, if you say

     define(N, 100)
     define(M, 200)
     define(L, 300)

the new-line at the end of each line is not part of the definition, so it
is copied into the output, where it may not be wanted.  If you add dnl to
each of these lines, the new-lines will disappear.

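Another way to achieve this is to send the definitions to a diversion
numbered outside 0 through 9, since such output is discarded: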

     divert(-1)
          define(...)
           ...
     divert
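
The effect of dnl on the text can be approximated with a regular
expression in Python. This is a rough, hypothetical sketch (apply_dnl is
an illustrative name): M4 treats dnl as a built-in recognized during
scanning, whereas the sketch simply deletes each whole-word dnl, the rest
of its line, and the new-line, ignoring quotes.

```python
import re

# dnl as a whole word, then everything up to and including the new-line.
DNL = re.compile(r"(?<![A-Za-z0-9_])dnl(?![A-Za-z0-9_])[^\n]*\n?")


def apply_dnl(text):
    """Delete each dnl and what follows it through the next new-line."""
    return DNL.sub("", text)


print(repr(apply_dnl("define(N, 100)dnl\ndefine(M, 200)dnl\n")))
# 'define(N, 100)define(M, 200)'
```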

11.   PRINTING

The built-in  errprint writes  its arguments  out on  the standard  error
file.  Thus you can say

     errprint(`fatal error')

dumpdef is a debugging aid that dumps the current definitions of  defined
terms.  If there are no arguments, you get everything; otherwise  you get
the ones you name as arguments.  Don't forget to quote the names!




ACKNOWLEDGEMENTS

We are indebted to Rick  Becker, John Chambers,  Doug McIlroy, and  espe-
cially Jim Weythman, whose pioneering use of M4 has led  to several valu-
able improvements.  We are also  deeply grateful to Weythman for  several
substantial contributions to the code.




REFERENCES

 [1]  B. W. Kernighan and P. J. Plauger, Software Tools,  Addison-Wesley,
      Inc., 1976.

APPENDIX A.    SUMMARY OF BUILT-INS

Each entry is preceded by the section number where it is described:

       4     changequote(L, R)
       3     define(name, replacement)
       7     divert(number)
       7     divnum
      10     dnl
      11     dumpdef(`name', `name', ...)
      11     errprint(s, s, ...)
       6     eval(numeric_expression)
       4     ifdef(`name', this_if_true, this_if_false)
       9     ifelse(a, b, c, d)
       7     include(file)
       6     incr(number)
      10     index(s1, s2)
      10     len(string)
       8     maketemp(...XXXXXX...)
       7     sinclude(file)
      10     substr(string, position, number)
       8     syscmd(s)
      10     translit(string, from, to)
       4     undefine(`name')
       7     undivert(number, number, ...)
