From coff at tuhs.org  Tue Oct  7 03:39:10 2025
From: coff at tuhs.org (Douglas McIlroy via COFF)
Date: Mon, 6 Oct 2025 13:39:10 -0400
Subject: [COFF] [TUHS] Unix gre, forgotten successor to grep (was: forth on early unix)
In-Reply-To: <70D71E86-7484-4BB6-AF0C-2FFC1FC9B710@archibald.dev>
References: <96A17F58-C1D8-4CA6-BF2F-EABDE17DF02C@archibald.dev>
 <70D71E86-7484-4BB6-AF0C-2FFC1FC9B710@archibald.dev>
Message-ID: 

Since QED predated Unix, I'm redirecting this to COFF.

Ken's CACM article evoked an unusually harsh response in Computing
Reviews. The reviewer said roughly that everybody knows one can make a
deterministic recognizer that runs in linear time, so why waste our time
talking about an NFA recognizer?

This moved me to write a letter to the editor of CR in Ken's defense. I
pointed out that a deterministic recognizer can have exponentially more
states than the number of symbols in the regular expression. This might
well overflow memories of the time and in any event would take
exponential time to construct and completely wipe out the advantage of
linear recognition time. (Al Aho had not yet invented the egrep
algorithm, which constructs only the states encountered during
recognition.)

Computing Reviews did not have a letters section, so, as far as I know,
the off-base review still stands unrebutted in the literature.

Doug

On Mon, Oct 6, 2025 at 12:04 AM Thalia Archibald wrote:
>
> Ken,
>
> Your email reminds me of a comment you made in a 1989 interview with Mike
> Mahoney, that suggests something earlier than QED:
>
> > I did a lot of compiling. Even in college and out of college I did a lot of
> > on-the-fly compilers. Ah. ah. I wrote a GREP-like program. It would... You
> > type in …, you’d say what you wanted it to look for, and a sed-like thing
> > also. That you’d say, I want to do a substitute of A for B or some block of
> > text. What it would do is compile a program that would look for A and
> > substitute in B and then run the compiled program so that one level removed
> > from it do I direct my (unclear) and the early languages, the regular
> > expression searching stuff in ED and its predecessors on CTSS and those things
> > were in fact compilers for searches. They in fact compiled regular...
>
> https://www.tuhs.org/Archive/Documentation/OralHistory/transcripts/thompson.htm
>
> By anyone's history of regular expressions, your matcher in QED was the first
> software implementation of regular expressions. Was this grep-like program you
> wrote in college something earlier than that? Could you share more about it? Do
> you somehow still have the source for these? I'd love to study it.
>
> Thalia
>
> On Sep 23, 2025, at 11:40, Ken Thompson wrote:
> > i think the plan9 grep is the fastest.
> > it is grep, egrep, fgrep also.
> > i think it is faster than boyer-moore.
> > the whole program is a jit dfa
> >
> > read block
> > for c in block
> > {
> > s=s.state[c]
> > if s == nil do something occasionally
> > }
> >
> > it is a very few cycles per byte. all of the
> > time is reading a block. i cant imagine b/m
> > could be faster. the best b/m could do is
> > calculate the skip and then jump over
> > bytes that you have already read.
> >
> >
> > russ cox used it to do the (now defunct) code
> > search in google.
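A minimal sketch of the scan loop Ken describes, for anyone who wants to
play with its shape. This is not the plan 9 source: to stay short it
builds its table eagerly for a single fixed literal, fgrep-style, where
the real program constructs regular-expression states lazily as the scan
first reaches them, and every name and size below is made up. The point
is the hot loop: one table lookup per byte, with the rare branch taken
only on a match.

    /* dfascan: count occurrences of a literal pattern in stdin,
     * scanning a block at a time with a precomputed DFA. */
    #include <stdio.h>
    #include <string.h>

    #define MAXPAT 64

    static int delta[MAXPAT + 1][256];  /* delta[s][c]: next state */
    static int m;                       /* pattern length; state m accepts */

    /* Knuth-Morris-Pratt style DFA for a fixed literal. */
    static void
    build(const char *pat)
    {
        int s, c, x;

        m = strlen(pat);
        for (c = 0; c < 256; c++)
            delta[0][c] = 0;
        delta[0][(unsigned char)pat[0]] = 1;
        x = 0;                          /* state the DFA reaches on pat[1..s-1] */
        for (s = 1; s <= m; s++) {
            for (c = 0; c < 256; c++)   /* on mismatch, behave like the fallback state */
                delta[s][c] = delta[x][c];
            if (s < m) {
                delta[s][(unsigned char)pat[s]] = s + 1;
                x = delta[x][(unsigned char)pat[s]];
            }
        }
    }

    static long
    scan(FILE *f)
    {
        unsigned char buf[8192];
        size_t n, i;
        int s = 0;
        long hits = 0;

        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            for (i = 0; i < n; i++) {
                s = delta[s][buf[i]];   /* one indexed load per byte */
                if (s == m)             /* rare: a match ends here */
                    hits++;
            }
        return hits;
    }

    int
    main(int argc, char **argv)
    {
        if (argc != 2 || argv[1][0] == '\0' || strlen(argv[1]) > MAXPAT) {
            fprintf(stderr, "usage: dfascan pattern <file\n");
            return 2;
        }
        build(argv[1]);
        printf("%ld\n", scan(stdin));
        return 0;
    }

The exponential blowup Doug mentions is why lazy construction matters:
the textbook case is (a|b)*a(a|b)...(a|b) with n-1 trailing (a|b)
factors, a pattern a few dozen characters long whose smallest DFA needs
2^n states, one for each arrangement of a's among the last n input
symbols, even though the corresponding NFA has only about n states.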
From coff at tuhs.org  Sat Oct 18 11:44:02 2025
From: coff at tuhs.org (steve jenkin via COFF)
Date: Sat, 18 Oct 2025 12:44:02 +1100
Subject: [COFF] [TUHS] To NDEBUG or not to NDEBUG, that is the question
In-Reply-To: 
References: 
Message-ID: <08014FB9-483A-4ED7-BE5B-BC06D3EA24C6@canb.auug.org.au>

This thread, responding to the original, has moved to COFF, since it's
not about Early Unix.

============================================================

> On 17 Oct 2025, at 22:42, Aharon Robbins via TUHS wrote:
>
> Now, I can understand why assert() and NDEBUG work the way they do.
> Particularly on the small PDP-11s on which C and Unix were developed,
> it made sense to have a way to remove assertions from code that would
> be installed for all users.

How many computing workloads are now CPU limited, and can’t afford
run-time Sanity Checking in Userland?

For decades, people would try to ‘optimise’ performance by initially
writing in assembler [that myth dealt with by others]. That appears to
have flipped to using huge, slow frameworks, such as JavaScript /
ECMAScript, for ‘Applications’.

I’m not advocating “CPU is free, we can afford to forget about
optimisation”. That’s OK for prototypes and ‘run once or twice’ code,
where human time matters more, but not for high-volume production
workloads. The deliberate creation of bloat and waste of resources
(== energy & dollars) in production work isn’t Professional behaviour,
IMHO.

10-15 years ago I saw something about Google’s web-server CPU
utilisation being 60%-70%, from memory. It struck me that “% CPU”
wasn’t a good metric for throughput anymore, and that ‘system
performance’ was a complex, multi-factored problem that had to be tuned
per workload and per target metric for ‘performance’. Low latency is
only achieved at the cost of throughput; Google may have deliberately
opted for lower %CPU to stay responsive.

Around the same time, there were articles about the throughput increase
and latency improvement some large site got by moving to SSDs. IIRC,
their CPU utilisation dropped markedly as well. Removing the burden of
I/O waits that caused deep scheduling queues somehow reduced total
kernel overhead - perhaps fewer VM page faults because of shorter
process residency…

I’ve no data on modern Supercomputers - I’d expect there to be huge
effort in tuning resources for individual applications & data sets.
There’d be real incentive to maximise ‘performance’ at the high end, as
well as at the other end: low-power & embedded systems. I’m more
talking about Commercial Off The Shelf and small- to mid-size
installations - the things people run every day that suffer from slow
response times.

--
Steve Jenkin, IT Systems and Design        0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin


From coff at tuhs.org  Sat Oct 18 14:11:03 2025
From: coff at tuhs.org (Lars Brinkhoff via COFF)
Date: Sat, 18 Oct 2025 04:11:03 +0000
Subject: [COFF] [TUHS] To NDEBUG or not to NDEBUG, that is the question
In-Reply-To: <08014FB9-483A-4ED7-BE5B-BC06D3EA24C6@canb.auug.org.au>
 (steve jenkin via COFF's message of "Sat, 18 Oct 2025 12:44:02 +1100")
References: <08014FB9-483A-4ED7-BE5B-BC06D3EA24C6@canb.auug.org.au>
Message-ID: <7wplak3l48.fsf@junk.nocrew.org>

Steve Jenkin wrote:
> How many computing workloads are now CPU limited,
> and can’t afford run-time Sanity Checking in Userland?

At my day job we have compiled with -g -O0 from day one, and we are not
eager to change.
I suppose if the project management starts to worry about CPU load or
memory shortage, then we'll turn on the optimizer. We have joked about
adding ballast to the application, so we can score an easy win when
someone complains it's too big.
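For reference, the assert()/NDEBUG mechanism under discussion is just the
preprocessor: assert() expands to a run-time check unless NDEBUG is
defined at the point <assert.h> is included, in which case it expands to
nothing, so whether the checks run is decided per build, not per run. A
tiny illustration (file and function names invented):

    /* check.c -- build two ways:
     *   cc -g -O0 check.c          development build, checks live
     *   cc -O2 -DNDEBUG check.c    production build, assert() compiled out
     */
    #include <assert.h>
    #include <stdio.h>

    static double
    mean(const double *v, int n)
    {
        double sum = 0.0;
        int i;

        assert(v != NULL && n > 0);  /* sanity check; vanishes under -DNDEBUG */
        for (i = 0; i < n; i++)
            sum += v[i];
        return sum / n;
    }

    int
    main(void)
    {
        double v[3] = { 1.0, 2.0, 3.0 };

        printf("%g\n", mean(v, 3));
        return 0;
    }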