Parsers for extensible grammars?

[snip]

Are you familiar with how the VMS operating system handles a similar issue for its CLI command language?

The VMS CLI is called DCL and DCL has an extendable command language which allows users/programmers to add new commands, which follow certain DCL imposed syntax rules, to the process specific command table.

Unlike with Unix and friends, much of a DCL command line (for example, whether a qualifier/option is supported for this command, or whether a required value for an option is missing) is validated by DCL before the executable behind the command is actually run.

This is possible because each command has a command definition file which describes the list of options, their types, and which combinations are disallowed. This command definition file is compiled into a binary form which is directly accessible from DCL.
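To make the idea concrete, here is a minimal sketch in Python of a declarative command table that a front end checks before any handler runs. It is only an illustration of the general approach: the table layout, the `validate` function and the `copy` qualifiers are all made up for this example, and none of it is CLD syntax or a DCL routine.

```python
# Hypothetical sketch: a declarative command table checked by the front end
# before any handler runs. Names and layout are illustrative assumptions;
# this is not CLD syntax and not the DCL API.

COMMAND_TABLE = {
    "copy": {
        # qualifier name -> value type, or None if it takes no value
        "qualifiers": {"log": None, "block_size": int, "output": str},
        # sets of qualifiers that may not appear together
        "disallow": [{"log", "output"}],
    },
}


class CommandError(Exception):
    """Raised by the front end; the command's handler is never reached."""


def validate(line):
    """Check 'verb /qual[=value] ...' against COMMAND_TABLE.

    Returns (verb, {qualifier: converted value}) or raises CommandError.
    """
    parts = line.split()
    if not parts:
        raise CommandError("empty command line")
    verb = parts[0].lower()
    definition = COMMAND_TABLE.get(verb)
    if definition is None:
        raise CommandError(f"unknown command: {verb}")

    seen = {}
    for token in parts[1:]:
        if not token.startswith("/"):
            raise CommandError(f"unexpected parameter: {token}")
        name, sep, raw = token[1:].partition("=")
        name = name.lower()
        if name not in definition["qualifiers"]:
            raise CommandError(f"unknown qualifier: /{name}")
        value_type = definition["qualifiers"][name]
        if value_type is None:
            if sep:
                raise CommandError(f"/{name} takes no value")
            seen[name] = True
        else:
            if not raw:
                raise CommandError(f"/{name} requires a value")
            try:
                seen[name] = value_type(raw)
            except ValueError:
                raise CommandError(f"bad value for /{name}: {raw}") from None

    # Reject any disallowed combination that is fully present.
    for combo in definition["disallow"]:
        if combo.issubset(seen):
            raise CommandError("conflicting qualifiers: " + ", ".join(sorted(combo)))
    return verb, seen
```

With this table, `validate("copy /block_size=512 /log")` hands back the verb and a dictionary of already-converted values, while `validate("copy /log /output=x")` is rejected before any command code would run.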

While this isn't a direct match for what you are describing, it sounds pretty close and you might get some ideas from it.

The VMS documentation is online at:

formatting link

and you want the first part of the "HP OpenVMS Command Definition, Librarian, and Message Utilities Manual", which is available from:

formatting link

(Ignore the Message Utilities and Librarian sections; they are not relevant here.)

Any other manuals mentioned in this manual are also available from the above os84_index.html link. I recommend you follow the PDF link for each manual; the HTML documentation is not as well done as it should be.

Simon.

PS: And yes, this is a part of my day job. :-) (For now...)

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Simon Clubley

I$DISCARDED$MY$3000AXP$A$DECADE$AGO

Ah, I didn't realize that! Interesting approach -- assuming there are no other ways to invoke binaries that bypass this.

It's worth a closer look!

Thanks!

Don Y

Commands can exist outside of the above infrastructure if one wishes; they are called foreign commands and are treated pretty much like a Unix command would be - there's no validation of options prior to the executable starting and hence there are no nicely pre-parsed options and option values ready to be read from within the program itself.

Foreign commands are used for example when porting Unix tools to VMS as it means the existing command line parsing code in the tool can be used pretty much unchanged.

On VMS, if your executable takes no command line input, you can also just run the executable with the run command.

However it's your choice - you can have DCL do some of the validation work for you by having your program integrated into the above DCL infrastructure or you can do it all yourself in your program just as you do on Unix and friends.

IOW, having foreign commands available as an option doesn't stop you from also having the native DCL integrated approach. If you try to run a DCL integrated executable as a foreign command, no values will be available to read from within your program, so you are forced to run it via the DCL mechanism above, which means you also get the DCL-level validation.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Simon Clubley

You can *freely* mark any command as either type? I.e., can I mark a command that relies on the DCL stuff for option parsing as a FOREIGN command? And, thus, screw it up (at runtime)?

So, this (DCL) mechanism is meant as an *aid*/service -- not as a means of ensuring software integrity (?).

Don Y

Don is a hard person to suggest things to. I'm actually surprised that his response was as positive as it was.

I have had great success at solving many problems with Forth. There are text file formats that can be read by treating various words as commands, defining those words in Forth to execute while reading the file and storing the numeric data where it needs to be stored. Easy peasy. Not quite the same thing as what Don is likely doing, but very similar.

Don may be looking for something that will let him check the input for the correct syntax, number of values, etc. But as usual he has not really defined the problem he is trying to solve.

--

Rick
rickman

That is starting to sound a lot like Forth.

--

Rick
rickman

Forth could easily be a better choice. I need to spend more time on Forth.

I tend towards Tcl because I have a large codebase of scripts for it. It also excels at socket/serial port handling.

--
Les Cargill
Les Cargill

In many ways Forth is amazingly simple. You define words (subroutines) that have actions. Words are stored in the dictionary. Forth has a built-in "parser" that scans the input for words and numbers (in that order). The dictionary is searched for words in the input stream and when found they are executed. If no word is found Forth checks to see if the "word" is actually a number. If it is a number it is pushed onto the stack. Pretty simple, no?

The action of a word can make use of system words to further parse the input stream. This is done if the input "grammar" is not RPN style with the values first (nouns) and the words (verbs) last. This is done even for some Forth words like "TO" which is used to store a value in a variable, e.g. "99 TO BottlesOfBeer".
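For anyone who has not seen this in action, the loop described above can be imitated in a few lines of Python. This is a toy under loud assumptions: it is Python, not Forth, it knows only four words, and its `TO` creates the name on first use, whereas in Forth the name would normally have been defined beforehand (e.g. with VALUE).

```python
# Toy imitation of the interpreter loop described above. This is Python, not
# Forth; the word set and the behaviour of TO are simplified assumptions.

def interpret(source):
    """Run a whitespace-separated string of words and numbers; return the stack."""
    stack = []
    tokens = iter(source.split())

    def word_to():
        # A "parsing word": it pulls the NEXT token out of the input stream
        # itself, instead of expecting its operand on the stack.
        name = next(tokens, None)
        if name is None:
            raise ValueError("TO needs a name after it")
        value = stack.pop()
        # Simplification: define (or redefine) a word that pushes the value.
        dictionary[name.upper()] = lambda: stack.append(value)

    dictionary = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "DUP": lambda: stack.append(stack[-1]),
        "DROP": lambda: stack.pop(),
        "TO": word_to,
    }

    for token in tokens:
        action = dictionary.get(token.upper())
        if action is not None:
            action()                      # known word: execute it
        else:
            try:
                stack.append(int(token))  # not a word: is it a number?
            except ValueError:
                raise ValueError(f"{token} ?") from None
    return stack
```

Feeding it `"99 TO BottlesOfBeer BottlesOfBeer 1 +"` exercises all three cases: a number pushed on the stack, a parsing word that reads ahead in the input, and a dictionary word that is executed.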

Most people find the use of the stack to be a problem for them while it is really no big deal. It's just different. Forth has other issues which relate to the fact that not so many people use it. But it seems to be a very useful tool to me.

--

Rick
rickman

I don't know if Don has tried playing with Forth or not, I just made a suggestion that what he was seeking to do sounded somewhat Forth-like in nature. With somewhat more difficulty he could probably look at re-creating an MS-DOS type environment which would also fit the bill (new commands in batch files etc or added programmes in the COMMAND directory).

You are probably right about a lack of Clear, Concise, Correct, Coherent, Complete & Confirmable (Testable) specification of the requirements.

--
******************************************************************** 
Paul E. Bennett IEng MIET..... 
Forth based HIDECS Consultancy............. 
Mob: +44 (0)7811-639972 
Tel: +44 (0)1235-510979 
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk.. 
********************************************************************
Paul E Bennett

Yes, but it would fail in a clean way with a status code returned from the CLI routine called by the program to return the (non-existent) pre-parsed information.

The idea behind pointing you to the DCL CLD material was to give you some possible ideas about how extendable CLIs with validated options are handled in another environment (ie: VMS).

Like I said at the time, it's not an exact match for your requirements but I thought you might be interested in seeing how a similar problem was handled in VMS, including the syntax used in the command definition file (and compare it to how the problem is fully pushed to the executable itself in Unix land).

It's a way of expressing an expected structure for a command line which can be somewhat validated by DCL before the program even starts and provides a robust, operating system level, method for a program to obtain command line parameters and options in a way (and with functionality) that leaves getopt and friends standing in the dust.

The relevance here is that I've encountered people who have never been exposed to the VMS way of handling this and who think that ad-hoc getopt style functionality is the only _possible_ way to parse command lines. I just wanted to make reference to another way of doing this in case it gave you some ideas even though there's probably nothing you can _directly_ use here.

Sometimes, Don, you leave your questions wide open, presumably in order to invite a wide range of options in response, including ones you had never even considered. The difficulty with that is that sometimes it's hard to understand what additional unspoken constraints might exist. :-)

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Simon Clubley

It sounds like you are setting the bar a bit higher than I was looking for. I'd just like to get an idea of what he is doing. There is a long history of Don's posts asking for info on a very specific item. When people ask him for details which are often where the crux of the problem lies, he shares sparingly... but only in the sense of info, while being very prolific with words.

As people try to suggest he is decomposing the problem in an odd way which makes the problem harder to solve he throws out more details that seem to justify a rather unique approach. In the end those who are trying to help get frustrated because of the difficulty and Don gets frustrated because people seem to be argumentative rather than helpful.

I think his original post on this was something along the lines of, "How long is a piece of string?" or maybe a better analogy is, "Where can I get some string that will be long enough to do my job and strong enough to not break on the job but will break when I want to break it." I haven't read all the chit-chat that has evolved from that. I was going to suggest Forth might be useful and saw your post.

--

Rick
rickman

Understood. I.e., the DCL-ed executables aren't *fed* arguments that the front-end has testified as "valid". Rather, it *fetches* that information from the front end. As such, if the front end wasn't in place when the executable was invoked, the executable is aware of that.
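A rough sketch of that "fetch, don't feed" arrangement, in Python and with entirely hypothetical names (`FrontEnd`, `get_value`, `Status` are inventions for this example, not VMS routines): the program asks the front end for each option and gets a status back, so an invocation that bypassed the front end is detected cleanly rather than misparsed.

```python
# Hypothetical sketch of the "fetch, don't feed" pattern. None of these names
# are real VMS routines; they only illustrate the shape of the interaction.

from enum import Enum


class Status(Enum):
    PRESENT = "present"            # qualifier was given on the command line
    ABSENT = "absent"              # front end ran, qualifier not given
    NO_FRONT_END = "no_front_end"  # program was started without the front end


class FrontEnd:
    """Holds the result of the most recent validated parse, if any."""

    def __init__(self):
        self._parsed = None        # None: nothing was parsed for this run

    def accept(self, parsed):
        """Record a command line that has already passed validation."""
        self._parsed = dict(parsed)

    def get_value(self, name):
        """Return (status, value); a missing front end is a status, not a crash."""
        if self._parsed is None:
            return Status.NO_FRONT_END, None
        if name in self._parsed:
            return Status.PRESENT, self._parsed[name]
        return Status.ABSENT, None


def program(front_end):
    """An 'integrated' program: it fetches its options from the front end."""
    status, size = front_end.get_value("block_size")
    if status is Status.NO_FRONT_END:
        return "error: not invoked through the command front end"
    if status is Status.ABSENT:
        size = 512                 # hypothetical default
    return f"copying with block size {size}"
```

Calling `program(FrontEnd())` without a prior `accept(...)` takes the error path, which is the analogue of running an integrated executable as a foreign command.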

Yes.

Exactly!

Understood. I'm not fond of the VMS syntax. But, will be studying the implementation to see if I can extract some ideas and apply them to a more suitable syntax.

I pass along the minimum constraints under which I am operating. If I *add* constraints, then I risk steering solutions into a specific direction that isn't necessarily required for the problem at hand. Or, discounting solutions that could be viable.

I engage clients with the same sort of approach: "Is what you are telling me a REQUIREMENT or just your idea of how I *might* approach the problem?" The fewer the requirements, the more flexibility in the solution. This almost always leads to a more optimized solution.

By contrast, wanting extra ARTIFICIAL constraints boxes the design into a corner. "Why are you doing it like that?" "Because that's how everyone else does it!" or "Because that's how I *thought* it would be done."

In this case, I asked for a "command parser" that was "extensible". And, indicated that those extensions would be implemented at compile time (so, any solution need not accommodate dynamic modifications). I also introduced contrived examples for folks who couldn't think entirely in the abstract: "Imagine these were the common commands and you wanted to add commands LIKE these..." Anything beyond this would change the problem definition.

Note that I didn't ask for a "programming language" -- or even a scripting language (yet, at least one proposed solution fits that description AND fits my requirement!).

OTOH, if I had augmented my question as "an extensible scripting language", I would risk YOUR solution potentially disqualifying itself. :-/

Don Y
[Forth]

Wow! I was *amazed* at the intensity of the reactions to this suggestion when I circulated it among colleagues! In hindsight, I wish I had presented the suggestion as "a generic scripting language" instead of "Forth" as many of the replies seemed to *choke* on the mention of Forth! Despite the fact that several of us use Open Firmware. [Still waiting for replies from two other folks, but...]

I guess I don't really understand the "disapproval" (for want of a better word). Granted, it's been almost 40 years since I used Forth but I don't have a "bad taste" associated with the experience.

I tried to quickly shift the discussion away from Forth by offering other "simple"/small languages as alternatives. But, I think the idea had already taken on a taint. I even proposed a small C interpreter for that role ("Why interpret C when you can COMPILE it??"). I have an even simpler "language" I'll try proposing but fear it won't fly, either!

(sigh) I will have to wait for attitudes to settle back down before looking for clarification of this "rejection". And, perhaps reexamine Forth to get a feel for what folks might be objecting to...

Don Y

Very.

--
Les Cargill
Les Cargill

I like Simon's idea re: DCL. I only used VMS briefly in school and I wasn't aware that DCL could be extended in that way. It's a nice way to regularize at least the argument input.

If all you want is to regularize input, another way might be to borrow from the FastCGI interface used by web servers. It doesn't guarantee anything about interoperability per se, but it does arrange that programs get their arguments in recognizable key:value pairs.

Ouch. I don't leave anything on provider servers either, but my own IMAP server is backed up regularly.

What I was thinking about is less important given compile time parser generation ... we were talking about providing API object and entry point addresses to runtime generated code.

The relevance (if any) depends on whether you want to allow grammar extensions direct access to your device API: i.e. whether the extensions should be allowed to manipulate your (part of the) device or whether they have to go through your provided language to do it.

That's the safe approach, but it's not terribly friendly ... particularly in the case of a reusable component where you don't have any say in the use. A new developer may want to completely change the approach to the command set.

It's reasonable to expect that many developers will have little experience with _formal_ parsing methods and tools ... quite a lot of applications can get by with RegEx or even just separator tokenization and have no need of any formal grammar.

George

George Neuner

Do any of these people use RPN calculators?

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Simon Clubley

I'm looking into the implementation to see which ideas I can borrow. I think I have a "prettier" way of creating the syntax.

I move my mail archive to a little server that does my DNS, TFTP, font, DHCP, etc. services periodically. Lets me view it as mail (instead of as a raw "mailbox"). That box is the only thing that runs 24/7/365 here, so it's the destination for just about everything!

[It's more of an appliance than a server]

In my eagerness to get rid of kit before the annual equipment upgrade cycle, I opted to replace that box -- it had been running for several years (uptime was over 1000 days) so I felt it "deserved" to be replaced.

I moved everything onto a *smaller*, faster, less power-hungry box. Life was good. Another bit of kit into the bin! :>

Unfortunately, the box only lasted a few weeks before the disk (apparently) spun a bearing. Current thinking is it wasn't a suitable orientation to *mount* the disk and/or too close to the main heatsink (the box is REALLY small! had to shoehorn the 2.5" disk in there just to get it to fit!).

Anyway, easy to recreate everything for which I had sources -- even my databases! Mail archive has never been something I considered "precious" so that was *the* backup (aside from what was on my MUA at the time).

Rather than risk another disk in the "new" appliance (which was never intended to have a physical disk drive within), I replaced *that* with another appliance that *does* accommodate a disk drive (even having a fan just for the drive).

Moral: "If it ain't broke, don't fix it!"

When the box that hosts the MUA went down, I took the opportunity to replace it (preserving the mail that *was* on it at the time).

So, I managed to get rid of some kit -- and the most recent chunk of my mail archive in the process.

The goal is for people to be able to extend the "system" -- in whichever directions THEY choose -- and provide a simple way of configuring those enhancements (software/hardware/systems). It's a "low value" operation and, as such, doesn't merit lots of resources. OTOH, it shouldn't require "surgery" to tweak variables deep in the sources (these sorts of "selections" should be very visible without detailed inspection).

I've considered it in the context of "settings" -- almost like twiddling envars -- except it need not be. I.e., the "commands" could initiate *actions* if so desired.

I *hadn't* considered it as a "programming/scripting language" (which is why I called it a "command parser" and presented command line argument parsing and configuration file parsing as the examples to illustrate its intent).

There *could* be some value to a real procedural language, there. But, when I conveyed the "Forth" suggestion to colleagues, it didn't fare well (I suspect that had more to do with "Forth" than the "programming language aspect" of the idea -- almost provincial!).

[No desire to take on THAT fight, thankyouverymuch!]

That's why they were called "common commands" -- things that you can *rely* upon regardless of implementation.

Well, he can always rewrite the subsystem! :> I doubt it will take me more than a week or two to put it in place (though in this first instance, there are two other subsystems that I want to add/modify in the process for a more integrated experience).

Exactly. That was my (upthread) observation of what I've encountered in config file parsers, command line option parsing, etc. When you consider how "rich" some of those environments are (options, etc.), it's a wonder that so much ad hoc parsing is done! I can see "evolution"/feeping creaturism explaining some of it (i.e., "it wasn't this complex when we started") but I doubt that's the real cause.

I suspect it's more one of familiarity: you write "generic code" everyday. So, writing something to extract a numeric value from the "second whitespace-delimited field" in a statement is trivial. OTOH, writing a formal grammar so that a *tool* can do this for you AUTOMATICALLY probably means you spend more time relearning the tool than you would have spent writing the code!
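As a small illustration of that trade-off (a sketch only; the statement layout `keyword value unit` and both function names are made up for this example), compare the one-off extraction with a version driven by a declared field list:

```python
# Illustrative only: the "keyword value unit" statement layout is an assumption.

def second_field_ad_hoc(line):
    """The everyday approach: grab the second whitespace-delimited field."""
    return float(line.split()[1])


# The same knowledge written down once as data: (field name, converter) pairs.
STATEMENT = (("keyword", str), ("value", float), ("unit", str))


def parse_statement(line, spec=STATEMENT):
    """Generic routine that applies a declared field list to one line."""
    fields = line.split()
    if len(fields) != len(spec):
        raise ValueError(f"expected {len(spec)} fields, got {len(fields)}")
    return {name: convert(text) for (name, convert), text in zip(spec, fields)}
```

The ad hoc function is quicker to write; the declared version is where the format becomes visible and checkable, which is a very small step toward what a grammar-driven tool provides.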

Don Y

I'm sure they/we all have in the past (I don't use a calculator).

And, several of us "regularly" use Forth in configuring, for example, SPARCstations, etc. (OpenFirmware)!

But, it's not something that is done *often* so there's always a relearning experience involved. Forth is quirky -- remembering which words serve which functionality, etc. (what word displays my IP address? how do I remove an alias from NVRAMrc? etc.)

By contrast, they/we have probably used a BASIC dialect even LESS recently than OFW -- yet, I would imagine all of us could craft a little BASIC "script" in a matter of seconds... and, be assured it would work first time.

Dunno. I'm only guessing as to the source of the "resistance" (resentment?). As I said, I tried to shift the discussion away from Forth, per se, to see if the idea of a procedural language was the issue ("We don't need that level of functionality -- just parse COMMANDS!") or the proposed language...

I'll revisit it, later (just for my own edification) but won't chase that approach after this sort of "reception" :-/

Don Y

What exactly is the environment for this command parser? Will it run as a program under an OS? As a command processor for the OS? At the same level as OpenFirmware? If it runs as an app, then any Forth for that OS is a nearly instant solution. You can just ignore the added capabilities. The memory footprint of any Forth I have used is very small compared to the OS. As to the comment, "We don't need that level of functionality -- just parse COMMANDS!", Forth largely *is* a command parser.

I can't explain why you can remember how to display an IP address in BASIC but not in Forth. That would seem to be a personal problem. How do you "remove an alias from NVRAMrc?" using BASIC? Is that really a program you can write in seconds and have work the first time? I'm not sure what that even means, lol.

--

Rick
rickman

I guess the real question is why *you* should be providing that? You've already given them a way to control your gizmo and they have the gizmo's API with which to construct a different command set if they want.

Forth - and Lisp, too - has a bad rep in many circles ... which is unfortunate because sometimes it may be the simplest answer to the problem.

From what I've read I don't think you're really needing a programming language. But then I'm still trying to figure out how sophisticated the commands are likely to be ... they don't look like much from your examples, but the discussion hints at more complexity.

I think it's more panic upon seeing the manual's introduction to the method theory. The vast majority of programmers today have little or no formal schooling, and language theory waters (doesn't matter syntactic or semantic) get deep very fast.

As with most things, deep understanding isn't necessary to use the tools, but a programmer does need at least a passing familiarity with the tool's method to understand what it is trying to do and why it is failing. In this regard parser generators are less forgiving than the average compiler and it doesn't much matter which method(s) the tool uses - you can make as big a mess with LL(n) as with (LA|S)?LR(1) and quite easily hang yourself using PEG or GLR.

Learning to use a parser generator isn't really any harder than learning any other programming language, but the documentation tends to be scarier. This isn't helped by the tendency of tool developers to minimally document and to relate their wares to existing tools that the user is presumed already to be familiar with.

Nothing to be done about it except try to convince people that it really isn't that hard and that, for most purposes, they really should be using a generator tool rather than rolling their own.

The only really valid exceptions are very simple interface "languages" and very tiny systems 8-) that can't afford the overhead of a generated parser. The constant overhead of a generated parser is fairly large: with care using Bison you can shoehorn a fairly complex language into ~10KB ... but even the simplest VERB NOUN command parser is difficult to bring in under ~3KB. And Bison actually is pretty good at making small parsers - there are a few tools that are better, but many more are worse.
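For a sense of what "the simplest VERB NOUN command parser" means, here is a hand-written sketch in Python. It is not Bison output and says nothing about the size figures above; the verb and noun lists are invented for the example.

```python
# Hand-rolled VERB NOUN parser, for illustration only. The vocabulary below
# is hypothetical; a real system would supply its own.

VERBS = {"set", "show", "reset"}
NOUNS = {"baud", "parity", "address"}


def parse_command(line):
    """Split 'VERB NOUN [args...]' and check both words against the tables.

    Returns (verb, noun, args) or raises ValueError.
    """
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("expected: VERB NOUN [args...]")
    verb, noun = tokens[0].lower(), tokens[1].lower()
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if noun not in NOUNS:
        raise ValueError(f"unknown noun: {noun}")
    return verb, noun, tokens[2:]
```

Adding a command here means adding a word to a table, which is about as far as this style scales; anything with nesting or optional clauses is where a generated parser starts to pay for itself.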

However, I think even the parser size argument is losing weight as the average "small" system keeps getting bigger (32-bit ARM running Linux toasting bread, etc.). Moreover, many small system developers are comfortable using (some form of) RegEx to handle interfaces and probably most have no clue about its intrinsic overhead.

George

George Neuner
