I am driving much of my software from data excised from formal documents that describe the algorithms involved in far more detail (and in more modalities) than is possible with (textual) "source code".
I'm a huge fan of table-driven applications so I often express components of algorithms in tables, then excise the tables from the document and propagate them into the formal "sources" (all mechanically, of course).
This minimizes the chance for typographical errors to creep into the "code" between the documentation and the executable. It also makes it difficult for the code to evolve without the documentation coming along *with* it!
And, of course, it allows things to be expressed in forms that are more intuitive/self-documenting than would otherwise be available with "ASCII text".
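To make the "propagate them into the sources" step concrete, here is a minimal sketch of turning already-extracted table rows into text suitable for #include. It is only an illustration: the function name rows_to_c_initializer, the array name, and the C type are all hypothetical, and it assumes each cell has already been reduced to a string that is a valid C expression.

```python
def rows_to_c_initializer(name, ctype, rows):
    """Render table rows as a C array initializer.

    `rows` is a list of rows; each row is a list of strings that are
    already valid C expressions (assumption: an earlier step quoted
    string cells and normalized numeric ones).
    """
    lines = [f"static const {ctype} {name}[] = {{"]
    for row in rows:
        lines.append("    { " + ", ".join(row) + " },")
    lines.append("};")
    return "\n".join(lines) + "\n"


# Hypothetical table contents, for illustration only.
rows = [['"ADD"', "0x01"], ['"SUB"', "0x02"]]
text = rows_to_c_initializer("opcodes", "struct opcode", rows)
```

Because the text is generated rather than retyped, a change to the table in the document shows up in the executable the next time the build runs.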
So far, I've been creating ad hoc tools to extract the needed components from the documents. The markup language used in the documents is well documented and the way I build my documents makes it fairly easy to isolate the components of interest and "extract them".
For example, to extract a particular table, I invoke a tool I wrote with the command line:
    gettable TABLETITLE [,]
and redirect the output to a file (which is later massaged by an application-specific tool/script to get it into a form suitable for #include in a source file). The tool knows how to parse the (nested) tags of the MU (markup) language until it finds the table having the specified TABLETITLE (string); it then extracts lists of (font,string) tuples for each cell in the specified columns of the table. [Other tags associated with the cell contain only cosmetic information -- line spacing, text alignment, etc. -- so they can be ignored.]
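A rough sketch of that extraction, for discussion purposes only. It is NOT the actual gettable tool: it assumes the document has been exported to (or already is) well-formed XML, and the tag and attribute names (table, title, row, cell, run, font) are hypothetical stand-ins for whatever the real MU language uses. It relies only on Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real MU language's tags will differ.
SAMPLE = """
<doc>
  <section>
    <table title="Other">
      <row><cell><run font="Roman">x</run></cell></row>
    </table>
    <table title="Opcodes">
      <row>
        <cell align="left"><run font="Courier">ADD</run></cell>
        <cell><run font="Roman">0x01</run></cell>
        <cell><run font="Roman">add </run><run font="Italic">src</run></cell>
      </row>
      <row>
        <cell><run font="Courier">SUB</run></cell>
        <cell><run font="Roman">0x02</run></cell>
        <cell><run font="Roman">subtract</run></cell>
      </row>
    </table>
  </section>
</doc>
"""


def get_table(root, title, columns):
    """Return, for each row of the table named `title`, the selected
    cells as lists of (font, string) tuples. `columns` is a list of
    zero-based column indices. Cosmetic attributes are ignored."""
    # Walk the nested tags until the table with the wanted title turns up.
    for table in root.iter("table"):
        if table.get("title") == title:
            break
    else:
        raise KeyError(f"no table titled {title!r}")

    rows = []
    for row in table.findall("row"):
        cells = row.findall("cell")
        rows.append([
            [(run.get("font"), run.text or "") for run in cells[c].iter("run")]
            for c in columns
            if c < len(cells)
        ])
    return rows


result = get_table(ET.fromstring(SAMPLE), "Opcodes", [0, 2])
```

Each cell comes back as a list because one cell can mix fonts; the third cell of the first row above yields two tuples.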
But, I'm looking at other options for a more generic solution to this problem.
E.g., I wrote a formal grammar for the markup language so I can build a specific parser to extract what I need *using* arbitrary parts of that grammar (e.g., if I later decide the *color* of the text in a cell is important -- highly doubtful!).
I'm also looking at building a formal DTD for the MU language and seeing what XML-ish tools exist to do these sorts of things.
The downside of a more "involved"/capable solution is that it gets more tedious to maintain -- especially as the MU language evolves! And, testing the tool becomes a project in itself! :<
So, specifically, what sorts of OTS tools (prefer ones with sources that I can modify) exist that will let me do things like parsing to a particular nested tag, verifying that the attribute associated with it matches what I seek (e.g., TABLETITLE), then extracting all (and ONLY!) attributes of specific tags nested *within* that context?
I.e., I want to be able to specify what parts of the tree to extract based on criteria I specify on a command line.
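To illustrate the kind of command-line-driven selection I mean, here is a small sketch built on the limited XPath subset in Python's standard xml.etree.ElementTree. Again, it assumes an XML rendering of the documents, and the tag and attribute names in the example are hypothetical. The context is given as a path expression; only the named attributes of the named tag inside that context are emitted.

```python
import argparse
import xml.etree.ElementTree as ET


def extract(root, context, tag, attrs):
    """For every element matching the path `context`, return the
    requested attributes (and only those) of each `tag` element
    nested inside it, as a list of dicts."""
    results = []
    for ctx in root.findall(context):
        for elem in ctx.iter(tag):
            if elem is ctx:
                continue  # only elements nested *within* the context
            results.append({a: elem.get(a) for a in attrs if a in elem.attrib})
    return results


def main(argv=None):
    p = argparse.ArgumentParser(
        description="Extract attributes of tags within a selected context")
    p.add_argument("file", help="XML document to read")
    p.add_argument("context",
                   help="ElementTree path, e.g. \".//table[@title='Opcodes']\"")
    p.add_argument("tag", help="tag to look for inside the context")
    p.add_argument("attrs", nargs="+", help="attribute names to emit")
    args = p.parse_args(argv)

    root = ET.parse(args.file).getroot()
    for record in extract(root, args.context, args.tag, args.attrs):
        print("\t".join(f"{k}={v}" for k, v in record.items()))
```

Saved as a script that calls main(), an invocation might look like: extract.py doc.xml ".//table[@title='Opcodes']" cell font align (the script and file names are made up). ElementTree supports only a subset of XPath, so criteria beyond simple attribute-equals-value tests would need a fuller XPath engine.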