When you merge two FSMs you often get redundant "don't care" nodes, but you can also get nodes that are either impossible to enter [dead code] or impossible to leave [halt], because no legal transition permits it. Joining FSMs involves identifying and pruning both types of node.
No. Consider the case of just recognizing a decimal digit: compare the graph using the alternation (0|1|2|3|4|5|6|7|8|9) vs the graph using the class [:digit:].
Using the OR alternation, including start, you have 11 nodes: start has 10 exiting transitions, and each digit node has a single entering transition. Using the digit class, you have 2 nodes, with 10 transitions that all get you from start to the digit-class node.
Obviously this is simplistic, because the members of the character class form a subgraph which itself has to be recognized. The important point here is that the subgraph as a whole can represent a /single/ node in a much more complex graph - its constituent characters need not be repeated in the complex graph. More on this below.
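To make the counting concrete, here is a small sketch of the two encodings as dict-based transition maps. The state names ("start", "saw_0", "digit", etc.) are purely illustrative, not anything flex actually uses internally:

```python
digits = "0123456789"

# Alternation (0|1|...|9): each digit gets its own accepting node.
alt = {("start", d): f"saw_{d}" for d in digits}
alt_states = {"start"} | set(alt.values())

# Character class [:digit:]: all ten characters are edges into ONE node.
cls = {("start", d): "digit" for d in digits}
cls_states = {"start"} | set(cls.values())

print(len(alt_states), len(alt))   # 11 states, 10 transitions
print(len(cls_states), len(cls))   # 2 states, 10 transitions
```

Both machines have the same 10 transitions; only the node count differs, which is the point made below about nodes being what cost table space.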
A complex DFA that combines many different regex may present other opportunities to recognize given (possibly arbitrary) sets of characters - opportunities that may not be apparent from looking at the constituent regex.
When given the option to find equivalence classes, flex can identify sets of characters that are used repeatedly. Those characters are gathered into an "equivalence" that then can be a node in the DFA instead of redundantly repeating individual characters.
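The underlying idea is simple: two characters belong to the same equivalence class when every state treats them identically. A rough sketch, using a hypothetical transition table (not flex's internal representation):

```python
# Hypothetical state table: state -> {character: next_state}
table = {
    "start": {"0": "num", "1": "num", "a": "ident", "b": "ident"},
    "num":   {"0": "num", "1": "num", "a": "err",   "b": "err"},
}

def signature(ch):
    # A character's behaviour across ALL states; characters with
    # identical signatures are interchangeable in the DFA.
    return tuple(table[s].get(ch) for s in sorted(table))

classes = {}
for ch in "01ab":
    classes.setdefault(signature(ch), []).append(ch)

print(sorted(classes.values()))   # [['0', '1'], ['a', 'b']]
```

Here '0'/'1' collapse into one class and 'a'/'b' into another, so the tables can be indexed by 2 class numbers instead of 4 characters.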
Remember that DFAs are deterministic - a node can't take different actions depending on which of multiple transitions entered (or left) it ... so if you want the same character to be recognized in a different context (leading to a different action), you must repeat it in a different node.
This is where being able to identify essentially arbitrary sets of characters and coalesce them into a recognizer "class" is useful. If a given set of N(>1) characters is used M times in the graph, then by coalescing them you remove M(N-1) nodes from your graph. The number of /transitions/ in the graph remains the same, but recall that it is the /nodes/ that consume space in the lexer tables.
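The savings arithmetic, spelled out with example numbers (the 10 digits used in 3 places):

```python
N, M = 10, 3                       # class size, number of uses in the graph
nodes_without_class = M * N        # each use spells out all N characters
nodes_with_class = M * 1           # each use is a single class node
saved = nodes_without_class - nodes_with_class
print(saved, M * (N - 1))          # 27 27 -- i.e. M(N-1) nodes removed
```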
You're mixing abstraction levels here: <digit>, <digits>, <number>, and <value> are lexical tokens, whereas <expr> is syntax.
However ...
Yacc and bison CAN handle characters as tokens, but even assuming you have defined <digit> and <digits> elsewhere in your grammar, neither yacc nor bison can find this kind of equivalence. In yacc it will result in a reduce/reduce conflict. In bison what happens depends on the kind of parser you asked for (LALR, SLR, LR, GLR), but in any case the result won't be pretty.
Assuming instead that you meant for <number> and <value> to be recognized by the lexer rather than the parser ... flex (not lex) could discover that <number> and <value> are equivalent, but since they would lead to different actions - returning different tokens - both would be included in the DFA. However, whichever one happened to be tried first would be the only one ever recognized, and your parser would only ever get one of the expected tokens.
Algorithms for turning graphs into table-driven FSMs, or equivalently a switch/case statement, are well known.
Assuming an appropriate graphical IDE, a designer certainly could specify a state graph and code for actions, and have a program generated from it. Given the right input from the designer, a great deal of checking could be done against the graph to verify that it covers enumerated inputs and transitions, that specified inputs lead to specified actions, that action code exists, etc.
But what is NOT possible is to verify that all /possible/ inputs and state transitions have been enumerated. Nor is it possible to verify that required actions have been specified, or necessarily that the actions are being taken in proper context ... those are things for which the tool simply MUST trust the graph designer.
George