Not really ... pull vs push parsing is more like the difference between function call/return and continuation-passing style (CPS): the control path is inverted.
In a push design the lexer is fed characters one at a time and incrementally tries to match its partial input against the recognizer set. When a valid token is identified, that token is fed to the parser, which incrementally tries to match partial token sequences against its rule set. When a rule is matched, the parser executes the associated user code as usual.
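To make the inverted control path concrete, here is a minimal hand-written sketch (not bison/flex output - all names here are invented for illustration). The caller feeds characters one at a time; the lexer pushes completed tokens to the parser; the parser acts when a rule completes. Note that the persistent state on each side is just a few small fields.

```c
#include <assert.h>
#include <stdio.h>

enum tok { TOK_NUM, TOK_PLUS, TOK_END };

/* Parser state persists between activations. */
typedef struct { long sum; } parser_state;

/* Toy "grammar": a sum of integers. A switch, no control loop. */
static void push_parse(parser_state *p, enum tok t, long val) {
    switch (t) {
    case TOK_NUM:  p->sum += val; break;
    case TOK_PLUS: break;             /* nothing to do here */
    case TOK_END:  break;
    }
}

/* Lexer state: an FSM state number plus a partial value. */
typedef struct { int state; long val; parser_state *parser; } lexer_state;

/* Feed one character; returns 0 on success, -1 on a lexical error. */
static int push_lex(lexer_state *lx, int c) {
    if (c >= '0' && c <= '9') {
        lx->state = 1;                        /* "inside a number" */
        lx->val = lx->val * 10 + (c - '0');
        return 0;
    }
    if (lx->state == 1) {                     /* number just ended */
        push_parse(lx->parser, TOK_NUM, lx->val);
        lx->state = 0;
        lx->val = 0;
    }
    if (c == '+') { push_parse(lx->parser, TOK_PLUS, 0); return 0; }
    if (c == EOF) { push_parse(lx->parser, TOK_END, 0); return 0; }
    return -1;                                /* bad character */
}

/* Driver: the surrounding program owns the loop and the input. */
long sum_expr(const char *s) {
    parser_state p = { 0 };
    lexer_state lx = { 0, 0, &p };
    for (; *s; s++)
        if (push_lex(&lx, *s) != 0) return -1;
    push_lex(&lx, EOF);
    return p.sum;
}
```

For example, `sum_expr("12+34")` feeds seven activations through the lexer and yields 46; nothing in the lexer or parser ever asks for more input.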
The only additional complexity vs the normal pull design is that the lexer and parser have to remember the current state of their FSMs between activations. Because the bison/flex FSMs are table driven, in practice that amounts to just a few integer values.
WRT memory use, a push lexer doesn't need any character buffering at all unless it must pass a matched string on to the parser. Input stream buffering (if any) is normally done by the surrounding program code. Character strings passed from lexer to parser need to be heap allocated, but the same is generally true in pull designs. Overall the push design can use less memory.
As far as catching errors early, the lexer is essentially fed one character at a time ... if it can tell that the character is out of place for a particular recognizer context, it can signal the error immediately (either directly or by passing an error token to the parser).
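A tiny sketch of that early-error behavior (again hand-written and hypothetical, not generated code): a push recognizer for C-style hex literals can reject "0xg1" at the 'g' itself, without waiting for any more input.

```c
#include <assert.h>
#include <ctype.h>

typedef struct { int state; } hexlex;   /* FSM state: one int */

/* Feed one character; returns 1 = still matching, 0 = error here. */
static int hex_feed(hexlex *lx, int c) {
    switch (lx->state) {
    case 0:  if (c == '0') { lx->state = 1; return 1; } return 0;
    case 1:  if (c == 'x') { lx->state = 2; return 1; } return 0;
    default: return isxdigit(c) ? 1 : 0;
    }
}

/* Returns the index of the first bad character, or -1 if none. */
int first_error(const char *s) {
    hexlex lx = { 0 };
    for (int i = 0; s[i]; i++)
        if (!hex_feed(&lx, (unsigned char)s[i])) return i;
    return -1;
}
```

So `first_error("0xg1")` reports position 2 the moment the 'g' is fed in - the caller can diagnose it immediately or hand an error token to the parser.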
Yacc and lex are severely outdated - they are now kept only for backward compatibility. Bison and flex are now standard issue on nearly every Unix/Linux system. Besides, anyone likely to understand an LALR grammar well enough to maintain or modify the code probably already knows about them.
Unless the language[*] is simple or the target platform extremely limited, a hand-crafted parser is almost never the right way to go. Good compiler-compiler tools like bison, Yacc++, ANTLR, LLgen, etc. have sound theory behind them, and their results are predictable and reproducible. For non-trivial grammars, generated parsers are likely to be both smaller and faster than most people could hand-craft.
[*] A comm protocol is a language just as a Turing-complete programming language is; the only difference is complexity.

Lexers are a different story. Lexers are relatively easy, and when the set of recognizers is limited, table-driven regex matching can be improved upon by hand coding. Quite a few real-world systems use a generated parser with a hand-coded lexer.
Most use of the CPU stack comes from user-added code. FSM stack use in both tools is actually very modest, as the generated code is basically a loop containing a big switch statement - there is no recursion. Yacc's PDA stack is optionally a static array or a heap allocation, and each stack element is quite small: a couple of integer indexes and a "token" value (which by default is an integer but can be a user-defined union or structure).
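Roughly modeled in C (an illustrative assumption about layout, not bison's actual generated code - bison in fact keeps parallel stacks), one PDA stack element is on the order of a state index plus the semantic value:

```c
#include <assert.h>

/* Hypothetical model of a semantic value, like a yacc %union. */
typedef union { int ival; double dval; char *sval; } YYSTYPE_like;

/* Hypothetical model of one LALR stack element. */
typedef struct {
    short        state;   /* automaton state index */
    YYSTYPE_like value;   /* semantic value for this symbol */
} stack_elem;
```

Even with a pointer-bearing union, that is around 16 bytes per element on a 64-bit machine, so even a deep static stack costs little.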
Push parsers/lexers created by bison/flex are just big switch statements - they don't need the control loop.
I'm not sure what you mean by that - yacc/bison and (f)lex don't have any "configuration" problems that aren't shared by any multiple-input-file project. However, there are better tools that will generate both parser and lexer from a single grammar file.
I still think you should look into using a push parser.
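If you do, the caller-side shape looks like the fragment below. Push mode is enabled in the grammar with `%define api.push-pull push`; `yypstate_new`, `yypush_parse`, `YYPUSH_MORE`, and `yypstate_delete` come from the bison-generated parser (so this fragment doesn't compile standalone), and `get_next_token()` is a hypothetical stand-in for however your program obtains tokens:

```c
#include "parser.h"   /* bison-generated header */

void drive(void) {
    yypstate *ps = yypstate_new();
    int status;
    do {
        YYSTYPE lval;
        int tok = get_next_token(&lval);   /* hypothetical source */
        status = yypush_parse(ps, tok, &lval);
    } while (status == YYPUSH_MORE);       /* parser wants more */
    yypstate_delete(ps);
}
```

Note the loop lives in your code, not the parser's - which is exactly the control inversion described above.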
George