Please provide some evidence for this assertion:
"The price you pay is very large: unmaintainable, unreadable code which is probably an order of magnitude larger than proper RTL."
This coding style, which you so clearly denigrate as subpar, is actually quite standard in high-end chip development. Some reasons:
1) It is much easier to swap flop models, because no one is allowed to write their own always @ flop blocks. Replacing the flop library is as simple as changing an include.
Attempting to do this in "proper RTL" is what is actually "unmaintainable". Porting a design from one technology library to the next will show you that these coding standards are *necessary* for this type of work. Your "proper RTL" style is inflexible by comparison.
2) The synthesis tool does not care. Whether you write an inverter going into a flop or code up a flop with an inverting input, the synthesizer doesn't care where you placed the code. The final result is exactly the same.
If you had followed these coding guidelines, swapping out this flop can be done by tweaking the include path for the library. If you had not, you would have to visit every line of code in your design to see whether or not each always block is actually a flop, and then recode it by hand. (Good if you get paid by the hour, not so good for your employer.)
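A minimal sketch of the idea, assuming a hypothetical library module `lib_dff` kept in a hypothetical file `flop_lib.vh` (neither name comes from any real library):

```verilog
// Sketch only: lib_dff and flop_lib.vh are hypothetical names.
// In a real flow this module would live in the library file and the
// design would pull it in with an include of flop_lib.vh.
// It is shown inline here so the example is self-contained.
module lib_dff (input wire clk, input wire d, output reg q);
  always @(posedge clk)   // the only always block, owned by the library
    q <= d;
endmodule

// Design code: instances and assigns only.
module inv_stage (input wire clk, input wire in, output wire out);
  wire d_n;
  assign d_n = ~in;                            // inverter ahead of the flop
  lib_dff u_ff (.clk(clk), .d(d_n), .q(out));  // library flop instance
endmodule
```

Moving to another technology would then mean supplying a different definition of `lib_dff` through the include path, with `inv_stage` left untouched.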
3) Rebalancing logic across flop boundaries is easier when the logic is separate from the flop. In the diagram below, the X's are flops and a, b, c are assign wires:
Before: X1 --> a --> b --> c --> X2
After:  X1 --> a --> b --> X2 --> c
The only changes needed are that the input to X2 becomes b instead of c, and the input to c becomes the output of X2 instead of the output of b.
Using "proper RTL", you might have coded a, b, c inside an always block. You then need to create more wires or modify an always block to pull this logic out and hook it up. After you have applied your timing fix, the code is larger.
Worse yet, you may have made a mistake. These things tend to happen: when you recoded this block, you may have left some path out and turned it into a latch. These types of mistakes are not possible in an instance -- assign -- assign -- instance methodology, because "always @(posedge...)" is not allowed in your code; it belongs inside a library.
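The X1/X2 rebalancing can be sketched as follows. The flop `lib_dff`, the 8-bit width, and the particular functions chosen for a, b, c are all made up for illustration; only the two connection changes matter.

```verilog
// Hypothetical library flop (illustrative name, not a real library cell).
module lib_dff #(parameter W = 1)
               (input wire clk, input wire [W-1:0] d, output reg [W-1:0] q);
  always @(posedge clk) q <= d;
endmodule

// Before: X1 --> a --> b --> c --> X2
module pipe_before (input wire clk, input wire [7:0] in, output wire [7:0] out);
  wire [7:0] x1_q, a, b, c;
  lib_dff #(.W(8)) X1 (.clk(clk), .d(in), .q(x1_q));
  assign a = x1_q + 8'd1;   // placeholder logic
  assign b = a << 1;        // placeholder logic
  assign c = b ^ 8'h5A;     // placeholder logic
  lib_dff #(.W(8)) X2 (.clk(clk), .d(c), .q(out));
endmodule

// After: X1 --> a --> b --> X2 --> c
module pipe_after (input wire clk, input wire [7:0] in, output wire [7:0] out);
  wire [7:0] x1_q, a, b, x2_q, c;
  lib_dff #(.W(8)) X1 (.clk(clk), .d(in), .q(x1_q));
  assign a = x1_q + 8'd1;
  assign b = a << 1;
  lib_dff #(.W(8)) X2 (.clk(clk), .d(b), .q(x2_q));  // X2 input: was c, now b
  assign c = x2_q ^ 8'h5A;                           // c input: was b, now X2 output
  assign out = c;
endmodule
```

In this sketch c depends only on the pipelined value, so both versions produce the same value on `out` in the same cycle. If c had other inputs, they would need to be delayed to match. With no always block in the design code, there is no incomplete assignment that could infer a latch.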
Now you may say, "But I am smarter than that!" Well, that is nice for you. But when setting up a coding standard that needs to be used by hundreds of engineers and verified by tools, ad-hoc methods of "proper RTL" get left in the dust behind rigid standards that prevent bad stuff from happening in the first place.
Please consider that this chip was probably designed by a group of well over 100 engineers, and that many compilers, synthesis tools, and other tools needed to manipulate this code and get meaningful information from it. Having each engineer write in what you describe as "proper RTL" style is not acceptable in these situations. It is not flexible enough (you have to add lines of code just to make timing fixes), it is error prone (you can write logic that is not possible or available in your library), and it doesn't get you *any* better results.
I fail to see any benefit from using your "proper RTL" style. If there are some that would offset the costs I have listed above, I am open to reconsidering. I realize that if you have not been exposed to these ideas before, they may sound like problems you have not faced. But these problems are common among large-scale ICs that need to be taped out in many technologies and go through extensive ECO timing fixes to achieve maximum performance.
-Art