Mike,
If the input common-source (CS) transistor sees a load of almost 0 ohms (the input impedance of the transimpedance pair, provided the loop gain is high enough), the time constant at that node becomes very small. Also, because this first stage has essentially no voltage gain, Cgd1 is not multiplied by the Miller effect when reflected to the input. The CS input device has been unilateralized, and its output behaves much like an ideal current source.
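To put rough numbers on this, here is a back-of-the-envelope sketch in Python. Every value in it (gm1, Cgd1, the node capacitance, the feedback resistor Rf and the amplifier gain A) is an assumption picked only for illustration, and the TIA input resistance is approximated with the simple ideal shunt-feedback model Rin = Rf / (1 + A). It compares the CS stage loaded by the TIA input against the same stage loaded by a plain 10 kohm resistor.

```python
# All component values below are illustrative assumptions, not taken from a real design.

def tia_input_resistance(rf, a):
    """Input resistance of an idealized shunt-feedback TIA: Rf / (1 + A)."""
    return rf / (1.0 + a)

def miller_input_cap(cgd, av_mag):
    """Cgd reflected to the input by the Miller effect: Cgd * (1 + |Av|)."""
    return cgd * (1.0 + av_mag)

gm1 = 1e-3       # transconductance of M1, siemens (assumed)
cgd1 = 10e-15    # gate-drain capacitance of M1, farads (assumed)
c_node = 50e-15  # total capacitance at the drain of M1, farads (assumed)
rf = 10e3        # TIA feedback resistor, ohms (assumed)
a = 100.0        # voltage gain of the amplifier inside the TIA (assumed)

# Case 1: M1 drives the low-impedance TIA input.
r_in = tia_input_resistance(rf, a)
av1 = gm1 * r_in                       # first-stage voltage gain magnitude
tau = r_in * c_node                    # time constant at the drain of M1
c_miller = miller_input_cap(cgd1, av1)

# Case 2: M1 drives a conventional resistive load of the same value as Rf.
r_load = 10e3
av_conv = gm1 * r_load
tau_conv = r_load * c_node
c_miller_conv = miller_input_cap(cgd1, av_conv)

print(f"TIA load:      |Av1| = {av1:.3f}, tau = {tau:.3e} s, Cin(Miller) = {c_miller:.3e} F")
print(f"Resistor load: |Av1| = {av_conv:.3f}, tau = {tau_conv:.3e} s, Cin(Miller) = {c_miller_conv:.3e} F")
```

With these assumed numbers the first-stage gain stays well below 1, so Cgd1 appears at the input almost unmultiplied, and the drain-node time constant drops by about a factor of (1 + A) compared with the resistive load.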
Then, if (BIG IF) the second and third transistors can be modeled as a TIA with gain Rm, the total gain at midband frequencies is (very) approximately gm1*Rm.
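A quick numeric check of the gm1*Rm estimate, again as a sketch only. It assumes the M2/M3 pair behaves like an idealized shunt-feedback TIA with feedback resistor Rf and internal voltage gain A, so that |Rm| = Rf * A / (1 + A). Rf, A and gm1 are made-up illustrative values, and only magnitudes are computed (signs are ignored).

```python
# Illustrative values only; Rf, A and gm1 are assumptions, not design data.

def tia_transresistance(rf, a):
    """Magnitude of the closed-loop transresistance of an idealized TIA."""
    return rf * a / (1.0 + a)

def midband_gain(gm1, rm):
    """Magnitude of the overall midband voltage gain: gm1 * Rm."""
    return gm1 * rm

gm1 = 1e-3   # transconductance of M1, siemens (assumed)
rf = 10e3    # TIA feedback resistor, ohms (assumed)
a = 100.0    # voltage gain of the amplifier inside the TIA (assumed)

rm = tia_transresistance(rf, a)
gain = midband_gain(gm1, rm)
gain_ideal = gm1 * rf   # limit for infinite A, where Rm -> Rf

print(f"Rm = {rm:.1f} ohm, |gain| = {gain:.3f}, ideal limit = {gain_ideal:.3f}")
```

The finite gain A makes the result fall slightly short of gm1*Rf, which is one reason the estimate is only "(very) approximate".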
The problem has now been transferred to the proper design of the M2/M3 combination. Yet another twist would be to use M2 and M3 as a cascode pair instead of a CS-CS cascade.
Here is my second attempt at ASCII art of the AC circuit.
Thanks, Jure Z.