I have a design with a large PLA, and I'm trying to make it run fast in a Spartan-3E. It's too big for block RAM: it has 25 inputs, slightly fewer than 512 product terms, and 32 outputs, so a direct lookup table would need 2^25 words of 32 bits each.
My first attempt was to just translate the PLA equations to VHDL and synthesize the result with the default settings. This uses 34% of a 3S500E, and has a minimum cycle time of slightly under 12 ns with over 20 levels of logic. If I turned on timing-directed mapping, or increased the effort levels, I suppose it might get slightly better. But my own analysis suggests that it should be possible to implement the PLA with no more than 11 levels of logic in the worst case: 7 levels for the product terms and 4 levels for the sums.
Anyhow, as the subject line suggests, I'm interested in any clever tricks to get a more efficient FPGA implementation of the PLA. For instance, would use of the carry chain speed up wide gates? (Or do the tools already infer that?) Should I put some constraint on the product terms, to keep the tools from merging the product-term and sum logic?
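To make that last question concrete, here is a minimal sketch of the kind of constraint meant. It is a small Python script that emits a two-level VHDL description with every product term as a named signal carrying a KEEP attribute. Everything in it is an assumption for illustration, not the actual design: the function name `pla_to_vhdl`, the port names `a` and `y`, the signal name `p`, the espresso-style row format, and above all the idea that an XST-style KEEP attribute is the right way to hold the two levels apart. Whether it helps timing at all is untested.

```python
# Hypothetical helper: emit a two-level (AND plane / OR plane) VHDL
# description from PLA rows. Each row is (cube, mask), espresso-style:
#   cube: one char per input;  '1' = true literal, '0' = inverted,
#         '-' = don't care. Character i corresponds to input a(i).
#   mask: one char per output; '1' = this term feeds the OR for y(j).

def pla_to_vhdl(rows, n_in, n_out, entity="pla"):
    """Return VHDL text for a two-level AND/OR description of the rows."""
    out = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "",
        f"entity {entity} is",
        f"  port (a : in  std_logic_vector({n_in - 1} downto 0);",
        f"        y : out std_logic_vector({n_out - 1} downto 0));",
        "end entity;",
        "",
        f"architecture rtl of {entity} is",
        f"  signal p : std_logic_vector({len(rows) - 1} downto 0);",
        "  -- Assumption: an XST-style KEEP attribute on the product terms,",
        "  -- intended to stop them being absorbed into the sum logic.",
        "  attribute keep : string;",
        '  attribute keep of p : signal is "true";',
        "begin",
    ]

    # AND plane: one named signal per product term.
    for k, (cube, mask) in enumerate(rows):
        if len(cube) != n_in or len(mask) != n_out:
            raise ValueError(f"row {k} has the wrong width")
        lits = [f"a({i})" if c == "1" else f"not a({i})"
                for i, c in enumerate(cube) if c in "01"]
        term = " and ".join(lits) or "'1'"   # empty cube is always true
        out.append(f"  p({k}) <= {term};")

    # OR plane: each output is the OR of the terms that select it.
    for j in range(n_out):
        terms = [f"p({k})" for k, (_, mask) in enumerate(rows)
                 if mask[j] == "1"]
        expr = " or ".join(terms) or "'0'"   # unused output is constant 0
        out.append(f"  y({j}) <= {expr};")

    out.append("end architecture;")
    return "\n".join(out)


if __name__ == "__main__":
    # Tiny made-up example: 3 inputs, 2 product terms, 2 outputs.
    print(pla_to_vhdl([("1-0", "10"), ("-1-", "11")], n_in=3, n_out=2))
```

Generating the VHDL from the PLA table, rather than writing it by hand, would make it cheap to try variants (with and without the attribute, or with the AND plane split into explicit sub-terms) and compare the reported levels of logic.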
I'll experiment with this myself, but perhaps someone here has already done this, in which case suggestions would be quite welcome.
Thanks, Eric