Hello Leon,
Register packing refers to the use of the LUT and FF in a Logic Element (LE) at the same time. There are four ways this can happen in Cyclone II FPGAs (and Cyclone, Cyclone III, and Stratix device architectures). Refer to Figure 2-2 in the handbook.
1. The simplest example is a LUT whose output feeds a register -- you can place the register in the same LE, absorbing the connection.
2. The next, which the databook defines as "register packing", is the use of a LUT and a register for independent functions. This means using up to a 3-input LUT function plus an independent FF. Or, if you happen to have a 4-LUT that shares an input with a FF, you could pack those together. The downside of these packings is that the software must be careful in what it chooses to pack -- taking two things that have nothing in common and forcing them into the same spot in the chip could result in poor performance.
3. A FF that drives a LUT. Notice the FF can directly drive the third input of the LUT, so the software can choose to pack a FF with its fanout instead of its fanin.
4. If you have FFs in a register chain, this is a special version of (2). The FF can receive its input directly from the FF above it, while the 4-LUT is used for something else.
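As a rough illustration, the four cases can be written as a toy classifier. This is a simplified sketch under my own assumptions, not how the Quartus II Fitter actually decides: a LUT is modelled as its output name plus a set of input signal names, an FF as its output name plus the signal on its D input, and the hypothetical from_chain flag marks an FF fed from the register chain. Clocks, control signals, carry chains and routing are ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Lut:
    name: str                 # name of the LUT's output signal
    inputs: FrozenSet[str]    # signals driving the LUT (at most 4 fit an LE)


@dataclass(frozen=True)
class Ff:
    name: str                 # name of the FF's output signal
    d: str                    # signal driving the FF's data input
    from_chain: bool = False  # hypothetical flag: D comes from the FF above


def packing_case(lut: Lut, ff: Ff) -> Optional[int]:
    """Return which of the four cases lets lut and ff share one LE, else None.

    Toy model only: ignores clocks, control signals, carry chains and routing.
    """
    if len(lut.inputs) > 4:
        return None  # the function does not fit in a 4-input LUT
    if ff.d == lut.name:
        return 1     # LUT output feeds the FF: absorb the connection
    if ff.name in lut.inputs:
        return 3     # FF output feeds the LUT: pack the FF with its fanout
    if ff.from_chain:
        return 4     # D arrives on the register chain, so all 4 LUT inputs are free
    if len(lut.inputs) <= 3 or ff.d in lut.inputs:
        return 2     # independent functions: <=3-input LUT, or 4-LUT sharing D
    return None      # 4 LUT inputs plus an unrelated D input: no room in this model
```

The related cases (1 and 3) are checked first, and case 4 is checked before case 2 because it is the register-chain special form of 2. For example, packing_case(Lut("a", frozenset({"x", "y"})), Ff("q", d="a")) returns 1.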
This architectural flexibility requires smart software (Quartus II) to work well. There are register packing options that let you control how aggressively the tool tries to pack logic with registers -- but usually you don't have to touch them. The default Auto mode tells the software to pack as much as it thinks it needs to. In practice, it will usually pack logic and registers that want to be together (cases 1 and 3), and might use cases 2 and 4 opportunistically. However, if Quartus notices that you're running out of LEs, it will start doing more of the case 2 and 4 packings. One consequence is that your "push-button" LE count doesn't reflect the minimum number of LEs your design requires.
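The effect on the reported LE count can be sketched with some simple arithmetic. This is a hypothetical model of the Auto behaviour described above, not Quartus II's actual heuristic: it assumes every related pair (cases 1 and 3) is always packed, ignores opportunistic case 2 and 4 packings, packs unrelated pairs only when the design would otherwise exceed the device capacity, and counts each packing as saving exactly one LE.

```python
def auto_le_count(luts: int, ffs: int, related: int,
                  unrelated: int, capacity: int) -> int:
    """Estimate LEs used under a simplified, hypothetical 'Auto' packing policy.

    luts, ffs -- number of LUT functions and registers in the design
    related   -- LUT/FF pairs that want to be together (cases 1 and 3)
    unrelated -- candidate independent pairings (cases 2 and 4)
    capacity  -- LEs available in the target device
    """
    # Unpacked, each LUT and each FF needs its own LE; each related pair saves one.
    count = luts + ffs - related
    # Independent pairs can only use LUTs and FFs left over after related packing.
    possible = min(unrelated, luts - related, ffs - related)
    # Pack unrelated pairs only as far as needed to fit (or until none remain).
    extra = min(possible, max(0, count - capacity))
    return count - extra
```

With 1000 LUTs, 800 FFs, 600 related pairs and 200 unrelated candidates, a roomy device (capacity 2000) reports 1200 LEs, while a 1100-LE device still fits at 1100. In this toy design the push-button count of 1200 overstates the 1000-LE minimum.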
A somewhat out-of-date explanation of the architecture and results can be found in a CICC paper I wrote.
Since then, the wizards on the Fitter team have significantly improved Quartus II's ability to pack registers without impacting design performance.
To answer your second question, look again at Figure 2-2. The FF data input comes from a bunch of goo -- but if you are using the "packed register" mode as defined in the databook (my #2 above), you can see that you bypass the synchronous load logic. So a register that uses a synchronous load cannot be packed with an independent LUT.
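The restriction can be written as a small predicate. This sketch assumes a simplified view of the packed-register mode (case 2 above): the LUT may need at most 3 inputs, or 4 if one of them is also the FF's data input, and the FF must not use the synchronous load, since that mode bypasses the sload logic. The function and parameter names are hypothetical.

```python
def can_pack_independent(lut_inputs: int, shares_ff_data: bool,
                         ff_uses_sload: bool) -> bool:
    """True if an FF and an unrelated LUT could share an LE in packed-register mode."""
    if ff_uses_sload:
        # Packed-register mode feeds D around the synchronous load logic,
        # so an FF that needs sload cannot be packed with an independent LUT.
        return False
    # In this model, a LUT with at most 3 inputs leaves one LE input for D;
    # a 4-input LUT fits only if D is one of its inputs.
    return lut_inputs <= 3 or (lut_inputs == 4 and shares_ff_data)
```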
Regards,
Paul Leventis
Altera Corp.