[On multiplexing the readback values from numerous addressable registers, without the HDL code becoming a dog's dinner]
Mike's and Symon's responses got me thinking some more (a rare occurrence these days) and I came up with a couple of ideas that are probably well-known to half the population of comp.arch.fpga but are new to me.
Idea 1: Wide, extensible readback mux.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am pretty sure that, for FPGA architectures at least, it's more efficient to do a wide OR than a wide MUX. In other words, as I suggested earlier, make sure that all deselected registers jam their data outputs to zero, and then OR together all the data outputs.

Here's a rather neat solution to the wide OR gate - much nicer than the ripple thing I suggested earlier. It depends on the use of an unconstrained array port of record types, so you'll need to check it works with your chosen synthesis tool.
-- Step 1: Package to define some types and constants.
-- We define a record "T_gated_databus" to reflect the
-- readback data coming out of a register.  The "data"
-- element is, of course, the data; "enable" is a
-- single-bit enable signal that's asserted when the
-- register is addressed.
--
library ieee;
use ieee.std_logic_1164.all;
package P_databus is
  --
  constant databus_width : positive := 32;
  subtype T_databus is std_logic_vector(databus_width-1 downto 0);
  --
  type T_gated_databus is record
    enable : std_logic;
    data   : T_databus;
  end record;
  constant unused_databus: T_gated_databus :=
    ( enable => '0' , data => (others => '0') );
  --
  type A_gated_databus is array(natural range <>) of T_gated_databus;
end;
-- Step 2: Make an arbitrarily wide OR structure using
-- an unconstrained array input port.  Feed it as many
-- T_gated_databus records as you have registers.
-- At most one of those will have its enable asserted
-- at any given time.  The output y.enable is asserted
-- when one of the input enables is asserted.
--
library ieee;
use ieee.std_logic_1164.all;
use work.P_databus.all;
entity radialmux is
  port (
    d: in  A_gated_databus ;
    y: out T_gated_databus
  );
end;
architecture RTL of radialmux is
begin
  process (d)
    variable vy: T_gated_databus;
  begin
    vy := unused_databus;
    for i in d'range loop
      if d(i).enable = '1' then
        vy.enable := '1';
        vy.data := vy.data or d(i).data;
      end if;
    end loop;
    y <= vy;
  end process;
end;

-- (Part of the original post is missing at this point.
-- The instantiation below is in the top-level architecture;
-- the subscript on regA_output is presumed from the d(2)
-- that follows it.)
Readback_Mux: entity work.radialmux
  port map (
    d(1) => regA_output ,
    d(2) => regB_output ,
    y    => CPU_readback );
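For completeness, here is a minimal sketch of the kind of register-containing entity that could drive regA_output or regB_output. It is not from the original post: the entity name "readback_reg", the generic "my_address", the 8-bit address bus and the write port names are all assumptions made for illustration. The only thing it needs to get right is the contract the mux relies on: assert "enable" when addressed, and jam "data" to zero otherwise.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.P_databus.all;

-- Hypothetical register block; names and bus widths are assumptions.
entity readback_reg is
  generic (
    my_address : natural                   -- this register's address
  );
  port (
    clk      : in  std_logic;
    addr     : in  unsigned(7 downto 0);   -- assumed 8-bit address bus
    wr_en    : in  std_logic;
    wr_data  : in  T_databus;
    readback : out T_gated_databus         -- feeds one element of radialmux.d
  );
end;

architecture RTL of readback_reg is
  signal q        : T_databus := (others => '0');
  signal selected : std_logic;
begin
  -- Address decode: the address appears here and nowhere else.
  selected <= '1' when to_integer(addr) = my_address else '0';

  -- Simple writable register.
  process (clk)
  begin
    if rising_edge(clk) then
      if selected = '1' and wr_en = '1' then
        q <= wr_data;
      end if;
    end if;
  end process;

  -- Readback: all-zero data when deselected, so a plain wide OR
  -- of every register's output yields the addressed register.
  readback.enable <= selected;
  readback.data   <= q when selected = '1' else unused_databus.data;
end;
```

An instance would then look something like "RegA: entity work.readback_reg generic map (my_address => 46) port map (..., readback => regA_output);", with the address given once, on the instance.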
Addresses are applied once only, as generics on the instances of register-containing entities. If you add another register-containing entity, you simply add another signal to the top level architecture and bolt it in to the port map of Readback_Mux, which then grows wider to suit the extended port map. You don't even need the numbering to be contiguous: I tried this...
Readback_Mux: entity work.radialmux
  port map (
    d(0 to 45)  => (others => unused_databus) ,
    d(46)       => regA_output ,
    d(47 to 62) => (others => unused_databus) ,
    d(63)       => regB_output ,
    y           => CPU_readback );
So you can make the port subscripts match up with your register addresses, if it makes you feel better. Because "unused_databus" is an all-zero constant, synthesis optimises away the zero inputs.
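If you want to convince yourself in simulation that the padded inputs really are harmless, a small self-checking testbench along these lines should do. This is only a sketch: the testbench name, the test values and the timing are my own choices, and it reuses the sparse port map above, so it inherits the same dependence on your tool accepting that style of association. The hex literals assume databus_width is left at 32.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.P_databus.all;

-- Hypothetical testbench for radialmux; not part of the original post.
entity radialmux_tb is
end;

architecture sim of radialmux_tb is
  signal regA_output  : T_gated_databus := unused_databus;
  signal regB_output  : T_gated_databus := unused_databus;
  signal CPU_readback : T_gated_databus;
begin

  Readback_Mux: entity work.radialmux
    port map (
      d(0 to 45)  => (others => unused_databus) ,
      d(46)       => regA_output ,
      d(47 to 62) => (others => unused_databus) ,
      d(63)       => regB_output ,
      y           => CPU_readback );

  stimulus: process
    -- Arbitrary test patterns; these assume databus_width = 32.
    constant A_value : T_databus := x"12345678";
    constant B_value : T_databus := x"9ABCDEF0";
  begin
    -- Nothing addressed: output should be the all-zero record.
    wait for 10 ns;
    assert CPU_readback = unused_databus
      report "idle readback is not all-zero" severity error;

    -- Address register A only.
    regA_output <= (enable => '1', data => A_value);
    wait for 10 ns;
    assert CPU_readback.enable = '1' and CPU_readback.data = A_value
      report "readback of register A failed" severity error;

    -- Address register B only.
    regA_output <= unused_databus;
    regB_output <= (enable => '1', data => B_value);
    wait for 10 ns;
    assert CPU_readback.enable = '1' and CPU_readback.data = B_value
      report "readback of register B failed" severity error;

    -- Back to idle.
    regB_output <= unused_databus;
    wait for 10 ns;
    assert CPU_readback = unused_databus
      report "readback did not return to all-zero" severity error;

    report "radialmux_tb finished" severity note;
    wait;
  end process;

end;
```

The stimulus drives the T_gated_databus records directly rather than going through real registers, so it checks only the mux and the padding, not any address decoding.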
Your mileage may vary, but I think this shows promise. The synth tool I tried made a really excellent job of this, using a tree of LUTs in the obvious optimal way. It didn't, though, use carry chains - sorry Symon!