It seems you want to count the 1s in a 150-bit-wide word that arrives every 100 ns (10 MHz).
Here is how I would do it: Use 6 or 7 dual-ported BlockRAMs as LUTs. Each BlockRAM is used as a ROM organized as 4k x 4, i.e. with 12 address bits and 4 output bits per port. Each ROM location stores the number of ones in its own address. One port takes care of 12 inputs, and since the BlockRAM is dual-ported, each BlockRAM takes care of 24 inputs and generates two independent 4-bit outputs. Six BlockRAMs thus cover 144 inputs and generate 12 independent 4-bit binary numbers in less than 4 ns; the remaining 6 bits of the 150-bit word go to a seventh BlockRAM or are handled separately. The remaining 96 ns can be spent in simple adder structures, or in a 12-step sequential accumulator running at, say, 200 MHz.

Peter Alfke, Xilinx Applications
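For readers who want to check the arithmetic, here is a minimal software model of the lookup scheme described above. It is a Python sketch, not HDL and not the poster's code; the names `ROM`, `slice_counts`, and `count_ones` are made up for illustration, and the final `sum` simply stands in for whichever adder tree or sequential accumulator is used in hardware.

```python
# Software model of the ROM-based ones counter (illustrative only).
SLICE_BITS = 12    # address width of one 4k x 4 ROM port
WORD_BITS = 150    # width of the incoming word

# ROM contents: entry a holds the number of 1s in the 12-bit address a.
# The largest value is 12, which fits in the 4 output bits.
ROM = [bin(a).count("1") for a in range(1 << SLICE_BITS)]

def slice_counts(word):
    """Return the per-slice counts, one ROM lookup per 12-bit slice."""
    mask = (1 << SLICE_BITS) - 1
    n_slices = -(-WORD_BITS // SLICE_BITS)  # ceil(150 / 12) = 13 lookups
    return [ROM[(word >> (i * SLICE_BITS)) & mask] for i in range(n_slices)]

def count_ones(word):
    """Add up the partial counts (adder tree or accumulator in hardware)."""
    return sum(slice_counts(word))
```

Thirteen lookups are needed for 150 bits: twelve full 12-bit slices (six dual-ported BlockRAMs) plus one partial slice for the last 6 bits.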