Why is it difficult to understand or modify?
It is just a counter that gets loaded with (delay-2), counts down until it reaches -1, and is then reloaded. The load value can be changed at any time and takes effect the next time the counter reaches the terminal count of -1. (This down counter has the advantage of not requiring any decoding, and of having an easily adjusted modulus.) The counter output becomes the address to the memory. I think it is pretty straightforward to comprehend; do you find it otherwise? The only difference is being a little clever in the counting and taking advantage of the read-before-write capability of the memory.

As far as modifying it goes, I'm not sure I see the difficulty there either. The modulus is easy to change, and in particular can easily be changed dynamically. You can easily change the width of the counter for different aspect ratios on the memory, for instance if you wanted to delay a 9 bit signal by a delay not to exceed 2048 clocks. Beyond that, I'm not sure what modifications you'd want to make, or for that matter what modifications wouldn't be just as difficult in the two port scheme you offered.

As far as the unused port goes, you can leave it unused if you want. Or, if the depth of the shift register is less than half the memory depth, you can set the high order address bit to '0' on the shift register side and to '1' on the spare port, and then use the spare port side as a 9K bit memory for anything your heart desires. Unfortunately, the tools can't figure that out for you, so you have to instantiate the memory (and initialize it if needed). You can also use the second port if you need a wide shift register delay, like you might use for a digital filter.
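
To make the counter scheme concrete, here is a rough behavioral model in Python. It is only a sketch, not the actual HDL: the class name, the method names, and the 11 bit default address width (2048 locations) are my own choices for illustration. I am also assuming the memory address is simply the low bits of the counter, so that -1 lands on the top location. The model covers only the counter and the read-before-write access; any extra output register latency in a real block RAM is not modeled.

```python
class DownCounterDelayLine:
    """Behavioral sketch of a RAM delay line addressed by a down counter.

    Hypothetical model for illustration only: names and the default
    address width are assumptions, not taken from any real design.
    """

    def __init__(self, delay, addr_bits=11):
        depth = 1 << addr_bits
        if not 1 <= delay <= depth:
            raise ValueError("delay must be between 1 and the memory depth")
        self.mask = depth - 1          # keeps the low address bits of the counter
        self.mem = [0] * depth         # memory contents, initialized to zero
        self.load = delay - 2          # reload value, as described in the post
        self.count = self.load         # counter runs load, load-1, ..., 0, -1

    def set_delay(self, delay):
        """Change the load value; it takes effect at the next terminal count."""
        if not 1 <= delay <= self.mask + 1:
            raise ValueError("delay must be between 1 and the memory depth")
        self.load = delay - 2

    def tick(self, sample):
        """One clock: read the old word, write the new one, step the counter."""
        addr = self.count & self.mask  # assumed mapping: -1 becomes the top address
        out = self.mem[addr]           # read before write: old contents come out
        self.mem[addr] = sample        # new sample replaces them
        if self.count == -1:           # terminal count reached: reload
            self.count = self.load
        else:
            self.count -= 1
        return out
```

In this model the counter visits (load+2) addresses per cycle, so a sample written on one tick comes back out that many ticks later. Calling `set_delay` in mid-cycle does nothing until the counter next reaches -1, which is the dynamic modulus change described above.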