Hi Swapnajit,
Thank you very much for your input.
You understood correctly: I want only the last (merge) step of merge sort. For the sorting itself I am using a parallel VHDL implementation of insertion sort. I know it consumes a lot of LEs on my FPGA, but I found it well suited to my application.
In each clock cycle I need to sort two sequences, a task for which insertion sort is a good fit, and the sorted sequences are available in the next clock cycle. I now want to merge these two sorted sequences, and my latency budget allows only one more clock cycle. So I am looking for a similarly parallel implementation of the merge step, one that produces the single sorted sequence in the very next clock cycle.
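One way to merge in a single combinational stage is a rank-based merge (a technique I am suggesting here, not something from the thread). Every element's output position is computed independently as its index in its own sequence plus the number of elements in the other sequence that must precede it. In hardware, each rank would be a bank of comparators feeding a population count, and each output slot a multiplexer selecting the element whose rank matches it. Below is a minimal Python model of that idea, not VHDL. The name `rank_merge` is hypothetical, and ties are assumed to be broken in favour of the first sequence so that every rank is unique.

```python
def rank_merge(a, b):
    """Merge two sorted lists by computing every output position independently.

    Each rank depends only on comparisons against the *other* list, so in
    hardware all ranks could be evaluated in parallel in one combinational stage.
    """
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):
        # Elements of b strictly smaller than x go before it (ties: a wins).
        rank = i + sum(1 for y in b if y < x)
        out[rank] = x
    for j, y in enumerate(b):
        # Elements of a smaller than or equal to y go before it.
        rank = j + sum(1 for x in a if x <= y)
        out[rank] = y
    return out


print(rank_merge([1, 4, 7, 9], [2, 3, 8, 10]))  # [1, 2, 3, 4, 7, 8, 9, 10]
```

The cost is roughly 2·m·n comparators for sequences of lengths m and n. The critical path is one comparator, an adder tree and a multiplexer, so it may fit in one cycle for small sequence lengths.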
However, I doubt that the hypercube implementation will help here. The compare-exchange method, on the other hand, does seem useful in this case.
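To show how compare-exchange applies, here is a Python model of Batcher's odd-even merge network, which merges two sorted sequences using only compare-exchange operations. This is a sketch, not a VHDL design. It assumes both sequences have the same power-of-two length, and the names `oddeven_merge`, `to_stages` and `merge_network` are hypothetical. The comparators are grouped into stages whose comparators touch disjoint positions, so each stage maps to a row of comparator/mux pairs operating in parallel.

```python
def oddeven_merge(lo, hi, r):
    """Yield compare-exchange index pairs of Batcher's odd-even merge on lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)      # merge the "even" subsequence
        yield from oddeven_merge(lo + r, hi, step)  # merge the "odd" subsequence
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))  # final fix-up
    else:
        yield (lo, lo + r)


def to_stages(pairs):
    """Group comparators into parallel stages (each at the earliest stage its inputs allow)."""
    depth, stages = {}, []
    for i, j in pairs:
        d = max(depth.get(i, 0), depth.get(j, 0))
        if d == len(stages):
            stages.append([])
        stages[d].append((i, j))
        depth[i] = depth[j] = d + 1
    return stages


def merge_network(a, b):
    """Merge two sorted lists of equal power-of-two length with compare-exchange stages."""
    n = len(a)
    assert n >= 1 and n == len(b) and (n & (n - 1)) == 0
    x = list(a) + list(b)
    for stage in to_stages(oddeven_merge(0, 2 * n - 1, 1)):
        for i, j in stage:  # independent comparators: parallel in hardware
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]
    return x


print(merge_network([1, 4, 7, 9], [2, 3, 8, 10]))            # [1, 2, 3, 4, 7, 8, 9, 10]
print([len(s) for s in to_stages(oddeven_merge(0, 7, 1))])   # [4, 2, 3]
```

Merging two sequences of length n takes log2(2n) comparator stages. If all stages are left unregistered between the input and output registers, the merge completes in one clock cycle, provided that chain of comparator delays meets the clock period.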
vizziee.