"Divide" a video line in two stripe

Here's my problem:

I have a video module (that I can't really change) that outputs a
3840x2400 image by outputting two consecutive pixels at once (like
dual-link DVI). The problem is that the screen that has to display it
doesn't want dual-link DVI; it wants two independent DVI streams, one
for the left part of the screen and another for the right part (two
"stripes" of 1920x2400).
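
To pin down the reordering being asked for, here is a minimal Python reference model. It assumes each line arrives as a list of consecutive pixel pairs; the name `split_line` and the use of plain integers as pixels are purely illustrative. It buffers a whole line in software, so it only describes the desired output, not a BRAM-friendly way of producing it.

```python
def split_line(pairs):
    """Turn one line given as consecutive pixel pairs into two stripes."""
    # Flatten the two-pixels-per-clock input back into natural pixel order.
    line = [pixel for pair in pairs for pixel in pair]
    half = len(line) // 2
    # Left stripe = first half of the line, right stripe = second half.
    return line[:half], line[half:]


# 12-pixel toy line arriving two pixels per clock.
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11)]
left, right = split_line(pairs)
print(left)   # [0, 1, 2, 3, 4, 5]
print(right)  # [6, 7, 8, 9, 10, 11]
```

Each stripe is then sent out one pixel per clock, so both outputs take the same number of clocks as the two-pixel-wide input.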

I'm trying to come up with a solution to "transform" one into the other
without using a frame buffer or storing more than 1 line of video.
(At 3840 pixels, in color, that is already 6 Xilinx BRAMs, and I'm a
little short of those...)

According to my calculations, it should even be possible to store only
half a line, but I prefer a 1-line delay to a half-line delay.
My problem is that I can't find how to do it... Storing in BRAM has
proven to be an addressing nightmare: I have to store and reread
simultaneously without overwriting data I haven't reread yet (since I
don't read in the same order that I write).

Has anyone done something similar, or does anyone have a genius idea?
I'm missing something here that should be simple, and I just don't see
it...


Sylvain


Re: "Divide" a video line in two stripe
Hello,

The simplest scheme would certainly be to use two async fifos of
half a line each (one for each DVI output stream). The input stream
feeds the first and the second fifo alternately (first half of the
line, then second half). You synchronize the reads from the fifos once
both have data: this gives you two synchronized DVI output streams with
one line of buffering.

If you can desynchronize the DVI output streams, then a quarter-line
fifo for each output stream is sufficient (you don't have to wait for
the other fifo to have elements in it to start emptying yours, so by the
time the input stream gets back to you, you have already half-emptied
your first fifo).

JB


Re: "Divide" a video line in two stripe
Hello again,


Now that I think of it, even in this suboptimal case, your second fifo
only needs to be a quarter line deep. It can start emptying as soon as
data is fed into it, at half the input data rate. But then you get a
half-line delay between the input and output streams.

So for this scheme, the summary is:

* Synchronized output streams, one line delay costs 1 line buffering
* Synchronized output streams, half-line delay costs 3/4 of a line of
buffering
* Desynchronized output streams costs 1/2 line of buffering

JB


Re: "Divide" a video line in two stripe


As I said, I need a 1-line delay because of sync with the DVI
timing: both the input and the output of my module must be in sync
with it (well, within a few clock cycles of pipeline...).

And the two DVI outputs must be synchronous within a few ns...


So with this scheme (two FIFOs), the FIFO storing the first half of
the line would need to be bigger.
At the end of the line, both FIFOs would be full with the pixels of
that line. Then, during the first half of the next line, at each clock
cycle I would be reading 1 pixel from the first FIFO and pushing 2
pixels into it. That goes on for half a line, so the first FIFO would
need to hold 3/4 of a line and the second one 1/2 a line. So I'm over
my "quota" of 1 line of video storage.
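
The occupancy argument for the two-FIFO scheme can be checked with a small cycle-level model. This is only a sketch under simplifying assumptions that are not in the thread: no blanking, 2 pixels pushed per clock, 1 pixel popped per clock and per output, and occupancy sampled at the end of each clock. The function name `fifo_peaks` and its parameters are hypothetical.

```python
def fifo_peaks(line_len, delay_a, delay_b, lines=4):
    """Return the peak occupancy (in pixels) of the two half-line FIFOs.

    delay_a / delay_b: clock at which each output stream starts reading.
    """
    half = line_len // 2       # clocks per line (2 pixels arrive per clock)
    quarter = line_len // 4    # clocks taken by each half of a line
    total_in = lines * half    # clocks during which the input is active
    a = b = peak_a = peak_b = 0
    for t in range(total_in + max(delay_a, delay_b)):
        if t < total_in:
            # First half of each line goes to FIFO A, second half to FIFO B.
            if t % half < quarter:
                a += 2
            else:
                b += 2
        # Each output pops one pixel per clock once it has started.
        if delay_a <= t < delay_a + total_in:
            a -= 1
        if delay_b <= t < delay_b + total_in:
            b -= 1
        assert a >= 0 and b >= 0, "FIFO underflow: output started too early"
        peak_a, peak_b = max(peak_a, a), max(peak_b, b)
    return peak_a, peak_b


n = 3840
print(fifo_peaks(n, n // 2, n // 2))  # one-line delay, synchronized outputs
print(fifo_peaks(n, n // 4, n // 4))  # half-line delay, synchronized outputs
print(fifo_peaks(n, 0, n // 4))       # desynchronized outputs
```

Under these assumptions the one-line-delay case peaks at 3/4 line plus 1/2 line (5/4 line in total), the half-line-delay case at 1/2 plus 1/4 line, and the desynchronized case at 1/4 line per FIFO.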


Sylvain


Re: "Divide" a video line in two stripe

Well, in this case, apart from crazy addressing schemes which may cost
you a lot in control logic and headaches, I see no simple solution. If
you're really tight on block RAMs, it won't cost you much to duplicate
the DVI engine (if DVI is generated on-board) or simply delay the DVI
control by half a line with CLB shift registers (if generated
externally). Can't you really do this?


Totally agreed.

JB


Re: "Divide" a video line in two stripe


Duplicating the DVI isn't really an option. It's not that small because
it's entirely programmable dynamically.

And I liked the line delay because I already have other blocks in
the pixel processing path that have a line delay, so I already
have the logic to handle these.

And the half-line delay would need to be programmable, as my resolution
might max out at 3840 but is in fact programmable dynamically...


I was kind of looking for some addressing trick that would have allowed
me to do this without too much logic and without more BRAMs...
But I've been searching and so far haven't found anything...


Re: "Divide" a video line in two stripe


Sylvain


I'm not familiar with Xilinx's memory architecture, but if their memory
blocks have the option of being run in dual-port mode it could make this
problem much easier to deal with.

In the past I've taken advantage of other mfg's mixed-port
read-during-write mode. This mode is used when a RAM has one port
reading and the other port writing to the same address location with the
same clock. The memory block outputs the old data at the specified
address when there is a simultaneous read during write to the same port.
You then could set up two blocks (one for each half of the image) a line
deep.

First fill the memory blocks with line 1:

fill block a
*reset wraddr_a back to addr 0 and wait for block b to fill
fill block b
reset wraddr_b back to addr 0
now you read through the two blocks simultaneously while writing to the
same address for block_a
reset wraddr_b back to addr 0
repeat *

You'll have two pointers for each memory block, one read and one write
pointer.

I haven't done any work with DVI so I may be missing something specific
to that interface. If so, my apologies.



Take care,

Rob




Re: "Divide" a video line in two stripe
Rob, since you mentioned XILINX BRAM architecture:
Yes, each BlockRAM has two ports that are completely independent.
On each port, during a write operation, you can read the previous
content of the memory word that you are just now writing into.
This is a configuration option: either read the old content, or the new
content, or leave the output word untouched.
Peter Alfke, Xilinx Applications
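
The three output options described above can be captured in a small behavioural model of a single port. This is a sketch, not Xilinx's implementation: the class `BramPort` is hypothetical, and the mode strings are labels chosen here to mirror the names commonly used for this BlockRAM setting.

```python
class BramPort:
    """Toy model of one synchronous RAM port with a selectable write mode."""

    def __init__(self, depth, mode="READ_FIRST"):
        assert mode in ("READ_FIRST", "WRITE_FIRST", "NO_CHANGE")
        self.mem = [0] * depth
        self.mode = mode
        self.dout = 0  # registered output, updated on each clock

    def clock(self, addr, we=False, din=0):
        """Apply one clock edge and return the output register."""
        if not we:
            self.dout = self.mem[addr]      # plain read
        elif self.mode == "READ_FIRST":
            self.dout = self.mem[addr]      # old content appears on the output
        elif self.mode == "WRITE_FIRST":
            self.dout = din                 # new content appears on the output
        # NO_CHANGE: the output register keeps its previous value
        if we:
            self.mem[addr] = din
        return self.dout


port = BramPort(depth=16, mode="READ_FIRST")
port.clock(3, we=True, din=111)         # a pixel of line 1 stored at address 3
print(port.clock(3, we=True, din=222))  # 111: old pixel out, new pixel in
```

The read-old-content behaviour is the one that lets a single port fetch a pixel of the previous line and store a pixel of the new line at the same address in one clock.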



Re: "Divide" a video line in two stripe
Hi Rob,

Yes, the BRAMs are dual-port, but that's not the issue here.

In your scheme, I'll be overwriting some data of the previous line
before I reread it... Look at the content of block a: I'll need 1
full line time to reread it, but I'll fill it with new data in only half
a line time...
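
The overwrite hazard in block a can be made concrete with a tiny model. It is a sketch under assumptions not spelled out in the thread: block a holds the first half-line one pixel per address, the new half-line is written in address order at 2 pixels per clock, the old one is read in address order at 1 pixel per clock, and a read in a given clock sees the memory before that clock's write. The function name `corrupted_reads` is hypothetical.

```python
def corrupted_reads(line_len):
    """Count pixels of the old line that are overwritten before being read."""
    half = line_len // 2                       # pixels stored in block a
    mem = [("old", i) for i in range(half)]    # first half of the previous line
    bad = 0
    for t in range(half):                      # one full output line time
        # Read side: one pixel per clock, in address order.
        if mem[t][0] != "old":
            bad += 1
        # Write side: the new first half arrives in only half // 2 clocks.
        if t < half // 2:
            mem[2 * t] = ("new", 2 * t)
            mem[2 * t + 1] = ("new", 2 * t + 1)
    return bad


print(corrupted_reads(12))    # 5 of the 6 pixels in block a are lost
print(corrupted_reads(3840))  # 1919
```

Only the very first pixel survives, because the write pointer runs ahead of the read pointer from the second clock onward.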

Sylvain



Re: "Divide" a video line in two stripe
I guess I misunderstood your original question.

I was thinking that the data was coming in like this (assuming a
12-pixel line for discussion purposes):

clk1 pixel 1, 2
clk2 pixel 3, 4
clk3 pixel 5, 6
--------------------split
clk4 pixel 7, 8
clk5 pixel 9, 10
clk6 pixel 11, 12

And what you wanted to do was to split the line in half and ship out two
data streams.

First line will be used to initially fill the memory buffers

So, to fill the memory you would:
clk1: write into address 0 of mem buf_a pixels 1/2
clk2: write into address 1 of mem buf_a pixels 3/4
clk3: write into address 2 of mem buf_a pixels 5/6
Then reset write pointer for mem buf_a back to address 0 while waiting for
mem buf_b to fill

clk4: write into address 0 of mem buf_b pixels 7/8
clk5: write into address 1 of mem buf_b pixels 9/10
clk6: write into address 2 of mem buf_b pixels 11/12
Then reset write pointer for mem buf_b back to address 0

Line 2 (L2):
clk1: read from address 0 of mem buf_a and mem buf_b -- pixels 1/2 and 7/8 go out
        write into address 0 of mem buf_a L2 pixels 1/2
clk2: read from address 1 of mem buf_a and mem buf_b -- pixels 3/4 and 9/10 go out
        write into address 1 of mem buf_a L2 pixels 3/4
clk3: read from address 2 of mem buf_a and mem buf_b -- pixels 5/6 and 11/12 go out
        write into address 2 of mem buf_a L2 pixels 5/6
        reset write pointer for mem buf_a back to address 0 while waiting
        for the rest of L2 to be written to mem buf_b
        reset write pointer for mem buf_b back to address 0
clk4: write into address 0 of mem buf_b L2 pixels 7/8
clk5: write into address 1 of mem buf_b L2 pixels 9/10
clk6: write into address 2 of mem buf_b L2 pixels 11/12
        reset write pointer for mem buf_b back to address 0

Then repeat for each additional line.

Rob





Re: "Divide" a video line in two stripe



Nobody has provided what you really want here yet.
I sympathize with the need to conserve BRAM in video applications.
Maybe we can turn your nightmare into a sweet dream.


Sylvain,
  The approach to solving your addressing dilemma is to start small.
Look at a line that is only 12 pixels long. On your first pass you will
write to these locations:
| first half line        |  second half line        |
   0   1   2   3   4   5   6   7   8   9  10  11
On your next line, you want to Read before Write these locations:
   0   6   1   7   2   8   3   9   4  10   5  11
Continuing this pattern gives:
   0   3   6   9   1   4   7  10   2   5   8  11
   0   7   3  10   6   2   9   5   1   8   4  11
   0   9   7   5   3   1  10   8   6   4   2  11
   0  10   9   8   7   6   5   4   3   2   1  11
   0   5  10   4   9   3   8   2   7   1   6  11
   0   8   5   2  10   7   4   1   9   6   3  11
   0   4   8   1   5   9   2   6  10   3   7  11
   0   2   4   6   8  10   1   3   5   7   9  11
   0   1   2   3   4   5   6   7   8   9  10  11
and voila, you're back where you started.

Now how do you produce these numbers?
Let the magic of modular arithmetic help you.
Below is a perl script that generates the sequences you need, and shows
everything you need to do. It takes one accumulator and one counter, an
addition, a comparison, and an optional subtraction. Limit may be
programmable, since you want to handle arbitrary line lengths, right?
You have not mentioned what clock rate you have, but if you translate
this into VHDL, you may even be able to fit the addition, comparison,
and optional subtraction into a single cycle. If not, then you can
double the width of the BRAM inputs/outputs using external registers
and demuxes/muxes until you have enough time to do the requisite modular
addition.

I expect this is what you wanted,
HTH
Just John

Herewith, the perl script...
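#!/usr/bin/perl
use strict;
use warnings;
my $Limit = 11;            # One less than line length (the modulus)
my $Middle = $Limit >> 1;  # Last index of the first half line (rounds down)
my $Loop = $Limit + 1;     # Number of lines to print
my $NewIncr = 1;           # The first line is written in natural order
while ( $Loop-- ) { # Loop over lines
#  printf "Loop %2d:", $Loop;
  my $Incr = $NewIncr;
  my $Addr = 0;
  my $C = 0;
  while ( $C != $Limit ) { # Loop over pixels in line
    printf "  %2d", $Addr;
    $Addr += $Incr;                          # Modular increment...
    $Addr -= $Limit  if ( $Addr > $Limit );  # ...wrapped back into 1..Limit
    # The address of the first second-half pixel is the next line's increment
    $NewIncr = $Addr if ( $C == $Middle );
    $C++;
    }
  printf "  %2d\n", $Limit; # The last address is always Limit
  }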
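
The Perl script only prints addresses. The following Python sketch checks that using them in place, reading the old pixel and then writing the new one at the same address, really delivers the two stripes of the previous line. It assumes two accesses per clock (one per stripe, as with the two ports of a dual-port RAM) and uses `k * incr mod Limit` as a closed form of the script's running addition; the function name `run_in_place` and the test data are hypothetical.

```python
def run_in_place(lines, n):
    """Feed `lines` (lists of n pixels) through one n-entry buffer.

    Returns the (left, right) stripes read out while each later line is
    being written, i.e. the stripes of the previous line.
    """
    limit = n - 1
    mem = [None] * n
    incr = 1
    outputs = []
    for line in lines:
        # Address sequence for this line: k * incr modulo limit, with the
        # last address pinned to limit as in the Perl script.
        addrs = [(k * incr) % limit for k in range(limit)] + [limit]
        left, right = [], []
        for clk in range(n // 2):
            a0, a1 = addrs[2 * clk], addrs[2 * clk + 1]
            # Read before write: the old contents belong to the previous line.
            left.append(mem[a0])
            right.append(mem[a1])
            # The two incoming consecutive pixels reuse the freed locations.
            mem[a0], mem[a1] = line[2 * clk], line[2 * clk + 1]
        outputs.append((left, right))
        incr = addrs[n // 2]   # same update as $NewIncr in the script
    return outputs[1:]         # the first readout holds no valid data


n = 12
lines = [[100 * i + p for p in range(n)] for i in range(15)]
for i, (left, right) in enumerate(run_in_place(lines, n)):
    assert left == lines[i][: n // 2] and right == lines[i][n // 2 :]
print("every line came out as two correct stripes")
```

The whole buffer is exactly one line deep, and no location is overwritten before it has been read, because each write lands on the address that was read in the same access.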


Re: "Divide" a video line in two stripe
Sylvain,
   Speed tip for modular address calculation:

_An_ approach...

I wasn't thinking all the way through in the last two sentences above.
I expressed the modular increment as:
    $Addr += $Incr;
    $Addr -= $Limit  if ( $Addr > $Limit );
Since this is probably very fast stuff (being Hi-Res video), and done
in H/W, you can take a different approach.
Rather than do the addition, then the comparison, then the optional
subtraction, do two versions of the addition in parallel, and a
modified comparison, using that to select between the two results.
( Let A = Addr, I = Incr, L = Limit, I2 = I - L )
I2 is computed sometime after NewIncr is set in the previous line, and
before the current line starts.
The two additions are:
A1 = A + I;
A2 = A + I2;
The original comparison I posted was essentially:
if ( ( A + I ) > L )
This can be reworked by letting C = L - I
(this C is a threshold, not the pixel counter $C of the script).
Since L and I are fixed for the line duration, C can be computed once
before the line starts.
Using C, the comparison becomes

if ( A > C )
  A <= A2
else
  A <= A1

The whole circuit operates almost as fast as a simple counter followed
by a mux.
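
As a sanity check, the reworked update can be compared against the original add, compare, subtract form. This is a software sketch of the arithmetic only, not of the hardware timing; the function names `step_sequential` and `step_parallel` are hypothetical.

```python
def step_sequential(addr, incr, limit):
    """Original form: add, compare, then optionally subtract."""
    addr += incr
    if addr > limit:
        addr -= limit
    return addr


def step_parallel(addr, incr, limit):
    """Reworked form: two adders and a comparison against a constant."""
    i2 = incr - limit   # per-line constant (I2)
    c = limit - incr    # per-line constant (threshold C)
    a1 = addr + incr    # result when no wrap is needed
    a2 = addr + i2      # result when the sum would exceed limit
    return a2 if addr > c else a1   # the comparison only drives the mux


for limit in (11, 3839):
    for incr in range(1, limit + 1):
        for addr in range(limit + 1):
            assert step_sequential(addr, incr, limit) == step_parallel(addr, incr, limit)
print("both forms agree")
```

Both additions and the comparison depend only on the current address and on per-line constants, so none of them has to wait for another's result.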
You can extend the concept to operate on both ports of a DP_BRAM, so
that you can retrieve both first half line pixels and second half line
pixels simultaneously.
(Left as an exercise).

HTH
Just John


Re: "Divide" a video line in two stripe
Wow, that's cool.

JB


Re: "Divide" a video line in two stripe
Hi John,


Yes, this addressing scheme works, and I actually thought of it (well,
I didn't formalize it that much, I just did the first few steps on
paper).
But I hit a problem: since I must write two pixels in parallel and
the locations of those pixels change, I couldn't simply use a BRAM
twice as large.

But... now that I think of it, since I read and write at the same
location, I can use the same port to do both, then use the second port
to work on the second pixel. Putting 3x2 BRAMs in parallel in
4096x4 mode then gives me my storage space for 4096 pixels x 3 x 8b. Great!

Thanks a lot, now I don't even have to figure out the math by myself, I
just have to type it in VHDL ;)
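
The BRAM count can be checked with a line of arithmetic. The sketch below assumes each memory is configured as 4096 words of 4 bits, as stated, and that a pixel is 3 x 8 bits; the function name `brams_for_line` is hypothetical.

```python
import math


def brams_for_line(pixels, bits_per_pixel=24, depth=4096, width=4):
    """Number of depth x width memories needed to hold one video line."""
    rows = math.ceil(pixels / depth)          # memories stacked for depth
    cols = math.ceil(bits_per_pixel / width)  # memories side by side for width
    return rows * cols


print(brams_for_line(3840))  # 6
```

That matches the 6 BRAMs quoted at the start of the thread for one 3840-pixel color line.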

