"Divide" a video line into two stripes

Here's my problem:

I have a video module (that I can't really change) that outputs a 3840x2400 image by outputting two consecutive pixels at once (like dual-link DVI). The problem is that the screen it drives doesn't want dual-link DVI: it wants two independent DVI streams, one for the left part of the screen and another for the right part (two "stripes" of 1920x2400).
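For reference, here is a minimal Python model of the reordering being asked for, ignoring memory limits entirely. It assumes a line is just a list of N pixel values, and the helper names (input_words, split_stripes) are hypothetical. Both sides carry N/2 two-pixel words per line, so the clock rate is the same, but the very first output word (pixel 0, pixel N/2) needs a pixel that only arrives halfway through the input line. That is where the buffering requirement comes from.

```python
# Reference model of the required reordering, ignoring memory limits.
# Assumption: a line is a list of N pixel values (N even).

def input_words(line):
    """Dual-pixel words as the video module emits them: (p0, p1), (p2, p3), ..."""
    return [(line[i], line[i + 1]) for i in range(0, len(line), 2)]


def split_stripes(line):
    """Per-clock (left, right) pairs the display wants: (p0, pN/2), (p1, pN/2+1), ..."""
    half = len(line) // 2
    return list(zip(line[:half], line[half:]))


if __name__ == "__main__":
    line = list(range(12))        # toy 12-pixel line, value = pixel index
    print(input_words(line))      # [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11)]
    print(split_stripes(line))    # [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11)]
```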

I'm trying to come up with a solution to "transform" one into the other without using a frame buffer or storing more than 1 line of video. (At 3840 pixels, in color, that is already 6 Xilinx BRAMs, and I'm a little short of those...)

According to my calculations, it should even be possible to store only half a line, but I'd prefer a 1-line delay to a half-line delay. My problem is that I can't figure out how to do it... Storing in BRAM has proven to be an addressing nightmare: I need to store and reread simultaneously without overwriting data I haven't reread yet (since I don't read in the same order that I write).

Has anyone done something similar, or does anyone have a genius idea? I feel I'm missing something here that should be simple, and I just don't see it...

Sylvain

Sylvain Munaut

Hello,

The simplest scheme would certainly be to use two async fifos of half a line each (one for each DVI output stream). The input stream feeds alternately the first and the second fifo. You start reading both fifos in sync once both have data: this gives you two synchronized output DVI streams with one line of buffering.

If you can desynchronize the DVI output streams, then a quarter-line fifo for each output stream is sufficient (you don't have to wait for the other fifo to have data before you start emptying it, so by the time the input stream gets back to you, you have already emptied half of your first fifo).

JB

jbnote

Hello again,

Now that I think of it, even in this suboptimal case, your second fifo only needs to be a quarter line. It can start emptying as soon as data is fed into it, at half the data rate. But then you get a half-line delay between the input and output streams.

So for this scheme, the summary is:

  • Synchronized output streams, one-line delay: costs 1 line of buffering
  • Synchronized output streams, half-line delay: costs 3/4 of a line of buffering
  • Desynchronized output streams: cost 1/2 line of buffering

JB

jbnote

As I said, I need a 1-line delay because of the sync with the DVI timing: both the input and the output of my module must be in sync with it (well, within a few clock cycles of pipeline...).

And the two DVI outputs must be synchronous to within a few ns...

So with this scheme (two FIFOs), the FIFO storing the first half of the line would need to be bigger. At the end of a line, both FIFOs would be full with the pixels of that line. Then during the first half of the next line, on each clock cycle, I would be reading 1 pixel from the first FIFO and pushing 2 pixels into it, and that for half a line, so the first FIFO would need to be 3/4 of a line and the second one 1/2 a line. So I'm over my "quota" of 1 line of video storage.
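A toy cycle count makes these FIFO sizes easy to check. This Python sketch assumes N % 4 == 0, one line = N/2 clocks, 2 input pixels per clock, one pixel per clock per output once that output has started, and occupancy sampled at the end of each clock (push before pop); peak_occupancy is a hypothetical helper. With both outputs one line late it reproduces the 3/4 + 1/2 line figure above, and with JB's two earlier start times it reproduces his 3/4-line and 1/2-line figures.

```python
# Toy cycle model of the two-FIFO scheme (one FIFO per output stripe).
# Assumptions: N % 4 == 0, a line lasts T = N/2 clocks, 2 pixels in per
# clock, 1 pixel out per clock per stream once started, push before pop.

def peak_occupancy(n, start_left, start_right, lines=4):
    """Return (peak_left, peak_right) FIFO occupancy in pixels."""
    assert n % 4 == 0
    t_line = n // 2
    left = right = 0
    peak_left = peak_right = 0
    for clk in range(lines * t_line):
        # Input side: first half of the line -> left FIFO, second half -> right.
        if clk % t_line < t_line // 2:
            left += 2
        else:
            right += 2
        # Output side: each stream pops one pixel per clock once started.
        if clk >= start_left:
            assert left > 0, "left FIFO underflow"
            left -= 1
        if clk >= start_right:
            assert right > 0, "right FIFO underflow"
            right -= 1
        peak_left = max(peak_left, left)
        peak_right = max(peak_right, right)
    return peak_left, peak_right


if __name__ == "__main__":
    n = 3840
    t = n // 2
    print(peak_occupancy(n, t, t))            # (2880, 1920): one-line delay
    print(peak_occupancy(n, t // 2, t // 2))  # (1920, 960): half-line delay, synced
    print(peak_occupancy(n, 0, t // 2))       # (960, 960): desynchronized
```

Note that in the one-line-delay case the total stays at exactly one line on every clock (two pixels in, two out); it is only the split into two separately sized FIFOs that pushes the requirement to 5/4 of a line.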

Sylvain

Sylvain Munaut

Well, in this case, apart from crazy addressing schemes that may cost you a lot in control logic and headaches, I see no simple solution. If you're really tight on block RAMs, it won't cost you much to duplicate the DVI engine (if DVI is generated on-board) or simply delay the DVI control by half a line with CLB shift registers (if generated externally). Can't you really do this?

Totally agreed.

JB

jbnote

Duplicating the DVI engine isn't really an option. It's not that small, because it's entirely dynamically programmable.

And I liked the line delay because I already have other blocks in the pixel processing path that have a line delay, so I already have the logic to handle these.

And the half-line delay would need to be programmable: my resolution maxes out at 3840, but it's in fact dynamically programmable...

I was kind of looking for some addressing trick that would let me do this without too much logic and without more BRAMs... But I've been searching and so far haven't found anything...

Sylvain Munaut


Sylvain

I'm not familiar with Xilinx's memory architecture, but if their memory blocks have the option of being run in dual-port mode, it could make this problem much easier to deal with.


In the past I've taken advantage of other manufacturers' mixed-port read-during-write mode. This mode is used when a RAM has one port reading and the other port writing to the same address location with the same clock. The memory block outputs the old data at the specified address when there is a simultaneous read during write to the same address. You could then set up two blocks (one for each half of the image), a line deep.

First fill the memory blocks with line 1

fill block a

*reset wraddr_a back to addr 0 and wait for block b to fill

fill block b

reset wraddr_b back to addr 0

now you read through the two blocks simultaneously while writing to the same address for block_a

reset wraddr_b back to addr 0

repeat *

You'll have two pointers for each memory block, one read and one write pointer.

I haven't done any work with DVI, so I may be missing something specific to that interface. If so, my apologies.

Take care,

Rob

Rob

Nobody has provided what you really want here yet. I sympathize with the need to conserve BRAM in video applications. Maybe we can turn your nightmare into a sweet dream.

Sylvain, the approach to solving your addressing dilemma is to start small. Look at a line that is only 12 pixels long. On your first pass you will write to these locations:

    | first half line  | second half line |
       0  1  2  3  4  5   6  7  8  9 10 11

On your next line, you want to read-before-write these locations:

       0  6  1  7  2  8  3  9  4 10  5 11

Continuing this pattern gives:

       0  3  6  9  1  4  7 10  2  5  8 11
       0  7  3 10  6  2  9  5  1  8  4 11
       0  9  7  5  3  1 10  8  6  4  2 11
       0 10  9  8  7  6  5  4  3  2  1 11
       0  5 10  4  9  3  8  2  7  1  6 11
       0  8  5  2 10  7  4  1  9  6  3 11
       0  4  8  1  5  9  2  6 10  3  7 11
       0  2  4  6  8 10  1  3  5  7  9 11
       0  1  2  3  4  5  6  7  8  9 10 11

and voila, you're back where you started.
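One way to see why the pattern closes up: with M = N - 1 (John's Limit) and h = (M + 1)/2, the inverse of 2 modulo M, the k-th address on line n (for k < M) appears to be k * h^n mod M, with address M always last. This closed form is my own reading of the listed sequences, not something from the post, and the helpers below (address_sequence, period) are hypothetical. It suggests the pattern repeats after as many lines as the multiplicative order of 2 modulo M, which is 10 for the 12-pixel example, matching the ten rows above.

```python
# Closed-form view of the sequences above (my own reading, not from the post;
# helper names are hypothetical). Assumes an even line length N, so M = N - 1
# is odd and 2 has an inverse modulo M.

def address_sequence(n_pixels, line):
    """Read-before-write address order for `line` (line 0 = first write pass)."""
    m = n_pixels - 1                  # John's Limit
    h = (m + 1) // 2                  # 2 * h == m + 1, so h is 1/2 modulo m
    step = pow(h, line, m)            # increment used on this line
    return [(k * step) % m for k in range(m)] + [m]   # last pixel fixed at m


def period(n_pixels):
    """Lines until the pattern repeats: multiplicative order of 2 modulo N - 1."""
    m = n_pixels - 1
    e = 1
    while pow(2, e, m) != 1:
        e += 1
    return e


if __name__ == "__main__":
    for line in range(3):
        print(address_sequence(12, line))
    print(period(12))                 # 10
```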

Now, how do you produce these numbers? Let the magic of modular arithmetic help you. Below is a Perl script that generates the sequences you need and shows everything you need to do. It takes one accumulator and one counter, an addition, a comparison, and an optional subtraction. Limit may be programmable, since you want to handle arbitrary line lengths, right? You haven't mentioned what clock rate you have, but if you translate this into VHDL, you may even be able to get the addition, comparison, and optional subtraction into a single cycle. If not, you can double the width of the BRAM inputs/outputs using external registers and demuxes/muxes until you have enough time to do the requisite modular addition.

I expect this is what you wanted. HTH,
Just John

Herewith, the Perl script...

    #!/bin/perl
    use strict;
    my $Limit = 11;                 # One less than line length
    my $Middle = $Limit >> 1;       # Routing operation
    my $Loop = $Limit + 1;
    my $NewIncr = 1;
    while ( $Loop-- ) {             # Loop over lines
        # printf "Loop %2d:", $Loop;
        my $Incr = $NewIncr;
        my $Addr = 0;
        my $C = 0;
        while ( $C != $Limit ) {    # Loop over pixels in lines
            printf " %2d", $Addr;
            $Addr += $Incr;
            $Addr -= $Limit if ( $Addr > $Limit );
            $NewIncr = $Addr if ( $C == $Middle );
            $C++;
        }
        printf " %2d\n", $Limit;
    }
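To check that read-before-write on a single one-line buffer really produces the two stripes, here is a small Python simulation. It mirrors the accumulator/counter recurrence of the Perl script (line_addresses and simulate are hypothetical names), performs one access per pixel rather than two per clock, and tags every pixel as (line, index) so the output can be compared against the desired left/right pairing.

```python
# Single-buffer model of the read-before-write scheme (a sketch; the
# helper names are hypothetical). One access per pixel, not two per clock.

def line_addresses(limit, incr):
    """Addresses for one line, computed like the Perl loop.

    `limit` is the line length minus one. Returns (addresses, next_incr).
    """
    middle = limit >> 1
    addrs = []
    addr = 0
    next_incr = incr
    for c in range(limit):
        addrs.append(addr)
        addr += incr                  # accumulator
        if addr > limit:              # compare ...
            addr -= limit             # ... and optional subtract
        if c == middle:
            next_incr = addr          # increment for the next line
    addrs.append(limit)               # the last pixel always sits at `limit`
    return addrs, next_incr


def simulate(n_pixels, n_lines):
    """Push n_lines tagged lines through a one-line buffer.

    Returns, for every line after the first, the (left, right) pixel pairs
    read out of the buffer while the next line was being written.
    """
    limit = n_pixels - 1
    buf = [None] * n_pixels           # one line of storage (the BRAM)
    incr = 1                          # first line is written in order
    outputs = []
    for line in range(n_lines):
        addrs, next_incr = line_addresses(limit, incr)
        reads = []
        for k, a in enumerate(addrs):
            reads.append(buf[a])      # read the old pixel ...
            buf[a] = (line, k)        # ... then write the new one in its place
        if line > 0:
            # consecutive reads pair up: left-stripe pixel, right-stripe pixel
            outputs.append(list(zip(reads[0::2], reads[1::2])))
        incr = next_incr
    return outputs


if __name__ == "__main__":
    n = 12
    for line_no, pairs in enumerate(simulate(n, 4), start=1):
        expected = [((line_no - 1, j), (line_no - 1, n // 2 + j))
                    for j in range(n // 2)]
        print(line_no, pairs == expected)   # prints True for each line
```

Each line comes out exactly one line late, which is the 1-line delay Sylvain wanted, and the buffer never holds more than N pixels.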

JustJohn

Peter Alfke

_An_ approach...

I wasn't thinking all the way through in the last two sentences above. I expressed the modular increment as:

    $Addr += $Incr;
    $Addr -= $Limit if ( $Addr > $Limit );

Since this is probably very fast stuff (being hi-res video), and done in H/W, you can take a different approach. Rather than doing the addition, then the comparison, then the optional subtraction, do two versions of the addition in parallel, plus a modified comparison, and use that to select between the two results. (Let A = Addr, L = Limit, I1 = Incr, I2 = Incr - Limit.) I2 is computed sometime after NewIncr is set in the previous line, and before the current line starts. The two additions are:

    A1 = A + I1;
    A2 = A + I2;

The original comparison I posted was essentially:

    if ( ( A + I1 ) > L )

This can be reworked by letting C = L - I1. Since L and I1 are fixed for the line duration, C can be computed once before the line starts. Using C, the comparison becomes

    if ( A > C ) then A = A2, else A = A1
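The rearrangement can be checked exhaustively in software. In this Python sketch (function names are hypothetical), modadd_reference is the add/compare/subtract from the Perl script, and modadd_parallel computes A1, A2 and the A > C test independently before selecting; in hardware that would be two adders, a comparator and a 2:1 mux working in the same cycle. Python can only show that the two forms agree, not anything about timing.

```python
# Equivalence check for the two forms of the modular increment
# (function names are hypothetical; A, I1, I2, C, L as in the post).

def modadd_reference(a, incr, limit):
    """Original form: add, compare, then optionally subtract."""
    s = a + incr
    return s - limit if s > limit else s


def modadd_parallel(a, incr, limit):
    """Reworked form: both sums and the comparison are independent."""
    i2 = incr - limit       # I2, computed once before the line starts
    c = limit - incr        # C = L - I1, also computed once per line
    a1 = a + incr           # A1 = A + I1
    a2 = a + i2             # A2 = A + I2
    return a2 if a > c else a1   # the 2:1 select


if __name__ == "__main__":
    limit = 11
    ok = all(modadd_parallel(a, i, limit) == modadd_reference(a, i, limit)
             for a in range(limit + 1) for i in range(1, limit + 1))
    print(ok)               # True
```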

JustJohn

Wow, that's cool.

JB

jbnote

Hi Rob,

Yes, the BRAMs are dual-port, but that's not the issue here.

In your scheme, I'll be overwriting some data of the previous line before I reread it... Look at the content of block a: I need one full line time to reread it, but I fill it with new data in only half a line time...
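This objection can be made concrete with a toy model. Assumptions (mine, for illustration): block a holds the left half of the previous line, the left DVI stream must read one pixel per clock for a whole line time (N/2 clocks), and the new line's left half is written into block a sequentially at two pixels per clock during the first quarter of that time, with the read done before the writes inside each clock. rob_overwrites is a hypothetical helper that counts old pixels gone before they are read.

```python
# Toy model of the overwrite problem in the two-block scheme (assumptions in
# the lead-in; rob_overwrites is a hypothetical helper). N divisible by 4.

def rob_overwrites(n_pixels):
    """Count previous-line pixels in block a overwritten before being read."""
    half = n_pixels // 2
    block_a = [("old", i) for i in range(half)]   # left half of previous line
    lost = 0
    for clk in range(half):                       # one line time = N/2 clocks
        # Read first: the left DVI stream needs old pixel `clk` now.
        if block_a[clk] != ("old", clk):
            lost += 1
        # Then write: during the first quarter of the line time, two new
        # pixels arrive per clock and go into block a sequentially.
        if clk < half // 2:
            block_a[2 * clk] = ("new", 2 * clk)
            block_a[2 * clk + 1] = ("new", 2 * clk + 1)
    return lost


if __name__ == "__main__":
    print(rob_overwrites(12))     # 5
    print(rob_overwrites(3840))   # 1919
```

Every pixel except the first is lost (N/2 - 1 of them). The two-block scheme only works if each output stream can also take two pixels per clock for half the line, which is what Rob's follow-up post assumes.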

Sylvain

Sylvain Munaut

Hi John,

Yes, this addressing scheme works, and I had actually thought of it (well, I didn't formalize it that much, I just did the first few steps on paper). But I hit a problem: I must write two pixels in parallel, and the locations of those pixels change, so I couldn't simply use a BRAM twice as wide.

But... now that I think of it, since I read and write at the same location, I can use the same port to do both, and then use the second port to handle the second pixel. Then, putting 3x2 BRAMs in parallel in 4096x4 mode gives me storage for 4096 pixels x 3 x 8 bits. Great!
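Two quick checks of this plan, as a Python sketch under stated assumptions: each BRAM configured as 4096x4, 24-bit pixels (3 x 8 bits), and both ports set to read-before-write (on Xilinx parts of that era this is the READ_FIRST write mode). brams_needed and port_schedule are hypothetical helpers; port_schedule reuses the closed-form pattern k * inv2^line mod (N - 1) inferred from John's sequences, and pairs the accesses two per clock, port A taking the even access and port B the odd one. The two addresses in a clock are never equal, so the ports never collide, and port A's read is the left-stripe pixel while port B's is the right-stripe pixel.

```python
import math

# Hypothetical helpers; assumptions: 4096x4 BRAM aspect ratio, 24-bit pixels,
# both ports in read-before-write mode, closed-form address pattern.

def brams_needed(pixels, bits_per_pixel, depth=4096, width=4):
    """BRAMs needed for one line when each block is configured depth x width."""
    return math.ceil(pixels / depth) * math.ceil(bits_per_pixel / width)


def port_schedule(n_pixels, line):
    """(port A, port B) addresses for each clock of `line`.

    Port A does the even-numbered read-before-write access (new pixel 2j in,
    left-stripe pixel out); port B does the odd one (new pixel 2j+1 in,
    right-stripe pixel out).
    """
    m = n_pixels - 1
    step = pow((m + 1) // 2, line, m)            # (1/2)^line modulo m
    addrs = [(k * step) % m for k in range(m)] + [m]
    return list(zip(addrs[0::2], addrs[1::2]))


if __name__ == "__main__":
    print(brams_needed(3840, 24))                # 6, i.e. 3 colors x 2 BRAMs
    print(port_schedule(12, 1))                  # [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11)]
```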

Thanks a lot! Now I don't even have to figure out the math myself, I just have to type it in VHDL ;)

Sylvain Munaut

I guess I misunderstood your original question.

I was thinking that the data was coming in like this (assuming a 12-pixel line for discussion purposes):

    clk1  pixels 1, 2
    clk2  pixels 3, 4
    clk3  pixels 5, 6
    -------------------- split
    clk4  pixels 7, 8
    clk5  pixels 9, 10
    clk6  pixels 11, 12

And what you wanted to do was to split the line in half and ship out two data streams.

The first line will be used to initially fill the memory buffers.

So, to fill the memory you would:

    clk1: write into address 0 of mem buf_a pixels 1/2
    clk2: write into address 1 of mem buf_a pixels 3/4
    clk3: write into address 2 of mem buf_a pixels 5/6

Then reset the write pointer for mem buf_a back to address 0 while waiting for mem buf_b to fill:

    clk4: write into address 0 of mem buf_b pixels 7/8
    clk5: write into address 1 of mem buf_b pixels 9/10
    clk6: write into address 2 of mem buf_b pixels 11/12

Then reset the write pointer for mem buf_b back to address 0.

    L2 clk1: read from address 0 of mem buf_a and mem buf_b -- pixels 1/2 and 7/8 go out
             write into address 0 of mem buf_a L2 pixels 1/2
    L2 clk2: read from address 1 of mem buf_a and mem buf_b -- pixels 3/4 and 9/10 go out
             write into address 1 of mem buf_a L2 pixels 3/4
    L2 clk3: read from address 2 of mem buf_a and mem buf_b -- pixels 5/6 and 11/12 go out
             write into address 2 of mem buf_a L2 pixels 5/6

    reset write pointer for mem buf_a back to address 0 while waiting for the rest of L2 to be written to mem buf_b
    reset write pointer for mem buf_b back to address 0

    clk4: write into address 0 of mem buf_b L2 pixels 7/8
    clk5: write into address 1 of mem buf_b L2 pixels 9/10
    clk6: write into address 2 of mem buf_b L2 pixels 11/12

    reset write pointer for mem buf_b back to address 0

Then repeat for each additional line.

Rob

Rob

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.