Bresenham Algorithms

Are there any Bresenham line drawing algorithms that are suitable for FPGA pipeline video streaming? In VHDL?

b r a d @ a i v i s i o n . c o m

Reply to
Brad Smallridge

Hi Ray,

If they're in VHDL, why would they only be for Xilinx? ;-)

Ben

Reply to
Ben Twijnstra

Hi Brad,

I think there's a line-drawing implementation somewhere on [link not preserved] in the "own IP" section. I remember it not being pipelined, but you could set up a coordinate FIFO or so.
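For anyone wanting to see what such a block has to do per clock, here is a minimal software sketch in Python, not VHDL and not the implementation referred to above. It models the common all-octant integer form of Bresenham's line algorithm as a stepper that emits one pixel per call to step(), with each call standing in for one clock edge and each field standing in for a register. The names BresenhamStepper, step and draw_line are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BresenhamStepper:
    """Behavioural model of a line stepper: one pixel out per step() call.

    Hypothetical names throughout; the fields play the role of registers
    and step() plays the role of one clock edge.
    """
    x0: int
    y0: int
    x1: int
    y1: int

    def __post_init__(self):
        # Set-up values, computed once when a new line is loaded.
        self.dx = abs(self.x1 - self.x0)
        self.dy = -abs(self.y1 - self.y0)
        self.sx = 1 if self.x0 < self.x1 else -1
        self.sy = 1 if self.y0 < self.y1 else -1
        self.err = self.dx + self.dy      # running error term
        self.x, self.y = self.x0, self.y0
        self.done = False

    def step(self):
        """Return the current pixel, then advance the state by one step."""
        pixel = (self.x, self.y)
        if self.x == self.x1 and self.y == self.y1:
            self.done = True              # last pixel of the line
            return pixel
        e2 = 2 * self.err                 # a left shift in hardware
        if e2 >= self.dy:                 # step along x
            self.err += self.dy
            self.x += self.sx
        if e2 <= self.dx:                 # step along y
            self.err += self.dx
            self.y += self.sy
        return pixel


def draw_line(x0, y0, x1, y1):
    """Run the stepper to completion and collect the pixels in order."""
    stepper = BresenhamStepper(x0, y0, x1, y1)
    pixels = []
    while not stepper.done:
        pixels.append(stepper.step())
    return pixels


print(draw_line(0, 0, 3, 1))
```

Each step needs only adds, compares and a shift, so the state update itself is small. The model says nothing about how a real design would fit this into a pixel-rate video pipeline or how endpoints would be queued; those are left open here.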

Best regards,

Ben

Reply to
Ben Twijnstra

I've got lots of VHDL (and Verilog) only for Xilinx. Instantiate a Xilinx primitive and it suddenly becomes very difficult to synthesize using Altera tools (unmodified, that is).

Jake

Reply to
Jake Janovetz

Hi Jake,

What kind of stuff do you need to instantiate then? I can imagine the DCMs and the MGTs because they're highly dedicated silicon, but every time I see a BUFG or a MUXCY, or even an SRL16, instantiated I tend to choke. XST by now should be quite capable of inferring those itself. Heck, I once visited a company that instantiated all its multipliers where a simple '*' operator would have sufficed.

IMHO, instantiating vendor-only stuff only distracts from what you're trying to do - but that's probably my Mentor background.

Best regards,

Ben

Reply to
Ben Twijnstra

I agree -- the synthesis tool should do a lot of this. And they're getting pretty good. Ray has been around a while. He's got lots of stuff that was written to pull every bit of speed from the devices. To do so meant (and still means, sometimes) to force certain structure on the design.

Also, XST changes what it does from version to version. Writing structural HDL makes the design immune to these changes at the expense of readability. For a lot of optimized blocks, it's best just to tell the tool exactly what you want and then not worry about it changing the results as the rest of the design changes.

For example, the mults are relatively new to the FPGA. I remember old designs where writing "*" would blow up into using half of the gates of the entire FPGA (when the 4062 was a big device). CORE modules or hand-written mults were the only way to go.

Jake

Reply to
Jake Janovetz

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
Reply to
Ray Andraka

Ben Twijnstra wrote:

They have instanced primitives for performance as well as to enforce a particular structure. Synthesis tools (both Synplify and XST are notorious for this) frequently change how they infer things from version to version. Designs where you are not particularly concerned about maximum performance, minimum power or maximum density work fine with 100% RTL, but designs where you are pushing these corners cannot tolerate the looseness of synthesized RTL. RTL synthesis also frequently does not synthesize to specific FPGA structures such as SRL16s in dynamic shift mode, dual-port RAM, etc.

Finally, even on a design that is strictly RTL, differences in FPGA structures and features between vendors make a design that is efficient in one become a pig in the other. For example, the older Altera 10K and 20K families, when used for arithmetic functions, broke the logic into a pair of 3-input LUTs, which meant that arithmetic in a single level was restricted to a basic add, whereas Xilinx of the same vintage retained the 4-input function for arithmetic. The Xilinx carry chain structure has some quirks that can force an arithmetic structure to two levels if not described correctly, and the synthesizers have a hard time picking that out (consider the case of an add followed by a mux: it can be done in one level in Xilinx, but the synths will not recognize that structure and instead produce a design that uses two levels of logic).

Often the correct structure can be enforced, still in RTL, by breaking the combinatorial function down and with judicious use of syn_keeps. Other times, it is just easier to instance a component built up from primitives.

I've developed a rather extensive library of components, including adders and registers as well as more complicated blocks like FFTs, and everything in between. This library is mostly instanced primitives with placement attributes on them, which saves a tremendous amount of time in floorplanning. By using this library, I also avoid the pushing-on-a-rope syndrome that often happens in speed-critical RTL. In the case of the Bresenham circuits, I'd have to look, but I think I had some SRL16s in there with dynamic addressing, as well as some placed adders.

Yes, in the optimum world, RTL-only code would be the best way, but the fact remains that synthesis is still far from perfect, especially when it comes to dealing with the increasingly heterogeneous arrays that are today's FPGAs. For example, the DSP48 slice in Xilinx is only inferred in its basic form by the synthesis tools. You leave a heck of a lot of performance on the table by not taking time to understand and force the ideal implementation of circuits using them. Another V4 gotcha that the synths don't seem to notice is that the SR pins on the flip-flops are deadly slow. If you aren't careful about your coding, the synthesizer gets cute using the resets in order to reduce the logic complexity at an entirely unacceptable hit in performance.

Reply to
Ray Andraka
