An idea for a product (FPGA/ASIC based)

Folx,

At work, we encode tons of video, and some/most of it has fairly short TTM requirements. It is all Web-destined, and the sooner we get it out, the better it is for us and consumers alike.

So we use SW that supports a variety of video formats and encode video in Flash/WM and other formats.

Now the problem statement: it is dead slow. We use the latest-greatest in computer tech (latest AMD and Intel CPUs), but even @ lower Web rez, the encodes happen at slower than real-time. By now, the video compression algorithms are prolly well perfected and it is unreasonable to expect any breakthroughs there. The situation will improve somewhat as Intel's Woodcrest/Conroe become available, but still.

So about the only way one can accelerate the process is by HW acceleration. As surprising as it is, there don't appear to be any products out there! While most cell phones these days have HW-accelerated video DEcode, there's nothing for encoding.

I know some folx here have a pretty good idea as to what might be possible with FPGAs, as far as FFT/DCTs are concerned. Do you think a board can be designed (price does NOT matter) that can radically improve the situation?

The market is ripe for something like that. Of course, one would need to make sure there is a clear API and that major vendors enable their code to work with it.

Something like PCI-X or PCIe x4, with (a few?) Virtex4s on it, and maybe 512MB of the fastest SRAM for buffering.

Input would be, say, DV25 or MPEG2; output: MPEG4, Flash 8, WM 9/10, in 500x300 rez or thereabouts.

So ... put it together and laugh all the way to the bank :) The market is 10s of thou, easily. You can ask 15-20K for something like that, assuming it radically cuts encoding time.

Ideas ?

Rashid

Searching at Google:

[link]

shows some graphics cards that have hardware encoders. And of course, if price doesn't matter, there are realtime hardware encoders for MPEG4 etc. available for professional television broadcasting.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Frank Buss

Thnx for google :). Of course I looked before, long and hard :), and could not find the stuff I am looking for.

I guess I should qualify the request - it is more of a transcoding solution: dealing with preexisting DV/MPEG files that need to be flipped into different format(s). So there, one'd face a double problem: decoding a pre-existing, most-of-the-time compressed format and re-encoding to a different one.

With the DSP performance of today's FPGAs in the multiple GMACs and the parallel possibilities (up to 512 DSP slices in top V4s), I figured it might be a good fit for this application.

Also, with FPGAs being programmable in a matter of millisecs these days, one can prolly tailor the processing to the particulars of the format @ hand ...
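
As a back-of-the-envelope check on that DSP-slice figure, the sketch below estimates the peak MAC budget per output pixel. The 512 slices and the roughly 500x300 output come from this thread; the 400 MHz slice clock and the 30 fps frame rate are assumptions picked purely for illustration, and the estimate ignores memory bandwidth and the cost of decoding the source entirely.

```python
# Rough MAC budget per output pixel. Only DSP_SLICES and the frame size come
# from the discussion; CLOCK_HZ and FPS are illustrative assumptions.
DSP_SLICES = 512          # "up to 512 DSP slices in top V4s"
CLOCK_HZ = 400_000_000    # assumed DSP slice clock
WIDTH, HEIGHT = 500, 300  # target Web resolution
FPS = 30                  # assumed frame rate


def macs_per_second(slices, clock_hz):
    """Peak rate if every slice completes one multiply-accumulate per cycle."""
    return slices * clock_hz


def macs_per_pixel(slices, clock_hz, width, height, fps):
    """Peak MACs available for each output pixel at real-time speed."""
    return macs_per_second(slices, clock_hz) / (width * height * fps)


if __name__ == "__main__":
    peak = macs_per_second(DSP_SLICES, CLOCK_HZ)
    budget = macs_per_pixel(DSP_SLICES, CLOCK_HZ, WIDTH, HEIGHT, FPS)
    print(f"peak: {peak / 1e9:.1f} GMAC/s")
    print(f"budget: {budget:.0f} MACs per pixel")
```

A real design would reach only a fraction of such a peak, so treat the number as an upper bound rather than a prediction.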


Rashid

Rashid wrote:

[...]
15K USD is only sufficient (maybe) to make 1 or 2 rounds of high-end proto PCBs with LARGE Xilinx FPGAs on them. And making the PCB is only a fraction of the actual costs.

The minimum NRE is more likely 250K USD for your project. Add some profit too and you end up paying 0.5M USD or more. To make it attractive you should be willing to pay 1M USD or more.

It's definitely not something 2 students glue together as a final-year project. You are not going to find anyone who 'puts it together' and takes your 25K USD with a smile.

Hm, after reading your post again: the 15-20K was meant as the sale price of one ready-made unit? That's more realistic if there is a guaranteed sale of 50 pcs at that price. But someone still has to pay the NRE costs up front, and they are not small.

Antti

Antti

On a sunny day (16 Jul 2006 09:06:42 -0700) it happened "Antti" wrote:

If you want to use an FPGA, have a look here:

[link]

An H.264 main profile encoder IP is mentioned there too.

Jan Panteltje

Generally, anything you can do in software can be done in a hardware FPGA design. It is usually much faster in the FPGA, and techniques like parallel processing can be used to enhance speed further. Usually the main deciding factor is cost; if performance is the issue, then the FPGA usually wins.

We have done major parts of a video conferencing system this way in the past, and other FPGA co-processing applications are common too. As for a board to start with, have a look at our Broaddown4 product.

John Adair

John Adair

There is plenty of stuff for encoding video in hardware (like mpeg4 compression).

--
Reply to nico@nctdevpuntnl (punt=.)
Bedrijven en winkels vindt U op www.adresboekje.nl
Nico Coesel

Antti's calculations are correct.

But you can find a ready solution in the Matrox video editing cards.

[link]

Most of them could be the solution for your needs. The Matrox RT.X100 Xtreme Pro (among other functions) with Matrox MediaExport provides hardware-accelerated simultaneous batch encoding of Windows Media/RealMedia streaming formats and MPEG-1/MPEG-2 multimedia formats with multiple resolutions, bit rates, and frame rates.

The RT.X10 and most of the other cards provide the above functionality.

FPGA projects can provide lightning-fast solutions for specific tasks, but they are VERY expensive to develop and need specialist engineers (hopefully among us in this newsgroup) to develop the products.

Regards, Viron.

viron


Texas Instruments' DaVinci chips may be just the ticket: $20-$70 price range, containing an ARM and a DSP core. Codecs run on the DSP while the ARM runs the I/O. I think they have a development board for a few thousand US$. Their development env. is Linux based, which suits me just fine.

przemek klosowski

Update: check the Avivo™ Video Converter from ATI

[link]

Various sites have tested it and reported significant improvements in encoding time. Google it to find relevant links.

Viron.


viron

I talked to DivX and Nero last year about building an accelerator card for doing their encoding. They were both very excited at the idea. Unfortunately, the company I work for decided not to pursue that path. I think it would be a very fun project. I've done JPEG encoding on FPGA boards before. It took about three months of development time for one engineer. I assume MPEG2 would be three times that. MPEG2 might need some SRAM for the frame buffer.

Concerning MPEG4, well, that's a whole other beast. MPEG4 uses wavelets, which are scatter-gather algorithm based. Those algorithms are a pain in the butt to code on FPGA. And the only way you would ever make them faster than a CPU is if you had some fast SRAM and a lot of it. Incidentally, I've seen some MPEG4 cores for sale. You would not want to develop that yourself.

Concerning a board, well, dang it people, this is what I've been crying for for years! Just build a simple board with four Spartan3 chips in a loop. Put some cheap SRAM and DRAM sockets on there with two PCIe 4x controller chips and we have a done deal. It would cost $0.5 Mil to develop the board, a fast driver, and get all the bugs out, but in the end you could sell millions of the "general purpose accelerator card" for $300 apiece. Various software vendors would ship modules for use with that board when detected.

Brannon

Just curious: How would one connect two PCIe 4x ports in one card?

Or would PCIe 8x (or PCIe 16x) be a better way to go?

John_H

For a simple progressive encoder that's probably about right. Add all the B-frame and interlaced field/frame coding tools and things get rather more hairy.

A large RAM is very likely required to store the reference frames for motion estimation.

I think the majority of applications use the simple and advanced simple profiles where wavelets are not required. Not sure that they are needed in main profile either.

Still, there are plenty of scary algorithms in mpeg-4 left to implement. Of course for an encoder you can ignore whatever algorithms you don't like, but quality inevitably suffers if you do that too often.

Fully custom encoders are incredibly complex beasts - professional ones often have tens of DSPs and FPGAs on board.

I think a board which accelerated motion estimation and left residual coding (and thus the rate distortion optimisation process and the majority of differences between codec standards) up to the CPU would be a reasonable solution.
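
To make concrete what such a board would be offloading, here is a minimal sketch of exhaustive block-matching motion estimation using the sum of absolute differences (SAD). The function names, the 8x8 block size and the +/-4 pixel search range are illustrative assumptions, not taken from any particular codec; real encoders use smarter search patterns and sub-pixel refinement. The inner SAD loop is the part that maps naturally onto parallel hardware.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of integer pixel values."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def best_motion_vector(cur, ref, cx, cy, n=8, search=4):
    """Exhaustive search over +/- `search` pixels around (cx, cy).
    Returns ((dx, dy), sad) for the best-matching reference block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

The host would then take the returned vectors and do the residual coding and rate control itself, which is where the codec standards differ most.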

Cheers,

Andy

Andy Ray

Last I checked, you couldn't just buy a PCIe 8x controller chip. For example, the Dini board ([link]) has two 4-lane PHYs and they expect all the DMA logic to be coded in the connecting chip (using cores from [link]).

I don't understand what is so hard about building an 8-lane PHY with a DMA channel for each lane and putting it all in a little chip that comes with some nice driver code base for the host machine and LVDS output on the far end. I guess that's asking too much. (But it has been done for PCI...)

Brannon

I fully agree. The existing boards are too complex because they do too much. This is where "reconfigurable" computing really comes into play. Consider JPEG: you can first configure the board to do the DCT. Then you stream all your data through on a high-speed connection and get your DCT'd data back into host memory. Then you configure the board to do quantization and Z-ordering, then run the Huffman analysis on the host machine, then configure the board to do bit packing, load it with the new Huffman codes, etc. Most people who do Huffman encoding in an FPGA just use predetermined lookup tables, because a full Huffman analysis requires some significant DSP code. Don't code that in the FPGA! Use the FPGA for what it is good at (i.e., pipelined DCT).
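
As a rough software model of the first two board configurations (DCT, then quantization and Z-ordering), something like the sketch below would do. It is only an illustration: the function names are made up, the DCT is the naive textbook form rather than a pipelined one, the level shift that JPEG applies before the DCT is left out, and a single flat quantizer step stands in for a real 8x8 quantization table.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Naive 2-D DCT-II of an 8x8 block (list of rows), JPEG scaling."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Divide every coefficient by one flat step and round (assumption:
    a real encoder uses a per-coefficient quantization table)."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag_order(n=N):
    """(row, col) pairs in zigzag order: walk the anti-diagonals,
    alternating direction on each one."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def zigzag_scan(block):
    """Flatten an 8x8 block into a 64-entry list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Chaining them as `zigzag_scan(quantize(dct_2d(block), step))` gives the coefficient list that the host-side Huffman analysis would then look at.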

Brannon

Hi Brannon, can you explain further about a full Huffman analysis for JPEG?

I don't have any knowledge of JPEG. What books do you recommend on the topic?

Weng


Weng Tianxiang

Since I haven't designed a plugin board for PCIe yet, I haven't had to deal with this particular issue. Perhaps the ability to split a PCIe 16x into (at least) two PCIe 4x ports is part of the standard. I thought it would require more than one slot to implement more than one PCIe.

I'll look deeper into the spec to see what I might expect from plug-in cards. It's a pretty fundamental change from the "one slot, one port" mentality I had.

John_H

Google this: "MIL-STD-188-198A"

and Google this: "itu-t81.pdf"

Those are two fundamental JPEG documents. There is one nice book on the topic I've seen, but I don't have a copy handy. If you can understand the above two documents, you won't need it. It has a pink cover. And if you don't know what Huffman encoding is, you had better Google that one as well.
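
For a flavour of what the "Huffman analysis" involves: in baseline JPEG the things being counted are not raw coefficient values but (zero-run, size) symbols derived from each block's quantized, zigzag-ordered AC coefficients. The sketch below counts those symbols; the function names are made up for illustration, and DC coefficients (which are coded separately as differences) are left out. Building the actual code tables from the counts is a further step not shown here.

```python
from collections import Counter


def size_category(value):
    """Number of bits needed to represent |value| (0 for zero)."""
    return abs(value).bit_length()


def ac_symbols(zz):
    """Yield the (run, size) symbols for the 63 AC coefficients of one block.

    `zz` is a 64-entry zigzag-ordered list of quantized coefficients.
    (15, 0) is the ZRL symbol (a run of 16 zeros); (0, 0) is end-of-block.
    """
    last_nonzero = max((i for i in range(1, 64) if zz[i] != 0), default=0)
    run = 0
    for i in range(1, last_nonzero + 1):
        if zz[i] == 0:
            run += 1
            if run == 16:
                yield (15, 0)
                run = 0
        else:
            yield (run, size_category(zz[i]))
            run = 0
    if last_nonzero < 63:
        yield (0, 0)  # the rest of the block is all zeros


def symbol_frequencies(blocks):
    """Count AC symbols over all blocks of an image."""
    freq = Counter()
    for zz in blocks:
        freq.update(ac_symbols(zz))
    return freq
```

The resulting counts are what an image-specific Huffman table would be built from; the predetermined lookup tables mentioned earlier simply skip this counting pass.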

Brannon

Hi Brannon, I want to buy a book on JPEG; which books do you recommend? itu-t81.pdf is 176 pages, which is too long.

I know Huffman encoding: it encodes and decodes data based on the appearance frequency of keywords. I just want to know what generates the appearance frequency of keywords in the JPEG algorithm.

Thank you.

Weng


Weng Tianxiang
