Need ideas for FYP

I am a Bachelor's student and will start my FYP (final-year project) in a few days. I am going into the field of high computation in Verilog. These are some projects I might do:

1. the n-body gravitational problem
2. oceanic modeling
3. cancer biology modeling

Are there any other projects you might suggest that would be beneficial for me? Also, my main aim after my Bachelor's is to get admission to a US university. Thanks!

Reply to
awaish2011

And also, I have a Spartan-6 XC6SLX45 kit available in the college lab.

Reply to
awaish2011

Hi,

this is not to discourage you. But please be warned that heavy-duty FPGA implementations like you're planning are

*much*

and I mean "much" harder than it looks from all those shiny webpages that make it look like Lego bricks because they want to sell you stuff.

Here's my proposal: why don't you implement "Hello world!" in Morse code? That's ".... . .-.. .-.. --- .-- --- .-. .-.. -.."

Just a blinking LED. Expect that it'll take between a day and two weeks. This includes things that "should" be easy but are not, such as installing ISE 14.7 when you've never done it before, making the JTAG interface work, etc.
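It can help to pin down the Morse sequence and LED timing in software before writing any Verilog. A minimal Python sketch; the 100 ms unit and the timeline format are my own assumptions, and word gaps are simplified to letter gaps to match the single-space string quoted above:

```python
# Symbol table covering just the letters of "hello world".
MORSE = {'h': '....', 'e': '.', 'l': '.-..', 'o': '---',
         'w': '.--', 'r': '.-.', 'd': '-..'}

def encode(text):
    """Encode to the dot/dash string quoted above (letters and words
    both separated by a single space, as in the post)."""
    return ' '.join(' '.join(MORSE[c] for c in w)
                    for w in text.lower().split())

def timeline(morse, unit_ms=100):
    """Expand to (led_on, duration_ms) steps: dot = 1 unit on,
    dash = 3 units on, 1 unit off after each element, plus 2 extra
    off units at each gap (word gaps simplified to letter gaps)."""
    steps = []
    for ch in morse:
        if ch == '.':
            steps += [(True, unit_ms), (False, unit_ms)]
        elif ch == '-':
            steps += [(True, 3 * unit_ms), (False, unit_ms)]
        else:  # space: extend the inter-element gap to 3 units total
            steps += [(False, 2 * unit_ms)]
    return steps
```

The `timeline` list is essentially the contents of the counter/state machine the Verilog version has to implement.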

In my personal opinion, the xc6slx45 is an excellent choice to get started. Because

a) it does not require a Xilinx license to program it

b) I can get one cheaply if you ever need one, i.e. "Numato Saturn" or "Pipistrello" boards, for ~$130..160.

c) If it breaks, it's no big deal, compared to a $3000+ board.

To learn Verilog, the smallest and cheapest FPGA will do, if you decide to buy one for yourself. The typical feedback from the board is "this doesn't work - go simulate some more".

Note: you said "Verilog", not some intermediate wizardry that generates the code. For the latter, an slx45 is probably too small (guessing, haven't done it myself).

--------------------------------------- Posted through

formatting link

Reply to
mnentwig

heh that was supposed to read "-you- can get one cheaply". I don't run a shop :-)


Reply to
mnentwig

There has been work for years on doing computationally intensive problems in FPGAs; much of it hasn't worked out very well.

One problem is that it is hard to do economically. That is, such that someone will buy the product.

I was once thinking about doing a biology related problem, and decided that I could do it with 2000 of the largest S3 FPGAs. That was still a little too big at the time.

There are some interesting projects out there, but you have to compete with the non-FPGA based solutions, and the fact that the actual solution might not be needed.

For example, the n-body problem might be useful to those at NASA trying to get rockets to far away planets. But being able to do 0.1% or 1% better (in fuel use or arrival time) isn't worth all that much. You have to be able to do a lot better to make it worth spending money on.

Planning ahead is useful. The projects you mention will take teams of researchers years to work on. (You have to include packaging and software and user manuals, so that ordinary people can use them.)

For a more reasonably sized project, implement a character based computer terminal. That is, a keyboard, display, and serial port. (Note that many FPGA boards have the hardware to do this.)

In years long past, it wasn't an unusual undergraduate project built out of a microprocessor and TTL parts. Now, it should be a reasonable sized FPGA project.

You can separately work on the keyboard input, UART (find one on opencores, but you still have to figure out how to use it), and character based raster display. You have to get the timing right for an available video monitor.
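For the UART and display timing, a quick back-of-envelope is worth doing first. A sketch, assuming a 50 MHz board clock (common on Spartan-6 boards; your board may differ) and the standard VGA 640x480@60 timing:

```python
# UART: the receiver/transmitter divides the system clock down to the
# baud rate; the divisor error must stay well under a few percent.
clk_hz = 50_000_000                     # assumed board clock
baud = 115_200
divisor = round(clk_hz / baud)          # counter terminal value
actual_baud = clk_hz / divisor
error_pct = 100 * abs(actual_baud - baud) / baud

# VGA 640x480@60 (standard timing): 800 pixel clocks per line,
# 525 lines per frame, 25.175 MHz pixel clock.
pix_hz = 25_175_000
line_hz = pix_hz / 800                  # ~31.47 kHz horizontal rate
frame_hz = line_hz / 525                # ~59.94 Hz vertical rate

print(divisor, round(error_pct, 3), round(frame_hz, 2))
```

The same two calculations become the constants for the baud-rate counter and the horizontal/vertical sync counters in the HDL.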

-- glen

Reply to
glen herrmannsfeldt

Maybe this, too, before someone quotes me later: when I say the FPGA you mentioned is an "excellent choice", I meant for _learning_, as you can use it on your own PC without a $2999 license. Not for high performance computing.

Spartan 6 is marketed with "delivers an optimal balance of low risk, low cost, and low power for cost-sensitive applications".


Reply to
mnentwig

How about something a little more practical? I expect there is not much room for improvement for solving gravitational problems much faster or more accurately. Is there really a need?

Oceanic modeling is a huge area. You might want to narrow the focus on that one a *lot* more before you try to narrow your list... or just remove it.

Cancer biology modeling is also a *huge* area. I do recall some years ago there was a small project at NIH (if I remember correctly) that was working on a real time interactive model of complex molecules. They would model the forces of a molecule and let the researcher use a pair of many-degree-of-freedom controls to bend and twist the molecule while getting haptic feedback. I'm not sure what happened to this project as I haven't seen anywhere that it became a widely useful tool. Interesting though.

An area I find interesting is low power processing. You might consider what it takes to do something with a minimum of power consumption using off the shelf devices. There are a lot of potential applications there.

--

Rick
Reply to
rickman

The trouble with all these projects is they're something a GPU could do with much less programming effort (at least to make it work non-optimally). So I'm not sure of the advantage of using an FPGA. In an FPGA it's a lot harder to change the architecture if the problem changes (at least if you're writing in Verilog/VHDL it is).

One thing FPGAs are good at is I/O. So a nice example is video processing - you take in video from a camera, do something clever to it, and output to a display. There's a lot of data so you have to process it fast, and it's a nice visual demo. It's also easy to debug - you can see what's going wrong on the screen.

Likewise other kinds of non-optical data (eg scan data from a 2D sensor of some kind). You can also use audio or other sensors, as long as you have a useful output.

Theo

Reply to
Theo Markettos

(previously snipped project suggestions)

If you can do them in fixed point, you can make really big arrays to process really big data sets, though not so cheap.

There are people who need that, but not so many of them.

Learning how to do it isn't bad, though.

(snip)

Well, many filtering algorithms can be implemented as systolic arrays, which allow for minimal I/O for the processing done.

Implementing an FIR filter in fixed point in an FPGA would be a reasonable sized project. Again, learn about systolic arrays.
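The FIR suggestion can be prototyped behaviourally before touching the HDL. A sketch of the transposed form, the structure that maps naturally onto a systolic chain of multiply-accumulate cells (one multiplier plus one register per tap); the coefficients in the usage are arbitrary examples:

```python
def fir_transposed(samples, coeffs):
    """Behavioural model of a transposed-form FIR filter. Fixed-point
    is modelled with plain Python ints (no rounding/saturation)."""
    taps = [0] * len(coeffs)          # one pipeline register per tap
    out = []
    for x in samples:
        # In hardware every cell updates in parallel on the clock edge;
        # here one loop iteration models one clock tick: each register
        # loads coeff * x plus its right-hand neighbour's old value.
        taps = [coeffs[i] * x + (taps[i + 1] if i + 1 < len(taps) else 0)
                for i in range(len(taps))]
        out.append(taps[0])           # leftmost register is the output
    return out

# Usage: matches direct convolution for these example values.
print(fir_transposed([1, 0, 0, 4, 5], [1, 2, 3]))
```

The per-tap update line is, almost verbatim, the body of one systolic cell module in Verilog; widening it to saturating fixed-point arithmetic is the remaining hardware work.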

-- glen

Reply to
glen herrmannsfeldt

Why is an HDL harder to change than any other code? I use the same editor for both...

That is very true.

I/O is a big plus for an FPGA. But I think the OP wants something that deals with some current major problem.

I wonder what medical app might be suitable for an FPGA. Something that uses an array of sensors to measure body contour or pressure maybe, like a footstep?

--

Rick
Reply to
rickman

Changing small-scale stuff is easy, in any language. Re-architecting the problem is harder. In Verilog/VHDL it's hard because you have to rewrite all the control logic as well as reorganise the datapath.

Let's say you built a single-issue CPU and you want to convert it to superscalar. Not only do you need to rewrite the datapath (not trivial) you have to manage all the enable signals on the pipeline stages and the state machine about when each stage fires. If you get one of those interlocks wrong you get subtle bugs. If you change something, you may get a different set of subtle bugs. Rinse and repeat.

If you're writing code on a GPU you're writing in a much higher level language: the API doesn't even know how many cores you have (it'll depend what model of GPU your machine has) - you just give it the work to do and it'll partition it up amongst the cores.

While there are many subtleties about writing efficient GPU code (you need to know a lot about the underlying architecture to achieve good performance), it's relatively simple to write bad GPU code that works, and then you can refine it later. Bad HDL tends not to work. Not working means staring at simulator traces, which is not a pleasant experience. Or it works in the simulator but not on the board, because the language (I'm looking at Verilog particularly) isn't sufficiently strict about what the expected behaviour should be (and then you get to stare at ChipScope/SignalTap traces, an even less pleasant experience).

The issue for compute problems is always going to be that the FPGA at, say, 200MHz and one stick of DDR3 RAM is up against the multi-GHz GPU with thousands of threads, GDDR5 memory, and so on. There are applications that don't suit GPUs, but unless you have a good architectural reason why it won't work I'd say in most cases you're better off starting with a GPU.

However, this is putting the cart before the horse. If you stare at your algorithm for long enough, with the FPGA or GPU architecture in mind, you can probably make significant performance increase by refactoring the task before writing a line of code.

I realise this is a student project so 'doing something with an FPGA' might be more of a goal than 'making X go faster', but we tend to see a lot of papers which go like this:

  1. Built a Matlab/Java/Python simulator that ran at speed X
  2. Built an FPGA system that runs at speed 100X
  3. Profit!^H^H^H^H Publish!

When the intermediate steps might be

  1b. Refactored algorithm (with caches, memory bandwidth, etc. in mind)
  1c. Built a multithreaded C/C++ simulator that runs at speed 70X on the same hardware as the Matlab result
  1d. Ran that on a proper server, not their 5-year-old laptop

at which point why bother with this FPGA stuff?

Theo

Reply to
Theo Markettos

For those problems, you should use a GPU.

There are some problems where an FPGA is a good solution, though.

First, they pretty much have to be able to be done in fixed point. Next, they have to be done on a really huge scale.

If all the arithmetic operations are fixed point add, subtract, and compare, you can do a really huge number of them in an array of FPGAs.

Linear systolic arrays are pretty easy to change. It is a linear array of relatively simple (but in any case, a module in the appropriate HDL) cells. You can put more or less in a single FPGA, and make a linear array of such FPGAs when needed.

In that case, no, don't use an FPGA.

I am not so sure what is now being done with large arrays of FPGAs (not clusters of PCs with a few GPUs in them). If it needs floating point (and not all problems that are commonly done in floating point should be), then a GPU might be a better choice. (snip)

I have written Verilog that could do 1e19 operations per day, each a 5-bit add/subtract/compare. There is an actual problem that can use that much computation.

How many GPUs does it take to do 1e19 arithmetic operations per day?
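A rough back-of-envelope for that question; the per-GPU sustained rate below is purely an illustrative assumption, not a benchmark:

```python
# Convert glen's figure into a sustained rate, then divide by an
# assumed per-GPU throughput on narrow integer add/subtract/compare.
ops_per_day = 1e19
ops_per_sec = ops_per_day / 86_400      # seconds in a day

# Illustrative assumption: one GPU sustaining ~1e12 narrow integer
# ops/s on this kernel (real figures depend heavily on the kernel
# and on how well 5-bit operations pack into wider ALUs).
assumed_gpu_ops_per_sec = 1e12
gpus_needed = ops_per_sec / assumed_gpu_ops_per_sec

print(f"{ops_per_sec:.3g} ops/s, ~{gpus_needed:.0f} such GPUs")
```

Under that assumption the answer is on the order of a hundred GPUs running flat out, which is the scale the question is driving at.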

Well, at this point he only needs to show that it could be done. That is, proof of concept. Only when someone puts up the money does he have to show that it can scale.

For some actual examples of FPGA based hardware processors see:

formatting link

-- glen

Reply to
glen herrmannsfeldt

> ...going into the field of high computation in verilog.

It isn't so much computation, and I know nothing about the bioinformatics field, but some sort of DNA pattern matching algo always struck me as an interesting area to explore. The data objects are small, the data set sizes are large, and the parallel nature of FPGAs and internal memory bandwidth can be exploited. A processor can compare a couple of symbols per cycle, a GPU might do a few hundred or a thousand symbols per cycle. A low end FPGA could do a few thousand per cycle.

Is it best to have 'n' tiny little state machines, each detecting one of 'n' patterns, or do you timeslice 'x' state machines, each looking for x/n patterns? Is it best to look at data in big gulps, or one symbol at a time?

What is the best way to look for patterns? A giant grep-like FSM, or multiple smaller FSMs? Do you spread the FSMs into a pipeline (each stage feeding into the next) or do you use local feedback? Can FSMs be partitioned to maximise efficiency? Can you leverage the underlying FPGA architecture to your advantage (e.g. cascades in DSP blocks, coupling between BRAM blocks)?

I like this idea because the FPGA-side FSMs are relatively simple, and most of the technology is in how you generate the tables that allow you to search quickly and efficiently.

It could also easily implement pattern matches that are tricky to do in S/W.
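The "one giant grep-like FSM" option is essentially an Aho-Corasick automaton. A small software model for generating those tables; in hardware the goto/fail tables would become BRAM contents driving one state transition per input symbol per clock:

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for a pattern set."""
    goto = [{}]                 # state -> {symbol: next state}
    out = [set()]               # state -> patterns ending here
    fail = [0]
    for p in patterns:          # build the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); out.append(set()); fail.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values()) # BFS to fill in failure links
    while q:
        r = q.popleft()
        for ch, s in goto[r].items():
            q.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]
    return goto, fail, out

def search(text, patterns):
    """Return (start_index, pattern) for every match in text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

Everything interesting happens at table-generation time; the per-symbol runtime step is a single table lookup, which is exactly what makes it attractive to freeze into FPGA memory.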

Gosh darn - looks like somebody has been there before (not that I've actually read the papers...)

formatting link
formatting link

I also thought that some sort of particle simulation (e.g. Photon Mapping) would be interesting to explore, but never had the time.

Mike

Reply to
Mike Field

I don't follow. How is rearchitecting a CPU in an FPGA anything like changing GPU code? Your explanation makes no sense. I think you are working from a huge lack of knowledge of HDLs.

The CPU has only a handful of ALUs to perform useful calculations on, the FPGA is limited only by its size. The clock speed is swamped by the sheer number of computations that can happen in parallel.

The GPU has lots of ALUs, but is limited in how they are used. It is

*nothing* like having 1000 separate processors. So it can only be useful on certain types of problems.

The FPGA gets around all of these issues and can be configured on the fly.

I gave one reason which you didn't respond to.

--

Rick
Reply to
rickman

(snip)

I think that is about right. And run at about 200MHz, maybe 300MHz. FPGAs have the registers built in, so you just have to be sure to use enough of them.

formatting link

The latter is probably the right description.

The idea of dynamic programming is that if you make the optimal decision at each point, you find the globally optimal solution.

Conveniently, systolic arrays are well suited to evaluating dynamic programming algorithms, and are also nice and efficient to implement in FPGAs. (Or ASICs, sometimes.)
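The connection can be made concrete with the classic edit-distance recurrence: all cells on one anti-diagonal of the DP matrix are mutually independent, which is what lets a linear systolic array compute a whole diagonal per clock. A Python model sweeping anti-diagonals instead of rows:

```python
def edit_distance(a, b):
    """Levenshtein distance, computed anti-diagonal by anti-diagonal.
    Each cell on diagonal d (i + j == d) depends only on diagonals
    d-1 and d-2, so the inner loop could run fully in parallel --
    the property a linear systolic array exploits."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                       # delete all of a[:i]
    for j in range(m + 1):
        D[0][j] = j                       # insert all of b[:j]
    for d in range(2, n + m + 1):         # sweep anti-diagonals
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            D[i][j] = min(D[i - 1][j] + 1,            # deletion
                          D[i][j - 1] + 1,            # insertion
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return D[n][m]
```

In hardware each systolic cell holds one (or a few) columns and the diagonal sweep falls out of the dataflow, rather than being an explicit loop.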

-- glen

Reply to
glen herrmannsfeldt

I think they have dealt with that one pretty well. After all, they have sequenced the human genome.

This is not a global truth. It assumes the path to the optimal solution is monotonic, which it may not be.

--

Rick
Reply to
rickman

Sure, but in this age of "big data" how quickly can you query 1000s of genomes, each approximately 4 billion symbols in size, looking for somewhat fuzzy matches?

Oldish (2006) papers talk about a 2GHz Xeon processing 32M characters per second, and a 16 CPU Power system processing 1.2G symbols per second. This is obviously bound by CPU cycles and not memory or I/O bandwidth.

FPGA hardware has come a long way in that time, as have memory capacity, memory bandwidths and I/O subsystems. Standard CPUs haven't progressed at such a dramatic pace, just adding more cores. I am pretty sure that revisiting it with an FPGA board that could hold the entire genome in memory, at full memory bandwidth speed (a 16x 333 MHz DDR memory can deliver 1.2G symbols per second), should be able to get close to this on a tiny power budget.

If the process isn't limited by I/O bandwidth, then it isn't running fast enough :-)

Maybe it could be implemented on one of those ARM/FPGA hybrid chips, with the fabric having a larger private memory to hold the genome data, and the ARM just performing command and control... it would avoid a lot of the complexity of high speed I/O.

Reply to
Mike Field

(snip)

For some actual numbers of what you can do today:

formatting link

this machine can generate 4 billion reads (sequences) of 125 base pairs, for a total of 500 Gbp in six days. You then want to compare that to the reference (human) genome (if it is human data), or 3Gbp.

The dynamic programming algorithm gives you the score for each 125bp fragment against the reference, including appropriate penalty (usually 1 each) for insertions, deletions, or substitutions.

(The algorithm is the same one, or similar to, the one used by diff. The original diff got the algorithm from one that was used for protein sequences in the 1970s.)

Since the reads are up to 125bp long, if you score +1 for a match, the score can't go over 127, and so 7 bits is enough. It takes five to seven add/subtract/compare operations, 7 bit fixed point, to compare each new base against each base of the reference. So, 5e11*3e9/6 days or 2.5e20 dynamic programming cells per day. Times 5, so 1.25e21 7 bit add/subtract/compare per day.
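A software model of that scoring cell, as I read the description (match +1, mismatch and indel penalties of 1, local alignment so scores clamp at zero). This is the standard Smith-Waterman recurrence, not necessarily the exact pipeline described above:

```python
def local_align_score(read, ref):
    """Smith-Waterman local alignment score with +1 match, -1
    mismatch/indel. Since a 125 bp read can score at most +125,
    every cell fits in 7 bits. Per cell: three adds/subtracts and
    two compares -- the roughly five fixed-point ops mentioned."""
    rows = [0] * (len(ref) + 1)   # current DP row (row above, updated in place)
    best = 0                      # best local score seen anywhere
    for a in read:
        prev_diag = rows[0]       # D[i-1][j-1] for the j loop
        rows[0] = 0               # local alignment: first column is 0
        for j in range(1, len(ref) + 1):
            diag = prev_diag + (1 if a == ref[j - 1] else -1)
            prev_diag = rows[j]   # save D[i-1][j] before overwriting
            rows[j] = max(0,                  # restart alignment here
                          diag,               # match/mismatch
                          prev_diag - 1,      # gap in read
                          rows[j - 1] - 1)    # gap in reference
            best = max(best, rows[j])
    return best

# Usage: an exact 4 bp substring of the reference scores +4.
print(local_align_score("ACGT", "TTACGTTT"))
```

In the FPGA version one systolic cell per reference position evaluates this inner-loop body, so a whole anti-diagonal of cells updates every clock.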

How fast is your GPU?

(If you want to sequence a new genome, it is done at about 10x coverage. You randomly select 30Gbp of 125bp fragments, and hope that they cover most of the genome to a depth of at least two. So, the above machine can sequence about 12 humans in 6 days.)

The sequencers have gotten somewhat faster since the last time I did these calculations. Note that for many years now, it isn't the chemistry that limits it, but the data processing.

(snip)

People only think up algorithms that satisfy the restrictions for dynamic programming.

The one commonly used does local alignment, so it finds the highest scoring match between each input sequence and the reference, including all combinations of insertion, deletion, or substitution. (Think about spell checkers finding the words close to your misspelled word.) The five operation algorithm scores pretty much the way you would for words. With a little more work, you can do affine gap scoring, where the penalty for a gap has an open penalty and an extend penalty, such that longer gaps are not proportionally penalized. You can make even more complicated gap penalty functions.

The times are already long enough. No point in going to one that is exponential in the length of the fragments.

-- glen

Reply to
glen herrmannsfeldt

Not sure you can hold 3 Gsymbols on chip in an FPGA. They may have memory, but not GBs. So the ARM doing control isn't really all that helpful. It can't even remotely be in the data path, so it doesn't need to be on the chip at all. Why waste space that can be used for more memory and logic?

The real advantage of the FPGA approach is that it can connect to multiple memory chips and run them at max throughput. Multiple FPGAs can be used on one board potentially outpacing the density of PC CPUs and almost certainly reducing the power budget. What was ALU bound in a PC will be memory bound in an FPGA, so more memory means more processing.

--

Rick
Reply to
rickman

Fully agree, and assuming his board is something like a Digilent Atlys it may already have perhaps 128MB of DDR on it, allowing the design to be tested with 512 million DNA (2-bit) symbols - enough to hold a worm's genome.

However, I guess I've dragged this discussion far away from the original poster's question of what to do for his final year project...

Mike

Reply to
Mike Field
