Distributed microcontroller computing

Do you have a question? Post it now! No Registration Necessary

Translate This Thread From English to

Threaded View
I've got a pure math problem implemented in C that will take about 3
years to solve using all 5pcs available to me (the algorithm is about
as efficient as it will get without some major mathematical insights).

The funny thing about this problem is that I can easily split it into
smaller problems each of which only requires about 1000 bytes of memory
and only uses bitwise operations and additions mod 256.

I was considering the possibility of creating a network of
microcontrollers, probably using I2C, with one main processor dishing
out smaller sub problems and reporting results back to a PC. To date I
have only used Microchip PICs, primarily the 18f4550 and done my
programming in MPLAB ASM. I think I could easily implement this setup
using these chips but they are too slow (40MHz about 20Mips I think).

I have heard that there are some high speed microcontrollers out there,
I had a quick look at Freescales range of PowerPC based embedded
processors. These are about $25/800Mips - as far as I can tell this
would be significantly cheaper on a per MIP basis than buying computer
hardware (even omitting the hdd and using minimal ram).

My questions:
1. What is the best $/Mips microcontroller out there, preferably with
I2C.

2. What is the best $/Mips microcontroller that I will be able to
understand without too much trouble given that I have only used
Microchip products so far?

3. The technical docs for the Freescale embedded processors left me
somewhat confused (quite a jump up from microchip microcontrollers!!!)
what external components do these devices need to run? In particular is
the program code stored in the device or does it obtain this via the
bus?

4. Now I'm not sure about the gory technical details but I assume that
given the simplicity of the instructions my program needs to execute
the MIPS figures for microcontrollers should be fairly comparable with
that for desktops. The main difference I can see here would be
regarding branching - I've heard that modern processors go to great
lengths to predict which instructions will be executed so that no time
is wasted while retrieving program instructions...

5. Another possibility that would keep me busy for a while is
implementing the program in assembler on my PC. Unfortunately every
computer that I have access to has a different architecture
(Athlon,Sempron,Duron,2xPentium M) so I guess I would have to restrict
myself to x86 instruction set? What sort of performance increase do you
think this might produce?

Thank you very much for your time and I look forward to hearing from
you!


Re: Distributed microcontroller computing

Quoted text here. Click to load it

Hi,

Your first description seems to suggest a easily parallellizable algorithm
of moderate complexity. This might well be in the ballpark of modern FPGAs,
which are inheritly parallel. If you could elaborate a bit more on your
algorithm?
I once studied a similar problem and the thoughest points were:
1) efficient communication of your system (you probably need to load your
data and collect the results)
2) fault tolerance: the more devices you have, the likelier it becomes that
one of them will fail (murphy: just 1 second before producing the answer)

Kind regards,
Alvin.



Re: Distributed microcontroller computing
Thanks Alvin,

The problem is to assign one of 5 colours for each of the integers from
1 upwards such that if any two integers have the same colour the
integer that they sum to must have  a different colour:

eg.
If we have
1-green 2-blue 3-green

then 4 cannot be green or blue as 1+3 = 4 and 2+2 = 4.

A correct sequence of 160 digits for 5 colours is known. I wish to find
a sequence of 162 digits.

I'm doing an exhaustive search on a severely restricted subset of all
the possible sequences. The sequence is built up one colour at a time
until you get to the point where you know that somewhere down the track
there will be no possible colour, then you go back one in the sequence
and try a different colour etc.

This is quite easy to split up each chip at a time should only increase
the sequence by a few digits (due to memory constraints) and report
each possible final sequence back to the coordinating chip which dishes
each of these out to other chips when they request more work. Of course
there needs to be a prioritisation of sequences such that the sequence
queue doesn't get too big (ie. always dish out the longest sequences
which will be exhaustively searched and therefore removed quickest).
All of this stuff isn't too hard.

To the points you made,
1) efficient communication. Each chip needs to get a sequence per unit
of work which is no biggy, but it will need to report back each
sequence it ends up at..... This could in fact be quite a few - maximum
5^(#of digits sequence was increased by) but usually a lot less. For
each of these it needs to return  (sequence length)/2 bytes. I think I
will need to consider this point in some more detail...
2) fault tolerance. I wish to find a single correct sequence and
believe (hope!) that there are many of these (expected running time is
time to first sequence not completion time for exhaustive search which
would in fact take 100s of years!!!), whilst missing one sequence due
to a faulty device/communications would be very bad this wouldn't be
disastrous. I'm not trying to say that no sequence of 162 digits
exists.

I don't know anything about FPGAs or how these would apply, do you
happen to have some useful links?

Thanks!


Re: Distributed microcontroller computing
Quoted text here. Click to load it

I can follow <> green, but <> blue seems to extend your rule ?

Quoted text here. Click to load it

So that's a string of ~160 characters, where each character can be one
of 5 values ?

Quoted text here. Click to load it

This reads a little like sorting primes.
The data set would certainly fit into a (very) small microcontroller
= you can even pack into nibbles, and consume just 80 bytes, but
the problem with many small uC will be ensuring there are no overlaps,
or holes, in their scan coverage.

ie the task is simple enough, but multi-uC management is likely to
be a nightmare.

Something like i2c for the backplane is also likely to be a serious
bottleneck.


Quoted text here. Click to load it

Look at Altera, Lattice, Xilinx - there are many demo/eval boards and
tool sets.
Also look at the Soft CPUs : Xilinx PicoBlaze, and Lattice Mico8

FPGAs can do hugely parallel tasks, and on a small data set like this,
you have no memory bandswidth issues.

With a FPGA, you could do exclusion mapping - that is, do not store the
Colour@integer, but instead have an array of N x 5 booleans, which are
excluded colours. [ALL 5 => Whoops, go back! ]
An FPGA could scan for all ahead exclusions, very efficently indeed.
One of the small soft CPUs could manage the re-seed process.

<paste>
Quoted text here. Click to load it

The algorithm is always where the biggest speed gains can be made,
especially in efficently mapping the algorithm to the hardware it runs
on.

In a FPGA you could set up 'algorithm races', where (eg) you code
4 algorithms in ~1/4 of the chip each, and run it for a couple of days,
and compare their Attained String Lengths.

If the present best is a langth of 160, don't just think about 162, look
to smash it ! :)

I've added comp.arch.fpga, as this really sounds more like a FPGA+smart
algorithm, than a "sea of uC" problem.


-jg


Re: Distributed microcontroller computing

Quoted text here. Click to load it

Hi,

The blue rule seems ok: 2 + 2 == 4.

I'd certainly favour FPGAs for this. The biggest issue would still be the
memory bandwidth: each time you split some work over different processing
nodes, you need to communicate the present N charactrs, while testing for an
expansion could easily take less than O(N) time (depends on how the
algorithm behaves), possibly leaving processing nodes starved for data and
while others might be spening more time on communicating than on processing.

My first guess is that a single dual-ported RAM per processing node should
suffice. That allows many processing nodes on a modern FPGA. Maybe if I
change the memory architecture that I have in mind, some communication
overhead could be eliminated. Oops ... I'm trying to solve it! Can I? :-)

Alvin.



Re: Distributed microcontroller computing
Since time is an important factor, building and testing a whole net of
FPGAs/uC's seems like a step in the wrong direction.
Common PC computing power is availible in abundancy. One way to get it
is:

Sun sells processing power (and it's quite cheap too :) ).
http://www.sun.com/service/sungrid/overview.jsp
One of (the many) advantages of this approach is that you can write
your program in java, which probably will cut down on development time
a lot.

Another way of doing it is writing your own BOINC program.
http://boinc.berkeley.edu /
This is a system for distributed computing. After you've written your
program you might post a message in an appropriate newsgroup and ask
for help running it. I know I would help out =)

Hope this helps
// eKIK


Re: Distributed microcontroller computing
Thanks for the suggestion eKIK,

The idea of this was really more for the fun of setting this up than
seriously trying to save money. The sungrid seems a little on the
expensive side for me here in New Zealand at $1US/hr.... I can get
access to supercomputing at $0.30NZ/hr.... (on 2200Athlons).

Something like BOINC sounds like a good idea, although I think most
people that would be inclined to do that are probably already running
something like mersenne prime or seti at home...

Cheers!


Re: Distributed microcontroller computing
I think you might be right about BOINC users already running other
things. My school has allowed students to use all the schools computers
idle time for distributed computing. Perhaps a similar deal can be
worked out for you?

I can see the fun in trying to setup the system you originally spoke
of.
You can thing of FPGAs as "blank microcontrollers". You "program" them
using either VHDL or Verilog, which is kind of tricky the first times
since it's all executing in parallell.
You can check on (http://www.xilinx.com /) for information about
CPLDs/FPGAs.
This is probably the most complicated (and also most fun :) ) way of
doing it. One problem is that you have to write the communication
interface by yourself as well. SPI and I2C ones are availiable at
(http://www.opencores.org/browse.cgi/by_category ).

MIPS (Meaningless Indicator of Processor Speed) is not the best way to
measure speed at all times, since most microcontrollers has different
CPI (Clockcycles Per Instruction) for different instructions. A LW
might take longer than a ADD for example.

Perhaps you should try to write the inner loop of your program in
assembly or pseudocode, then have a look at different uC datasheets to
see how many cycles the loop actually takes to complete?

Atmel is a good source for semi-high performance microcontrollers. They
even have a uC + FPGA hybrid (http://www.atmel.com/products/FPSLIC/)
which might be VERY interesting for you.

Regards
// eKIK


Re: Distributed microcontroller computing

Quoted text here. Click to load it

I would not use I2C but rather UART.

Also look at the small ARMs from Philips (LPC2xxx) and Atmel SAM7.

These are sometimes with very low pin count (48 QFP) and run with
60MHZ (or 64MHZ if Atmel).

Philips just announced the LPC2101 with 2K RAM and 8K Flash for about
1.5USD.

There are some companies selling those chips on a DIL outline PCB with
all the needed things like quarz and voltage regulator.

I don't think Freescale has currently something to compete this.



--
42Bastian
Do not email to snipped-for-privacy@yahoo.com, it's a spam-only account :-)
We've slightly trimmed the long signature. Click to see the full one.
Re: Distributed microcontroller computing

Quoted text here. Click to load it

I would use pairs of processor coupling with FPGA.  The FPGA acts as
cross-switch for setting up the vectors (data arrays) as well as
parallel processing on the vectors itself.   You can setup hundreds or
thousands of
vector processors in a high end FPGA.  You don't need high pin counts
on the FPGA, but unfortunately, they are usually proportional to the
macrocells.

Quoted text here. Click to load it

I would use ethernet.


Site Timeline