Distributed microcontroller computing

- X
- xarvia
  
  Contact options for registered users
posted
18 years ago

Thu, Oct 13, 2005 5:39 PM

I've got a pure math problem implemented in C that will take about 3 years to solve using all 5pcs available to me (the algorithm is about as efficient as it will get without some major mathematical insights).

The funny thing about this problem is that I can easily split it into smaller problems each of which only requires about 1000 bytes of memory and only uses bitwise operations and additions mod 256.

I was considering the possibility of creating a network of microcontrollers, probably using I2C, with one main processor dishing out smaller sub problems and reporting results back to a PC. To date I have only used Microchip PICs, primarily the 18f4550 and done my programming in MPLAB ASM. I think I could easily implement this setup using these chips but they are too slow (40MHz about 20Mips I think).

I have heard that there are some high speed microcontrollers out there, I had a quick look at Freescales range of PowerPC based embedded processors. These are about $25/800Mips - as far as I can tell this would be significantly cheaper on a per MIP basis than buying computer hardware (even omitting the hdd and using minimal ram).

My questions:

What is the best $/Mips microcontroller out there, preferably with I2C.

What is the best $/Mips microcontroller that I will be able to understand without too much trouble given that I have only used Microchip products so far?
The technical docs for the Freescale embedded processors left me somewhat confused (quite a jump up from microchip microcontrollers!!!) what external components do these devices need to run? In particular is the program code stored in the device or does it obtain this via the bus?

Now I'm not sure about the gory technical details but I assume that given the simplicity of the instructions my program needs to execute the MIPS figures for microcontrollers should be fairly comparable with that for desktops. The main difference I can see here would be regarding branching - I've heard that modern processors go to great lengths to predict which instructions will be executed so that no time is wasted while retrieving program instructions...
Another possibility that would keep me busy for a while is implementing the program in assembler on my PC. Unfortunately every computer that I have access to has a different architecture (Athlon,Sempron,Duron,2xPentium M) so I guess I would have to restrict myself to x86 instruction set? What sort of performance increase do you think this might produce?

Thank you very much for your time and I look forward to hearing from you!

- Z
- zwsdotcom
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 5:48 PM

If you can factor this problem as well as you claim, and it is only going to take 5,475 PC-days to solve it, you're much better off to either:

rent distributed computer time, or
buy as many PCs as it will take to solve the problem in the desired timeframe. Bear in mind that they only need minimal RAM and a tiny hard disk (or flash card) and network connection, no display or anything else. Probably each node can be built for ~0. If you want the answer in one month, you need 182 computers. If you're happy to get it in six months, you only need 31 computers (or rather, 26 additional machines - ,600, much less than the cost of developing your custom hardware and probably the same timeframe).

Unless this is an academic exercise - and it sounds more like you want to get into a specific piece of encrypted data - you're wasting time and money developing and debugging a custom multiprocessor engine.

- X
- xarvia
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 6:26 PM

I realise that building/renting distributed computing time would be the more efficient way to go were I to take my own time into account. However this is just a hobby/academic exercise. I should probably have explained that in my initial post. I think setting this up would be an interesting exercise and would teach me a lot that I want to understand about microcontrollers/asm/distributed computing.

Also you can heave a sigh of relief that I'm not attempting to crack into encrypted data! The problem I'm looking at is one that a few of the other postgrads and I have in the Math dept had a bit of a play with. As far as I can tell there is absolutely no practical application to anything arising from this other than having an answer!!!:

The problem is to assign one of 5 colours for each of the integers from

1 upwards such that if any two integers have the same colour the integer that they sum to must have a different colour:

eg. If we have

1-green 2-blue 3-green

then 4 cannot be green or blue as 1+3 = 4 and 2+2 = 4.

A correct sequence of 160 digits for 5 colours is known. I wish to find a sequence of 162 digits.

- X
- xarvia
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 6:33 PM

Hi Rich,

Communication speed between the processors is of relatively little importance, depending on how the processes are setup. I suggested I2C because that's what I'm familiar with with microcontrollers thus far.

The reason I thought of microcontrollers is that I'm utilising very little of the sophistication of a modern computer and none of the baggage that comes along with it in terms of motherboard etc. So I thought perhaps you could do the same with some of the faster microcontrollers out there which I think can run up to 1000Mips.

I'm just a student so it wouldn't be too many micros. If you're keen to donate CPU time I'm sure I can find a use for it - Internet mersenne / SETI @ home style.

Cheers!

- T
- Tim Wescott
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 6:56 PM

The Freescale 32-bit microcontrollers (at least the ones that I know of) don't have any on-board ROM, flash or otherwise. You'd have to design a PC board with flash, debug it, then build a bunch of iterations.

Even if your time is free by the time you have something working you'll have spent more than $100/ea -- PC boards are cheap when you build them in huge quantities, but not when you do onsie-twosie.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

- T
- Tim Wescott
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 6:59 PM

If you just see how many cheap motherboards you can pack into one case with all their ethernet ports wired together you'll learn lots about your chosen subject -- particularly if you write your core algorithms in assembly. You'll also end up spending less $/mip than you would if you tried to build stuff in small quantities from scratch, even if your time is free.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

- J
- John Larkin
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 7:02 PM

It's likely that your algorithm can be implemented in an FPGA, perhaps entirely in parallel, one iteration per clock. Then one largish FPGA may be able to crunch tens or even hundreds of copies every clock, and clock at 100 MHz or more. This could potentially out-compute the fastest Pentium by a factor on the order of thousands.

You can buy inexpensive PCI boards with one or several Xilinx FPGAs on board.

What's the problem being solved?

John

- J
- Jim Thompson
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 7:05 PM

2+2 ;-)

...Jim Thompson

--
|  James E.Thompson, P.E.                           |    mens     |
|  Analog Innovations, Inc.                         |     et      |
|  Analog/Mixed-Signal ASIC\'s and Discrete Systems  |    manus    |
|  Phoenix, Arizona            Voice:(480)460-2350  |             |
|  E-mail Address at Website     Fax:(480)460-2142  |  Brass Rat  |
|       http://www.analog-innovations.com           |    1962     |
             
I love to cook with wine.      Sometimes I even put it in the food.

- T
- Tim Wescott
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 7:06 PM

Ooh. Clever. I should have thought of that.

He describes it for Rich the Affluent Wacko, or whatever he's calling himself these days.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

- R
- Rich Grise
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 9:21 PM

The Connection Machine! ;-)

How many chips are you planning to use? Are you saying it will take

5 X 3 PC-years? What happens if you do this same thing over an ordinary LAN?

I've written 8087 code, to do the inner loop for a Mandelbrot set on an AT. (anybody remember them?) Don't all PCs these days (I assume by "5pcs" you mean, "five personal computers") have a math coprocessor?

If you can't do this with today's multi-GHz PCs and a decent LAN, how many micros are you planning on throwing at this problem? Can I have a piece of the action? ;-)

Good Luck! Rich

- Z
- zwsdotcom
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Oct 13, 2005 9:48 PM

Maybe, but if I was your professor and you used this approach, I'd mark you down because it's poor engineering practice to invent a wheel where adequate and affordable solutions exist.

Engineering a device that matches the $/mip of a COTS PC is not a trivial job, especially if you consider reliability, and it's not necessary. The micros you are considering don't offer any specific performance benefit that seems particularly apposite to your application. Micros that approach the performance level of a desktop PC CPU are exceedingly difficult to design in - there are board-level engineering issues that you just don't want to get into, plus the parts themselves are hard to work with by hand.

Custom massively parallel engines are built when one of the following is true:

It's much cheaper, $/op (where "op" is "iteration of your specific algorithm" to build custom hardware than use COTS hardware with custom software, or
There is some operation that can be vastly speeded by a custom processor design.

- R
- Rich, Under the Affluence
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Oct 14, 2005 12:36 AM

You've clearly come to the wrong group.

The genii at rec.puzzles will answer this in a day or so. ;-)

Good Luck! Rich

- T
- Tim Wescott
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Oct 14, 2005 1:15 PM

If you ever start posting as "Under the Effluent" I'm going to worry.

But then you work as a support person at a university or corporation, don't you? So I guess you're already there, at least partially.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

- C
- cs_posting
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Oct 14, 2005 2:27 PM

What you want is not a general purpose processor, but a dedicated computational unit implemented in an FPGA... or a lot of them.

- D
- dalai lamah
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Oct 14, 2005 3:20 PM

Un bel giorno xarvia digitò:

Looks like the ideal task of one ore more FPGAs. For tasks like this, you can reach performances some orders of magnitude better than any general purpose CPU.

--
asd

- R
- Rich, Under the Affluence
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Oct 14, 2005 8:27 PM

I'm a consulting engineer who's on standby most of the time. It doesn't pay anything unless there's work, but the fringe bennies are fantastic, like, I get to sit here and play on the computer all day, and the PHB isn't charging me any rent to park my RV in his lot. :-) (I've been considering sending in a picture to "Redneck Yard" on "Blue Collar TV" - I'm usually surrounded by big weldments:

formatting link

:-) )

Thanks! Rich (BTW, "under the affluence" is the short form of "under the affluence of incohol.", as in, "Hi, I'm John Audubon. I watch birds." "Hi, I'm Bill Spooner. I watch birds.") ;-)

- R
- Rich, Under the Affluence
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Oct 14, 2005 11:52 PM

Or, "Hi, I'm Gritch Rice. I jotch bokes. ;-D (apologies to Spehro, or John, or whoever was the originator of that particular boke. ;-P )

Cheers! Rich

- P
- petrus bitbyter
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Sat, Oct 15, 2005 2:50 AM

"xarvia" schreef in bericht news: snipped-for-privacy@g43g2000cwa.googlegroups.com...

Looks like interesting to me. Like others said already, I have the feeling a dedicated eight bits processor may be the best solution. Still I'd like to know somewhat more about your approach of this problem. I can imagine a brute force attack, just trying and backtracking, but I guess your distributed approach will be better. Can you tell more without revealing top secret objects? For example, how does the program looks like you want your micros to run?

Some other questions. How does the 160 digits solution looks like and how has this been calculated? Are you sure the 162 digits sequence exists? Why not looking at the 161 digits sequence first?

petrus bitbyter

- I
- Ian Stirling
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Sat, Oct 15, 2005 9:01 PM

Also. It may be possible to rewrite your algorithm, using the MMX units of a processor, or even just the registers, using the processor as a SIMD machine. DES IIRC has been done dramatically faster this way than normal.

- F
- Frithiof Andreas Jensen
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Mon, Oct 24, 2005 5:40 PM

Buy an array of them then:

formatting link

The CPU on the PC or the GPU on the Graphics Card!

A NEGATIVE increase - the optimiser that comes with the compiler is likely to kick any naive assembler implementation.

The optimiser/compiler will "know" what works best on the target architecture - in most cases better than the developer.

Why not look at hacking the GPU instead??

The Graphics processor is *much* more powerful than the CPU is - for simple logic operations - and it is massively parallel to begin with, depending on how much money you shell on the card.

Finally, hacking GPU's is a HOT a worthy topic of Papers and Presentation, while grubbing around with mouse controllers and FPGA's is geeky and so yesterday.

Some meat here:

formatting link