Building high-end computers

Found this article interesting if you want to build a powerful computer for a very reasonable price. The only downside is higher power consumption than the latest chips. Small price to pay, in my opinion.

formatting link

Mikek

Reply to
amdx

Just imagine what the current generation of Xeons can do.

Cheers

Reply to
Martin Riddle

I built a 250 Gflops compute server a few months ago, for a total outlay of about $900, mostly for the chassis/PSU ($239) and 500 GB SSD. It's a Supermicro H8DGi-F mobo, dual AMD 6176 (maybe 6178) 12-core, 2.3 GHz Opterons, 64 GB ECC RAM, and a low-end Radeon video card.

I'm using it for multithreaded electromagnetic simulations, and it flies. Slightly-used server parts are cheap like borscht.

(It's a near copy of my main desktop machine, for which I paid almost $4k, about 4 years ago.)

Cheers

Phil Hobbs

Reply to
Phil Hobbs

Of course, one reason server parts are cheap is the cost of electricity--newer stuff generally gets you more computrons per watt, and leaving that new box crunching costs about 400 kWh per month (roughly 550 W of continuous draw).

Which isn't a bad return. Of course if you don't leave it running all the time, it doesn't matter as much.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

The other day I saw a show where someone talked about "sending it back to the office and running it through the mainframe". I thought, you mean the mainframe sitting under your desk? Who has "mainframes" anymore? What is a "high end computer"? Is that one that uses more power than the 100-watt lightbulbs I can't buy anymore? The same computer that will be obsolete by Christmas? Not to mention the one that will never leave your building until it is thrown in the trash or packed in a box when you move.

I thought nearly everyone had laptops these days. Everyone on TV has one with a lighted apple on the cover.

--

Rick
Reply to
rickman

I have a surplus 4U rackmount enclosure, 700 watt Corsair power supply, and Radeon PCI-X video card collecting dust in my closet. Should I do dis?

--
----Android NewsGroup Reader---- 
http://usenet.sinaapp.com/
Reply to
bitrex

Only if you have a use for it. Reading and responding to Usenet doesn't count.

Mikek

Reply to
amdx

I mostly use a $100 Samsung tablet for that. :)

--
----Android NewsGroup Reader---- 
http://usenet.sinaapp.com/
Reply to
bitrex

Did you price the equivalent (or better) grunt from Amazon EC2?

The machine is yours only for the hours and minutes that you need it, and you can schedule that cheaply, when it's surplus to other folks' requirements.

Reply to
Clifford Heath

Sure. The simulator is one that I wrote myself some years back, that runs on multiprocessors and heterogeneous clusters. It's written on the bare sockets API because it's very sensitive to latency. It would also be a significant chunk of work to port it to a cloud system. Besides, it's only a grand's worth of computer, for 250 Gflops peak. Ten years ago I spent about 60 grand of IBM's money putting together a 14-processor cluster that was quite a bit slower than that, to run the same program. Nice piece of code imnsho.

formatting link

Cheers

Phil Hobbs

Reply to
Phil Hobbs

You could almost certainly run the exact same code unchanged in a c4.8xlarge (36 vCPUs of Intel Xeon E5-2666 v3, 60 GB RAM, and 10 Gb Ethernet specially enhanced for low latency), very probably on the exact same operating system you're using now; just snapshot your disk image as an AMI and upload it.

LINPACK performance is about 1330 GFlops (more than 5x your machine), with faster, lower-latency networking.

About $0.36/hour for spot instances at the moment; spot price updates every five minutes. How's that $1000 looking now? :)
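For anyone who wants to plug in their own numbers, the break-even arithmetic is trivial; a quick sketch (the $0.12/kWh electricity rate is just a placeholder, substitute your own):

#include <cstdio>

int main() {
    const double spot_rate = 0.36;   // $/hour, c4.8xlarge spot price quoted above
    const double box_cost  = 1000.0; // rough cost of the home-built Opteron box, $
    const double kwh_month = 400.0;  // Phil's figure for leaving the box crunching 24/7
    const double kwh_price = 0.12;   // $/kWh -- placeholder, plug in your own rate

    double spot_hours  = box_cost / spot_rate;   // hours of spot time $1000 buys
    double power_month = kwh_month * kwh_price;  // monthly electricity bill for the box

    printf("$%.0f buys about %.0f hours (~%.0f days) of c4.8xlarge spot time\n",
           box_cost, spot_hours, spot_hours / 24.0);
    printf("Running the home box 24/7 costs about $%.0f/month in electricity\n",
           power_month);
    return 0;
}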

Reply to
Clifford Heath

If it were just one disc, and the same image for each node, maybe. Wasn't true last time I checked, by a lot. Ever written a big iron simulator? Those "just"'s and "almost certainly"'s suggest not.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

The software can't use the graphics card (OpenCL/CUDA/...) for computation? GPUs are usually much more powerful than CPUs for brute-force calculations.

Bye Jack

Reply to
jack4747


In lots of cases that's true. However, GPUs can't manage as much memory (even more true when the code was written). There's also a lot more footwork involved in communicating between them. Another worry is that these specialized toolkits will go away in favour of the Cool Flavour of the Month, and I'll be stuck. C++, pthreads, and the sockets API aren't going anywhere soon. (That's also why I didn't use OpenMPI.)
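For the curious, the threading side needs nothing exotic. A bare-bones sketch of the kind of slab decomposition an FDTD update can use with plain pthreads -- the field array, update body, and slab bounds here are made-up placeholders, not anything lifted from my code:

#include <pthread.h>
#include <vector>
#include <cstdio>

// Toy stand-in for one half-step of a field update over a range of z-slabs;
// the real thing loops over a 3-D voxel range.
struct SlabJob {
    int z0, z1;                 // half-open range of slabs this thread owns
    std::vector<double>* field; // shared array; slabs are disjoint, so no locking
};

static void* update_slab(void* arg) {
    SlabJob* job = static_cast<SlabJob*>(arg);
    for (int z = job->z0; z < job->z1; ++z)
        (*job->field)[z] += 1.0;   // placeholder for the real E->H / H->E update
    return nullptr;
}

int main() {
    const int nthreads = 4, nz = 1000;
    std::vector<double> field(nz, 0.0);
    std::vector<pthread_t> tid(nthreads);
    std::vector<SlabJob> jobs(nthreads);

    for (int t = 0; t < nthreads; ++t) {
        jobs[t] = SlabJob{ t * nz / nthreads, (t + 1) * nz / nthreads, &field };
        pthread_create(&tid[t], nullptr, update_slab, &jobs[t]);
    }
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tid[t], nullptr);   // barrier before the next half-step

    printf("field[0] = %g\n", field[0]);
    return 0;
}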

The program is over 60k SLOC, which is a lot for one guy's side project, so it's going to stay where it is. Runs great, gets the right answer, and produces pretty output. Plus it has an amazingly (imho) powerful optimizer, which was the point of writing it.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

On Wed, 6 Apr 2016 07:31:06 -0700 (PDT), snipped-for-privacy@gmail.com Gave us:

NVidia rules!

Reply to
DecadentLinuxUserNumeroUno

On Wed, 6 Apr 2016 08:07:56 -0700 (PDT), Phil Hobbs Gave us:

So leave it where it is, and write another version of it to run in a CUDA realm.

Reply to
DecadentLinuxUserNumeroUno

> However, GPUs can't manage as much memory (even more true when the code was written). There's also a lot more footwork involved in communicating between them.

AMD FirePro and Nvidia Tesla have 32 GB of RAM...

> Another worry is that these specialized toolkits will go away in favour of the Cool Flavour of the Month, and I'll be stuck. C++, pthreads, and the sockets API aren't going anywhere soon. (That's also why I didn't use OpenMPI.)

That's the point of OpenCL: it abstracts what's under the hood. You focus on the algorithm and let OpenCL manage the processor, memory, GPU, FPGA, ...
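Roughly, the host code looks the same whatever device ends up running the kernel; a minimal vector-add sketch (error checking and resource cleanup omitted, device selection hard-wired to the first platform found):

#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Kernel source: the same string runs on a CPU, GPU, or FPGA device,
// which is the "abstracts what's under the hood" part.
static const char* kSrc =
    "__kernel void vadd(__global const float* a,\n"
    "                   __global const float* b,\n"
    "                   __global float* c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main() {
    const size_t n = 1024, bytes = n * sizeof(float);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);

    cl_int err;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, a.data(), &err);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, b.data(), &err);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, nullptr, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, &err);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kern = clCreateKernel(prog, "vadd", &err);

    clSetKernelArg(kern, 0, sizeof(cl_mem), &da);
    clSetKernelArg(kern, 1, sizeof(cl_mem), &db);
    clSetKernelArg(kern, 2, sizeof(cl_mem), &dc);

    size_t global = n;
    clEnqueueNDRangeKernel(q, kern, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, bytes, c.data(), 0, nullptr, nullptr);

    printf("c[0] = %g\n", c[0]);   // expect 3
    return 0;
}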

Bye Jack

Reply to
jack4747

Wow. Out of curiosity, how long did it take you to write? Assuming 2 hrs/day (side project), I'd wager 8 years. :)

How close am I? :)

Reply to
Aleksandar Kuktin

About half that. It started out as a big wrapper script for Berkeley Tempest 6.0, to generate the geometry and sources, plus a C++ postprocessor to calculate stuff like field amplitudes, Poynting vector, power dissipation, far-field and all sorts of integrals of those things, for use in the optimizer's merit function.

It has a math-program type interface, so you can parameterize anything and apply Nelder-Mead optimization to anything.
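For anyone who hasn't met it: Nelder-Mead is plain downhill simplex -- sort, reflect, expand, contract, shrink -- with no derivatives needed, which is why it tolerates lumpy merit functions. A bare-bones sketch on the textbook Rosenbrock function (none of this is lifted from my code):

#include <vector>
#include <algorithm>
#include <functional>
#include <cmath>
#include <cstdio>

using Vec = std::vector<double>;

// Minimal Nelder-Mead: minimize f starting from x0 plus unit perturbations
// along each axis. Standard coefficients: reflect 1, expand 2, contract 0.5, shrink 0.5.
static Vec nelder_mead(std::function<double(const Vec&)> f, Vec x0, int iters) {
    const int n = (int)x0.size();
    std::vector<Vec> s(n + 1, x0);
    for (int i = 0; i < n; ++i) s[i + 1][i] += 1.0;
    std::vector<double> fs(n + 1);
    for (int i = 0; i <= n; ++i) fs[i] = f(s[i]);

    for (int it = 0; it < iters; ++it) {
        // Sort vertices by merit, best first.
        std::vector<int> idx(n + 1);
        for (int i = 0; i <= n; ++i) idx[i] = i;
        std::sort(idx.begin(), idx.end(), [&](int a, int b){ return fs[a] < fs[b]; });
        std::vector<Vec> s2(n + 1); std::vector<double> fs2(n + 1);
        for (int i = 0; i <= n; ++i) { s2[i] = s[idx[i]]; fs2[i] = fs[idx[i]]; }
        s = s2; fs = fs2;

        // Centroid of all but the worst vertex.
        Vec c(n, 0.0);
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) c[j] += s[i][j] / n;

        auto mix = [&](double t) {          // c + t*(c - worst)
            Vec x(n);
            for (int j = 0; j < n; ++j) x[j] = c[j] + t * (c[j] - s[n][j]);
            return x;
        };

        Vec xr = mix(1.0); double fr = f(xr);               // reflection
        if (fr < fs[0]) {
            Vec xe = mix(2.0); double fe = f(xe);           // expansion
            if (fe < fr) { s[n] = xe; fs[n] = fe; } else { s[n] = xr; fs[n] = fr; }
        } else if (fr < fs[n - 1]) {
            s[n] = xr; fs[n] = fr;                          // accept reflection
        } else {
            Vec xc = mix(-0.5); double fc = f(xc);          // contraction
            if (fc < fs[n]) { s[n] = xc; fs[n] = fc; }
            else {                                          // shrink toward the best vertex
                for (int i = 1; i <= n; ++i) {
                    for (int j = 0; j < n; ++j) s[i][j] = 0.5 * (s[i][j] + s[0][j]);
                    fs[i] = f(s[i]);
                }
            }
        }
    }
    return s[0];
}

int main() {
    // Rosenbrock function as a stand-in for an expensive EM merit function.
    auto rosen = [](const Vec& x) {
        return 100.0 * std::pow(x[1] - x[0] * x[0], 2) + std::pow(1.0 - x[0], 2);
    };
    Vec best = nelder_mead(rosen, {-1.2, 1.0}, 500);
    printf("minimum near (%.4f, %.4f)\n", best[0], best[1]);   // expect roughly (1, 1)
    return 0;
}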

That worked fine until I started needing to do multiparameter optimization on fine structures made of silicon and noble metals, at which point the run times went through the ceiling.

At that point I started coding the actual simulation engine, which accepts a subset of the Tempest input file syntax and puts out compatible field files (there are lots of extensions to both, of course). That meant that I could check my results by redoing them on Tempest and comparing.

The program is convenient to run on a multicore box, and only a bit less convenient on a cluster. (You always wind up with The Cluster Script From Purgatory, no matter what you start out with.) ;)

Most FDTD codes have a triple loop (one per dimension) with a gigantic switch statement inside that picks the right set of E->H or H->E coefficients, depending on what material is in the current voxel. That's inefficient and doesn't optimize well on account of all the branches and pipeline funnies. It's also next to impossible to vectorize.
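Schematically, the pattern I mean looks like this -- the field layout, coefficient tables, and material codes are placeholders, only one of the six update components is shown, and the curl terms are simplified:

// Per-voxel material lookup and branch buried in the innermost loop, so the
// compiler can't vectorize and the branch predictor takes a beating.
void update_ex_naive(int nx, int ny, int nz,
                     const unsigned char* material,      // material code per voxel
                     const double* ca, const double* cb, // per-material coefficients
                     double* Ex, const double* Hy, const double* Hz) {
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k) {
                long idx = ((long)i * ny + j) * nz + k;
                switch (material[idx]) {   // the giant per-voxel switch
                case 0:  // vacuum
                    Ex[idx] = ca[0] * Ex[idx] + cb[0] * (Hz[idx] - Hy[idx]);
                    break;
                case 1:  // silicon
                    Ex[idx] = ca[1] * Ex[idx] + cb[1] * (Hz[idx] - Hy[idx]);
                    break;
                default: // metals, PML regions, ...
                    Ex[idx] = ca[2] * Ex[idx] + cb[2] * (Hz[idx] - Hy[idx]);
                    break;
                }
            }
}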

My code precomputes a strategy instead. It goes through the simulation volume looking for runs of the same material. For each one, it makes a data structure with pointers to the nearest neighbours and the coefficients. That way the inner loop works on a single material, which saves a lot of copying and vectorizes well.
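Again schematically -- the Run structure and update below are a paraphrase of the idea, not my actual data structures:

#include <vector>

// One contiguous run of voxels sharing the same material, found in a
// precomputation pass over the simulation volume. The coefficients are hoisted
// out of the loop, so the inner loop is branch-free and vectorizes.
struct Run {
    long start;     // index of the first voxel in the run
    long length;    // number of voxels in the run
    double ca, cb;  // update coefficients for this material
};

void update_ex_runs(const std::vector<Run>& runs,
                    double* Ex, const double* Hy, const double* Hz) {
    for (const Run& r : runs) {
        const double ca = r.ca, cb = r.cb;
        double* ex = Ex + r.start;
        const double* hy = Hy + r.start;
        const double* hz = Hz + r.start;
        for (long k = 0; k < r.length; ++k)   // single material, branch-free
            ex[k] = ca * ex[k] + cb * (hz[k] - hy[k]);
    }
}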

That makes my code about 2x faster than Tempest on a single core, and more like 35 times faster on my new box, which makes multiparameter optimizations pretty tractable.

On my 24-core Opteron box, it scales linearly to within 30% up to 24 cores. I usually use 16 or so, because with all 24 going, the other stuff running on the box slows the sim down.

A somewhat downlevel manual is at

formatting link

There are a few bugs and regressions, but overall it works great.

Cheers

Phil Hobbs

Reply to
Phil Hobbs

Where were they in 2006? ;)

Abstracting isn't usually the best way to get performance. In any case, it's another Cool Flavour of the Month that will very likely leave you high and dry eventually. C++, pthreads, and Berkeley Sockets will certainly outlive me.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
Reply to
Phil Hobbs
