New soft processor core paper publisher?

verification of the processor). Apparently OT is my favorite subject, as it seems I'm always busy derailing my own (and others') threads. That, and y'all have very interesting takes on these and various and sundry other things.

None of those real problems are related to "virtual stack-based processor paradigm".

That is just about what HotSpot does :) As L. Peter Deutsch remarked when people had similar reservations about the first (Smalltalk) JIT in the mid-80s, if you can't tell the difference externally, the internals don't matter.

Now for some fun: how emulated processors can be faster than native processors *even when both processors are the same*
- take processor X executing CPU-intensive C benchmarks compiled with optimisation on, and measure speed S1
- write an emulator for processor X and run that emulator on processor X
- run those C benchmarks in the emulator, and see what the code is actually doing (as opposed to what the compiler dared not assume)
- use that knowledge to "patch the optimised binaries"
- run the patched binaries in the emulator, and measure speed S2
- note that S2 can be faster than S1
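To make the trick concrete, here is a minimal C sketch (editorial illustration, not from the thread) of the kind of fact an emulator can observe at run time but a compiler dared not assume: once the pointers are seen never to alias, the "patched" version of the loop is valid and faster.

#include <stddef.h>

/* As compiled: *len must be reloaded every iteration because dst might
   point at it (the compiler cannot rule out aliasing). */
void scale(int *dst, const int *src, const size_t *len, int k)
{
    for (size_t i = 0; i < *len; i++)
        dst[i] = src[i] * k;
}

/* As "patched": hoist the load of *len out of the loop, valid only under
   the no-alias behaviour actually observed while running in the emulator. */
void scale_patched(int *dst, const int *src, const size_t *len, int k)
{
    size_t n = *len;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}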

formatting link

That's a bizarre assertion.

Backwards compatibility is the dominant commercial imperative: don't inconvenience your existing customers. (Windows 8? Tee hee)

Only given preconditions that don't apply in the Wintel world.

Reply to
Tom Gardner

to include a lot more assembly, but that's quite rare now (though it is not uncommon to have to make little snippets in

it will include more C++ (especially with C++11 features). I also do desktop and server programming, mostly in Python,

If C++ is the answer, I want to know what the question is.

How many *years* does it take before the first complete commercial implementation of a C++ standard becomes available? Yes, I know partial implementations become available pretty quickly.

Stroustrup's tome describing "this is what I meant by the various bits of C++" started out at ~400 pages and is now around 1300 pages. 'Nuff said.

Don't forget that it is possible to get the compiler to emit the sequence of prime numbers during the (non-terminating) compilation process. The language designers didn't realise they had created such a monster until it was demonstrated to them!

a better understanding of programming in general, and new ideas of different ways to handle tasks.

Very true. Choose the right tool for the task at hand.

Reply to
Tom Gardner

verification of the processor). Apparently OT is my favorite subject, as it seems I'm always busy derailing my own (and others') threads. That, and y'all have very interesting takes on these and various and sundry other things.

I am cc'ing this to the Forth group in case anyone there cares to join in. I'm still a novice at the language so I can only give you my take on things.

concept of a weird target processor. I mean, I get the portability thing, but at what cost? If my experience as a casual user (not programmer) of Java on my PC is any indication (data point of one, the plural of anecdote isn't data, etc.), the virtual stack-based processor paradigm has failed, as the constant updates, security issues, etc. pretty much forced me to uninstall it. And I would think that a language targeting a processor model that is radically different than the physically underlying one would be terribly inefficient unless the compiler can do handstands while juggling spinning plates on fire - even if it is, god knows what it spits out. Canonical stack processors and their languages (Forth, Java, PostScript) at this point seem to be hanging by a legacy thread (even if every PC runs one peripherally at one time or another).

Off the topic at hand, here is one of Thunderbird's many issues as a news reader. It displays messages just fine in the reading window, but in the edit window all of your quoted paragraphs show as single lines going far off the right side of the screen. I have to switch back and forth to read the text I am replying to!

Back to the discussion...

By weird target processor you mean the virtual machine? That is because it is a very simple model. It does seem odd that such a model would be adopted, but the use of the stack makes for a very simple parameter passing method supported by very simple language features. There is no need for syntax other than spaces. That is *very* powerful and allows the tool to be kept very small.

Chuck Moore is all about simplicity and this is how he got this level of simplicity in the language.

suspect the Propeller has, and my processor definitely has) is such a natural construct - it fully utilizes the HW pipeline by eliminating all hazards, bubbles, stalls, branch prediction, etc. and uses the interstage registering for data and control value storage - that it will come more into common usage as compilers better adapt to multi-cores and threads. Then again, the industry never met a billion transistor bizarro world processor it didn't absolutely love, so what do I know?

So what clock speeds does your processor achieve? It is an interesting idea to pipeline everything and then treat the one processor as N processors running in parallel. I think you have mentioned that here before and I seem to recall taking a quick look at the idea some time back. It fits well with many of the features available in FPGAs and likely would do ok in an ASIC. I just would not have much need for it in most of the things I am looking at doing.
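For readers unfamiliar with the scheme, here is a toy C model (illustrative only, not Eric's Verilog) of treating one fully pipelined datapath as N slower processors by issuing from a different hardware thread each cycle:

#include <stdint.h>
#include <stdio.h>

#define N_THREADS 8

typedef struct {
    uint32_t pc;        /* per-thread program counter */
    uint32_t acc;       /* stand-in for per-thread state (stacks, flags, ...) */
} thread_ctx;

int main(void)
{
    thread_ctx ctx[N_THREADS] = {0};

    /* One shared datapath; a different thread gets the issue slot each cycle,
       so by the time a thread's next instruction issues, its previous result
       has long since left the pipeline: no hazards, stalls or forwarding. */
    for (uint32_t cycle = 0; cycle < 64; cycle++) {
        thread_ctx *t = &ctx[cycle % N_THREADS];
        t->acc += t->pc;        /* placeholder for "execute one instruction" */
        t->pc  += 1;
    }

    for (int i = 0; i < N_THREADS; i++)
        printf("thread %d: pc=%u acc=%u\n", i, (unsigned)ctx[i].pc, (unsigned)ctx[i].acc);
    return 0;
}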

Rather than N totally independent processors, have you considered using pipelining to implement SIMD? This could get around some of the difficulties in the N wide processor like memory bandwidth.

this point. Apple showed us you can just dump your processor and switch horses in midstream pretty much whenever you feel like it (68k => PowerPC => x86) and not torch your product line / lose your customer base. I suppose having Intel and MS go belly up overnight is beyond the pale and at the root of why we can't have nice things. I remember buying my first 286, imagining all the wonderful projects it would enable, and then finding out what complete dogs the processor and OS were - it was quite disillusioning for the big boys to sell me a lump of shit like that (and for a lot more than 3 farthings).

You know why the x86 is still in use. It is not really that bad in relation to the other architectures when measured objectively. It may not be the best, but there is a large investment, mostly by Intel. If Intel doesn't change, why would anyone else? But that is being eroded by the ARM processors in the handheld market. We'll see if Intel can continue to adapt the x86 to low power and maintain a low cost.

I don't think MS is propping up the x86. They offer a version of Windows for the ARM, don't they? As you say, there is a bit of processor specific code, but the vast bulk of it is just a matter of saying ARMxyz rather than X86xyz. Developers are another matter. Not many want to support yet another target, period. If the market opens up for Windows on ARM devices then that can change. In the meantime it will be business as usual for desktop computing.

--

Rick
Reply to
rickman

The core will do ~200 MHz in the smallest Cyclone 3 or 4, speed grade 8 (the cheapest and slowest). It looks to the outside world like 8 independent processors (threads) running at 25 MHz, each with its own independent interrupt. Internally each thread has 4 private general purpose stacks that are each 32 entries deep, but all threads fully share main memory (combined instruction/data).

I haven't given this very much thought. But different cores could simultaneously work on different byte fields in a word in main memory, so I'm not sure HW SIMD support is all that necessary.

This is just a small FPGA core, not an x86 killer. Though it beats me why more muscular processors don't employ these simple techniques.

Reply to
Eric Wallin

Have you defined what happens when one processor writes to a memory location that is being read by another processor? In other words, what primitives do you provide that allow one processor to reliably communicate with another?

Reply to
Tom Gardner

cheapest and slowest). It looks to the outside world like 8 independent processors (threads) running at 25 MHz, each with its own independent interrupt. Internally each thread has 4 private general purpose stacks that are each 32 entries deep, but all threads fully share main memory (combined instruction/data).

simultaneously work on different byte fields in a word in main memory so I'm not sure HW SIMD support is all that necessary.

more muscular processors don't employ these simple techniques.

That is my point. With SIMD you have 1/8th the instruction rate, saving memory accesses, but the same amount of data can be processed. Of course, it all depends on your app.
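As a rough illustration of the trade-off (generic SWAR-in-C, nothing to do with Eric's core): one 32-bit add can update four packed byte lanes at once, so a quarter of the instructions move the same amount of data.

#include <stdint.h>

/* Add the constant k to each of the four byte lanes packed in a 32-bit word.
   Each lane wraps modulo 256 independently; no carry crosses lane boundaries. */
static uint32_t add_byte_lanes(uint32_t packed, uint8_t k)
{
    uint32_t kk  = (uint32_t)k * 0x01010101u;          /* k in every lane     */
    uint32_t low = (packed & 0x7F7F7F7Fu) + (kk & 0x7F7F7F7Fu);
    return low ^ ((packed ^ kk) & 0x80808080u);        /* fix up the top bits */
}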

--

Rick
Reply to
rickman

Main memory is accessed by the threads sequentially, so there is no real contention possible.

None, it's all up to the programmer. Off the top of my head, one thread might keep tabs on a certain memory location A looking for a change of some sort, perform some activity in response to this, then write to a separate location B that one or more other threads are similarly watching.

Another option (that I didn't implement, but it would be simple to do) would be to enable interrupt access via the local register set, giving threads the ability to interrupt one another for whatever reason. But doing this via a single register could lead to confusion because there is no atomic read/write access (and I don't think it's worth implementing atomics just for this). Each thread interrupt could be in a separate register I suppose. With an ocean of main memory available for flags and mailboxes and such I guess I don't see the need for added complexity.
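A rough C rendering of that flag-watching scheme (locations and names are made up for illustration; on the real core the "threads" are the 8 hardware threads sharing one BRAM):

#include <stdint.h>

volatile uint32_t flag_a;   /* written by the producer thread */
volatile uint32_t flag_b;   /* written by the watcher thread  */

void watcher_thread(void)
{
    uint32_t last_a = flag_a;
    for (;;) {
        while (flag_a == last_a)
            ;                        /* spin until location A changes     */
        last_a = flag_a;
        /* ... perform some activity in response ... */
        flag_b = last_a;             /* signal the threads watching B     */
    }
}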

Reply to
Eric Wallin

But this processor core doesn't have a memory bandwidth bottleneck, so the instruction rate is moot.

Main memory is a full dual port BRAM, so each thread gets a chance to read/write data and fetch an instruction every cycle. The bandwidth is actually overkill - the fetch side write port is unused.

Reply to
Eric Wallin

I think the design document is good enough for general public consumption, so I applied for a project over at opencores.org (they say they'll get around to it in one working day).

Still doing verification and minor code polishing, no bugs so far. All branch immediate distances and conditionals check out; interrupts are working as expected; stack functionality, depth, and error reporting via the local register set checks out. A log base 2 subroutine returns the same values as a spreadsheet, ditto for restoring unsigned division. I just need to confirm a few more things like logical and arithmetic ALU operations and the code should be good to go.
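Such checks are usually made against a host-side reference model; a generic C sketch of what the log2 and restoring-division references might look like (not Eric's actual test code):

#include <stdint.h>

/* floor(log2(x)) for 32-bit unsigned x; undefined for x == 0 */
static uint32_t ilog2(uint32_t x)
{
    uint32_t r = 0;
    while (x >>= 1)
        r++;
    return r;
}

/* Bit-serial restoring unsigned division: *q = n / d, *r = n % d (d != 0). */
static void udivmod(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{
    uint32_t rem = 0, quo = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1);    /* bring down the next bit        */
        quo <<= 1;
        if (rem >= d) {                       /* subtract if it fits,            */
            rem -= d;                         /* otherwise "restore" (no change) */
            quo |= 1;
        }
    }
    *q = quo;
    *r = rem;
}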

Reply to
Eric Wallin

contention possible.

OK, so what *atomic* synchronisation primitives are available? Classic examples involve atomic read-modify-write operations (e.g. test and set, compare and swap). And they are bloody difficult and non-scalable if there is any memory hierarchy.
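For reference, this is roughly what those classic primitives look like when they are available, written here with C11 atomics; nothing below is claimed to exist on Eric's core.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

/* test-and-set spinlock */
void acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;                       /* spin until the previous owner releases */
}

void release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

/* compare-and-swap variant on a lock word: 0 = free, 1 = taken */
bool try_acquire(atomic_uint *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong(l, &expected, 1);
}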

That raises red flags with software engineers. Infamously with the Itanic, for example!

location A looking for a change of some sort, perform some activity in response to this, then write to a separate location B that one or more other threads are similarly watching.

be to enable interrupt access via the local register set, giving threads the ability to interrupt one another for whatever reason. But doing this via a single register could lead to confusion because there is no atomic read/write access (and I don't think it's worth implementing atomics just for this). Each thread interrupt could be in a separate register I suppose. With an ocean of main memory available for flags and mail boxes and such I guess I don't see the need for added complexity.

How do you propose to implement mailboxes reliably? You need to think of all the possible memory-access sequences, of course.

Reply to
Tom Gardner

I don't get the question. Weren't semaphores invented a long time ago and require no special support from the processor?

--

Rick
Reply to
rickman

instruction rate is moot.

read/write data and fetch an instruction every cycle. The bandwidth is actually overkill - the fetch side write port is unused.

But that is only true if you limit yourself to on chip memory.

What is the app you designed this processor for?

--

Rick
Reply to
rickman

I applied for a project over at opencores.org (they say they'll get around to it in one working day).

immediate distances and conditionals check out; interrupts are working as expected; stack functionality, depth, and error reporting via the local register set checks out. A log base 2 subroutine returns the same values as a spreadsheet, ditto for restoring unsigned division. I just need to confirm a few more things like logical and arithmetic ALU operations and the code should be good to go.

Someone was talking about this recently; I don't recall if it was you or someone else. It was pointed out that the most important aspect of any core is the documentation. opencores has lots of pretty worthless cores because you have to reverse-engineer them to do anything with them.

--

Rick
Reply to
rickman

The x86 was already considered an old-fashioned architecture the day it was first released. It was picked for the IBM PC (against the opinion of all the technical people, who wanted the 68000) because some PHB decided that the PC was a marketing experiment of no more than 1000 or so units, so the processor didn't matter and they could pick the cheaper x86 chip.

Modern x86 chips are fantastic pieces of engineering - but they are fantastic implementations of a terrible original design. They are the world's best example that given enough money and clever people, you /can/ polish a turd.

MS props up the x86 - of that there is no doubt. MS doesn't really care if the chips are made by Intel, AMD, or any of the other x86 manufacturers that have come and gone.

MS tried to make Windows independent of the processor architecture when they first made Windows NT. The original version ran on x86, MIPS, PPC and Alpha. But they charged MIPS, PPC and Alpha for the privilege - and when those vendors couldn't afford the high costs to MS, they stopped paying and MS stopped making these Windows ports. MS did virtually nothing to promote these ports of Windows, and almost nothing to encourage any other developers to target them. They just took the cash from the processor manufacturers, and used it to split the workstation market (which was dominated by Unix on non-x86 processors) and discourage Unix.

I don't think MS particularly cares about x86 in any way - they just care that you run /their/ software. Pushing x86 above every other architecture just makes things easier and cheaper for them.

Part of the problem is that there is a non-negligible proportion of Windows (and many third-party programs) design and code that /is/ x86-specific, and it is not separated into portability layers because it was never designed for portability - so porting is a lot more work than just a re-compile. It is even more work if you want to make the code run fast - there is a lot of "manual optimisation" in key Windows code that is fine-tuned to the x86. For example, sometimes 8-bit variables will be used because they are the fastest choice on old x86 processors and modern x86 CPUs handle them fine - but 32-bit RISC processors will need extra masking and extending instructions to use them. To be fair to MS, such code was written long before int_fast8_t and friends came into use.
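A small C example of the 8-bit-variable point (generic, not from Windows): with a uint8_t loop counter the compiler may have to insert masking/zero-extension on a 32-bit RISC to keep wrap-at-256 semantics, while uint_fast8_t lets it pick whatever width is cheapest on the target.

#include <stdint.h>

/* Narrow counter: the compiler must preserve 8-bit wraparound behaviour,
   which on some 32-bit RISC targets means extra masking instructions. */
uint32_t sum_narrow(const uint32_t *p, uint8_t n)
{
    uint32_t s = 0;
    for (uint8_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* "Fast" counter: at least 8 bits, but whatever width is cheapest natively. */
uint32_t sum_fast(const uint32_t *p, uint_fast8_t n)
{
    uint32_t s = 0;
    for (uint_fast8_t i = 0; i < n; i++)
        s += p[i];
    return s;
}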

However, it is a lot better these days than it used to be - there is much less processor-specific code as more of it is written in higher-level languages, and in particular, little of the old assembly code remains. It is also easier on the third-party side, as steadily more developers are making cross-platform code for Windows, MacOS and Linux - such code is far easier to port to other processors.

As for Windows on the ARM, it is widely considered a bad joke. It exists to try to take some of the ARM tablet market, but is basically a con - it is a poor substitute for Android or iOS as a pad system, and like other pads, it is a poor substitute for a "real" PC for content creation rather than just viewing. People buy it thinking they can run Windows programs on their new pad (they can't), or that they can use it for MS Office applications (they can't - it's a cut-down version that has fewer features than Polaris Office on Android, and you can't do sensible work on a pad anyway).

If ARM takes off as a replacement for x86, it will not be due to MS - it will be due to Linux. The first "target" is the server world - software running on Linux servers is already highly portable across processors. For most of the software, you have the source code and you just re-compile (by "you", I mean usually Red Hat, Suse, or other distro). Proprietary Linux server apps are also usually equally portable - if Oracle sees a market for their software on ARM Linux servers, they'll do the re-compile quickly and easily.

/Real/ Windows for ARM, if we ever see it, will therefore come first to servers.

Reply to
David Brown

If threads can do some sort of compare-and-swap instruction, then semaphores should be no problem with this architecture. Without them, there are algorithms to make semaphores but they don't scale well (typically each semaphore needs a memory location for each thread that might want access, and taking the semaphore requires a read of each of these locations). It helps if you can guarantee that your thread has a certain proportion of the processor time - i.e., there is a limit to how much other threads can do in between instructions of your thread.
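A sketch of the plain-load/store approach David describes is Peterson's two-thread lock (shown here in generic C, not tied to this core): each lock needs a flag per thread plus a turn word, taking it means reading the other thread's flag, and correctness assumes stores become visible in program order - plausible for a single shared BRAM, but not for machines with caches or write buffers.

#include <stdbool.h>

volatile bool flag[2];      /* flag[i]: thread i wants the lock */
volatile int  turn;         /* whose turn it is to wait         */

void lock(int me)           /* me is 0 or 1 */
{
    int other = 1 - me;
    flag[me] = true;
    turn = other;
    while (flag[other] && turn == other)
        ;                   /* spin while the other thread has priority */
}

void unlock(int me)
{
    flag[me] = false;
}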

Reply to
David Brown

There are two main reasons.

The first, and perhaps most important, is non-technical - C (and to a much smaller extent, C++) is the most popular language for embedded development. That means it is the best supported by tools, best understood by other developers, has the most sample code and libraries, etc. There are a few niches where other languages are used - assembly, Ada, etc. And of course there are hobby developers, lone wolves, and amateurs pretending to be professionals who pick Pascal, Basic, or Forth.

I get to pick these things myself to a fair extent (with some FPGA work long ago, I used Confluence rather than the standard VHDL/Verilog). But I would need very strong reasons to pick anything other than C or C++ for embedded development.

The other reason is more technical - Forth is simply not a great language for embedded development work.

It certainly has some good points - its interactivity is very nice, and you can write very compact source code.

But the stack model makes it hard to work with more complex functions, so it is difficult to be sure your code is correct and maintainable. The traditional Forth solution is to break your code into lots of tiny pieces - but that means the programmer is jumping back and forth in the code, rather than working with sequential events in a logical sequence. The arithmetic model makes it hard to work with different sized types, which are essential in embedded systems - the lack of overloading on arithmetic operators means a lot of manual work in manipulating types and getting the correct variant of the operator you want. The highly flexible syntax means that static error checking is almost non-existent.

Reply to
David Brown

Let's pick this apart a bit. Firstly, most Java updates and security bugs have nothing whatsoever to do with the concept of a virtual machine. They're almost always caused by coding errors in the library, and they'd be bugs regardless of the architecture of the virtual machine. Secondly, targeting a processor model that is radically different than the physically underlying one is what every optimizing compiler does every day, and Java is no different.

Not even remotely true. Java is either the most popular or the second most popular programming language in the world. Most Java runs on servers; the desktop is such a tiny part of the market that even if everyone drops Java in the browser it will make almost no difference.

Andrew.

Reply to
Andrew Haley

I think most Java runs on SIM cards. Of course there are more of those than desktops and servers put together.

Reply to
Paul Rubin

require no special support from the processor?

Of course they are one communications mechanism, but not the only one. Implementation can be made impossible by some design decisions. Whether support is "special" depends on what you regard as "normal", so I can't give you an answer to that one!

Reply to
Tom Gardner

I just posted the design document:

formatting link

I'd be interested in any comments, my email address is in the document. I'll post the verilog soon.

Cheers!

Reply to
Eric Wallin
