New soft processor core paper publisher?

- Les Cargill

Mon, Jun 24, 2013 3:21 AM

No. The only requirement for semaphores to work is to be able to turn off interrupts briefly.
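To make this concrete, here is a minimal Python sketch of Dijkstra's P() and V() on a single CPU. The only source of atomicity is a flag standing in for "interrupts disabled". It is a toy model under stated assumptions: threads are generators that yield after each step, and a simulated timer interrupt may switch threads at any yield while interrupts are enabled. The names `CPU`, `irq_enabled`, `run`, `worker` and `demo` are all hypothetical.

```python
import random

class CPU:
    """Toy uniprocessor (hypothetical model). Threads are generators that
    yield after each step. While irq_enabled is True a simulated timer
    interrupt may switch threads at any yield; while it is False the
    running thread keeps the CPU."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.irq_enabled = True

def P(cpu, sem):
    """Dijkstra P(): wait until count > 0, then decrement it atomically."""
    while True:
        cpu.irq_enabled = False      # "disable interrupts"
        yield                        # step boundary, but no switch possible
        ok = sem["count"] > 0        # test...
        if ok:
            yield
            sem["count"] -= 1        # ...and decrement, still uninterrupted
        cpu.irq_enabled = True       # "enable interrupts"
        yield                        # a switch may happen here
        if ok:
            return

def V(cpu, sem):
    """Dijkstra V(): increment count atomically."""
    cpu.irq_enabled = False
    yield
    value = sem["count"]
    yield
    sem["count"] = value + 1
    cpu.irq_enabled = True
    yield

def worker(cpu, sem, shared, n):
    for _ in range(n):
        yield from P(cpu, sem)
        value = shared["x"]          # non-atomic read-modify-write...
        yield                        # ...an interrupt may land here
        shared["x"] = value + 1
        yield from V(cpu, sem)

def run(cpu, threads):
    live = list(threads)
    current = cpu.rng.choice(live)
    while live:
        try:
            next(current)
        except StopIteration:
            live.remove(current)
            if live:
                current = cpu.rng.choice(live)
            continue
        if cpu.irq_enabled:          # timer interrupt: maybe switch threads
            current = cpu.rng.choice(live)

def demo(seed, threads=3, n=50):
    cpu = CPU(seed)
    sem = {"count": 1}               # binary semaphore used as a mutex
    shared = {"x": 0}
    run(cpu, [worker(cpu, sem, shared, n) for _ in range(threads)])
    return shared["x"]

print(demo(seed=1))                  # 150 for any seed
```

The semaphore guards a deliberately non-atomic read-modify-write. A thread switch cannot split the test and the decrement inside P(), so no increment is lost whatever the random schedule.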

--
Les Cargill

- rickman

Mon, Jun 24, 2013 3:42 AM

That shouldn't be a surprise to anyone. The guy who designed the ZPU found that out the hard way.

Tell me about it ;^)

--

Rick

- rickman

Mon, Jun 24, 2013 4:07 AM

and energy on canonical one and two stack machines. There just aren't enough stacks, so unless you want to deal with the top entry or two right now you'll be digging around, wasting both programming and real time, and getting confused. And they automatically toss data away that you often very much need, so you waste more time copying it or reloading it or whatever. I spent years trying to like them, thinking the problem was me. The J processor really helped break the spell.

way, but I do have to sell it to some degree (the paper ends with the down sides that I'm aware of, I'm sure there are more).

I'm glad you can take (hopefully) constructive criticism. I was concerned when I wrote the above that it might be a bit too blunt.

It will be a while before I get to the end of your paper. Do you describe the applications you think the design would be good for? One reason I don't completely agree with you about the suitability of MISC type CPUs is that there are many apps with different requirements. Some will definitely do better with a design other than yours. I wonder if you had some specific class of applications that you were seeing that you didn't think the MISC approach was optimal for or if it was just the various "features" of MISC that didn't suit your tastes.

and book on stack machines and Forth that I've encountered has a vibe of "look at this revolutionary idea that the man has managed to keep down!" Absolutely no down sides mentioned, so the hapless noob is left with much too flattering of an impression. In my case this false impression was quite lasting, so I guess I've got something of an axe to grind. Perhaps I'll moderate this in future releases of the design document.

I can't argue with you on this one. When I first saw the GA144 design it sounded fantastic! But that is typical corporate product hype. The reality of the chip is very different. When it comes to CPU cores for FPGAs I don't see a lot of difference. Check out some of the other offerings on Opencores. Everyone touts their design as something pretty special even if they are just one of two or three that do the same thing! I think they had some five or six PIC implementations and all seemed to say they were the best!

I do have to say I am not in complete agreement with you about the issues of MISC machines. Yes, there can be a lot of stack ops compared to a register machine. But these can be minimized with careful programming. I know that from experience. However, part of the utility of a design is the ease of programming efficiently. I haven't looked at yours yet, but just picturing the four stacks makes it seem pretty simple... so far. :^)

I have to say I'm not crazy about the large instruction word. That is one of the appealing things about MISC to me. I work in very small FPGAs and 16 bit instructions are better avoided if possible, but that may be a red herring. What matters is how many bytes a given program uses, not how many bits are in an instruction.

I am supposed to present to the SVFIG and I think your design would be a very interesting part of the presentation unless you think you would rather present yourself. I'm sure they would like to hear about it and they likely would be interested in your opinions on MISC. I know I am.

--

Rick

- glen herrmannsfeldt

Mon, Jun 24, 2013 4:49 AM

(snip)

What about other processors or I/O using the same memory?

-- glen

- Tom Gardner

Mon, Jun 24, 2013 7:24 AM

Nope. I've too many other things to understand in detail. I have no bandwidth to debug your design.

never write to the same address then by definition it can't happen (unless there is a bug in the code). I probably haven't thought about this as much as you have, but I don't see the fundamental need for more hardware if the programmer does his/her job.

The problems that arise from a lack of atomic operations and/or semaphores are well known. Any respectable university-level software course will cover them and their various solutions.

Consider trying to pass a message consisting of one integer from one thread to another such that the receiving thread is guaranteed to be able to pick it up exactly once.

- Eric Wallin

Mon, Jun 24, 2013 12:03 PM

Thread A works on the integer value and when it is done it writes it to location Z. It then reads a value at location X, increments it, and writes it back to location X.

Thread B has been repeatedly reading location X and notices it has been incremented. It reads the integer value at Z, performs some function on it, and writes it back to location Z. It then reads a value at Y, increments it, and writes it back to location Y to let thread A know it took, worked on, and replaced the integer at Z.

The above seems airtight to me if reads and writes to memory are not cached or otherwise delayed, and I don't see how interrupts are germane, but perhaps I haven't taken everything into account.
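This scheme is a single-producer/single-consumer handshake. X is only ever written by thread A and Y only by thread B, so neither counter needs an atomic read-modify-write. The Python sketch below assumes, as Eric states, that every read and write goes straight to memory and takes effect in program order. It models each access as one indivisible step. The helper names (`run`, `thread_a`, `thread_b`, `handshake`) and the "multiply by 10" work function are hypothetical. Interleaving is either strict round-robin or random.

```python
import random

def run(threads, rng=None):
    """Interleave generator 'threads' one memory access at a time:
    strict round-robin (like a barrel processor) or random if rng is given."""
    live = list(threads)
    i = 0
    while live:
        idx = rng.randrange(len(live)) if rng else i % len(live)
        try:
            next(live[idx])
            i = idx + 1
        except StopIteration:
            live.pop(idx)
            i = idx

def thread_a(mem, inputs, results):
    for k, v in enumerate(inputs, start=1):
        mem["Z"] = v                 # hand the value over
        yield
        x = mem["X"]
        yield
        mem["X"] = x + 1             # only A ever writes X
        yield
        while True:                  # poll Y until B acknowledges item k
            y = mem["Y"]
            yield
            if y == k:
                break
        results.append(mem["Z"])     # B's processed value
        yield

def thread_b(mem, count, f):
    for k in range(1, count + 1):
        while True:                  # poll X until A posts item k
            x = mem["X"]
            yield
            if x == k:
                break
        z = mem["Z"]
        yield
        mem["Z"] = f(z)              # work on the value in place
        yield
        y = mem["Y"]
        yield
        mem["Y"] = y + 1             # only B ever writes Y
        yield

def handshake(inputs, rng=None):
    mem = {"X": 0, "Y": 0, "Z": None}
    results = []
    run([thread_a(mem, inputs, results),
         thread_b(mem, len(inputs), lambda z: z * 10)], rng)
    return results

print(handshake([1, 2, 3]))                      # [10, 20, 30]
print(handshake([1, 2, 3], random.Random(7)))    # [10, 20, 30]
```

The picture changes once two threads can both write X, for example with two producers. A plain read-increment-write can then lose updates, and that multi-writer situation is what test-and-set and compare-and-swap address.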

- Tom Gardner

Mon, Jun 24, 2013 12:30 PM

Consider what happens if an interrupt occurs at an inopportune moment in the above sequence and the other thread runs. You can get double or missed updates.

Do some research to find why "test and set" and "compare and swap" instructions exist.

- Les Cargill

Mon, Jun 24, 2013 12:31 PM

Then it's no longer what I would call a semaphore. A semaphore is, AFAIK, only a Dijkstra P() or V() operation.

It's not that things like this don't exist but rather that they should be called something else, like "bus arbitration scheme."

--
Les Cargill

- Les Cargill

Mon, Jun 24, 2013 12:50 PM

formatting link

--
Les Cargill

- Eric Wallin

Mon, Jun 24, 2013 1:28 PM

I think you're missing the point that in my processor the threads run concurrently, not sequentially.

- Tom Gardner

Mon, Jun 24, 2013 1:47 PM

Nope. That usually exacerbates problems, plus having 8-port memory (one for each thread) is not cheap!

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?

- Eric Wallin

Mon, Jun 24, 2013 3:57 PM

I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions. If it doesn't, then perhaps you could tell me what I left out, and I might include that info in the next rev. Not trying to be gruff or anything; I'd very much like the document (and processor) to be on as solid a footing as possible.

- Tom Gardner

Mon, Jun 24, 2013 4:13 PM

exchange operations. What theoretical advance have you made?

when a simple reading the design document I've provided would probably answer your questions.

Please point me to the section which discusses the primitive operations/attributes/properties that you have provided to enable inter-thread communication.

include that info in the next rev.

Sorry, I don't have time to poorly recapitulate subjects that have been known about and solved for decades.

processor) to be on as solid a footing as possible.

Ditto, and I don't have time to find the flaws in your arguments.

If you want people to use your processor, it might be wise to give them the information they need to have confidence in its design.

- rickman

Mon, Jun 24, 2013 4:35 PM

Eric, I think you have explained properly how your design will deal with synchronization. I'm not sure what Tom is going on about. Clearly he doesn't understand your design.

If it is of any help, Eric's design is more like 8 cores running in parallel, time sharing memory and in fact, the same processor hardware on a machine cycle basis (so no 8 ported memory required). If an interrupt occurs it doesn't cause one of the other 7 tasks to run, they are already running, it simply invokes the interrupt handler. I believe Eric is not envisioning multiple tasks on a single processor.
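Here is a minimal Python sketch of the time-sharing described above. It assumes strict round-robin issue across 8 hardware threads; the function name and the strict ordering are assumptions for illustration, not details taken from Eric's design document. In this simplified model, the thread that owns the single pipeline and memory port is just the cycle number modulo 8. Each thread therefore issues once every 8 cycles, and only one thread is granted the port in any cycle.

```python
from collections import Counter

NUM_THREADS = 8  # eight hardware threads, per the description above

def issuing_thread(cycle: int) -> int:
    """Thread that owns the shared pipeline and memory port on this cycle,
    assuming strict round-robin issue (a simplification)."""
    return cycle % NUM_THREADS

slots = [issuing_thread(c) for c in range(24)]
print(slots[:10])           # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
print(Counter(slots)[3])    # 3: each thread gets 24 / 8 slots
```

In this model, an interrupt aimed at one thread would only change where that thread's next slot fetches from. The other seven threads keep their slots.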

As others have pointed out, test and set instructions are not required to support concurrency and communications. They are certainly nice to have, but are not essential. In your case they would be superfluous.

--

Rick

- Tom Gardner

Mon, Jun 24, 2013 4:56 PM

exchange operations. What theoretical advance have you made?

discussion when a simple reading the design document I've provided would probably answer your questions. If it doesn't then

the next rev. Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a

synchronization. I'm not sure what Tom is going on about. Clearly he doesn't understand your design.

Correct.

time sharing memory and in fact, the same processor hardware on a machine cycle basis

Fair enough; sounds like it is in the same area as the Propeller chip.

Is there anything to prevent multiple cores reading/writing the same memory location in the same machine cycle? What is the result when that happens?

are already running, it simply invokes the interrupt handler. I believe Eric is not envisioning multiple tasks on a

Such presumptions would be useful to have in the white paper.

support concurrency and communications. They are certainly nice to have, but are not essential.

Agreed. I'm perfectly prepared to accept alternative techniques, e.g. disable interrupts.

Not proven to me.

The trouble is I've seen too many hardware designs that leave the awkward problems to software - especially first efforts by small teams.

And too often those problems can be very difficult to solve in software. Nowadays it is hard to find people that have sufficient experience across the whole hardware/firmware/system software spectrum to enable them to avoid such traps.

I don't know whether Eric is such a person, but I'm afraid his answers have raised orange flags in my mind.

As a point of reference, I had similar misgivings when I first heard about the Itanium's architecture in, IIRC, 1994. I suppressed them because the people involved were undoubtedly more skilled in the area than I, and had been working on it for 5 years. Much later I regrettably came to the conclusion the orange flags were too optimistic.

- glen herrmannsfeldt

Mon, Jun 24, 2013 6:01 PM

(snip)

Yes. Even if the threads don't communicate with each other, they might share I/O devices, which need some communication. (snip)

The 8087 was originally designed to have a virtual stack, where on stack overflow an interrupt would trigger a software routine to spill some stack registers to memory, and on underflow bring them back again. But no-one tried to write the interrupt routine until the hardware was done, and it turned out that it wasn't possible. Not all the required state was available or settable.
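The spill/fill idea can be sketched as a fixed-depth register stack backed by memory. This is a hypothetical Python model of the concept, not the 8087's actual mechanism. Overflow spills the bottom register to memory, and underflow fills one entry back. The "handler" here freely reads and rewrites every register and the depth count. According to the post above, that is exactly the state the 8087 did not make accessible.

```python
class SpillingStack:
    """Register stack of fixed depth, backed by memory (hypothetical model)."""

    def __init__(self, depth=8):          # the 8087 has 8 stack registers
        self.depth = depth
        self.regs = []                    # on-chip registers; top is regs[-1]
        self.memory = []                  # spilled entries; newest spill last
        self.spills = 0
        self.fills = 0

    def push(self, value):
        if len(self.regs) == self.depth:  # overflow: "interrupt" spills
            self.memory.append(self.regs.pop(0))  # the bottom register
            self.spills += 1
        self.regs.append(value)

    def pop(self):
        if not self.regs:                 # underflow: "interrupt" fills
            if not self.memory:
                raise IndexError("pop from empty stack")
            self.regs.append(self.memory.pop())
            self.fills += 1
        return self.regs.pop()

s = SpillingStack()
for v in range(1, 11):
    s.push(v)
print([s.pop() for _ in range(10)])       # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(s.spills, s.fills)                  # 2 2
```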

They might have fixed it in the 80287, but then they had to be compatible with the 8087. Actually, I don't know why they didn't fix it, but it still isn't fixed.

-- glen

- Tom Gardner

Mon, Jun 24, 2013 6:45 PM

Oh, inaccessible state is a problem that has been repeated many times in many companies! Often to be found near to virtual memory tables, exceptions, interrupts, and debuggers - making Heisenbugs the norm not the exception :(

I strongly suspect compatibility is the (fully justifiable) reason; it is the reason for all sorts of hardware and software cruft.

- Eric Wallin

Mon, Jun 24, 2013 7:25 PM

I apologize to everyone here, I kind of barged in and have behaved somewhat brashly.

Writing a conventional stack machine in an HDL isn't too daunting, but programming it afterward, for me anyway, was just too much.

Yes. Opcode space obviously expands exponentially with bit count, so one can get a lot more with a small size increase. I think a 32 bit opcode is pushing it for a small FPGA implementation, but a 16 bit opcode gives one a couple of small operand indices, and some reasonably sized immediate instructions (data, conditional jumps, shifts, add) that I find I'm using quite a bit during the testing and verification phase. Data plus operation in a single opcode is hard to beat for efficiency, but it has to earn its keep in the expanded opcode space. With the operand indices you get a free copy/move with most single operand operations, which is another efficiency.
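To put numbers on this: each added opcode bit doubles the number of encodings, and a 16-bit word has room for small stack indices and an immediate alongside the operation. The field layout below is purely hypothetical and is not Eric's actual encoding. It uses a 6-bit operation, two 2-bit indices for the four stacks, and a 6-bit immediate. It only shows how data, operation, and a source/destination pair can share one word.

```python
# Hypothetical 16-bit layout, for illustration only (not Eric's encoding):
#   [15:10] operation  [9:8] source stack  [7:6] destination stack  [5:0] imm
def encode(op, src, dst, imm=0):
    assert 0 <= op < 64 and 0 <= src < 4 and 0 <= dst < 4 and 0 <= imm < 64
    return (op << 10) | (src << 8) | (dst << 6) | imm

def decode(word):
    return (word >> 10) & 0x3F, (word >> 8) & 0x3, (word >> 6) & 0x3, word & 0x3F

for bits in (8, 16, 32):
    print(bits, "bits ->", 2 ** bits, "encodings")  # 256, 65536, 4294967296

w = encode(0x2A, src=1, dst=3, imm=0x1C)  # result lands on a different stack
print(hex(w), decode(w))                  # 0xa9dc (42, 1, 3, 28)
```

Because src and dst are separate fields, an operation that reads one stack and writes another gets a copy or move for free. In this particular layout, the cost is that only 64 operation codes remain.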

I'm on the other coast so I most likely can't attend, but I would be most honored if you were to present it to SVFIG.

- Eric Wallin

Mon, Jun 24, 2013 7:47 PM

I know what you mean. Maybe I'm too picky, or perhaps too rigid to conform to others' coding styles, but 90% of the HDL code I've encountered both vocationally and avocationally has been quite poor. I tend to rewrite everything, including my slightly older code if I'm using it again, because my style keeps evolving, and it never hurts to take another look at things or at least give them some more polish.

Processors aren't my main thing, but I do have need of them in my main thing, so I've been quite interested in them for ~15 years now. My EE graduate adviser was a professor of computer engineering and I took a couple of his courses. I own and have read the two Hennessy & Patterson texts, though I must admit my eyes glazed over when TLBs and pipeline hazards were being discussed.

Discovering stack machines was a transforming experience for me, showing that it was possible to have much simpler and tractable HW underlying it all. But it's hard to beat indexed registers. This hybrid is something of a middle ground, and so far I'm not finding it too revolting to program by hand - then again it might be something only a mother can love.

- Tom Gardner

Mon, Jun 24, 2013 8:02 PM

Doing something "just because I want to" is an _excellent_ reason. For me probably the next such project will be to make a cheapskate 1GS/s oscilloscope & TDR. But there's a heck of a learning curve w.r.t. FPGA clocking, i/o structures, floorplanning and dev kits nowadays!

Whenever I've come across people that say "if you have problem X then my product will solve it provided Y applies", I give them more credence if they also say "but my product doesn't do Z, you have to do that some other way". I'm sure you've had similar experiences.