Zero operand CPUs

- AliBama
  

Fri, Mar 20, 2009 9:54 PM

------------ Previously Jacko wrote:

========= Does $lit , $666 , $lit , $222 mean: "push $666, push $222" ?

Then SI FA FO BA == (S)->A ; (S)->Q ; A->(Q) ; (R)->P == ?

If it's a stack-machine, then which register is the TOS-pointer ?

Let's try to work backwards:- (R)->P should move #$222 to address $665?6 ==> P == address $665 [or $666] and R pointed to mem-containing #$222

So, how did $lit , $222 [ push $222] get $222 into mem pointed to by R ?

------------- Perhaps this is all obvious for someone with a VHDL design background, but this forth-group has just degenerated into clowning, with this thread.

Can anybody c...

> > How many bits are in an opcode, 4 or 5?

IMO this corresponds to the wiki:

So, '16 basic instructions' [need 4 bits] and the BO instruction, would get the BasicInstr/Subroutine 1-bit-flag.

This would imply a 5bit wide word, which is obviously not the case ?

------------- who can understand/explain this:--

...

--
Since, of the 16 instructions, the only ALU types (+, xor, and)
use 'A' as one would expect an accumulator to be used
[as a source & destination for the binary op.], 'A' *is* the
accumulator !! Similarly 'S' is seen to be the TOS-pointer.
--- wiki says:
> All  opcodes above 15 are subroutine call addresses.
OK, so for an 8 bit wide word, you get 256-16 possible subroutines ?
And the subroutines also use the basic 16 instructions, including
 possibly nested-subroutine/s ?
No, because the first-level of subroutines has already been allocated
 the 256-16 subroutine-pointers ?
Ok, you can only have 256-16 *different* subroutines, but they can be
 nested - limited only by RAM to hold a stacked-returns ?

== Chris Glur.

- Jacko
  

Fri, Mar 20, 2009 11:39 PM

Yes, when lit is defined as the subroutine equivalent to lit.

I would think so!! TOS->A, TOS(2)->Q, STORE TOS -> TOS(2) ADDRESS, GET RETURN TO PROGRAM COUNTER

S, but a top of stack optimization using A may be possible, for speed, but lower space efficiency.

RI FI SO RO BA where SO RO commutes to RO SO as a duplicate expression for same function.

(R)->Q,(Q)->A,A->(S),Q->(R),(R)->P get return address and get indirect next address following return address, and save this on stack, and put incremented return address back on return stack and get return address (modified by +1) into program counter to execute a return to the address following the literal value.

;-)

No, a basic instruction is a number under 16, any number over 15 is an address. This does have a disadvantage of not being able to call a subroutine below address 16 but this is not a major fault, as boot code would be here, and it is possible to place a subroutine call instruction within these addresses.
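The rule stated here (a cell value below 16 is a primitive, anything from 16 upward is a subroutine address) can be sketched in a few lines of Python. This is only an illustration of the decode rule as described in the post; the names `decode` and `callable_addresses` are made up for the sketch and do not come from the processor's sources.

```python
PRIMITIVE_LIMIT = 16  # cell values 0..15 are primitives, per the post


def decode(cell, width=16):
    """Classify one instruction cell as a primitive opcode or a call target."""
    if not 0 <= cell < (1 << width):
        raise ValueError("cell does not fit in the given word width")
    if cell < PRIMITIVE_LIMIT:
        return ("primitive", cell)
    # Everything else is treated as the address of a subroutine to call.
    return ("call", cell)


def callable_addresses(width):
    """How many distinct call targets a word of this width can name."""
    return (1 << width) - PRIMITIVE_LIMIT
```

For an 8 bit word this gives 256-16 call targets, which is the count discussed earlier in the thread. No separate flag bit is decoded; the comparison against 16 does the job.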

The implication of the extra bit 'needed' is not a true account of functioning.

instructions, SU of the carry to (R)

(R)->Q,Q->(S),(S)->A,A+(S)->A,A->(S),(S)->Q,Q->(R),(R)->P

Hope this helps.

Cheers jacko

- rickman
  

Sat, Mar 21, 2009 4:30 AM

Ah, but it is. In your specific implementation, you have not only a fifth bit, but also a sixth, seventh all the way up to 16th, no? You have a 16 bit instruction word and only 17 opcodes; 0 through 15 are the ones you list, and 16 through 65535 is the LIT or CALL instruction (I'm not sure which).

- Jacko
  

Sat, Mar 21, 2009 6:38 AM

That would be like saying you need the extra )s on the front of numbers when you do arithmetic on paper.

(Depends on the subroutine start address) all subroutines are calls, so they are all calls, just one is LIT.

Yes, you will find primitives use codes 0-15 and colon definitions use 0-65535. If you are crazy enough to have a massive primitive set, or to implement such a set in full width memory, then you would be right. On the 12 bit version you could use 16 bit memory, and have the high 4 as the primitive part of the address space.

As stated on the website (somewhere), this processor is not designed for running monolithic inlined code, and pay in space and cache slowdown such things will, says Yoda.

So in the example I gave for the store, it's likely the last line of simple instructions would be a subroutine named store or +1!

You will find a large amount of primitive code can be optimized into a small logic area, especially if the address space over which these subroutines is spread is sparse to allow combinational alignment of product terms and boolean logic reduction.

To just generalize this code as something to slot into the threading is missing the point that this is an occasional feature, not a best practice.

cheers jacko

"speak unto my mobile I will, sometime it may be a programming tool."

- rickman
  

Sat, Mar 21, 2009 2:13 PM

Extra what??? Actually, you are quoting your own comment.

I have no idea what you are talking about. How does this instruction set specify literals?

Your obfuscation is getting to be annoying. You never explain what you mean, you speak in crypto language and you seem intent on never really explaining the principles of your design. Even your assembly language is some new symbolism that just serves to isolate what you are doing and thinking rather than to be at all useful for communication.

None of the rest of this is at all useful. You are presuming that I am making some sort of statement or that I am looking at your processor from a very different point of view. I am doing neither. I am trying to understand your processor from the point of view of a small, embeddable CPU for use in an FPGA and in particular, to be programmable in Forth. That is the target of my CPU. I am hoping to learn something about these processors that I don't know or that I haven't thought to try. What I am learning about this design is that it seems to have been designed without regard to a lot of knowledge available, not that I will ever know for sure because it will never really be explained.

Have you read Koopman's book on stack CPUs? He covers a lot of ground with that.

Rick

- Jacko
  

Sat, Mar 21, 2009 5:43 PM

Zero's in the Most Significant Digits. (shift ')')

The instruction set does not specify literals, BUT it does specify enough variety of instruction to write a subroutine to load a literal. On calling this subroutine, the return address is on the R stack, and is loaded, used as an index to memory to fetch the literal with post increment. It is then saved back to the R stack to allow continuation of the instruction stream on exit from the subroutine. The fetched literal in the accumulator is stacked onto the stack before return.

: LIT
  RI ( get return address )
  FI ( get literal at return address, post increase also )
  RO ( save new return address = addr+1 )
  SO ( stack the fetched literal )
  BA ( exit from subroutine )
; ( well a code def may be better )

: LITERAL ['] LIT , , ;
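The behaviour of LIT described above can be traced with a small Python model. This is a sketch of the effect of the five steps only, not of the real micro-operations: the dictionary-as-memory, the function names `lit` and `call_lit`, and the load address $0200 are all assumptions made for illustration, and $0101 is used for LIT simply because that address comes up as an example in this thread.

```python
LIT_ADDR = 0x0101  # assumed address of the LIT subroutine (example value)


def lit(memory, data_stack, return_stack):
    """Effect of ': LIT RI FI RO SO BA ;'. Returns the new program counter."""
    addr = return_stack.pop()      # RI: take the return address off the R stack
    value = memory[addr]           # FI: fetch the cell it points at ...
    addr += 1                      #     ... with post-increment
    return_stack.append(addr)      # RO: put the bumped return address back
    data_stack.append(value)       # SO: stack the fetched literal
    return return_stack.pop()      # BA: return, i.e. (R)->P


def call_lit(memory, pc, data_stack, return_stack):
    """Execute one 'call LIT' cell at pc; return the program counter afterwards."""
    if memory[pc] != LIT_ADDR:
        raise ValueError("this sketch only models a call to LIT")
    return_stack.append(pc + 1)    # the call pushes the address of the next cell
    return lit(memory, data_stack, return_stack)


# Threaded code for: LIT $666 LIT $222 (two cells per literal)
memory = {0x0200: LIT_ADDR, 0x0201: 0x666, 0x0202: LIT_ADDR, 0x0203: 0x222}
data_stack, return_stack = [], []
pc = call_lit(memory, 0x0200, data_stack, return_stack)
pc = call_lit(memory, pc, data_stack, return_stack)
```

After the two calls the data stack holds $666 and $222, the return stack is empty again, and execution continues at the cell following the second literal.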

I am open to anyone developing a 'longwind' instruction mnemonic set, there is no ONE CORRECT way to symbolize function.

Without regard? So what would be the point of exploring the avenue of regard to convention? I understand it gets followed verbatim thousands of times on university degree programs, and it hasn't produced many major improvements in design.

I think I did download it once, and have a long base of reading stretching from '82.

So you would be suggesting "literal common, need literal fast", whereas I would suggest "literal in opcode create bulk", not in the instruction decode execute sense, but in the instruction representation sense. Literals by their nature are not nibble constrained, whereas code/end-code definitions can be.

Also say I have the subroutine for LIT at address $0101, there is nothing inherent in the design to prevent me adding an IR=$0101 comparator with (P)->S routing in a single fetch execute cycle. Such a design is still code compatible with nibz. Let's call this a hard wired subroutine option for purpose technique. It is not a default as it will make your code definitions bigger (not nibble wide without significant extra logic).

Usual comment number 1: "I can't write primitives without literals!" -

Usual comment Number 2: "But nibbles waste memory in cells!" -> "The zero cell area can be factored to leave an area to place nibz in, or some other nibble oriented compaction protocol can be used."

Usual comment number 3: "Why don't you do everything at once!" -> "Because extra arms occupy more volume, and require more motor cortex, hence more volume, hence slower reaction time, hence no such thing is possible."

cheers jacko

- rickman
  

Sat, Mar 21, 2009 10:28 PM

This is why people have no idea what you are talking about. How does ')' imply a zero???

Back to the point, the issue is not the notation that the opcodes from 0 to 15 have to have zeros in front, the issue is that your opcode is N bits wide, not 4! The point is that there are only 17 opcodes in your machine and each one uses a full word for storage. The logic in this CPU may be small, but the program storage is nothing like compact.

I would say this is also very, very inefficient. Try reading Koopman's book on stack computers. His data indicates that LIT is the second most frequent Forth operation (on average) in terms of occurrence in the code and sixth most frequent in terms of execution. Having to embed a literal in the code and then call a subroutine to get it put on the stack may be novel and interesting, but very inefficient.

Funny, often the goal of literal instructions is to optimize small magnitude literals since they are more frequent than larger magnitudes. This is especially true for relative addressing due to the locality in time and space for memory accesses. Your scheme of using those low magnitude literals for opcodes shoots that in the foot.

No, there is no one way to do something correctly. But there are an unbounded number of ways to do it poorly. The question is are you documenting it so others will understand it or just for your own amusement? If the former I think the response you are getting is telling you it isn't working. If the latter, only you can judge.

I don't think you understand what I am saying. I am not suggesting that you need to repeat the designs of others. I am suggesting that you might learn something from their failures (or just suboptimal successes). I have found a number of design decisions that make your machine inherently limited and inefficient. If you care about that, you will read a few references to learn what others have found before you. Then you can improve on their work rather than to go off blindly and make all your own mistakes.

Of course this is assuming that making a useful design is your goal. I don't know this is your goal. You may well be doing this to amuse yourself only. The lack of any real communication regarding your design tends to indicate the latter.

I have no idea what you mean by any of this. What does "nibble constrained" mean? As to the trade off between bulk and "speed", how large is your code if you have to use some, what, four, five, six words to insert a literal (-1, for example) in your code, versus a command that inserts a literal using perhaps two words? Koopman's book says Literals are some 10% of the occurrence of Forth words. That would mean your usage of Literal requires the code to be some 20 to 30% larger!
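The arithmetic behind that estimate can be written out as a back-of-envelope Python function. It assumes every non-literal word costs the same in both schemes and expresses the growth as extra cells per source word; the function name and parameters are invented for this sketch, and the 10% figure is the one quoted above.

```python
def code_growth(literal_fraction, cells_per_literal, baseline_cells_per_literal):
    """Extra cells per source word when a literal costs more than the baseline.

    literal_fraction is the share of source words that are literals
    (0.10 for the figure quoted from Koopman's book).
    """
    extra_per_literal = cells_per_literal - baseline_cells_per_literal
    return literal_fraction * extra_per_literal


# Four or five cells per literal against a two-cell literal instruction.
growth_four = code_growth(0.10, 4, 2)
growth_five = code_growth(0.10, 5, 2)
# Two cells per literal (a call to LIT plus the value) against the same baseline.
growth_two = code_growth(0.10, 2, 2)
```

With four or five cells per literal the estimate comes out at roughly 20% and 30%. If a literal takes two cells, as a call to a LIT subroutine followed by the value would, the difference from a two-word literal instruction disappears in this simple model.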

That is what I mean...

Ok, this sounds a bit like the ZPU. They have reserved a number of opcodes for "emulate" instructions which can be a subroutine or done with logic. Of course that will work. But it means your address space gets chopped up which is something to be avoided in a CPU addressing limited internal FPGA memory. It also is still not as efficient as other schemes using two words when often only one will do, or better less than one. Is it efficient to use 32 bits to insert a -1 in your code?

Again, I don't know what you are talking about.

This on the other hand is perfectly clear... not!

Do you really care if you make anyone understand what you are talking about?

I'm going to start calling you "Cryptoman"! Your one weakness is "Cryptonite", a substance that makes you communicate with perfect lucidity and brings about your ultimate destruction!

Rick

- Jacko
  

Sun, Mar 22, 2009 3:13 PM

It doesn't imply a zero but inferring a zero as being on the same key is well ...

The subroutine threading is very compact, although not token threading level compact. I would assume an FPGA needing this compaction, would not use token threading, but would use an indirection table for all subroutine addresses and literal constants.

To imply the advanced branch as one opcode misses the point entirely. One semantic yes but definitely different codes.

As you point out if you choose not to optimize the section of memory containing the primitive 16 opcode subroutines then yes you will have some space occupied by zeros. In a typical small forth system, the primitive code requirements are small, and so I think your dislike is misrepresentative of the compact size a system may be programmed in.

And having a (P)->(S) double memory access opcode as a primitive opcode within the basic supported set leads to quite a mismatch with a single memory access per instruction architecture. A decision was made at design time to limit memory accesses per instruction, for the simple reason of size of design and scalability of multiple dispatch algorithms. I do not consider load literal to TOS in any way primitive in this sense. Hardware efficiency for a particular task can be implemented by subroutine hardwiring.

Relative addressing? This definitely does not scale as well as stack or direct addressing. Like I said, the instruction set is designed for scaling option. Forcing certain instructions into hardware as must haves destroys scalability options.

Well maybe the syllabic mnemonic set was for my own amusement, or though ideas, and as such I did it that way. If people feel/require/need it to be another way, then they are able to do it that way themselves. I am not an opcode translation service. Certain things get done free. If you want a free addition to the project, by all means make a request. If you want it doing right, right now, by some 'my way' standard, do it yourself.

And a new way of thinking of literals, and restrictions on such, and the results when applied to scaled multi-processor/super-scalar have been tried by who? Just because all mainstream designs have literal instructions for 'performance reasons' does not in any way imply 'performance' has been a closed research field.

Compared to many 8 bit designs this is both useful and effective.

Subroutine only written once. 2 cells per literal. -1 obviously can be 1 cell if it also becomes threaded. In fact any literal used more than 3 times can be more effectively shrunk to 1 cell per literal (a subroutine).

constrained => made to fit within limits or bounds. (nibble constrained = 0 to 15).
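The "more than 3 times" break-even can be checked with a short Python calculation. It assumes an inline literal costs 2 cells per use (the call to LIT plus the value), and that a dedicated constant subroutine has a 3 cell body (call to LIT, the value, and the return) and costs 1 cell per use. The 3 cell body is an assumption made for this sketch; the post gives only the per-use figures.

```python
def inline_cost(uses):
    """Cells used when every use is an inline 'LIT value' pair."""
    return 2 * uses


def shared_cost(uses, body_cells=3):
    """Cells used when the literal lives in its own subroutine.

    body_cells is the assumed one-off size of that subroutine.
    """
    return uses + body_cells


def break_even(body_cells=3):
    """Smallest number of uses at which the shared subroutine is smaller."""
    uses = 1
    while shared_cost(uses, body_cells) >= inline_cost(uses):
        uses += 1
    return uses
```

Under these assumptions three uses cost the same either way, and from the fourth use on the subroutine form is smaller, which agrees with the figure given above.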

ZPU emulate some essential instructions in hardware, yes kind of like, but all essential instructions are in hardware. Extra instructions have no fixed opcode, they have virtual subroutine addresses. So rather than soft instructions, why not have hard subroutines? From a microcode point of view, such things are closely related.

Making people understand? If people understand, they do; if people do not understand, they may eventually; if there is such an important need to indoctrinate people, making may be an unsavoury procedure.

Really care? As opposed to virtually care? As in expressing a duty of care? Please elaborate using non circular arguments ...

This would imply I am self destructive, by me not communicating my need to have the Cryptonite removed, as I would see my destruction, as I would see everything of my instamachinations for the concept of constrained infinite lucidity to be elaborated within me and flush out my band limited mouth to provide enlightenment to everyone within ear shot. Your statement is inconsistent bleep bleep, BA stack underflow ......

- Elizabeth D Rather
  

Sun, Mar 22, 2009 6:12 PM

...

In other words, no. Advice to Rick: give up.

Cheers, Elizabeth

--
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

- rickman
  

Sun, Mar 22, 2009 6:13 PM

This is the sort of obfuscation that you seem to revel in. Why infer or imply a zero when a zero could have been typed???

Yes, subroutine threading is compact. But your use of a word-wide opcode for *every* instruction is not compact.

If you think I am concerned with 16 words of memory lost to the opcodes, then you are confused about what I have said. There are two ways your instruction set is less than optimal. The encoding uses a full word for every instruction. Many MISC machines use opcodes of five bits. My machine uses opcodes of 8 or 9 bits depending on the implementation. Using 16 or 32 bits is very wasteful for the code using primitives. Even in higher level code a significant percentage of the codes is still primitives. The other inefficiency is the poor integration of literals into the instruction set. Needing to call a subroutine to load a literal is not an efficient use of memory or processor speed. Adding an optimization for direct implementation of the literal subroutine is still not an efficient use of memory, requiring two words for each literal.

My main point is that you seem to be making your design decisions without the benefit of the work that has gone on before you. I am sure your design has advantages, although I doubt anyone here will ever know because of your poor attempts to communicate.

I can't say anything about what is efficient in your machine. I do know that loading a literal is frequent in most CPU architectures and needs to be optimized over many other things. If you are designing a CPU for a large, complex CPU, then it will not be close to optimal for small machines.

Is your stack in memory and not hardware? Since no one but yourself understands your instruction set, I can't tell what is happening with your code.

Why would relative addressing not scale? It is just a simple index off the PC. But since you have indicated that your goal is to have a scalable instruction set, I can understand why this machine will not be at all optimal for FPGA use where program memory is tight.

No one gives a rat's rear what you *call* your instructions. No one understands what they do because you have not *documented* the instructions in a coherent way. I have no real interest in your project since I don't see any value in it. You seem to be trying to communicate to others here about your ideas and designs, but are failing to do so. That is the reason for my statements. I don't really care that much about understanding your project. I'm just pointing out that you seem to be failing in your goal.

Since the literal instruction is used so often, and most literals are small values, the literal instruction can be smaller than a word on the average. In some MISC machines the literal instruction uses whatever is left of the current word. In my machine the literal (also used to specify addresses for calls and jumps) is the most optimized instruction with one bit plus the data field in the remaining bits. In the 9 bit instruction format, +127 -128 range takes a single byte and a 16 bit literal only takes two 9 bit bytes. Jumps and calls are further optimized by including a 5 bit field for the lsbs of the address calculation.
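One way to picture the 9 bit format described here (one flag bit plus an 8 bit data field, with a 16 bit literal taking two such words) is the Python sketch below. The post does not say which bit is the flag or how consecutive literal words combine, so both are assumptions of this sketch: the top bit marks a literal, the first literal word is sign-extended, and each following literal word shifts in eight more bits. The function names are hypothetical as well.

```python
LITERAL_FLAG = 0x100  # assumed: top bit of the 9-bit word marks a literal


def sign_extend8(byte):
    """Interpret an 8-bit field as a signed value."""
    return byte - 0x100 if byte & 0x80 else byte


def encode_literal(value):
    """Encode a 16-bit signed value as one or two 9-bit literal words."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("this sketch handles 16-bit signed values only")
    if -128 <= value <= 127:
        return [LITERAL_FLAG | (value & 0xFF)]
    high = (value >> 8) & 0xFF
    low = value & 0xFF
    return [LITERAL_FLAG | high, LITERAL_FLAG | low]


def decode_literal(words):
    """Rebuild the value from consecutive literal words (assumed chaining)."""
    acc = sign_extend8(words[0] & 0xFF)
    for word in words[1:]:
        acc = (acc << 8) | (word & 0xFF)   # shift in the next eight bits
    return acc
```

Under these assumptions values from -128 to +127 fit in a single 9 bit word and any other 16 bit value takes two, so small literals cost 9 bits rather than a full cell.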

There is nothing wrong with the way you are doing things. I thought you were optimizing for a small design for FPGA use. But I see now you have other priorities.

Yes, I see your point (for once). I wonder how useful this really is compared to more conventional

Yeah, I guess I just can't write clearly...

Exactly!!! It worked just as I planned... BUWWWWWHHAAAHHAAA!!!

Rick

- rickman
  

Sun, Mar 22, 2009 6:15 PM

On one hand I would like to understand his thinking. On the other hand I also need to spend time thinking for myself and this is being a time sink. I think I may have pressed too hard and he is seeing me as an antagonist. So maybe it is time to stop pressing.

Rick

- Jacko
  

Sun, Mar 22, 2009 7:10 PM

I don't think you pressed too hard. The rotation about the word care and circles of infinite descent would have just been off topic. Scalability to interleave CPU with memory was a major, major goal of the design. The fact it can be a small thing, with only one core needed to do much embedded work, is an offshoot, not a raison d'etre.

Test in truth, it would have been easy to crunch the thread earlier had ant's been in strong pain, but this was not necessary.

Did you like the BAr stack underflow joke?

cheers jacko