Advice to a newbie

Cecil Bayona · 2016-05-27T18:00:28+00:00

I am a retired person with limited resources, and I've always been interested in CPU design, so now that I have time I want to give it a go using FPGAs to design simple CPUs. I would like some advice on several topics.

One is which HDL to learn first. Verilog seems simpler, but I've read articles that encourage beginners to start with VHDL. Although it seems wordier, I believe it will help eliminate more dumb mistakes, so I'm leaning that way. What is your advice?

Another is which logic families/companies make products that are inexpensive and have good features useful in creating CPUs and DSP-type products, as I am also interested in Software Defined Radios.

I have purchased a Lattice Brevia2 kit as a starter kit, as it was inexpensive and has low power consumption, so I assume it will be easier to use in the beginning. It's a simple device with 5K LUTs, which seems to be enough for some of the simple processors that I'm interested in to start with, CPUs such as the J1. For some time I've also had a Xilinx Spartan-3 Starter Kit that I have not used; I plan to use it later for more complex designs.

Your opinions and advice are welcome.

-- Cecil - k5nwa

Mike Perkins 10 years ago

Just an observation, but RISC instruction sets (and I'm largely basing my assumption on ARM) generally require a few 'fast' instructions to do anything useful.

If you want a single instruction to stack or pop multiple registers, then you would probably need a CISC CPU.

YMMV

Mike Perkins Video Solutions Ltd www.videosolutions.ltd.uk

rickman 10 years ago

The register stacking approach you describe is not much different from the TI-990 minicomputers and TMS9900 microprocessors. They didn't have general purpose registers on chip; rather, they had a pointer into memory which defined the general registers. Subroutine calls could be done by saving the workspace pointer, status register and program counter in the new registers, allowing context switches in a very minimal amount of time. This was the BLWP instruction.

It was also possible to use the simpler BL instruction which did not change the workspace pointer and use other instructions to modify the workspace pointer as if it were a stack pointer. A bit slower than desired, but workable giving not just stacks, but stack frames from registers.

Of course the limitation of this approach is the speed of memory which started out not much slower than registers, but quickly became a speed burden. This has come full circle in FPGAs where internal memory is not significantly slower than registers.
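
The workspace-pointer mechanism described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: memory is word-addressed (the real TMS9900 uses byte addresses), and `blwp` takes the new workspace pointer and program counter directly rather than fetching them from a vector in memory. The class and method names are made up for this sketch.

```python
# Toy model of memory-resident registers selected by a workspace pointer (WP).
# Assumption: word-addressed memory and directly supplied WP/PC values.

class WorkspaceCPU:
    def __init__(self, mem_words=256):
        self.mem = [0] * mem_words
        self.wp = 0   # workspace pointer: address of R0
        self.pc = 0   # program counter
        self.st = 0   # status register

    def reg(self, n):
        """Read register Rn, which lives in memory at WP + n."""
        return self.mem[self.wp + n]

    def set_reg(self, n, value):
        """Write register Rn (16-bit) in the current workspace."""
        self.mem[self.wp + n] = value & 0xFFFF

    def blwp(self, new_wp, new_pc):
        """Context switch: no register copying, just three stores."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = new_wp
        self.set_reg(13, old_wp)   # old WP goes to the new R13
        self.set_reg(14, old_pc)   # old PC goes to the new R14
        self.set_reg(15, old_st)   # old ST goes to the new R15
        self.pc = new_pc

    def rtwp(self):
        """Return: restore WP, PC and ST from R13..R15 of this workspace."""
        old_wp, old_pc, old_st = self.reg(13), self.reg(14), self.reg(15)
        self.wp, self.pc, self.st = old_wp, old_pc, old_st


cpu = WorkspaceCPU()
cpu.wp, cpu.pc = 0x20, 0x100
cpu.set_reg(1, 0x1234)               # caller's R1
cpu.blwp(new_wp=0x40, new_pc=0x200)  # switch to a fresh set of 16 registers
cpu.set_reg(1, 0xBEEF)               # callee's R1 is a different memory word
cpu.rtwp()                           # caller's registers are intact
```

After `rtwp`, the caller's R1 still holds 0x1234 because the callee's R1 was a different memory word; the cost of the switch was three memory writes, not sixteen.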

Rick C

jim.brakefield 10 years ago

> > Ages ago I had a notion about combining the advantages of register and
> > stack machines, which was to call the region of 16 addresses around the
> > stack "registers", and to have the processor automagically cache them on
> > a context switch. The idea was that the code itself wouldn't have to
> > specify registers to save on push and pop because the processor would do
> > it automatically.

In the context of a FPGA high performance implementation (of a soft core processor), there seem to be two/three cases:

1) "Small" embedded processor where stack requirements are known in advance so that LUT RAM can serve as a register file/stack(s), and the instruction processing adds offsets to two or more register pointers. Pops & pushes modify the register pointers.

2) Larger applications that need a larger stack(s). One can either spill and refill the register file from main memory, or one can use block RAM to hold the entire stack(s), main memory being more distant than the block RAM.

A third approach could be to have an associative cache of the block RAM stack(s) such that cache registers "automatically" spill and refill. I'm not sure how to implement this efficiently on an FPGA.
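
Case 1 above, where a small RAM doubles as register file and stack, can be modelled roughly as follows. This is a sketch with a single pointer rather than "two or more"; the class name, the depth of 64 and the downward-growing stack are assumptions made for illustration, and there is no overflow detection.

```python
class StackRegisterFile:
    """Small RAM addressed as (pointer + offset), wrapping modulo its depth."""

    def __init__(self, depth=64):
        self.ram = [0] * depth   # stands in for LUT RAM
        self.depth = depth
        self.sp = 0              # index of the current top-of-stack entry

    def _addr(self, offset):
        # The "instruction processing adds offsets to the register pointer" step.
        return (self.sp + offset) % self.depth

    def read(self, offset):
        """Read 'register' number offset, counted from the top of stack."""
        return self.ram[self._addr(offset)]

    def write(self, offset, value):
        self.ram[self._addr(offset)] = value

    def push(self, value):
        # A push only moves the pointer and writes one word.
        self.sp = (self.sp - 1) % self.depth
        self.ram[self.sp] = value

    def pop(self):
        # A pop reads one word and moves the pointer back.
        value = self.ram[self.sp]
        self.sp = (self.sp + 1) % self.depth
        return value


rf = StackRegisterFile(depth=8)
rf.push(10)
rf.push(20)
top, second = rf.read(0), rf.read(1)   # 20 and 10: stack entries used as registers
popped = rf.pop()                      # 20; what was register 1 is now register 0
```

The same entries are reachable both as stack slots and as pointer-relative registers, which is why no data has to be copied on push or pop.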

Jim Brakefield

rickman 10 years ago

I'm not sure what you are trying to address here. "Large" applications can still be implemented on an FPGA if it is big enough. The larger FPGAs have enormous amounts of RAM on chip, as much as tens of MB. It would be a large application that needed more than that. Still, if you weren't using one of the really large chips, you might not have enough on-chip RAM for a general stack for a C programmed processor. But it would be a really small FPGA that didn't have enough RAM for a register stack.

When you talk about "spilling" the stack I think we are talking two different things. If your registers are in memory, "spilling" the stack is just a matter of changing the pointer. That's what they do in the TI processor. The only thing they do wrong is to load the register pointer from a fixed address rather than using an offset to the present value. Using this approach there is no need to copy data from registers to stack. Even if it is automatic it takes a long time to do all the memory accesses.

Rick C

Tim Wescott 10 years ago

From the ARMv7-M Architecture Reference Manual, POP instruction:

"Pop Multiple Registers loads a subset, or possibly all, of the general-purpose registers R0-R12 and the PC or the LR from the stack"

In Thumb, the instruction is 7 bits (1011110) followed by a nine-bit bitfield specifying which registers to pop. PUSH is similar.
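
The 16-bit encoding just described can be sketched as below. In that encoding the nine-bit field is one bit for PC (POP) or LR (PUSH) followed by eight bits for R0-R7; the PUSH prefix used here is 1011010. The function names are made up for this sketch.

```python
def _encode(prefix7, low_regs, extra_bit):
    """Pack a 7-bit prefix, the PC/LR bit and an 8-bit R0-R7 mask into 16 bits."""
    mask = 0
    for r in low_regs:
        if not 0 <= r <= 7:
            raise ValueError("the 16-bit encoding only reaches R0-R7")
        mask |= 1 << r
    if mask == 0 and not extra_bit:
        raise ValueError("empty register list")
    return (prefix7 << 9) | (int(extra_bit) << 8) | mask


def thumb_pop(low_regs, pc=False):
    """16-bit Thumb POP: 1011110, then the PC bit, then the R0-R7 bitfield."""
    return _encode(0b1011110, low_regs, pc)


def thumb_push(low_regs, lr=False):
    """16-bit Thumb PUSH: 1011010, then the LR bit, then the R0-R7 bitfield."""
    return _encode(0b1011010, low_regs, lr)


print(hex(thumb_push([4], lr=True)))  # push {r4, lr} -> 0xb510
print(hex(thumb_pop([4], pc=True)))   # pop  {r4, pc} -> 0xbd10
```

The pair `push {r4, lr}` / `pop {r4, pc}` is a typical function prologue and epilogue, so a single 16-bit instruction saves or restores several registers at once.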

Tim Wescott Control systems, embedded software and circuit design I'm looking for work! See my website if you're interested http://www.wescottdesign.com

Rick C. Hodgin 10 years ago

I went to Lattice's website and also bought a Brevia2 development kit. I was able to download their Diamond software and get a license.dat file, and I found that someone from the Meetup posted Ting's project files online:

formatting link

I have been able to get the project loaded, but I haven't gotten to the part where it synthesizes yet. Still going through the videos:

Morning session, a lot of ISA and architecture review:

formatting link

Afternoon session, development, design, and synthesis:

formatting link

DRuffer was able to get the Forth code to run as well, and he includes his working JEDEC file:

formatting link

Best regards, Rick C. Hodgin

Rick C. Hodgin 10 years ago

I may be missing an obvious link, but if anybody knows where I can get the PPT files used in these presentations, please post a link:

ep8080 architecture morning sessions: Feb.27.2016:

formatting link

Mar.26.2016:

formatting link

Apr.23.2016:

formatting link

Thank you in advance.

Best regards, Rick C. Hodgin

Rick C. Hodgin 10 years ago

Also, if anyone has a block diagram or logical component layout of some kind, one which shows the internal components and how they are all hooked up through this ep8080 design, please post that info as well.

Best regards, Rick C. Hodgin

Cecil Bayona 10 years ago

I would also be interested in those items. There are several nice-looking soft CPUs available for use with Forth; the common thread among them is a lack of documentation.

Cecil - k5nwa

rickman 10 years ago

The best way to learn about the structure of the ep8080 would be to draw a block diagram from the VHDL code. I looked at the code when I debugged the problem I found and it is not so complex. There are separate registers for the user accessible registers as well as the internal registers like the PSW. There is a process for the control signals enabling the registers and controlling the various other functions in the CPU such as multiplexers and carry bits, etc. There are the multiplexers and the other data path logic.

To draw the block diagram, I would follow the data path from the registers backwards to the sources. I believe you will find there is a small multiplexer on the input of each register and two larger muxes controlled by the source and destination fields of the instruction opcode. I can't say much about the rest, I didn't dig in to understand it all. Once you have mapped out the data path, you can trace the control flow through the control logic to understand how the opcode is interpreted.
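
As a concrete case of the source and destination fields mentioned above: in the 8080 instruction set the register-to-register MOV group is encoded as 01 DDD SSS, and those two 3-bit fields are what would select the large muxes. The sketch below decodes them in software; it illustrates the instruction format only and is not taken from the ep8080 VHDL, whose signal names I have not checked.

```python
# 8080 register codes as used in the DDD and SSS fields.
# "M" means the memory byte addressed by HL.
REG_NAMES = ["B", "C", "D", "E", "H", "L", "M", "A"]


def decode_mov(opcode):
    """Split an 8080 opcode of the form 01 DDD SSS into its fields."""
    if (opcode >> 6) != 0b01:
        raise ValueError("not in the MOV group")
    if opcode == 0x76:
        # 01 110 110 would be MOV M,M; the 8080 uses this code for HLT.
        return ("HLT",)
    dst = (opcode >> 3) & 0b111   # destination field: which register loads
    src = opcode & 0b111          # source field: which register drives the bus
    return ("MOV", REG_NAMES[dst], REG_NAMES[src])


print(decode_mov(0x78))  # ('MOV', 'A', 'B')
print(decode_mov(0x7E))  # ('MOV', 'A', 'M')
```

When tracing the VHDL, looking for where these two bit slices of the opcode are used is one way to find the two large multiplexers.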

Rick C

Rick C. Hodgin 10 years ago

That's not good advice for everyone. I have dyslexia, for example, and have a very difficult time comprehending written text. And since I don't know VHDL, I was hoping to go the other way from your suggestion, to be able to look at a block diagram and understand the connectivity in that design / drawing form, and then look at the VHDL code and teach myself its verbose form that way.

Best regards, Rick C. Hodgin

jim.brakefield 10 years ago

> That's not good advice for everyone.

Am not the world's fastest or best VHDL coder, this is what I do:

Write a description of what you want to do. In the case of a soft core, include the instruction set & formats, rationale, implementation decisions, ...

Do a spreadsheet with one or more rows for each instruction. Create columns for anything involved in the implementation: mnemonics, their binary, registers accessed, registers modified, calculations, ... For instructions with multiple clocks, use either a set of columns for each clock or multiple rows.

Choose some naming scheme for the signals and registers.

Write the VHDL. You can optimize it later (merging adders that have similar inputs optimizes well). Only write what you are ready to test (using a short program). I now prefer not to do a data flow diagram (it tends to result in too many signal names). Track resource utilization as instructions are coded.

Test the VHDL both in simulation and on the FPGA evaluation board.

Jim Brakefield

Simon 10 years ago

One thing I haven't seen mentioned is Jan Gray's articles on the xr16 he designed, implemented, and wrote a C compiler for. It's not the most up-to-date design, in that it's designed for a (by modern standards) very old processor, but it's a worked example of designing a CPU, ultimately an SOC, and even the software development environment for it. It's written in Verilog.

Start at

formatting link

and look for the Verilog version (IIRC, the original was a schematic design).

Cheers Simon

rickman 10 years ago

You forgot timing analysis. I prepared a presentation on test benches for Dr. Ting's workshop, but I wish I had included a mention of static timing analysis. I believe Ting talked another time about the design not running at 50 MHz as he had hoped but ran at 25. This is an issue that could be explored by a static timing analysis most efficiently. Trying to analyze timing paths by post route simulation is very labor intensive. Trying to debug anything in a real chip is even harder and should not be done until simulation and static timing analysis have wrung out the design as much as possible. That is the background that introduces the need for good test benches.
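
The arithmetic behind a 50 MHz versus 25 MHz result is simple enough to sketch. The 30 ns critical-path delay below is a made-up number for illustration, not a figure from Dr. Ting's design, and the model ignores clock skew and jitter, which a real static timing report would include.

```python
def period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def slack_ns(f_mhz, path_delay_ns, setup_ns=0.0):
    """Positive slack: the path meets timing. Negative: it fails."""
    return period_ns(f_mhz) - (path_delay_ns + setup_ns)


def fmax_mhz(path_delay_ns, setup_ns=0.0):
    """Highest clock frequency the slowest register-to-register path allows."""
    return 1000.0 / (path_delay_ns + setup_ns)


CRITICAL_PATH_NS = 30.0  # hypothetical worst-case path delay

print(slack_ns(50, CRITICAL_PATH_NS))  # -10.0 -> fails at 50 MHz
print(slack_ns(25, CRITICAL_PATH_NS))  # 10.0  -> passes at 25 MHz
print(fmax_mhz(CRITICAL_PATH_NS))      # about 33.3 MHz
```

A static timing tool does this calculation for every path in the routed design and lists the ones with the least slack, which is why it finds the limiting path far faster than post-route simulation.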

Rick C

Cecil Bayona 10 years ago

When I set up the project and compiled it, it gave warnings about the clock possibly being delayed in some sections of the circuit, so it's likely that he has timing issues.

Cecil - k5nwa

rickman 10 years ago

I looked and don't see any real problems with the clock circuit. In VHDL, a buffered clock results in a delta delay, which can mess up a simulation. It won't hurt a real circuit since the buffer won't be implemented in logic. It can *really* mess up a simulation though.

There is also an inversion of the clock which I don't understand, but again, it likely also won't be implemented in an FPGA since they typically have hardware to select the clock edge.

Rick C

Rick C. Hodgin 10 years ago

Thank you for your input, Jim. It's appreciated.

It's hard for me to operate in a theater which uses a lot of words. I can do it, but it takes a lot of effort and is very mentally taxing. I also make a lot of mistakes in reading (and subsequent comprehension) like that. It's really quite amazing sometimes what I read compared to what's really there. Sometimes they are completely separate meanings.

On my designs, I often go into a spreadsheet or GIMP and begin the design with separate components (using shapes, color blocks, outlines, etc.), which provide visual cues (rather than words) for that reason. My brain can isolate and identify the separate components much better that way. It winds up being much easier for me to understand things in images and their related diagrams than in those which just have words.

Some diagrams are also confusing though. Typically it's those being mostly words, or words in certain fonts (depending on how they were created), but not all of them. And even then they're usually better in some ways at least.

-----

I'll track it down. If it doesn't exist, I may skip the ep8080 and go to another CPU ... possibly the 6502 as it's had a lot of reconstruction to create its entire gate layout:

formatting link

And they have a high-speed C gate simulator which is a soft 6502 that runs at about 1/4 speed on an 8-core machine:

formatting link

I may also just start with my Oppie-1 design:

formatting link

It's a very simple core with discrete stages which all operate in a single clock cycle, so it's very straight-forward.

-----

I do like the Lattice Diamond software. It's awesome actually. Very fast. Nice simulator. I prefer it to Altera's Quartus II so far, but that may only be because I don't yet know how to use Quartus II well enough.

Best regards, Rick C. Hodgin

Rick C. Hodgin 10 years ago

I received the Lattice board today. I'll begin working on something this weekend. Looking forward to getting a basic circuit to cycle through the on-board LEDs when on-board buttons are clicked.

Best regards, Rick C. Hodgin

Rick C. Hodgin 10 years ago

Received:

formatting link

I am wanting to apply logic to this LED process, but I'm thinking there may be some analog issues that I need to consider. For example, when a button on the board is clicked, I assume there is some jitter time, such that if it were sampled at a MHz frequency it would record jittery on/off signals for a ms or two until the contact was made solid, and the same for releasing.

As such, any logic which samples the buttons, for example, must include things like identifying the first high signal, and then either sampling the high/low ratio over periods of time to determine if it's still high or low, and then using that value after the sampling period has expired, or wait until the high signal persists solidly for something like 10ms, and then consider that to be a single press event, and then wait for it to go low again for something like 10ms before concluding it is actually a release event.

Sound about right? Or do boards like this automatically handle that for you, so you have a direct digital input that has already filtered out those peculiarities in some way, and you get solid press and release events when the signal switches from high to low and low to high?
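
The second scheme described above (wait until the new level has persisted for a fixed time) is usually called debouncing, and it can be modelled in a few lines. This is a software sketch of the logic only, assuming the buttons are not already debounced on the board, which is worth checking in the board schematic. In an FPGA the same idea would be a counter clocked at the sample rate, with the raw button input first passed through a couple of flip-flops to synchronize it. The function names are made up.

```python
def debounce(samples, stable_count, initial=0):
    """Return the debounced level for each raw sample.

    The output changes only after the raw input has differed from it
    for stable_count consecutive samples.
    """
    state = initial
    run = 0          # consecutive samples that disagree with the output
    out = []
    for s in samples:
        if s == state:
            run = 0              # a bounce back resets the counter
        else:
            run += 1
            if run >= stable_count:
                state = s        # input has been stable long enough
                run = 0
        out.append(state)
    return out


def count_presses(debounced):
    """Count low-to-high transitions, i.e. press events."""
    presses = 0
    prev = 0
    for level in debounced:
        if level == 1 and prev == 0:
            presses += 1
        prev = level
    return presses


# One press with bounce on both edges (1 = pressed).
raw = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
clean = debounce(raw, stable_count=3)
print(count_presses(raw))    # 4 edges seen without debouncing
print(count_presses(clean))  # 1 press after debouncing
```

For the 10 ms figure with a 1 MHz sample clock, `stable_count` would be 10 ms x 1 MHz = 10,000, which fits in a 14-bit counter.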

-----

Also, I am thinking about hooking up a speaker to make sound. I am thinking there is no way to control volume using this method, but only the frequency, short of signaling out through multiple sets of pins which are wired to a single speaker, such that pins 0 are full voltage, pins 1 are slightly resisted, pins 2 are slightly more resisted, pins 3 are heavily resisted, and pins 4 are almost completely resisted, such that whenever a volume setting is made, it drives pins 0,1,2,3,4 or none, as needed.

Sound about right?
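
The resistor scheme above can be checked on paper before wiring anything. With several pins each feeding a common node through its own resistor, and a load from that node to ground, the node voltage follows from Millman's theorem. The sketch below assumes ideal pin drivers and a purely resistive load; all component values are hypothetical. A typical speaker is only a few ohms, far too low for FPGA pins to drive directly, so in practice the node would feed an amplifier or transistor stage rather than the speaker itself.

```python
def node_voltage(pins, load_ohms):
    """Voltage at the common node of a resistor summing network.

    pins: list of (drive_volts, series_ohms); drive_volts is None for a
          pin left high-impedance (not driving).
    load_ohms: resistance from the common node to ground.
    """
    current_sum = 0.0                  # sum of V_i / R_i
    conductance_sum = 1.0 / load_ohms  # load, plus 1 / R_i of each driven pin
    for volts, ohms in pins:
        if volts is None:
            continue                   # a hi-Z pin contributes nothing
        current_sum += volts / ohms
        conductance_sum += 1.0 / ohms
    return current_sum / conductance_sum


VCC = 3.3        # assumed I/O voltage
LOAD = 10_000.0  # hypothetical amplifier input resistance

loud = node_voltage([(VCC, 1_000.0), (None, 10_000.0)], LOAD)   # only the small resistor driven
quiet = node_voltage([(None, 1_000.0), (VCC, 10_000.0)], LOAD)  # only the large resistor driven
print(loud, quiet)  # 3.0 V versus 1.65 V swing at the node
```

Toggling the selected pin between VCC and hi-Z at the tone frequency then gives a square wave whose amplitude depends on which resistor is in use, so volume and frequency can be set independently.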

Best regards, Rick C. Hodgin

Rick C. Hodgin 10 years ago

Someone named "Andy Bennet" sent me an email with an invalid return email address wherein "he" told me to go with God, as God is the knower of all things. My reply was: "LOL! That's about where I'm at because I doubt I'll get any help from the group."

We'll see. If not, it's okay. All is forgiven.

Best regards, Rick C. Hodgin
