PacoBlaze with multiply and 16-bit add/sub instructions

Hello people.

As I announced some days ago, I updated the PacoBlaze3 core, now with a wide ALU that supports an 8x8 multiply instruction ('mul') and 16-bit add/sub operations ('addw', 'addwcy', 'subw', 'subwcy'). The new extension core is called PacoBlaze3M. It could be useful for performing small DSP functions and math subroutines when there is a spare hardware multiplier block.

The implementation scheme modifies the PicoBlaze register model, dividing it into odd/even (high/low) sections with a multiplexing layer.

16-bit writes are performed on both odd/even registers. The multiply operation accepts any two arbitrary registers and the wide add/sub instructions operate on contiguous 16-bit "extended" registers.
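For readers who want to play with the idea, here is a minimal Python sketch of the register scheme described above. It is only a toy model, not the PacoBlaze3M source: the class name `WideRegs`, the choice that the even register holds the low byte, and the rule that `mul` writes its product into the pair containing the first operand are all assumptions made for illustration.

```python
class WideRegs:
    """Toy model: 16 8-bit registers viewed as 8 even/odd (low/high) pairs."""

    def __init__(self):
        self.r = [0] * 16  # PicoBlaze-3 style register file, s0..sF

    def read16(self, n):
        base = n & ~1  # assumption: a pair starts at the even register
        return (self.r[base | 1] << 8) | self.r[base]

    def write16(self, n, value):
        base = n & ~1
        self.r[base] = value & 0xFF             # even register: low byte
        self.r[base | 1] = (value >> 8) & 0xFF  # odd register: high byte

    def mul(self, x, y):
        # 8x8 multiply of any two registers; the destination pair is an assumption
        self.write16(x, self.r[x] * self.r[y])

    def addw(self, x, y, carry_in=0):
        # 16-bit add on two register pairs; returns the carry out
        total = self.read16(x) + self.read16(y) + carry_in
        self.write16(x, total)
        return total >> 16


regs = WideRegs()
regs.r[2], regs.r[5] = 200, 100
regs.mul(2, 5)             # 200 * 100 = 20000 goes into the pair (s3:s2)
regs.write16(4, 0xFFFF)
carry = regs.addw(2, 4)    # 20000 + 65535 wraps around with a carry
```

The real instruction encodings and flag behaviour are defined by the core itself, so treat the above purely as a way to reason about the even/odd pairing.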

Eg: (KCAsm code)

---8

Reply to
Pablo Bleyer Kocik

Sounds impressive. You have seen the AS Assembler, and the Mico8 from Lattice ?

FWIR the Mico8 is very similar to PicoBlaze ( as expected, both are tiny FPGA targeted CPUs ), but I think with a larger jump and call reach (but simpler RET options). If you are loading on features, the call-lengths might need attention ?

Have you tried targeting this to a lattice device ?

-jg

Reply to
Jim Granville

Yes, I am very much aware of Mico8 and I have used AS in several projects in the past. I know that it supports PicoBlaze (and Mico8 now). But what I want to do now is a small version of a language like HLA or terse for PicoBlaze. Something simple and readable that is easy to modify like the current KCAsm (hey, adding the mul and add/sub instructions took less than one minute. ;o)

Here is what sarKCAsm is currently looking like (currently a JavaCC implementation, but I am swapping to ANTLR now because it has better support for trees).

---8

[...] tiny FPGA targeted CPUs ), but I think with a larger jump and call reach

For now the limits of the PicoBlaze model have been within my needs (IIRC, Mico8 has the same 10-bit jumps/calls as PB3 and it is very isomorphic to it). My main drive to create PacoBlaze was to get the most versatile processor that I could use as a peripheral controller in my projects (eg motor control, bus controller, PWM generator, audio co-processor, specifically in the JBRD of my Javabotics project). It isn't difficult to extend the memory model of PicoBlaze using PacoBlaze, though.

Not yet. I plan to synthesize the core using different tools that I may have access to, but that is not on my list of priorities.

Cheers.

--
PabloBleyerKocik
pablo @bleyer.org
"Naturally, there's got to be some limit, for I don't expect to live forever, but I do intend to hang on as long as possible." -- Isaac Asimov

Reply to
Pablo Bleyer Kocik

Cool, though I have not had time to even get 2.0 running yet.. ( life got in the way of fun stuff )

Reply to
ziggy

I realised that; - just checking you knew of them :)

Good targets.

Will you also do boolean (Flag) functions ?

General comments: ( feel free to ignore... )

The expression clarity makes good sense, and I also like languages that can accept flexible constants: viz $55 or 0x55 or 55H, or 2#01010101 or 16#55, or 2#01_0101_01.
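As a rough illustration of how little code such flexible constants need, here is a small Python sketch that accepts the notations listed above. The function name `parse_constant` is made up for this example and has nothing to do with KCAsm or AS internals.

```python
def parse_constant(text):
    """Parse $55, 0x55, 55H, 2#01010101, 16#55, 2#01_0101_01 or plain decimal."""
    s = text.strip().replace("_", "")  # underscores are digit separators only
    if "#" in s:
        base, digits = s.split("#", 1)  # base#digits form
        return int(digits, int(base))
    if s.startswith("$"):
        return int(s[1:], 16)
    if s.lower().startswith("0x"):
        return int(s[2:], 16)
    if s[-1:] in ("h", "H"):
        return int(s[:-1], 16)
    return int(s, 10)


values = [parse_constant(t) for t in
          ("$55", "0x55", "55H", "2#01010101", "16#55", "2#01_0101_01")]
```

All six spellings in the usage line denote the same value; malformed input simply raises `ValueError` from `int`.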

I've also seen XOR AND OR NOT etc keywords supported, as well as the terse C equivalents. ( which are a real throwback to when source size mattered ).

but I'm not sure about labels in the left most code-column - that makes code harder to scan, and indent etc, and not as clear in a syntax highlighted editor....

ie If you have to add a comment, then the language is probably not clear enough....

# for return ? => why? - why not return, or RET or IFnZ RET

label then condition ? => most languages are IF_Z THEN or if_nZ DestAddr

Label for Loop jmp ? => REPEAT Label, or LOOP label

If a 12yr old kid can read the source, and not need a raft of prior knowledge, then that's a good test of any language :)

-jg

Reply to
Jim Granville

I think I recall the Mico8 had more obvious expansion space in the opcodes - but either way, this is the sort of expansion that is nice to allow for early-on.

With more smarts, users _are_ going to need larger address space :)

The assembler should accept either size, and warn on the smaller/larger ceiling, based on a target/build family define.
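A sketch of what such a check could look like in an assembler back end, in Python. The family table is an assumption for illustration: "pb3" uses the 10-bit jump/call reach mentioned earlier in the thread, while "wide" is a purely hypothetical extended target with 12-bit addresses.

```python
# Address width per target family. "wide" is hypothetical.
ADDRESS_BITS = {"pb3": 10, "wide": 12}


def check_target(address, family):
    """Return a warning string if address does not fit the family, else None."""
    limit = 1 << ADDRESS_BITS[family]
    if address < limit:
        return None
    return (f"warning: address 0x{address:X} exceeds the "
            f"{limit}-word space of '{family}'")


msg = check_target(0x400, "pb3")   # one past the 10-bit ceiling
```

The same source can then be assembled for either family, with the build define selecting which ceiling applies.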

-jg

Reply to
Jim Granville

I think you'll need to code dummy bits at the middle and top of the adder to pull out the carries.

Here are some old posts with structural (Xilinx) and RTL versions:


IIRC, using two dummy bits at the top ( '0' & copy_of_sign_bit ) makes coding synthesizable RTL signed/unsigned carry/borrow/overflow flags easy to implement, but quickly googling didn't turn up the post that I recall which explained that technique.
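To make the pad-bit idea concrete, here is a small Python model of one reading of it, for addition only. Each operand is extended with a copy of its sign bit and then a '0' on top. After the add, the top bit of the wide sum is the unsigned carry, and the XOR of the two bits around the sign-copy position gives the signed overflow. The function name and the bit bookkeeping are this sketch's own and are not taken from the posts referred to above.

```python
def add_with_flags(a, b, width=8):
    """Add two width-bit values using '0' & sign-copy pad bits on top."""
    msb = width - 1
    mask = (1 << width) - 1

    def pad(x):
        x &= mask
        sign = (x >> msb) & 1
        return (sign << width) | x  # bit width+1 stays '0'

    wide = pad(a) + pad(b)
    result = wide & mask
    carry = (wide >> (width + 1)) & 1                   # unsigned carry out
    overflow = ((wide >> width) ^ (wide >> msb)) & 1    # carry-in vs carry-out of MSB
    return result, carry, overflow


flags = add_with_flags(0x7F, 0x01)   # 127 + 1 overflows as a signed add
```

In RTL the same thing is a single wide adder, so the synthesizer keeps one carry chain and the flags fall out of the extra sum bits.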

Brian

Reply to
Brian Davis

Thanks for the pointers. I will try that.

Cheers.

Reply to
Pablo Bleyer Kocik

I couldn't turn up that other post that I recalled, but I dug up a code snippet of the conditional signed skips of my own homebrew processor. ( no mid-chain split, but overflow logic coded with pad bits )

Basically, the copy of the MSB input bits at bit position MSB+1 lets you indirectly look for a difference in the carries into and out of the MSB position in the inferred RTL adder.

gen_sgbt: if CFG_SKIP_GROUP_B = TRUE generate

  skip_b: block
    signal wide_diff : std_logic_vector( ALU_MSB+2 downto 0);
    signal pad_ar    : std_logic_vector( ALU_MSB+2 downto 0);
    signal pad_br    : std_logic_vector( ALU_MSB+2 downto 0);

  begin
    pad_ar

Reply to
Brian Davis

Brian, with the following exploded setup I could finally instruct ISE to merge two 8-bit adders to create a 16-bit one and multiplex out the carry to get the half-carry. I don't know why my previous setups failed... It saves 4 slices in a SP3 instead of having two separate adders (the output mux is not considered) and the report indeed shows that the fanout of the half MUXCY is 2.

Regards.

---8

Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
----------------------------------------  ------------
IBUF:I->O            19   0.715   1.403  op_IBUF (op_IBUF)
LUT2:I1->O            1   0.479   0.976  bl1 (bl)
LUT2:I0->O            1   0.479   0.000  addsub1_yllut (N4)
MUXCY:S->O            1   0.435   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_ylcy (addsub1_yl_cyo)
MUXCY:CI->O           2   0.056   0.000  addsub1_ylcy (d)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub1_yhcy (addsub1_yh_cyo)
MUXCY:CI->O           1   0.265   0.976  addsub1_yhcy (e)
LUT2:I0->O            1   0.479   0.681  c_out1 (c_out_OBUF)
OBUF:I->O                 4.909          c_out_OBUF (c_out)
----------------------------------------
Total                     12.573ns (8.538ns logic, 4.035ns route)
                                   (67.9% logic, 32.1% route)

*/

/* Two separate adders */
module addsub2(
	op, oc,
	y, yl,
	a, b,
	c_in, c_out, h_out
);

input op, oc; // 0: add, 1: sub
output [`WIDTH-1:0] y;
input [`WIDTH-1:0] a, b;
input c_in;
output c_out;
output h_out;

output [`WIDTH/2-1:0] yl;

wire [`WIDTH/2-1:0] al = a[`WIDTH/2-1:0];
wire [`WIDTH-1:0] bs;
wire [`WIDTH/2-1:0] bl;
wire c = (!oc) ? 0 : (op) ? ~c_in : c_in;
wire d, e;

assign bl = (op) ? ~b[`WIDTH/2-1:0] : b[`WIDTH/2-1:0];
assign bs = (op) ? ~b : b;

assign {d, yl} = al + bl + c;
assign {e, y} = a + bs + c;

assign h_out = (op) ? ~d : d;
assign c_out = (op) ? ~e : e;

endmodule

/*
=========================================================================
*                           HDL Synthesis                               *
=========================================================================

Synthesizing Unit .
    Related source file is "C:/src/pacoblaze/pacoblaze/addsub.v".
    Found 16-bit adder carry in/out for signal .
    Found 8-bit adder carry in/out for signal .
    Found 1-bit 4-to-1 multiplexer for signal .
    Summary:
        inferred   2 Adder/Subtractor(s).
        inferred   1 Multiplexer(s).
Unit synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Adders/Subtractors               : 2
 16-bit adder carry in/out         : 1
 8-bit adder carry in/out          : 1
# Multiplexers                     : 1
 1-bit 4-to-1 multiplexer          : 1

=========================================================================

=========================================================================
*                       Advanced HDL Synthesis                          *
=========================================================================

=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# Adders/Subtractors               : 2
 16-bit adder carry in/out         : 1
 8-bit adder carry in/out          : 1
# Multiplexers                     : 1
 1-bit 4-to-1 multiplexer          : 1

=========================================================================

=========================================================================
*                         Low Level Synthesis                           *
=========================================================================
Loading device for application Rf_Device from file '3s200.nph' in environment C:\Xilinx.

Optimizing unit ...

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block addsub2, actual ratio is 1.

=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : addsub2.ngr
Top Level Output File Name         : addsub2
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 61

Cell Usage :
# BELS                             : 91
#      LUT2                        : 34
#      LUT3                        : 9
#      MUXCY                       : 24
#      XORCY                       : 24
# IO Buffers                       : 61
#      IBUF                        : 35
#      OBUF                        : 26
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 3s200pq208-5

 Number of Slices:             23  out of   1920     1%
 Number of 4 input LUTs:       43  out of   3840     1%
 Number of bonded IOBs:        61  out of    141    43%

=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
No clock signals found in this design

Timing Summary:
---------------
Speed Grade: -5

   Minimum period: No path found
   Minimum input arrival time before clock: No path found
   Maximum output required time after clock: No path found
   Maximum combinational path delay: 12.955ns

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default path analysis
  Total number of paths / destination ports: 1012 / 26
-------------------------------------------------------------------------
Delay:               12.955ns (Levels of Logic = 21)
  Source:            op (PAD)
  Destination:       c_out (PAD)

  Data Path: op to c_out
                            Gate     Net
Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
----------------------------------------  ------------
IBUF:I->O            27   0.715   1.721  op_IBUF (op_IBUF)
LUT2:I1->O            2   0.479   1.040  bs1 (bs)
LUT2:I0->O            1   0.479   0.000  addsub2_ylut (N4)
MUXCY:S->O            1   0.435   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.056   0.000  addsub2_ycy (addsub2_y_cyo)
MUXCY:CI->O           1   0.265   0.976  addsub2_ycy (e)
LUT2:I0->O            1   0.479   0.681  c_out1 (c_out_OBUF)
OBUF:I->O                 4.909          c_out_OBUF (c_out)
----------------------------------------
Total                     12.955ns (8.538ns logic, 4.418ns route)
                                   (65.9% logic, 34.1% route)

*/
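For anyone who wants to check the flag behaviour of the addsub2 module above without running a simulator, here is a small Python model that mirrors its assignments line by line. It assumes `WIDTH` is 16, which matches the 16-bit and 8-bit adders and the 61 IOs in the synthesis report. The Python function and variable names are just this sketch's own.

```python
WIDTH = 16  # assumption: matches the 16-bit/8-bit adders in the report
HALF = WIDTH // 2
MASK = (1 << WIDTH) - 1
HMASK = (1 << HALF) - 1


def addsub2(op, oc, a, b, c_in):
    """Mirror of the posted Verilog. Returns (y, yl, c_out, h_out)."""
    # wire c = (!oc) ? 0 : (op) ? ~c_in : c_in;
    c = 0 if not oc else ((c_in ^ 1) if op else c_in)
    bs = (~b & MASK) if op else (b & MASK)   # operand inverted for sub
    bl = bs & HMASK                          # low half of the same operand
    al = a & HMASK
    low = al + bl + c                        # {d, yl} = al + bl + c
    full = (a & MASK) + bs + c               # {e, y}  = a + bs + c
    d = (low >> HALF) & 1
    e = (full >> WIDTH) & 1
    h_out = (d ^ 1) if op else d             # half carry / half borrow
    c_out = (e ^ 1) if op else e             # carry / borrow
    return full & MASK, low & HMASK, c_out, h_out


add_result = addsub2(0, 0, 0x00FF, 0x0001, 0)   # low byte overflows into the high byte
sub_result = addsub2(1, 1, 0x0100, 0x0001, 0)   # subtract with carry logic enabled
```

Note that, mirroring the code as posted, the adder's carry input is 0 whenever `oc` is 0, so the subtract case in the usage lines enables `oc` with `c_in` = 0 to get a plain a - b.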

--
PabloBleyerKocik
pablo @bleyer.org
"It is a terrible thing to see and have no vision." -- Helen Keller

Reply to
Pablo Bleyer Kocik
