JOP on Spartan-3 Starter Kit

Hi Martin,

The easiest way to use the DSE is to go to the project directory and type

  quartus_sh --dse

If you want help on this feature type

  quartus_sh --help=dse

Please make sure that the quartus\bin is in your path.

Additional information is available in the Quartus handbook at

formatting link

Here is the header from the help when you type in quartus_sh --help=dse

THE ALTERA DESIGN SPACE EXPLORER (DSE)

The Design Space Explorer (DSE) is a tool for exploring the
complex flow parameters in the Quartus(R) II software. DSE takes
the "guess work" out of selecting parameter values and exposes
the optimal Quartus II software settings for a design.

VERSION

2.1

SYNOPSIS

Usage: quartus_sh --dse [options]

Options:

-nogui
-project
-revision
-seeds
-llr-restructuring
-exploration-space
-optimization-goal
-search-method
-custom-file
-stop-after-gain
-stop-after-time
-ignore-failed-base
-archive
-run-assembler
-slaves
-use-lsf
-slack-column
-help

Note: To use DSE in command-line mode, specify the "-nogui"
option. If you do not specify this option, the DSE graphical
user interface (GUI) starts, regardless of the other
command-line options used.

EXAMPLES

quartus_sh --dse

This command launches the DSE GUI.

quartus_sh --dse -nogui -project main

This command starts a default command-line exploration. The
default seeds are used along with the default exploration
space, optimization goal, and search method.

quartus_sh --dse -nogui -project main -seeds 2,4,8-10 -exploration-space "Extra Effort Search"

This command starts a command-line exploration of an
"Extra Effort Search" space using the seeds 2, 4, 8
through 10, the default optimization goal, and the default
search method.

.............................
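For scripting the examples above, here is a minimal sketch in Python (not part of the DSE help text). It only assembles the command line from options that appear in the help header (-nogui, -project, -seeds, -exploration-space) and checks whether quartus_sh is on the path. The helper names build_dse_command and expand_seeds are made up for illustration, and the seed expansion simply follows the "2, 4, 8 through 10" reading given in the help.

```python
import shlex
import shutil


def expand_seeds(spec):
    """Expand a seed list such as "2,4,8-10" into [2, 4, 8, 9, 10]."""
    seeds = []
    for item in spec.split(","):
        if "-" in item:
            low, high = item.split("-")
            seeds.extend(range(int(low), int(high) + 1))
        else:
            seeds.append(int(item))
    return seeds


def build_dse_command(project, seeds=None, exploration_space=None):
    """Assemble a command-line (non-GUI) DSE call as an argument list.

    Only options listed in the help header are used here.
    """
    cmd = ["quartus_sh", "--dse", "-nogui", "-project", project]
    if seeds:
        cmd += ["-seeds", seeds]
    if exploration_space:
        cmd += ["-exploration-space", exploration_space]
    return cmd


if __name__ == "__main__":
    cmd = build_dse_command("main", "2,4,8-10", "Extra Effort Search")
    if shutil.which("quartus_sh") is None:
        print("quartus_sh is not on the path; the command would be:")
    print(shlex.join(cmd))
    print("seeds to explore:", expand_seeds("2,4,8-10"))
```

The sketch prints the command rather than running it, so it can be tried on a machine without Quartus installed.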

Hope this helps.

Subroto Datta Altera Corp.

Reply to
Subroto Datta

Thanks! (so I guess similar 8-bit processors should be on its limits.)

Here follows more newbie-questions.

As the Spartan-3 Starter Kit comes with only a sixty-day evaluation version of "ISE Foundation" and a version of "ISE WebPACK" that is not time-limited, I listed their differences from

formatting link

and realized that WebPACK is lacking at least these features that come with the Foundation:

  CORE Generator System
  Modular Design
  FPGA Editor with Probe
  SMARTModels for PowerPC and RocketIO

What are these, and how essential are they if I (eventually/immediately) want to do my own designs?

Also, does WebPACK support both VHDL and Verilog fully?

Does anybody know if there are plans to port it (WebPACK) to Linux or other Unix systems, and when? (And a similar question for Altera's Quartus II software.)

Yours,

Antti

Reply to
Antti Karttunen (remove the trailing .do from the address)

Not very essential. You can do a lot with the free versions of ISE and Quartus. All the work I've done so far also compiles on the free Quartus version (I have little experience with ISE). E.g., you don't need CoreGen to use the BRAM in the devices. It makes it simpler, but you can instantiate these blocks with straight VHDL. As an example, see JOP, which compiles from plain VHDL sources on the free versions of Xilinx's ISE and Altera's Quartus.

Martin

---------------------------------------------- JOP - a Java Processor core for FPGAs:

formatting link

Reply to
Martin Schoeberl

[snip]


Danger, Danger, Will Robinson, my B.S. sensors have detected significant marketing content. :-)

As made famous by Philip Freidin

formatting link
"There are four kinds of lies:

  1. Lies
  2. Damn lies
  3. Statistics
  4. Benchmarks"

Reply to
Steven K. Knapp

Thank you. The results were about what I expected.

[snip]

Be careful with this kind of analysis. Yes, it's helpful from an academic point of view. However, the Spartan-3 XC3S1000 offers a sweet spot on cost per logic cell. Does this mean that everyone should use the XC3S1000 when a smaller part will do? No, you want to choose the lowest-cost part that gets the job done.
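To make the distinction concrete, here is a small sketch of the two selection rules. The part names, logic cell counts, and prices are invented placeholders, not real device data or Xilinx pricing; only the reasoning (best cost per cell versus cheapest part that fits) comes from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    logic_cells: int
    unit_price: float  # hypothetical price, arbitrary currency


# Hypothetical catalog: larger parts are cheaper per cell but cost more per unit.
CATALOG = [
    Part("small", 2_000, 10.0),
    Part("medium", 8_000, 24.0),
    Part("large", 16_000, 40.0),
]


def cost_per_cell(part):
    return part.unit_price / part.logic_cells


def best_cost_per_cell(parts):
    """The 'sweet spot' part: lowest price per logic cell."""
    return min(parts, key=cost_per_cell)


def cheapest_fit(parts, cells_needed):
    """The part to actually buy: lowest unit price that still fits the design."""
    fits = [p for p in parts if p.logic_cells >= cells_needed]
    return min(fits, key=lambda p: p.unit_price) if fits else None


if __name__ == "__main__":
    needed = 1_800  # illustrative design size in logic cells
    print("best cost per cell:", best_cost_per_cell(CATALOG).name)
    print("cheapest part that fits:", cheapest_fit(CATALOG, needed).name)
```

With these made-up numbers the two rules pick different parts, which is the whole point: cost per cell only matters once you actually need the cells.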

BTW, nice job on the Java processor! Very cool.

--------------------------------- Steven K. Knapp Applications Manager, Xilinx Inc. General Products Division Spartan-3/II/IIE FPGAs

formatting link

--------------------------------- Spartan-3: Make it Your ASIC

Reply to
Steven K. Knapp

There are two reasons behind the price difference between the standard Sn/Pb packages and the Pb-free packages.

The number one reason is volume. I don't have specific data at my fingertips, but the Sn/Pb packages are produced in significantly larger quantities. In the semiconductor business, bigger volumes mean lower unit cost. The Pb-free packages are a relatively recent addition. I would expect that prices will drop to a certain extent as these packages become more the norm.

The second reason is content. Sn and Pb are inexpensive. The Ag and Cu in the Pb-free alloy are more expensive.

For more information, check out the following Xilinx application note. Note that the standard and Pb-free packages may require different assembly techniques.

XAPP427: Implementation and Solder Reflow Guidelines for Pb-Free Packages

formatting link

Also, just FYI for those that care, all Spartan-3 devices are available in Pb-free versions of all supported package types.

--------------------------------- Steven K. Knapp Applications Manager, Xilinx Inc. General Products Division Spartan-3/II/IIE FPGAs

formatting link

--------------------------------- Spartan-3: Make it Your ASIC

Reply to
Steven K. Knapp

I thought one of the targets of PbFree was to try and get a package/alloy solution that could be used in either flow (and so the PbFree parts would be phased in to replace the older ones).

Is that turning out not to be practical?

Also, IIRC, 'PbFree' actually means < 0.1% by weight?

-jg

Reply to
Jim Granville

Hi Steve,

I have presented some of this (or similar) data at an academic conference. I would not have done so if I could not vouch for its veracity. Of course, I would not be talking about it if it were not good news, so in that sense it's marketing :-)

I agree that benchmarking results can be made to show what you would like. And in our results, we too see a few poor performers. Honestly, in the end averages and sweeps mean nothing to an end user; all they care about is that for their design, they get the performance they want/need. That's why the best thing a user can do is run Cyclone, and run Spartan-3, using the freely available tools. I would love nothing more than for every person considering Spartan out there to do this -- I firmly believe that 9/10 will be very happy with the results.

Yes. As I have previously posted, this design appears to fall into a known category of designs that exhibit a much smaller performance advantage; and still it seems to exhibit a ~15% advantage. I'll also take a deeper look at the design when I'm back at the office later in the week to make sure nothing weird is going on in Quartus -- always good to understand our failures.

BTW, the numbers from

formatting link
I believe are most relevant to users are the "best effort" DSE vs. Iteration at 70% fastest-to-fastest; I'd remove the seed-sweep aspect of DSE and instead include only the Physical Synthesis Optimizations which give most of the DSE gain. This should still end up at around a ~65% advantage for Cyclone vs. Spartan-3.

Do you have a faster speed grade? Are you implying you could choose to release and reliably yield one? Until then, I fail to see how this argument is relevant.

Happy Marketeering :-)

Paul Leventis Altera Corp.

Reply to
Paul Leventis at home

Steven,

Thanks :-)

However, Xilinx has its own Java processor, the Lightfoot:

formatting link
ey=Lightfoot

Is it possible to get an evaluation license? You know what I want to do: Produce new lies.... I mean benchmark it against JOP ;-) Or does somebody have this processor running in an FPGA and can provide results for an embedded Java benchmark? It is downloadable from:

formatting link

Martin

---------------------------------------------- JOP - a Java Processor core for FPGAs:

formatting link

Reply to
Martin Schoeberl

^^^ You mean nice jop ;-)

This thread is turning into a contest to build the fastest JOP version. Choose an FPGA vendor of your choice and optimize HDL and tool settings. Maybe Martin should donate one of his boards to the winner ;-)

I already submitted patches to Martin to scrape 300 LUTs from the MUL and ALU. Paul removed another 200 by improved synthesis settings. At that rate the processor will be very small very soon ;-)

Kolja Sulimma

Reply to
Kolja Sulimma

OK, that's a good idea! Here's the contest in two categories: the smallest JOP in LC/LE count, and the fastest JOP in terms of fmax.

Both versions must run the embedded benchmark to show that the processor is still working (I can verify this for the Cyclone and Spartan-3). Target devices are the low-cost FPGAs Cyclone and Spartan. You can change the pipeline to achieve a higher fmax, but the benchmarks must still run AND be faster than with the original pipeline. However, I would avoid the pipeline change.

The prize: an ACEX 1K50 board

Yes, thanks. I think now it's the time to incorporate your changes :-)

Martin

Reply to
Martin Schoeberl

That's fun!

You might want to add the constraint that no instruction may be removed from the core, even if it is not used by the benchmarks.

Kolja Sulimma

Reply to
Kolja Sulimma


This constraint is more or less implicit: You're not allowed to remove any functionality of JOP, and it does not make sense to optimize the IO system (like reducing the buffer size of the UART). Only the CPU core is the optimization target.

And don't be too impressed by the saving (the 300 LCs) Kolja announced ;-) They're good, but broke JOP. I'm actually in the process of getting a version that works. He saved some LCs but not that many. I will not tell you the exact count, but if you can save 100 you're competing with Kolja... The chance to win the competition is still open.

To Kolja: I will send you the results with comments tomorrow, when I'm through all tests.

BTW: Let's set a deadline for the contest: I will accept all suggestions till next Friday (10/15) so I've time over the weekend to verify the results. The winner will be posted on Monday (10/18).

Martin

---------------------------------------------- JOP - a Java Processor core for FPGAs:

formatting link

Reply to
Martin Schoeberl

At the moment I'm playing around with Kolja's changes and Paul's Quartus suggestions. Setting the PLL factor to different values (to relax the timing constraints for smaller area) gives some surprising Fmax results from the timing analyzer. fin is always 20 MHz and the LC count stays the same.

PLLout = 50 MHz => Fmax = 131 MHz
PLLout = 100 MHz => Fmax = 100 MHz
PLLout = 120 MHz => Fmax = 101 MHz

The first line is really strange!

Martin

--
----------------------------------------------
JOP - a Java Processor core for FPGAs:
http://www.jopdesign.com/
Reply to
Martin Schoeberl

Have you tried these files in real systems? The timing numbers (presumably after P&R?) will be corner values, but you should be able to overclock until it fails, and then get useful RELATIVE speed limit comparisons.

-jg

Reply to
Jim Granville

I've worked with the 100 MHz setting, but I don't want to overclock the FPGA. I just want to get 'safe' values. I don't understand the different Fmax reports when only the PLL multiplication factor changes. As there is no 'external' fmax constraint, the way I want to go is: obtain fmax from one compilation (with any PLL setting) and then set the PLL factor to this limit. With the above results it seems not so easy.
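A small sketch of the check behind those three reports, using only the numbers posted earlier (fin = 20 MHz and the three PLLout/Fmax pairs). The function names are made up for illustration, and the "meets timing" rule is simply PLLout <= reported Fmax; it says nothing about why the analyzer reports different Fmax values.

```python
from fractions import Fraction

F_IN_MHZ = 20  # PLL input clock from the post

# PLL output setting (MHz) -> Fmax reported by the timing analyzer (MHz)
REPORTED = {50: 131, 100: 100, 120: 101}


def pll_factor(f_out_mhz, f_in_mhz=F_IN_MHZ):
    """PLL multiplication factor as an exact ratio of output to input clock."""
    return Fraction(f_out_mhz, f_in_mhz)


def meets_timing(f_out_mhz, fmax_mhz):
    """The design is 'safe' only if the PLL output does not exceed Fmax."""
    return f_out_mhz <= fmax_mhz


if __name__ == "__main__":
    for f_out, fmax in REPORTED.items():
        status = "meets timing" if meets_timing(f_out, fmax) else "fails timing"
        print(f"PLLout {f_out} MHz (factor {pll_factor(f_out)}): "
              f"Fmax {fmax} MHz, {status}")
```

Note that taking the 131 MHz from the 50 MHz run as the limit would not be safe according to the other two runs, which both report about 100 MHz.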

Martin

Reply to
Martin Schoeberl

Understood. I was giving a suggestion for a way to check which numbers are correct - it does sound like something is in error. Reminds me of the saying "A man with one watch always knows what time it is; a man with two is never sure" :)

-jg

Reply to
Jim Granville

And the WINNER of the ACEX FPGA board is: Kolja!

He changed the multiplier and suggested changes in the ALU (stack.vhd). These two changes, with a little bit of optimization by myself, resulted in a saving of 136 LCs (with default synthesizer options). The suggestion from Paul for the Quartus settings reduced the area by another 130 LCs or 184 LCs (minimize Area). However, Kolja's VHDL changes reduce the area in both the Cyclone and Spartan-3 versions of JOP.

To Paul: I hope you can accept this decision. And it makes more sense to send an ACEX board to a Xilinx user than to an Altera employee ;-)

To Kolja: Please drop me a note with your address.

The results of JOP on Cyclone and Spartan-3 (both fastest speed grade):

Cyclone, opt. for speed: 1800 LCs, fmax: 100MHz
Cyclone, opt. for area: 1746 LCs, fmax: 98MHz
Spartan-3, opt. for speed: 1844 LCs, fmax: 83MHz
Spartan-3, opt. for area: 1689 LCs, fmax: 74MHz
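For anyone who wants the relative numbers, here is a short sketch that derives them from the four results above. Nothing is measured here; it is plain arithmetic on the posted LC counts and fmax values, and the function names are made up for illustration.

```python
# (device, optimization goal) -> (LCs, fmax in MHz), copied from the results above
RESULTS = {
    ("Cyclone", "speed"): (1800, 100),
    ("Cyclone", "area"): (1746, 98),
    ("Spartan-3", "speed"): (1844, 83),
    ("Spartan-3", "area"): (1689, 74),
}


def fmax_advantage_percent(goal):
    """Cyclone fmax advantage over Spartan-3, in percent, for one goal."""
    _, cyclone_fmax = RESULTS[("Cyclone", goal)]
    _, spartan_fmax = RESULTS[("Spartan-3", goal)]
    return (cyclone_fmax / spartan_fmax - 1) * 100


def area_saving(device):
    """LCs saved by optimizing for area instead of speed on one device."""
    speed_lcs, _ = RESULTS[(device, "speed")]
    area_lcs, _ = RESULTS[(device, "area")]
    return speed_lcs - area_lcs


if __name__ == "__main__":
    for goal in ("speed", "area"):
        print(f"opt. for {goal}: Cyclone fmax advantage "
              f"{fmax_advantage_percent(goal):.1f}%")
    for device in ("Cyclone", "Spartan-3"):
        print(f"{device}: {area_saving(device)} LCs saved by area optimization")
```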

If you need a very small JOP core you can implement the multiplier and the barrel shifter in software. Without the UART and the timer this results in 1077 LCs (at 98MHz) in the Cyclone.
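To illustrate what "in software" means here, the following is a generic shift-and-add multiply and a one-bit-at-a-time shift, written in Python. This is an assumption-laden sketch of the general technique, not JOP's actual microcode or Java code. The 32-bit wrap-around and the masking of the shift count to 5 bits mirror Java int semantics.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement Java int."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def soft_mul32(a, b):
    """Shift-and-add multiply: one conditional add per bit of b."""
    a &= MASK32
    b &= MASK32
    result = 0
    while b:
        if b & 1:
            result = (result + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return to_signed32(result)


def soft_shl32(x, n):
    """Left shift done one bit per step instead of with a barrel shifter."""
    x &= MASK32
    for _ in range(n & 31):  # Java uses only the low 5 bits of the shift count
        x = (x << 1) & MASK32
    return to_signed32(x)


if __name__ == "__main__":
    print(soft_mul32(6, 7))
    print(soft_mul32(-3, 5))
    print(soft_shl32(1, 4))
```

The trade-off is the usual one: the hardware units go away, but a multiply or a shift then takes up to 32 steps instead of completing in a cycle or a few.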

Martin

PS.: The optimized versions of JOP are uploaded on the website.

---------------------------------------------- JOP - a Java Processor core for FPGAs:

formatting link

Reply to
Martin Schoeberl
