Compiler to FPSLIC

Hi everyone,

I'm considering making a compiler for Atmel's FPSLIC (a combined microcontroller and FPGA). The idea is to mark expensive functions so that they can be implemented in the FPGA instead of as normal machine code. I have experience with microcontrollers and compiler construction, but I have only done very small test projects, with a very old FPGA and really buggy software.

Is it possible to make such a compiler? Does it make sense at all to compile a high-level language to an FPGA in this way? Has someone else done it / is this normal practice today?

Regards, Niels Sandmann

Reply to
Niels Sandmann

Remember that transferring operations to special-purpose computing blocks takes some time. The speed gain from the function block must be big enough to make it worth the overhead.

It is possible; whether it is also useful, I don't know.

There is some research on Field Programmable Function Arrays (FPFAs). There, compilers are used to analyze C code and map it to configurable function blocks in the FPFA.

Why FPFAs and not normal FPGAs? It's easier to map a software algorithm to a known block of hardware where only "little" configuration is possible (add or sub or none, multiply or not, intermediate register storage or not, ...).

Mapping a software algorithm to FPGAs is possible with languages like SystemC.

I am not an expert in this area. I just want to give you a few pointers on where your search might start.

Ralf

Reply to
Ralf Hildebrandt

I'm curious: why have you chosen Atmel's FPSLIC?

- a

Niels Sandmann writes:

--
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380
Reply to
Adam Megacz

There's one for nios:

formatting link

-- Mike Treseler

Reply to
Mike Treseler

Boy, that would be interesting. There may be part of a doctorate in there if you work it right.

I think this is possible. I think the shortest road to success in the confines of one file would be to make a preprocessor that would extract the FPGA stuff to a C language design file which would then go to one of those nifty new C language synthesis tools.

I suspect, though, that this would cause problems with concurrency. Either you would have concurrency problems in the function call, or in making the processor wait on the FPGA and vice versa. If you _really_ wanted to do this in one language, you could investigate the possibility of reviving Occam*.

All in all, however, I think the best thing to do is find a good way that you can treat what the FPGA does as a concurrent process within C or C++. This would mean finding a way, within C, of 'telling' the compiler to write something to the FPGA and kick off a computation, then 'telling' the compiler to retrieve it, but leaving it to the software and FPGA designers to actually synchronize the FPGA with the processor.

  • Search for 'transputer'. The best quote I heard about Occam was "Oakum (Occam) is something you use to keep boats from leaking -- not a computer language".
--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

Sometimes an 'expensive' item might be an operator. Maths libraries and their support are one productive area that does not need armloads of new software.

almost anything is possible ...

depends a lot on the FPGA, and the project, ( and the designer..) [ It is also a dangerous tool in the wrong hands....]

Someone has mentioned Altera's new C flow ( not cheap )

For good examples of FPGA centric work on other languages, look at

formatting link

formatting link

and for an example of a smaller FPGA core, and what can be done with core extensions, look at this carefully before committing to the FpSLIC (which is rather a dead-end pathway).

formatting link

A Google search for "Python AVR" (if you MUST use the FpSLIC) finds quite a lot, including

formatting link

formatting link

-jg

Reply to
Jim Granville

Because I thought the integration of a microcontroller and an FPGA would be handy and would ease communication between the program and the FPGA.

I have prior experience with microcontrollers based on Atmel's AVR core, and I have been pleased with them. Also, they are reasonably cheap.

Regards, Niels Sandmann

Reply to
Niels Sandmann

Of course. A good profiler would be necessary.

That sounds interesting. At first sight it seems to target simulation of systems more than actual implementation, but I might be wrong.

I'm glad for any input and ideas I can get.

Regards Niels Sandmann

Reply to
Niels Sandmann

why ?

Reply to
Niels Sandmann

why ?

Do you mean their C2H compiler for the Nios II, as Mike Treseler mentioned in another post?

They are interesting but not exactly what I'm looking for.

I'm not committed to the FpSLIC in any way. It was just chosen due to my prior experience with AVR.

Reply to
Niels Sandmann

That is exactly what I'm looking for. It proves that it can be done, so I'll look further into it. Thanks.

Best regards Niels Sandmann

Reply to
Niels Sandmann

Actually, it is supposed to be part of my master's thesis. As the link Mike Treseler gave shows, it can be done, so I don't think it is that revolutionary.

But it wouldn't be as elegant as an integrated solution.

Perhaps. I still don't know which language I will use. Probably I'll design my own Pascal-like language, so I could design the language to take care of these problems.

Best regards Niels Sandmann

Reply to
Niels Sandmann

If you're going to go as far as looking into different languages then look into Occam. I haven't had to use it, but as prior art goes it is Not To Be Ignored. Perhaps noted and neglected, but not _ignored_.

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

yes.

If you are prepared to contemplate your own language, then you should study these. The ability to write test benches should not be overlooked.

I guess it is OK to learn on, but the FpSLIC has many drawbacks. The biggest brick wall is that it needs a problem both large enough to need an FPGA (but not _too_ large an FPGA!) and still small enough to fit into the small AVR RAM code space.

-jg

Reply to
Jim Granville

Simply put, because hardware design is not software design.

-jg

Reply to
Jim Granville

If your clean slate includes a 'fresh language', then look at the Structured Text side of IEC 61131 -- or all of IEC 61131.

That covers the language set used for Programmable Logic Controllers, so it has a firm basis in the real world of control, and it also has high-level structure support.

-jg

Reply to
Jim Granville

Hi Niels,

I think you are starting a difficult project.

In the past I've done some work with the FPSLIC and found that manually merging hardware with software works best. Since the hardware is accessible only as banked I/O registers, there is a lot of setup overhead involved (especially if you support concurrency). You can't just combine three AVR instructions into one and add it as a magic instruction-set extension.

I got actual benefit from the FPGA area by implementing functions with FPGA helper circuits. Initially I created a software-only function for reference. Then I dissected the inner loop, implemented parts of it as VHDL code, and modified the software to use it. Benchmarks against the software-only implementation helped me decide how much FPGA area to dedicate to accelerating the function (and also served for testing).

I developed a concept to handle concurrency by dividing the FPGA modules into two groups. One is accessible from interrupts, while the other goes through semaphore-based access control. This enabled me to reduce the state to 2 or 3 bytes, which is quick to save/restore on context switches.

Note that this was possible only because the FPGA circuits are not called arbitrarily (as they would be in a compiler environment), but were designed by hand to match a handcrafted assembler code library. Otherwise, solving the concurrency issues would probably have wiped out all the performance gains.

Another problem I encountered was that the PAR wasn't reliable. I frequently got bad bitstreams when device utilization was high. The problem was probably related to the fact that the PAR tools don't address short paths (hold violations). I wrote extensive self-test software to overcome this problem. You might run into the same thing if you rely on Atmel's tools as the backend for your compiler.

I suggest you try to implement a few real-world problems on the FPSLIC to learn about all the issues before you do the compiler.

Regards, Marc

Reply to
jetmarc

Marc, could you elaborate on this? Typically hold violations are the result of a path being too *long* (and, as a result, the signal has not propagated and settled by the time the clock edge arrives).

Have Atmel's tools PARed your designs in such a way that some paths were too long for the clock rate you requested?

I'd be quite interested in any other wisdom you can share with us about the FPSLIC.

- a

--
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380
Reply to
Adam Megacz

Hi Adam,

Atmel's timing analyzer knows two types of critical paths: long and short. Long paths are the ones you mention: a signal doesn't propagate to its destination fast enough before the next clock, which results in a setup violation at the destination.

The tools can be configured to detect this and to optimize according to a timing-constraints file. After this step, the tools report whether the constraints have been satisfied and what the maximum allowed clock speed is.

The other type of critical path is the short path. That means a signal races the clock and arrives at the destination too early. This can be caused by a poor clock distribution tree (with a lot of skew), or by slow flip-flops with longish hold requirements. When the data signal arrives early, the hold time of the destination flip-flop is violated (at all clock speeds).

As with long paths, the Atmel tools can be configured to detect short paths. However, it is not possible to optimize them away. If you have a short path, you are on your own. You can do only three things:

a) Re-run the optimizer. It has a random decision element and produces a different output on each run. Maybe the short path goes away, or moves to a different part of your design.

b) Change your design to be less efficient, for example by adding combinatorial logic. Make sure the synthesizer doesn't optimize it away, though.

c) Do manual PAR, or touch up the PAR output manually. I tried to do this, but since routing resources saturate quickly even on designs with moderate resource usage, I almost never managed to "repair" a short path by hand.

I emailed a few times about this with the (excellent) Atmel technical support, but they had no plans to improve their tools with respect to the short-path problem.

During development (working with just one piece of silicon), I found it most convenient to implement extensive self-tests and re-run the optimizer until the tests passed. This was the quickest way to make a design work.

Here's a sample timing report of a short path. The design had about 60% resource usage, and there were about 15 similar short paths (with actual hold violations).

Path #10

Slack = -1.18ns  Type = Flop -> Flop
  ('rj45_i1_notri_reg_reg_t_plus_1(1) CLK' ->
   'rj45_i1_notri_RX_STATUS_FILE/R2_0/$E1 AIN0')

Clock Edge:  GCLK5 on SYSCLKBUF_inst ACT                        _/  0.00ns
Clock Delay: SYSCLKBUF_inst ACT ->
             rj45_i1_notri_RX_STATUS_FILE CLK                   _/  4.69ns
------ Required Arrival Time:                                   _/  4.69ns

Clock Edge:  GCLK5 on SYSCLKBUF_inst ACT                        _/  0.00ns
Clock Delay: SYSCLKBUF_inst ACT ->
             rj45_i1_notri_reg_reg_t_plus_1(1) CLK              _/  4.39ns
Data Path Delay: rj45_i1_notri_reg_reg_t_plus_1(1) CLK ->
             rj45_i1_notri_RX_STATUS_FILE/R2_0/$E1 AIN0         _/  4.22ns
Hold Time:   rj45_i1_notri_RX_STATUS_FILE/R2_0/$E1 AIN0         _/ -5.10ns
------- Actual Arrival Time:                                    _/  3.51ns

Regards, Marc

Reply to
jetmarc

Ah, I get it! The FFs require a stable input for a period *after* the clock as well. I had forgotten about this -- or rather, I had (wrongly) assumed that this time period was so small in comparison to the propagation delays between cells (even on the fast nearest-neighbor lines) that it could never be violated.

Do you think this is always explained by clock skew? I've been experimenting with asynchronous circuits on the At94k and I've gotten some awesome results, but so far I'm only using the LUTs and the internal feedback line (routed *around* the register) to create various stateful elements (Muller C-element, for example). It's incredibly robust, even in response to temperature changes -- the self-timed lore really works.

I was planning on trying to exploit the FF's to make an asynchronous latch without having to waste another LUT (don't ask for the details; I'm still working this out!). I don't distribute a global clock, so skew is not a concern, and I use handshaking to ensure the pre-"clock" setup time is obeyed (no runts or glitches on the H4/V4 lines that drive the "clock" input to the FF's). But I hadn't given any thought to the post-clock hold time.

The problem scenario would be two FFs in the same sector column (a group of four cells clocked together) in which the following events occur:

  1. Edge of clock.
  2. FF #1's output changes as a result of the clock.
  3. Output of FF #1 propagates to the input of FF #2, causing a change there.

... where the time between #1 and #3 is less than the FF hold time. Can this happen? Or are hold-time violations always a result of clock skew? I always handshake between differently-clocked cells.

I had hoped that none of the routes would be that fast (or rather, that the FFs wouldn't be that slow). But now that I think about it, the internal feedback wire (from the register's output to the W input) is local to the cell, so it must be pretty fast. I guess some experiments are in order.

- a

--
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380
Reply to
Adam Megacz
