PipelineC - C-like almost hardware description language - AWS F1 Example

Hi folks,
Here to talk about PipelineC.


What is it?:
- C-like almost hardware description language
- A compiler that produces VHDL for specific devices/operating frequencies
I am looking for:
- anyone who wants to help me develop (Python, VHDL, C)
- suggestions on how to make PipelineC more useful/new features
- project ideas (heyo open source folks)

In the meantime, I am also here to share my most interesting example so far: using PipelineC with an AWS F1 instance.


I have made an AMI that you can use to play around with. However, it cannot be made public; I can only share it with specific AWS accounts, so please message me if you are interested.

I want to share with you why I think PipelineC is particularly powerful:

First, it can mostly replace VHDL/Verilog for describing low level, clock by clock, hardware control logic. Consider the following generic VHDL:

-- Combinatorial logic with a storage register
signal the_reg : some_type_t;
signal the_wire : some_type_t;
process(input, the_reg) is -- inputs sync to clk
    variable input_variable : some_type_t;
    variable the_reg_variable : some_type_t;
begin
    input_variable := input;
    the_reg_variable := the_reg;

    -- ... Do work with 'input_variable', 'the_reg_variable'
    -- and other variables, functions, etc and it kinda looks like C ...

    the_wire <= the_reg_variable;
end process;
the_reg <= the_wire when rising_edge(clk);
output <= the_wire;

The equivalent PipelineC is

some_type_t the_reg;
some_type_t some_func_name(some_type_t input)
{
    // ... Do work with 'input', 'the_reg'
    // ... and other variables, functions, etc...

    // Return==output
    return the_reg;
}
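
To make the pattern concrete, here is a minimal sketch in plain C of a running accumulator: a global variable plays the role of 'the_reg', and each call models one clock cycle. The name accumulate and the choice of uint32_t are illustrative assumptions, not taken from the PipelineC sources, and the exact PipelineC syntax for state registers may differ.

```c
#include <stdint.h>

// Global variable standing in for 'the_reg': state kept between calls.
uint32_t the_reg = 0;

// Hypothetical example function: one call models one clock cycle.
// Reads the input and the register, updates the register,
// and returns the updated value as the output.
uint32_t accumulate(uint32_t input)
{
    the_reg = the_reg + input;
    return the_reg;
}
```

Calling accumulate(1), accumulate(2), accumulate(4) in sequence returns 1, 3, 7, just as the register would hold after three clock edges.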

Using that functionality I was able to write very RTL-esque serialize+deserialize logic for the AXI4 interface that the AWS F1 shell logic provides to 'customer logic' for DMA. The AXI4 is deserialized to a stream of 4096 byte input data chunks that can be processed by a 'work' function.
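
As a rough illustration of that style (not the actual deserializer from the example), the sketch below collects fixed-size data beats into a 4096 byte message using global state, one call per clock. The 64 byte beat width, the type names beat_t and msg_t, and the function name deserialize_beat are all assumptions made for this sketch; real AXI4 handshaking (ready/valid, addresses, bursts) is left out.

```c
#include <stdint.h>

#define BEAT_BYTES 64      // Assumed data width per beat (512 bits)
#define MSG_BYTES 4096     // Message size from the post
#define BEATS_PER_MSG (MSG_BYTES / BEAT_BYTES)

// Hypothetical types for this sketch
typedef struct beat_t { uint8_t data[BEAT_BYTES]; uint8_t valid; } beat_t;
typedef struct msg_t { uint8_t data[MSG_BYTES]; uint8_t valid; } msg_t;

// State registers: message being assembled and the beat position
static msg_t msg_reg;
static uint32_t beat_count_reg;

// One call models one clock: accept at most one beat,
// output a valid message on the cycle the last beat arrives.
msg_t deserialize_beat(beat_t beat)
{
    uint32_t i;
    // Output is only valid for the single cycle a message completes
    msg_reg.valid = 0;
    if (beat.valid)
    {
        // Store this beat at its position in the message
        for (i = 0; i < BEAT_BYTES; i = i + 1)
            msg_reg.data[(beat_count_reg * BEAT_BYTES) + i] = beat.data[i];
        if (beat_count_reg == BEATS_PER_MSG - 1)
        {
            // Last beat: message complete, start over
            msg_reg.valid = 1;
            beat_count_reg = 0;
        }
        else
        {
            beat_count_reg = beat_count_reg + 1;
        }
    }
    return msg_reg;
}
```

After 64 valid beats the returned message has valid set for exactly one call; cycles with no valid beat leave the position unchanged.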

I find that most HLS tools have trouble giving the user this sort of low level control, probably under the assumption that it's too low level and not meant for software folks to be concerned with. Most hardware description languages are built for exactly this though.

Second, PipelineC can replace the most basic feature of other HLS tools: auto-pipelining functions.

This AWS example sums 1024 floating point values via an N clock cycle pipelined binary tree of 1023 floating point adders (soft logic, not hard cores).
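
The adder count follows from halving at each level: 512 + 256 + ... + 1 = 1023 adders over 10 adder levels. A small sketch of that arithmetic, with hypothetical helper names tree_levels and tree_adders, assuming the input count is a power of two:

```c
#include <stdint.h>

// Number of adder levels in a binary tree reducing n inputs
// (n assumed to be a power of two).
uint32_t tree_levels(uint32_t n)
{
    uint32_t levels = 0;
    while (n > 1)
    {
        n = n / 2;
        levels = levels + 1;
    }
    return levels;
}

// Total adders in the tree: n/2 + n/4 + ... + 1 = n - 1
uint32_t tree_adders(uint32_t n)
{
    uint32_t total = 0;
    while (n > 1)
    {
        n = n / 2;       // Adders needed at this level
        total = total + n;
    }
    return total;
}
```

For n = 1024 this gives 10 levels and 1023 adders; the code's array has 11 rows because row 0 holds the inputs.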

Below is the PipelineC code:

float work(float inputs[1024])
{
    // All the nodes of the tree in arrays so can be written using loops
    // ~log2(N) levels, max of N values in parallel
    float nodes[11][1024]; // Unused elements optimize away
    // Assign inputs to level 0
    uint32_t i;
    for(i=0; i<1024; i=i+1)
        nodes[0][i] = inputs[i];
    // Do the computation starting at level 1
    uint32_t n_adds;
    n_adds = 1024/2;
    uint32_t level;
    for(level=1; level<11; level=level+1)
    {
        // Parallel sums at this level
        for(i=0; i<n_adds; i=i+1)
            nodes[level][i] =
                          nodes[level-1][i*2] + nodes[level-1][(i*2)+1];
        // Each level decreases adders in next level by half
        n_adds = n_adds / 2;
    }
    // Return the last node in tree
    return nodes[10][0];
}

(To be clear, I am NOT claiming that this is the best way to sum floats in hardware - it's just a basic example big enough to use most of the FPGA.)

The PipelineC tool inserts pipeline registers as needed to meet timing on the particular device technology + operating frequency. I find that most HLS tools are pretty good at this (and will do a lot more than inferring pipelines too) but often require some ugly pragmas that - in a way - can make the code undesirably device specific. Hardware description languages can certainly describe the above hardware. But the code will almost certainly describe a pipeline designed specifically for one device technology/operating frequency - making the code hard for others to reuse even if you are kind enough to share it.

The very capable Virtex Ultrascale+ AWS hardware allows the PipelineC tool to fit the work() function into a pipeline depth/latency of 15 clock cycles (might be able to squeeze into as few as 10 clocks). Running at 125 MHz (8 ns cycle time), it is thus capable of summing 1024 floating point values with 120 nanoseconds of latency.

work() Pipeline:
- Frequency: 125 MHz, new inputs each cycle
- Latency: 15 clocks / 120 ns
LUTs    Registers  CARRY8  CLBs
322144  137181     16307   62664
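
The latency figure is just pipeline depth times cycle time. A minimal sketch of that arithmetic (the helper names cycle_time_ns and pipeline_latency_ns are made up for illustration):

```c
// Cycle time in nanoseconds for a clock frequency given in MHz.
double cycle_time_ns(double freq_mhz)
{
    return 1000.0 / freq_mhz;
}

// Latency of a pipeline of 'clocks' stages at the given frequency.
double pipeline_latency_ns(unsigned int clocks, double freq_mhz)
{
    return clocks * cycle_time_ns(freq_mhz);
}
```

At 125 MHz the cycle time is 8 ns, so 15 clocks is 120 ns; the hoped-for 10 clock version would be 80 ns at the same frequency.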

Here is the 'main' function / top level for the full hardware implementation:

aws_fpga_dma_outputs_t aws_fpga_dma(aws_fpga_dma_inputs_t i)
{
  // Pull messages out of incoming DMA write data
  dma_msg_s msg_in;
  msg_in = deserializer(i.pcis);
  // Convert incoming DMA message bytes to 'work' inputs
  work_inputs_t work_inputs;
  work_inputs = bytes_to_inputs(msg_in.data);
  // Do some work
  work_outputs_t work_outputs;
  work_outputs = work(work_inputs);
  // Convert 'work' outputs into outgoing DMA message bytes
  dma_msg_s msg_out;
  msg_out.data = outputs_to_bytes(work_outputs);
  msg_out.valid = msg_in.valid;
  // Put output message into outgoing DMA read data when requested
  aws_fpga_dma_outputs_t o;
  o.pcis = serializer(msg_out, i.pcis.arvalid);
  return o;
}

On the software side, utilizing the FPGA hardware with user space file I/O calls looks like:

// Do work() using the FPGA hardware
work_outputs_t work_fpga(work_inputs_t inputs)
{
    // Convert input into bytes
    dma_msg_t write_msg;
    write_msg = inputs_to_bytes(inputs);
    // Write those DMA bytes to the FPGA
    dma_write(write_msg);
    // Read DMA bytes back from the FPGA
    dma_msg_t read_msg;
    read_msg = dma_read();
    // Convert bytes to outputs and return
    work_outputs_t work_outputs;
    work_outputs = bytes_to_outputs(read_msg);
    return work_outputs;
}
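
For readers wondering what the file I/O underneath might look like, here is a hedged sketch using POSIX pwrite/pread on an already opened file descriptor. The names dma_write_fd and dma_read_fd, the dma_msg_t layout, and the use of offset 0 are assumptions for illustration; the real example's DMA helpers, device file and addressing may differ.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define DMA_MSG_SIZE 4096

// Hypothetical message layout: just the raw bytes
typedef struct dma_msg_t { uint8_t data[DMA_MSG_SIZE]; } dma_msg_t;

// Write one full message starting at offset 0 of 'fd'.
// Returns 0 on success, -1 on error.
int dma_write_fd(int fd, const dma_msg_t *msg)
{
    size_t done = 0;
    while (done < DMA_MSG_SIZE)
    {
        // pwrite may write fewer bytes than asked, so loop until done
        ssize_t n = pwrite(fd, msg->data + done, DMA_MSG_SIZE - done, (off_t)done);
        if (n <= 0)
            return -1;
        done = done + (size_t)n;
    }
    return 0;
}

// Read one full message starting at offset 0 of 'fd'.
// Returns 0 on success, -1 on error or early end of file.
int dma_read_fd(int fd, dma_msg_t *msg)
{
    size_t done = 0;
    while (done < DMA_MSG_SIZE)
    {
        ssize_t n = pread(fd, msg->data + done, DMA_MSG_SIZE - done, (off_t)done);
        if (n <= 0)
            return -1;
        done = done + (size_t)n;
    }
    return 0;
}
```

The same two calls work against an ordinary file, which is handy for exercising the software side without an FPGA attached.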

So there you have it: low level RTL-like control, working right beside highly pipelined logic. All in a familiar C look that could just be compiled with gcc for 'simulation'. For example, this example uses the same work() function code as hardware description and as the 'golden C model' compiled with gcc to compare against.
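
A sketch of what such a comparison could look like on the software side. Here tree_sum is a stand-in for the gcc-compiled work() (an in-place pairwise reduction, not the code from the post), and reference_sum and outputs_match are hypothetical helpers. A tolerance is used because a tree-ordered float sum and a sequential sum can round differently.

```c
#include <stddef.h>

#define N_INPUTS 1024

// Stand-in for work(): pairwise (binary tree) reduction on a scratch copy.
float tree_sum(const float inputs[N_INPUTS])
{
    float scratch[N_INPUTS];
    size_t i;
    size_t n;
    for (i = 0; i < N_INPUTS; i = i + 1)
        scratch[i] = inputs[i];
    // Each pass halves the number of live values
    for (n = N_INPUTS / 2; n >= 1; n = n / 2)
        for (i = 0; i < n; i = i + 1)
            scratch[i] = scratch[i * 2] + scratch[(i * 2) + 1];
    return scratch[0];
}

// Independent reference: sequential sum in double precision.
double reference_sum(const float inputs[N_INPUTS])
{
    double acc = 0.0;
    size_t i;
    for (i = 0; i < N_INPUTS; i = i + 1)
        acc = acc + (double)inputs[i];
    return acc;
}

// Nonzero if 'actual' is within a relative tolerance of 'expected'.
int outputs_match(float actual, double expected, double rel_tol)
{
    double diff = (double)actual - expected;
    double scale = expected;
    if (diff < 0.0)
        diff = -diff;
    if (scale < 0.0)
        scale = -scale;
    if (scale < 1.0)
        scale = 1.0;
    return diff <= rel_tol * scale;
}
```

The value read back from the FPGA would be passed as 'actual' in the same way as the software result.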

In the sense that C abstracts away the hardware specifics of each CPU architecture + memory model, but only at a very minimal level, I want PipelineC to be the same for digital logic. The same PipelineC code should produce computationally equivalent hardware on any FPGA/ASIC device technology through smarts in the compiler. But C/PipelineC obviously doesn't do everything; there isn't a whole lot of higher level abstraction done for you. It's just the bedrock to build shareable libraries.

Some big features PipelineC lacks at the moment:
- Flow control/combinatorial feed-backward signals through N clock pipelined logic
 - PipelineC can describe FIFOs and BRAMs (hard BRAM IP is the only IP supported right now) to work with data flows, but the equivalent of a bare combinatorial <= assignment operator feedback is missing
- Multiple clock domains / clock crossings (I have some neat ideas about this)
 - This would likely be my next big...many month... task?
- The C parser I'm using doesn't let you return constant sized arrays, but PipelineC as a language really should. I think if I modified the parser (oh gosh help me?) and said 'use g++' to compile this 'C code that returns arrays', it could work out?

Got any ideas on what you'd want to do with PipelineC? Let me know; maybe we can make something cool together. Want support for an open source synthesis tool? I can give Yosys a try.

Thanks for your time folks

Re: PipelineC - C-like almost hardware description language - AWS F1 Example
On 22/03/20 01:15, Julian Kemmerer wrote:

With anything like this you have 30s to convince me
to spend some of my remaining life looking at it rather
than something else. Hence I want to see:
  - what benefit would it give me, and how
  - what won't it do for me (it isn't a panacea)
  - what do I have to do to use it (scope of work)
  - what don't I have to do if I use it (I'm lazy)
  - how it fits into the well-documented toolchains
    that many people use (since it doesn't do everything)

If I see the negatives, I'm more likely to believe
the claimed positives.

Re: PipelineC - C-like almost hardware description language - AWS F1 Example
On Sunday, March 22, 2020 at 6:43:31 AM UTC-4, Tom Gardner wrote:

I'll give it a quick go:

what benefit would it give me, and how:
Feels like RTL when doing clock by clock logic, and can auto pipeline logic otherwise.

what won't it do for me (it isn't a panacea):
Not a full RTL replacement yet. Would love help to get it there.

what do I have to do to use it (scope of work):
Write C-looking code; the tool generates VHDL that can be dropped into any existing project. Mostly a matter of the time to run the tool in addition to already long builds.

what don't I have to do if I use it (I'm lazy):
Don't have to manually pipeline all your logic to specific devices / operating frequencies. Can share 'cross-platform' code.

how it fits into the well-documented toolchains:
Outputs VHDL. And C-looking code can be used with gcc for debug/modeling.

Thanks eh!
