My invention: Coding wave-pipelined circuits with buffering function in HDL

Hi,

A wave-pipelined circuit has the same logic as its pipelined counterpart, except that the wave-pipelined circuit has only one stage: a critical path from the input register, through a piece of computational logic, to the output register, with no intermediate registers.

The kernel idea of my invention is this: a designer provides the minimum information and logic code about the critical path, and leaves all the complex logic design to a synthesizer and a system library, which is what an HDL should do.

All coding takes 3 steps:

  1. Write a Critical Path Component (CPC) with a defined interface;

  2. Call a Wave-Pipelining Component (WPC) provided by a system library;

  3. Call one of 3 link statements to link a CPC instantiation with a paired WPC instantiation to specify what your target is.

Here is all the code for a 64*64-bit signed integer multiplier C

Reply to
Weng Tianxiang

Hi,

The following information is from Wikipedia:

  1. The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor for the 8086 line of microprocessors.

  2. MMX is a single instruction, multiple data (SIMD) instruction set designed by Intel, introduced in 1997 with its P5-based Pentium line of microprocessors, designated as "Pentium with MMX Technology". It developed out of a similar unit introduced on the Intel i860, and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on recent IA-32 processors by Intel and other vendors.

MMX has subsequently been extended by several programs by Intel and others: 3DNow!, Streaming SIMD Extensions (SSE), and ongoing revisions of Advanced Vector Extensions (AVX).

The 8087's 64-bit floating-point multiplier needs 5 cycles to finish processing one data item, with one input data item per cycle.

The MMX 64-bit floating-point multiplier needs 4 cycles to finish processing one data item, with one set of input data per 2 cycles.

Because each multiplication needs one multiplicand A and one multiplier B to get the result C, many test benches naturally claim that the MMX 64-bit floating-point multiplier is 20% faster than the 8087's (4 cycles vs 5 cycles).

With my invention, any college student with knowledge of HDL could write an MMX wave-pipelined 64-bit floating-point multiplier within half an hour under the following conditions:

  1. My invented system is fully adopted into HDL;

  2. Synthesizer manufacturers have updated their products to handle the generation of the related wave-pipelined circuits. All related technology and algorithms are available off the shelf.

  3. It needs time.

One wonderful wave-pipelined circuit, I think, may be a 16-channel FFT processor built with wave-pipelining technology: the benefits are a higher running frequency and large savings in logic area and power consumption.

Thank you.

Weng

Reply to
Weng Tianxiang

Do you have a YouTube example? And an example that will synthesize in Icarus, so we can see how your method compares to a standard example?

--
Rick C. Hodgin
Reply to
Rick C. Hodgin

Hi Rick,

Actually I have got 3 patents issued for the subject:

  1. 9,747,252: Systematic method of coding wave-pipelined circuits in HDL.
  2. 9,734,127: Systematic method of synthesizing wave-pipelined circuits in HDL.
  3. 9,575,929: Apparatus of wave-pipelined circuits.

All 3 patents have the same specification, drawings and abstract, with different claims.

Here is my new non-provisional patent application 15,861,093 (application, hereafter), "Coding wave-pipelined circuits with buffering function in HDL", filed with the USPTO on 2018/01/03.

The non-provisional patent application 15,861,093 has a *.txt (*.vhd) file attached, so nothing in it is secret. Anyone who is interested in the subject can email me to say what he wants and I will email the file set to him, even though the full application set will be published 18 months later.

The following is part of my promotional file sent to some big companies:

"The new application can be viewed to some extent as the logical continuation of the 3 patents, but legally it is a brand new invention devoting its main attention to coding the buffering function for wave-pipelined circuits in HDL, a topic never mentioned in the 3 patents, while it still pays great attention to improving the 3 patents to make them more robust, friendlier and more complete from the point of view of code designers."

In the 3 previous patents a first version of the source code was attached; the new application provides the second version. With the 2nd version of the VHDL source code you can use a VHDL-2002 or later simulator to simulate all workings and generate waveforms. The source file is also well commented, with debugging function code inserted.

Please email me what you want me to send. For the 3 patents:

1.1. Specification.

1.2. 3 sets of claims.

1.3. Drawings.

1.4. Source code.

1.5. ZIP file of all of the above.

For the new application:

2.1. Specification.

2.2. Claims.

2.3. Drawings.

2.4. Abstract.

2.5. Source code.

2.6. ZIP file of all of the above.

For the new application, the specification has 81 pages, the 48 claims take 15 pages and the drawings take 24 pages.

If you lack time, the best way to learn all the working structures needs only 2.1 Specification, 2.3 Drawings and 2.4 Abstract.

Because the target of my patents and new application is a) to make my invented system part of HDL (not only VHDL, but all hardware description languages), and b) to make the source code part of the system library in HDL, I am willing to distribute my code and all related files to anyone who is really interested in how I did it.

From CPC_1_2 you may see that my scheme needs the least logic information and coding from a designer to resolve a very difficult problem, one that has been open for almost 50 years.

My email address is wtx wtx @ gmail . com (please remove the spaces between characters)

Thank you.

Weng

Reply to
Weng Tianxiang

Hi,

Here is more information on the WPC (Wave-Pipelining Component) provided by a system library (which I wrote).

  1. There are only 2 WPCs to cover all wave-pipelined circuits:

a) One is used for the situation in which only one critical path is used.

b) The other is used for the situation in which more than one copy of the same critical path is used.

  2. There are 5 types of structure for all wave-pipelined circuits, based on my classification:

a) A one-cycle non-pipelined circuit: it is coded as a wave-pipelined circuit, but finally turns out to be a regular 1-cycle circuit.

b) A wave-pipelined circuit that can accept one input data per cycle with one critical path.

c) A wave-pipelined circuit that can accept one input data per multiple cycles with one critical path.

d) A wave-pipelined circuit that can accept one input data per cycle with more than one critical path, each critical path having an input register and an output register.

e) A wave-pipelined circuit that can accept one input data per cycle with more than one critical path, each critical path having an input register and sharing a sole output register.

  3. The method guarantees a 100% success rate for generating a specific wave-pipelined circuit.

Thank you.

Weng

Reply to
Weng Tianxiang

There is perhaps some explanation in "Wave-Pipelining: A Tutorial and Research Survey"[1], and "DESIGN AND TIMING ANALYSIS OF WAVE PIPELINED CIRCUITS"[2].

Jan Coombs

--

[1] IEEE  Transactions on VLSI Systems  
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf 

[2] Recep Ozgun's MSc thesis 
https://soar.wichita.edu/bitstream/handle/10057/383/t06064.pdf?sequence=3
Reply to
Jan Coombs

Hi Jan,

I appreciate your efforts to dig deep into my inventions. I would like to patiently answer all reasonable technical questions.

Your reference [1] is none other than what inspired me to resolve the open problem: design both a coding method and a synthesizing method so that any logic design engineer, including college students with basic knowledge of HDL, can code and generate a wave-pipelined circuit.

All published materials I have read are centered on how to eliminate data contamination, a special feature which is never heard of in any non-wave-pipelined circuit design.

Data contamination is defined as later entered data catching up with earlier entered data and damaging it.

What my inventions do is build a bridge between code designers and synthesizers in order to code and generate a wave-pipelined circuit in the easiest way:

If a code designer provides all necessary and sufficient information to a synthesizer, the synthesizer should and can generate a wave-pipelined circuit as specified.

Your reference [1] (1998), at page 142 below table 1, indicates that "Last, due to a lack of commercial tools that are directly applicable to designs using wave-pipelining, each group has more or less developed in-house design analysis and optimization tools which enable VLSI design using wave-pipelining."

So I assumed at the beginning of my project that if a new part on wave-pipelined circuits in the HDL standard is well designed and laid out, any synthesizer manufacturer has the ability to generate a wave-pipelined circuit. The assumption was also based on table 1 at page 142 of your reference [1] (1998), which lists 30 wave-pipelined circuits (20 years ago), none of whose authors had any relationship with a professional synthesizer manufacturer.

Furthermore, during the development period I found that no matter how many types of wave-pipelined circuit there are, past or future, every wave-pipelined circuit comprises two parts: one is the critical path, represented by the CPC (Critical Path Component); all the remaining logic, the WPC (Wave-Pipelining Component), is always the same for a group of wave-pipelined circuits, depending on what target a designer wants for his circuit.

In my design no timing related to a wave-pipelined circuit ever appears, because timing is within the scope of a synthesizer's operation and has nothing to do with the coding.

There is no commercial synthesizer in the world which can directly generate a wave-pipelined circuit. To prove my WPCs are correct, I coded a CPC which does nothing but pass the data through the critical path while obeying critical path behavior: if the critical path needs 5 cycles for signals to travel, its output will be available in 6 cycles, and if the critical path is blocked, later entered data has a chance to damage earlier entered data if the design is not right. So essentially I used no very sophisticated tools and no timing analysis.
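The pass-through test described above can be pictured with a toy cycle-level model. This is only a sketch in Python, not the VHDL CPC from the application: the class name `WavePathModel`, its arguments, and the way a blocked output register is modelled are all assumptions made for illustration. The slots in `flight` track waves travelling through the logic for the simulation only; they do not stand for intermediate registers.

```python
from collections import deque


class WavePathModel:
    """Toy cycle-level model of a register -> logic -> register wave path."""

    def __init__(self, travel_cycles=5):
        # Slots for waves in flight; slot 0 is the input register. These are
        # bookkeeping for the simulation, not intermediate hardware registers.
        self.flight = deque([None] * travel_cycles, maxlen=travel_cycles)
        self.rc = None  # output register

    def clock(self, data_in=None, we_in=False, we_out=True):
        """Apply one clock edge and return the output register value after it."""
        arriving = self.flight[-1]  # wave that has finished travelling
        if we_out and arriving is not None:
            self.rc = arriving
        # Waves cannot be stalled: each one moves on whether or not it was captured.
        self.flight.appendleft(data_in if we_in else None)
        return self.rc


free = WavePathModel(travel_cycles=5)
free_out = [free.clock(i, we_in=True) for i in range(8)]

blocked = WavePathModel(travel_cycles=5)
# The output register write is disabled on the single edge where wave 0 arrives.
blocked_out = [blocked.clock(i, we_in=True, we_out=(i != 5)) for i in range(8)]

print(free_out)
print(blocked_out)
```

In the unblocked run the first value shows up after the sixth clock edge, matching the "5 cycles to travel, available in 6 cycles" description. In the blocked run wave 0 is never captured, because wave 1 is already behind it and nothing in the path can hold it back; that is the kind of loss a buffering function has to prevent.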

Thank you.

Weng

Reply to
Weng Tianxiang

Hi,

As I have said, the kernel idea of my invention is: a designer provides the minimum information and logic code about the critical path, and leaves all the complex logic design to a synthesizer and a system library, which is what an HDL should do.

Here are the key technical points that I used to fully develop my technique, assuming that you are an experienced code designer in HDL.

Even though the technique is tricky, it is easy to understand if you fully understand the concepts in this and the next post, each taking 20 or more minutes for 80% of the engineers here.

Here I am using a 64*64-bit signed multiplier as the target circuit example.

  1. If my CPC_1_2 code is presented to a synthesizer, the first question you may ask is how the WPC (Wave-Pipelining Component) is coded. For clarity, I have copied the CPC_1_2 code here again.

By the way, I claim that nobody can further simplify the CPC_1_2 code and still deliver full information about a critical path to a synthesizer for generating a wave-pipelined circuit! If you can, please challenge my claim.

entity CPC_1_2 is
  generic (
    input_data_width  : positive := 64;   -- optional
    output_data_width : positive := 128   -- optional
  );
  port (
    CLK    : in  std_logic;
    WE_i   : in  std_logic;  -- '1': write enable to input registers A & B
    Da_i   : in  signed(input_data_width-1 downto 0);  -- input data A
    Db_i   : in  signed(input_data_width-1 downto 0);  -- input data B
    WE_o_i : in  std_logic;  -- '1': write enable to output register C
    Dc_o   : out unsigned(output_data_width-1 downto 0)  -- output data C
  );
end CPC_1_2;

architecture A_CPC_1_2 of CPC_1_2 is
  signal Ra : signed(input_data_width-1 downto 0);   -- input register A
  signal Rb : signed(input_data_width-1 downto 0);   -- input register B
  signal Rc : signed(output_data_width-1 downto 0);  -- output register C
  signal Cl : signed(output_data_width-1 downto 0);  -- combinational logic
begin
  Cl

Reply to
Weng Tianxiang

Weng Tianxiang wrote on 1/10/2018 8:56 PM:

What is SMB?

I think I understand the concept of wave pipelining. It is just eliminating the intermediate registers of a pipeline circuit and designing the combinational logic so that the delays are even enough across the many paths so the output can be clocked at a given time and will receive a stable result from the input N clocks earlier. In other words, the logic is designed so that the changes rippling through the logic never catch up to the changes created by the data entered 1 clock cycle earlier. Nice if you can do it.
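The "never catch up" condition can be written down as two inequalities for a single path. The sketch below is a simplified model, not a full timing analysis: it ignores clock skew and constraints on internal nodes, and the function names, the picosecond figures and the setup/hold values are all made up for illustration. A wave launched at time 0 is sampled `n_waves` clock edges later; the slowest path must have settled by then, and the fastest path of the next wave (launched one period later) must not arrive before the sample is taken.

```python
def wave_window(d_min, d_max, t_clk, n_waves, t_setup=0, t_hold=0):
    """Check one wave path that is sampled n_waves clock edges after launch.

    Returns (ok_late, ok_early):
      ok_late  - the slowest path settles before the sampling edge
      ok_early - the fastest path of the NEXT wave arrives after the sampling edge
    All times share one unit (picoseconds in the examples below).
    """
    sample = n_waves * t_clk
    ok_late = d_max + t_setup <= sample
    ok_early = t_clk + d_min >= sample + t_hold
    return ok_late, ok_early


def is_safe(d_min, d_max, t_clk, n_waves, t_setup=0, t_hold=0):
    return all(wave_window(d_min, d_max, t_clk, n_waves, t_setup, t_hold))


# Well-balanced logic: path delays between 8.0 ns and 9.5 ns, 5 ns clock, 2 waves in flight.
balanced = is_safe(d_min=8000, d_max=9500, t_clk=5000, n_waves=2, t_setup=200, t_hold=100)

# Same slowest path, but one short path of only 4.0 ns: the next wave overruns the current one.
unbalanced = is_safe(d_min=4000, d_max=9500, t_clk=5000, n_waves=2, t_setup=200, t_hold=100)

print(balanced, unbalanced)
```

Combining the two inequalities gives the familiar result that the clock period is bounded below by the spread between the longest and shortest path delay plus setup and hold, which is why the delays have to be "even enough across the many paths".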

I can see where this would be useful in an ASIC. In ASICs, FFs and logic compete for space within the chip. In FPGAs the ratio between FFs and logic is fixed and predetermined, so using logic without using the FFs that are already there is not of much value.

--

Rick C 

Viewed the eclipse at Wintercrest Farms, 
on the centerline of totality since 1998
Reply to
rickman

...

Thanks, interesting, but sounds complex to get reliable operation.

Generally true, but

1) You might be able to combine three stages that require 2/3 of a clock cycle for maximum propagation delay, and get the result in the time of two clock cycles.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then each tile can be a latch or a LUT, so flops are not wasted.

Either way there must be a great deal of complex floor planning and/or timing constraints needed to make this work. Automating this would be amazing?

Jan Coombs

Reply to
Jan Coombs

Jan Coombs wrote on 1/20/2018 2:20 PM:

If your stages are only using 2/3 of a clock, you can regroup the logic to make it 1 clock each in two stages. There is supposed to be software to handle that for you although I've never used it.

There's your first mistake, no one uses Actel/Microsemi FPGAs. They long for the day they are as big as Lattice, lol!

Isn't that what the OP is claiming? I'm surprised he could make this work over PVT. The actual stable time has to be on a clock edge, the same clock edge under all conditions. I wouldn't want to try that manually in a simple circuit.

Reply to
rickman

Rick?

SMB stands for Series Master component with Buffering function, one of the 2 WPCs (Wave-Pipelining Components).

I don't understand what you are saying: "Isn't that what the OP is claiming? I'm surprised he could make this work over PVT. "

What do OP and PVT stand for?

My attention on this topic is centered on introducing my inventions to the public and asking for critical comments, challenges or doubts from a technical point of view, not specifically on whether or not they are useful.

Personally I have never had a chance to write a pipelined circuit, not to mention design a wave-pipelined circuit.

What I did is the result of my observation that such an important problem can be perfectly resolved by my insight as a person outside the wave-pipelined design circle, based fully on only one reference: [1] IEEE Transactions on VLSI Systems, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf .

Weng

Reply to
Weng Tianxiang
[much irrelevant stuff snipped - please help with this]

I was unable to quickly understand the "2 fast reading materials" which you sent me.

Why do you have patents? A patent should disclose the method of the novelty, so it would need an implementation. Perhaps this is what I am missing?

Perhaps if you follow wave-pipelined techniques to the limit, you will find yourself looking at asynchronous (or self clocked) logic. There is also much historical work on this, and it may be easier to test on FPGA chips[1].

Jan Coombs

--
[1] or at least drum up some business for Microsemi/Actel
Reply to
Jan Coombs

Microsemi has been at the number 3 spot for as long as I have used FPGAs (+/- 28 years, starting with Actel's A1010). They are twice as large as Lattice.

Here is a reference:

formatting link

Hans

formatting link

Reply to
HT-Lab

OP = Original Poster, the person who started the topic

PVT = Process / Voltage / Temperature (I presume)

The issue is that gate delay isn't a hard fixed value, but changes slightly (or not so slightly) from device to device and under varying operating conditions. That calls into question the design of a gate tree that presents results stably and reliably two clock cycles after application, even with the inputs changing after one clock cycle.
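To put rough numbers on that concern, here is a small sketch under a deliberately simple model: every path delay is multiplied by one common factor somewhere between `scale_lo` and `scale_hi` (standing in for process, voltage and temperature), and the clock period must work at both extremes. The function name, the 9 ns delay and the assumed 2:1 spread are illustrative only; clock skew and internal-node constraints are ignored.

```python
def feasible_periods(d_min, d_max, n_waves, scale_lo, scale_hi, t_setup=0, t_hold=0):
    """Return (t_min, t_max), the clock periods that work at every delay scale
    in [scale_lo, scale_hi], or None if no such period exists.

    The output is sampled n_waves clock edges after the wave is launched.
    """
    # Slowest corner: the longest path must settle before the sampling edge.
    t_min = (scale_hi * d_max + t_setup) / n_waves
    if n_waves == 1:
        # Ordinary single-cycle path: no upper bound (plain hold check ignored here).
        return (t_min, float("inf"))
    # Fastest corner: the next wave's shortest path must not arrive before the sample.
    t_max = (scale_lo * d_min - t_hold) / (n_waves - 1)
    return (t_min, t_max) if t_min <= t_max else None


# Perfectly balanced logic (every path 9 ns), three waves in flight, no delay variation.
nominal = feasible_periods(9000, 9000, n_waves=3, scale_lo=1.0, scale_hi=1.0)

# Same logic, but delays may be anywhere from 1x to 2x the nominal value.
with_spread = feasible_periods(9000, 9000, n_waves=3, scale_lo=1.0, scale_hi=2.0)

print(nominal, with_spread)
```

Under these assumptions, even perfectly balanced logic cannot keep three waves in flight once the delay can vary by 2:1, because the slow corner needs a longer period than the fast corner can tolerate.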

Reply to
Richard Damon

Jan,

I don't think you are right: "Perhaps if you follow wave-pipelined techniques to the limit, you will find yourself looking at asynchronous (or self clocked) logic."

I have studied asynchronous circuits, but found that they are a dead end because of their structural inefficiency and the current commercial trend. Coding or synthesizing a wave-pipelined circuit has nothing to do with its counterpart for an asynchronous circuit, and the former is much more complex than the latter!

Synthesizing a wave-pipelined circuit needs much more complex algorithms, which have matured since 1969 based on my observation.

My design never considers PVT; it belongs to another specialty field and I have zero knowledge of it.

From my point of view, building a bridge between a code designer and a synthesizer is a very important step in popularizing the technology for wave-pipelined circuits:

In 1980 Intel published and developed the 8087 with a 32-bit floating-point multiplier;

More than 10 years later, in 1997, they announced MMX technology, including a second version of the 64-bit floating-point multiplier. From my point of view, the second version of the 64-bit floating-point multiplier using MMX technology is nothing but a wave-pipelined circuit.

Regular engineers never have a chance to implement a wave-pipelined circuit because of the complexity of all the related PVT issues.

But according to my scheme, the most complex part of generating a wave-pipelined circuit is fully left to synthesizer manufacturers, and a code designer in HDL focuses only on how to code it, with zero knowledge of how a wave-pipelined circuit is synthesized and generated. Hopefully that leads to a situation in which any college student with basic knowledge of HDL can generate the second version of the 64-bit floating-point multiplier within half an hour.

As far as the 2 fast reading materials are concerned, please communicate with me by private email and let me know what you want: specification, drawings or source code in VHDL. Sorry, I mistakenly thought you were a lawyer, not an engineer.

Thank you.

Weng

Reply to
Weng Tianxiang

OP means "original poster" and is a common abbreviation in newsgroups. PVT means Process, Voltage, Temperature, the three main factors causing variations in delay times in silicon chips. If you don't account for these effects in your timing calculations, your wave pipelining idea won't work. If you aren't aware of this, I suspect you don't really understand how to design FPGA devices. It isn't all textbook analysis.

Then I think you have not solved anything. The problem with wave pipelining is that the timing can vary so much that the output of the combinational circuit won't be stable during the clock edges. If you haven't tested your ideas by designing a circuit and running it on an FPGA, you don't know any of this will work in the real world.

Reply to
rickman

The multiplier is not a good example to use as many FPGAs contain multiplier blocks. But then they are pipelined and so won't work in a non-pipelined solution, so maybe you can show your technique even if it has little practical value in this case.

The problem is "the most complex part of generating a wave-pipelined circuit is fully left to synthesizer manufacturers". Your method leaves me wondering what your software is doing??? Asking the synthesizer companies to solve your problems of making it work is a bit of a stretch. What makes you think they will even take on your idea rather than provide their own solution.

If your patent only covers the idea of writing simple HDL to describe the circuit desired and leaving the implementation details to the synthesis companies, I don't think you have actually patented anything. This part is very obvious. The *real* work is in synthesizing a circuit that will work in the FPGA.

Reply to
rickman

MUCH more than slightly. The figure I have been told is that 2:1 is not uncommon. That's why overclockers can get CPU chips to run *much* faster than they are rated. They provide excellent cooling, tweak the PSU voltage and hand-select their chips.

This is also why we use synchronous logic with registers for pipelines.

Reply to
rickman

There's some BS somewhere...

formatting link

More importantly, look at the numbers in your link. The Actel/Microsemi numbers are going in the wrong direction! X, A and L are headed upward year-to-year and Actel is headed down!

While looking this up I found a link indicating the JTAG interface of the ProASIC3 devices has a back door which would allow their security to be bypassed. Security was their claim to fame and this could be a major blow to the company.

Reply to
rickman
