Can I use Verilog or SystemVerilog to write a state machine with clock gating function?

ng more than 10 states must have a clock gating function to save power consumption:

d not be generated to keep the state unchanged and save power consumption.

the fact that the state does not change. A flip flop that is clocked but does not happen to change its output does not consume much power. The power is needed to charge/discharge the loads that are being driven. Any decreased power consumption would have to do with the decrease in power in generating the clock input to the flip flop. But shifting from a common clock to adding a gate that generates a clock probably does not lower power since the same number of clock signals are being generated. If the gated clock routing is a higher capacitive route than when using a free-running clock, then you can consume more power. This is the result when trying to implement gated clocks in FPGA. ASIC will be different.
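The point about charging and discharging loads can be put into rough numbers with the common CMOS switching-power approximation P = a * C * V^2 * f. The sketch below is illustrative only: the capacitance, supply voltage, and clock frequency are made-up assumptions, not measurements from any FPGA or ASIC.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Approximate CMOS switching power in watts: P = a * C * V^2 * f.

    activity is the average fraction of clock cycles in which the node toggles.
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical numbers chosen only for illustration.
LOAD_C = 10e-15   # 10 fF load on the flip-flop output
VDD = 1.0         # 1.0 V supply
FREQ = 100e6      # 100 MHz clock

# An output that changes on every cycle versus one that never changes.
busy_output = dynamic_power(1.0, LOAD_C, VDD, FREQ)
idle_output = dynamic_power(0.0, LOAD_C, VDD, FREQ)

print(f"busy output: {busy_output * 1e6:.3f} uW")
print(f"idle output: {idle_output * 1e6:.3f} uW")
```

In this simple model an output that never toggles contributes nothing, whether or not its flip-flop is clocked, so any remaining saving has to come from the clock network itself.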

ion may not be necessary because too few state machines are implemented in any normal application.

escribe in an FPGA results in an increase in power consumption. I provided you with all of the details for your sample design. The results of that analysis are not "because too few state machines are implemented", it is because gated clocks in FPGA use more power, not less. Again, that was with your sample design of that time which appears to be the same thing you are reusing here.

VHDL as follows after the post is posted:

e apparent usage of a possibly free running clock.

ays it is. No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState > > elsif WState /= WState_NS then -- WState /= WState_NS is necessary!

nd claiming since the code is not complete and does not compile...as usual.

do not find Hans of

formatting link
giving his opinion. Usually his opinion is reasonable and informative, and he knows many things outside the FPGA chips that are beyond my knowledge.

r earlier publication. The publication will happen about 14 weeks after its filing date.

to IEEE Transactions on Circuits and Systems for publication. The review process may take up to 3 months.

cannot disclose any details about my invention until the journal agrees to publish my paper 3 months later or rejects my paper in 1 or 2 weeks.

s almost the same as a conventional method would generate, or maybe even simpler than a conventional method.

I think you missed the mark by a wide margin on this one. The logic needed for the clock gating is this...

elsif WState /= WState_NS then

This is not so trivial compared to the FSM itself, especially in an ASIC. I would estimate it is approximately the same amount of logic in general.
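To get a feel for how much logic that comparison might need, here is a minimal sketch. It assumes a straightforward, unoptimized implementation with one XOR per state bit followed by a chain or tree of two-input OR gates; a real synthesis tool may share or fold this logic, so the counts are only an estimate, and both function names are hypothetical.

```python
def gating_enable(state, next_state):
    """Clock-enable condition from the thread: WState /= WState_NS."""
    return state != next_state


def comparator_gate_estimate(state_bits):
    """Rough two-input gate count for an n-bit inequality comparator.

    Assumes one XOR per bit plus (n - 1) OR gates to combine the results.
    """
    xor_gates = state_bits
    or_gates = max(state_bits - 1, 0)
    return xor_gates + or_gates


for bits in (4, 10, 16):
    print(bits, "state bits ->", comparator_gate_estimate(bits), "two-input gates")
```

The estimate grows linearly with the width of the state register, which is why the comparison is not free relative to a small FSM.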

in state machines for the Cache II control. If they don't care about the power saving or they have implemented some scheme in the implementation, my invention would be of little value; otherwise it would be worth millions of dollars.

For a patent to be valid it has to be non-obvious to a practitioner in the field. I don't know how this is non-obvious to someone in the field of CPU design. You may obtain a patent, but then lose a patent defense case in court. But again, I didn't think cell phones would take off and now I have two.

out how to implement a state machine with clock gating function.

What exactly is your "invention"??? Clock gating is nothing new. It is applied to many parts of a CPU. Is your invention the idea of applying it to the individual FSMs in a CPU cache? So if someone instead applies it to groupings of FSMs in a CPU cache they will have worked around your patent.

gister and sell the application at

formatting link

-to-ast/. I know the website because Google refers to the website and indicates they are a member of the site. I expect that Intel, IBM, AMD, and Apple may also be members of the website. The site asks for the selling price during registration. So it is important for me to assess my invention's value properly.

What value have you assessed so far?

is website, not mention taking part in the discussion of my post.

ore my registrations in the patent selling website.

input for 8 registers in the block. Altera may be in the same situation. So clock enable is not a new thing and we don't have to pay attention to how the clock trees work. For a CPU design, in my opinion, logic design and clock tree design are two separate domains, one after another, and logic designers never have to pay attention to the clock trees.

Clock enable and clock gating are not the same thing. Clock enable saves power by not changing the FF state, but if the FF input is the same as the output the state won't change anyway.

Here is something to consider. Clock gating saves power compared to clock enabling by reducing the power consumed in the clock tree. How much of the clock tree will you actually be gating with a fine grained approach? Clock trees are exponential structures with a multiplier for the fan out at each level. With this fine grain approach you are only saving power in the final level and in fact, may be adding a level if your clock gating control is at a finer resolution than the last level of clock drive.

Generally clock gating is used at a high level to gate the clock to sections of a chip. I expect it is seldom if ever used at a low level because the power saved is not optimal and the logic required is maximal.

Rick C.


Reply to
gnuarm.deletethisbit

Weng - I find your obsession with "state machines" a bit puzzling. I seem to recall a poster a few years ago asking about the "largest" state machine in current designs - was this you? It seems likely, in the event that you consider a CPU cache as a large number (~100,000) of state machines running in parallel.

I've not designed a CPU cache. But I can pretty much guarantee that whomever designed that CPU cache you're thinking about did NOT model the design as such (a lot of state machines running in parallel). To be frank, I can see the entire design being done without implementing a "state machine" at all.

A state machine is simply a model to make it easier for humans to understand and design a circuit. It's not necessary at all to apply this model to any or all digital circuits.

One of my co-workers (for whatever reason) abhors "State machine" design, and won't use them - at all. That's fine, he models things differently. And he's a very productive engineer - not hindered one bit by his lack of use of "state machines".

Conversely, one can model an entire ASIC (or FPGA) design as simply one large state machine. Or many smaller state machines running in parallel. (Assume a single clock for this analogy). It's just a model to aid our (the designers') view of a design.

Take a full schematic of any full ASIC. Draw a random blob around ANY set of 4-5 FFs. Include some parts of the fanin and fanout logic of those flip flops. Bam - there's a state machine. Repeat 20,000 times for all FF's in the design. Is this useful - not really - but it will meet any definition of "State Machine" that you can define.

Regards,

Mark

Reply to
gtwrek

w do you handle it using your scheme?

Hi Mark,

You really have a good memory!!!

I posted a post with title: "What is largest number of state machines in a chip" at this FPGA group several years ago.

There are tons of state machine patents about how to design an L2 cache. I list only the search word "L2 cache inassignee:intel" and you can find through Google that there are 4,830 patents filed and issued by Intel; the search word "L2 cache state machine inassignee:intel" leads to 4,360, each of them related to a type of state machine.

I believe that anyone cannot be accounted as a professional digital circuit designer if he does not seriously consider or design a state machine.

One of my hobbies is to look at patents filed by Intel, IBM, AMD, Xilinx and Altera. Reading Xilinx's and Altera's patents gives me knowledge of how they design their FPGA chips. Reading Intel's, IBM's and AMD's patents gives me knowledge of how they design something very complex and of new technology trends. And through the reading I find many topics for me to develop further.

I disagree with your following opinion: "I've not designed a CPU cache. But I can pretty much guarantee that whomever designed that CPU cache you're thinking about did NOT model the design as such (a lot of state machines running in parallel). To be frank, I can see the entire design being done without implementing a "state machine" at all. "

Here is an Intel patent: US8493397B1: "Circuit for placing a cache memory into low power mode in response to special bus cycles executed on the bus"

formatting link

formatting link

I agree with your following opinion: "a lot of state machines running in parallel".

After my invention, all state machine designs will benefit from a lower power status, no matter what type of state machine it is, and the logic resource usage is less than a conventional synthesizer would generate.

Rick, I disagree with your opinion: "elsif WState /= WState_NS then

This is not so trivial compared to the FSM itself, especially in an ASIC. I would estimate it is approximately the same amount of logic in general. "

In my invention there is no one single logic gate generated for comparison "WState /= WState_NS". Is it obvious to you?

That is the best point of my invention.

Thank you.

Weng

Reply to
Weng Tianxiang

Proof by counter-example. My coworker is an excellent "professional digital circuit designer" and has been for over 30 years. He does NOT use a "state machine" to model any of his designs. He doesn't like the model. Again, we're talking about using a model as a tool. That model doesn't work for him. He has others that work quite nicely.

That's a non sequitur: There's nothing in that patent search that says the designer is using a state machine model to design the CPU cache. That's all part of your imagination.

A "digital circuit" is just FF's and combinatorial gates tied together in clever ways. Whether you apply a "state machine" model to the circuit is just something between your ears. Most of the tools see just FF's, gates (or LUTs), and timing paths.

I assert (without any evidence whatsoever) that whomever designed that CPU cache memory at Intel did NOT model it (in his head or otherwise) as 100,000 or more state machines running in parallel. That's just crazy.

You're still not hearing me. If you have some Super Snazzy Algorithm that does some magic low power thing targeting state machines, then the Super Snazzy Algorithm would also be capable of targeting ANY digital circuit. (If I recall, most FPGA low power optimizers run rather late in the implementation process - i.e. after synthesis and "state machine" optimizations)

As a skeptical engineer (any engineer that's been around for any time whatsoever fits this description) I have sincere doubts about your Super Snazzy Algorithm. Bright folks have been designing low-power tools for quite some time. I doubt there's any room for improvement (at least in the digital logic sense). And digitally, the problem's not hard to define at all. The devil is in all the details with respect to timing, and other optimization metrics. (Hint: if your only metric is "logic resource usage" then you're not understanding the full problem by a long shot).

On the other hand maybe you're a digital logic savant, and are seeing new and creative solutions.

Good luck with your further patent googling, and applications.

Regards,

Mark

Reply to
gtwrek

^^^

---- Arg! Edit to make my point -------------------------------------

Reply to
gtwrek

Hi Mark,

"I assert (without eny evidence whatsoever) that whomever designed that CPU cache memory at Intel did NOT model it (in his head or otherwise) as

100,000 or more state machines running in parallel. That's just crazy. "

Here are the facts, whether you agree or not:

  1. 6M L2 cache, the largest L2 cache I can find in a commercial CPU;

  2. Every 64 bytes in the L2 cache constitute a cache line;

  3. Each L2 cache line works independently;

  4. Each L2 cache line has at least one state machine to control its data or instructions in coherence. I will not be surprised if each L2 cache line has up to 8 state machines to control its working.

  5. IBM: Cache-coherency protocol with upstream undefined state
    formatting link

  6. IBM: Cache-coherency protocol with recently read state for data and instructions
    formatting link

  7. NVidia: State machine control for a pipelined L2 cache to implement memory transfers for a video processor.
    formatting link
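The figure of roughly 100,000 state machines follows from the cache size and line size listed above. A minimal check of the arithmetic, assuming "6M" means 6 MiB and taking the one-to-eight state machines per line as the range suggested above:

```python
CACHE_BYTES = 6 * 1024 * 1024   # 6M L2 cache, taking "6M" to mean 6 MiB
LINE_BYTES = 64                 # bytes per cache line, as stated above

cache_lines = CACHE_BYTES // LINE_BYTES
print("cache lines:", cache_lines)

# One to eight state machines per line, the range suggested above.
for per_line in (1, 8):
    print(per_line, "per line ->", cache_lines * per_line, "state machines")
```

This only counts lines; whether each line is really implemented with its own state machine hardware is the point under dispute in this thread.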

Thank you.

Weng

Reply to
Weng Tianxiang

If this is from the state machine code you posted on Jan 5, I already pointed out that the "WState /= WState_NS" is not necessary in that design even though your code comment said it was needed. Logic synthesis will optimize it out. It can do that because your posted design is not an example of a gated clock design.

However, if you move "WState /= WState_NS" to create logic that is used to generate a gated clock in some fashion that is used to clock the state machine, then there will be extra logic generated to implement "WState /= WState_NS" which will consume power.

So what are you talking about...

  1. Your earlier posted code that is not of a gated clock design?
  2. Some other unpublished gated clock design where you are making unsubstantiated claims?

Well that's too bad.

Kevin

Reply to
KJ

Kevin,

In my invention, all state machines will be synthesized to have clock gating function, no matter whether or not it is coded to have clock gating device!

Thank you.

Weng

Reply to
Weng Tianxiang

ing function, no matter whether or not it is coded to have clock gating dev

Then anything using your invention...

-Will use additional logic. The power consumed by that logic will have to be subtracted out from whatever power savings might get realized from clocking less frequently.

-Will be impossible to get timing closure in an FPGA environment, maybe ASIC tools can handle it.

-Will consume more power in an FPGA, TBD if it will in an ASIC.

-Will not end up saving much power since state machines consume a relatively small portion of the power... the majority of the power is consumed by the data path that is being controlled.

So, function, performance, and power are all negatively impacted. Is anyone here interested?

You also didn't answer my question about if you were referring to your Jan 5 code or some unpublished code...same 'ol story with your ideas.

Kevin Jennings

Reply to
KJ

By the way, the whole idea of not clocking a flip flop except when needed to change state is loooooong ago pre-existing knowledge. The storage device is called a toggle flip flop, the ripple counter being the classic example of a function that is easy to understand and uses the device...you did include that in your description of prior art in your patent disclosure, right?
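For readers who have not met the ripple counter, here is a small behavioral sketch of the idea. It is a simplified model, not a circuit description: the class and function names are hypothetical, and each stage is assumed to clock the next stage on its own falling edge.

```python
class ToggleFlipFlop:
    """Flip-flop that inverts its output each time it is clocked."""

    def __init__(self):
        self.q = 0
        self.clock_events = 0

    def clock(self):
        """Apply one active clock edge; return True if q fell from 1 to 0."""
        self.clock_events += 1
        self.q ^= 1
        return self.q == 0


def ripple_count(stages, pulses):
    """Feed `pulses` input clock pulses into a chain of toggle flip-flops."""
    for _ in range(pulses):
        for stage in stages:
            # A stage clocks the next one only on its own falling edge.
            if not stage.clock():
                break


stages = [ToggleFlipFlop() for _ in range(4)]
ripple_count(stages, 10)
value = sum(ff.q << i for i, ff in enumerate(stages))
print("count value:", value)
print("clock events per stage:", [ff.clock_events for ff in stages])
```

Each later stage sees fewer clock events than the one before it, because a stage is clocked only when its bit actually has to change.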

Kevin Jennings

Reply to
KJ

Hi Kevin,

  1. No source code is provided for a testing bench except demonstrating my ideas.

Then anything using your invention...

  2. "-Will use additional logic." No additional logic is used except a clock gating device.

  3. " The power consumed by that logic will have to be subtracted out from whatever power savings might get realized from clocking less frequently. " No additional power is consumed because there is no additional logic.

  4. "-Will be impossible to get timing closure in an FPGA environment" Wrong! Xilinx has a built-in clock enable input for 8 registers in a LUT6 block.

  5. "-Will consume more power in an FPGA." Wrong!

  6. "TBD if it will in an ASIC." I don't know what "TBD" stands for.

  7. "-Will not end up saving much power since state machine consume a relatively small portion of the power" That is right for a single state machine, but not correct when dealing with 100,000 state machines.

  8. I just mentioned that skipping a cycle pulse would save power. No more than that is mentioned. It is not my business.

Thank you.

Weng

Reply to
Weng Tianxiang

Unless you are using the term different than I am used to I would disagree somewhat.

A "latch" is, in my language, an asynchronous memory unit that copies its input to its output for one level of the enable, and the output holds its current value for the other level of the enable. It is one of the more primitive memory units.

A latch could be used for clock gating, but is highly inefficient for doing so, as the properly designed clock gate knows what state the output should be in the gated off state, so doesn't need the logic to maintain current state. The clock gating device is basically a GATE.

There may be a way to use a latch to build a gated ff, but again, there are simpler methods with better timing.

Reply to
Richard Damon

ideas.

You stated in an earlier post "In my invention there is no one single logic gate generated for comparison "WState /= WState_NS". Is it obvious to you?" but the code being referenced was not from a gated clock design so there is nothing 'demonstrating your idea' whatever that may be.

Did you not even notice your use of the word 'except' after you typed it?

No matter. So this 'clock gating device', either has only one input (which is the only thing that would not require logic resource to implement) or it has more than one input and can generate the correct gated clock output without any logic resources, which means it works by magic. The absurdity meter is pegged at the highest setting with this claim of yours.

m whatever power savings might get realized from clocking less frequently. "

Well of course. Why would the 'operates by magic' clock gating device which is only needed with your "invention" require any power in order to operate? Absurdity meter has gone off scale.

block.

You've been told this before by others, but a clock enable input is not the same thing as a gated clock. Specifically, in typical electrical engineering parlance, a 'clock enable' signal modifies the data input to a flip flop, not the clock input. 'Clock enable' signals do not modify the clock in any way. Do some more research, this is a pretty basic logic design concept.

I am correct and I sent you the full details back in 2010. The governing NDA for that work is no longer in force but I won't post all the details here that back my claim in order to avoid embarrassing you any further. If you would like to post your actual design, methods and measurements here to provide evidence to justify your stance, feel free. Simply making statements and claims is not evidence.

You seem to have a lot of outages of Google at your place.

tively small portion of the power"

ith 100,000 state machines.

No, the number of state machines does not matter since they will (or should) be controlling much larger stuff that would consume the bulk of the power. If you have 100,000 state machines controlling 10,000 things in a data path, you likely have incompetently designed state machines.

than that is mentioned. It is not my business.

Yes, you stated that but can provide no evidence to back that claim. Without that, you're just making unfounded statements, many of which are clearly incorrect and have been pointed out to you...for many years now.

Kevin Jennings

Reply to
KJ

Hi Richard,

I don't think so: "The clock gating device is basically a GATE"!

Kevin, "No, the number of state machines does not matter since they will (or should) be controlling much larger stuff that would consume the bulk of the power. If you have 100,000 state machines controlling 10,000 things in a data path, you likely have incompetently designed state machines. "

One state machine controls the status of a 64-byte L2 cache line, and 100,000 state machines fully control the status of a 6M L2 cache. They do not control the data path! Their states will affect how each L2 cache line behaves.

If you have time, have a look at the following 2 patents; at least you can understand what each of those 100,000 state machines is and how it works.

Thank you.

Weng

Reply to
Weng Tianxiang

Hi Kevin,

I thank you for your help many years ago.

It is not correct: "a 'clock enable' signal modifies the data input to a flip flop, not the clock input. 'Clock enable' signals do not modify the clock in any way."

When a CLOCK ENABLE is deasserted, no clock pulse will feed a FF, and the FF will remain unchanged on the next cycle. If a CLOCK ENABLE is asserted, a clock pulse will feed a FF, and the FF will be updated on the next cycle.

Thank you.

Weng

Reply to
Weng Tianxiang

Hi Richard and Kevin,

Here is a copy from Wikipedia "clock gating":

formatting link

Clock gating is a popular technique used in many synchronous circuits for reducing dynamic power dissipation. Clock gating saves power by adding more logic to a circuit to prune the clock tree. Pruning the clock disables portions of the circuitry so that the flip-flops in them do not have to switch states. Switching states consumes power. When not being switched, the switching power consumption goes to zero, and only leakage currents are incurred.[1]

Clock gating works by taking the enable conditions attached to registers, and uses them to gate the clocks. A design must contain these enable conditions in order to use and benefit from clock gating. This clock gating process can also save significant die area as well as power, since it removes large numbers of muxes and replaces them with clock gating logic. This clock gating logic is generally in the form of "integrated clock gating" (ICG) cells. However, the clock gating logic will change the clock tree structure, since the clock gating logic will sit in the clock tree.

Clock gating logic can be added into a design in a variety of ways:

Coded into the register transfer level (RTL) code as enable conditions that can be automatically translated into clock gating logic by synthesis tools (fine grain clock gating).

Inserted into the design manually by the RTL designers (typically as module level clock gating) by instantiating library specific integrated clock gating (ICG) cells to gate the clocks of specific modules or registers.

Semi-automatically inserted into the RTL by automated clock gating tools. These tools either insert ICG cells into the RTL, or add enable conditions into the RTL code. These typically also offer sequential clock gating optimisations.

Any RTL modifications to improve clock gating will result in functional changes to the design (since the registers will now hold different values) which need to be verified.

Sequential clock gating is the process of extracting/propagating the enable conditions to the upstream/downstream sequential elements, so that additional registers can be clock gated.

Although asynchronous circuits by definition do not have a "clock", the term perfect clock gating is used to illustrate how various clock gating techniques are simply approximations of the data-dependent behavior exhibited by asynchronous circuitry. As the granularity on which you gate the clock of a synchronous circuit approaches zero, the power consumption of that circuit approaches that of an asynchronous circuit: the circuit only generates logic transitions when it is actively computing.[2]

Chip intended to run on batteries or with very low power such as those used in the mobile phones, wearable devices, etc. would implement several forms of clock gating together. At one end is the manual gating of clocks by software, where a driver enables or disables the various clocks used by a given idle controller. On the other end is automatic clock gating, where the hardware can be told to detect whether there's any work to do, and turn off a given clock if it is not needed. These forms interact with each other and may be part of the same enable tree. For example, an internal bridge or bus might use automatic gating so that it is gated off until the CPU or a DMA engine needs to use it, while several of the peripherals on that bus might be permanently gated off if they are unused on that board.

Weng

Reply to
Weng Tianxiang

I heard long ago that the 'clock enable' signal in Xilinx FPGAs does not affect the clock signal. This is likely to allow sharing clock edge detection, and to minimise the routing to a block of flops with shared clock signal. Patent here [1].

This is from a logic user's guide, with simplified explanation based on the implementation of a single flip-flop, and is not intended to be a circuit description.

Suggestion: 'The hardest thing to know is (the extent of) what we do not know.'

Jan Coombs

--

[1] Clock enable control circuit for flip flops  
United States Patent 6466049 [2002] 
http://www.freepatentsonline.com/6466049.html
Reply to
Jan Coombs

On Wednesday, 9 January 2019 at 06:31:03 UTC+1, Weng Tianxiang wrote:

clock input. 'Clock enable' signals do not modify the clock in any way."

FF will keep unchanged on the next cycle. If a CLOCK ENABLE is asserted, a clock pulse will feed a FF, and the FF will be updated on the next cycle.

In theory a "clock enable" gates the clock line, but in reality it usually switches only the data path to the FF. In most technologies the enable of a FF with clock enable is used synchronously. If you zoom into a typical clock-enable FF you will find the following hardware implemented (use a fixed font to view):

           _________________________
          |                         |
          |   +---+     +-------+   |
           ---|   |     |       |   |
              |MUX|-----|D     Q|-----------
D  -----------|   |     |       |
              +---+     |  FF   |
                |       |       |
Enable-----------       |       |
                        |       |
Clock __________________|\      |
                        |/      |
                        +-------+
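The difference between this mux-based enable and a gated clock can be sketched with two small behavioral models. This is an assumption-laden illustration rather than a description of any vendor's cell: both class names are hypothetical, and the only thing being compared is the stored value and the number of clock edges that reach the flip-flop's clock pin.

```python
class EnableFlipFlop:
    """D flip-flop with synchronous enable: a mux recirculates Q when enable is low."""

    def __init__(self):
        self.q = 0
        self.clock_edges = 0

    def tick(self, d, enable):
        self.clock_edges += 1              # the clock pin sees every edge
        self.q = d if enable else self.q   # mux in front of the D input


class GatedClockFlipFlop:
    """Plain D flip-flop whose clock is blocked when enable is low."""

    def __init__(self):
        self.q = 0
        self.clock_edges = 0

    def tick(self, d, enable):
        if enable:                         # the gate passes the edge
            self.clock_edges += 1
            self.q = d


stimulus = [(1, 1), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1)]  # (d, enable) per cycle
en_ff, gated_ff = EnableFlipFlop(), GatedClockFlipFlop()
for d, enable in stimulus:
    en_ff.tick(d, enable)
    gated_ff.tick(d, enable)
    assert en_ff.q == gated_ff.q           # same logical behaviour every cycle

print("edges at enable FF:", en_ff.clock_edges)
print("edges at gated FF: ", gated_ff.clock_edges)
```

Both models hold the same value on every cycle; they differ only in how many clock edges arrive at the flip-flop, which is the part that matters for clock power.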

A clock tree is the tree of buffers (inverters) between the clock source and each FF, and the gating is often performed on a dedicated branch of the clock tree which is not a leaf. It is of course possible, and most flexible, to gate the clock directly before the FF (and therefore at the end of the leaf), but this has the least power saving effect and the worst impact on resource usage. The best effect is gained when gating as near as possible to the clock source. On the other hand, this is not trivial, as the clock tree without any clock gate would connect maybe 8 FFs that are functionally close together on the same leaf of the clock tree, but if only one of these 8 FFs should be gated then you need to move the gated FF from the non-gated branch to a gated branch, which might connect this FF to some other FFs that are physically located further away, increasing routing effort and routing delay.

In many cases the power consumption of the clock tree switching with clock gating only on the FF itself is not smaller than the power consumption of the same tree with synchronous data gating, as the FF itself keeps its outputs constant in both implementations when "gated", and the load of the clock gate located at the FF is the same as the load of the FF.

The synchronous enable has from timing point of view a strong advantage vs clock gating and is therefore easier to handle in layout.
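The effect of the gating position on clock tree activity can be sketched with a deliberately crude model. It assumes a balanced tree, one common enable for all flip-flops, and equal capacitance for every load; the fanout and depth below are hypothetical, and real trees will differ in wire and buffer loading.

```python
def tree_loads(fanout, depth):
    """Clock loads per level of a balanced tree; level `depth` is the leaf level."""
    return [fanout ** level for level in range(1, depth + 1)]


def loads_toggling_when_disabled(fanout, depth, gate_level):
    """Loads that still toggle while the enable is low.

    gate_level 0 places a single gate at the clock source; gate_level == depth
    places one gate directly in front of every flip-flop, so the gate input
    takes the place of the flip-flop clock pin as the leaf load.
    """
    return sum(tree_loads(fanout, depth)[:gate_level])


FANOUT, DEPTH = 4, 3   # hypothetical tree: 4 + 16 + 64 = 84 loads
total = sum(tree_loads(FANOUT, DEPTH))
for gate_level in range(DEPTH + 1):
    still_toggling = loads_toggling_when_disabled(FANOUT, DEPTH, gate_level)
    print(f"gate at level {gate_level}: {still_toggling} of {total} loads keep toggling")
```

In this model a gate at the source silences the whole tree, while a gate in front of each flip-flop leaves every tree load toggling, which is the situation described above where gating at the FF saves nothing in the tree itself.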

regards,

Thomas

Reply to
Thomas Stanka

Xilinx recommends clock gating be fed through bufgce to prevent skew and timing issues (you also gain good fanout of course) if feeding large enough numbers of blocks. Vivado automatically moves the gating to the enable path for flip flops or latches (this can be manually overridden, though I've not done that yet).

As for writing patents based on other peoples patents - this thread confirms the obvious:

To quote Daniel Whitehall: "Discovery requires experimentation" (Marvel's Agents of SHIELD)

john


Reply to
john

ing function, no matter whether or not it is coded to have clock gating device!

Then your invention will optimally use the toggle flip flop as the fundamental storage device. There are several flavors of basic flip flops: SR (set-reset), JK (improved set-reset), T (toggle) and D. The industry has long since settled on using essentially only the D type and presumably has optimized that one. So to use your invention one would have to either use a non-optimal flip flop or construct it from the D type, which presumably would be less optimal than if it were a true T type.

If the industry had settled on using only T flip flops then we would all be doing gated clock designs now. But just because it hasn't does not mean that the T flip flop and the associated gated clock logic required to use that flip flop type is not already existing prior art. It is simply prior art that is not widely used. A single logic description can be synthesized to use any of the basic flip flop types inherent in the underlying hardware. So the mapping of some VHDL/Verilog source code to be implemented using T flip flops as storage is not novel.

While nearly every invention is a new novel use that builds on prior art your apparent claim here "all state machines will be synthesized to have clock gating function" is nothing more than stating that "all state machines will be synthesized using T flip flops" which is neither new nor novel. The limitation to "all state machines" rather than "all memory storage" is a restriction over what is already existing so that is not novel either.
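The statement that one logic description can be mapped onto either flip flop type can be checked with the textbook conversions, shown here as a minimal sketch with hypothetical function names: a T flip flop is a D flip flop driven with D = T xor Q, and a D flip flop is a T flip flop driven with T = D xor Q.

```python
def t_ff_next(q, t):
    """Next state of a toggle flip-flop: invert q when t is 1."""
    return q ^ t


def d_ff_next(q, d):
    """Next state of a D flip-flop: take the value of d."""
    return d


def t_from_d(q, t):
    """T behaviour built on a D flip-flop by driving D = T xor Q."""
    return d_ff_next(q, t ^ q)


def d_from_t(q, d):
    """D behaviour built on a T flip-flop by driving T = D xor Q."""
    return t_ff_next(q, d ^ q)


# Exhaustive check over all state/input combinations.
for q in (0, 1):
    for x in (0, 1):
        assert t_from_d(q, x) == t_ff_next(q, x)
        assert d_from_t(q, x) == d_ff_next(q, x)
print("T and D flip-flops are interchangeable with one XOR per bit")
```

The XOR in `d_from_t` is a per-bit "current value differs from next value" test, so the conversion itself costs one gate per storage bit in this model.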

Kevin Jennings

Reply to
KJ
