Can I use Verilog or SystemVerilog to write a state machine with clock gating function?

Hi,

Can I use Verilog or SystemVerilog to write a state machine with clock gating function?

I know VHDL has no such feature, and I want to know whether Verilog or SystemVerilog has a clock gating feature for a state machine.

Thank you.

Weng

Reply to
Weng Tianxiang

Clock gating can be written in any language you like. It's FPGAs that don't support clock gating.

Nicolas

Reply to
Nicolas Matringe


Hi Nicolas,

I am asking whether Verilog or SystemVerilog can automatically generate a state machine with a clock gating function, without any extra statements. For example, do they have an attribute that, when set, makes the generated state machine have the clock gating function?

At least VHDL-2008 does not have the ability.

Thank you.

Weng

Reply to
Weng Tianxiang

Well then I don't know what that "clock gating function" is, I'm sorry.

Nicolas

Reply to
Nicolas Matringe

Apparently you cannot, but yes, it can be done by others. It can also be written in VHDL, but apparently you don't like how that is done, so you state that it can't be done. Perhaps you should state your problem more clearly.

Kevin

Reply to
KJ

What do you mean 'extra new statements'? This looks to me like clock gating:

input clk;
input enable;
wire gated;

assign gated = clk & enable;

always @(posedge gated) begin
  ...
end

I don't know what you mean by that. (System)Verilog's abstraction doesn't generate abstract state machines, it just allows you to write them. Whatever synthesis tools do with that code is up to them. I presume tools could pick up the above style if they so desire (I don't know if any ASIC tools do but expect they would).

Theo

Reply to
Theo

One big question is what you mean by 'clock gating'.

As was mentioned, one option for this is to do something like

assign gatedclk = clk & gate;

or sometimes

assign gatedclk = clk | gate;

and then use gatedclk as the clock. The big issue with this is that you need to worry about clock skew as well as glitches (the second version works better when gate changes on the rising edge of clk, but gate needs to be stable before the falling edge).
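The glitch problem with the plain AND gate is usually handled in ASIC flows with a latch-based clock gate: the enable is captured by a latch that is transparent only while `clk` is low, so it cannot change during the high phase and the gated pulse cannot be cut short. The following is a behavioral sketch with hypothetical module and port names; a real ASIC flow would instantiate the library's integrated clock-gating cell rather than hand-write one, and the sketch does nothing about the skew issue.

```verilog
// Behavioral model of a latch-based (AND-type) clock gate.
// Hypothetical names; real ASIC flows instantiate a library ICG cell.
module clock_gate (
    input  wire clk,
    input  wire en,
    output wire gclk
);
    reg en_latched;

    // Level-sensitive latch, transparent only while clk is low:
    // en_latched is frozen during the high phase of clk.
    always @(*) begin
        if (!clk)
            en_latched = en;
    end

    // Because en_latched cannot change while clk is high, gclk
    // cannot be cut short or produce a runt pulse.
    assign gclk = clk & en_latched;
endmodule
```

An OR-type gate (the `clk | gate` form) uses the mirror arrangement, with the latch transparent while `clk` is high.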

A second thing called 'clock gating' is to condition the transition on the gate signal, something like

always @(posedge clk) begin
  if (gate) begin
    ... state machine here.
  end
end

This makes the machine run on the original clock, but it will only change on the cycles where the gate signal is true.

VHDL can do the same.

There is no need for a 'special statement'; you just do it. If you use the first version, actually gating the clock, you may want to use some implementation-defined macro to buffer the clock and put it into a low-skew distribution network, as may have been done for the original clock.

Reply to
Richard Damon

Hi Theo and Richard,

Thank you for your help.

The purpose of clock gating is to save power. The reason I ask the question is this:

A cache line in Cache I, Cache II or even Cache III in a CPU usually has 64 (2**6) bytes, and each cache line must have a state machine to keep its data coherent in all situations.

For a 6 MB (2**22 + 2**21 bytes) Cache II (the largest I have seen on the current market), a CPU must have at least 2**16 + 2**15 ~= 100,000 state machines, and those ~100,000 state machines don't change state most of the time.
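The state-machine count follows from dividing the cache size by the line size, assuming one state machine per 64-byte line as stated above:

```latex
\frac{6\ \text{MB}}{64\ \text{B per line}}
  = \frac{2^{22} + 2^{21}}{2^{6}}
  = 2^{16} + 2^{15}
  = 65\,536 + 32\,768
  = 98\,304 \approx 100\,000\ \text{lines}
```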

In the above situation, each of the ~100,000 state machines, each having more than 10 states, must have a clock gating function to save power consumption:

when it will not change state on the next cycle, a clock pulse should not be generated, to keep the state unchanged and save power consumption.

Do you think this is reasonable?

For an application implemented in an FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.

Actually, I realized how to implement the power-saving scheme in VHDL as follows after the post was posted:

type STATE_TYPE is (s0, s1, ..., Sn);

signal WState, WState_NS: STATE_TYPE;

...;
a: process(clk)
begin
  if rising_edge(clk) then
    if SINI then
      WState
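The idea behind the VHDL above, touching the state register only when `WState /= WState_NS` (the condition quoted in a later reply), can be sketched in Verilog. This is a minimal sketch under stated assumptions: the 4-state transition rule, the `go` input and the module name are hypothetical, `sini` stands in for `SINI`, and `load_en` is the signal that would drive a clock gate.

```verilog
// Hypothetical sketch of the "load only when the next state differs" idea.
// The transition rule, port names and 2-bit encoding are illustrative only.
module fsm_ce (
    input  wire       clk,
    input  wire       sini,     // synchronous init, assumed active high
    input  wire       go,       // hypothetical input that triggers transitions
    output reg  [1:0] state,
    output wire       load_en   // high when the state register must change
);
    localparam S0 = 2'd0, S1 = 2'd1, S2 = 2'd2, S3 = 2'd3;

    reg [1:0] state_ns;

    // Next-state logic: advance S0 -> S1 -> S2 -> S3 -> S0 while go is high
    always @(*) begin
        state_ns = state;
        if (go) begin
            case (state)
                S0:      state_ns = S1;
                S1:      state_ns = S2;
                S2:      state_ns = S3;
                default: state_ns = S0;
            endcase
        end
    end

    // Enable for a clock gate (or clock-enable flip-flops), roughly the
    // SINI / WState /= WState_NS conditions of the VHDL
    assign load_en = sini || (state_ns != state);

    always @(posedge clk) begin
        if (sini)
            state <= S0;
        else if (load_en)
            state <= state_ns;
    end
endmodule
```

For the register alone, the `load_en` test is functionally redundant, since loading an unchanged value has the same effect; the signal only matters if it drives a clock gate or a clock-enable pin.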

Reply to
Weng Tianxiang

One issue with gated clocks is that each gating of the clock needs to be treated as a different clock domain from every other gating of the clock and from the ungated clock, because the gating (and rebuffering) introduces a delay in the clock, so you need to take precautions when a signal passes from one domain to another. An FPGA might have, and a gate array may provide, a special circuit to generate a set of gated clocks that are kept in good enough alignment to avoid this, but that would be a special application macro that needs to be instantiated.

Second, the difference in power consumption between my first and second methods (actually gating the clock vs. using a clock enable) is primarily the power to drive the clock line, since the clock enable also keeps the state the same in the 'skipped' clock cycle.

Reply to
Richard Damon

Hi Richard,

There are 2 things to consider when generating a clock gating function:

  1. Generate the CE logic.
  2. Make the gated clock signal work properly.

You address part 2) and I focus on part 1).

Is it complex to generate CE logic?

In my understanding, generating a clock pulse consumes more power than skipping it.

I want to know whether each of the ~100,000 state machines in a CPU implementation actually has a clock gating function.

Based on your code, I think it is reasonable to assume that each of the ~100,000 state machines in a CPU implementation actually has a clock gating function.

Only CPU designers know their implementation. I need the information.

Thank you.

Weng

Reply to
Weng Tianxiang

Actually gating the clock is a single gate (but then in an ASIC it can't drive much logic, so things start to get more complicated). Making it work gets things much more complicated, and probably gets you out of the domain of portable Verilog or VHDL. That is the nature of clock trees.

Thus, step one is in a sense trivial if you are ignoring step two, but doing step one while ignoring step two is worthless.

I personally don't know whether it is simpler/better to add the clock enable functionality to the flip flops or gate the clock and deal with all the timing/buffering issues, and it wouldn't surprise me if it turned out that which is better very much depends on the process and other criteria.

The only real answer would be to talk to the process people, but my guess is that the answer is very much proprietary, and unless it looks like you are willing and planning to spend the big bucks to actually do this, they won't waste their time talking about it.

Reply to
Richard Damon


Sure, in full-custom ASICs it is not uncommon to gate the clock. In fast chips the clock tree can consume half the dynamic power in the chip, so gating the clock can bring significant power savings. However, the clock gating being described here covers far too small a portion of the chip to be effective on many levels, if I understand what is going on. The OP is talking about 100,000 identical state machines, one for each cache item. I believe what he calls FSMs are really just a handful of FFs, but I'm not sure. If so, the clock gating logic is nearly as large as the logic it is controlling, and so would consume nearly as much power and area.

Will it be practical to design 100,000 clock gating circuits to control 100,000 tiny FSMs? Maybe I am wrong about the size of the FSMs. Or maybe it would be practical to combine the clock gating to many of the 100,000 FSMs so they are shut off in large blocks? I don't know, but the OP seems preoccupied with the idea of this being a language feature rather than a design feature added by the user. I'm sure he wants to produce an idea using a library or something that he can patent. That seems to be his MO. Oh well...

Rick C.

- Get 6 months of free supercharging - Tesla referral code -

formatting link

Reply to
gnuarm.deletethisbit


Hi Rick, you misunderstand. The ~100,000 state machines are even coded the same, but with different input and output signals they act differently, and you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine needs 4 registers, and each has its own clock gating logic and clock gating device.
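The "4 registers" figure assumes binary state encoding, where an FSM with N states needs the number of flip-flops given below; this holds for 11 to 16 states (a one-hot encoding would need N flip-flops instead). Combined with the ~98,304 cache lines from the 6 MB / 64 B figures earlier in the thread:

```latex
n_{\text{FF}} = \lceil \log_2 N \rceil, \qquad
\lceil \log_2 11 \rceil = \lceil \log_2 16 \rceil = 4, \qquad
98\,304 \times 4 = 393\,216\ \text{flip-flops}
```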

Weng

Reply to
Weng Tianxiang

If you are really talking about gating 4 FFs, then my guess is that using clock-enabled FFs would be much simpler and probably better than trying to gate the clock and keep things synchronized.

The big issue would be that to make the gated clocking work you may need to double the clock distribution tree: one 'early' clock that is to be gated, and a second 'late' clock, used by the ungated parts of the system, that lines up with the gated clocks. The second clock distribution tree probably eats up more power than you save by stopping the clock to those flip-flops.

The primary alternative to two clocks would be running on opposite edges (so skew isn't as much of a problem), but that then limits the speed the system can run at.

Reply to
Richard Damon

I want to use my method in all types of circuits. A clock gating device is basically a latch. An FF with a clock enable input is an FF with a latch. Thank you.

Reply to
Weng Tianxiang

On Saturday, January 5, 2019 05:30:08 UTC+1, Weng Tianxiang wrote:


All languages support clock gating when it is expressed explicitly, and no language has an implicit statement for it. This is because the clock is not really anything special in the language [1], and clock gating has several side effects that need to be dealt with during layout. In many cases, though, you need to deal with some implications of clock gating during the architectural design phase, when writing the code.

[1] rising_edge(enable) and rising_edge(clock) make no difference to the language, but give very different results with synthesis tools.

bye Thomas

Reply to
Thomas Stanka

> ...more than 10 states must have a clock gating function to save power consumption:

That is your unsubstantiated claim, not a fact.

> ...not be generated to keep the state unchanged and save power consumption.

Any perceived lower power consumption has very, very little to do with the fact that the state does not change. A flip-flop that is clocked but does not happen to change its output does not consume much power. The power is needed to charge/discharge the loads that are being driven. Any decrease in power consumption would have to come from a decrease in the power spent generating the clock input to the flip-flop. But shifting from a common clock to adding a gate that generates a clock probably does not lower power, since the same number of clock signals are being generated. If the gated clock routing has higher capacitance than the free-running clock routing, you can consume more power. This is the result when trying to implement gated clocks in an FPGA. An ASIC will be different.
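The argument above follows the standard first-order expression for CMOS dynamic power, in which energy is spent only when a node's capacitance is charged or discharged:

```latex
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
```

Here \(\alpha\) is the activity factor (the fraction of cycles in which the node switches), \(C\) the switched capacitance, \(V_{DD}\) the supply voltage and \(f\) the clock frequency. A flip-flop whose output holds its value contributes almost nothing through its output term, but its clock pin still toggles every cycle (\(\alpha = 1\)). Gating removes that clock-pin load for the gated flip-flops, while the gating logic and any extra clock routing add capacitance of their own, which is why the net result can go either way.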

> ...may not be necessary because too few state machines are implemented in any normal application.

As I pointed out to you back in 2010 (I think), implementing what you describe in an FPGA results in an increase in power consumption. I provided you with all of the details for your sample design. The results of that analysis are not "because too few state machines are implemented"; they are because gated clocks in an FPGA use more power, not less. Again, that was with your sample design of that time, which appears to be the same thing you are reusing here.

> ...as follows after the post is posted:

I noticed that you did not show the actual gating of the clock, only the apparent usage of a possibly free-running clock.

Also, the following 'elsif' is not necessary even though your comment says it is. No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState ...'.

> elsif WState /= WState_NS then -- WState /= WState_NS is necessary!

I suspect that you did not actually test any of this prior to posting and claiming, since the code is not complete and does not compile...as usual.

Kevin

Reply to
KJ


Hi,

Several experts have responded to my post; thank you. I notice that Hans of
formatting link
has not given his opinion. His opinions are usually reasonable and informative, and he knows many things outside FPGA chips that are beyond my knowledge.

Here is the background for the purpose of my post:

  1. On 12/31/2018 I filed a non-provisional patent application and requested early publication. Publication should happen about 14 weeks after the filing date.

  2. On 01/06/2019 I submitted an almost identical version as a regular paper to IEEE Transactions on Circuits and Systems. The review process may take up to 3 months.

Because of the IEEE Transactions' strict originality requirements, I cannot disclose any details of my invention until the journal agrees to publish my paper (about 3 months from now) or rejects it (possibly within 1 or 2 weeks).

Here are some facts about my invention:

  1. The logic used to generate a state machine with clock gating devices is almost the same as what the conventional method would generate, or maybe even simpler.

  2. I don't know how a CPU handles the clocking of the 100,000*4 FFs used in the state machines for Cache II control. If designers don't care about the power saving, or have already implemented some scheme, my invention would be of little value; otherwise it could be worth millions of dollars.

  3. The purpose of my post is to test whether such an invention has any value, not to ask how to implement a state machine with a clock gating function.

  4. After my application is published 3 months from now, I will immediately register and sell the application at
    formatting link
    o-ast/. I know of the website because Google refers to it and indicates that Google is a member of the site. I expect that Intel, IBM, AMD and Apple may also be members. The site asks for the selling price during registration, so it is important for me to assess my invention's value properly.

  5. I think no developers at Intel, IBM, AMD or Apple would visit this website, let alone take part in the discussion of my post.

  6. I hope to discuss the invention in more detail in 3 months, before I register it on the patent-selling website.

  7. Xilinx chips have a clock enable signal built into their logic blocks, with one CE input for 8 registers in a block. Altera may be in the same situation. So clock enable is not a new thing, and we don't have to pay attention to how the clock trees work. For a CPU design, in my opinion, logic design and clock tree design are 2 separate domains handled one after another, and logic designers never have to pay attention to the clock trees.

Thank you.

Weng

Reply to
Weng Tianxiang


So do you know what the clock gating circuitry would look like? Try comparing that circuit to the FSM circuit. You will see they are comparable in size, and the gating circuit adds to the timing delay as well.

Please keep in mind that the 4 FFs in a single FSM can be lumped together with the 4 FFs from another FSM in your analysis, treating them as a single FSM for the purposes of clock gating. When any one FSM is active you can make the entire circuit active. This still retains the clock power savings for all the remaining 99,998 FSMs not in that circuit.
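The grouping idea above, and the follow-up question of what happens when two FSMs in a group are active on the same clock, can be sketched as follows. The group enable is the OR of each member's "next state differs" flag, so any active member wakes the whole group, and several may be active at once; idle members simply reload their unchanged state. Module and signal names are hypothetical, and the register is written with a clock enable so it simulates cleanly; in an ASIC, `group_en` would instead drive a single clock-gating cell for the group.

```verilog
// Hypothetical sketch: N small FSMs (W state bits each) sharing one enable.
module fsm_group #(
    parameter N = 4,   // FSMs in the group
    parameter W = 4    // state bits per FSM (e.g. 11..16 states, binary encoded)
) (
    input  wire           clk,
    input  wire           rst,            // synchronous reset, active high
    input  wire [N*W-1:0] state_ns_flat,  // next state of every FSM, packed
    output reg  [N*W-1:0] state_flat,     // current state of every FSM, packed
    output wire           group_en        // would feed one clock gate per group
);
    wire [N-1:0] change;  // change[i] = FSM i wants a clock edge

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_change
            assign change[i] = (state_ns_flat[i*W +: W] != state_flat[i*W +: W]);
        end
    endgenerate

    // Any active FSM wakes the whole group; several may be active at once.
    assign group_en = |change;

    // Idle members reload their own (unchanged) state, so they are unaffected.
    always @(posedge clk) begin
        if (rst)
            state_flat <= {N*W{1'b0}};
        else if (group_en)
            state_flat <= state_ns_flat;
    end
endmodule
```

The trade-off is the one described above: a larger group means fewer gating cells but more flip-flops clocked whenever any single member is active.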

I'm not sure this will provide much in the way of logic savings. But I am confident no one is going to want to implement clock gating circuits for each of 100,000 FSMs independently. But then it seems they are scrounging around for ways to improve power consumption of CPUs these days, and there are lots of transistors available. I'm also a guy who thought cell phones would not be widely accepted. lol

Rick C.

Reply to
gnuarm.deletethisbit

If 2 state machines, as you suggest, are active on the same clock, how do you handle it with your scheme?

Reply to
Weng Tianxiang
