Estimating ROM gate count in ASIC

- K
- Kevin Neilson
  
  Contact options for registered users
posted
5 years ago

Thu, Dec 6, 2018 11:02 PM

I've searched for this but to no avail. I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

Yes, I know it's dependent on the contents. However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

It's easy for me to figure out an upper limit. A basic ROM is an AND-OR ar ray. The D address decoders (comprising ANDs/NOTs) can be shared amongst t he W columns. Each of the W columns would require D/2-1 OR gates if half t he ROM bits in each column are 1.

What I don't know is how many gates can be eliminated by sharing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula

*.

- K
- Kevin Neilson
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Thu, Dec 6, 2018 11:05 PM

e D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

e ones and the contents are randomly distributed, a formula should be prett y accurate.

array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

As W increases, term sharing should go up. Again, I'm looking for a *formu la*.

I meant each column would require D/4-1 ORs. Approximately.

- K
- Kevin Neilson
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Thu, Dec 6, 2018 11:16 PM

ere D=depth and W=width, which provides an estimate of the gate count o f a lookup ROM implemented in ASIC gates.

are ones and the contents are randomly distributed, a formula should be pre tty accurate.

R array. The D address decoders (comprising ANDs/NOTs) can be shared among st the W columns. Each of the W columns would require D/2-1 OR gates if ha lf the ROM bits in each column are 1.

As W increases, term sharing should go up. Again, I'm looking for a *for mula*.

I was right the first time. D/2-1

- T
- Thomas Stanka
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Tue, Dec 18, 2018 4:41 PM

where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

s are ones and the contents are randomly distributed, a formula should be p retty accurate.

Not so easy as you think as the content of the ROM has a very strong influe nce on the result.

Assume a simple ROM content with 50% '0' and 50% '1'

Bit 3210

------------- Adr 000: 0000 Adr 001: 1010 Adr 010: 1100 Adr 011: 1110 Adr 100: 1000 Adr 101: 1010 Adr 110: 1100 Adr 111: 1111 This is RomData(0)= Adr(0) AND Adr(1) AND Adr(3) RomData(1)= Adr(0) RomData(2)= Adr(1) RomData(3)= Adr(0) OR Adr(1) OR ADR(3)

Now you can very easy scramble the content slightly to be no longer able to build that simple terms.

In worst case you end up with a full DNF for each Bit.

Each bit of datawidth would in worst case need a term in DNF with a number of clauses equal to the number of bit ='1' and each clause in DNF having addresswidth number of variables.

For a ROM of 10 bit address with equal distributed '1' and '0' this means 5

12 clauses of 10 variables in DNF is your upper limit per databit.

This would be 512 AND gates with 10 inputs and one OR gate with 512 inputs.

As no ASCI technology has an OR gate with 512 inputs or an AND gate with 10 inputs, you need to build this using a gate tree => 5 AND3 per clause an d ~300 OR3 for the or-tree. This means something like 2.9k Gates with three inputs per bit instead of your assumed Depth/2-1 gates. As you see it might still be possible to have a regularity in the formula t hat can reduce this to a simple gate per data bit, but in general random RO M data tend to have only low possibilities of term reduction and for larger depth of RAM you cannot see on first glance if any reduction is possible a t all.

bye Thomas

- K
- Kevin Neilson
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Fri, Dec 21, 2018 4:35 AM

Maybe I should have specified that depth D is large. A formula will not be accurate for tiny ROMs but should be as D increases.

- G
- gnuarm.deletethisbit
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Sat, Dec 22, 2018 7:14 PM

e D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

e ones and the contents are randomly distributed, a formula should be prett y accurate.

array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

As W increases, term sharing should go up. Again, I'm looking for a *formu la*.

That would be pretty easy. Consider the costs of a D wide multiplexer with 1 or 0 on each input. That would be an upper bound in any case.

I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Rick C.

Tesla referral code -

formatting link

Get 6 months of free supercharging

- K
- Kevin Neilson
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Fri, Dec 28, 2018 7:49 PM

ere D=depth and W=width, which provides an estimate of the gate count o f a lookup ROM implemented in ASIC gates.

are ones and the contents are randomly distributed, a formula should be pre tty accurate.

R array. The D address decoders (comprising ANDs/NOTs) can be shared among st the W columns. Each of the W columns would require D/2-1 OR gates if ha lf the ROM bits in each column are 1.

As W increases, term sharing should go up. Again, I'm looking for a *for mula*.

th 1 or 0 on each input. That would be an upper bound in any case.

in either true or inverted form combined with 1s and 0s as choices for inpu ts which simplified the mux by one address input.

Thanks, but I am looking for an accurate estimate.

- G
- gnuarm.deletethisbit
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Sat, Dec 29, 2018 4:53 PM

where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

s are ones and the contents are randomly distributed, a formula should be p retty accurate.

-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amo ngst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

s. As W increases, term sharing should go up. Again, I'm looking for a *f ormula*.

with 1 or 0 on each input. That would be an upper bound in any case.

s in either true or inverted form combined with 1s and 0s as choices for in puts which simplified the mux by one address input.

I'm confused. Do you want an accurate measurement or an estimate?

Rick C.

- Get 6 months of free supercharging - Tesla referral code -

formatting link

- K
- Kevin Neilson
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Sun, Dec 30, 2018 7:52 PM

e:

, where D=depth and W=width, which provides an estimate of the gate cou nt of a lookup ROM implemented in ASIC gates.

its are ones and the contents are randomly distributed, a formula should be pretty accurate.

ND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared a mongst the W columns. Each of the W columns would require D/2-1 OR gates i f half the ROM bits in each column are 1.

rms. As W increases, term sharing should go up. Again, I'm looking for a

*formula*.

r with 1 or 0 on each input. That would be an upper bound in any case.

les in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

Both. An estimate will be very close for large D,W. If I roll a die 1e6 t imes, estimating there will be 5e5 heads is pretty accurate. Estimating th ere will be less than or equal to the upper bound of 1e6 heads is correct b ut not helpful.

- G
- gnuarm.deletethisbit
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Sun, Dec 30, 2018 9:37 PM

ote:

W), where D=depth and W=width, which provides an estimate of the gate c ount of a lookup ROM implemented in ASIC gates.

bits are ones and the contents are randomly distributed, a formula should be pretty accurate.

AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns. Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.

terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

xer with 1 or 0 on each input. That would be an upper bound in any case.

ables in either true or inverted form combined with 1s and 0s as choices fo r inputs which simplified the mux by one address input.

times, estimating there will be 5e5 heads is pretty accurate. Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.

Good thing you aren't rolling a die.

How much do you think this upper bound will vary from your estimate? Have you tried any tests? What is the result of your "estimate" under-estimatin g?

I think if I were doing this I'd try to get a handle on the expected result s and how much they might vary before I start looking for an equation to "e stimate" the number. The one thing you didn't do with the die example is t o figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need. Do you kno w that equation?

Rick C.

Get 6 months of free supercharging + Tesla referral code -
formatting link

- K
- Kevin Neilson
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Tue, Jan 1, 2019 11:55 PM

:

D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.

he bits are ones and the contents are randomly distributed, a formula shoul d be pretty accurate.

an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be shar ed amongst the W columns. Each of the W columns would require D/2-1 OR gat es if half the ROM bits in each column are 1.

g terms. As W increases, term sharing should go up. Again, I'm looking fo r a *formula*.

lexer with 1 or 0 on each input. That would be an upper bound in any case.

riables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.

e6 times, estimating there will be 5e5 heads is pretty accurate. Estimatin g there will be less than or equal to the upper bound of 1e6 heads is corre ct but not helpful.

e you tried any tests? What is the result of your "estimate" under-estimat ing?

lts and how much they might vary before I start looking for an equation to "estimate" the number. The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound tha t will include 99.9xxx% of your cases or whatever value you need. Do you k now that equation?

Sorry, my response could've been more nicely expressed. I originally thoug h that ASIC guys must have some formula for this but perhaps not--maybe the y just run it through the synthesizer and check. I'm writing ASIC code but don't have direct access to the synthesizer. If I did I could maybe plot a few points and fit a curve to it. I only have access to FPGA synthesizer s and of course they just implement ROMs in LUTs and the LUT count is propo rtional to the number of bits in the ROM and there is no logic sharing as t he ROM size increases.

I did think about the die/coin example and how to find how good the estimat e is. For example, say you flip a coin 1000 times. My expected number of heads is is 500. Say you want to know how many runs will have between 450 and 550 heads. You can use the Poisson CDF. In Matlab/Octave:

octave:32> n_flips=1000; exp_heads = n_flips*0.5; poisscdf(550,exp_he ads)-poisscdf(450,exp_heads) ans = 0.97469

So I'll be in that range 97.5% of the time.

- G
- gnuarm.deletethisbit
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Wed, Jan 2, 2019 2:48 AM

te:

f(D,W), where D=depth and W=width, which provides an estimate of the ga te count of a lookup ROM implemented in ASIC gates.

the bits are ones and the contents are randomly distributed, a formula sho uld be pretty accurate.

s an AND-OR array. The D address decoders (comprising ANDs/NOTs) can be sh ared amongst the W columns. Each of the W columns would require D/2-1 OR g ates if half the ROM bits in each column are 1.

ing terms. As W increases, term sharing should go up. Again, I'm looking for a *formula*.

iplexer with 1 or 0 on each input. That would be an upper bound in any cas e.

variables in either true or inverted form combined with 1s and 0s as choice s for inputs which simplified the mux by one address input.

1e6 times, estimating there will be 5e5 heads is pretty accurate. Estimat ing there will be less than or equal to the upper bound of 1e6 heads is cor rect but not helpful.

ave you tried any tests? What is the result of your "estimate" under-estim ating?

sults and how much they might vary before I start looking for an equation t o "estimate" the number. The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound t hat will include 99.9xxx% of your cases or whatever value you need. Do you know that equation?

ugh that ASIC guys must have some formula for this but perhaps not--maybe t hey just run it through the synthesizer and check. I'm writing ASIC code b ut don't have direct access to the synthesizer. If I did I could maybe plo t a few points and fit a curve to it. I only have access to FPGA synthesiz ers and of course they just implement ROMs in LUTs and the LUT count is pro portional to the number of bits in the ROM and there is no logic sharing as the ROM size increases.

ate is. For example, say you flip a coin 1000 times. My expected number o f heads is is 500. Say you want to know how many runs will have between 45

0 and 550 heads. You can use the Poisson CDF. In Matlab/Octave:

heads)-poisscdf(450,exp_heads)

My point is that unless you can come up with a number like this for your es timate, how much good will it do you? Even if you are right 99% of the tim e, when designing a chip is that of much value?

I would think knowing the upper bound is a very useful thing indeed when pl anning an ASIC. But I don't have any more info on calculating the estimate you say you need, so I can't help you.

Rick C.

-- Get 6 months of free supercharging -- Tesla referral code -

formatting link

- T
- Thomas Stanka
  
  Contact options for registered users
Vote on answer
posted
5 years ago

Mon, Jan 7, 2019 7:38 AM

Am Mittwoch, 2. Januar 2019 00:55:57 UTC+1 schrieb Kevin Neilson:

I think you did not understand my posting. There is no way of estimating th e synthesis result of a ROM right by simple formula. Unless you build a model of the synthesis tool itself with this formula.

A change of 1 dataword in a ROM with 1024 words could significant change th e synthesis result by more than 1%.

There is a simple upper bound and a simple lower bound but real ROMs are al ways in between. Simple upper bound is calculated by building DNF for ROM in given technolog y and lower bound is 1 gate per bit (tie0 or tie1) for stand alone synthesi s.

Take for example DES encryption algorithm S-Box. These are lookup tables (R OM) designed to be not easy reduced by synthesis tools. You could synthesis one of this S-Box 10 times with different seeds and would not get 2 identi cal results for the same S-Box.

If you have no access to ASIC synthesis tool than download free FPGA synthe sis tool and test the results for FPGA synthesis. This gives at least a fee ling how synthesis tools deal with your ROM.

bye Thomas