#### FFT Speeds

**posted on**

March 30, 2020, 7:11 pm

I'm not building anything, I'm just curious. I worked on a machine in the 80's called an attached array processor capable of 100 MFLOPS, the ST-100. It could do a 1K complex FFT in 0.86 msec. What sort of device would be required to do this today? A desktop would far exceed this performance. Low-end single-chip processors like an ARM CM3 would probably not reach this performance level. But I bet the processor in an rPi would exceed it.

Can anyone nail this down a bit more? Are there ARM CM4 processors capable of a sub-ms 1K complex FFT in 32-bit floating point? Looking at the Cypress PSoC 61, it says "with single-cycle multiply (Floating Point and Memory Protection Unit)" but I think those are three separate things: single-cycle integer multiply, floating point, and MPU.

Anyone know how fast a CPU like this can crank a 1k FFT?

Of course the rack-cabinet machine I'm talking about was fully capable of moving data around to keep up with the FFT, so the computations were the bottleneck. I expect in many MCU applications the FFT time is not the bottleneck, at least not at the 1K size; rather, getting the data and moving it around is nearly as slow, and the rest of the application is running on the same CPU. The ST-100 was the equivalent of a floating point coprocessor, so it only handled the math.
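A rough cross-check of those numbers: the usual operation count for a radix-2 complex FFT is about 5·N·log2(N) real floating-point operations, so 0.86 ms for N = 1024 corresponds to roughly 60 MFLOPS sustained, consistent with a 100 MFLOPS peak machine:

```python
import math

# Rough flop count for a radix-2 complex FFT: ~5*N*log2(N) real operations.
N = 1024
flops = 5 * N * math.log2(N)       # 51,200 real operations
t = 0.86e-3                        # ST-100 time for a 1K complex FFT
mflops = flops / t / 1e6
print(f"{flops:.0f} flops -> {mflops:.0f} MFLOPS sustained")  # 51200 -> 60
```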


--

Rick C.

- Get 1,000 miles of free Supercharging


Re: FFT Speeds

A beefed-up microcontroller should be enough:

http://www.ti.com/lit/an/spry288a/spry288a.pdf

The C28x family runs up to 200 MHz and does a 1k complex FFT in 53219 cycles, i.e. 0.27 ms. Of course there are caveats: you must run everything in RAM, and very tight.

Some models are also dual core.
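The arithmetic behind that figure, for anyone checking:

```python
cycles = 53219      # 1k complex FFT on C28x, per TI app note SPRY288A
f_clk = 200e6       # 200 MHz clock
t_ms = cycles / f_clk * 1e3
print(f"{t_ms:.2f} ms")   # 0.27 ms, vs 0.86 ms on the ST-100
```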

--

Fletto i muscoli e sono nel vuoto.


Re: FFT Speeds

I think you mean N log N. Usually we think of complex multiplication as 4 real multiplications plus 2 additions. I wonder if (on machines with a 1-cycle MAC) that makes a complex FFT 4x slower than a real FFT, or if they have some interesting optimizations, parallelism, etc.
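One such optimization does exist: Gauss's trick computes a complex product with 3 real multiplications and 5 additions instead of 4 and 2, which matters on hardware where multiplies dominate. A quick sketch:

```python
# Gauss's 3-multiply complex multiplication: trades one multiply
# for three extra additions compared with the schoolbook 4-mult form.
def cmul_gauss(a, b, c, d):
    """(a + bi) * (c + di) using 3 real multiplies."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)   # (real, imag)

print(cmul_gauss(1.0, 2.0, 3.0, 4.0))  # (-5.0, 10.0), same as (1+2j)*(3+4j)
```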

Re: FFT Speeds

On 1/4/20 12:41 pm, Paul Rubin wrote:

Ahh yes, sorry. Mis-translated by omitting "per sample" :)

Memory/cache behaviour is often the limiting factor. Bailey's approach to ordering the computation can radically assist though: <https://www.davidhbailey.com/dhbpapers/fftq.pdf>

Ultimately I want an RPi-like module with a GPU that supports CUDA... for a lower price than the Jetson Nano :). It's a shame to have to jump to an FPGA or funky DSP chip when a GPU is perfect for the job.

Clifford Heath

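The premise that any desktop far exceeds the ST-100 is easy to verify directly; a quick timing sketch using NumPy's FFT as a stand-in for whatever library the machine would actually use:

```python
import timeit
import numpy as np

# Time a 1K complex FFT and compare with the ST-100's 0.86 ms.
x = (np.random.randn(1024) + 1j * np.random.randn(1024)).astype(np.complex64)
n = 2000
t = timeit.timeit(lambda: np.fft.fft(x), number=n) / n
print(f"1K complex FFT: {t * 1e6:.1f} us per transform")
# A typical modern desktop lands in the tens of microseconds or less,
# i.e. well over 10x faster than the ST-100's 860 us.
```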

Re: FFT Speeds

On Tuesday, March 31, 2020 at 10:40:48 PM UTC-4, Clifford Heath wrote:

I assume CUDA is a development language for GPUs? The rPi has a not fully documented GPU, so it may or may not support CUDA. But an FPGA is easy enough to find on a daughtercard for the rPi. Or are you saying you just don't want to mess with an FPGA at all and would rather use software on a GPU?

Personally I much prefer coding FPGAs in HDL to mucking about with the complexities of trying to get a sequentially executing CPU to appear to be processing in parallel. A GPU might be a bit of fun though. I've worked with Transputers and programmed an attached array processor (4 ALUs in parallel). I expect a GPU would be a step up from either of those.

Too bad the rPi GPU is not accessible. That could make for an interesting target.

BTW, thanks to those who pointed out FFT benchmarks. Looks like a single-chip CPU does come pretty close to two rack cabinets from 30+ years ago. It doesn't warm the room as well though.


--

Rick C.

+ Get 1,000 miles of free Supercharging


Re: FFT Speeds

Yes.

Correct. Same for ODroid.

Yes, that. I want the array processor already defined for me, I just

want to tell it what to do.

I would too, if I had already learnt to do it.

Yes, exactly.

In 1986 I was at HP, and we took delivery of one of their 3D Turbo SRX graphics rendering boxes... with hardware radiosity for rendering surfaces, etc... minimally configured, as we only needed it for configuration testing. But it was my desktop computer... That thing would still dim the lights when you turned it on!

The performance was dwarfed by single-chip PC graphics cards in less than ten years.

Clifford Heath

Re: FFT Speeds

On 01/04/2020 04:53, Rick C wrote:

Probably a dumb question, I'm no FPGA expert, but for a fixed-point FFT, could one use multiple shift-and-add multipliers instead of one fast hardware multiplier? Roughly how many shift-add multipliers could you fit in the same space as a fast multiplier?

I use a 128-point FFT for a comms system, and a dsPIC handles this quickly enough with its two hardware multipliers and ancillary logic, but I've wondered if an FPGA with lots of parallel shift-add multipliers might be quicker.

In my application, I can't use FPGAs with inbuilt processors or multipliers.


--

Cheers

Clive


Re: FFT Speeds

On Wednesday, April 1, 2020 at 12:32:27 PM UTC-4, Clive Arthur wrote:


Ok, sometimes I don't completely follow what people have posted. By shift-and-add multiplier I assume you mean a single adder that is used sequentially to generate a multiply in N clocks?
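A serial shift-and-add multiplier of that kind is small enough to model in a few lines; a sketch of what the hardware does, one conditional add per clock:

```python
def shift_add_mul(a, b, bits=16):
    """Unsigned multiply the way the serial hardware does it: one adder,
    an accumulator, and one conditional add per clock for `bits` clocks."""
    acc = 0
    for i in range(bits):          # one iteration per clock tick
        if (b >> i) & 1:           # current LSB of the shifted multiplier
            acc += a << i          # add the shifted multiplicand
    return acc

print(shift_add_mul(300, 41))  # 12300
```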

I can't think why you would use such a multiplier if you need many of them to do the job. I'm not an expert on multiplier design, but I believe a modified Booth's algorithm is not only faster but more compact. The logic is a bit involved, but someone explained to me how it uses fewer computations; I completely forget the details.

It's almost certain that multiple multipliers of any type could be faster... for some situations. A 128 point (complex/real?, xx bit?, fixed/float?) FFT is not very large, so not a huge amount of computation. If you split the work you need to consider the time to offload the data and return the results. A friend many years ago told me about evaluating an attached array processor on his PDP 11/70 in grad school. The calcs were fast, but the I/O speed was slow enough that it was faster on the CPU.

I expect this is real-time work, so you might want to stream the data into the FPGA and do more processing there. I would be looking at the problem as, by default, work being done on the FPGA, offloading to a CPU the parts that are harder to do in an FPGA, like network connectivity, etc. Many of my designs have no CPU.

Take a look at the iCE5LP series from Lattice. 2 or 4 DSP blocks, with 1100, 2048 or 3520 logic cells (LUT4s + FF) and 8 or 16 kB block RAM in a 0.5 mm pitch, 7 x 7 mm, 48-pin QFN. A nice part for many small designs.

Can you explain why you can't use the built-in multipliers? I get not wanting to use a CPU, but a multiplier is a multiplier, whether you construct it or it's already in the FPGA.


--

Rick C.

-- Get 1,000 miles of free Supercharging


Re: FFT Speeds

On 01/04/2020 20:20, Rick C wrote:

That's because it's not my field and I don't know the correct form of words.

Yes.

The last time I used a hardware multiplier outside a CPU it was made with TTL chips and wire-wrapped. A Wallace tree IIRC, taken from a big orange Texas databook. My experience is somewhat limited.

There is a possible future requirement to run at high temperatures and in a very high-vibration, space-constrained environment. This limits the size of the FPGA and the package type in practice to, say, something with 20k gates - the actual part numbers known to work are confidential, but they won't be the latest and greatest.

It had occurred to me that, as the FFT needs lots of complex multiplications, these could be done using lots of smaller simple shift-add multipliers instead of one large 'normal' multiplier - and for all I know, that's the way an expert would do it. But only if the speed/size trade-off worked.


--

Cheers

Clive


Re: FFT Speeds

On Thursday, April 2, 2020 at 4:40:42 AM UTC-4, Clive Arthur wrote:


No, I think your terminology is good, I just don't recall exactly what things mean from time to time. I've reached a point where I often know a topic (well, the parts I remember) but not the terminology.


I've heard of Wallace trees, but never used them. I think they are more useful with the sort of function blocks you have in TTL/ECL chips. When designing with gates I believe the modified Booth's algorithm is faster, but I'm not sure. It certainly fits in an FPGA better, because they all have fast adder chains in the fabric logic, so the Wallace tree logic would be less effective.


Ok, that makes sense. The qualified part doesn't have multiplier blocks.

A shift-add multiplier is one layer of a pipelined multiplier. So you can have N shift-add multipliers of M bits, or N/M Booth's multipliers. The amount of logic will be about the same other than whatever optimization factor the Booth's will offer. I'm not sure it is less logic, but faster for sure. Anyway, you should be able to put a lot of them into 20 kLUTs. No, wait, you said "gates".

I think that tells me which FPGA family you are working with. No one uses "gates" anymore, and these guys do a lot of "qualified" business. That explains the lack of multipliers. 20 kgates isn't really much logic either. I'm guessing around 3,000 logic elements, and these can be either logic or FF but not both?

I wish I could remember more, but these days you can put A <= B * C in your code and you will get a library function block. Let the tools work it out.


--

Rick C.

-+ Get 1,000 miles of free Supercharging


Re: FFT Speeds

Implementing e.g. a 32 x 32 Wallace tree multiplier with MSI chips is nasty, due to the huge number of interconnections. The partial products, produced by two-input AND gates, amount to 1024 lines. Then full adders (preferably with carry look-ahead) are required to count the number of "1" bits. This requires multiple stages, with adder propagation delays in each stage.

This is not a problem e.g. in FFT calculations, since the multiplies can be pipelined, producing one product every adder propagation delay time, but the total time taken by a single multiply can be 10 or more adder propagation times.
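To put rough numbers on that: a 32 x 32 array multiplier starts from 1024 partial-product bits, and the classic 3:2 (full adder) carry-save reduction needs 8 layers to get 32 rows down to the final 2. A quick count:

```python
N = 32
partial_product_bits = N * N      # one 2-input AND gate per bit -> 1024 lines
rows, stages = N, 0
while rows > 2:                   # each 3:2 (full adder) layer maps 3 rows to 2
    rows -= rows // 3
    stages += 1
print(partial_product_bits, stages)   # 1024 partial products, 8 stages
```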

Re: FFT Speeds

On 4/1/20 12:32 PM, Clive Arthur wrote:

Going back to this basic question: if I presume you are asking not about the fairly limited number of hard-core multipliers that many FPGAs include, but about implementation in the FPGA fabric, and are comparing one design which has a full tree implemented in parallel and can run at a throughput of 1 multiply per cycle (it may take several cycles to get a new result, but a new multiply can be started every cycle) versus an implementation that uses reduced hardware but serially iterates the answer in N cycles, my first impression is that the additional control mechanisms to iterate probably take more logic than the fully parallel solution, and thus cost space, and thus give a lower throughput for a given number of logic cells dedicated to multipliers. Someone would really need to design this to be sure. Maybe the fact that one sequencer could drive many serial multipliers might make a difference.


Re: FFT Speeds

On Thursday, April 2, 2020 at 8:02:57 AM UTC-4, Richard Damon wrote:


I have designed shift/add multipliers and the control logic is not much. There is a log2(N)-bit counter and minimal other logic. The shift/add multiplier only works with an unsigned multiplier (or is it the multiplicand?), so there needs to be a 2's complement before and after the multiplier. I'm pretty sure Booth's algorithm handles sign automagically, but I won't swear to it. I did this over a decade ago. I'm pretty sure if you are doing a bunch of multipliers the shift/add uses more real estate for the same throughput, but I don't recall the details of modified Booth's so I can't say for sure.
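For what it's worth, the hunch about Booth is right: radix-4 modified Booth recoding handles two's-complement operands with no sign correction, because the recoded digits {-2, -1, 0, +1, +2} sum exactly to the signed value of the multiplier. A behavioural Python sketch of the recoding (the math, not the hardware):

```python
def booth_mul(a, b, bits=16):
    """Signed multiply via radix-4 modified Booth recoding: examine
    overlapping 3-bit groups of b and add 0, +/-a, or +/-2a per step,
    halving the number of partial products vs plain shift-add."""
    digit_of = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    product = 0
    prev = 0                                # implicit bit right of b's LSB
    for i in range(0, bits, 2):             # two multiplier bits per step
        group = (((b >> i) & 0b11) << 1) | prev
        product += digit_of[group] * a << i
        prev = (b >> (i + 1)) & 1
    return product

print(booth_mul(-20, 13))  # -260, with no pre/post sign fix-up
```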


--

Rick C.

+- Get 1,000 miles of free Supercharging


Re: FFT Speeds

On Thursday, April 2, 2020 at 2:33:20 PM UTC-4, Tauno Voipio wrote:

Your mention of two's complement as if there were other choices these days made me recall that in the Forth language group there is a discussion of standardizing on 2's complement... finally.

Seems someone looked and there are literally no remaining hardware computers running anything other than 2's complement. There is an architecture running 1's complement, but it is no longer a hardware implementation, only simulated on other machines, lol. There must be some commercial niche for software written for that machine.

So there's no real need to specify 2's complement these days, although I suppose in an FPGA there could be anything. Some ADCs are still signed magnitude, so that conceivably could be carried through the chip.


--

Rick C.

++ Get 1,000 miles of free Supercharging


Re: FFT Speeds

On Thursday, April 2, 2020 at 11:33:24 PM UTC-4, Paul Rubin wrote:

That's not the main data format on any machines I know of. No one uses floating point for logic or flags. It was an issue in Forth because of the way people perform logic using non-flags or math using flags. A one's complement or signed-magnitude data format breaks any assumptions you make about mixing the two.

Some people complain that the practice makes programs potentially non-portable, so it looks like they are going to standardize on two's complement so these programs will be compliant and the griping will stop. No one is complaining that the change will break anything.


--

Rick C.

--- Get 1,000 miles of free Supercharging


Re: FFT Speeds

On Thu, 2 Apr 2020 12:39:55 -0700 (PDT), Rick C

by simply inverting all bits in a word. Thus, inserting the inverters in one data path, both inverting and some other operation can be done in a single instruction.

must be added to the result, requiring an adder, and the ripple carry can propagate through all bit positions. Thus complementing a number needs an ADD instruction and takes as long as an ordinary ADD. Of course these days, with carry look-ahead logic, the ADD is nearly as fast as AND/OR instructions.

With linear ADCs the zero line can be freely selected. One could put the zero at 1/3 FSD and hence have a larger numeric range on one side. However, in floating point ADCs the range just around zero is of most interest, so sign/magnitude is more appropriate than biased unsigned or 2's complement. Floating point sign/magnitude is used e.g. in digital telephony, so that weak signals are reproduced better.


Re: FFT Speeds

On Friday, April 3, 2020 at 2:34:42 AM UTC-4, snipped-for-privacy@downunder.com wrote:


Exactly, and the 2's complement doesn't have the various issues the 1's complement does. It only has a single value for zero. I can't recall how they deal with that. If you subtract 2 from 1 you would get an all-ones word, which is zero, so you have to subtract another 1 to get -1, which is all ones with the LSB zero. So it's not really simpler, just messy in different ways.

Actually the invert-and-add-1 is pretty easy to do in an ALU. You already have the adder; just add a conditional inverter in front and you have a subtract. Oh, and don't forget to add the 1 through the carry in. Easy, and no different timing than the ALU without the inverter if it's done in the same LUT in an FPGA. In fact, I've seen macros for add/sub blocks.
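That add/sub trick is easy to demonstrate: two's-complement subtraction is a + NOT(b) with the carry-in forced to 1, so the conditional inverter is the only hardware added to the adder. A quick model of a 16-bit ALU doing it:

```python
MASK = 0xFFFF  # 16-bit datapath

def alu_sub(a, b):
    """a - b built exactly as described: reuse the adder, conditionally
    invert b, and feed the +1 in through the carry input."""
    return (a + (~b & MASK) + 1) & MASK   # a + NOT(b) + carry_in

print(hex(alu_sub(5, 7)))   # 0xfffe, i.e. -2 in 16-bit two's complement
```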

The same sort of trick can be used to turn an adder into a mux. In the instruction fetch unit of a CPU I designed there are multiple sources for the address, plus there is a need for an incrementer. This is combined in one LUT per bit by using an enable to disable one of the inputs, which then only passes the other input, like a mux. I think I use that for the return address.


You are talking about u-Law/A-Law compression. Yes, very familiar with that. Not really relevant to the native data format on CPUs though. I know in u-Law there is a bias which can muck things up if you aren't careful. So it's not even signed magnitude either.


--

Rick C.

--+ Get 1,000 miles of free Supercharging

