
**Posted on** August 28, 2003, 4:09 pm by Christos

Hi to all,

The sum or average of a certain number of samples (e.g. the last 100 values received) has to be checked constantly against a threshold.

I thought of implementing this by keeping a "moving sum", which works by adding the newest value and subtracting the oldest. I think that can be implemented by adding the value just arriving to a register and subtracting the value coming out of a 100-word-deep shift register.

Now, if a longer sum has to be checked, there is a memory problem, because a lot of values have to be stored. In addition, more than one moving sum is needed, so with the above implementation I would also have to store the same data more than once (e.g. the 1000-word shift register would include the data of the 100-word shift register).

Any ideas on how this could be implemented?

The final system will have to keep 10 moving sums, the largest being 250,000 (8-bit) values, for each of the 16 independent input channels.

Any help with the design problem will be appreciated and acknowledged.

Christos

Christos Zamantzas

CERN, European Organization for Nuclear Research

Div. AB/BDI/BL tel: +41 22 767 3409

CH-1211 Geneva 23 fax: +41 22 767 9560

Switzerland snipped-for-privacy@cern.ch

Re: Moving Sum

Can you cheat? That is, instead of a 250,000-deep moving sum updated on every sample, have it be 250,000 deep but updated only at intervals of every 1000 samples?

The other option: once you have to go off-chip for the FIFO memory, the size doesn't matter much, because you can easily just throw ~1 GB of DRAM on the other side.

--

Nicholas C. Weaver snipped-for-privacy@cs.berkeley.edu
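A minimal Python sketch of the "cheat", assuming the window is a whole number of blocks; `BlockMovingSum` and its parameters are hypothetical names, not from the thread. Only the completed block sums are stored (250 of them for a 250,000-sample window split into 1000-sample blocks), at the cost of refreshing the result only once per block.

```python
from collections import deque


class BlockMovingSum:
    """Moving sum over the last num_blocks * block_len samples,
    recomputed only when a block of block_len samples completes."""

    def __init__(self, block_len: int, num_blocks: int):
        self.block_len = block_len
        self.partial = 0   # running sum of the block currently being filled
        self.count = 0     # samples in the current partial block
        # completed block sums, oldest at index 0; preset to zero
        self.blocks = deque([0] * num_blocks, maxlen=num_blocks)
        self.total = 0     # invariant: total == sum(self.blocks)

    def push(self, x: int):
        """Add one sample; return the window sum when a block completes, else None."""
        self.partial += x
        self.count += 1
        if self.count < self.block_len:
            return None
        oldest = self.blocks[0]            # block about to leave the window
        self.blocks.append(self.partial)   # maxlen evicts blocks[0]
        self.total += self.partial - oldest
        self.partial = 0
        self.count = 0
        return self.total
```

Scaled down for a quick check: `BlockMovingSum(10, 25)` covers a 250-sample window and produces one result every 10 samples.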


Re: Moving Sum

Another option, if you're using Xilinx parts, is to take advantage of the SRL16s. In V2P parts (and I think V2) there are 2 per slice. With 64 of them cascaded together, you've got a 1-bit-wide, 1024-deep shift register for the moving sum (barrel shift down for the average). That's 32 slices, or 8 CLBs, per bit of width, depending on the level of abstraction you like to think about.

--Josh
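A behavioral Python model (not HDL) of the idea above, assuming 64 cascaded 16-stage shift registers; `CascadedSRL` and `srl_moving_average` are hypothetical names. In the FPGA each bit of the sample width would get its own 1-bit chain; the model shifts whole words through the same structure. The average is a right shift by log2(1024) = 10, which only works because 64 x 16 is a power of two.

```python
from collections import deque

SRL_DEPTH = 16  # an SRL16 is a 16-stage shift register


class CascadedSRL:
    """n_srl SRL16-like stages chained into one long delay line."""

    def __init__(self, n_srl: int):
        self.stages = [deque([0] * SRL_DEPTH, maxlen=SRL_DEPTH) for _ in range(n_srl)]

    def shift(self, x: int) -> int:
        """Shift x in at the head; return the value falling out of the tail."""
        for srl in self.stages:
            out = srl[0]   # oldest value held in this stage
            srl.append(x)  # maxlen drops the oldest value
            x = out        # it becomes the input of the next stage
        return x


def srl_moving_average(samples, n_srl: int = 64):
    """Moving average over n_srl * 16 samples, divided by a right shift."""
    depth = n_srl * SRL_DEPTH                 # 64 * 16 = 1024
    assert depth & (depth - 1) == 0, "shift-divide needs a power-of-two depth"
    shift = depth.bit_length() - 1            # log2(1024) = 10
    line = CascadedSRL(n_srl)
    acc = 0
    out = []
    for x in samples:
        acc += x - line.shift(x)              # add newest, subtract sample from `depth` ago
        out.append(acc >> shift)              # shift down for the average
    return out
```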


Re: Moving Sum

I have also thought of this, but the idea was rejected because it increases the total system error.

To form an interval you have to wait until all of its samples have arrived before you add the interval to the sum. Thus the sum is updated more slowly, which increases the system error.

For a similar reason it is not possible to use a low-pass filter: I guess that to get an average of 250,000 values you need as many taps.

A synchronous SRAM (probably 2M x 36 bit) will be available on the board. I have done the calculations and I think it is enough.


Re: Moving Sum

Look for "CIC filter". CIC stands for Cascaded Integrator-Comb; it is a recursive implementation of a moving sum. In your case, it sounds like you are sampling the output once for every input sample, so you don't get the benefit of decimation (if you could decimate, the delay queue would be shortened by a ratio equal to the decimation ratio). The CIC consists of an integrator, a subtractor and a delay queue. For a moving sum you are stuck with the storage, and the key is to minimize the number of transactions with the storage per sample. In the case of the CIC, you need one read and one write per sample. For the depth you are looking at, you'll need off-chip memory for the storage (you might fit it into the bulk storage on an Altera Stratix).

You did not mention the sample rate. If the data rate is sufficiently low, you can time-multiplex the data in and out of the external memory, trading memory width for depth, which might get you a lower parts count.


--

--Ray Andraka, P.E.

President, the Andraka Consulting Group, Inc.
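A single-stage CIC decimator sketched in Python, to illustrate the remark that decimation shortens the delay queue; it assumes the window length is a multiple of the decimation ratio `r`, and `cic_decimate` is a hypothetical name. The integrator runs at the input rate, but only every r-th integrator value enters the delay queue, so the queue is `window // r` deep instead of `window` deep. In hardware the integrator is normally allowed to wrap around; two's-complement overflow cancels in the comb subtraction as long as the register is wide enough for the window sum. Python integers do not overflow, so that detail is not modeled here.

```python
def cic_decimate(samples, window: int, r: int):
    """Every r-th output of a moving sum over `window` inputs (single-stage CIC)."""
    assert window % r == 0
    m = window // r          # comb delay, counted in decimated samples
    queue = [0] * m          # delay queue of integrator values, preset to zero
    head = 0
    integ = 0
    out = []
    for i, x in enumerate(samples, 1):
        integ += x                        # integrator at the input rate
        if i % r == 0:                    # decimate: keep every r-th value
            delayed = queue[head]         # integrator value from `window` inputs ago
            queue[head] = integ
            head = (head + 1) % m
            out.append(integ - delayed)   # comb: sum of the last `window` inputs
    return out
```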


Re: Moving Sum

Do you require that each of the previously received values is weighted equally in the average calculation? If you can assume that the current samples are more important than those received a long time ago, then you can calculate an "average" via an exponentially weighted moving average filter, or in other words, use a first-order LPF:

$$a_k = \frac{1}{n+1}\, s_k + \frac{n}{n+1}\, a_{k-1}$$

where:

- $s_k$ = sample input at instant $k$,
- $n$ = number of samples in the moving-average window,
- $a_k$ = average at instant $k$, and
- $a_{k-1}$ = average at instant $k-1$.

As you can see, this is quite easy to implement, requiring two multiplies, one addition, and one register for $a_{k-1}$ storage. If you choose $n+1$ to be a power of two, then one of the multiplications (or I guess it is a divide) becomes a simple shift operation.

-Jack Stone
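A fixed-point sketch of the filter above, assuming $n+1 = 2^m$ and a state register kept scaled by $2^m$ (the name `ewma_shift` and this scaling are illustrative choices, not from the post). Multiplying the recurrence by $2^m$ gives $2^m a_k = s_k + 2^m a_{k-1} - (2^m a_{k-1})/2^m$, so the $1/(n+1)$ factor disappears into the scaling and the $n/(n+1)$ factor becomes a shift and subtract. The shift truncates, so the result is an approximation of the exact recurrence.

```python
def ewma_shift(samples, m: int):
    """Exponentially weighted average with n + 1 = 2**m, using only shifts, adds and subtracts."""
    acc = 0                      # holds a_k * 2**m (m extra fractional bits)
    out = []
    for s in samples:
        # a_k * 2^m = s_k + a_{k-1} * 2^m - (a_{k-1} * 2^m >> m)
        acc = s + acc - (acc >> m)
        out.append(acc >> m)     # back to the sample scale
    return out
```

With `m = 4` (n = 15), a constant input of 100 settles at an output of 100 after a few dozen samples.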

Re: Moving Sum

[exponential weighted averaging]

The other multiply/divide turns into a shift and subtract.


--

The suespammers.org mail server is located in California. So are all my

other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited


Re: Moving Sum

Exponentially weighted averaging cannot be used, because all data in the window have to be treated equally, as they all have the same importance.

If using an LPF, and n is the number of samples in the window, then to get an average of the last 100 values received the filter has to be 100 taps long.

Correct? I am asking just in case I have not fully understood how digital filters are implemented. The truth is that I have never done it, just read about it.

Christos
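On the tap-count question: as a direct-form FIR, a 100-sample boxcar does have 100 taps, but because every coefficient is equal it can also be computed recursively with one add and one subtract per sample. The 100-sample delay line remains in both forms. A small Python comparison (function names are hypothetical; the output is the sum, divide by n for the average):

```python
def boxcar_fir(samples, n: int):
    """Direct-form FIR with n equal taps: n additions per output."""
    taps = [0] * n                  # tap delay line, newest first
    out = []
    for x in samples:
        taps = [x] + taps[:-1]
        out.append(sum(taps))       # every coefficient is 1
    return out


def boxcar_recursive(samples, n: int):
    """Same response, computed recursively: one add and one subtract per
    output, still with an n-deep delay line to know what leaves the window."""
    delay = [0] * n
    idx = 0
    acc = 0
    out = []
    for x in samples:
        acc += x - delay[idx]       # add newest, subtract the sample from n steps ago
        delay[idx] = x
        idx = (idx + 1) % n
        out.append(acc)
    return out
```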


Re: Moving Sum

The bottom line is that if you want a moving-window averager, you are going to need a memory that holds the entire window of points. If your FPGA doesn't have enough memory (and it won't for 250,000 x 10 data), then you will need external memory. Otherwise your question is like asking how to put a gallon of water into a pint-sized glass.


Re: Moving Sum

can be zero, all the values in memory must be preset to zero and the

comparison to a threshold must be declared invalid until 'n' new values have

been accumulated.

--

Greg

snipped-for-privacy@hotmail.com.invalid
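A sketch of the point about presetting memory and invalidating early comparisons, assuming a strictly-greater-than threshold test; `MovingSumWithValid` and its fields are hypothetical names. The buffer starts at zero and the comparison result is reported as invalid (`None`) until `n` samples have been accumulated.

```python
class MovingSumWithValid:
    """Moving sum over n samples with a zero-preset buffer and a validity flag."""

    def __init__(self, n: int, threshold: int):
        self.buf = [0] * n        # memory preset to zero
        self.idx = 0
        self.acc = 0
        self.filled = 0           # samples accumulated so far, saturating at n
        self.threshold = threshold

    def push(self, x: int):
        """Return (sum, valid, over_threshold); over_threshold is None while invalid."""
        n = len(self.buf)
        self.acc += x - self.buf[self.idx]
        self.buf[self.idx] = x
        self.idx = (self.idx + 1) % n
        self.filled = min(self.filled + 1, n)
        valid = self.filled == n
        over = (self.acc > self.threshold) if valid else None
        return self.acc, valid, over
```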


Re: Moving Sum


This is easy to do, even without the CIC filters suggested by Ray Andraka:

In external memory you keep a circular buffer of 16 x 250,000 samples. You keep your 160 sums inside the FPGA. To update them you do the following:

For each channel:

- input the new sample X
- write the new sample to its external RAM location in the circular buffer
- for each moving sum of this channel:
  - read the value Y that "falls off" the sum from external RAM
  - add X-Y to the moving sum inside the FPGA.

This requires 16 writes and 160 reads to external memory per sample period, resulting in 4,400,000 memory accesses per second.

If the values are stored in memory with the right alignment, you can do 4 accesses in parallel, reducing the bandwidth to 1,100,000 accesses per second.

Maybe you should instantiate a processor in your FPGA and use that to implement this.

OT:

What are you doing at LHC that has a sample rate of 25 kHz?

Kolja Sulimma

Frankfurt
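The update loop above can be sketched in Python for one channel, assuming a single circular buffer sized for the longest window and shared by all moving sums, so no sample is stored twice. `MultiWindowChannel` is a hypothetical name and `mem` is a list standing in for the external RAM. One deviation from the loop as written: here the reads are issued before the write, so the longest window still sees its oldest sample before that location is overwritten.

```python
class MultiWindowChannel:
    """One channel: several moving sums sharing one circular buffer."""

    def __init__(self, windows):
        self.windows = list(windows)
        self.depth = max(self.windows)          # buffer sized for the longest window
        self.mem = [0] * self.depth             # stands in for external RAM, preset to zero
        self.wr = 0                             # write pointer
        self.sums = [0] * len(self.windows)     # kept on-chip
        self.reads = 0                          # external memory traffic counters
        self.writes = 0

    def push(self, x: int):
        for i, w in enumerate(self.windows):
            # sample Y leaving window w was written w samples ago
            y = self.mem[(self.wr - w) % self.depth]
            self.reads += 1
            self.sums[i] += x - y
        self.mem[self.wr] = x
        self.writes += 1
        self.wr = (self.wr + 1) % self.depth
        return list(self.sums)
```

Per sample this costs one write plus one read per window, so 16 channels with 10 windows each gives the 16 writes and 160 reads per sample period counted in the post.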


Re: Moving Sum

For the moment I have used a process similar to the one you describe. The circular buffer is a shift register (SR), and the result of X-Y is fed to an accumulator using signed numbers.

I think it works very well (the clock period has to be at least 20 ns, otherwise it becomes unstable with setup violations, but this is not a problem as the real clock will be much slower).

On the other hand, this is going to be used only for up to ~10 ms of data (250 values).

The processes that go up to 100 s will take the average of 8 values and store that value in the external memory. In that way the data are reduced by a factor of 8 and the system error is negligible.

The SRAM we found can be organised as 1M x 72 bit, so 8 accesses in parallel, times two, in 40 µs seems to be more than OK.

One problem now is how to implement this! My experience doesn't go that far!

You mentioned instantiating a processor (I guess something like NIOS); are you sure that this will not complicate things more?

Is there something ready-made to implement a circular buffer in external RAM?

The second problem, and the reason why I asked for help in this group, is that those SRAMs are quite expensive, and since 2000 of them will be needed, they increase the cost significantly. So they are pressing me to find some other way to implement it. (Usual stuff: we want the pie and the dog fed!)

Ray has given me the idea of the CIC (he will be acknowledged for that in my thesis, as will all the others who took the time to answer), but I still haven't figured out how it works! Soon I hope, so that I can decide whether I can use it.

And to answer your question about the LHC: this system is for machine protection and is called the Beam Loss Monitor. The superconducting magnets have to be prevented from quenching due to the particle showers that hit them when some particles are lost from the trajectory.

Inside the tunnel, 3600 ionisation chambers are installed, and they produce a current proportional to the particle rate passing through them. This current is fed to a CFC (Current-to-Frequency Converter), and a counter measures the frequency. The counter data, as well as some status, CRC etc. from 16 chambers, are sent through an optical link to the surface for processing.

And here is where I come in: I have to design the threshold comparator for this.

Sampling at 40 µs is enough, as it is ~half an LHC cycle, and maybe it will be increased to 89 µs (~11 kHz), which is one cycle. Just imagine what my problems would be if 40 MHz were used and I had to go up to 100 s of data!!

Christos


Re: Moving Sum


In the first paragraph I explained that by saving the average of every 8 samples, the data are reduced by a factor of 8. So 5 MB have to be stored for this system (up to 100 s), which fits together with the data of the first system (up to 10 ms) in one SRAM. On the card there are 2 more SRAMs to hold other data. And 650 of these cards are needed, which gives ~2000 SRAMs.

Sorry for not being that clear, but the mail was already too long and I didn't want to bore you to death completely!

Christos
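The storage figures above can be checked with a few lines of integer arithmetic. This assumes 8-bit samples every 40 µs, 16 channels per card, and reads "1M x 72 bit" as 2^20 words of 72 bits; the constant names are illustrative.

```python
SAMPLE_PERIOD_US = 40
CHANNELS = 16
BYTES_PER_SAMPLE = 1       # 8-bit values
DECIMATION = 8             # average of 8 samples is stored


def samples_in(duration_us: int) -> int:
    """Number of samples per channel in the given duration."""
    return duration_us // SAMPLE_PERIOD_US


long_term = samples_in(100_000_000) // DECIMATION * CHANNELS * BYTES_PER_SAMPLE  # up to 100 s
short_term = samples_in(10_000) * CHANNELS * BYTES_PER_SAMPLE                    # up to 10 ms
sram_bytes = (1 << 20) * 72 // 8   # assumed: 2**20 words x 72 bits
cards = 650
srams_per_card = 3                 # 1 for the moving-sum data + 2 for other data

print(long_term, short_term, sram_bytes, cards * srams_per_card)
# prints: 5000000 4000 9437184 1950
```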

Re: Moving Sum

I guess that you have a lot more channels than I realized.

I recommend going to a mass storage device that can hold this amount of data, such as multiple disk drives. Your data rates are relatively low, giving you the option of streaming interleaved data (don't store a single channel in one place on the disk). You can also look into large DRAMs. Another option is flash memory, but you might wear those devices out.

Tom

Re: Moving Sum

Christos,

I have read all the previous responses and I have one suggestion. Could you use a DRAM-based memory? For your first pass, generate a running sum of your 250,000 samples. Then add the new value and subtract the old value; that way you need not redo the whole sum for each sample. Use a simple micro to control the system; it could be as simple as a PicoBlaze. You could even control it with a simple state machine. You really do not need anything as fast as an SRAM. You can refresh the DRAM between the sample periods.

Theron Hicks


Re: Moving Sum

Regardless of which method is used, using a CIC limits the number of memory transactions per sample to just two: a read and a write. The memory is used as a delay queue, so the read pointer is N samples behind the write pointer. The memory required for all those channels is pretty big, so DRAM would be the way to go if you are using semiconductor memories. Since the addressing can easily be made linear, you can simplify it by using page mode or burst accesses. This should make it fast enough to multiplex many channels into one memory.


--

--Ray Andraka, P.E.

President, the Andraka Consulting Group, Inc.
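One way to make the addressing linear, sketched in Python under an assumed layout that is not from the post: each "slot" holds one sample period's values for all channels contiguously, so each period needs one burst read (the slot written `delay` periods ago) and one burst write. `burst_ranges` and `multiplexed_moving_sums` are hypothetical names, `mem` stands in for the DRAM, and page mode, bursts and refresh are not modeled.

```python
def burst_ranges(t: int, delay: int, channels: int, depth: int):
    """Read and write word-address ranges for sample period t."""
    rd_slot = (t - delay) % depth    # read pointer: `delay` periods behind
    wr_slot = t % depth              # write pointer
    rd = range(rd_slot * channels, (rd_slot + 1) * channels)
    wr = range(wr_slot * channels, (wr_slot + 1) * channels)
    return rd, wr


def multiplexed_moving_sums(frames, delay: int, channels: int, depth=None):
    """Moving sums of length `delay` for all channels sharing one memory.

    Each frame holds one sample per channel. The read is issued before the
    write, so a queue of exactly `delay` slots is sufficient.
    """
    depth = delay if depth is None else depth
    assert depth >= delay
    mem = [0] * (depth * channels)   # preset to zero
    sums = [0] * channels
    for t, frame in enumerate(frames):
        rd, wr = burst_ranges(t, delay, channels, depth)
        old = [mem[a] for a in rd]   # burst read: samples from `delay` periods ago
        for a, x in zip(wr, frame):  # burst write: this period's samples
            mem[a] = x
        sums = [s + x - y for s, x, y in zip(sums, frame, old)]
    return sums
```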


Re: Moving Sum

Hi Ray,

Forgive me, I still haven't found time to read about the CIC filter, but from the way you describe its operation, it does not require fewer read and write operations than the simple subtract-and-accumulate implementation I am testing at the moment. I still need the same read pointer and the same memory.

My question is: is there an advantage to the CIC that I don't see?

Christos
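For a single stage without decimation, the two orderings give identical outputs, which the sketch below checks. It assumes 16-bit wrap-around registers (the `MASK` value and function names are hypothetical). The practical difference is what goes into the delay queue: subtract-and-accumulate stores the raw 8-bit samples, while the integrator-first CIC stores integrator values, which must be at least as wide as the window sum. The shorter delay queue from decimation, mentioned earlier in the thread, is where the CIC form gains.

```python
MASK = (1 << 16) - 1   # assumed 16-bit registers with wrap-around


def subtract_accumulate(samples, n: int):
    """Comb first: acc += x[k] - x[k-n]; the delay queue holds raw samples."""
    delay = [0] * n
    acc = 0
    out = []
    for k, x in enumerate(samples):
        acc = (acc + x - delay[k % n]) & MASK
        delay[k % n] = x
        out.append(acc)
    return out


def cic(samples, n: int):
    """Integrator first: y[k] = I[k] - I[k-n]; the delay queue holds integrator values."""
    delay = [0] * n
    integ = 0
    out = []
    for k, x in enumerate(samples):
        integ = (integ + x) & MASK              # integrator is allowed to wrap
        out.append((integ - delay[k % n]) & MASK)
        delay[k % n] = integ
    return out
```

Over 2000 8-bit samples the integrator wraps many times, yet because a 100-sample window sum (at most 25,500) fits in 16 bits, both functions still return the exact window sums.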

