DC Blocker


b2508 10 years ago

Hi all,

I need to implement a DC blocker in an FPGA. A new data sample arrives on every clock cycle.

My original idea was to implement a high-pass filter using the formula below:

y[n] = x[n] - x[n-1] + p*y[n-1]

However, it seems to me that I cannot achieve this at the given data rate: I cannot compute the output by the time it is needed in the feedback loop for the next sample.

Is there some way to do this that I don't see? If not, I was thinking of finding the mean value of the signal and subtracting it from the signal to remove the DC component.

However, I do not know how to determine an appropriate number of samples for this. Would I do it by FIR filtering with all coefficients equal to 1/N?

Thank you in advance.
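For reference, the recurrence in this post, y[n] = x[n] - x[n-1] + p*y[n-1], can be modelled in floating point before worrying about hardware timing. The sketch below is only a software reference model, not an FPGA implementation; the function name `dc_blocker`, the zero initial state, and the default p = 0.99 are assumptions made for illustration.

```python
def dc_blocker(x, p=0.99):
    """Float reference model of y[n] = x[n] - x[n-1] + p*y[n-1].

    Initial conditions x[-1] = y[-1] = 0 are an assumption.
    """
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        yn = xn - x_prev + p * y_prev
        y.append(yn)
        # These two assignments correspond to the two registers in hardware.
        x_prev, y_prev = xn, yn
    return y


# A constant input is pure DC, so the output should decay towards zero.
out = dc_blocker([1.0] * 2000)
```

With a constant input the output starts at the input value and decays geometrically by a factor p per sample, which is the behaviour a DC blocker should show.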


kaz 10 years ago

You can. Just put one delay stage (register) on the input to get x[n-1] and one on the output to get y[n-1], multiply by p, and the circuit will do the job at the data rate. The multiplier output should not be registered, and this may be the speed bottleneck. Moreover, the subtraction/addition above cannot be pipelined, i.e. the result must be ready by the next clock edge. What is your data rate (system clock) and device?


This is an alternative, but you may need a long delay line to filter off DC only.

For n stages, build an n-stage delay line, take the difference between the current input and the output of the last stage, and accumulate/scale the result.

Kaz
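The delay-line approach described above can be sketched as a running sum: the accumulator gains the newest sample and loses the one leaving the delay line, and the scaled accumulator is the mean that gets subtracted. This is a hedged software model only; the function name, the choice of a power-of-two length (so that scaling is a plain shift), the default of 2**10 stages, and the zero-filled initial delay line are all assumptions.

```python
from collections import deque


def moving_average_dc_removal(x, log2_n=10):
    """Subtract an N-sample running mean from integer samples (N = 2**log2_n)."""
    n = 1 << log2_n
    delay = deque([0] * n, maxlen=n)  # n-stage delay line, cleared at reset
    acc = 0                           # running sum of the last n samples
    y = []
    for xn in x:
        oldest = delay[0]             # output of the last delay stage
        delay.append(xn)              # shift the new sample in
        acc += xn - oldest            # accumulate (newest - oldest)
        mean = acc >> log2_n          # scale by 1/N with a shift
        y.append(xn - mean)
    return y


# Constant input: once the delay line has filled, the output is zero.
out = moving_average_dc_removal([100] * 3000)
```

Only one adder, one subtractor and the delay memory are needed, and nothing in the loop depends on a multiply, which is why this form is easy to run at the sample rate.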


b2508 10 years ago

Hm... I thought that multiplication cannot be implemented without delay.

To my knowledge, this could cause timing issues.

Moreover, the full formula is

y[n] = Q{x[n] - x[n-1] + p*y[n-1] - e[n-1]}

e[n] = x[n] - x[n-1] + p*y[n-1] - e[n-1] - y[n]

The error is the difference between the output before and after quantization. I asked about the simpler formula first because I don't know how to implement even that one.

If x1 appears at t1, the corresponding y1 is ready at the earliest at t2 = t1+1. If I register the subtraction as well, e1 is available at t3 = t1+2.

However, x2 arrives at t2, and neither y1 (the corresponding y[n-1]) nor e1 is ready at that time.


kaz 10 years ago


As such, you get very long combinatorial paths running from the multiplier input right through the adders/subtractors. Unless your clock speed is low enough, you can't do that in practice.

The FIR subtraction is certainly doable, but you need a long delay line, e.g. n = 1024 or more, depending on the signal.

Kaz


rickman 10 years ago

What is Q?

Y1 is ready at t1+delta, where delta is a logic delay, not a clock cycle. So don't sweat that. If you need to pipeline this to meet timing constraints, you are in trouble, lol.

What clock rate are you shooting for?

Rick


Mark Curry 10 years ago

Without really looking at your required function in detail (just noting that it has feedback terms) - I'll just note in general.

The statement "multiplication cannot be implemented without delay" is false, in many ways. It all depends on your processing requirements. What is your sample rate? What are your bit widths?

Your processing clock does NOT need to be the same as your sample clock. If you wish them to be the same (it may be easier for new FPGA users to design), then you MAY be able to run the multiplier fully combinational, if your sample rate is low enough.

The alternative (at a high level) is to buffer an input and output, and process with a faster processing clock. Modern FPGAs can run DSP functions at up to around 400-500 MHz. This is likely much faster than your sample rate.

Regards, Mark


b2508 10 years ago


OK, I was taught that it is always safer to put registers wherever you can. I have no choice in my project but to use the same sampling and processing rate.

My rate is 100 MHz. The input data x[n] is unsigned, 16 bits wide, with 1 integer bit.

Also, I am not sure how to select the data widths after each of these operations.

If x[n] and x[n-1] are 16/1 and their difference is 17 bits unsigned with 2 integer bits, how do I proceed with data width selection? The feedback loop part is unclear to me.

Also, should I use a DSP48 for the multiplication by P, or should I somehow make it a power of two and do it by shifting?

Q is quantization, i.e. reducing the number of bits after all these operations.


glen herrmannsfeldt 10 years ago

(snip)

I would say that it is right, but not very useful.

Addition can't be implemented without delay, and for that matter no filter can be. Even wires have delay.

If you are lucky, you can do all processing within one sample period, so one sample delay. You have to include any delay from the previous register, so you have less than one sample period.

But more often, you can live with a few cycles delay, and pipeline the whole system.

-- glen


Tim Wescott 10 years ago

There are other ways to implement high-pass filters. I'm not much of an FPGA guy, but this one may help. I'm going to rearrange your nomenclature:

u: input
y: output
x: state variable

y[n] = u[n] - x[n-1]
x[n] = d * y[n]

For d


rickman 10 years ago

Do you know the value of P? Multiplies are done by shifting and adding. I don't know which chip you are planning to use, but all the multipliers I know of require pipelining, the only option is how many stages, 1, 2, etc... Since P is a constant (it *is* a constant, right?) you only need to use adders for the 1s, or if there are long runs of 1s or 0s, you can subtract at the lsb of the run and add in at the bit just past the msb of the run. The point is you may not need to use a built in multiplier.

Your filter seems very complex for a feedback filter. Is there some special need driving this? Can you use a simpler filter?

Rick
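The shift-and-add idea above, including the trick for long runs of 1s, can be illustrated with a small integer model. The constant 253 (that is, 253/256, roughly 0.988 with 8 fractional bits) is purely a hypothetical coefficient chosen for the example, and both function names are made up here; in hardware each shift is just wiring, so only the adds and subtracts cost logic.

```python
def shift_add_multiply(x, coeff):
    """Multiply x by a non-negative integer constant, one add per set bit."""
    acc = 0
    bit = 0
    while coeff >> bit:
        if (coeff >> bit) & 1:
            acc += x << bit      # add x weighted by this bit position
        bit += 1
    return acc


def multiply_by_253(x):
    """Same product using the run-of-ones trick.

    253 = 0b11111101. Bits 2..7 are a run of ones equal to 2**8 - 2**2,
    so subtract at the lsb of the run (bit 2) and add just past its
    msb (bit 8). That is one add and one subtract instead of six adds.
    """
    return (x << 8) - (x << 2) + x


product = shift_add_multiply(1000, 253)
```

To use the result as a fraction, the product would still be scaled back by the number of fractional bits (8 in this hypothetical case).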


Mark Curry 10 years ago

This is a complete non-sequitur. Yes, register often in an FPGA. That's a good rule of thumb. NOTHING to do with "having same sampling and processing rate".

Think of it as an analogy - if you implemented this in software, would you force (if you could) the processor to operate on the sample clock? Of course not. You buffer a few input samples, do your processing at the higher speed clock, then buffer your output. Your requirements are, the total processing must complete in one-sample time.

When designing at the higher rate clock, each register is NOT necessarily a Z-1 sample delay of your function. The register is just a retiming step (i.e. pipeline stage).

A 100 MHz fully combinational multiply is doable in a modern DSP48. Now, whether the rest of the algorithm would fit, I dunno. I'd not design it this way. I'd use the faster processing clock.

I'm confused by your notation - x[n] and x[n-1] should be the same format. But in any event, in cases like these you just need to make sure the scaling of each variable is the same (i.e. align the "decimal" points), and appropriately sign-extend each input.

This is an implementation trade-off you must decide.

Regards,

Mark
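The advice about aligning the binary points can be made concrete with a small fixed-point model. This is a sketch under assumptions: values are plain Python integers carrying a separate count of fractional bits, and the helper names `add_fixed` and `signed_bits` are invented for illustration. Python integers are unbounded, so sign extension is implicit here, whereas in HDL it has to be written out. The second helper shows why the difference of two unsigned 16-bit samples needs a 17-bit signed result.

```python
def add_fixed(a, a_frac, b, b_frac):
    """Add two fixed-point integers after aligning their binary points.

    Returns (sum, frac_bits) where frac_bits is the larger of the two.
    """
    frac = max(a_frac, b_frac)
    a_aligned = a << (frac - a_frac)   # pad the operand with fewer fraction bits
    b_aligned = b << (frac - b_frac)
    return a_aligned + b_aligned, frac


def signed_bits(lo, hi):
    """Smallest two's-complement width holding every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


# 1.5 with 1 fraction bit (raw 3) plus 0.25 with 2 fraction bits (raw 1)
total, frac = add_fixed(3, 1, 1, 2)

# x[n] - x[n-1] for unsigned 16-bit samples spans -65535 .. 65535
diff_width = signed_bits(-65535, 65535)
```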


Mark Curry 10 years ago

Ok, picking nits - my terminology wasn't clear. But multiplies and adds can be done on an FPGA in 0 clock cycles i.e. pure combinational logic. My notation (which I think is common), is counting pipeline cycles. Both add, and multiply (and quite often both) can be done within 1 cycle, such that you can register just the final output, and use that final output as a new input on the next iteration.

Which is what I think the OP is (correctly) worried about. His feedback term is needed for the next calculation, so he can't fully pipeline. His requirement is "Processing must be complete in one sample time." Whereas in general, the requirement for fully pipelined designs is just "Can accept another input in one cycle time"; the output may appear some (reasonable) number of clock cycles later.

Regards,

Mark


Les Cargill 10 years ago

x need not be a vector, does it? I think it works out to be a single value. That may matter for an FPGA implementation.

Les Cargill


kaz 10 years ago

The OP appeared on dsprelated.com first, where dsp guys know everything about fpgas, but then migrated here thankfully.

The guy has posted there a link to a doc written by same dsp guys who control that forum. It is a leaky integrator based dc filter followed by a modification where the quantisation error is added back to the loop.

The double equations given here are misleading.

My suggestion is to just implement it as per filter 2 in the diagram and forget about the equations. It is there ready for you. And if you use a power of 2 for P, it might be enough for your resolution, so no multiplier is needed.

Your input is 16 bits unsigned?? That means a DC offset; I believe the design is meant for signed data.

Regarding bit growth: 16 bits after addition/subtraction => 17 bits. For feedback, use 16 bits. The truncation error term is meant to help with that.

If you get into fmax issues then I hope dsp guys will come to help!!

Kaz


Tim Wescott 10 years ago

In this case the x[n] notation means "x at sample time n", where "n" means "today".

Basically the notation that the OP used.

Tim Wescott Wescott Design Services http://www.wescottdesign.com


b2508 10 years ago

Hey, these are only two sentences next to each other, I didn't mean that I register because sample and processing rate are the same :-)


b2508 10 years ago


I do not really know the value of P or how to determine it. I was thinking of using 0.99 because I tried it out in a software simulation and it seems to do what I want. The idea for this filter came from this article (the second filter in Figure 2):

formatting link

Someone said to forget the equations and do as drawn in the figure, but these figures never account for the potential latency of the add/subtract/multiply blocks, and if I do not add registers, I may have timing issues.

Anyway, I will try to do add and multiply in one clock cycle and see where this gets me.

Thank you all very much anyhow.


rickman 10 years ago

About the timing issues: try it without extra registers first. Then, if you have problems, you will need to find ways to address them. Your calculation cannot work if you add more register delays.

Rick
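To see why an extra register in the feedback path changes the filter rather than just delaying it, compare the impulse response of the original loop with one where the feedback term arrives a sample late, i.e. y[n] = x[n] - x[n-1] + p*y[n-2]. The second recurrence has the transfer function (1 - z^-1)/(1 - p*z^-2), with poles at +sqrt(p) and -sqrt(p), so it rings near the Nyquist frequency. The sketch below is a float model; the function name, the `loop_delay` parameter and p = 0.99 are assumptions used only for the comparison.

```python
def impulse_response(p, loop_delay, length):
    """Impulse response of y[n] = x[n] - x[n-1] + p*y[n - loop_delay]."""
    x = [1.0] + [0.0] * (length - 1)
    y = []
    for n in range(length):
        x_prev = x[n - 1] if n >= 1 else 0.0
        y_fb = y[n - loop_delay] if n >= loop_delay else 0.0
        y.append(x[n] - x_prev + p * y_fb)
    return y


h_ok = impulse_response(0.99, loop_delay=1, length=8)   # intended filter
h_bad = impulse_response(0.99, loop_delay=2, length=8)  # one extra register
```

The intended filter gives 1 followed by a small, slowly decaying negative tail, while the version with the extra register alternates in sign at nearly full amplitude, which is not a DC blocker any more.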


Michael Kellett 10 years ago


Here's a neat way to do it (if using Xilinx parts) and quite a lot of helpful discussion:

formatting link

MK


Allan Herriman 10 years ago

I have successfully implemented a DC blocker in VHDL based on the information in that paper. It was less than a page of VHDL.

Instead of multiplying by something like 0.99, multiply by (1-1/(2**N)) (for some fixed integer N, e.g. 6 or 7). This can be done with just a shift and subtract, and the low pass filter can be done all in one clock cycle.

This is a DC blocker, after all, and the position of the pole probably isn't all that critical.

Regards, Allan
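The shift-and-subtract suggestion can be tried out in an integer model: with p = 1 - 1/(2**N), the product p*y is y - (y >> N). The sketch below is not the VHDL mentioned above, just one possible bit-level model. It keeps the feedback state with N extra fractional bits so that truncation in the shift does not leave a stuck offset at the output; that extra precision, the function name and the default N = 6 are assumptions made for this example.

```python
def dc_blocker_shift(x, n_shift=6):
    """Integer DC blocker with p = 1 - 2**-n_shift, using no multiplier."""
    state = 0      # y[n-1] scaled by 2**n_shift (extra fractional bits)
    x_prev = 0
    y = []
    for xn in x:
        # p * y[n-1]: one shift and one subtract
        feedback = state - (state >> n_shift)
        # x[n] - x[n-1], scaled to the same binary point as the state
        state = ((xn - x_prev) << n_shift) + feedback
        y.append(state >> n_shift)   # drop the extra fractional bits
        x_prev = xn
    return y


# A constant (pure DC) integer input should settle to zero output.
out = dc_blocker_shift([1000] * 5000)
```

The loop contains only a subtract, a shift (wiring) and an add, which is much shorter than a path through a general multiplier.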
