The Mathworks is offering more than 1600 dB of attenuation (2023 Update)

formatting link
Those plots go to -1800 dB. What an incredible toolbox.

Reply to
Simon S Aysdie

That is perfectly possible, depending on luck. The smallest numbers that can arise in double-precision floating point are O(10^-308), which is about -6160 dB (denormals can actually go even smaller, to around 10^-323, with lost precision).

They are not at all meaningful beyond about 10^-17 or so, allowing for the typical 53-bit mantissa and 64-bit intermediate results.

Realistically, any plot going beyond -320 dB is into rounding-error noise.
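
Those limits are easy to check; a couple of lines of C (my own quick illustration, not from anyone's code here) prints the floor of normal and denormal doubles in dB:

#include <stdio.h>
#include <float.h>
#include <math.h>

/* dB floor of IEEE doubles: DBL_MIN is the smallest normal value,
   0x1p-1074 the smallest denormal. */
int main(void)
{
    printf("smallest normal:   %g = %.0f dB\n", DBL_MIN, 20.0*log10(DBL_MIN));
    printf("smallest denormal: %g = %.0f dB\n", 0x1p-1074, 20.0*log10(0x1p-1074));
    return 0;
}

which comes out near -6153 dB and -6466 dB respectively.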

You can occasionally get -inf if the computation produces an exact zero.

I defend against it by adding 1e-20, which is far enough from the nearest real non-denormalised answer (around 2.2e-16) to be obvious, and doesn't corrupt the output dataset in ways that disrupt further processing.
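
In code the guard is nothing more than a floor under the argument of the log. A minimal sketch (the 1e-20 constant is the one described above; the function name is mine):

#include <math.h>

/* Floor the magnitude before converting to dB, so an exact zero
   gives a finite -400 dB instead of -inf.  1e-20 sits well below
   the ~2.2e-16 resolution of a double, so real data is untouched. */
double to_dB(double mag)
{
    return 20.0 * log10(mag + 1e-20);
}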

That happens sometimes in my high-precision calculations for easier problems with a near-analytic solution. It is a bit annoying, since it causes discontinuities in otherwise smooth residual-error curves.

Reply to
Martin Brown

Those plots go to -1800 dB. What an incredible toolbox.

It's a joke! Excepting some special circumstances, you don't usually care what happens below -80 dB or so.

Jeroen Belleman

Reply to
Jeroen Belleman

Denormals are a huge pain. Nice enough in theory, of course--why throw away information you could keep?

The problem is that it's a waste of good silicon to make such marginal creatures fast, so they aren't.

In early versions of my clusterized FDTD simulator, the run time was usually dominated by rounding error causing the simulation domain to fill up with denormals until the actual simulated fields got to all the corners.

I couldn't fix it by adding a DC offset, because that would go away completely in two half steps, so I filled all the field arrays with very low-level random noise. Sped some simulations up by 100x.
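
A sketch of that trick (the names and noise level are my illustration, not Phil's actual FDTD code):

#include <stdlib.h>

/* Fill a field array with tiny random noise, far below any physical
   field level, so rounding error can't drag cells into slow denormal
   territory. */
void seed_noise(float *field, size_t n)
{
    const float amp = 1e-25f;   /* illustrative noise amplitude */
    for (size_t i = 0; i < n; i++)
        field[i] = amp * (2.0f*rand()/(float)RAND_MAX - 1.0f);
}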

At the time I was using the Intel C++ compiler, which didn't have an option for flush-to-zero on underflow.
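
These days the same effect is available at run time by setting the FTZ and DAZ bits in the SSE control register, independent of compiler switches; a sketch:

#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

/* Flush results that underflow to zero (FTZ) and treat denormal
   inputs as zero (DAZ) in all subsequent SSE arithmetic. */
void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}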

Cheers

Phil Hobbs

Reply to
Phil Hobbs

I do much the same.

In the analog parts (remember them?) of the implementation, it's uncommon to achieve more than maybe 100 dB of isolation, and 80 dB is more common.

Joe Gwinn

Reply to
Joe Gwinn

10 of 10. That's what I thought. I don't see the value of these -1800 dB plots and I'm not generally suggesting throwing away significant digits in computations. (After all, the polynomials in filter synthesis work can be notoriously ill-conditioned, to put a fuzz-ball term on it.)

In realized physical terms, even -60 dB can (sometimes) be challenging for RF/micro/mm work.

I'd clip the plot before someone laughed at me.

Reply to
Simon S Aysdie

Tell me about it. One of my early contributions to that game was noticing that a particular astrophysical plasma simulation was spending all its runtime in interrupts handling denormal underflows. A couple of orders of magnitude speed improvement was very welcome. It needed rescaling to safer territory; x ~ h^2/c^3 is just asking for trouble in single precision (it was a fluid-dynamics code).

The thing I am working on at the moment involves powers of tan(x)^(2^N) in the range -pi to pi. It gets quite hairy for even modest N and falls over completely for N > 5. I have a cunning fix that makes it work for any N < 8, but by then it is almost all rounding error anyway.

They are sometimes better than having it hit zero (though not always).

A bit of judicious random noise can work wonders on breaking degeneracy.

I've been very impressed with the latest MSC 2019 code generator on some of my stuff. It somehow groks tediously complex higher-order difference correctors in a way that no other compiler can match. A lucky combination of out-of-order and speculative execution makes some things run much faster with SEE, inlining and full optimisation all permitted.

2nd-order Newton-Raphson and 3rd-order Halley are almost the same execution time now, and the 4th-order one is just 10% slower. That's quite a bonus when the function f(x) and the derivatives being evaluated are S-L--O---W.
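
For reference, the two update steps being compared look like this (my own rendering of the standard formulas, not Martin's code):

/* Standard root-finding updates; both reuse the same expensive
   evaluations of f and its derivatives at the current x. */
double newton_step(double x, double f, double fp)
{
    return x - f/fp;                            /* 2nd order */
}

double halley_step(double x, double f, double fp, double fpp)
{
    return x - 2.0*f*fp/(2.0*fp*fp - f*fpp);    /* 3rd order */
}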

In one instance, a single apparently harmless line to protect against a rounding error giving an impossible answer had an effective execution time of -100 cycles, because it prevented a pipeline stall. I could hardly believe it, so I double-checked the same code with and without.

if (E < M) E = E + pi;

The really weird thing is that the branch is almost never taken, except when M is extremely close to pi, but doing the comparison somehow gives the CPU enough recovery time to run so much faster.

I'm slowly collecting a library of short code fragments that give different optimising compilers and certain Intel CPUs trouble.

Reply to
Martin Brown

What are "MSC 2019 code generator" and "SEE"?

I'm assuming that you are coding in C here.

I have one similar surprise to report, but in MATLAB:

The behavior of a megawatt power system for a shipboard radar was modeled in Simulink (integrated with MATLAB). This was circa 2000. The simulations ran very slowly, but none of us thought much about it, for lack of a comparison.

One day, I was working with the mathematician who was running the simulation, idly watching the usual stream of sim-in-progress messages roll by as we talked, when I saw a message that I did not recognize or understand. It turned out those messages were relatively common, but never really noticed in the blather from the sim.

Now curious, I dug into that message. -Saga omitted- It turned out that the simulation was coded (by us, the users) in such a way that the solver was forced to solve an implicit equation at each solution time step in a large system of coupled ODEs. So, instead of one or two big matrix operations per step, it was one or two hundred operations per step. Ouch! But why?

The implicit forms were a byproduct of using a block-diagram-and-line language to describe the power system: the model was programmed by placing standard blocks and connecting them with standard lines on the computer screen. But what made sense and looked simple on the screen was anything but under the covers.

Redesigning and recoding the simulation yielded a 100x speedup.
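
The cost difference is easy to see in miniature (a sketch of the general problem, not the actual radar model): an explicit step is one evaluation of the right-hand side, while an implicit relation has to be iterated to convergence at every step.

#include <math.h>

double f(double y) { return -y; }       /* stand-in right-hand side */

/* Explicit Euler: one evaluation per step. */
double explicit_step(double x, double dt)
{
    return x + dt*f(x);
}

/* Implicit Euler, y = x + dt*f(y), solved here by fixed-point
   iteration: potentially hundreds of evaluations per step. */
double implicit_step(double x, double dt)
{
    double y = x;
    for (int i = 0; i < 200; i++) {
        double y_next = x + dt*f(y);
        if (fabs(y_next - y) < 1e-12)
            return y_next;
        y = y_next;
    }
    return y;
}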

Joe Gwinn

Reply to
Joe Gwinn

MSC 2019 is the MS C/C++ compiler under Visual Studio, and "SEE" (sic) is a typo for SSE (the extended floating-point registers on modern Intel CPUs). /SSE2 works best for me, but YMMV.

Even so, the compiler sometimes generates hybrid code, with the x87 still being used for some parts of the computation but not others.

MSC 2019 appears to have been replaced by 2022, so that's yet another compiler to check my code against to see if there are any more improvements.

formatting link
One of the tricky quirks I know about is that sincos can either speed things up or slow them down. It depends on whether the result is needed before it has finished computing (pipeline stalls are hellishly expensive).
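
A sketch of the scheduling issue (this uses glibc's sincos extension; the principle is the same when a compiler fuses separate sin and cos calls): issue the call early and consume its results late, so the pipeline has independent work to hide the latency.

#define _GNU_SOURCE
#include <math.h>

/* sincos() is issued at the top and its results are not consumed
   until the end, so the accumulation loop overlaps with the
   long-latency trig computation. */
double rotate_sum(const double *v, int n, double theta)
{
    double s, c;
    sincos(theta, &s, &c);        /* issued early */
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];              /* independent work */
    return c*sum - s*sum;         /* results consumed only here */
}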

The classic one when I was at university was that, inevitably, a new graduate student would grind the Starlink VAX to a standstill by transposing what was then a big image (512x512) with nested loops:

x[i,j] = x[j,i]

That generated roughly a quarter of a million page faults in the process.

There were library routines we had sweated blood over to do this efficiently, with the minimum possible number of page faults.
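
The standard fix, sketched from memory rather than from the Starlink library source, is to transpose in blocks so each page is touched a handful of times instead of roughly once per element:

/* Out-of-place blocked transpose: each BLK x BLK tile is read and
   written with good locality. */
#define BLK 64

void transpose(const double *src, double *dst, int n)
{
    for (int ii = 0; ii < n; ii += BLK)
        for (int jj = 0; jj < n; jj += BLK)
            for (int i = ii; i < ii + BLK && i < n; i++)
                for (int j = jj; j < jj + BLK && j < n; j++)
                    dst[(long)j*n + i] = src[(long)i*n + j];
}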

Reply to
Martin Brown

Hairy? With a mere 32nd-order pole at each end? Surely not!

Yeah, the reason I was using Intel is that it vectorized complicated stuff the other compilers (MSVC++ and gcc) wouldn't touch.

Getting the curl equations to vectorize is the key to making FDTD fast on modern hardware.

Yup. High order methods are the ticket for smooth expensive functions.

Doing good metals with FDTD requires an auxiliary differential equation for the electric polarization, because the Yee (standard FDTD) update equations are unstable when n < k, or equivalently when Re(epsilon) goes negative.

In the IR, copper, silver, and gold have epsilons that are essentially large negative real numbers. At DC such a material is impossible, because it would spontaneously turn to lava.
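
For reference, the usual Drude-model form of that auxiliary equation (the standard textbook version, not necessarily Phil's exact formulation) evolves a polarization current J alongside E:

dJ/dt + gamma*J = eps0 * omega_p^2 * E

so the metal's response is integrated in time rather than baked into a negative epsilon, which is what keeps the update stable.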

I suspect it gives the compiler permission to do some more daring optimizations by preventing that error.

I'd be interested to see it!

Cheers

Phil Hobbs

Reply to
Phil Hobbs

As one of my old colleagues used to say, "Ah, LabVIEW--spaghetti code that even _looks_ like spaghetti."

Cheers

Phil Hobbs

Reply to
Phil Hobbs

As would I.

Joe Gwinn

Reply to
Joe Gwinn

There are tons of people who find these sorts of issues and document them, often down to the particular combination of processor and compiler. What works well on one combination can be pathological on another, because of changes at the microarchitecture level or in the compiler optimizations. It's a tough job trying to optimize for many combinations of machine and software.

You might ask in comp.lang.forth. I know plenty of people there spend a lot of time tracking such combinations, because they are writing optimizing compilers.

Reply to
Rick C

We had the same problem in the early days of neural nets, coded in C, running on VAX/VMS. The app folk never could fathom why reversing the inner and outer loops when processing the big connection matrix could make a 1000-to-1 difference in run time, and they declined to make the change. Pretty soon they had exhausted their computer-time budget.
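
The effect is easy to reproduce (a sketch of the general pattern, not the original neural-net code): in C the matrix is row-major, so making j the inner loop walks memory sequentially, while making i the inner loop strides a whole row per access and, on a big array under VM paging, faults in a fresh page almost every time.

/* Same work, different traversal order. */
void good_order(double *w, int n)       /* sequential access */
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            w[i*n + j] *= 0.99;
}

void bad_order(double *w, int n)        /* strided access */
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            w[i*n + j] *= 0.99;
}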

Problem solved.

Joe Gwinn

Reply to
Joe Gwinn

Heh. Don't get me started on LabVIEW.

But in the simulation case above, the code was not spaghetti; it was actually beautiful to behold. Too bad about the runtime, though.

Joe Gwinn

Reply to
Joe Gwinn
