Okay. Do you want pure sinewaves at better than 16-bit resolution using only 12-bit values? Uwe Bonnes points out that one lookup arrangement can serve multiple sinewave generators. You can use a coarse lookup as a first step and produce a pure-sine result using a little math: sin(a+b) = sin(a)cos(b) + sin(b)cos(a) and, for very small b, sin(a+b) ≈ sin(a) + b*cos(a). That leaves you needing to convert your "b" delta from a 2^n phase difference to radians. If I remember my investigations properly, you can get 18 bits of pure-sine accuracy with the single lookup and 2 multiplies, even at extremely low audio frequencies.
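Here is a rough floating-point sketch of that coarse-lookup-plus-correction idea, just to show the arithmetic. The 32-bit phase word, the 2048-entry table, rounding to the nearest table entry, and all the names are my assumptions for illustration, not anything from the original design; a real FPGA version would use fixed-point table entries and multipliers.

```python
import math

PHASE_BITS = 32                      # assumed phase accumulator width
LUT_BITS = 11                        # assumed coarse table: 2048 entries per cycle
FRAC_BITS = PHASE_BITS - LUT_BITS
LUT_SIZE = 1 << LUT_BITS

# One full-cycle sine table; cos(a) is read from the same table a quarter cycle ahead.
SIN_LUT = [math.sin(2.0 * math.pi * i / LUT_SIZE) for i in range(LUT_SIZE)]

# Multiply #1 uses this constant: phase LSBs -> radians.
RAD_PER_LSB = 2.0 * math.pi / (1 << PHASE_BITS)


def interp_sine(phase):
    """Approximate sin of a PHASE_BITS-wide phase as sin(a) + b*cos(a)."""
    phase &= (1 << PHASE_BITS) - 1
    # Round to the nearest coarse entry so the residual b is as small as possible.
    idx = (phase + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
    delta = phase - (idx << FRAC_BITS)          # signed residual in phase LSBs
    idx &= LUT_SIZE - 1                         # wrap index 2048 back to 0
    b = delta * RAD_PER_LSB                     # multiply #1: residual in radians
    sin_a = SIN_LUT[idx]
    cos_a = SIN_LUT[(idx + LUT_SIZE // 4) & (LUT_SIZE - 1)]
    return sin_a + b * cos_a                    # multiply #2: b*cos(a)


if __name__ == "__main__":
    worst = max(
        abs(interp_sine(p) - math.sin(p * RAD_PER_LSB))
        for p in range(0, 1 << PHASE_BITS, 1000003)
    )
    print("worst error over sampled phases:", worst)
```

With these assumed sizes the dropped term is about b^2/2 with |b| at most pi/2048, so the approximation error stays under 2^-18 before any table quantization is counted.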
To get past the psychoacoustic issues, you can generate your 12-bit values by taking your more-precise sine values and delta-sigma modulating the result, which provides very good audio.
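As a minimal sketch of what I mean by delta-sigma modulating the precise values down to 12 bits, here is a first-order error-feedback requantizer. The 18-bit input width, the first-order loop, the clamp, and the function name are assumptions for illustration; a real design might well use a higher-order modulator.

```python
import math


def delta_sigma_requantize(samples, in_bits=18, out_bits=12):
    """First-order error feedback: truncate to out_bits, carry the remainder forward."""
    shift = in_bits - out_bits
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    err = 0
    out = []
    for s in samples:
        v = s + err                      # add back what was dropped last time
        q = v >> shift                   # floor to the coarser grid
        q = max(lo, min(hi, q))          # clamp to the signed out_bits range
        err = v - (q << shift)           # remainder carried into the next sample
        out.append(q)
    return out


if __name__ == "__main__":
    # Assumed test signal: an 18-bit sine kept a little below full scale.
    src = [round(100000 * math.sin(2.0 * math.pi * k / 1000)) for k in range(1000)]
    dst = delta_sigma_requantize(src)
    print(min(dst), max(dst))
```

The point is that the truncation error is not thrown away: it is pushed into the following samples, so the long-term average of the 12-bit stream tracks the precise signal.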
You should be able to get this to run with a large number of independent sine generators. You can even run multiple DDS generators using a BlockRAM to store both the accumulator value and the phase increment value for each one. With 36-bit values, you can get 256 independent DDS generators running to produce independent phase values for your sine lookups.
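A small software model of that time-multiplexed DDS arrangement might look like the following. The memory layout (accumulators in the lower 256 words, increments in the upper 256), the class name, and the tuning-word helper are hypothetical; the only things taken from the description above are the 36-bit width, the 256 channels, and keeping both values in one memory.

```python
ACC_BITS = 36                        # width from the description above
N_CHANNELS = 256                     # channel count from the description above
MASK = (1 << ACC_BITS) - 1


def tuning_word(freq_hz, sample_rate_hz):
    """Phase increment per sample for the requested frequency (hypothetical helper)."""
    return round(freq_hz * (1 << ACC_BITS) / sample_rate_hz) & MASK


class MultiDDS:
    """Models one 512-word memory shared by 256 phase accumulators."""

    def __init__(self, increments):
        if len(increments) != N_CHANNELS:
            raise ValueError("need one increment per channel")
        # Assumed layout: words 0..255 are accumulators, words 256..511 are increments.
        self.ram = [0] * N_CHANNELS + [inc & MASK for inc in increments]

    def step(self):
        """One sample period: visit every channel once and return the new phases."""
        phases = []
        for ch in range(N_CHANNELS):
            acc = (self.ram[ch] + self.ram[N_CHANNELS + ch]) & MASK
            self.ram[ch] = acc           # write the accumulator back
            phases.append(acc)
        return phases


if __name__ == "__main__":
    dds = MultiDDS([tuning_word(100.0 + ch, 100000.0) for ch in range(N_CHANNELS)])
    print(dds.step()[:4])
```

In hardware each loop iteration would be one pipeline slot, with the upper bits of each returned phase feeding the shared sine lookup.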
You can do a huge amount of work in FPGAs when the needed frequencies are low. If you're only running at 30 MHz, you can do significantly more work by increasing the clock rate inside your part. You should be able to run nearly 2k 18-bit-accurate sinewaves through one sine-interpolated sinewave LUT structure: 300 clocks per sample at a ~30 MHz external clock, with a ~180 MHz internal clock (6x), gives 300 x 6 = ~1800 pipelined mechanisms.
Insults weren't intended. The frustrations in real engineering come from loosely defined specifications. If the true needs are communicated, engineers can deliver "precise" solutions that give the necessary psychoacoustic results.