Hello,
My experiments show that a high-quality sine wave can be generated easily using linear interpolation. Only a simple look-up table and a small multiplier are needed. I use two LUTs instead of one, because this reduces the complexity of the circuit, although for a specialized IC the single-LUT approach should be considerably cheaper.
The idea is as follows: the first LUT contains 256 unsigned 18-bit sine values, sampled uniformly over the interval [0, pi/2). The second LUT contains the differences between consecutive samples (as noted above, these can be computed on-line from a single LUT, but that is not well suited to an FPGA because of the long latency-compensation paths, i.e. many wasted LEs). This ROM is only 11 bits wide, because max{(sin((pi/2)*(k+1)/256) - sin((pi/2)*k/256))*2^18} is just 1609.
The 18-bit phase word is composed of three parts: quadrant_indicator (the upper 2 bits), lut_index (the next 8 bits) and phase_residue (the remaining 8 bits). This gives:
my_sin(x) = sign * (lut_val[lut_index] + lut_dif[lut_index] * phase_residue / 256);
Of course, sign, lut_index and phase_residue depend on the currently selected quadrant, but this is just a trivial remark.
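The scheme described so far can be sketched as a bit-accurate software model. This is only an illustration under assumptions the post does not spell out: the tables are rounded to nearest with a scale of 2^18, odd quadrants are mirrored by inverting the 16 in-quadrant phase bits, and the helper name build_tables and the integer interface of my_sin are hypothetical. Python is used purely for modelling, not as a statement about the actual HDL.

```python
import math

N = 256            # table entries per quadrant
SCALE = 1 << 18    # assumed amplitude scale for the 18-bit samples


def build_tables():
    """Return (lut_val, lut_dif) for the first quadrant [0, pi/2)."""
    # One extra sample at pi/2 so that the last difference is defined.
    samples = [round(SCALE * math.sin(math.pi / 2 * k / N)) for k in range(N + 1)]
    lut_val = samples[:N]
    lut_dif = [samples[k + 1] - samples[k] for k in range(N)]
    return lut_val, lut_dif


lut_val, lut_dif = build_tables()


def my_sin(phase):
    """Interpolated sine for an 18-bit phase word (2**18 = one full turn).

    Returns a signed integer scaled by SCALE.
    """
    phase &= 0x3FFFF
    quadrant_indicator = phase >> 16       # upper 2 bits
    in_quadrant = phase & 0xFFFF           # remaining 16 bits
    if quadrant_indicator & 1:
        # Assumption: mirror odd quadrants by inverting the phase bits.
        in_quadrant ^= 0xFFFF
    lut_index = in_quadrant >> 8           # next 8 bits
    phase_residue = in_quadrant & 0xFF     # lowest 8 bits
    # 11-bit x 8-bit product, truncated back to the sample scale.
    correction = (lut_dif[lut_index] * phase_residue) >> 8
    magnitude = lut_val[lut_index] + correction
    # Quadrants 2 and 3 form the negative half-wave.
    return -magnitude if quadrant_indicator & 2 else magnitude


if __name__ == "__main__":
    print("widest sample bits:    ", max(lut_val).bit_length())
    print("largest difference:    ", max(lut_dif))
    print("widest difference bits:", max(lut_dif).bit_length())
    print("my_sin at 1/8 turn:    ", my_sin(1 << 15) / SCALE)
```

Under these assumptions the largest entry of lut_dif comes out as 1609, which fits in 11 bits, and the output magnitude stays below 2^18. Inverting the phase bits shifts the mirrored quadrants by one phase LSB; a real design may well handle the mirror differently.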
This simple scheme provides a sine wave with 17 bits of accuracy, which can be used directly to feed a quadrature mixer. Another important property is that the interpolation error near pi/2 is negative (i.e. forall x . |my_sin(x)| < 1), so there is no need for guard bits. I have some ideas on how to increase the accuracy further, but I am not sure whether I should start developing them, because 17 bits are far better than any modern digital RF front-end expects.
I have implemented a complete quadrature mixer (not just an NCO) on a multiplierless Cyclone 1C6. It occupies only 815 LEs, most of which are consumed by a shared 17x16 multiplier. Its top speed is limited by the M4K RAMs: for the -6, -7 and -8 speed grades it is 255.9 MHz, 226.3 MHz and 197.0 MHz, respectively.
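As a rough cross-check of the accuracy figure, the sketch below does two things. It evaluates the textbook bound for linear interpolation, h^2/8 * max|f''|, with h = (pi/2)/256 and |f''| <= 1 for the sine. It also measures the worst-case error of an integer model over the first quadrant. The rounding of the tables, the truncation of the product and the name quarter_sin are assumptions of this sketch, so the measured number need not match the real hardware exactly.

```python
import math

N = 256
SCALE = 1 << 18    # assumed amplitude scale for the 18-bit samples

# Tables for [0, pi/2), rounded to nearest (an assumption of this sketch).
samples = [round(SCALE * math.sin(math.pi / 2 * k / N)) for k in range(N + 1)]
lut_val = samples[:N]
lut_dif = [samples[k + 1] - samples[k] for k in range(N)]


def quarter_sin(p):
    """First-quadrant interpolated sine for a 16-bit in-quadrant phase p."""
    lut_index, phase_residue = p >> 8, p & 0xFF
    return lut_val[lut_index] + ((lut_dif[lut_index] * phase_residue) >> 8)


# Theoretical interpolation error bound, as a fraction of full scale.
h = (math.pi / 2) / N
bound = h * h / 8
bound_bits = -math.log2(bound)

# Measured worst-case error of the integer model, in LSBs of 2**-18.
worst_lsb = max(
    abs(quarter_sin(p) - SCALE * math.sin(math.pi / 2 * p / 65536))
    for p in range(65536)
)

if __name__ == "__main__":
    print(f"interpolation bound: {bound:.3e} (about 2^-{bound_bits:.1f})")
    print(f"measured worst-case error: {worst_lsb:.2f} LSB of 2^-18")
```

The bound alone works out to a little better than 2^-17 of full scale. In this model, table rounding and truncation of the product add up to roughly another LSB and a half, so the measured worst case stays below 3 LSBs of 2^-18.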
So, this low-cost, low-complexity method needs 256 times fewer ROM cells than classical ROM-based designs and far fewer LEs than CORDIC-based approaches. Moreover, this is achieved without degrading either signal quality or performance.
Best regards,
Piotr Wyderski