You said you needed a 64 bit machine.
You didn't understand the comment about how a 32 bit machine could be as fast or faster.
No you don't. You need speed. A very fast 32 bit machine would do fine. An extremely fast 8 bit machine would also work. You don't seem to really understand the performance issues. The highest speed is more likely to come from a very long instruction word processor than from the one with the widest data bus.
There is more setup time in doing that, but yes, you could optimize them. On the other hand, if extreme speed were the goal, the cache hardware of a 32 bit machine could be optimized to do the bulk of it much faster than the 128 bit CPU could.
Actually in high performance code they are very rarely used.
Have you ever tried to see what percentage of the CPU time of a well designed program goes to moving strings around? I have written quite a bit of code in Borland Pascal. Some of this code I sold as a finished product. In the only cases I can think of where a single type of operation set the speed, that operation was not a string operation.
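If you want to check this on your own code, a rough measurement is enough. Here is a minimal Python sketch, not taken from any real program: copy_strings, process and string_share are made-up names, and the workload is a toy stand-in for whatever your program really does. It times one pass that only copies the records and one pass that does the arithmetic, then reports the copy share.

```python
import time


def copy_strings(records):
    # bytearray(r) forces a real byte-for-byte copy of each record.
    return [bytearray(r) for r in records]


def process(records):
    # Hypothetical stand-in for the program's real work: sum of squared bytes.
    return sum(b * b for r in records for b in r)


def timed(fn, arg, rounds):
    # Run fn(arg) `rounds` times; return the last result and the elapsed time.
    start = time.perf_counter()
    result = None
    for _ in range(rounds):
        result = fn(arg)
    return result, time.perf_counter() - start


def string_share(records, rounds=50):
    # Fraction of the measured time that went to moving strings around.
    _, t_copy = timed(copy_strings, records, rounds)
    checksum, t_work = timed(process, records, rounds)
    return t_copy / (t_copy + t_work), checksum


if __name__ == "__main__":
    data = [bytes(range(256)) * 4 for _ in range(200)]
    share, _ = string_share(data)
    print(f"string copying: {share:.1%} of measured time")
```

The exact number depends on the machine and the workload, so treat it as a rough guide. For a real program a profiler gives the same answer with less guesswork.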
If you are really after high speed math operations, a DSP is the way to go. If you are after very fast integer and string operations, a very long instruction word RISC machine is usually the best. If the operations can be parallelized, an array processor or multiple CPUs is the way to go.
One of the most common tasks that takes a lot of time is the FFT. A DSP very commonly has special instructions that make this go much faster than a larger ALU could allow.
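To make the point concrete, here is a minimal radix-2 FFT sketch in Python. The name fft_radix2 is just for illustration, and this is not how any particular DSP codes it. The two parts that dominate are the bit-reversed reordering and the multiply-and-add "butterfly" in the inner loop. Bit-reversed addressing and single-cycle multiply-accumulate are the kind of thing many DSPs provide in hardware, and neither gets any faster from a wider ALU.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation: many DSPs do this in the address generator.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Butterfly stages: each inner step is one complex multiply plus add/subtract.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(start, start + half):
                t = w * a[k + half]
                a[k + half] = a[k] - t
                a[k] = a[k] + t
                w *= w_step
        size *= 2
    return a
```

For n points there are log2(n) stages of n/2 butterflies each, so the time goes almost entirely to that inner multiply-and-add step.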