AFAIK, GPU cores are vector units: an instruction decoder follows the program, directing N register files, ALUs, and memory buses to perform each operation en masse.
And so there are two kinds of "cores": the ALU parts, and the instruction parts that control them. I always have to think about which is which when shopping for GPUs -- if I remember right, CUDA cores (to use NVidia as the example) are the ALU parts, and the top-level sets-of-cores parts are called Streaming Multiprocessors (SMs).
So there are some problems that can be computed very easily with these: anything where the same operation is performed upon huge swaths of data is fine, of course (linear algebra). Less so if you have, say, a data-dependent loop per pixel, like ray tracing or rendering a fractal, where each element may need a different number of iterations. There are some tricks you can do, like using arithmetic as conditionals: instead of "(condition) ? (expr1) : (expr2)", you use "(condition != 0) * (expr1) + (condition == 0) * (expr2)", computing both arms and selecting one. But anything that requires branching/looping per element, that can't be optimized this way, is right out.
In contrast, a pile of general-purpose compute cores, each of which can branch independently per element, could solve that kind of problem quickly.
Tim