Profile the code. If the instruction cache isn't being reloaded, then it doesn't need to be associative. If cache use is small enough, the difference from a smaller, simpler cache will be more noticeable.
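To get a feel for when associativity matters before profiling on real hardware, a toy model can help. The sketch below is only an illustration, not a substitute for profiling: it assumes a 128-byte instruction cache with 16-byte lines and LRU replacement, and the function names (`count_misses`, `loop_trace`) and all addresses are made up for the example. It counts misses for a loop that calls one small function, first with the function placed where it aliases the loop body, then where it does not.

```python
def count_misses(trace, num_lines, ways, line_size=16):
    """Count misses in a toy LRU cache; ways=1 models a direct-mapped cache."""
    num_sets = num_lines // ways
    sets = [[] for _ in range(num_sets)]  # tags per set, most recently used last
    misses = 0
    for addr in trace:
        line = addr // line_size
        index = line % num_sets
        tag = line // num_sets
        entries = sets[index]
        if tag in entries:
            entries.remove(tag)      # hit: re-appended below as most recent
        else:
            misses += 1              # miss: the line must be (re)loaded
            if len(entries) == ways:
                entries.pop(0)       # evict the least recently used tag
        entries.append(tag)
    return misses


def loop_trace(func_addr, iterations):
    """Hypothetical fetch trace: a loop at 0x000 calling a function at func_addr."""
    body = [0x000, 0x004, func_addr, func_addr + 4, 0x008, 0x00C]
    return body * iterations


# Function at 0x080 maps to the same line as the loop in a direct-mapped cache.
print(count_misses(loop_trace(0x080, 100), num_lines=8, ways=1))  # 201
print(count_misses(loop_trace(0x080, 100), num_lines=8, ways=2))  # 2

# Function at 0x010 does not conflict, so associativity makes no difference.
print(count_misses(loop_trace(0x010, 100), num_lines=8, ways=1))  # 2
print(count_misses(loop_trace(0x010, 100), num_lines=8, ways=2))  # 2
```

In the conflicting layout the direct-mapped cache reloads a line twice per iteration, while in the non-conflicting layout both organisations only take the two cold misses. That is the case where the cache isn't being reloaded and associativity buys nothing.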
Yes, it will be if you have any 8-byte loops. Profile your code to determine how much it is worth to you.
The only problem with embedded caches and pipelined processors subject to pre-emptive interrupts is that the very low frequency of worst-case events complicates the testing strategy.