Yes, CPUs with CAS and other locked instructions (such as atomic read-modify-write operations) need some form of bus locking. On modern multi-core chips this is usually done by locking a single cache line rather than the whole bus. Either way, these instructions are very easy to work with from the software viewpoint, and a real PITA to implement efficiently in hardware in a multi-core system with caches. That's why you find them in architectures like x86 that are designed to be easy to program. Traditional RISC architectures, designed for fast and efficient implementations, generally provide load-linked/store-conditional (LL/SC) instead.
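To make the read-modify-write point concrete, here is a minimal C11 sketch (assuming a compiler that provides `<stdatomic.h>`) that builds an arbitrary atomic update out of a single compare-and-swap. The function name `atomic_double_plus_one` is purely illustrative. On x86, compilers typically turn `atomic_compare_exchange_weak` into a `LOCK CMPXCHG` instruction; on classic LL/SC architectures it becomes a load-linked/store-conditional retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative example: atomically replace *p with (*p * 2 + 1) using a
   compare-and-swap retry loop. The same pattern turns one CAS primitive
   into any read-modify-write. Returns the value *p held just before. */
static int atomic_double_plus_one(atomic_int *p)
{
    assert(p);                         /* caller must pass a valid object */
    int old = atomic_load(p);          /* snapshot the current value */
    int desired;
    do {
        desired = old * 2 + 1;         /* new value computed from the snapshot */
        /* Stores desired only if *p still equals old. On failure, old is
           refreshed with the current contents of *p and the loop retries. */
    } while (!atomic_compare_exchange_weak(p, &old, desired));
    return old;
}
```

If another core or interrupt handler changes the value between the load and the CAS, the CAS fails and the update is simply recomputed, so no update is ever lost.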
CAS can be useful even on a single CPU if there are other bus masters (DMA, for example). CAS or LL/SC can also be useful on a single CPU with pre-emptive multitasking when you don't want to (or can't) disable interrupts.
On a small processor like yours, disabling interrupts around critical regions is almost certainly the easiest and most efficient solution.
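A minimal sketch of that approach, assuming nothing about your particular chip: `irq_save()` and `irq_restore()` are hypothetical helpers, and the `interrupts_enabled` flag simulates the processor's interrupt-enable bit so the code can run on a desktop. On real hardware each helper would be one or two instructions (or a compiler intrinsic). Saving and restoring the previous state, rather than unconditionally re-enabling, lets critical regions nest.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated global interrupt-enable flag. On real hardware this is a bit in
   the status register; these helpers are hypothetical stand-ins for the
   CPU's disable/restore-interrupt instructions. */
static volatile bool interrupts_enabled = true;

/* Disable interrupts and return the previous state so regions can nest. */
static bool irq_save(void)
{
    bool was_enabled = interrupts_enabled;
    interrupts_enabled = false;
    return was_enabled;
}

/* Restore the state returned by the matching irq_save(). */
static void irq_restore(bool was_enabled)
{
    assert(!interrupts_enabled);       /* must be inside a critical region */
    interrupts_enabled = was_enabled;
}

static volatile unsigned shared_counter;  /* also updated by an interrupt handler */

/* Read-modify-write that cannot be interrupted halfway through. */
static void counter_add(unsigned n)
{
    bool state = irq_save();
    shared_counter += n;   /* load, add, store: atomic w.r.t. interrupts on this CPU */
    irq_restore(state);
}
```

Note that this only protects against interrupt handlers on the same CPU; it does nothing about another bus master such as DMA.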
(If I were designing a CPU, I'd want a "temporary interrupt disable" counter as well as a global interrupt-disable flag, with an instruction that sets this counter to perhaps 3 to 7 instructions. That's long enough to do a CAS or an atomic read-modify-write.)
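The proposed counter can be modelled in a few lines. This is a toy simulation under stated assumptions: the `tdi` instruction, the `struct cpu` fields and the limit of 7 are all hypothetical, chosen to match the "3 to 7" range suggested above. The interrupt check is assumed to happen at each instruction boundary, before the next instruction is fetched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed "temporary interrupt disable" feature.
   Every name here is hypothetical; no real CPU is being described. */
struct cpu {
    bool global_ie;       /* ordinary global interrupt-enable flag */
    unsigned tdi_count;   /* instructions left with interrupts held off */
};

#define TDI_MAX 7u        /* assumed upper limit on the window */

/* Hypothetical instruction: hold off interrupts for the next n instructions. */
static void tdi(struct cpu *c, unsigned n)
{
    assert(n <= TDI_MAX); /* bounded count keeps worst-case latency bounded */
    c->tdi_count = n;
}

/* Called at each instruction boundary, before fetching the next
   instruction; returns true if a pending interrupt may be taken now. */
static bool may_take_irq(struct cpu *c, bool irq_pending)
{
    if (c->tdi_count > 0) {
        c->tdi_count--;   /* the next instruction is still in the window */
        return false;
    }
    return c->global_ie && irq_pending;
}
```

One appeal of this design is that the window closes by itself: unlike a global disable flag, a bug cannot leave interrupts off indefinitely.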
There is a whole field of possibilities with locking, synchronisation mechanisms, and lock-free algorithms. Generally speaking, once you have one synchronisation primitive, you can usually emulate the others with it (if only by building a lock), but the efficiency can vary enormously.
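As an example of emulating one primitive with another, here is a minimal C11 sketch that builds a compare-and-swap from nothing more than an atomic exchange (swap), by way of a spinlock. The names `spin_lock`, `spin_unlock` and `emulated_cas` are illustrative. It works, but every CAS now costs a lock acquisition and release, and a thread that is pre-empted while holding the lock leaves everyone else spinning. That is the kind of efficiency gap meant above; a native CAS avoids both problems.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Spinlock built from atomic exchange, the only atomic primitive used here. */
static atomic_bool cas_lock = false;

static void spin_lock(void)
{
    /* Keep swapping in "true" until the previous value was "false". */
    while (atomic_exchange_explicit(&cas_lock, true, memory_order_acquire))
        ;   /* spin */
}

static void spin_unlock(void)
{
    assert(atomic_load_explicit(&cas_lock, memory_order_relaxed)); /* must hold it */
    atomic_store_explicit(&cas_lock, false, memory_order_release);
}

/* Emulated compare-and-swap on a plain int: if *p == expected, store desired
   and return true; otherwise leave *p alone and return false. Correct only if
   every access to *p also takes cas_lock. */
static bool emulated_cas(int *p, int expected, int desired)
{
    spin_lock();
    bool ok = (*p == expected);
    if (ok)
        *p = desired;
    spin_unlock();
    return ok;
}
```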