I am not convinced that we will see more of the XMOS type of architecture. (Though I do think that they are a very innovative idea, fun to play with, and ideal for some types of problem.)
Basically, it is easier, cheaper, lower power and more developer friendly to have dedicated hardware blocks for peripherals. Sure, an XMOS is capable of implementing a 100 Mbit Ethernet MAC in software - but at a much higher price than a dedicated hardware MAC. An XMOS has the flexibility to run multiple UARTs and SPIs in software - you choose exactly the number you want, and the pins you want to use. But it is cheaper for an ARM Cortex-M microcontroller simply to have 5 UARTs and 4 SPI units on the chip, even though most people will only use one or two of them.
There are different kinds of applications where different solutions are a better fit. The XMOS has a place where neither a normal microcontroller nor an FPGA is the ideal fit - but it is a small slot, as "not ideal but good enough" encroaches from both sides.
What we are seeing more of in newer devices is asymmetric multiprocessing. Rather than choosing between a Cortex-A9 with massive processing power but unpredictable and costly interrupt response, or a Cortex-M4 with mid-level processing, deterministic interrupts and good control of small peripherals, you can now pick a chip with both cores on board.
It is all about development time - that's what costs. Specialist devices cost more to use, in time and money. There is a decline in the general usage of DSPs, because they are too costly to develop for - people prefer to use standard microcontrollers or processors (even if the clock rate needs to be higher), or pre-packaged units with all the development work done beforehand (an audio decoder chip, a microcontroller with a graphics unit with video acceleration, etc.).