logic partitioning -- why not after mapping

You'd be dead in the water without clustering the local signals. The mapper usually destroys any hint of what is local and what is not. The reason we have issues with this is simple: the languages don't let the user easily specify and group the signals that are not local. (The mapper would also need to preserve that information.)

The languages people use to code their logic are lame in this respect. They don't let you give a net, or a group of nets, any context. This is the main area HDL producers seem to forget. You want local signals mapped in with local objects. Current languages really treat all nets as local signals, although the mapper can recover some locality from the name hierarchy. However, if you want to go between partitions (be it a single object/core, groups of objects/cores, chips, groups of chips, boards, groups of boards, etc.) you need to be able to define a net structure as a protocol set. That structure needs to include the clock signals and everything else needed to synchronize the data between partitions. You also need to be able to attach communication cores to it, which the partitioning software then infers automatically. The other thing we gain by making this user-defined is a drastically smaller search space for the partitioning software.
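To make the protocol-set idea concrete, here is a rough sketch in Python rather than in any HDL. It only models the bookkeeping: a named bundle of data, clock, and control nets, plus a helper that separates the nets a partitioner would be allowed to cut from the ones that must stay local. The names ProtocolSet, split_nets, and all the net names are hypothetical, made up for illustration; no existing language or tool is being described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolSet:
    """Hypothetical user-declared bundle of nets that may cross a partition."""
    name: str
    data: tuple            # payload nets
    clocks: tuple          # clocks needed to synchronize the payload
    controls: tuple = ()   # enables, stalls, and similar handshake nets

    def nets(self):
        # Every net that travels together when this set is cut.
        return self.data + self.clocks + self.controls


def split_nets(all_nets, protocol_sets):
    """Return (breakable, local): nets inside some protocol set vs. the rest."""
    breakable = set()
    for ps in protocol_sets:
        breakable.update(ps.nets())
    local = [n for n in all_nets if n not in breakable]
    return sorted(breakable), local


# Illustrative design: one declared stream plus two purely local nets.
stream = ProtocolSet(
    name="pixel_stream",
    data=("pix_d",),
    clocks=("pix_clk",),
    controls=("pix_en", "pix_stall"),
)
all_nets = ["pix_d", "pix_clk", "pix_en", "pix_stall", "acc_tmp", "fsm_state"]
breakable, local = split_nets(all_nets, [stream])
```

In this toy design the partitioner only has to consider one cut point (the pixel_stream bundle) instead of six individual nets, which is the search-space reduction described above.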

Consider that you want to split your algorithm across multiple chips. (Or that you want to design software to do this.) Consider how fantastic it would be if you had a limited network of synchronous data sets that included enables, clocks, stalls, etc. More importantly, you would need to be able to specify a data rate for each data set. Suppose you also had a selection of communication cores to use to break a data set apart: one for V2 outgoing, one for S3 incoming, one for an x86 host driver, etc. The possibilities are endless. The point is that, by having breakable data sets with known rates, we could partition our algorithm to take full advantage of the available hardware without having to code each chip (or corner of a chip) individually.
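A minimal sketch of how a known data rate could drive the choice of communication core, again in Python and again purely illustrative. The device labels V2, S3, and x86 come from the text above; the class names, core names, widths, clock rates, and bandwidth figures are all assumptions invented for the example, not real part specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical breakable data set with a declared rate."""
    name: str
    width_bits: int
    clock_hz: float

    @property
    def rate_bps(self):
        # Required bandwidth if one word moves every clock cycle.
        return self.width_bits * self.clock_hz


@dataclass(frozen=True)
class CommCore:
    """Hypothetical communication core bridging two kinds of device."""
    name: str
    src: str        # device kind on the sending side
    dst: str        # device kind on the receiving side
    max_bps: float  # made-up sustained bandwidth


def pick_core(data_set, src, dst, cores):
    """Pick the smallest core that joins src to dst and sustains the rate."""
    fits = [
        c for c in cores
        if c.src == src and c.dst == dst and c.max_bps >= data_set.rate_bps
    ]
    return min(fits, key=lambda c: c.max_bps) if fits else None


# Illustrative numbers only.
pixels = DataSet("pixel_stream", width_bits=32, clock_hz=100e6)
cores = [
    CommCore("v2_to_s3_fast", "V2", "S3", 4e9),
    CommCore("v2_to_s3_slow", "V2", "S3", 1e9),
    CommCore("v2_to_x86_host", "V2", "x86", 1e9),
]
chosen = pick_core(pixels, "V2", "S3", cores)
no_fit = pick_core(pixels, "V2", "x86", cores)
```

With these numbers the stream needs 3.2 Gbit/s, so the fast V2-to-S3 core is chosen and no available core can carry it to the x86 host. A partitioner with this information could rule out that cut immediately instead of discovering the problem after place and route.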

Cray has pretty much given up on the FPGA market because nobody produces tools to do this in a mixed-language, mixed-host fashion. I'm quite sad about that. I had hoped that they and SGI would push this issue onto the main HDL table.