Hello again,
After giving this issue some thought, I'd like to show my own solution (i.e. answer my own question).
But before that, my opinion about multicycles: I'm generally in favor of using clock enables instead of multiple clocks whenever possible. The worst thing about being generous with clocks is that each core has its own set of clocks, and before you know it, you're out of clock resources.
Personally, I try to keep my HDL reusable, which generally means using device-dependent resources as little as possible. For example, I don't know of a portable way to make a pipelined multiplier (I know, of course, how to do that with Coregen).
So to summarize my opinion, I do prefer a single clock, with clock enables within the core, in order to make it easier to reuse. And this brings up the need for a reliable way to define multi-cycle paths.
----------------------
Now to my suggestion: The problem is that paths which end at the flip-flops' CE pins are treated as multi-cycle paths, which they are not. After all, the CE is sampled by the flip-flop on every clock. So let's exclude exactly those paths.
With the example I gave in the beginning, let's change the UCF to:
NET "clk" TNM_NET = "clk";
TIMESPEC "TS_clk" = PERIOD "clk" 10 ns HIGH 50 %;
NET "en" TNM = FFS "tgrp_en";
TIMESPEC "ts_multipath" = FROM "tgrp_en" TO "tgrp_en" "TS_clk" * 4;

PIN "*.CE" TPSYNC="tgrp_ces";
TIMESPEC "ts_no_multipath" = FROM "tgrp_en" TO "tgrp_ces" "TS_clk";
The first four lines are like before (except I've corrected "tgrp_en" to include flip-flops only).
And then comes the fix: The next two lines pick all paths which start at multi-cycled flip-flops but end at CE pins. These paths are returned to their original, single-cycle timing.
So what we have now is this: If a path begins at a flip-flop which changes value every 4 clocks, and ends at the data input of a flip-flop which samples with the same clock enable, that path is allowed up to 4 clock periods. This statement is now true, because we've excluded paths that end at the CE pin, which is sampled on every clock.
Note that if the relevant paths cross clock domains, or end at flip-flops which sample on the opposite edge of the clock, we might mess things up. So I would suggest using a more specific definition for "tgrp_ces" ("*.CE" is just for the example).
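As a rough sketch of what a narrower definition could look like: the lines below assume that the core sits under a hypothetical instance name, "my_core", which is not part of the example above. They also assume that the wildcard in the PIN pattern works with a hierarchy prefix the same way it does in "*.CE".

```
# Hypothetical: "my_core" is an assumed instance name, not taken from the example.
# Collect only the CE pins below my_core, rather than every CE pin in the design.
PIN "my_core/*.CE" TPSYNC="tgrp_ces";
# Paths from the multi-cycled flip-flops to these CE pins go back to one clock period.
TIMESPEC "ts_no_multipath" = FROM "tgrp_en" TO "tgrp_ces" "TS_clk";
```

Whether such a pattern matches anything depends on the instance names that survive synthesis, so it's probably worth checking in the timing report that "ts_no_multipath" actually covers the intended paths.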
Does anyone see a problem with this?
By the way, I *was* somewhat worried that the timing calculation would change. It wasn't clear to me whether referring to the flip-flop's CE pin in the TPSYNC constraint, rather than using the INST/TNM pair for the entire flip-flop, would change the timing endpoint and hence the timing calculation. But it turns out that the tools include Tceck (CE to clock) in both cases, which results in identical calculations (at least where I managed to compare them).
So what do you say about this?
Eli