Here's a single-prop-delay version, which could be done in about 3 ns with one of the tiny-logic parts.
I probably wouldn't do this in production, because the switching thresholds of CMOS Schmitt triggers aren't very tightly specified. My other circuit is more predictable.
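To put a rough number on the threshold concern, here is a minimal sketch. It assumes the Schmitt input sees a simple RC ramp from 0 V, which may not match the actual circuit. The supply, RC, and threshold limits are made-up illustrative values, not taken from any datasheet, and rc_crossing_time is a hypothetical helper name.

```python
import math

def rc_crossing_time(threshold_v, vcc_v, rc_s):
    """Time for an RC-charged node, starting at 0 V, to reach threshold_v."""
    if not 0.0 < threshold_v < vcc_v:
        raise ValueError("threshold must lie strictly between 0 V and Vcc")
    # v(t) = Vcc * (1 - exp(-t/RC))  ->  t = -RC * ln(1 - Vt/Vcc)
    return -rc_s * math.log(1.0 - threshold_v / vcc_v)

# Illustrative assumptions only (not datasheet values).
VCC = 5.0                   # supply voltage, volts
RC = 10e-9                  # time constant, seconds
VT_MIN, VT_MAX = 2.0, 3.0   # assumed spread of the positive-going threshold, volts

t_early = rc_crossing_time(VT_MIN, VCC, RC)
t_late = rc_crossing_time(VT_MAX, VCC, RC)
spread = t_late - t_early

print(f"earliest switch: {t_early * 1e9:.2f} ns")
print(f"latest switch:   {t_late * 1e9:.2f} ns")
print(f"timing spread:   {spread * 1e9:.2f} ns")
```

With these assumed numbers, a 1 V uncertainty in the threshold moves the switching instant by several nanoseconds, which is larger than the roughly 3 ns gate delay itself.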
There is likely a variant using slow negative feedback that would be demonstrably reliable while preserving the 1-gate delay. I haven't worked that one out, but it feels like there's something there.
John