I think the solution is obvious.
Configure for falling edge interrupt (I assume from your post that LOW is ACTIVE).
In the interrupt, do the following:
1) Clear the interrupt flag
2) Verify that input pin is still ACTIVE. If it is not, exit silently (no event).
3) Do a CPU timed busy-wait for 2us (minus typ/max interrupt latency up to this point). You might need inline asm or a calibrated function for this purpose.
4) When the busy-wait function returns, check the interrupt flag again. If there was another interrupt event, go back to 1), so that the flag is cleared again and the timing restarts.
5) Verify that input pin is still ACTIVE. If it is not, exit silently (no event).
6) Accept the event and exit the interrupt function.
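The steps above can be sketched in C roughly as follows. This is only a sketch under assumptions: `glitch_hw_t`, `glitch_filter_isr`, `max_retries` and the `sim_*` helpers are hypothetical names, not LPC1768 or CMSIS APIs. On a real target the hooks would wrap the GPIO interrupt status/clear registers and a calibrated delay; here a small host-side simulation stands in for the pin so the logic can be checked. On a restart the sketch clears the flag again (otherwise the check in step 4 would stay true forever), and it caps the number of restarts so the handler cannot spin indefinitely on a chattering input; that cap is an addition to the steps above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hooks (not a vendor API). */
typedef struct {
    bool (*pin_is_active)(void *ctx);     /* true while the input reads LOW    */
    bool (*edge_flag_pending)(void *ctx); /* true if a falling edge is latched */
    void (*clear_edge_flag)(void *ctx);
    void (*busy_wait)(void *ctx);         /* ~2 us minus the assumed latency   */
    void *ctx;
} glitch_hw_t;

/* Returns true if the pulse is accepted, false if it is rejected. */
bool glitch_filter_isr(const glitch_hw_t *hw, unsigned max_retries)
{
    for (unsigned attempt = 0; attempt <= max_retries; ++attempt) {
        hw->clear_edge_flag(hw->ctx);          /* step 1 */
        if (!hw->pin_is_active(hw->ctx))       /* step 2 */
            return false;
        hw->busy_wait(hw->ctx);                /* step 3 */
        if (hw->edge_flag_pending(hw->ctx))    /* step 4: new edge, restart */
            continue;
        return hw->pin_is_active(hw->ctx);     /* steps 5 and 6 */
    }
    return false; /* too many restarts: treat as noise (assumption) */
}

/* Host-side simulation, one pin sample per tick, 0 = LOW = ACTIVE. */
typedef struct {
    const uint8_t *levels;
    unsigned len;
    unsigned now;
    unsigned wait_ticks;
    bool flag;
} sim_t;

static uint8_t sim_level(const sim_t *s, unsigned t)
{
    return s->levels[t < s->len ? t : s->len - 1]; /* hold the last level */
}

static bool sim_active(void *c)  { sim_t *s = c; return sim_level(s, s->now) == 0; }
static bool sim_pending(void *c) { sim_t *s = c; return s->flag; }
static void sim_clear(void *c)   { sim_t *s = c; s->flag = false; }

static void sim_wait(void *c)
{
    sim_t *s = c;
    for (unsigned i = 0; i < s->wait_ticks; ++i) {
        uint8_t prev = sim_level(s, s->now);
        s->now++;
        if (prev == 1 && sim_level(s, s->now) == 0)
            s->flag = true; /* falling edge latched during the wait */
    }
}

/* Runs the filter on a recorded trace; the ISR is entered at tick 0. */
bool sim_run(const uint8_t *levels, unsigned len, unsigned wait_ticks)
{
    sim_t s = { levels, len, 0, wait_ticks, true };
    glitch_hw_t hw = { sim_active, sim_pending, sim_clear, sim_wait, &s };
    return glitch_filter_isr(&hw, 4);
}
```

With a wait of 4 ticks, a trace that stays LOW is accepted, a trace that returns HIGH after 2 ticks is rejected, and a trace that bounces once and then stays LOW is accepted after one restart.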
The interrupt function will consume 2us plus overhead. It will be only as accurate as your prediction of the typ/max interrupt latency, which depends mostly on how the rest of your code uses "critical section" primitives. Note that the LPC1768 is a Cortex-M3, which has no separate FIQ like the classic ARM7/ARM9 cores; the equivalent is to give this interrupt the highest NVIC priority, reserve that priority level for it alone, and have the rest of your code (including the OS) implement critical sections by raising BASEPRI rather than by disabling interrupts globally. That way this interrupt is never masked and never suffers software-induced latency. You could also move the vector table, the handler code (which is the function described above) and all of its variables / subfunctions into a memory area that has predictable timing. I'm not aware of all the LPC1768 details, but on-chip SRAM is the likely candidate there; on other archs you can use the TCM or disable the i-cache via the MMU for the memory area in question. The memory access doesn't have to be fast, it just has to be very predictable.
Using these tips, you can achieve very reliable acceptance/rejection. The imperfections are:
A) There is a "blind window" from the detection of the edge to the clearing of the interrupt flag. If there is a spurious HIGH in this window, it will not be detected. You could possibly extend the proposal, using your counter mode, to improve on this imperfection.
B) Most chips (and certainly the LPC1768 too) synchronize the input pin before detecting edges. Thus, some very fast flicker might go completely unnoticed by the hardware (and software). You would need to move the pulse detection into a specially designed hardware circuit to handle this better.
C) The samples in steps 4 and 5 are not taken at the same moment in time. Therefore a pulse must be longer than X to be accepted reliably, and shorter than Y to be rejected reliably. In between there is a window in which a pulse may or may not be rejected. This is the crux of using software instead of hardware. Again, you can only handle this better in a specially designed hardware circuit.
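To get a feel for how wide that uncertain window is, here is a rough estimate in C. It is a simplified model, not a measurement: it assumes the only uncertainty is the interrupt latency (with any fixed handler overhead folded into it), so the final pin sample lands between `latency_min_ns + wait_ns` and `latency_max_ns + wait_ns` after the edge. `pulse_window` and its parameters are hypothetical names, and the numbers in the note below are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t reject_below_ns; /* shorter pulses are always rejected */
    uint32_t accept_above_ns; /* longer pulses are always accepted  */
} pulse_window_t;

/* Simplified model: final sample time = interrupt latency + busy-wait. */
pulse_window_t pulse_window(uint32_t latency_min_ns,
                            uint32_t latency_max_ns,
                            uint32_t wait_ns)
{
    pulse_window_t w;
    w.reject_below_ns = latency_min_ns + wait_ns; /* earliest final sample */
    w.accept_above_ns = latency_max_ns + wait_ns; /* latest final sample   */
    return w;
}
```

For example, with an assumed latency of 100 to 300 ns and a busy-wait of 1700 ns (2us minus the max latency), pulses longer than 2000 ns are always accepted, pulses shorter than 1800 ns are always rejected, and anything in between depends on the latency of that particular interrupt.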
Best regards Marc