In article , alb wrote:
}On 3/3/2012 4:32 AM, Charles Bryant wrote:
.. data loss due to failure to see some start bits ...
}> One possible solution might be possible to do the UART receive function
}> in software (this depends very much on how the hardware works). By
}> setting the baud rate on the FPGA over 10x the true speed, it sees
}> every bit as either a 0xff or a 0x00 character. If you can react to
}> the interrupt fast enough and read a suitable clock, you can then
}> decode the bits in software. Of course if the FPGA is failing to
}> deliver characters, this is no better.
}
}the 0xFF has a good chance to go completely lost. The method you suggest
}may reduce the problem of recognizing bytes to the problem of delivering
}the bytes. Then extra encoding should be added to recover the loss of bytes.
If you can set the receiver clock fast enough, you won't lose any bytes. For example, suppose you set it to 16x and the true bit-stream is 0000101101 (ASCII 'h', 0x68: a start bit, the eight data bits least significant first, then a stop bit). Then the apparent bit-stream is
0000000000000000000000000000000000000000000000000000000000000000111111
1111111111000000000000000011111111111111111111111111111111000000000000
00001111111111111111
(wrapped for convenience). Assuming this starts after lots of '1' bits, and receiving a '0' stop bit is merely reported as a framing error and doesn't affect synchronisation, this gets interpreted as:
s........Ss........Ss........Ss........Ss........Ss........Ss........S
0000000000000000000000000000000000000000000000000000000000000000111111
__________00________00________00________00________00________00________f8
----------s........Ss........S----------------------------s........Ss.
1111111111000000000000000011111111111111111111111111111111000000000000
____________________00________e0____________________________________00
.......S------------
00001111111111111111
________e0
('-' marks idle time, while the receiver is waiting for a start bit.)
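To check a diagram like the one above, here is a small Python simulation of the 16x receiver. It is only a sketch under the assumptions stated in the text: the line idles at 1, the receiver waits for a 0 to begin a frame, and a 0 stop bit is reported as a framing error without losing synchronisation, so the next frame can begin on the very next fast bit. Each fast-rate bit is treated as a single sample, which glosses over how a real UART samples within a bit. The function names `uart_frame`, `oversample` and `fast_receiver` are hypothetical.

```python
def uart_frame(byte):
    """Real line bits for one character: start bit, 8 data bits LSB first, stop bit."""
    return [0] + [(byte >> k) & 1 for k in range(8)] + [1]


def oversample(bits, factor=16):
    """What a receiver clocked `factor` times too fast sees: each bit repeated."""
    return [b for b in bits for _ in range(factor)]


def fast_receiver(samples):
    """Split the fast-rate bit stream into 10-bit frames.

    Returns (end, byte, framing_error) per character, where end is the
    fast bit-time at which the interrupt would be raised.
    """
    received = []
    i = 0
    while i + 10 <= len(samples):
        if samples[i] == 1:          # idle: wait for a start bit
            i += 1
            continue
        byte = sum(samples[i + 1 + k] << k for k in range(8))
        framing_error = samples[i + 9] == 0   # the stop bit should be 1
        received.append((i + 10, byte, framing_error))
        i += 10                      # next frame may start straight after
    return received


chars = fast_receiver(oversample(uart_frame(ord("h"))))
print([(f"{byte:02x}", "FE" if fe else "ok") for _, byte, fe in chars])
```

The `end` values are what the interrupt timing would show, which is the extra information the software decoder below relies on.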
When you get an interrupt reporting a character, you note the time since the last such interrupt (I believe your CPU has a built-in timer which can count at high speed and which might be useful for this). Then you work out approximately what bits must have been received, based on both the character and the time. Since each real bit is seen as sixteen bits, even if one of those is missed, this only introduces a small error, so although you don't get an exact match to any valid pattern, you're much closer to one than to any other.
Specifically, if an interrupt arrives T fast bit-times after the previous one, then there must have been T-10 one bits, a zero (start) bit, the eight bits of the character received (least significant first), and a stop bit (0 if a framing error was reported, 1 otherwise).
When a 0 bit is missed, there were instead T-11 ones, two zeros (the missed zero must be the first of these), the bits in the character, and the stop bit. But since sixteen of these bits make one real bit, a discrepancy of a single fast bit is never big enough to flip a real bit.
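The reconstruction described in the last two paragraphs can be sketched in Python as follows. It assumes T has already been converted from timer ticks into fast bit-times (how to do that depends on the timer and is not shown), and that the first T is measured from an earlier interrupt while the line was idle. Rather than testing both the T-10 and T-11 hypotheses, it always uses T-10 and then reads each real bit from the middle of its sixteen fast bits, so a one-bit slip is absorbed. `implied_samples` and `decode_real_byte` are hypothetical names, and the real stop bit is not checked because its midpoint only shows up in the timing of the next interrupt.

```python
SAMPLES_PER_BIT = 16  # receive clock set to 16x the true baud rate


def implied_samples(events):
    """Rebuild the fast-rate line state implied by a list of interrupts.

    Each event is (t, byte, framing_error), t being the number of fast
    bit-times since the previous interrupt. Per event: t-10 idle ones,
    a zero start bit, the eight data bits LSB first, then the stop bit.
    """
    samples = []
    for t, byte, framing_error in events:
        samples.extend([1] * (t - 10))
        samples.append(0)
        samples.extend((byte >> k) & 1 for k in range(8))
        samples.append(0 if framing_error else 1)
    return samples


def decode_real_byte(samples, n=SAMPLES_PER_BIT):
    """Recover one real character by sampling each real bit mid-way."""
    start = samples.index(0)          # leading edge of the real start bit

    def real_bit(k):
        return samples[start + k * n + n // 2]

    if real_bit(0) != 0:
        raise ValueError("start bit not confirmed at its midpoint")
    return sum(real_bit(k + 1) << k for k in range(8))


# Interrupts for the 'h' example, the first measured from an interrupt
# 20 fast bits before the character began.
exact = ([(30, 0x00, True)] + [(10, 0x00, True)] * 5 +
         [(10, 0xF8, False), (20, 0x00, True), (10, 0xE0, False),
          (38, 0x00, True), (10, 0xE0, False)])
# Same character, but the receiver misses the first fast zero of the
# sixth real bit and starts its frame one fast bit late.
missed = ([(30, 0x00, True)] + [(10, 0x00, True)] * 5 +
          [(10, 0xF8, False), (21, 0x00, True), (10, 0xF0, False),
           (37, 0x00, True), (10, 0xE0, False)])

print(chr(decode_real_byte(implied_samples(exact))),
      chr(decode_real_byte(implied_samples(missed))))
```

Both event lists decode to the same character even though the second contains a byte (F0) that never appears in the first.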
Having said all that, you might not be able to change the receive clock without also changing the transmit clock, in which case it won't help. Similarly, if you can't time the interrupts to sufficient accuracy, it won't work.
}If you plot the number of failed packets [1] with the position in the
}packet which had the problem, you will see an almost linear increasing
}curve, hence the probability to have problems is higher if the packet is
}longer.
}
}At the moment we don't have any re-transmitting mechanism and the rate
}of loss is ~0.5% on a 100 bytes packet. We want to exploit the 4K buffer
}on the transmitter side in order not to add too much overhead, but it
}looks like the rate of loss will be higher with bigger packets.
}
}[1] we send a packet and echo it back and compare the values.
That suggests that the receiver only resynchronises in a gap. The solution suggested elsewhere of two stop bits sounds very promising (some UARTs can do 1.5 stop bits and that might be enough). Otherwise a simple re-transmission scheme tailored to the fault might be good. Here is an example:
Each packet starts and ends with FF. Unlike typical framing schemes, the closing FF of one packet cannot double as the opening FF of the next. This guarantees that an error is confined to one packet.
Other than the start and end bytes, every byte in the packet is escaped as necessary (e.g. byte FF becomes CB 34, and CB becomes CB 00), so FF never appears inside a packet.
The last two bytes in the packet are a CRC.
The byte before the CRC is an acknowledgement number.
If a packet is at least four bytes, the first byte is the packet sequence number.
This gives a packet like this:
FF SS DD DD DD...DD AA CC CC FF
   ^^^^^^^^^^^^^^^^^^^^^^^^^^ these are escaped as necessary
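A minimal Python sketch of building and checking such packets. The text doesn't say which CRC is used or which bytes it covers, so this assumes CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF, via the standard library's `binascii.crc_hqx`) over the unescaped SS, DD and AA bytes, sent high byte first. The escape pairs are the ones given above. `build_packet` and `parse_packet` are hypothetical names.

```python
import binascii

FLAG = 0xFF                      # start/end byte, never appears inside a packet
ESC = 0xCB                       # escape prefix
ESCAPE = {0xFF: bytes([ESC, 0x34]), 0xCB: bytes([ESC, 0x00])}
UNESCAPE = {0x34: 0xFF, 0x00: 0xCB}


def build_packet(ack, seq=None, data=b""):
    """FF [SS DD...DD] AA CC CC FF, with everything between the FFs escaped.
    Leave seq as None for an ACK-only packet (AA CC CC)."""
    if seq is None and data:
        raise ValueError("data packets need a sequence number")
    body = (bytes([seq]) if seq is not None else b"") + data + bytes([ack])
    body += binascii.crc_hqx(body, 0xFFFF).to_bytes(2, "big")  # assumed CRC
    escaped = b"".join(ESCAPE.get(b, bytes([b])) for b in body)
    return bytes([FLAG]) + escaped + bytes([FLAG])


def parse_packet(frame):
    """Return (seq, data, ack), with seq None for ACK-only packets,
    or None if the framing, escaping or CRC is bad."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        return None
    body = bytearray()
    it = iter(frame[1:-1])
    for b in it:
        if b == FLAG:
            return None
        if b == ESC:
            b = UNESCAPE.get(next(it, None))   # None if bad or missing
            if b is None:
                return None
        body.append(b)
    if len(body) < 3:
        return None
    if binascii.crc_hqx(bytes(body[:-2]), 0xFFFF) != int.from_bytes(body[-2:], "big"):
        return None
    ack = body[-3]
    if len(body) >= 4:                         # first byte is the sequence number
        return body[0], bytes(body[1:-3]), ack
    return None, b"", ack


pkt = build_packet(ack=7, seq=3, data=b"\xff\xcbhi")
print(pkt.hex(" "), parse_packet(pkt))
```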
When the receiver gets a packet, it ignores it if the CRC is bad or if the sequence number is not the next one expected (an ACK-only packet has no sequence number, so only the CRC check applies). Otherwise it accepts the data and/or the ACK.
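The receive rule is small enough to write out. This Python sketch works on packets that have already been de-framed and CRC-checked into (seq, data, ack) tuples, with None standing for a packet that failed its CRC. It assumes 8-bit sequence numbers starting at 0 and wrapping at 256, and that the AA byte sent back is the sequence number of the last data packet accepted; the text doesn't pin either down. `PacketReceiver` is a hypothetical name.

```python
class PacketReceiver:
    """Accept-or-ignore logic for incoming packets (a sketch)."""

    def __init__(self):
        self.expected_seq = 0      # assumption: sequence numbers start at 0
        self.peer_ack = None       # latest AA value received from the other end
        self.delivered = []        # data accepted, in order

    @property
    def ack_to_send(self):
        """AA value for our outgoing packets: last sequence number accepted."""
        return (self.expected_seq - 1) % 256

    def on_packet(self, packet):
        """packet is (seq, data, ack), seq None for ACK-only, or None if the
        CRC failed. Returns True if the packet was accepted."""
        if packet is None:
            return False                       # bad CRC: ignore
        seq, data, ack = packet
        if seq is not None:
            if seq != self.expected_seq:
                return False                   # duplicate or out of order: ignore
            self.delivered.append(data)
            self.expected_seq = (seq + 1) % 256
        self.peer_ack = ack
        return True


rx = PacketReceiver()
print(rx.on_packet((0, b"a", 5)), rx.on_packet((0, b"a", 5)), rx.ack_to_send)
```

Note that a duplicate is dropped whole, including its AA, exactly as the rule above says.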
The transmitter sends continuously. If it has no data to send, it sends just ACKs (i.e. packets containing only AA CC CC). Otherwise it sends a packet with this loop:
1) send the packet (SS DD...DD AA CC CC)
2) send ACKs (AA CC CC) until we have received at least X bytes and a valid packet
3) if our sent packet has been acknowledged, this one is done
4) else goto 1
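Steps 2 to 4 boil down to one decision made each time a complete, valid packet arrives. A Python sketch, assuming the caller counts received bytes from the moment our packet's closing FF goes out, and that our packet is acknowledged when the AA it carries equals our sequence number (an assumption, not something the text specifies). `SendWindow` and its methods are hypothetical; actually sending the packet and the ACK-only fillers is left to the caller.

```python
class SendWindow:
    """Tracks one outstanding packet through steps 2-4 of the loop."""

    def __init__(self, seq, x=10):
        self.seq = seq             # sequence number of the packet in flight
        self.x = x                 # bytes to wait before trusting an AA
        self.rx_bytes = 0

    def packet_sent(self):
        """Step 1 finished: our closing FF has just gone out."""
        self.rx_bytes = 0

    def byte_received(self):
        self.rx_bytes += 1

    def valid_packet_received(self, ack):
        """Call when a packet passes its CRC (after counting its closing FF).
        Returns 'wait' (keep sending ACKs), 'done' or 'resend'."""
        if self.rx_bytes < self.x:
            return "wait"          # step 2: this AA may predate our packet
        return "done" if ack == self.seq else "resend"   # steps 3 and 4


w = SendWindow(seq=3)
w.packet_sent()
for _ in range(10):
    w.byte_received()
print(w.valid_packet_received(ack=3))
```

No timer appears anywhere: the received byte count stands in for elapsed time, which is the point of step 2.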
Step 2 avoids the need for a timer. The value X depends on the round-trip delay. Since the AA field is at the *end* of a packet, when we receive a packet we know that its AA reflects what the remote receiver had accepted at a fixed interval in the past. For example, if the round-trip time is five bytes (to allow for buffering in the UART etc.), then when we send the closing FF of our packet we know that any AA we receive in the next five bytes could not possibly acknowledge the packet we just sent. If the remote happened to be just about to send its AA field, we might then also get AA CC CC FF, so any packet which ends more than 9 bytes after we sent the FF of our packet should acknowledge it. We would therefore use a value of about 10 to make re-transmissions as prompt as possible. (This depends on the protocol being implemented at a character-by-character level at each end. If you had a higher-level view whereby the hardware was given a complete packet to send at once, the ACK would no longer benefit from being at the end and the timing would be more complex.)
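The arithmetic for X can be written down directly. This Python helper assumes, as in the example, that the round-trip delay is measured in byte times and that the tail the remote may already have committed to is AA CC CC FF. One detail the example glosses over: AA and either CRC byte could themselves be FF or CB and so go out escaped, stretching that tail from 4 bytes to as many as 7. The `worst_case_escaping` flag is an illustrative addition for that case, and `choose_x` is a hypothetical name.

```python
def choose_x(round_trip_bytes, worst_case_escaping=False):
    """Bytes to wait after our closing FF before a received AA must reflect it.

    A packet ending within round_trip_bytes + tail bytes of our FF may have
    been committed before the remote saw our packet, so wait one byte more.
    """
    # Tail the remote may already be sending: AA CC CC FF, where AA and the
    # two CRC bytes can each take two bytes if they need escaping.
    tail = 3 * 2 + 1 if worst_case_escaping else 4
    return round_trip_bytes + tail + 1


print(choose_x(5))                             # 10, the value in the text
print(choose_x(5, worst_case_escaping=True))   # 13
```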
Naturally, the SS numbers in one direction are acknowledged by AA numbers going in the opposite direction.
The overhead of this scheme is (129/128)p + 4 + X bytes for a packet size of p (the 129/128 factor assumes random data, in which 2 of the 256 byte values, FF and CB, need escaping). It could be made lower by allowing more than one packet to be in flight at once, though that makes it more complex and costs more when a re-transmission is needed, unless you add even more complexity and have selective acknowledgements.
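Plugging numbers into that expression shows how the fixed 4 + X term is amortised as packets grow, which bears on the plan to use the 4K transmit buffer. The expression is taken as given; `packet_cost` is a hypothetical helper and X = 10 follows the example above.

```python
def packet_cost(p, x=10):
    """Bytes per packet from the expression (129/128)p + 4 + X.

    129/128 is the expected growth from escaping random data: 2 of the 256
    byte values (FF and CB) are sent as two bytes each.
    """
    return 129 / 128 * p + 4 + x


for p in (100, 4096):
    cost = packet_cost(p)
    print(f"p={p}: {cost:.1f} bytes, {p / cost:.1%} of them payload")
```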
If you have a suitable timer available, the sending of ACKs in step 2 can be omitted (e.g. possibly saving power by not running the UART continuously).