Error detection rate with CRC-16 CCITT

You are sending 2*n bits to encode n bits of data. Yet that encoding will only *detect* a single-bit error. It won't *correct* ANY errors, won't *see* (certain) two-bit errors, etc.

I.e., your choice of message encoding has lots of overhead (twice as many bits!) but doesn't give you a corresponding increase in "reliability".

Without understanding what sorts of errors you are likely to encounter, it is hard to design a protocol and encoding scheme that will be resilient to *those* errors.

You can also opt to run at the slower (more reliable) rate ALL THE TIME and encode command messages more robustly than "less important messages". I.e., so command messages have greater Hamming distances (require more bandwidth per bit, so to speak) while less important messages are *compressed* so there is more "data" per bit -- and less protection against corrupted transmission. As such, the compressed data appears to have a higher bandwidth -- at reduced reliability -- even though it is being sent over the same "bit rate" channel.
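
By contrast, the CRC-16/CCITT named in the subject line buys a lot of detection for very little overhead. A minimal bitwise sketch, assuming the common MSB-first variant with polynomial 0x1021 and initial value 0xFFFF (other init conventions exist):

#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-16/CCITT (poly 0x1021, init 0xFFFF), MSB first.
 * Appending the 16-bit result to a frame detects all single-bit
 * errors and all burst errors up to 16 bits long, at a cost of
 * only two extra bytes per frame. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    while (len--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc <<= 1;
        }
    }
    return crc;
}

A table-driven variant trades a 512-byte lookup table for roughly eight times fewer cycles per byte, which usually matters more than the two bytes of line overhead.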

Reply to
D Yuniskis

Do you *literally* mean a ring topology? I.e., (excuse the crappy ASCII art)

AAAA ----> BBBB ----> CCCC ----> DDDD
 ^                                 |
 +---------------------------------+

Reply to
D Yuniskis

Use a proper forward error correction scheme. You'll be able to monitor the increase in error rate while still getting most packets through. A Reed-Solomon code will allow you to (for example) add 20 bytes to a 235-byte message and correct any 10 bad bytes (and detect all bad messages with no more than 19 bad bytes). If you're getting a bit corrected every few dozen packets, it's probably safe to bump up the data rate. If it's a couple dozen bits in every packet, it's time to back off. In fact, this can substantially increase your effective data rate, as you can continue to run in the presence of a moderate number of errors (disk drives, for instance, run well into that region, and it's relatively rare these days that *any* sector actually reads "clean," and a very heavy duty ECC code is used to compensate).
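
To make the "bump up / back off" idea concrete, here is one hypothetical policy, assuming the RS decoder reports how many symbols it corrected per block; the thresholds and window size are illustrative, not from the post:

/* Hypothetical rate-adaptation policy: only raise the line rate while
 * the channel is nearly clean, back off when corrections approach the
 * code's limit. */
enum rate_action { RATE_UP, RATE_HOLD, RATE_DOWN };

enum rate_action adapt_rate(unsigned corrected_symbols,   /* over the window     */
                            unsigned blocks_in_window,
                            unsigned max_correctable)     /* e.g. 10 for RS(255,235) */
{
    unsigned per_block =
        corrected_symbols / (blocks_in_window ? blocks_in_window : 1);

    if (corrected_symbols * 32 <= blocks_in_window)
        return RATE_UP;                  /* roughly one fix per 32+ blocks        */
    if (per_block >= max_correctable / 2)
        return RATE_DOWN;                /* getting close to uncorrectable blocks */
    return RATE_HOLD;
}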

You can also improve things by using a multi-level scheme, which could be simple duplication (think disk RAID-1) or some combined code over multiple packets (simple parity like RAID-5, or Reed-Solomon-ish like RAID-6), which would provide added recovery at the expense of added latency (mainly in the presence of errors). Since you mentioned that you have at least two classes of data (critical and nice-to-have), apply the second level of FEC to just the critical data (after protecting each packet with an appropriate RS code), and even with a substantial spike in error rate, you're likely to get the critical stuff through.
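
A minimal sketch of the RAID-5-style second level mentioned above, assuming fixed-length packets whose per-packet RS/CRC check tells you which one in a group was lost; names and sizes are illustrative:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PKT_LEN 64   /* illustrative fixed packet length */

/* XOR a parity packet over a group of n critical packets.  If exactly
 * one packet in the group is lost (its own check failed), it can be
 * rebuilt from the parity and the survivors. */
void build_parity(const uint8_t pkts[][PKT_LEN], size_t n, uint8_t parity[PKT_LEN])
{
    memset(parity, 0, PKT_LEN);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < PKT_LEN; j++)
            parity[j] ^= pkts[i][j];
}

/* Rebuild the single missing packet `missing` from the others plus parity. */
void rebuild_packet(const uint8_t pkts[][PKT_LEN], size_t n, size_t missing,
                    const uint8_t parity[PKT_LEN], uint8_t out[PKT_LEN])
{
    memcpy(out, parity, PKT_LEN);
    for (size_t i = 0; i < n; i++)
        if (i != missing)
            for (size_t j = 0; j < PKT_LEN; j++)
                out[j] ^= pkts[i][j];
}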

Reply to
robertwessel2

Reply to
Shane williams

I think DDCMP dates to the mid 70s and was originally designed by Digital (DEC) for their DECnet network, then updated later for Ethernet. FWIR, it is a connection-oriented protocol implemented as a multilayer stack that provided reliable comms between nodes. It had error detection, retries etc. much as TCP/IP does. It's a long time since I used DECnet, but I know that there are DDCMP protocol specs and other docs out there which describe the whole stack. There is, I think, even a Linux DECnet protocol driver which might be a useful bit of code to look at, even if the complete stack is too much for the application...

Regards,

Chris

Reply to
ChrisQ

Yes, it was before the VAX days. (VMS is a part of my day job, so I am familiar with DEC history.)

The Phase IV documents can be found at:

formatting link

I don't know what the current status of the DECnet code in Linux is, however, as I never use it.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply to
Simon Clubley

A DEC document describing the low-level protocol, CRC, retries, states, etc. can be found at:

formatting link

I had a previous life working with DEC kit and thought I recognised the name, perhaps from the VMS group, but were you by any chance a contractor in the mid to late 80s?...

Regards,

Chris

Reply to
ChrisQ

Thanks. Error correction sounds like it would be too CPU intensive. I'd be happy just to detect errors.

Do you have any idea how many bytes we would have to add to a 60-byte message to detect 19 bad bytes or fewer, and how CPU-intensive it is?

Reply to
Shane williams

I'll have a read through it thanks; it's been a long time since I really did anything with DECnet Phase IV.

No, but late 80s/early 90s was the start of my career and I was writing code for the PDP-11 before moving onto VAX then Alpha and taking in a range of other environments along the way.

It's quite possible you ran across me as part of that, especially if you attended the annual DECUS conferences.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply to
Simon Clubley

To detect (but not correct) all errors of 152 bits (19*8) or fewer, you'd have to add at least 152 bits of check code. If you're only looking to detect errors occurring in no more than 19 bytes of the message, it would be a bit less, but not hugely so. Remember that to detect n bits of error, the block has to be different enough from any other valid block that errors in n bits do not make it look like a different valid block.
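
Spelling out the bound behind that claim (standard coding-theory reasoning, not from the post):

\[
d_{\min} \ge e + 1 \quad\text{(detect all patterns of up to } e \text{ bit errors)}, \qquad
d_{\min} \le n - k + 1 \quad\text{(Singleton bound)}
\quad\Longrightarrow\quad n - k \ge e .
\]

With \( e = 19 \times 8 = 152 \), any code that detects every such pattern needs at least 152 check bits (19 bytes) on top of the 60-byte (480-bit) message.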

If you're asking about a RS code as I described above, the short message really doesn't buy you anything, since you need about twice the number of bits worth of RS symbols as the number of error bits you hope to correct.

RS is moderately computationally intensive, but that clearly depends on your data rates and the hardware you're running on. In fact it has a worse reputation than it really deserves. But to toss some numbers out there, a decent implementation in C, on a 1GHz x86, for a RS(255, 239) encoding (239 bytes of data plus 16 bytes of check code, or a bit weaker than what was discussed above -- that's a commonly used code in broadcasting, so it is well studied and you should be able to find plenty of benchmarks and samples and whatnot), should come in at 100-200Mb/s for encoding (or 10K-20K cycles per block), about half that for decoding blocks without errors, and about a fifth the encoding rate for decoding blocks with the maximum correctable amount of errors. Shorter blocks require less work to process, but it's sub-linear, so your net data rate for a fixed CPU load will go down as block size decreases. And note that 255 bytes is the longest possible block for RS with 8-bit symbols.

On something like an ARM 9, quadruple the cycle counts.

Reply to
robertwessel2

The hardware handles the sending of a whole message at a time. The software gives the hardware a whole message to send and gets told when a whole message has been received. This is done by an interrupt routine. The interrupt routine will decide when to switch baud rates or check when the other end is asking to switch so the only penalty is a couple of extra messages and a short delay if the switch works. If the switch doesn't work there's a slightly bigger penalty but we won't be switching often enough for it to matter.
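
A hypothetical sketch of that whole-message switch point; message types, field names and helpers are illustrative, not taken from the post:

/* The ISR only ever acts on whole, checked messages: a rate change is
 * requested, acknowledged, then both ends reprogram their UARTs, so a
 * failed attempt costs a couple of extra messages and one timeout. */
enum msg_type { MSG_DATA, MSG_BAUD_REQUEST, MSG_BAUD_ACK };

struct message { enum msg_type type; unsigned baud; /* ... payload ... */ };

extern void send_message(const struct message *m);
extern void reprogram_uart(unsigned baud);

/* Called from the "whole message received" interrupt routine. */
void on_message_received(const struct message *m)
{
    switch (m->type) {
    case MSG_BAUD_REQUEST:                 /* other end wants to switch      */
        send_message(&(struct message){ .type = MSG_BAUD_ACK, .baud = m->baud });
        reprogram_uart(m->baud);
        break;
    case MSG_BAUD_ACK:                     /* our earlier request was agreed */
        reprogram_uart(m->baud);
        break;
    case MSG_DATA:
    default:
        /* hand the payload to the normal protocol handler */
        break;
    }
}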

Reply to
Shane williams

Most industrial protocols (like Modbus) are simple half duplex request/response systems.

At any significant line rates, the throughput is severely limited by the line turn-around delays at both ends.

The additional delay (due to autobauding) at new device insertion should not be a significant issue.

Reply to
upsidedown

  • Having protocol logic in the Tx/Rx ISRs is a bad idea already.
  • How would a hub-type device work?
  • What if a node somehow missed the correct baud rate, receiving garbage and responding to it?
  • How would you verify, troubleshoot and prove the operation?

VLV

Reply to
Vladimir Vassilevsky

Absolutely. Lower level comms drivers should always be transparent to data. You build protocol layers on top of that.

Maybe I'm missing something, but I don't understand what all the fuss is about in this thread. All this kind of thing has been done to death in the past. It would help the OP to have a look at one of the original DDCMP protocol specs to see how it should be done, with message flow, state transitions etc. Why keep on reinventing the wheel?...

Regards,

Chris

Reply to
ChrisQ

From the system point of view, including the end of frame detection (and hence also autobauding) in the ISR makes a lot of sense, since it reduces the number of times the RT tasks need to be rescheduled.

Reply to
upsidedown

I just thought the name sounded familiar. I too spent several years doing systems engineering, programming MACRO and C on PDP and VAX. Never attended DECUS meetings, but was a member and still have some tapes. Worked at DEC Park, Racal, Smiths and others during the good old 80s...

Regards,

Chris

Reply to
ChrisQ

There are arguments for and against, and there are tradeoffs, different if, for example, you are running a state-driven loop rather than an RTOS. One way to get round the problem is to have an incoming FIFO big enough to handle the data between polls, then a polling function within the FIFO module that takes two args, start of frame and end of frame. The function just scans the FIFO from time to time, skipping over duplicate SOFs until a complete frame is recognised, which is then passed to the deframer / protocol handler. The interrupt handler never needs to be disabled, and in the unlikely event of the FIFO filling, the data just wraps round, deleting the least recent byte. Acks, nacks, timeouts and retries at the next level up then keep the link reliable. You can drive this sort of thing in many ways, for example from a timer callback. This is more or less how it's been done for decades...
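
A minimal C sketch of that scheme, assuming a single Rx-ISR producer, a timer-driven consumer, and illustrative SOF/EOF delimiter values (a real link would byte-stuff delimiters appearing in the payload):

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define FIFO_SIZE 256          /* power of two, sized to cover the polling interval */
#define SOF       0x02         /* illustrative frame delimiters */
#define EOF_BYTE  0x03

/* Single-producer/single-consumer byte FIFO: the Rx ISR writes, a
 * timer-driven poll scans for complete frames.  On overflow the
 * oldest byte is overwritten, as described above. */
static volatile uint8_t  fifo[FIFO_SIZE];
static volatile unsigned head, tail;       /* head = ISR write index */

void rx_isr_push(uint8_t byte)             /* called from the UART Rx ISR */
{
    fifo[head & (FIFO_SIZE - 1)] = byte;
    head++;
    if (head - tail > FIFO_SIZE)
        tail = head - FIFO_SIZE;           /* wrap: drop the least recent byte */
}

/* Poll: discard noise before a SOF, return true when a complete
 * SOF..EOF frame has been copied into frame[] (length in *len). */
bool poll_for_frame(uint8_t *frame, size_t max, size_t *len)
{
    while (tail != head && fifo[tail & (FIFO_SIZE - 1)] != SOF)
        tail++;                            /* skip anything before a SOF */

    size_t n = 0;
    for (unsigned i = tail; i != head; i++) {
        uint8_t b = fifo[i & (FIFO_SIZE - 1)];
        if (n < max)
            frame[n++] = b;
        if (b == EOF_BYTE && n > 1) {      /* complete frame found */
            tail = i + 1;
            *len = n;
            return true;
        }
    }
    return false;                          /* keep buffering until the next poll */
}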

Regards,

Chris

Reply to
ChrisQ

[scratches head] then why are you worrying about running at a higher rate? Latency might be a reason -- assuming you don't circulate messages effectively as they pass *through* a node. But, recall that you only have to pass through 32 nodes, worst case, to get *a* copy of a message to any other node...

Depending on how you choose to allocate the Tx & Rx devices in each link -- and whether or not your baudrate generator allows the Tx and Rx to run at different baudrates -- you have to:

  • make sure your Tx FIFO (hardware and software) is empty before changing Tx baudrate (see the sketch after this list)
  • make sure your "neighbor" isn't sending data to you when you change your Rx baudrate (!)
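
Regarding the first point, a minimal sketch with hypothetical register and helper names (UART_STATUS, UART_BAUD_DIV, sw_tx_queue_len and the flag bits stand in for whatever your UART and driver actually provide):

#include <stdint.h>

/* Hypothetical hardware hooks -- substitute your UART's real registers. */
extern volatile uint32_t UART_STATUS;       /* status register            */
extern volatile uint32_t UART_BAUD_DIV;     /* baud rate divisor register */
#define TX_FIFO_EMPTY   (1u << 5)           /* illustrative bit positions */
#define TX_SHIFT_EMPTY  (1u << 6)
extern unsigned sw_tx_queue_len(void);      /* bytes still queued in software */

/* Drain everything before touching the Tx rate, or the tail of the
 * last frame goes out at the wrong baudrate. */
void change_tx_baud(uint32_t divisor)
{
    while (sw_tx_queue_len() != 0)
        ;                                   /* let the ISR empty the software FIFO      */
    while ((UART_STATUS & (TX_FIFO_EMPTY | TX_SHIFT_EMPTY))
            != (TX_FIFO_EMPTY | TX_SHIFT_EMPTY))
        ;                                   /* wait for the last byte to leave the wire */
    UART_BAUD_DIV = divisor;                /* now safe to reprogram the rate           */
}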

Consider that a link (a connection to *a* neighbor) that "gives you problems" will probably (?) cause problems in all communications with that neighbor (Tx & Rx). So, you probably want to tie the Tx and Rx channels of *one* device to that neighbor (vs. splitting the Rx with the upstream and Tx with the downstream IN A GIVEN RING)

[this may seem intuitive -- or not! For the *other* case, see end]

Now, when you change the Rx baudrate for the upstream CW neighbor, you are also (?) changing the Tx baudrate for the downstream CCW neighbor (the "neighbor" is the same physical node in each case). Also, you have to consider if you will be changing the baudrate for the "other" ring simultaneously (so you have to consider the RTT in your switching calculations).

Chances are (bet dollars to donuts?), the two rings are in different points of their message exchange (since the distance from message originator to that particular node is different in the CW ring vs. the CCW ring). I.e., this may be a convenient time to change the baudrate (thereby INTERRUPTING the flow of data around the ring) for the CW ring -- but, probably *not* for the CCW ring.

[recall, changing baudrate is probably going to result in lots of errors for the communications to/from the affected neighbor(s)]

So, you really have to wait for the entire ring to become idle before you change baudrates -- and then must have all nodes do so more or less concurrently (for that ring). If you've split the Tx and Rx like I described, then this must also happen on the "other" ring at the same time.

Regarding the "other" way to split the Tx&Rx... have the Tx always talk to the downstream neighbor and Rx the upstream IN THE SAME RING. In this case, changes to Tx+Rx baudrates apply only to a certain ring. So, you can change baudrate when it is convenient (temporally) for that *ring*.

But, now the two rings are potentially operating at different rates. So, the "other" ring will eventually ALSO have to have its baudrate adjusted to match (or, pass different traffic)

Reply to
D Yuniskis

I think the OP is pretty much stuck with having logic in the ISR. Consider: the upstream Rx IRQ has to push the received packet into the downstream Tx FIFO to propagate the message around that ring. At the very least, the Rx ISR needs to decide if the message "looks" correct -- at least, "correct enough" to warrant passing it along (presumably, you don't want to propagate errors that were picked up locally -- since those errors can then become UNerrors on the next hop...)

If the Rx ISR required the intervention of user-land code (or any higher level of the protocol stack) to move the received packet "out", then the time spent *in* the node would increase dramatically, which is reflected in the RTT of the entire ring(s).
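
A hypothetical sketch of that forwarding decision, reusing the CRC-16/CCITT routine sketched earlier in the thread; the helper names are illustrative:

#include <stdint.h>
#include <stddef.h>

extern uint16_t crc16_ccitt(const uint8_t *data, size_t len);   /* as sketched earlier   */
extern void downstream_tx_enqueue(const uint8_t *pkt, size_t len);
extern void deliver_to_local_stack(const uint8_t *pkt, size_t len);

/* Rx "end of packet" ISR body for one ring direction: verify the frame
 * check sequence and only forward clean frames, so locally picked-up
 * errors are not propagated (and cannot later "un-error" back into a
 * valid-looking frame). */
void rx_packet_isr(const uint8_t *pkt, size_t len)
{
    if (len < 3)
        return;                                     /* too short to carry data + FCS */

    uint16_t rx_fcs   = (uint16_t)(pkt[len - 2] << 8) | pkt[len - 1];
    uint16_t calc_fcs = crc16_ccitt(pkt, len - 2);

    if (rx_fcs != calc_fcs)
        return;                                     /* drop; retry/ack layer recovers */

    downstream_tx_enqueue(pkt, len);                /* keep the ring turning...       */
    deliver_to_local_stack(pkt, len);               /* ...and hand a copy upward      */
}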

Reply to
D Yuniskis

One could ask about the wisdom of using a ring topology, which will always involve more latency than a multidrop network using some sort of poll/select or request/response protocol. You must have more than one comms link for redundancy, as any break in the ring isolates any node past the fault. You need double the comms hardware, as each node needs an Rx and a Tx UART. In the presence of faults, a ring topology doesn't degrade anywhere near as gracefully as multidrop either. Finally, where does DDCMP fit into the picture? DDCMP is more than just a frame format; it's a complete protocol spec with defined message flows, state transitions, error recovery etc...

Regards,

Chris

Reply to
ChrisQ
