error detection rate with crc-16 CCITT

- D Yuniskis

Tue, Mar 29, 2011 6:16 PM

Yup. I'm *guessing* this was just a "free" (hardware) approach. You'd have to ask the OP for his specific reasons for this approach...

Hence the double ring approach -- with twice the "cost".

Well, to be fair, with a bus topology, anything that can toast/busy the bus indefinitely will shut down *all* communications. With a ring, one can at least talk to one's neighbor (even if you can't get a reply).

I.e., a node can say "the bozo upstream from me is spouting endless gibberish" (or, "hasn't said anything in *DAYS*") and, therefore, you can hope folks downstream from you propagate this and, as a result, know that they should move to a safe/secure state *and* know where the fault likely lies.

In a bus topology, everyone has to monitor the bus's health independently and there is *no* communication in the event of a failure.

I know of at least one such design in which the "master(s)" had the ability to impress a high voltage on the bus with the intent that a malfunctioning node wouldn't be smart enough to galvanically isolate itself from the bus during this event (this would blow fuses that would, thereafter, isolate the offending node :> ).

No idea how well this worked, in practice. Amusing idea, though! If only *all* problems could be solved with a suitably high voltage ;)

- ChrisQ

Tue, Mar 29, 2011 10:31 PM

That's true, say with a line turnaround after tx that doesn't, thereby pulling the line down. It's still a lower cost solution than the ring approach. Of course, Ethernet fixes that problem, but at the cost of increased software complexity, so there really is no free lunch.

You often need a keep alive packet for ongoing health check, which would fail to get an ack on a hardware failure, but that's still less hw in the path than ring topology, so better mtbf (in theory) to start with.

Horrifying, though I guess you could implement that using zener diodes and limiting r on each node for protection, placed after, say, 50mA fuses, which would be easy to open circuit. Sounds a bit extreme though...

Regards,

Chris

- robertwessel2

Tue, Mar 29, 2011 10:40 PM

The traditional way to lower latency in a ring is to start transmitting to the next node early - at least as soon as you see that the address (hopefully at the front of the frame) isn't yours. If the frame is bad, you can force that to be passed on by making sure the ending delimiter is transmitted with an indication of error. If you do it right, then in the worst case the bad frame will be pruned at each node, so even if the address has been damaged* (or it was addressed to a non-existent node), it'll get removed in a reasonable amount of time.

*And exactly where the frame is removed from the ring is a design question. Often the sender removes frames it had sent, when they make their way back around, in which case the critical item for removal is the source address (and usually in that case the destination node sets a "copied" bit in the trailer, thus verifying physical transmission of the frame to the destination).

- robertwessel2

Tue, Mar 29, 2011 11:07 PM

You only need double the comm hardware if you want redundant rings (I'm not sure if that was the point you were making or not). On a single ring (or on one of a pair of double rings), you just need a funky cable, and connect Tx clockwise, and Rx counterclockwise (or vice versa) on the ring. You can add some redundancy by adding some relays to physically bypass inactive nodes from the ring, and deal with wire breaks too with an additional pair of wires, and some more switch gear, but...

...at what point does one just give up on all that custom work, and just toss on a cheap Ethernet port, and invest in a basic switch? After all, you can get single quantities of a PIC with Ethernet for $4.50 (including pretty much everything except the magnetics). And that neatly solves bandwidth issues too (if 10Mb isn't enough for some nodes, you can always give those nodes faster interfaces, and two nodes talking don't take bandwidth away from other nodes that might want to talk, assuming a proper switch).

- D Yuniskis

Wed, Mar 30, 2011 12:38 AM

[attributions elided]

Yes, but this only makes sense in an environment where errors are "infrequent". Note that the OP is talking of error rates as high as "19 bytes out of 60". In that case, there's just too great a chance that you will start passing a corrupted message with little chance to RELIABLY mark it as such "on the tail end" (i.e., your error indication stands a good chance of being corrupted -- what I playfully called "UNerrors").

As a policy, I don't like (implicitly) "blessing" anything that I don't have a high degree of confidence in ("I" being code that I write).

- D Yuniskis

Wed, Mar 30, 2011 12:43 AM

I don't know if it was a result of a client requirement or just paranoia on the part of the system designer(s) -- the system was designed for 24/7/365 unattended operation in a hostile physical environment (with no on-site personnel to even determine if something was "malfunctioning"). When you have no *practical* alternatives, I guess you just do the best you can!

- Shane williams

Wed, Mar 30, 2011 11:41 AM

I found out today ddcmp was used purely because it calculated the CRC for us. All it does is the framing. All the state transition and error recovery stuff is turned off. Using ddcmp was probably a mistake because ccitt crc can be calculated quickly enough, and soon we'll be doing a new version of this device with a different micro. That version will have to be compatible with the existing device, so it will still have to do ddcmp but without the hardware support.
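For reference, a bitwise software CRC-16/CCITT is only a few lines. The sketch below is a minimal illustration in Python, assuming the common "CCITT-FALSE" parameters (polynomial 0x1021, initial value 0xFFFF, MSB first, no final XOR). The thread does not say which variant or bit order the 68302 hardware uses, so the function name `crc16_ccitt` and its parameters are assumptions that would need checking against frames from the existing devices.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB first, no final XOR.

    init=0xFFFF gives the variant often called CCITT-FALSE;
    init=0x0000 gives the XMODEM variant. Which one (if either) matches
    the deployed hardware is an assumption to verify.
    """
    crc = init
    for byte in data:
        crc ^= byte << 8                      # bring next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc16_ccitt(msg)))              # standard check string for CRC catalogues
    corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]
    print(crc16_ccitt(msg) != crc16_ccitt(corrupted))  # a single-bit error always changes the CRC
```

A table-driven version trades 512 bytes of ROM for roughly an eightfold reduction in inner-loop work, which may matter on a small micro; the bitwise form is shown only because it is easiest to check by eye.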

I'm trying to improve the propagation delay of messages around the ring without requiring the customer to fit twisted pair cable everywhere and I'm also trying to improve the error monitoring so we can signal when a connection isn't performing well enough without creating nuisance faults, hence my interest in the error detection capability of crc16-ccitt.

We actually already do have an RS485 multi-drop version of this protocol but it's non-deterministic and doesn't work very well. I don't really want to go into that...

- Shane williams

Wed, Mar 30, 2011 12:12 PM

Asynchronous with a pair of uarts, one for clockwise, one for counter-clockwise.

Hmm. I forgot about that. A start or stop bit error means the whole message is rejected which is good.

Because not all sites can wire as a mesh. The third port is optional but helps the propagation delay a lot.

This is assured. It's half duplex and the hardware sends a whole message at a time.

Not sure I follow but a single uart does both the tx and rx to the same neighbor.

Yes

What is RTT?

I'm lost here.

I think there must be a misunderstanding somewhere - not sure where.

- Shane williams

Wed, Mar 30, 2011 12:20 PM

oops, I made a mistake here - one uart for neighbour one, another uart for neighbour 2. Tx to neighbour one is the CW data (say) and tx to neighbour 2 is the CCW data.

- ChrisQ

Wed, Mar 30, 2011 12:31 PM

So you are using some hardware device with internal crc hw? Just curious, which device?

Two unknowns: max cable length between nodes and max baud rate? I assume that you are currently running unbalanced rs232 style cabling?

If you are limited on baud rate due to cable length, you might be able to compress the data. A recent project for led road signs was limited to 9600 baud, but the screen update requirement of 1 second max meant that we had no option but to use compression.

Sounds like a better place to start, from a technical point of view :-)...

Regards,

Chris

- Shane williams

Wed, Mar 30, 2011 12:34 PM

It's half duplex so we can't start transmitting early.

In our case, there are two logical rings and each message placed on the ring is sent in both directions. When the two messages meet up approximately half-way round they annihilate each other - but if the annihilation fails, the sender removes them as well.

- Shane williams

Wed, Mar 30, 2011 12:53 PM

Motorola 68302

It's RS485 but apparently a variety of cable gets used, not always twisted pair.

I don't know much about compression but it sounds too CPU intensive for the 68302? What micro are you using?

- D Yuniskis

Wed, Mar 30, 2011 2:33 PM

So, you have to buffer a message, verify its integrity and then push it on to the next node. This suggests it is either done *in* the Rx ISR *or* an ASR running tightly coupled to it (else you risk adding processing delays to the propagation delay).

I.e., the time a message takes to circumnavigate the ring is ~K * n where K reflects message size, baud rate and per node processing.

How does the sender *recognize* them as "his to annihilate"? I.e., if the data can be corrupted, so can the "sender ID"! The problem with a ring is that it has no "end" so things have the potential to go 'round and 'round and 'round and...

If you unilaterally drop any message found to be corrupted, then you have no way of knowing if it was received by its intended recipient (since you don't know who its recipient is -- or was). If you await acknowledgment (and retry until you receive it), then you run the risk of a message being processed more than once. etc.

- ChrisQ

Wed, Mar 30, 2011 4:26 PM

Not wishing to offend, but this sounds like a legacy project that was originally ill thought out and a bit of a hack to start with. You used the ddcmp frame format, but didn't implement the full protocol. The system wiring is non RS485 conforming, so susceptible to noise related errors and line drive problems. Data reliability is exactly what protocol definitions like ddcmp are designed to address.

I think you will have to at least rewire with twisted pair before addressing any sw issues. If the hardware is bad, then no amount of software will fix the problem...

The project used the Renesas 32C87 series from Hitachi. Not such an elegant arch as 68k, but almost certainly faster than a '302. Depending on the data, simple compression like Huffman encoding can work quite well, but another way might be to simplify / reorganise the frame format or data within it, so you can send fewer bytes...

Regards,

Chris

- D Yuniskis

Wed, Mar 30, 2011 5:14 PM

OK. Been there, done that, T-shirt to prove it...

My point was that if you *miss* a start bit, then you have -- at the very least -- missed the "first" bit of the message (because, if it was MARKING, the UART just ignored it and, if it was SPACING, the UART thought *it* was the start bit). If you are pushing bytes (characters) down the wire at the maximum data rate (minimal stop time between characters), then you run the risk of part of the *next* character being "shifted" into this "misaligned" first character. I.e., it gets really difficult to figure out *if* your code will be able to detect an error (because the received byte "looks wrong") or if, BY CHANCE, the bit patterns can conspire to look like a valid "something else".
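To make the misalignment concrete, here is a small simulation sketch. It assumes 8N1 framing (one start bit, eight data bits LSB first, one stop bit), back-to-back characters, and a simplified receiver that hunts for the next SPACE after each character; `frame_8n1` and `decode_8n1` are hypothetical helper names, not anything from the OP's code, and a real UART samples mid-bit rather than reading an ideal bit list.

```python
def frame_8n1(data: bytes) -> list:
    """Serialize bytes as 8N1 characters sent flat out (no idle time between them)."""
    bits = []
    for byte in data:
        bits.append(0)                                   # start bit (SPACE)
        bits.extend((byte >> i) & 1 for i in range(8))   # data bits, LSB first
        bits.append(1)                                   # stop bit (MARK)
    return bits


def decode_8n1(bits: list, start: int = 0):
    """Simplified receiver: returns (received_bytes, framing_error_count).

    'start' is the bit index at which the receiver begins listening; a value
    other than 0 models a receiver that missed the real start bit.
    """
    received, framing_errors = [], 0
    i = start
    while i + 10 <= len(bits):
        if bits[i] == 1:                 # line is MARKING: keep hunting for a start bit
            i += 1
            continue
        value = sum(bits[i + 1 + k] << k for k in range(8))
        if bits[i + 9] == 1:             # valid stop bit: accept the character
            received.append(value)
        else:                            # stop bit missing: framing error, character dropped
            framing_errors += 1
        i += 10
    return received, framing_errors


if __name__ == "__main__":
    line = frame_8n1(b"AB") + [1] * 12   # two characters followed by an idle line
    print(decode_8n1(line))              # receiver aligned with the first start bit
    print(decode_8n1(line, start=1))     # receiver that missed the first start bit
```

Running the misaligned case shows the receiver locking onto a data bit as if it were a start bit, so what it reports depends entirely on the bit patterns of the neighbouring characters. Whether a framing error is raised or a plausible-looking wrong byte is accepted is data dependent, which is the point being made above.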

Sorry, the subject wasn't clear in my question:

> Latency might be a reason -- assuming you [...]

So, for each ring, you WON'T receive a message until you have transmitted any previous message? Alternatively, you won't transmit a message until your receiver is finished?

What prevents two messages from being "in a ring" at the same time (by accident)? I.e., without violating the above, it seems possible that node 18 can be sending to node 19 (while 19 is NOT sending to 20 and 17 is not sending to 18) at the same time that node 3 is sending to node 4 (while neither 2 nor 4 are actively transmitting).

Since this *seems* possible, how can you be sure one message doesn't get delayed slightly so that the second message ends up catching up to it? (i.e., node 23 has no way of knowing that node 24 is transmitting to 25 so 23 *could* start sending a message to 24 that 24 fails to notice -- in whole or in part -- because 24 is preoccupied with its outbound message)

Round Trip Time (sorry :< ) I.e., you (each of your nodes) has to be aware of the time it takes a message to (hopefully) make it around the ring.

Number the nodes 1 - 10 (sequentially). The CW ring has 1 sending to 2, 2 sending to 3, ... 10 sending to 1. The CCW ring has 10 sending to 9, 9 sending to 8, ... 1 sending to 10. The nodes operate concurrently.

So, assume 7 originates a message -- destined for 3. In the CW ring, it is routed as 7, 8, 9, 10, 1, 2, 3. In the CCW ring, it is routed (simultaneously) as 7, 6, 5, 4, 3.

*If* it progresses node to node at the exact same rates in each ring (this isn't guaranteed but "close enough for gummit work"), then it arrives in 8 & 6 at the same time, 9 & 5, 10 & 4, 1 & 3, 2 & 2 (though different "rings"), 3 & 1, etc. (note I have assumed, here, that it continues around until reaching its originator... but, that's not important).
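The two routes in this example can be enumerated mechanically. The following is a minimal sketch, assuming nodes numbered 1..N with the CW ring running in ascending order and the CCW ring in descending order; `ring_paths` is a hypothetical helper name used only for illustration.

```python
def ring_paths(n_nodes: int, src: int, dst: int):
    """Return (cw_path, ccw_path) from src to dst on a ring of nodes 1..n_nodes."""
    cw = [src]
    while cw[-1] != dst:
        cw.append(cw[-1] % n_nodes + 1)            # CW: 1 -> 2 -> ... -> N -> 1
    ccw = [src]
    while ccw[-1] != dst:
        ccw.append((ccw[-1] - 2) % n_nodes + 1)    # CCW: N -> N-1 -> ... -> 1 -> N
    return cw, ccw


if __name__ == "__main__":
    cw, ccw = ring_paths(10, src=7, dst=3)
    print("CW :", cw)
    print("CCW:", ccw)
    # Hop-by-hop positions of the two copies, assuming equal per-hop delay
    for hop, (a, b) in enumerate(zip(cw[1:], ccw[1:]), start=1):
        print(f"hop {hop}: CW copy at node {a}, CCW copy at node {b}")
```

With equal per-hop delays the shorter route delivers first, so the destination sees the same message twice and needs some way to recognize the second copy.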

Now, at node 9, if the CW ring decides that the baudrate needs to be changed and it thinks "now is a good time to do so" (because it has *just* passed its CW message on to node 10), that action effectively interrupts any traffic in the CW ring (until the other nodes make the similar baudrate adjustment in the CW direction).

But, there is a message circulating in the CCW ring -- it was just transmitted from node 5 to 4 (while 9 was sending to 10). It will eventually be routed to node 9 as it continues its way around the CCW ring. But, *it* is moving at the original baudrate (in the CCW ring) while node 9 is now operating at the *new* baudrate (in the CW ring). So, any new traffic in the CW ring will run around that ring at a different rate than the CCW traffic. If you only allow one message to be active in each ring at any given time, then this will "resolve itself" one RTT later. But, if the "other" ring never decides to change baudrates... ?

And, if it *does* change baudrates at the same time as the "first" ring, then you have to wait for the CW message to have been completely propagated *and* the CCW message as well before making the change. I.e., you have to let both rings go idle before risking the switch (or, take considerable care to ensure that a switch doesn't happen DOWNstream of a circulating message)

You can wire two UARTs to give you two rings in TWO DIFFERENT WAYS. Look at a segment of the ring with three nodes:

------> 1 AAAA 1 --------> 1 BBBB 1 --------> 1 CCCC 1 ------->

[rest of the diagram not recoverable: further rows of the AAAA, BBBB and CCCC node boxes]

- D Yuniskis

Wed, Mar 30, 2011 5:49 PM

What sort of times are you seeing, presently? At which baudrates? How much *better* do they need to be (or, would you *like* them to be)?

It's relatively easy to get deterministic behavior from a 485 deployment. And, depending on the *actual* operating conditions of the current ring implementation, you could probably achieve lower latencies at lower baudrates.

- Shane williams

Wed, Mar 30, 2011 10:39 PM

It's done in the Rx ISR. Actually, as well as being half duplex, the hardware sends and receives whole messages, though we could change that.

Yes.

The message would have to get corrupted undetected every time around the ring to go round forever.

Each device that puts a message on the ring puts his own address at the start plus a one byte incrementing sequence number. Each node keeps a list of address/ sequence #/ received time of the last X messages received. If it's seen the address/ seq# before within a certain time, it removes the message from the ring.
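A rough sketch of that duplicate filter, to show how little state it needs. The class name `SeenFilter`, the table size and the time window are all assumptions for illustration; the real values ("last X messages", "a certain time") are not given here.

```python
from collections import OrderedDict


class SeenFilter:
    """Remembers recently seen (source address, sequence number) pairs."""

    def __init__(self, max_entries: int = 16, window_s: float = 2.0):
        self.max_entries = max_entries      # "last X messages" (X is assumed)
        self.window_s = window_s            # "within a certain time" (value is assumed)
        self._seen = OrderedDict()          # (addr, seq) -> time last recorded

    def is_duplicate(self, addr: int, seq: int, now: float) -> bool:
        """Return True if this message should be removed from the ring."""
        key = (addr, seq)
        last = self._seen.get(key)
        if last is not None and now - last <= self.window_s:
            return True                     # seen recently: drop it
        # New message, or an old entry whose 8-bit sequence number has been reused
        self._seen[key] = now
        self._seen.move_to_end(key)
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # forget the oldest entry
        return False


if __name__ == "__main__":
    f = SeenFilter()
    print(f.is_duplicate(addr=7, seq=42, now=0.00))   # first copy: forward it
    print(f.is_duplicate(addr=7, seq=42, now=0.35))   # copy from the other direction: remove it
```

Note that the filter only works if the address and sequence bytes arrive intact, so it relies on the CRC check having rejected corrupted frames before the lookup.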

Every message a node transmits has an incrementing "ack" byte that the next node sends back in its next message. If the ack byte doesn't come back correctly the message is sent again. If the ack is lost and a retry is sent, the receiver throws the message away because he's already seen it.
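The sender's side of that ack scheme might look something like the sketch below. The frame layout (ack byte first, then payload) and the names `AckLink`, `send` and `on_neighbour_message` are assumptions made purely for illustration; the actual message format is not described here.

```python
class AckLink:
    """One node's view of the link to one neighbour (hypothetical structure)."""

    def __init__(self):
        self.next_ack = 0        # incrementing one-byte "ack" value
        self.pending = None      # (ack_byte, payload) still awaiting confirmation

    def send(self, payload: bytes) -> bytes:
        """Build the frame to transmit and remember it until it is confirmed."""
        ack = self.next_ack
        self.next_ack = (self.next_ack + 1) & 0xFF    # wraps at 255
        self.pending = (ack, payload)
        return bytes([ack]) + payload

    def on_neighbour_message(self, echoed_ack: int):
        """Called with the ack byte echoed in the neighbour's next message.

        Returns the frame to retransmit, or None if nothing needs resending.
        """
        if self.pending is None:
            return None
        ack, payload = self.pending
        if echoed_ack == ack:
            self.pending = None                       # delivered: nothing to resend
            return None
        return bytes([ack]) + payload                 # wrong/missing ack: send it again


if __name__ == "__main__":
    link = AckLink()
    frame = link.send(b"status")
    print(link.on_neighbour_message(0x55) == frame)   # bad echo: same frame is resent
    print(link.on_neighbour_message(frame[0]))        # good echo: None
```

Because a retry carries the same source address and sequence number as the original, a receiver that did get the first copy can discard the retry using its "already seen" list.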

- Shane williams

Wed, Mar 30, 2011 10:56 PM

We've limited the ring to 32 nodes up till now at 57600 baud. The nominal target now is 64 nodes for which I've calculated a worst case request response time of approx 1.8 seconds with no retries. I would like it to be about one second. 64 nodes max is a bit arbitrary so what I'm really trying to do is get the best performance that's reasonably achievable. Some sites have well over 64 devices but not all on the same ring.

Some of the devices that connect to the multi-drop network are old and low-powered and a token ring was too much overhead at the time. Also we require redundancy which needs 4 wires for multi-drop but only 2 wires for the ring.

- Shane williams

Thu, Mar 31, 2011 12:56 AM

Hi, I'm out of time today. I'll get back to this tomorrow. Thanks.

- D Yuniskis

Thu, Mar 31, 2011 6:14 AM

> I'm trying to improve the propagation delay of messages around the [...]

OK, back-of-napkin guesstimates...

Assume 10b character frames transmitted "flat out" (no time between end of stop bit and beginning of next start bit). So, 5760 characters per second is data rate.

If we assume N is size of packet (in "characters"), then RTT is

64 * [(N / 5760) + P] where P is time spent processing message packet on each node before passing it along.

1.8s/64 = [(N / 5760) + P] = ~30ms

Guessing at a message size of ~100 bytes (characters) suggests the "transmission time" component of this is ~17ms -- leaving 13ms as a guess at the processing time, P.

If you cut this to ~0, then you achieve your 1 sec goal (almost exactly).

This suggests that eliminating/simplifying any error detection so that incoming messages can *easily* be propagated is a goal to pursue. If you can improve the reliability of the comm link so that errors are the *exception* (i.e., unexpected), then you can simplify the effort required to "handle" those errors.

Furthermore, if errors *are* The Exception, then you can consider running the interface(s) in full duplex mode and starting to pass a packet along to your successor *before* it is completely received. This effectively reduces the size of the message (N) in the above calculation.

E.g., if you can hold just *10* bytes of the message before deciding to pass it along, then the "transmission time" component drops to 1.7ms. Your RTT is then 0.1 sec!

Alternatively, you can spend a few ms processing in each node and still beat your 1 sec goal -- *or*, drop the data rate by a factor of 5 or 6 and still hit the 1 sec goal!
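The napkin arithmetic above is easy to replay with different assumptions. This is only a sketch of the same model (RTT = nodes * (bytes held / character rate + P)); the function names and the 100-byte message size are guesses, as they are above.

```python
def ring_rtt(nodes: int, baud: float, bytes_held: int,
             proc_s: float = 0.0, bits_per_char: int = 10) -> float:
    """Time for a message to go once around the ring, in seconds.

    bytes_held is how many characters a node receives before it starts
    passing the message on (the whole message for store-and-forward).
    """
    chars_per_s = baud / bits_per_char
    return nodes * (bytes_held / chars_per_s + proc_s)


def implied_processing(rtt_s: float, nodes: int, baud: float,
                       msg_bytes: int, bits_per_char: int = 10) -> float:
    """Per-node processing time P implied by an observed round-trip time."""
    return rtt_s / nodes - msg_bytes / (baud / bits_per_char)


if __name__ == "__main__":
    print(implied_processing(1.8, 64, 57600, 100))    # P implied by 1.8 s with 100-byte messages
    print(ring_rtt(64, 57600, 100))                   # store-and-forward, P ~ 0
    print(ring_rtt(64, 57600, 10))                    # forward after holding 10 bytes, P ~ 0
    print(ring_rtt(64, 57600 / 6, 10))                # same, at one sixth of the data rate
```

Without rounding the per-hop figure up to 30 ms, the implied processing time comes out nearer 11 ms than 13 ms, which does not change the conclusion.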

[remember, this is back-of-napkin calculation so I don't claim it accurately reflects *your* operating environment. rather, it puts some options in perspective...]

How do you mean "4 wires" vs. "2 wires"? You can run a 485 network with a single differential pair. A 232-ish approach requires a Tx and Rx conductor (for each ring). So, you could implement two 485 busses for the same conductor count as your dual UART rings.