CAN bus reply problems

Hans-Bernhard Broeker
Fri, Apr 8, 2005 12:56 PM

Wrong.

[... CAN ACK mechanism...]

No. It only lets the sender know that someone *could* have received it, if he had been interested in it. The crux is that the ACK is flagged even by nodes that won't actually do anything with the message, because it wasn't meant for them.

... or the ID didn't match the mask set in the receiver.
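To make the point concrete, here is a minimal C model. It is not any real controller's API, and the struct and function names are hypothetical. Every node that receives a frame without a protocol error drives the ACK slot dominant, and only afterwards does its acceptance filter (a code/mask pair, the "mask" mentioned above) decide whether software ever sees the frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a receiving node: one acceptance filter
 * (code/mask pair) and a flag saying whether the frame arrived without
 * a protocol error. Real controllers have several filters and more state. */
typedef struct {
    uint32_t filter_code; /* ID bits the node wants to see          */
    uint32_t filter_mask; /* 1 = bit must match, 0 = "don't care"   */
    bool     rx_ok;       /* frame received with no protocol error  */
} can_node;

/* Classic code/mask acceptance test: every ID bit selected by the mask
 * must equal the corresponding bit of the filter code. */
static bool accepts(const can_node *n, uint32_t id)
{
    return (id & n->filter_mask) == (n->filter_code & n->filter_mask);
}

/* The ACK slot is driven dominant by ANY node that received the frame
 * correctly, independent of its acceptance filter. */
static bool ack_seen_by_sender(const can_node *nodes, size_t count)
{
    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].rx_ok) {
            return true;
        }
    }
    return false;
}

/* Number of nodes that both received the frame and passed it to software. */
static size_t nodes_delivering(const can_node *nodes, size_t count, uint32_t id)
{
    assert(nodes != NULL || count == 0);
    size_t delivered = 0;
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].rx_ok && accepts(&nodes[i], id)) {
            delivered++;
        }
    }
    return delivered;
}
```

With the intended receiver's filter set to the wrong ID, `ack_seen_by_sender()` still returns true while `nodes_delivering()` returns 0: the gap between "acknowledged" and "delivered".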

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Stephen
Fri, Apr 8, 2005 1:45 PM

In article , Ska writes

Having read a number of articles and threads on this subject recently, it seems to me that, despite CAN's excellent hardware-based acknowledgement and retry system, the above statement is probably true once you have three or more processors on the bus and certain types of message being sent.

Consider a system with processors P1, P2 and P3.

P1 wants to send a message to P2. The message is not one of the often-quoted "nice" CAN bus examples, where P1 constantly spews out repeated sensor readings so that P2 or anyone else may "consume" them; losing a message in that scenario isn't so important, as the next reading will usually suffice. Instead, the message is an instruction for P2 to do something, such as turn an I/O line on or write some data to an LCD, so it is 100% essential that P2 receives it, or the product fails.

So, P1 sends the message, and gets the hardware ACK. But the ACK came from P3, who isn't interested in consuming the message.

From what I understand, although P2 "should" generate an error to destroy the ACK if it detects an error, there are a number of circumstances where it may not and P2 may "lose" a message.

1. A software bug in P2.
2. A receive overflow in P2.
3. Errata in the P2 CAN controller.
4. P2 has gone error-passive or bus-off.
Are there any other reasons?

Admittedly, (1) should be fixed and would be a problem even in a two-node system, but (2) may be unavoidable on certain smaller CAN controllers with limited FIFOs, (3) is unavoidable unless you change to another processor/CAN device, and (4) is actually designed to happen. I'd truly like to know if there is a (5).

So it would seem that in this situation, despite the hardware-based ACK system in the CAN controllers, you must still implement a higher-level protocol that provides a software-based acknowledge, timeout and retry mechanism.

Such lost messages may occur only once in a billion, but if my product sends a billion or more messages per week without a high-level acknowledge, timeout and retry mechanism, I'll have a product MTBF of a week or less, which is totally unacceptable.
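The MTBF figure follows from a simple rate calculation, assuming each message is lost independently with probability p and the node sends N messages per week (both values are the illustrative numbers above, not measurements):

$$
\text{expected losses per week} = pN = 10^{-9} \times 10^{9} = 1,
\qquad
P(\text{at least one loss in a week}) = 1 - (1 - p)^{N} \approx 1 - e^{-pN} \approx 0.63
$$

So a "one in a billion" loss rate still means roughly one failure per week at that traffic level, i.e. an MTBF of about one week.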

I'd be interested in others' opinions here. I'm in the middle of firmware development on my first CAN-based system and only have one node up and running, in loopback mode, so I can't yet assess reliability on a system with three or more nodes. But since the probability of a message going missing isn't zero, I'm taking the view that I must implement the additional high-level ACK mechanism. The general impression I get from reading CAN articles is that although CAN's error handling is extremely robust, it's not 100%, and things do occasionally go missing.

Hans-Bernhard Broeker
Fri, Apr 8, 2005 3:36 PM

You understand that incorrectly. No CAN node can possibly "destroy" an ACK being flagged by some other node. And "generating an error" (by which I assume you mean "sending an error frame") for reasons not already diagnosed by the CAN protocol itself would be a layer model violation. Application layer errors have no business generating transport/link layer errors. That's also the reason why CAN controllers typically don't support sending error frames on purpose: if an error frame needs to be sent, the controller will do that all by itself.
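For reference, the CAN protocol defines five error-detection mechanisms, and these are what make a controller send an error frame on its own. The C table below is only an illustrative summary with hypothetical names; note that there is no entry for "the application could not use this frame", which is the layering point made above.

```c
#include <assert.h>
#include <stddef.h>

/* The five error-detection mechanisms of the CAN protocol. */
typedef enum {
    CAN_ERR_BIT,   /* bit sent differs from bit monitored on the bus
                      (except recessive bits in arbitration and ACK slot) */
    CAN_ERR_STUFF, /* six consecutive equal bits in a stuffed field       */
    CAN_ERR_CRC,   /* received CRC does not match the computed CRC        */
    CAN_ERR_FORM,  /* a fixed-form field contains an illegal bit          */
    CAN_ERR_ACK,   /* transmitter saw no dominant bit in the ACK slot     */
    CAN_ERR_COUNT
} can_error_kind;

typedef struct {
    const char *name;
    const char *detected_by;
} can_error_info;

static const can_error_info CAN_ERRORS[CAN_ERR_COUNT] = {
    [CAN_ERR_BIT]   = { "bit error",   "the node currently sending a bit" },
    [CAN_ERR_STUFF] = { "stuff error", "any node" },
    [CAN_ERR_CRC]   = { "CRC error",   "receivers" },
    [CAN_ERR_FORM]  = { "form error",  "any node" },
    [CAN_ERR_ACK]   = { "ACK error",   "the transmitter" },
};

/* Look up the description of one error kind. */
static const can_error_info *can_error_lookup(can_error_kind k)
{
    assert((unsigned)k < CAN_ERR_COUNT);
    return &CAN_ERRORS[k];
}
```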


Stephen
Fri, Apr 8, 2005 4:21 PM

In article , Hans-Bernhard Broeker writes

No, you misunderstood me.

I was talking about the situation where the CAN module detects an error at the hardware level and deliberately generates an error frame to stop the transmitter believing the frame was acknowledged. Nothing to do with software.

R Adsett
Fri, Apr 8, 2005 7:51 PM

One thing to watch for that hasn't been pointed out yet: a CAN node may receive multiple valid copies of the same message. This has two consequences. First, toggling state based on message receipt is a bad idea. Second, any acknowledge/retry scheme has to be able to recognize and discard duplicates where necessary.

Robert

Paul Keinanen
Sat, Apr 9, 2005 7:58 AM

Apart from some strange networks with multiple store-and-forward repeaters, it is hard to imagine how such situations could happen.

Basically, this would require that the transmitter recognised an error (a missing ACK or an error frame) and therefore resent the message, while your node did not detect that anything was wrong and accepted the message the first time.

A properly working receiver should check the CRC, the ACK fields _and_ check that at least six recessive bits are received in the End Of Frame field.

If your receiver accepts a frame you are interested in as soon as it passes the CRC check, without checking the ACK and EOF fields, you are going to get duplicates if the transmitter works according to the standard: another node may have generated an error frame, which the transmitter detects and retransmits, while your receiver is content with the first copy.

Paul

Hans-Bernhard Broeker
Sat, Apr 9, 2005 1:12 PM

That's indeed what happens if a bit error hits exactly the wrong bit in the CAN message: the last bit of the end-of-frame field. This bit is checked by the transmitter, but not by the receiver(s). So if this bit is struck by an error, the transmitter will detect a "form error" and re-send, but the receiver will not have noticed any problem.

No. That's not the actual definition of a properly working receiver. A proper CAN receiver will *not* look at the last bit of the EOF field.
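The asymmetry can be shown with a simplified C model based on the message-validation rule in the CAN specification: a frame is valid for the transmitter only if there is no error up to the end of EOF, but valid for receivers if there is no error up to the last-but-one EOF bit. The model assumes a single disturbance that every node sees at the same EOF bit position; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define EOF_BITS 7 /* End Of Frame: seven recessive bits */

typedef struct {
    bool tx_retransmits; /* transmitter treats the frame as failed   */
    bool rx_accepts;     /* receiver passes the frame up as valid    */
} eof_outcome;

/* Simplified model: one disturbance at EOF bit `bit` (0 = first,
 * 6 = last), seen by every node. */
static eof_outcome eof_disturbance(int bit)
{
    assert(bit >= 0 && bit < EOF_BITS);
    eof_outcome o;
    o.tx_retransmits = true;                   /* TX needs the whole EOF clean  */
    o.rx_accepts     = (bit == EOF_BITS - 1);  /* last bit is "don't care" for RX */
    return o;
}

/* A duplicate arises when the receiver keeps the frame AND the
 * transmitter sends it again. */
static bool causes_duplicate(int bit)
{
    eof_outcome o = eof_disturbance(bit);
    return o.tx_retransmits && o.rx_accepts;
}
```

Only a hit on the final EOF bit produces a retransmission that the receiver will see as a second, valid copy.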


R Adsett
Sat, Apr 9, 2005 3:47 PM

I've seen it happen. Not frequently but far more than could be ignored even if you were inclined to do so.

Robert

Hans-Bernhard Broeker
Sat, Apr 9, 2005 4:03 PM

At least you'll be forewarned. Non-systematic errors will supposedly hit any bit of a CAN message randomly, with equal probability. So this particular error will occur at most 1/50 as often as the other types of error, which both transmitter and receiver notice (even less often if you use longer CAN messages).

Keeping an eye on overall error-induced frame retransmission rates thus provides a handle on how often to expect this particular error. Combined with the requirements of the communication at hand, one can design the amount of countermeasures to match the risk.
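A rough way to put numbers on this, assuming classic CAN data frames, single-bit errors equally likely at every bit position, and ignoring stuff bits (which only make frames longer). The unstuffed frame from SOF to the end of EOF is 44 + 8×DLC bits with an 11-bit ID and 64 + 8×DLC bits with a 29-bit ID, so the shortest frame gives about 1/44, the same order as the 1/50 figure above. The function names are hypothetical.

```c
#include <assert.h>

/* Unstuffed length in bits of a classic CAN data frame, SOF through EOF
 * (interframe space not counted). Stuff bits only add to this, so it is
 * a lower bound on the real frame length. */
static unsigned can_frame_bits(unsigned dlc, int extended_id)
{
    assert(dlc <= 8);
    return (extended_id ? 64u : 44u) + 8u * dlc;
}

/* Under the "any bit equally likely" assumption, 1 in can_frame_bits()
 * single-bit errors hits the last EOF bit. Scaling the observed number of
 * error-induced retransmissions gives the expected number of silent
 * duplicates over the same observation period. */
static double expected_duplicates(unsigned long retransmissions,
                                  unsigned dlc, int extended_id)
{
    return (double)retransmissions / (double)can_frame_bits(dlc, extended_id);
}
```

For example, 1080 error-induced retransmissions of 8-byte standard frames (108 bits each) would suggest about 10 silent duplicates over the same period.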
