CAN bus reply problems

Hi folks!

We are developing a system that uses the CAN bus as the network
connecting different nodes. We have a PC that needs to request some
data (the node status) from the nodes, which have to answer the
request immediately.
To ask a node for its status, we send a "remote frame" message with a
specific ID on the CAN bus. The addressed node has to answer with the
requested data in a "data frame" message.
Each node runs a while loop that reads a buffer and sends back data
when necessary. Usually everything goes well, but sometimes one of the
nodes does not answer the PC's request, even though the request is
sent on the bus (another node sees it, and it is visible on an
oscilloscope connected to the CAN bus lines). It seems the node does
not see the message, as if it misses the interrupt that updates the
buffer...
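A minimal sketch of the node side of this request/reply scheme, assuming a generic frame structure. The `can_frame` type and `build_status_reply` are hypothetical, not the T89C51CC03 driver API. It shows the CAN rule the design relies on: a remote frame requests the data frame with the *same* identifier, so the node answers with that ID and the RTR bit cleared.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Generic frame layout: an assumption for illustration, not the
 * T89C51CC03 register map or any vendor API. */
typedef struct {
    uint16_t id;       /* 11-bit standard identifier */
    bool     rtr;      /* true = remote frame (request), false = data frame */
    uint8_t  dlc;      /* data length code, 0..8 */
    uint8_t  data[8];
} can_frame;

/* Called from the node's main loop for every frame taken out of its
 * receive buffer. If the frame is a remote frame carrying this node's
 * status identifier, fill in the data frame to send back and return true. */
bool build_status_reply(const can_frame *rx, uint16_t my_status_id,
                        const uint8_t *status, uint8_t status_len,
                        can_frame *tx)
{
    if (!rx->rtr || rx->id != my_status_id)
        return false;                 /* not a request for us: ignore it */
    if (status_len > 8)
        status_len = 8;               /* a classic CAN frame holds at most 8 bytes */
    tx->id  = rx->id;                 /* the reply reuses the requested identifier */
    tx->rtr = false;                  /* ...but is sent as a data frame */
    tx->dlc = status_len;
    memset(tx->data, 0, sizeof tx->data);
    memcpy(tx->data, status, status_len);
    return true;
}
```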
We usually send a sequence of "remote frame" messages, waiting for the
answer each time: send, wait for the answer, send, wait, ... Even if
we insert a sleep between one send and the next, a node sometimes
misses messages...
We changed the baud rate (from 500 kbit/s down to 20 kbit/s), but that
did not solve the problem.
We are using Atmel's T89C51CC03 microcontroller.

Have you ever experienced this problem? Any suggestions?

Thank you in advance for any help!

Cheers,
Ska

Re: CAN bus reply problems


1:  This is either a problem with your microprocessor or with your code.
2:  I have no experience with Atmel & CAN.
2a: The TMS320F2812 has been rock solid for me.
3:  No protocol should trust external nodes 100% to receive something --
     you should always have a timeout & retry mechanism.
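A minimal sketch of such a timeout & retry loop for the PC side, in the send/wait/send pattern described in the original post. The driver hooks (`send_remote`, `reply_ready`, `now_ms`), `poll_node`, and the `sim_node` stand-in used for desk testing are all hypothetical; substitute whatever your CAN interface actually provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver hooks: names and signatures are assumptions, not the
 * Atmel API or any real library. */
typedef struct {
    bool (*send_remote)(void *ctx, uint16_t id);   /* queue a remote frame */
    bool (*reply_ready)(void *ctx, uint16_t id, uint8_t *data, uint8_t *len);
    uint32_t (*now_ms)(void *ctx);                 /* free-running ms clock */
    void *ctx;
} can_port;

typedef enum { POLL_OK, POLL_TIMEOUT, POLL_SEND_FAILED } poll_result;

/* Send a remote frame for `id` and wait up to timeout_ms for the matching
 * data frame; repeat up to max_tries times before reporting a timeout. */
poll_result poll_node(const can_port *p, uint16_t id, uint32_t timeout_ms,
                      unsigned max_tries, uint8_t *data, uint8_t *len,
                      unsigned *tries_used)
{
    for (unsigned attempt = 1; attempt <= max_tries; attempt++) {
        if (tries_used)
            *tries_used = attempt;
        if (!p->send_remote(p->ctx, id))
            return POLL_SEND_FAILED;
        uint32_t start = p->now_ms(p->ctx);
        /* Unsigned subtraction keeps working when the clock wraps around. */
        while ((uint32_t)(p->now_ms(p->ctx) - start) < timeout_ms) {
            if (p->reply_ready(p->ctx, id, data, len))
                return POLL_OK;
        }
    }
    return POLL_TIMEOUT;
}

/* ---- Desk-test stand-in for a flaky node (hypothetical, no hardware) ---- */
typedef struct {
    unsigned requests;    /* remote frames received so far */
    unsigned drop_first;  /* ignore this many requests, like the deaf node */
    bool     pending;     /* a reply is waiting to be read */
    uint32_t clock;       /* fake clock: advances 1 ms per read */
} sim_node;

static bool sim_send(void *ctx, uint16_t id) {
    sim_node *s = ctx;
    (void)id;
    s->requests++;
    s->pending = s->requests > s->drop_first;
    return true;
}

static bool sim_reply(void *ctx, uint16_t id, uint8_t *data, uint8_t *len) {
    sim_node *s = ctx;
    if (!s->pending)
        return false;
    s->pending = false;
    data[0] = (uint8_t)id;   /* stand-in status byte */
    *len = 1;
    return true;
}

static uint32_t sim_now(void *ctx) { return ((sim_node *)ctx)->clock++; }
```

Returning `POLL_TIMEOUT` after `max_tries` leaves the recovery policy (report the node as missing, or reset the controller) to the caller, and `tries_used` makes it easy to log how many attempts each poll needed, which helps tell a marginal node from a dead one.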

--

Tim Wescott
Wescott Design Services
Re: CAN bus reply problems


I cannot answer your specific question; in other words, I don't know
which part of your software or hardware is responsible. It could be the
driver, a misconfiguration of the CAN controllers, or the cabling.
But you should consider switching your node monitoring from the
master/slave principle you are using now to something else.
Your current implementation looks exactly like the _old_ CANopen Node
Guarding mechanism. CANopen switched to Heartbeat years ago: each node
is an autonomous Heartbeat producer and can be monitored by every node
that wishes to do so. The benefits are more flexibility and less
bandwidth spent on node monitoring.
Even so, it can happen that one of the Heartbeat consumers misses a
heartbeat from one of the producers. In that case, increase the rate or
accept that one or more heartbeats go missing.
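A minimal sketch of the consumer side of Heartbeat monitoring, using the fact that CANopen heartbeats are sent with COB-ID 0x700 + node-ID. The names `hb_consumer`, `hb_on_frame` and `hb_alive` are hypothetical, and the timeout rule (producer period times (tolerated misses + 1), plus half a period of slack) is one reasonable way to "accept that one or more heartbeats go missing", not something the CANopen specification mandates.

```c
#include <stdbool.h>
#include <stdint.h>

#define HB_MAX_NODE 127u  /* CANopen node-IDs run from 1 to 127 */

typedef struct {
    uint32_t last_seen_ms[HB_MAX_NODE + 1u];  /* indexed by node-ID */
    bool     seen[HB_MAX_NODE + 1u];
    uint32_t timeout_ms;  /* consumer time, derived from the producer period */
} hb_consumer;

/* Tolerate `misses_allowed` lost heartbeats, plus half a period of jitter slack. */
void hb_init(hb_consumer *c, uint32_t producer_period_ms, unsigned misses_allowed) {
    for (unsigned n = 0; n <= HB_MAX_NODE; n++) {
        c->last_seen_ms[n] = 0;
        c->seen[n] = false;
    }
    c->timeout_ms = producer_period_ms * (misses_allowed + 1u) + producer_period_ms / 2u;
}

/* Feed every received frame in; heartbeats use COB-ID 0x700 + node-ID. */
bool hb_on_frame(hb_consumer *c, uint16_t cob_id, uint32_t now_ms) {
    if ((cob_id & 0x780u) != 0x700u)
        return false;                 /* not a heartbeat frame */
    unsigned node = cob_id & 0x7Fu;
    if (node == 0u)
        return false;                 /* node-ID 0 is not a valid producer */
    c->last_seen_ms[node] = now_ms;
    c->seen[node] = true;
    return true;
}

/* A node counts as alive if its last heartbeat is no older than the timeout. */
bool hb_alive(const hb_consumer *c, unsigned node, uint32_t now_ms) {
    if (node == 0u || node > HB_MAX_NODE || !c->seen[node])
        return false;
    return (uint32_t)(now_ms - c->last_seen_ms[node]) <= c->timeout_ms;
}
```

Unlike the polling scheme, no request has to get through for monitoring to work: a consumer only raises an alarm after several consecutive heartbeats are missing.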

Regards
  Heinz
--

with best regards / mit freundlichen Grüßen

   Heinz-Jürgen Oertel
Re: CAN bus reply problems
Hello Tim, hello Heinz, hello everybody

Thank you for your replies.

What you are both saying is that "No protocol should trust external
nodes 100% to receive something -- you should always have a timeout &
retry mechanism"!
This is exactly what we are doing now, but it is something I don't
like very much... :(
We set a maximum number of retries (say 10), and sometimes the
attempts exceed this threshold! In that case we reset and restart the
CAN bus but, as I said, it is something we don't like very much...

...mmm...

Regards,
Ska


Re: CAN bus reply problems
[Note: F'up2 cut down to one group --- should have been done by OP.]



[Massive quote without actual referral snipped.  Please don't do that.]

What you're observing appears to be a rate of failure to receive CAN
messages that is quite a lot beyond expectations of the protocol,
unless you were operating in a pathologically noisy environment ---
but you didn't mention anything like that.

What this hints at is a genuine bug in the receiving end, but I'm
afraid you didn't reveal enough of its details for anybody out here to
be able to remote-diagnose it more precisely.  So I'll just bombard
you with some questions:

Did you test this with only two nodes on the bus, and check if the
receiving one ACKs the transmission?  

What *is* the rate of failure, anyway, i.e. one in how many messages
gets lost?  What is the rate of transmissions with CRC or other
failures, on the same network?

Do you have any way of debugging into the receiving CAN controller's
register banks after a failed reception, to distinguish whether the
message actually failed to arrive in the message box, or just failed
to raise the IRQ it's configured to?  (There's a bug like that in
another 8051 derivative with integrated CAN...)

Do you have a storage scope that would let you record the exact
signalling up to the point of failure, so you could go look for any
differences between successful and failing transmissions, on physical
level?

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems

Which 8051 is that then?


Re: CAN bus reply problems


DS80C390 Rev. B3 and B4

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems

Ahhh... (sigh of relief). Just about to start coding on a 400. Sure
would be a killer if the CAN interrupts didn't work as advertised!

Re: CAN bus reply problems

This really should not happen in CAN networks.
If any node sees a corrupted message, whatever the reason (CRC error,
bit error, form error, ...), it generates an error frame and causes the
transmitter to retransmit.
The probability of a _lost_ message, i.e. a message not seen by a
receiver, is very, very low.
It looks like a bug in your driver (or the CPU, but I'm not aware of
such a problem in the Atmel chips).

Heinz

--

with best regards / mit freundlichen Grüßen

   Heinz-Jürgen Oertel
Re: CAN bus reply problems



There are a couple of cases where you *will* see lost messages:

1. Too many messages for the receiver.
   Trivial case, but if your protocol doesn't allow for it, it *will*
   bite you in the ass one day.

2. Errors on the bus.
   Eventually, someone is going to go TX-Passive. Now everyone has lost
   whatever message they were going to get from there.

The normal reliability of CAN allows people to handwave both of these
problems into "shouldn't happen". However, if you're building a
higher-level protocol on top of CAN, you have to take these faults into
account, because they will happen sometime.
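For case 1, the receiving node can at least make the loss visible instead of silent. A minimal sketch of a receive queue shared between the CAN receive interrupt and the main loop, assuming exactly one ISR writer and one main-loop reader; `rx_queue`, `rxq_push_from_isr` and `rxq_pop` are hypothetical names. When the queue is full the new frame is dropped, but the `overruns` counter records it.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_SLOTS 16u  /* power of two, so wrapping an index is a cheap mask */

typedef struct { uint16_t id; uint8_t dlc; uint8_t data[8]; } rx_frame;

typedef struct {
    rx_frame slot[RX_SLOTS];
    volatile uint8_t  head;      /* written only by the receive ISR */
    volatile uint8_t  tail;      /* written only by the main loop */
    volatile uint16_t overruns;  /* frames dropped because the queue was full */
} rx_queue;

/* Receive-interrupt side: store the frame, or count it as lost. */
void rxq_push_from_isr(rx_queue *q, const rx_frame *f) {
    uint8_t next = (uint8_t)((q->head + 1u) & (RX_SLOTS - 1u));
    if (next == q->tail) {       /* full: one slot stays empty to tell full from empty */
        q->overruns++;
        return;
    }
    q->slot[q->head] = *f;
    q->head = next;              /* publish only after the slot is written */
}

/* Main-loop side: fetch the oldest frame, if there is one. */
bool rxq_pop(rx_queue *q, rx_frame *out) {
    if (q->tail == q->head)
        return false;            /* empty */
    *out = q->slot[q->tail];
    q->tail = (uint8_t)((q->tail + 1u) & (RX_SLOTS - 1u));
    return true;
}
```

On an 8051 the one-byte indices are read and written atomically, so this single-writer/single-reader arrangement needs no interrupt locking for `head` and `tail`; the 16-bit `overruns` counter should be read with interrupts briefly disabled, and on other CPUs memory barriers may be needed.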

cheers, Rich.

[happily swamping 8-bit micros on 1MBit CAN with flaky connectors and
high error rates since ... oooh, gosh, *that* long ago?]

--
rich walker         |  Shadow Robot Company | snipped-for-privacy@shadow.org.uk
technical director     251 Liverpool Road   |
Re: CAN bus reply problems


As I already said: yes, you are right, in _very rare_ cases a message
can be lost through a failure of the CAN protocol itself (this is
described in every good CAN book).
What you are describing is different.
The first case is definitely a problem of processing power, bad driver
design, or bad network design.
The second case is what you call TX-Passive. That term does not appear
in the ISO 11898 standard. Assuming you mean:

An >error passive< node shall not send an active error flag.
It takes part in the bus communication, but when an error has been
detected, a passive error flag is sent (as opposed to an active error
flag).

A transmitting node in the error-passive state still sends messages,
and they are normally received by the other nodes. If things get worse
(network quality, or a transceiver defect, for example), the
transmitting node is switched off (bus-off). Only in that case is
nothing sent. But then, just as when you cut the transmitter's cable,
no other node can receive anything.
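The fault-confinement states discussed here are driven by each node's transmit and receive error counters (TEC/REC). A minimal sketch of the classification using the ISO 11898-1 thresholds, plus a deliberately simplified TEC update (+8 per transmit error, -1 per successful transmission; the standard's exceptions are ignored). The function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { ERROR_ACTIVE, ERROR_PASSIVE, BUS_OFF } can_err_state;

/* Thresholds from the CAN fault-confinement rules (ISO 11898-1):
 * bus-off when TEC > 255; error-passive when TEC > 127 or REC > 127. */
can_err_state can_state(unsigned tec, unsigned rec) {
    if (tec > 255u)
        return BUS_OFF;
    if (tec > 127u || rec > 127u)
        return ERROR_PASSIVE;
    return ERROR_ACTIVE;
}

/* Simplified TEC bookkeeping: +8 per transmit error, -1 per successful
 * transmission. The standard's special cases are deliberately ignored. */
unsigned tec_after(unsigned tec, bool transmit_ok) {
    if (transmit_ok)
        return tec > 0u ? tec - 1u : 0u;
    return tec + 8u;
}
```

With these rules, a node whose every transmission fails reaches error-passive after 16 errors and bus-off after 32; only bus-off silences it completely, which matches the distinction drawn above.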

What the OP described, one node losing messages from time to time,
looks strange. Therefore, to me it really looks like a software bug.

Regards
 Heinz

--

with best regards / mit freundlichen Grüßen

   Heinz-Jürgen Oertel
Re: CAN bus reply problems
Hello everybody

Thank you for your answers.

I'm glad to learn that the CAN bus should be more reliable than what
we are seeing now.
I will try to answer your questions; let's see if I can do it
properly.

For Hans-Bernhard


Yes, we did. As I said, our network has several nodes. For test
purposes we have a serial line to which the code sends its printf
output. We added a printf to the code of one of the nodes (not the
addressee, another node; call it the "print node") to print the
message IDs it sees on the CAN bus.
When we send the message on the CAN bus, sometimes the "print node"
sees it (it writes a message with the correct ID to the serial line)
but the addressee node does not send anything back.
Sometimes neither of them sees anything.
We are sure that the problem is that the addressee node does not send
anything back (why? either because it does not receive the message or
because it decides not to transmit an answer), because we set up an
LED that lights briefly whenever a node transmits something, and we do
not see it flash.


The failure rate is roughly one message in 10 (but the rate is higher
if you take consecutive failures into account...).
We use the API set to communicate with the bus, and we know there is a
CRC in the message sent, but we did not check it...
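A quick consistency check, assuming each request is lost independently with the reported probability p ≈ 0.1:

```latex
P(\text{10 consecutive misses}) = p^{10} = (0.1)^{10} = 10^{-10}
```

Under that assumption, the 10-retry limit mentioned earlier should practically never be exceeded. That it is exceeded suggests the failures are not independent: once the node stops answering, it tends to stay deaf for a while, which points more towards a node that gets stuck than towards random, independent frame losses.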


As I wrote, the only thing we know is that we send the message but the
node does not receive it...


Ehm... no, I don't think so...
---
For Rich
Re: CAN bus reply problems



Actually, you didn't.  You ran a test, but not the one I described
above.  The ACK I'm talking about is that of the CAN bus protocol itself,
where a receiving node sends back a single bit, inside the time frame
of the message being transmitted on the bus, to inform the sender that
at least one node successfully received it.

The test target here is to find how far into the receiving node the
CAN message still makes it.


Change that LED's usage to "flash if something received", please.
That's the more important test for the moment.


That's *way* too much.  It suggests a serious software bug, mismatch
in hardware clock rates, or misconfigured bit timing on the CAN bus.
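To make "misconfigured bit timing" concrete: in the usual CAN bit-timing model, one bit is 1 sync quantum plus TSEG1 plus TSEG2 time quanta, each quantum lasting BRP clock periods, and the sample point sits at the end of TSEG1. A minimal sketch; the `can_timing` struct and function names are hypothetical, the segment values are real counts rather than register encodings, and the 16 MHz clock in the usage below is an assumed example, not the OP's hardware.

```c
#include <stdint.h>

/* Segment lengths are actual counts of time quanta (tq), NOT register
 * encodings: many controllers store "value - 1", so check the datasheet. */
typedef struct {
    uint32_t f_can_hz;  /* clock feeding the CAN prescaler */
    uint16_t brp;       /* baud-rate prescaler: 1 tq = brp / f_can_hz */
    uint8_t  tseg1;     /* propagation + phase segment 1, in tq */
    uint8_t  tseg2;     /* phase segment 2, in tq */
} can_timing;

/* One bit = sync segment (always 1 tq) + tseg1 + tseg2. */
static uint32_t tq_per_bit(const can_timing *t) {
    return 1u + t->tseg1 + t->tseg2;
}

double can_bitrate(const can_timing *t) {
    return (double)t->f_can_hz / ((double)t->brp * (double)tq_per_bit(t));
}

/* Sampling happens at the end of tseg1, expressed as a fraction of the bit. */
double can_sample_point(const can_timing *t) {
    return (1.0 + t->tseg1) / (double)tq_per_bit(t);
}

/* Deviation from the intended rate in percent. All nodes on the bus must
 * agree within a small tolerance (on the order of a percent or less). */
double can_rate_error_pct(const can_timing *t, double target_bps) {
    return (can_bitrate(t) - target_bps) / target_bps * 100.0;
}
```

With f = 16 MHz, BRP = 2, TSEG1 = 11 and TSEG2 = 4, a bit is 16 tq, giving 500 kbit/s with a 75% sample point; BRP = 50 gives 20 kbit/s. A node whose real clock differs from the one assumed when computing its settings is off by the same percentage at every bit rate, so lowering the rate alone would not cure that kind of mismatch.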


Please do so.


Apparently, you don't really know even that.  You only know for a fact
that it doesn't send the answer.  You haven't established why, yet.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: CAN bus reply problems

Perhaps we should start with some even simpler checks.  Re-reading
the thread, I don't think it's been established that the bus is
properly terminated.  I have seen a CAN bus work anywhere from 10% to
90+% of the time when not properly terminated, and I have seen
symptoms quite close to these when the bus had a broken termination
resistor.

Robert

Re: CAN bus reply problems
Robert, I did not understand the test you proposed... Can you explain
it again? Is it a hardware check?

Cheers,
Ska

Re: CAN bus reply problems

Yes, it's a hardware check.  Make sure that you have the proper
terminating resistors on the bus.  In my experience CAN is quite
tolerant of variations on the bus, but missing termination resistors
will cause the error rate to rise (often quite dramatically).

Simply find the ends of the bus cables and look for the resistors.  You
can verify their resistance with a multimeter.
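For a high-speed CAN bus (ISO 11898-2) the usual arrangement is a 120 Ω resistor at each end, so with all nodes powered down a multimeter between CAN_H and CAN_L should read about 60 Ω, and one missing terminator gives about 120 Ω. A minimal sketch of that check; `parallel_ohms`, `check_termination` and especially the band edges are illustrative assumptions, not limits taken from the standard.

```c
/* Resistance of two resistors in parallel. */
double parallel_ohms(double r1, double r2) {
    return (r1 * r2) / (r1 + r2);
}

typedef enum {
    TERM_OK,           /* about 60 ohms: 120 ohms at each end */
    TERM_ONE_MISSING,  /* about 120 ohms: only one terminator present */
    TERM_OPEN,         /* much higher: no terminator, or an open wire */
    TERM_TOO_LOW       /* well below 60 ohms: extra terminators or a short */
} term_verdict;

/* Classify a reading taken between CAN_H and CAN_L with all nodes powered
 * down. The band edges are illustrative assumptions, not standard limits. */
term_verdict check_termination(double measured_ohms) {
    if (measured_ohms < 50.0)
        return TERM_TOO_LOW;
    if (measured_ohms <= 90.0)
        return TERM_OK;
    if (measured_ohms <= 200.0)
        return TERM_ONE_MISSING;
    return TERM_OPEN;
}
```

A third 120 Ω terminator (for example one left on a node in the middle of the bus) brings the reading down to 40 Ω, which is why the low side is worth checking too.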


Robert

Re: CAN bus reply problems

Robert, you are right. We are currently working with a temporary
system configuration. We will work on the test set-up to terminate the
bus, and I will report back on the results, hopefully by the end of
the week.

Thanks,
Ska

Re: CAN bus reply problems

Actually, this is not necessary for CAN. The beginning of the frame
contains a node ID that possible recipients filter through their
match/accept registers. Active receivers calculate the CRC as the frame
bytes clock in and then compare it to the CRC at the end of the frame.
If they match, the accepting receiver drives the bus active (low) for
one bit in a designated trailing window. This lets the master, or
sender of the frame, know that someone received it.

Use your scope to look at the bus for this ACK bit. If you see it, but
the receiver doesn't process the frame, you've missed the interrupt. If
you don't see the ACK bit, then the receiver didn't match the node ID
or the CRC, or it's in Bus Off mode for error containment.

Also be sure you have both ends properly terminated; I've seen wild
behavior on DeviceNet packets at 125, 250 and 500 kb/s.

Dan


Re: CAN bus reply problems

"Node ID" is only meaningful for some higher-level protocols, such as
CANopen; it does not make any sense in plain CAN bus systems, which
rely entirely on message identifiers.


Unless the receiver is in the "bus off" or "error passive" mode, _all_
receivers should monitor the bus and signal ACK or error frame
accordingly.


The ACK bit is sent by _any_ active (also "nonaddressed") device. Also
if _any_ receiver detects a CRC or other error, it will send the error
flag, which mutilates the message and no device will accept it.


This only works with just two devices (sender and receiver) on the
bus. With more than two devices, someone else will acknowledge it.
Instead of an oscilloscope, you should also be able to tell from the
transmitter status registers whether someone ACKed the transmitted
frame.


Or you have configured the mask registers incorrectly.
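A minimal model of identifier mask filtering, to make "configured the mask registers incorrectly" concrete. The `rx_filter` layout and the mask polarity (1 = compare, 0 = don't care) are assumptions; controllers differ, so check the datasheet. Because the requests in this thread are remote frames, the optional RTR comparison is worth a look: on a controller whose filter also compares the RTR bit, a filter set up for data frames silently drops the requests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical acceptance-filter model for 11-bit identifiers. Mask bit 1
 * means "must equal the filter bit", 0 means "don't care"; some controllers
 * use the opposite polarity. The optional RTR comparison mirrors controllers
 * that can also filter on the remote-request bit. */
typedef struct {
    uint16_t id;        /* identifier pattern */
    uint16_t id_mask;   /* which identifier bits are compared */
    bool     rtr;       /* required RTR value, if compared */
    bool     rtr_mask;  /* true = compare the RTR bit too */
} rx_filter;

bool filter_accepts(const rx_filter *f, uint16_t id, bool rtr) {
    if (((id ^ f->id) & f->id_mask & 0x7FFu) != 0u)
        return false;   /* an identifier bit under the mask differs */
    if (f->rtr_mask && rtr != f->rtr)
        return false;   /* wrong frame type (data vs. remote) */
    return true;
}
```

For example, a filter with ID 0x120 and mask 0x7F0 accepts the sixteen identifiers 0x120 to 0x12F; with the RTR bit also compared and required to be 0, a remote frame for 0x120 never reaches the message box, although the bus still ACKs it.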


The identifier match should not affect the appearance of the ACK.

It should be possible to determine from the _transmitter_ status
registers whether the frame was ACKed or an error flag was generated
by the receiving device.

Paul


Re: CAN bus reply problems


Wrong.

[... CAN ACK mechanism...]

No.  It only lets the sender know that someone *could* have received
it, if he had been interested in it.  The crux being that ACK is
flagged even by nodes who won't actually do anything with this
message, because it wasn't meant for them.
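A toy model of the ACK slot that illustrates this point. The bus behaves as a wired-AND, so a single dominant ACK from any receiver holding an intact frame is all the transmitter sees, whether or not that receiver's acceptance filter keeps the frame. The `rx_view` type and both functions are hypothetical; the model assumes all receivers are error-active and treats any receive error as withholding the ACK and then destroying the frame with an error flag, which simplifies the real error timing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bus levels: a dominant bit (0) overrides a recessive one (1). */
enum { DOMINANT = 0, RECESSIVE = 1 };

/* What one receiving node made of the frame just transmitted. */
typedef struct {
    bool frame_ok;     /* received without CRC/stuff/form error */
    bool id_accepted;  /* passes this node's acceptance filter */
} rx_view;

/* Level in the ACK slot: the transmitter sends recessive, and every
 * receiver that got the frame intact drives dominant, regardless of
 * its acceptance filter. The bus is a wired-AND of all drivers. */
int ack_slot_level(const rx_view *rx, size_t n) {
    int bus = RECESSIVE;                 /* transmitter's own contribution */
    for (size_t i = 0; i < n; i++)
        bus &= rx[i].frame_ok ? DOMINANT : RECESSIVE;
    return bus;
}

/* Number of nodes whose software will actually see the frame. */
size_t delivered_count(const rx_view *rx, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!rx[i].frame_ok)
            return 0;   /* its error flag destroys the frame for everyone */
        if (rx[i].id_accepted)
            count++;
    }
    return count;
}
```

So with a bystander on the bus, seeing the ACK on the scope says nothing about whether the addressee's software ever got the frame; that is why the two-node test was suggested earlier in the thread.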


... or the ID didn't match the mask set in the receiver.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
