Intermittent Failures in Serial Communication

Dear All,

I am working on solving a problem for a programming station for certain PCBA Modules.

I have to mount the PCBA on the fixture. Pogo pins for RX, TX, Gnd and power supply contact the corresponding test points on the PCBA, and the flash programming software runs on an embedded PC. The communication from the PC to the fixture is via a COM port. As the PCBA's main microcontroller IC uses 3V~0V levels, there is a small PCBA in the test fixture that converts the 3V~0V levels to standard +/-15V RS232 levels for the controlling embedded PC.

The flash programming has two stages. The first stage loads an S-Record program to the module at baud rate 9600. The second stage then loads the actual firmware at baud rate 115200.

What happens is that S-Record program loading is always successful in the first stage. Then when the actual firmware is loaded with baud rate 115200, the first few blocks are successfully loaded but it will fail in loading the third or fourth block.

Does anybody have experience with this?

I don't know whether it is because of the test pins, power supply, serial cable, or the Embedded PCs.

Please give suggestions if possible.

Thanks and Best Regards

Reply to
Myauk

My first suspicion is that your target device is not fast enough to handle 115200 bps. Some buffer overflows after the first few blocks of data, and communication fails.
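To put a number on the speed concern, here is a minimal sketch of how much time the receiver has per character at each of the two baud rates mentioned in the thread. It assumes 8N1 framing (10 bits on the wire per byte), which the original post does not actually state; `byte_time_us` is just an illustrative helper name.

```python
def byte_time_us(baud, bits_per_frame=10):
    """Time on the wire for one character, in microseconds.

    bits_per_frame=10 assumes 8N1 framing (1 start + 8 data + 1 stop).
    The thread does not state the actual framing, so this is an assumption.
    """
    return bits_per_frame * 1_000_000 / baud


for baud in (9600, 115200):
    print(f"{baud:>6} bps: {byte_time_us(baud):7.1f} us per byte")
```

Under that assumption the target has roughly 1042 us to deal with each byte at 9600 bps but only about 87 us at 115200 bps, a factor of 12 less.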

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

Reply to
Vladimir Vassilevsky

There are lots of possible problems and you didn't provide any detailed information about yours. How does it fail when loading the third or fourth block? Is there a checksum and a protocol which tells the programming PC that the received checksum was wrong? Or does the PC not receive any acknowledgement? Or are the programmed blocks wrong when verifying the programmed firmware? What CPU do you program?

Depending on your answers there are many possible ways to analyze the problem, e.g. you can add a checksum and test that the data is transmitted without errors, or, if the sending PC is too fast, you can implement a protocol with an acknowledgement from the bootloader.
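The checksum-plus-acknowledgement idea could look roughly like the sketch below. Everything in it is hypothetical: the framing (payload followed by one checksum byte), the 8-bit additive checksum, the ACK/NAK byte values and the function names are chosen for illustration and are not the protocol of the OP's bootloader. The `link` argument stands in for "send a frame and read the reply" so that the logic can be tried without hardware.

```python
ACK, NAK = b"\x06", b"\x15"  # ASCII ACK/NAK, picked here for illustration only


def checksum8(payload: bytes) -> int:
    """8-bit additive checksum: sum of all payload bytes modulo 256."""
    return sum(payload) & 0xFF


def make_frame(payload: bytes) -> bytes:
    """Sender side: append the checksum byte to the payload."""
    return payload + bytes([checksum8(payload)])


def receive_frame(frame: bytes) -> bytes:
    """Receiver side: reply ACK if the checksum matches, otherwise NAK."""
    if len(frame) < 2:
        return NAK
    payload, received = frame[:-1], frame[-1]
    return ACK if checksum8(payload) == received else NAK


def send_block(payload: bytes, link, max_retries: int = 3) -> int:
    """Stop-and-wait: resend the block until it is ACKed.

    `link` is any callable that takes a frame and returns the reply bytes.
    Returns the number of attempts that were needed.
    """
    frame = make_frame(payload)
    for attempt in range(1, max_retries + 1):
        if link(frame) == ACK:
            return attempt
    raise IOError("block not acknowledged")
```

Because the sender waits for each acknowledgement before sending the next block, a slow receiver paces the transfer instead of being overrun, and logging the attempt count per block would show where errors cluster.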

Another idea would be to test the supply voltage. When flashing, the device needs more power and maybe you need to add some bypass capacitors and some electrolytic capacitor for buffering.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Reply to
Frank Buss

May be timing errors between the two systems. If so, lengthen the stop bit (to 2?) or tweak the clocks better.
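As a rough illustration of how a clock mismatch can be harmless at 9600 and serious at 115200, the sketch below computes the closest baud rate reachable with an integer divisor. The 8 MHz target clock and the 16x oversampling are pure assumptions for the example; the thread never names the microcontroller or its clock.

```python
def baud_error_percent(clock_hz, target_baud, oversample=16):
    """Return (divisor, actual_baud, error_percent) for an integer divisor.

    Assumes a UART whose baud rate is clock / (oversample * divisor).
    Real parts differ (fractional dividers, other oversampling ratios).
    """
    divisor = max(1, round(clock_hz / (oversample * target_baud)))
    actual = clock_hz / (oversample * divisor)
    error = 100.0 * (actual - target_baud) / target_baud
    return divisor, actual, error


# Hypothetical 8 MHz target clock, not taken from the thread.
for baud in (9600, 115200):
    div, actual, err = baud_error_percent(8_000_000, baud)
    print(f"{baud:>6}: divisor {div:>2}, actual {actual:9.1f}, error {err:+.2f}%")
```

With these made-up numbers the error is about +0.16% at 9600 but about +8.5% at 115200, because the divisor is small and rounding it costs much more. A UART clocked at 1.8432 MHz, the frequency commonly used on PC serial ports, divides exactly to 115200, so in such a case the mismatch would sit entirely on the target side.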

Jon

Reply to
Jon Kirwan

Make sure it is hardware handshaking, NOT XON XOFF.

Also:

RS232 logic levels are between -25V and +25V (maximum)

I would also NOT choose the max rate either as it is certainly NOT necessary to perform the function. Why choose the max rate, and why NOT scale it back when problems have been encountered? Remember, that is EXACTLY what a modem will do during handshaking. Your hard link will not auto-negotiate. So if you are having comm errors, the easiest way to solve it is to reduce the data rate.

An embedded PC under WHAT OS? Yes, it does matter because that will define what level of control you have over the comm port. Not just the rate, but the other particulars mentioned (hardware not XON) as well.

Reply to
Archimedes' Lever

Exactly.

Reply to
Archimedes' Lever

I recall early PC UARTs had timing problems that would show up at 115kbd. Is this still the case? Perhaps try a different PC or serial card.

Cheers

Reply to
Martin Riddle

Sounds like a buffering problem. It seems the receiver can't deal with the data fast enough. If these were real intermittent failures, you would see random failures.

Without knowing more about the system and how many units are exhibiting this behaviour it is hard to tell the exact cause of the problem.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
Reply to
Nico Coesel

(1) Is there a signal back from the target saying that it is finished with the programming step? It may be saying it is done before it really is.

(2) When you program the device, the current consumption likely increases. Look at the power supply wires with a scope. It may be that the supply voltage sags.

(3) Is the machine that sends the RS-232 data a Windows machine? Windows is known to have a problem with sending certain things over the serial port. It could be that for some reason, you hit on one of those. IIRC 0x1A is one of the characters that can cause trouble.

(4) Does your RS-232 look good at the full levels and at the output of the buffer chips?

(5) Are you sending data with a check sum? If so, do the two ends agree exactly on how to compute the check sum?
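For point (3), one quick check is to see whether the data being sent contains the suspect byte at all, and where. The sketch below is a generic byte scan with a made-up sample; `find_byte` is an illustrative name, and the real input would be whatever raw bytes the programming software actually puts on the wire. For background, 0x1A is Ctrl-Z, which DOS treats as end-of-file for files opened in text mode.

```python
def find_byte(image: bytes, value: int = 0x1A):
    """Return every offset at which `value` occurs in `image`."""
    return [offset for offset, b in enumerate(image) if b == value]


# Made-up sample data, not real firmware.
sample = bytes([0x00, 0x1A, 0xFF, 0x1A, 0x7E])
print(find_byte(sample))
```

If the first hit in the real image lines up with the block where loading fails, that would point at character handling on the PC side rather than at the hardware. If the firmware is sent as ASCII text such as S-Records, no 0x1A should appear and this cause can be ruled out.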

Reply to
MooseFET

It is the upper end of the capacity for the link standard and you dopes cannot see where pushing a decades old link standard to its upper limit might be a problem?

Reply to
Archimedes' Lever

If he's loading at 115kbd, then he's writing to RAM first, then flashing. You could never write directly to flash. 115kbd is reliable with the proper hardware. The problem is mass-producing a design to meet the 3% spec.

Cheers

Reply to
Martin Riddle

Yes. But the OP seems to have problems following the successful transfer of a few blocks. So the lowest level TX-RX seem to be talking to each other.

As Vladimir suggests, it sounds like a buffer overflow. It's possible that the flow control handshaking isn't working properly. If it uses h/w flow control, make sure all the 'extra' pins are hooked up correctly. For either h/w or s/w flow control, it might be possible to fiddle with the buffer-full threshold value. Once the receive buffer gets close to full, the handshaking latency may be such that the TX end can't get stopped in time.
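The threshold argument can be made concrete with a small calculation: the receiver has to assert flow control early enough to absorb everything that still arrives before the sender reacts. The sketch assumes 8N1 framing, and the 16-byte buffer and 1 ms reaction time used in the example are invented numbers, not measurements from the OP's setup.

```python
import math


def bytes_during_latency(baud, latency_s, bits_per_frame=10):
    """Bytes that can still arrive while the sender has not yet stopped."""
    return math.ceil(baud * latency_s / bits_per_frame)


def high_water_mark(buffer_size, baud, latency_s, bits_per_frame=10):
    """Highest fill level at which flow control can still be asserted safely."""
    headroom = bytes_during_latency(baud, latency_s, bits_per_frame)
    return max(0, buffer_size - headroom)


# Invented example: 16-byte receive buffer, 1 ms until the sender stops.
for baud in (9600, 115200):
    print(baud, bytes_during_latency(baud, 0.001), high_water_mark(16, baud, 0.001))
```

With these example values only 1 byte arrives during the latency at 9600 bps, but 12 bytes arrive at 115200 bps, so a threshold that is fine for the first stage could overflow in the second.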

--
Paul Hovnanian  paul@hovnanian.com
----------------------------------------------------------------------
Reply to
Paul Hovnanian P.E.

The OS was DOS6.

Regards

Reply to
Myauk

If we test 100 units, 30 units have this failure, and only on that particular test fixture. Other fixtures do not have this problem.

Thanks and Best Regards

Reply to
Myauk

Then it must be a contact or wiring problem.

Reply to
Nico Coesel

Based on the scanty information in this post, I would hazard a guess that you are exceeding the sustainable programming rate of some onboard device.

Reply to
JosephKK

Just what is different about the one fixture?

Reply to
JosephKK

> Just what is different about the one fixture?

Pardon? I don't get what you would like to ask.

Regards

Reply to
Myauk

I thought you indicated that the failures correlate with one particular test fixture. Did I misunderstand?

Reply to
JosephKK
