Hello fellow developers,
I would like your opinion about the case below:
Mid 2014 I finished the prototypes of a new product using a WiFi module from a manufacturer I will call 'X', and some other peripherals on an SPI bus. I did the usual tests I perform on a new board: check whether some test code runs on the micro and whether this test code can access all peripherals and I/O on the board. This was the case: I could talk to all SPI devices, including the WiFi module. I could send commands to the WiFi module and the WiFi module responded correctly.
So I sent the design to our manufacturer to have 240 boards produced. This is a procedure I have always followed successfully; it buys me time to finish the firmware.
So I continued developing the firmware and soon found that the WiFi module did not work according to the specifications. First of all, once selected with the SPI CS signal, it would stay selected and block access to all other devices on the SPI bus. An e-mail to manufacturer 'X' resulted in a firmware update which fixed that problem. Interesting detail: this module had been on the market for over 2 years already...
Then it appeared that the module would stop communicating seemingly randomly, requiring a reset to get it back on track. This module works "synchronously", meaning you send a command and the module responds with a status, data or a simple "OK". It is then ready for a new command. That is what the datasheet and the command reference say.
After dozens of e-mails to X with trace files of the SPI communication, I got the message "you're doing everything right, the problem seems to be in our module". By then we were 2 months further along.
After some more testing and exchanging e-mails and trace files, the response from X was: "if you receive the 'OK' from the module, you need to wait 20ms before sending a new command". WTF???? This is not in the datasheet, and it severely throttles the bandwidth if you need to keep 4 TCP connections active, where each connection needs a command to send, wait for the OK, wait 20ms, poll for received data, wait for the OK, wait another 20ms, and repeat this 3 times for the other 3 sockets. And X claimed a sustained data rate of 4.5Mb/s...
It appeared that X had never discovered this problem in 2 years because they NEVER built an embedded system to test their own module in a real situation. They had always tested using some sort of USB-to-SPI gateway connected to a PC, running Python scripts that executed the commands. This process has an inherent system delay of around 16ms!
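To put the 20ms wait in numbers, here is a rough back-of-the-envelope sketch. It counts only the forced idle time and ignores SPI transfer time and module processing, so it is an upper bound at best. The payload size per send command is purely my assumption for illustration (it is not taken from X's datasheet), and the function name is made up for this sketch.

```python
GUARD_S = 0.020          # 20 ms wait after every "OK", as X required
SOCKETS = 4              # TCP connections kept active
COMMANDS_PER_SOCKET = 2  # one send command + one poll for received data
PAYLOAD_BYTES = 1400     # ASSUMPTION: payload per send command (hypothetical)


def max_throughput_bps(guard_s=GUARD_S, sockets=SOCKETS,
                       commands_per_socket=COMMANDS_PER_SOCKET,
                       payload_bytes=PAYLOAD_BYTES):
    """Upper bound on total send throughput in bits/s, counting only guard time."""
    # Forced idle time for one full round over all sockets.
    round_s = sockets * commands_per_socket * guard_s
    # One send command per socket per round.
    bits_per_round = sockets * payload_bytes * 8
    return bits_per_round / round_s


if __name__ == "__main__":
    idle_ms = SOCKETS * COMMANDS_PER_SOCKET * GUARD_S * 1000
    print(f"idle time per round: {idle_ms:.0f} ms")
    print(f"upper bound: {max_throughput_bps() / 1e6:.2f} Mb/s (claimed: 4.5 Mb/s)")
```

With these assumed numbers, every round over the 4 sockets costs 160ms of pure waiting, which caps the total at roughly 0.28Mb/s, far below the claimed 4.5Mb/s. A different payload size shifts the figure, but not the order of magnitude.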
To cut an already long story short: X and I exchanged about 260 e-mails, I received 5 firmware updates, and still X could not solve the problem and instead put the blame on Broadcom and the combo FreeRTOS/LwIP.
Funnily enough, I now use a Lantronix xPico WiFi module, which is based on the same Broadcom chip, and this module runs flawlessly.
This entire adventure has cost me over a year, including a complete redesign with the new xPico WiFi. Not to mention the trust of our customers, who have been waiting an extra year for this new product.
Meindert