Bytes missed when reading RX FIFO from W5300

Dear support team,

Recently I investigated an issue reported by our customer and made a finding about the W5300. In this scenario, the W5300 acts as a TCP client: it sends a request and then receives a packet from a 3rd-party TCP server.

In the test, we made the TCP server reply with the same data every time, so the W5300 should receive exactly the same data stream each time and our firmware should therefore read the same data as well.

We found that certain bytes are sometimes missing from the raw data our firmware reads from the W5300 RX FIFO. The phenomenon is as follows.

In correct transactions, the response bytes our firmware reads from the W5300 RX FIFO are:
00 3F 00 6E 00 00 00 39 6E 03 36 F9 A0 9F FF D5 54 81 FF E0 07 C3 FF 11 FF 00 00 FF FA 18 FF FF F5 3F FF 24 A9 FF F9 C0 00 7F DF FF FD 00 00 00 01 AF FF BF 1F FF FA FF FF FF FF 55 5F FD B5 01 FF

In the wrong transactions, the response bytes our firmware reads from the W5300 RX FIFO turn out to be:
00 3F 00 6E 00 00 00 39 6E 03 36 F9 A0 9F FF D5 54 81 FF E0 07 C3 FF 11 FF 00 00 FF FF FF F5 3F FF 24 A9 FF F9 C0 00 7F DF FF FD 00 00 00 01 AF FF BF 1F FF FA FF FF FF FF 55 5F FD B5 01 FF 01 FF

You can see that in the wrong transaction, the two bytes FA 18 are missing from the data stream, but the length in PACKET-INFO (we do not use alignment mode) still reports the expected full length (63). The firmware therefore reads the full 63 bytes from the W5300, and so two extra trailing bytes are read from the W5300.
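For reference, here is a minimal sketch of this read path in non-alignment TCP mode. The FIFO is mocked as an advancing pointer so the sketch is self-contained; on the real board every 16-bit read would hit the same volatile Sn_RX_FIFOR location and issue one CS/RD strobe. The byte order (upper byte first) reflects my understanding of the W5300's big-endian 16-bit bus, and all function and variable names are mine, not from any driver.

```c
#include <stdint.h>
#include <stddef.h>

/* Read one packet from the RX FIFO in non-alignment TCP mode: the first
 * 16-bit word is PACKET-INFO (the payload byte count), followed by
 * ceil(len / 2) data words. */
size_t read_rx_packet(const uint16_t **fifo, uint8_t *buf, size_t cap)
{
    uint16_t len = *(*fifo)++;             /* PACKET-INFO: payload length */
    size_t words = ((size_t)len + 1) / 2;  /* one RD strobe moves two bytes */

    for (size_t i = 0; i < words; i++) {
        uint16_t w = *(*fifo)++;           /* one CS/RD cycle on the real bus */
        if (2 * i < cap)     buf[2 * i]     = (uint8_t)(w >> 8);
        if (2 * i + 1 < cap) buf[2 * i + 1] = (uint8_t)(w & 0xFFu);
    }
    return len;                            /* bytes the caller should expect */
}
```

Note that the loop trusts the PACKET-INFO length completely, which is why a word silently eaten inside the FIFO still results in the full expected byte count, just shifted.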

In further investigation, I used a logic analyzer to capture the actual signals we read from the W5300. It shows that the bytes are already missing at the W5300's outputs, not lost somewhere in our firmware. Please refer to the following screenshot.


You can see in the circled window that the expected bytes FA 18 are missing from the data stream out of the W5300 RX FIFO. The CS and RD signals and the data bus are operated as required by the W5300 specification.

We also found that different W5300 chips behave differently.
Chips with the following marking DO NOT have this issue on our product.


Chips with the following marking HAVE this issue.
We have confirmed that the other parts on our Ethernet interface board are the same; the only difference is the W5300 chip.
I am still checking more samples from our stock to verify this finding.
But as described above, we already have evidence that different W5300 chips perform differently on our products.

I would appreciate it if you could provide more details about the technical differences between the two chips shown in the pictures above, or advise us if you have seen similar findings before.

Thanks a lot!

I assume you have checked that the expected FA 18 bytes actually arrive in the packets from the network.

I see you deactivate CS slightly later than RD. Try activating and deactivating them together, at the same time, if your design allows it.

Thanks for your prompt reply, Eugeny.

Yes, I have confirmed that the expected FA 18 bytes appear on the network every time (as observed in Wireshark capture files).

Our CPU treats the W5300 registers as an external memory range, so the timing of CS and RD going low and high is controlled by its bus access controller. So far, the only parameter I can tune is BUS ACCESS DELAY, which determines the low-level time of both CS and RD (WR). I have tried changing the low-level time to the maximum values, approximately 480 ns for CS and 420 ns for RD. The interesting finding is that the position of the missed bytes then changes from the 27th/28th bytes (FA 18) to the 26th/27th bytes (FF FA). Below is the screenshot from the logic analyzer.
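As an aside, the address arithmetic for this kind of direct (memory-mapped) access can be sketched as below. The socket-block layout (socket block at 0x200 + n * 0x40, Sn_RX_FIFOR at offset 0x30) is my reading of the W5300 datasheet, so please verify it against your copy, and the external-bus base address here is purely hypothetical.

```c
#include <stdint.h>

/* Hypothetical external-bus window where the W5300 is mapped; adjust to
 * the real chip-select region of your CPU. */
#define W5300_BASE 0x60000000UL

/* Direct-mode address of socket n's RX FIFO register, assuming the
 * datasheet layout: socket block at 0x200 + n * 0x40, Sn_RX_FIFOR at
 * offset 0x30 within the block. */
uintptr_t sn_rx_fifor_addr(unsigned n)
{
    return (uintptr_t)W5300_BASE + 0x200u + n * 0x40u + 0x30u;
}
```

On such a design, each 16-bit read of `*(volatile uint16_t *)sn_rx_fifor_addr(0)` produces exactly one CS/RD strobe pair on the external bus, which is why one spurious strobe costs exactly two bytes.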

Can somebody provide support on this issue?

After verifying more samples from our stock, we found that even chips with the same marking can behave differently. For example, of two chips both marked ‘PMHN1-010 1704’, one has the missing-byte issue and the other does not. We are still investigating our PCB and the peripherals around the W5300, but have not yet found valuable clues to go forward.

So I would appreciate it if someone from the support team could give us some ideas about the potential cause of this phenomenon. Is it the clock, a mis-configured register, signal timing, or something else?

The customer's system test has had to be suspended due to this finding. We earnestly look forward to your response. Thanks a lot!

Another interesting finding: if we change the value of certain bytes in the packet (from the 3rd-party device), the missing-byte phenomenon disappears.

As described at the beginning of this post, the test packet was:
00 3F 00 6E 00 00 00 39 6E 03 36 F9 A0 9F FF D5 54 81 FF E0 07 C3 FF 11 FF 00 00 FF FA 18 FF FF F5 3F FF 24 A9 FF F9 C0 00 7F DF FF FD 00 00 00 01 AF FF BF 1F FF FA FF FF FF FF 55 5F FD B5 01 FF

If I change FF FA to 00 00, the missing-byte phenomenon disappears!
00 3F 00 6E 00 00 00 39 6E 03 36 F9 A0 9F FF D5 54 81 FF E0 07 C3 FF 11 FF 00 00 00 00 18 FF FF F5 3F FF 24 A9 FF F9 C0 00 7F DF FF FD 00 00 00 01 AF FF BF 1F FF FA FF FF FF FF 55 5F FD B5 01 FF

Further, I found the boundary is between FE FF and FF 00. That is, if this 16-bit value is equal to or larger than FF 00, the missing-byte phenomenon occurs; otherwise it cannot be observed.

We were using the Modbus TCP protocol at the application level to identify and test this issue, but we currently do not believe it is protocol-related, because the missing bytes were captured right at the interface where data is read from the W5300.

The Modbus TCP protocol has no CRC at the application data level, so it cannot detect the corrupted data and simply puts the received bytes into local memory. For other protocols that do have an application-level CRC, an abnormal packet would be discarded directly without affecting the user program's local memory. At the same time, as explained at the beginning of this message, the issue is also dependent on the binary content (at least as far as I have observed), so these might be the reasons we did not detect it before.

Thanks!

Does this sound similar?

Thanks for telling me this story, Eugeny.
I have forwarded it to my hardware colleague (sorry, I am not a hardware engineer) and also did some browsing myself.
As I understand it, two changes seem to have made the difference.

Update: I am working with WIZnet on a new design in a 3.3 V environment. It appears that my original design was not as bad as one may think, but it could have been better. We found a way to make the W5100 work properly in it by changing the timing of the control signals, in particular by delaying the deactivation of the /RD signal.

In our system, the CPU operates the W5300 via memory mapping, so it may be different from your system. From what I have tried so far, it seems I cannot control the RD signal separately just by changing a CPU control register.
How did you do it? In an FPGA?

Update: https://www.youtube.com/watch?v=BsEx56HTo_A . The W5100 works. I also found out that the original control signals I was feeding to the CPLD and W5100 were noisy enough to cause malfunction of the high-speed logic (this probably also played a role in the W5100-related issues). The W5100 does not seem to have much logic to fight against this. I had to implement sophisticated circuitry in the FPGA to remove the noise (as I did not find a decent hardware solution for it: pull-ups, pull-downs, RC filter).

I would appreciate it if you could share more details of what you did in the FPGA code, so we can review whether we can do something similar on our board.

Yes. The trick is to deactivate RD and CS together. I used to deactivate RD several nanoseconds earlier, and it caused this issue. I have no explanation for this effect, but the fix worked; since then I have not had a single issue with data corruption.

Noise may also be a significant factor, in my opinion. The W5x00 are high-speed, high-precision devices, and a small spike on an input pin may be treated as a legitimate level change, causing issues in driving the chip further. As I understand it, the probability of such issues in a 5 V signalling environment is much higher than in the native 3.3 V environment.

Hi Eugeny,

Thanks for providing such valuable information. As you advised, we added some flying wires on the board to make sure the rising edges of RD and CS are close to each other. Amazingly, the phenomenon disappeared!!! I am still running the setup on my desk, but over the last 60 hours or so the behavior has been good, which is a big difference compared to the original schematic in the same setup.

I appreciate the story you shared; it helped us a lot!

@WIZnet Support Team:

I also have a big concern about the WIZnet chip itself. If this is eventually verified (I think Eugeny’s story has verified it), the W5300 specification has a big issue in this detail, but you have not yet shown up here to explain the technical reason for this phenomenon and how to fix it. It is quite frustrating.


If we have to change the schematic, it would have an enormous impact on our existing products and on the compatibility between hardware and firmware. I would appreciate it if somebody could stand up and confirm whether our understanding is correct or not.

Thanks!

The cause of the issue can be simple, and not much “dependent” on the W5x00 chip itself: a 5 V environment or noisy signals may cause a false positive, and the W5x00 treats this noise as a data access. For example, RD is deactivated before CS is deactivated, but there is a “closing” spike on the RD line, dipping from 5 V down to about 1 V, so the W5300 thinks the host is reading data and increments its counter. That is why these two bytes are being “eaten” from the stream. Of course, WIZnet may have taken some measures against this issue, but (a) that consumes considerable silicon (I have done this task for one of my devices), and (b) it then introduces a probability of false negatives.
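To make this suspected mechanism concrete, here is a toy model in C (purely my illustration, not W5300 internals): the host issues the same number of read strobes in both cases, but when the chip counts one extra spurious strobe, its internal pointer runs one word ahead from that point on, so a 16-bit word vanishes from what the host sees, exactly two bytes at a time.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of the suspected failure: the host performs host_words
 * genuine read strobes, but after strobe number glitch_at the chip also
 * sees one spurious strobe (the noise spike) and silently advances its
 * internal pointer, so every later read returns data one word ahead.
 * Pass glitch_at = -1 for a clean transaction. */
void host_read(const uint16_t *chip_fifo, size_t host_words,
               int glitch_at, uint16_t *out)
{
    size_t ptr = 0;                  /* chip's internal FIFO read pointer */
    for (size_t i = 0; i < host_words; i++) {
        out[i] = chip_fifo[ptr++];   /* genuine CS/RD cycle, data latched */
        if ((int)i == glitch_at)
            ptr++;                   /* spurious strobe: pointer advances,
                                        but no data reaches the host */
    }
}
```

Feeding this model the words around the failure point (00 FF, FA 18, ...) with one glitch reproduces the observed symptom: the FA 18 word disappears, the rest of the stream shifts up by one word, and the host still reads the full expected byte count.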

I am speculating here, as I do not know the definite answer :slight_smile: I can only provide an educated guess.