I’m having a recurring issue sending data with the W5300. I have 5 test units, each using a W5300 and sending data at very high speed: about 80 Mbit/s total, divided roughly evenly among 4 sockets.
They ran for hours, and then they all eventually failed at different times. In each case I am waiting for the SENDOK bit to be set in Sn_IR, and it never gets set for one of the sockets. Sn_SSR still reads 0x17 (socket established).
I do notice that the Reserved byte of the status register (for example 0x288 for socket 2) is different. For the sockets that are working I see 0x01 there, but for the frozen socket I have seen either 0x89 or 0x8B. Is that reserved register giving me some useful status information? Can I use it to know that something has happened, so that I shouldn’t expect the SENDOK bit to get set? Can I recover from this state?
In general, what would cause the SENDOK bit to never get set if the connection is still established? The client is not receiving any data, but reports no errors.
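Until the cause is known, one way to avoid hanging forever is to bound the wait. The following is only a minimal sketch, not WIZnet's driver code: `classify_send` and the `send_result_t` values are hypothetical names, and the Sn_IR bit masks (SENDOK 0x10, TIMEOUT 0x08, DISCON 0x02) are my reading of the W5300 datasheet and should be checked against your own header files. It takes one sample of Sn_IR and Sn_SSR plus the elapsed time and decides whether to keep polling.

```c
#include <stdint.h>

/* Assumed bit values; verify against the W5300 datasheet or driver headers. */
#define SN_IR_SENDOK      0x10u
#define SN_IR_TIMEOUT     0x08u
#define SN_IR_DISCON      0x02u
#define SOCK_ESTABLISHED  0x17u

/* Hypothetical result codes for one polling step. */
typedef enum {
    SEND_PENDING,       /* nothing decisive yet, keep polling        */
    SEND_OK,            /* SENDOK was set                            */
    SEND_TCP_TIMEOUT,   /* the chip reported its own timeout         */
    SEND_DISCONNECTED,  /* peer closed, or socket left ESTABLISHED   */
    SEND_STALLED        /* our own deadline expired with no SENDOK   */
} send_result_t;

/* Classify one sample of Sn_IR / Sn_SSR taken elapsed_ms after SEND. */
send_result_t classify_send(uint8_t ir, uint8_t ssr,
                            uint32_t elapsed_ms, uint32_t limit_ms)
{
    if (ir & SN_IR_SENDOK) {
        return SEND_OK;
    }
    if (ir & SN_IR_TIMEOUT) {
        return SEND_TCP_TIMEOUT;
    }
    if ((ir & SN_IR_DISCON) || ssr != SOCK_ESTABLISHED) {
        return SEND_DISCONNECTED;
    }
    if (elapsed_ms >= limit_ms) {
        return SEND_STALLED;
    }
    return SEND_PENDING;
}
```

The caller would read Sn_IR and Sn_SSR in its polling loop, pass them in with the time since the SEND command, and stop on anything other than `SEND_PENDING`. What to do on `SEND_STALLED` (for example, closing and reopening the socket) is a separate decision that this sketch does not make.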
Update: I tried to see if the status goes through any temporary conditions during a normal write sequence, and found the following:
After writing data to the TX FIFO: Sn_SSR (including the reserved byte) was one of 0x07A0, 0x07C0, 0x07E0, 0x0800
After writing Sn_TX_WRSR: Sn_SSR is back to normal 0x0117
After writing SEND command: Sn_SSR remains normal 0x0117
I have not been able to duplicate seeing 0x8917 or 0x8B17 in the reserved+SSR bytes in any normal operation; this only seems to occur when the freeze up happens.
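If the reserved byte does turn out to be meaningful, a check along these lines could flag the condition. This is purely a heuristic built from the values observed above: the reserved byte is undocumented, the "normal" value 0x01 is only what I have seen on working sockets, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SOCK_ESTABLISHED     0x17u
/* Observed on working sockets only; NOT a documented value. */
#define RESERVED_SEEN_NORMAL 0x01u

/* Low byte of the 16-bit read is Sn_SSR, high byte is the reserved byte. */
static inline uint8_t ssr_state(uint16_t word)    { return (uint8_t)(word & 0xFFu); }
static inline uint8_t ssr_reserved(uint16_t word) { return (uint8_t)(word >> 8); }

/* True when the socket reads ESTABLISHED but the reserved byte differs
 * from the value seen during normal operation (e.g. 0x8917, 0x8B17). */
bool ssr_looks_frozen(uint16_t word)
{
    return ssr_state(word) == SOCK_ESTABLISHED &&
           ssr_reserved(word) != RESERVED_SEEN_NORMAL;
}
```

The transient values seen while filling the TX FIFO (0x07A0 and so on) are not flagged, because their low byte is not 0x17.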
By the way, I have the keepalive set for 5 seconds, but I haven’t checked during the freeze up if the keepalive messages are going through. I’ll try to duplicate it and check with Wireshark.
Update 2: I was able to duplicate the failure with 3 units running overnight. For the “frozen” units, I used Wireshark to analyze any traffic and found the following:
The W5300 retransmits the same segment over and over, with a sequence number near the 32-bit rollover, and the ACK that comes back has a non-matching sequence number. For example, the retransmission is 1460 bytes with sequence 0xFFFFFA70, and the ACK responds with sequence 0x000000DC, which is the original sequence offset by 1644 instead of 1460.
The other two units failed in similar fashion:
- one retransmitting 1460 bytes with sequence #0xFFFFFAD8, and the ACK coming back as 0x000001C4 (off by 1772 instead of 1460)
- one retransmitting 1460 bytes with sequence #0x0000088A, and the ACK coming back as 0x00000FF6 (off by 1900 instead of 1460)
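For reference, the offsets quoted above come from unsigned 32-bit subtraction, which wraps modulo 2^32 and so handles the rollover automatically. A small sketch (the function names are my own, and 1460 is the segment length seen in the captures):

```c
#include <stdint.h>

#define SEGMENT_LEN 1460u  /* payload size of the retransmitted segment */

/* Distance from the segment's sequence number to the number in the ACK.
 * Unsigned arithmetic wraps modulo 2^32, so this is correct across rollover. */
uint32_t seq_delta(uint32_t seq, uint32_t ack)
{
    return ack - seq;
}

/* Bytes the ACK covers beyond the end of the retransmitted segment. */
uint32_t ack_excess(uint32_t seq, uint32_t ack)
{
    return seq_delta(seq, ack) - SEGMENT_LEN;
}
```

For the three captures this gives offsets of 1644, 1772 and 1900, so each ACK covers 184, 312 and 440 bytes more than the 1460-byte segment being retransmitted.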
I looked at some of the normal data stream before the freeze-up. The packets were most often 1460 bytes, with occasional smaller packets of 340, 416, 1028, 1112, 1368 or 1452 bytes. It also seems common for the W5300 to send multiple packets at high speed and get only a single ACK for them.
I was wondering if it’s possible that the W5300 is sending multiple packets around the time the sequence rolls over, and a single ACK is not being processed correctly. Any thoughts or workaround ideas?