Some time ago I found out that, on a TCP connection, issuing the RECV command to the W5100 makes the chip send a duplicate ACK, prompting the remote device to continue communicating. This is very useful when something breaks in the transmission and the remote device stalls waiting for an ACK (there are cases when that device does not retransmit on its own).
My algorithm therefore does the following: if the W5100 reports 0 bytes received within a specific time (the timeout period), I issue the RECV command so that a duplicate ACK is sent and the remote device can act on it if a packet was lost for some reason. I repeat this several times before deeming the connection totally broken. Here is what it looks like; the duplicate ACK is marked in black.
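A minimal sketch of that retry loop in C follows. `read_rsr` and `issue_recv` are hypothetical driver hooks (on the chip they would read socket 0's received-size register and write the RECV command to its command register), and `polls_per_timeout` is an assumption standing in for the timeout period; a real driver would compare against a millisecond tick instead. The mock peer at the end exists only so the logic can be exercised on a host.

```c
#include <stdint.h>

/* Hypothetical driver hooks: read socket 0's received-size register
   and issue the RECV command on socket 0. */
typedef uint16_t (*read_rsr_fn)(void);
typedef void (*issue_recv_fn)(void);

/* Poll for data; after each timeout, issue RECV so the chip sends a
   duplicate ACK, then wait again. Returns the received byte count, or -1
   if the final wait after max_retries RECV attempts also times out. */
int wait_for_data(read_rsr_fn read_rsr, issue_recv_fn issue_recv,
                  unsigned polls_per_timeout, unsigned max_retries)
{
    for (unsigned attempt = 0; attempt <= max_retries; ++attempt) {
        for (unsigned i = 0; i < polls_per_timeout; ++i) {
            uint16_t n = read_rsr();
            if (n > 0)
                return (int)n;
        }
        if (attempt < max_retries)
            issue_recv();               /* nudge the peer with a duplicate ACK */
    }
    return -1;                          /* connection deemed broken */
}

/* Mock peer for host testing: data shows up only after the peer has seen
   mock_recvs_needed duplicate ACKs. */
static unsigned mock_recvs_issued;
static unsigned mock_recvs_needed;
static uint16_t mock_read_rsr(void) { return mock_recvs_issued >= mock_recvs_needed ? 10 : 0; }
static void mock_issue_recv(void) { ++mock_recvs_issued; }
```

For example, with `mock_recvs_needed = 2` the call `wait_for_data(mock_read_rsr, mock_issue_recv, 5, 3)` issues two RECV commands and then returns 10.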
This worked well until I ran into an issue that is not intuitive to deal with and is related to the internal organization of the W5x00 chips.
I open a socket to the remote server and connect to it. Then I send an HTTP request, but the remote server is slow and does not answer the request within 5 seconds.
While the application controlling the W5100 waits for the response, it times out and issues a spare RECV command, and here is what happens:
You can see that the packet marked in blue reports a window size of 17463, and the next packet reports 19511 (exactly +2048 bytes, the size of the socket buffer).
What does this look like from the W5100 side? I took a dump of the socket registers:
This is a full dump of all 48 registers of socket 0.
- Section 1 was taken on entry to the receive routine;
- Section 2 was taken just before the RECV command was issued on the W5100's socket 0;
- Section 3 was taken after the RECV command was performed.
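A dump like this can be taken with a short loop over the socket-0 register block, which the W5100 datasheet places at 0x0400 to 0x042F (exactly 48 bytes). The sketch below runs against a host-side mock; `w5100_read` and `dump_socket0` are hypothetical names, and on the target `w5100_read` would be your SPI or parallel-bus access.

```c
#include <stdint.h>
#include <stdio.h>

#define S0_BASE      0x0400u  /* socket 0 register block (W5100 datasheet) */
#define S0_REG_COUNT 48u      /* 0x0400..0x042F */

/* Host-side mock of the W5100 address space; on the target, w5100_read()
   would be a hypothetical SPI or parallel-bus read of one byte. */
static uint8_t w5100_mem[0x8000];
static uint8_t w5100_read(uint16_t addr) { return w5100_mem[addr]; }

/* Snapshot all 48 socket-0 registers and print them as three rows of 16,
   so the same dump can be taken at each point in the receive routine. */
void dump_socket0(uint8_t snapshot[S0_REG_COUNT], const char *label)
{
    printf("%s\n", label);
    for (unsigned i = 0; i < S0_REG_COUNT; ++i) {
        snapshot[i] = w5100_read((uint16_t)(S0_BASE + i));
        printf("%02X%c", (unsigned)snapshot[i], (i % 16u == 15u) ? '\n' : ' ');
    }
}
```

Calling it once per section (entry, before RECV, after RECV) and diffing the snapshots makes register changes like the ones below easy to spot.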
In section 3 (after the RECV command) you can see that the RX_RD pointer became 0x37 and the received size (RX_RSR) became 0xFFC9, which is not normal: at the time this dump was taken, the W5100 had NOT received any data.
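The two bogus values are not independent: 0x37 + 0xFFC9 = 0x10000. One reading of this, offered as an assumption rather than documented behavior, is that the chip computes the received size as (internal write pointer minus RX_RD) modulo 2^16, and the internal write pointer was still 0 while RX_RD had been advanced by 0x37. The helper name below is hypothetical; it only reproduces that arithmetic.

```c
#include <stdint.h>

/* Assumption (inferred from the dump, not from the datasheet): the received
   size is the distance between an internal 16-bit write pointer and RX_RD,
   taken modulo 2^16. */
uint16_t rx_received_size(uint16_t rx_wr, uint16_t rx_rd)
{
    return (uint16_t)(rx_wr - rx_rd);   /* wraps like a 16-bit counter */
}
```

With the write pointer at 0 and RX_RD at 0x37, `rx_received_size(0x0000, 0x0037)` yields 0xFFC9, matching the dump.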
My research shows that the problem occurs when the application updates the TX registers and sends data with the SEND command (in my case, to send the HTTP request), and then issues the RECV command without updating the RX_RD register, assuming it to be 0 because it reads as 0.
If the application does NOT send any data before issuing the RECV command, the problem does not manifest itself, and RX_RD and RSR both still read as 0 after the RECV command.
This makes me think that the W5x00 family internally uses one register set for presenting values to the driver (the data read through the SPI or parallel interface) and another holding the real values the W5x00 core actually works with.
What happens when the issue appears: the application writes to the TX buffer, updates TX_WR, issues the SEND command, and then issues the RECV command; after this, RX_RD and RSR are corrupt. I suspect that some internal RX-related register is loaded with incorrect data during the transmit operation, and that this value is not reloaded from the correct RX register when the RECV command is issued.
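For reference, the transmit half of that sequence (copy into the TX buffer, advance TX_WR, issue SEND) can be sketched as follows. The register addresses (Sn_CR 0x0401, Sn_TX_FSR 0x0420, Sn_TX_WR 0x0424), the SEND code 0x20 and the default 2 KB TX buffer at 0x4000 are taken from the W5100 datasheet and should be checked against your revision; `w5100_read`, `w5100_write`, `read16`, `write16` and `s0_send` are hypothetical names, backed here by a host-side mock.

```c
#include <stdint.h>

#define S0_CR      0x0401u
#define S0_TX_FSR  0x0420u
#define S0_TX_WR   0x0424u
#define TX_BASE    0x4000u
#define BUF_MASK   0x07FFu   /* 2 KB per socket (default) */
#define CMD_SEND   0x20u

/* Host-side mock of the chip; real code would do SPI/bus transactions. */
static uint8_t w5100_mem[0x8000];
static uint8_t w5100_read(uint16_t a) { return w5100_mem[a]; }
static void w5100_write(uint16_t a, uint8_t v)
{
    w5100_mem[a] = (a == S0_CR) ? 0 : v;   /* mock: commands complete at once */
}
static uint16_t read16(uint16_t a)         /* 16-bit registers are big-endian */
{
    return (uint16_t)((w5100_read(a) << 8) | w5100_read((uint16_t)(a + 1)));
}
static void write16(uint16_t a, uint16_t v)
{
    w5100_write(a, (uint8_t)(v >> 8));
    w5100_write((uint16_t)(a + 1), (uint8_t)v);
}

/* Copy data into the TX ring, advance TX_WR, then issue SEND. */
int s0_send(const uint8_t *data, uint16_t len)
{
    if (len > read16(S0_TX_FSR))
        return -1;                                   /* not enough free space */
    uint16_t wr = read16(S0_TX_WR);
    for (uint16_t i = 0; i < len; ++i)               /* mask handles wraparound */
        w5100_write((uint16_t)(TX_BASE + ((wr + i) & BUF_MASK)), data[i]);
    write16(S0_TX_WR, (uint16_t)(wr + len));
    w5100_write(S0_CR, CMD_SEND);
    while (w5100_read(S0_CR) != 0) { }               /* wait until command accepted */
    return 0;
}
```

Nothing in this routine touches RX_RD, which is the point: according to the observations above, a RECV issued after it without refreshing RX_RD is what triggers the corruption.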
How I solved this issue: the solution is rather simple but requires understanding the internals of the W5x00. The only action needed was to read the RX_RD pointer (which read 0), write the same value back to RX_RD, and only then issue the RECV command. This appears to update some internal register, and RECV no longer corrupted RX_RD and RX_RSR.
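A minimal sketch of that workaround, assuming Sn_RX_RD at 0x0428, Sn_CR at 0x0401 and RECV = 0x40 (check these against the datasheet). `s0_recv_refresh` and the logging mock are hypothetical; the write log exists only to show that the three writes happen back to back with no other register write in between.

```c
#include <stdint.h>

#define S0_CR     0x0401u
#define S0_RX_RD  0x0428u
#define CMD_RECV  0x40u

/* Host-side mock that also logs every register write, so the order of
   accesses can be checked; real code would do SPI/bus transactions. */
static uint8_t  w5100_mem[0x8000];
static uint16_t write_log[16];
static unsigned write_count;
static uint8_t w5100_read(uint16_t a) { return w5100_mem[a]; }
static void w5100_write(uint16_t a, uint8_t v)
{
    if (write_count < 16)
        write_log[write_count++] = a;
    w5100_mem[a] = (a == S0_CR) ? 0 : v;   /* mock: commands complete at once */
}

/* Workaround: write RX_RD back with its own value, then issue RECV
   immediately, touching no other register in between. */
void s0_recv_refresh(void)
{
    uint8_t hi = w5100_read(S0_RX_RD);
    uint8_t lo = w5100_read(S0_RX_RD + 1);
    w5100_write(S0_RX_RD, hi);
    w5100_write(S0_RX_RD + 1, lo);
    w5100_write(S0_CR, CMD_RECV);
    while (w5100_read(S0_CR) != 0) { }     /* wait until command accepted */
}
```

In the timeout path, this is what the spare RECV would call instead of writing the RECV command on its own.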
Conclusion: the application MUST update RX_RD before issuing the RECV command, and it MUST NOT access any other register or perform any other operation (such as SEND) in between.
In general, if the programmer follows the programming guidelines in the W5100 datasheet, a situation like mine will not happen, because those instructions assume an update of RX_RD immediately followed by the RECV command.
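A datasheet-style receive routine, with the RX_RD update followed directly by RECV, might look like the minimal sketch below. Register addresses (Sn_CR 0x0401, Sn_RX_RSR 0x0426, Sn_RX_RD 0x0428), the RECV code 0x40 and the default 2 KB RX buffer at 0x6000 are assumptions to verify against the datasheet; the function names are hypothetical and the chip is mocked so the code runs on a host.

```c
#include <stdint.h>

#define S0_CR      0x0401u
#define S0_RX_RSR  0x0426u
#define S0_RX_RD   0x0428u
#define RX_BASE    0x6000u
#define BUF_MASK   0x07FFu   /* 2 KB per socket (default) */
#define CMD_RECV   0x40u

/* Host-side mock of the chip; real code would do SPI/bus transactions. */
static uint8_t w5100_mem[0x8000];
static uint8_t w5100_read(uint16_t a) { return w5100_mem[a]; }
static void w5100_write(uint16_t a, uint8_t v)
{
    w5100_mem[a] = (a == S0_CR) ? 0 : v;   /* mock: commands complete at once */
}
static uint16_t read16(uint16_t a)         /* 16-bit registers are big-endian */
{
    return (uint16_t)((w5100_read(a) << 8) | w5100_read((uint16_t)(a + 1)));
}
static void write16(uint16_t a, uint16_t v)
{
    w5100_write(a, (uint8_t)(v >> 8));
    w5100_write((uint16_t)(a + 1), (uint8_t)v);
}

/* Read up to max_len bytes: copy from the RX ring starting at RX_RD,
   advance RX_RD, and issue RECV right after the RX_RD write. */
uint16_t s0_recv(uint8_t *buf, uint16_t max_len)
{
    uint16_t len = read16(S0_RX_RSR);
    if (len == 0)
        return 0;
    if (len > max_len)
        len = max_len;
    uint16_t rd = read16(S0_RX_RD);
    for (uint16_t i = 0; i < len; ++i)               /* mask handles wraparound */
        buf[i] = w5100_read((uint16_t)(RX_BASE + ((rd + i) & BUF_MASK)));
    write16(S0_RX_RD, (uint16_t)(rd + len));         /* update RX_RD ... */
    w5100_write(S0_CR, CMD_RECV);                    /* ... then RECV at once */
    while (w5100_read(S0_CR) != 0) { }               /* wait until command accepted */
    return len;
}
```

Because RECV always follows the RX_RD write with nothing in between, this routine never hits the situation described above; only a "bare" RECV issued from a timeout path does.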