We have carried out another test to diagnose the problem further. This time we replaced the Wiznet device with our PC to check the performance of the 3G Router on its own. The PC is connected to the 3G Router and runs a small Java TCP client application that mimics the Wiznet test we ran with Tera Term.
Not a single failure was captured with the PC-3G Router setup. Although there are a couple of retransmissions and some latency, the connection is never lost, and the same port number is used for the whole TCP conversation from start to end.
Comparing the PC-3GRouter and Wiz-3GRouter logs, we noticed the following:
- There are a lot of retransmissions on the Wiznet, and the Retransmission Timeout (RTO) is squeezed down to 0.617 s. In the PC-3G Router setup, by contrast, retransmissions are rare and the maximum RTO is 0.922 s. Please refer to packet 571 in PC-3GRouter-2.pcapng.
- In log WIZ-3GRouter.pcapng, you can pick up the thread of the transfer at packet 4, which is a GET request to the server.
4 HTTP GET to server
5 TCP ACK from server, 67 ms later
6 HTTP OK from server, 118 ms after packet 4
7 TCP ACK from Client, almost instantly after packet 6 as you would expect
8 TCP Window update – client is telling the server that it has a 2K buffer
That completes the first sequence without problems or retransmissions: 118 ms from start to finish. The key point to note, though, is that the TCP ACK came back in 67 ms. More on that later…
145 HTTP GET to server
146 TCP retransmission. Client resends packet 145 after 88 ms
147 TCP retransmission. Client resends packet 145 again 176 ms later, 264 ms after packet 145
148 TCP retransmission. Client resends packet 145 again 352 ms later, 617 ms after packet 145
149 TCP ACK from server, 963 ms after packet 145
150 & 151 Duplicate TCP ACKs from server in response to packets 146 & 147
152 HTTP OK from server, 1000 ms after packet 145
153 TCP ACK from client, almost instantly after packet 152, as you would expect
154 TCP Window update – client is telling the server that it has a 2K buffer
That completes the next sequence without losing the connection: 1000 ms from start to finish. The key point to note, though, is that the TCP ACK only came back after 963 ms.
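The gaps between packets 146, 147 and 148 (88, 176 and 352 ms) double each time, which looks like ordinary exponential backoff. The Python sketch below assumes exactly that (the doubling is inferred from the log, not taken from Wiznet documentation, and `retransmit_offsets` is a hypothetical helper) and lists which retransmissions would go out before the server's ACK at 963 ms (packet 149), once starting from an 88 ms timer and once from the 3 s initial RTO that RFC 1122 recommends.

```python
def retransmit_offsets(initial_rto_ms, retries):
    """Send times of each retransmission, in ms after the original segment,
    assuming the timer doubles after every expiry (exponential backoff)."""
    offsets = []
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(retries):
        elapsed += rto           # timer expires, segment is resent
        offsets.append(elapsed)
        rto *= 2                 # back off before the next attempt
    return offsets


ack_ms = 963  # packet 149: server ACK for packet 145

for initial in (88, 3000):
    sent_before_ack = [t for t in retransmit_offsets(initial, 3) if t < ack_ms]
    print(initial, sent_before_ack)
# 88 [88, 264, 616]
# 3000 []
```

With an 88 ms start, all three retransmissions go out before the ACK, which matches packets 146 to 148 (616 ms versus the 617 ms seen is rounding) and the duplicate ACKs in 150 and 151. With a 3 s start, none would.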
163 HTTP GET to server, port 5008
164 TCP retransmission. Client resends packet 163 after 88 ms
165 TCP retransmission. Client resends packet 163 again 176 ms later, 264 ms after packet 163
166 TCP retransmission. Client resends packet 163 again 352 ms later, 616 ms after packet 163
167 HTTP OK from server, 931 ms after packet 163
168 Client initiates a new connection by sending a SYN from new port 5009 to the server
169 TCP ACK from server for packet 163, 991 ms after packet 163
170 Malformed packet from server
171 & 172 Duplicate ACKs from server in response to packets 164 & 165
173 SYN-ACK from server in response to packet 168
174 ACK from client in response to packet 173
175 Server retransmits the HTTP OK to the old port 5008
176 Client resets the connection on port 5008
So this sequence does show an apparent problem with the TCP retransmission timeout. The HTTP OK response arrives after only 931 ms, yet the connection is still reset and the response is not sent back to the serial port.
In the previous sequence, however, the ACK arrived after 963 ms, which is longer than 931 ms, and the connection went through successfully.
The bottom line is that the Wiznet device is not obeying RFC 1122, section 4.2.3.1, which clearly states that the retransmission timeout (RTO) should initially be set to 3 seconds. The attached logs clearly show that this is not the case when the device is connected to the 3G Router. The 3 seconds is only a default starting value, and a stack is allowed to reduce it to match the data link. I suspect that part of the problem is that a cellular network is not as deterministic as a broadband network, and its latency can go up and down with interference. But the PC-3GRouter-2.pcapng log shows that the PC also sees latency, yet keeps the connection alive with sufficiently long receive timeout and RTO values.
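RFC 1122 requires hosts to compute the RTO with Jacobson's algorithm, which RFC 6298 later spelled out in detail, including a recommended 1 s lower bound on the RTO. The Python sketch below implements that estimator, started from the 3 s value in RFC 1122. The class name and parameters are illustrative only, and we do not know what the Wiznet stack actually implements. The sketch shows how a single 67 ms RTT sample, like the ACK in packet 5, would pull the RTO down if the stack applied no lower bound.

```python
class RtoEstimator:
    """Sketch of the Jacobson RTO estimator as written up in RFC 6298,
    initialised with the 3 s RTO from RFC 1122. All times are in seconds."""

    ALPHA = 1 / 8  # gain applied to the smoothed RTT (SRTT)
    BETA = 1 / 4   # gain applied to the RTT variation (RTTVAR)
    K = 4          # RTTVAR multiplier

    def __init__(self, initial_rto=3.0, min_rto=1.0, max_rto=60.0, granularity=0.001):
        self.srtt = None
        self.rttvar = None
        self.rto = initial_rto
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.granularity = granularity  # assumed clock granularity G

    def on_rtt_sample(self, r):
        """Update SRTT, RTTVAR and RTO from a new RTT measurement r."""
        if self.srtt is None:
            # The first measurement initialises both estimators
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR uses the previous SRTT, so it is updated first
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_timeout(self):
        """Double the RTO after the retransmission timer expires."""
        self.rto = min(self.max_rto, self.rto * 2)
        return self.rto


floored = RtoEstimator()               # 1 s floor, as RFC 6298 recommends
unfloored = RtoEstimator(min_rto=0.0)  # hypothetical stack with no floor
print(floored.rto)                               # 3.0 before any sample
print(floored.on_rtt_sample(0.067))              # 1.0
print(round(unfloored.on_rtt_sample(0.067), 3))  # 0.201
```

Without a floor, one 67 ms sample drops the RTO from 3 s to about 0.2 s. The 88 ms first retransmission in the log is lower still, so this only shows the direction of the effect, not the exact Wiznet values.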
RFC 1122, section 4.2.3.1 (Retransmission Timeout Calculation), states:
"There were two known problems with the RTO calculations specified in RFC-793. First, the accurate measurement of RTTs is difficult when there are retransmissions."
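The quoted passage refers to retransmission ambiguity, which Karn's algorithm (also required by RFC 1122) addresses. When a segment has been sent more than once, the sender cannot tell which copy an ACK belongs to, so it takes no RTT sample from that exchange and keeps its backed-off RTO. The Python sketch below, with hypothetical helper names, applies this to packets 145 to 149 using the times listed above.

```python
def rtt_candidates(send_times_ms, ack_time_ms):
    """Every RTT the sender could compute if it cannot tell which copy was ACKed."""
    return [ack_time_ms - t for t in send_times_ms]


def karn_sample(send_times_ms, ack_time_ms):
    """Karn's rule: only take an RTT sample from a segment that was sent exactly once."""
    if len(send_times_ms) != 1:
        return None  # ambiguous: keep the current (backed-off) RTO instead
    return ack_time_ms - send_times_ms[0]


# Packet 145 and its retransmissions 146-148, in ms relative to packet 145
sends = [0, 88, 264, 617]
ack = 963  # packet 149

print(rtt_candidates(sends, ack))  # [963, 875, 699, 346]
print(karn_sample(sends, ack))     # None
print(karn_sample([0], 67))        # 67 (packets 4 and 5, no retransmission)
```

Depending on which copy it matches the ACK to, a stack without Karn's rule could record anything from 346 ms to 963 ms for this exchange. Only unambiguous exchanges like packets 4 and 5 give a clean sample, and that one was 67 ms.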
Did the 67 ms latency of the ACK in the first transfer influence how quickly the Wiznet device decided to time out and retransmit?
logs.zip (16.7 KB)