Responses intermittently missing when used with a 3G router

Hi,

I am using the W7500S2E-R1 module, configured as a TCP client in DHCP mode.

The W7500S2E-R1 is connected to a desktop PC over its serial interface, and its other end (Ethernet) is connected to a 3G router (which transfers data over the cellular network using a SIM card). The desktop runs teraterm to enter data and display incoming data on screen, and is configured correctly (default settings - Baud Rate: 115,200, data size: 8, Parity: None, stop bit: 1 and Flow control: None).

I send an HTTP request from teraterm and get the HTTP response back through the Wiznet module. In some instances, however, no HTTP response comes back from the module, while the very next HTTP request gets a response successfully. I have attached both the teraterm and the relevant Wireshark logs. The Wireshark log shows that the HTTP response actually reaches the LAN interface but not the serial interface. Deeper analysis shows that any response arriving more than about 0.8 s after the HTTP request was sent is ignored by the Wiznet module and is not passed to the serial interface. Instead, the module assumes the socket is lost and sends another SYN packet to the server. Below is one sample of a missing response:

At packet no. 89, an HTTP request (port no. 5006) is made; after 3 retransmissions of the data, the response finally reaches the LAN interface at packet 93; soon after, a SYN packet is sent to the server with a new port number, 5007. Meanwhile, the response never reaches the serial interface, which is evident in the teraterm log I attached (at the same time, 2021-02-16 15:02:50). The same thing happened with the last request at 2021-02-16 15:05:39. Is there an internal response timeout that is causing this issue? I have already disabled the available Wiznet timers (Inactivity Time, Keep Alive Time - both set to zero).

Can anyone help with how to recover from this?

Thanks in advance.

Note: the teraterm and Wireshark logs are edited to hide confidential information. This problem never occurs with a LAN or a 4G router.
logs.zip (4.3 KB)

Do the symptoms still occur if you set the Inactivity Time to 2000?
Also, set a keepalive value in this module to keep the connection from being disconnected.
It may be a network problem, so it would be nice to test it.

Hi Irina,

Thanks for replying.

I ran another set of tests with modified Inactivity and Keepalive timers. The problem still exists. We even did outdoor testing with full signal coverage, which didn’t help either.
Logs are attached for your review. Please help.

Testcases done:

  1. InactiveTimer - 0 (disabled) KeepAliveTimer - 0 (disabled)
  2. InactiveTimer - 60000 ms (max value) KeepAliveTimer - 0 (disabled)
  3. InactiveTimer - 60000 ms (max value) KeepAliveTimer - 250, unit: 5s (max value)
    log-2.zip (43.9 KB)

Note: Failed connections are highlighted in attached teraterm logs

Hi Praveena,

I’ve tried to test W7500S2E-R1 with a setting similar to your situation:

Sending an HTTP request (GET / HTTP/1.1) to the IP address shown in your Wireshark record,
by sending the packet from the PC through the serial port repeatedly, every 0.8 s.

The result of the test: both receiving and sending on the module worked fine. The serial port on my PC received all packets from the IP address.

Serial setting:

  • Baud Rate: 115,200
  • data size: 8
  • Parity: None
  • stop bit: 1
  • Flow control: None

Module network setting:

  • Firmware version: 2.4 (which can be downloaded from wizse.com)
  • InactiveTimer - 0 (disabled)
  • KeepAliveTimer - 0 (disabled)

If I understood you correctly, you mean that your Wiznet module has a “cool-down” time of 0.8 s, during which no data is received?
There was no port change during my test, so I assume there was no disconnection throughout my test.

It would help if you could provide more detail about your situation, for example what data you sent to the address, so that I can run another test much closer to your situation.

I also suggest checking the settings and performance of your 3G module/connection, since in some cases a 3G connection can be slower than serial communication at a baud rate of 115200, and checking the performance of the website to see whether it misbehaved.

Attached is the wireshark record of my test

800ms problem of W7500s2eR1.zip (16.1 KB)

Hi,

We can’t share IP or URL details in the forum, so I will simply explain the issue here.

The W7500S2E-R1 is connected to a desktop PC over its serial interface, and its other end (Ethernet) is connected to a 3G router (which transfers data over the cellular network using a SIM card). The desktop runs teraterm to enter data and display incoming data on screen, and is configured correctly (default settings - Baud Rate: 115,200, data size: 8, Parity: None, stop bit: 1 and Flow control: None).

I am sending an HTTP request to a server through the serial port (PC serial settings - Baud Rate: 115,200, data size: 8, Parity: None, stop bit: 1 and Flow control: None). I expected an HTTP response back on the serial port, but I am not getting it. The Wireshark log shows that the response actually reaches the LAN interface but not the serial interface (in teraterm). From observing a few failure instances (this is a guess, I am not sure), the response time limit in the Wiznet module (roughly 0.8 s) may be causing a timeout, so the response is lost. The Wiznet module retransmits the HTTP request 3 times; if the ACK or the response reaches the LAN interface after the 3 retransmissions, i.e. after ~0.8 s, it is ignored by the module, and a new connection is started by sending a SYN packet to the server with a new port number.

We have tried different server IPs and also a 38400 baud rate. We get the same result.

Are you able to simulate a simple server, send an HTTP request, delay both the ACK and the response by 1 s, and see the result? A 3G connection could be slower, but we want to see how the Wiznet module handles this.
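
For anyone who wants to try this, a test server along these lines could be sketched in Python. This is only an illustration under assumptions: the function names, the default port 8080 and the fixed "OK" body are made up for the example and do not come from our setup. It delays only the HTTP response. The TCP ACK is sent by the operating system's own stack, so delaying the ACK as well would need something below the application, for example a network emulator such as Linux tc with netem on the server's interface.

```python
import socket
import threading
import time
from typing import Optional, Tuple


def serve_delayed(listener: socket.socket, delay_s: float = 1.0,
                  max_conns: Optional[int] = None) -> None:
    """Accept connections and answer each HTTP request after delay_s seconds."""
    served = 0
    while max_conns is None or served < max_conns:
        conn, _addr = listener.accept()
        with conn:
            request = conn.recv(4096)  # a short GET fits in one recv
            if request:
                time.sleep(delay_s)    # hold the response back on purpose
                body = b"OK\n"
                conn.sendall(
                    b"HTTP/1.1 200 OK\r\n"
                    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
                    b"Connection: close\r\n\r\n" + body
                )
        served += 1


def start_delayed_server(host: str = "0.0.0.0", port: int = 8080,
                         delay_s: float = 1.0,
                         max_conns: Optional[int] = None
                         ) -> Tuple[socket.socket, threading.Thread]:
    """Start the delayed server in a background thread (hypothetical helper)."""
    listener = socket.create_server((host, port))
    thread = threading.Thread(target=serve_delayed,
                              args=(listener, delay_s, max_conns), daemon=True)
    thread.start()
    return listener, thread
```

Calling start_delayed_server(delay_s=1.0) and pointing the module at that host and port would show whether a response arriving 1 s after the request is still passed to the serial side.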

Hi Pravaana,

An ACK packet confirms that the connection is fine, and normally it should be sent within 0.2 s of receiving data, i.e. as soon as possible once one end receives a packet. It is independent of the server's reply packet.

Since the port changed, losing the packet sent from your PC through serial makes sense.

We are able to alter the buffer time for disconnection for you if necessary. However, we suggest checking the performance/transmission speed of your 3G router beforehand, as such a long delay for an ACK packet is not often seen.

You are welcome to send us your problem by email if you need further support and are worried about privacy.

Hi,

Thank you so much for coming forward to help.

Can you please provide new software with a configurable response time (buffer time for disconnection, in your terms) for TCP, settable through an AT command, so that the connection is not affected even if there is a delay in the serial port or the router? Please make the response time range a bit larger, e.g. 0 to 60 s.

Hi Pravaana,

I understand your concern. However, I am afraid we cannot provide such software to you.

After double-checking the protocol document RFC 1122 (Requirements for Internet Hosts - Communication Layers), we found that normally a TCP delayed acknowledgment should not be longer than 500 ms:

4.2.3.2 When to Send an ACK Segment

A TCP SHOULD implement a delayed ACK, but an ACK should not be excessively delayed; in particular, the delay MUST be less than 0.5 seconds, and in a stream of full-sized segments there SHOULD be an ACK for at least every second segment.

The Wiznet module already allows a margin on top of the standard delayed ACK. We suggest checking the performance of your 3G router or server, since other unknown problems could arise along with the abnormally delayed ACKs and replies.

Alternatively, I personally reckon you could alter the code on your PC side to accommodate the late ACKs and replies if you want to keep the setup performing the way it does.

Hi,

We have carried out another test to diagnose the problem further. This time we replaced the Wiznet module with our PC to check the performance of the 3G router. The PC is now connected to the 3G router. The desktop runs a small Java TCP client application to mimic the Wiznet testing with teraterm.

The result: not a single failure was captured with the PC-3G Router setup. Although there are a couple of retransmissions and some latency, the connection is not lost, and the same port number is used throughout the TCP conversation from start to end.

Comparing the PC-3GRouter and Wiz-3GRouter logs, we noticed a few things:

  1. There are a lot of retransmissions with the Wiznet module, and the Retransmission Timeout (RTO) is squeezed to 0.617 s. In the PC-3G Router setup, by contrast, retransmissions are rare (only very few), and the maximum RTO is 0.922 s. Please refer to packet 571 in PC-3GRouter-2.pcapng.
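
The Java application itself is not posted here. As a rough idea of what such a client does, here is a sketch in Python rather than Java; the function names, the 1 s interval and the request headers are assumptions made for the example. It keeps one TCP connection open, sends a GET at a fixed interval, and records the local port and the time to the first response byte, which is the information we compared against the Wiznet captures.

```python
import socket
import time
from typing import List, Tuple


def timed_request(sock: socket.socket, request: bytes,
                  timeout_s: float = 5.0) -> Tuple[int, float, bytes]:
    """Send one request; return (local_port, seconds_to_first_byte, first_chunk)."""
    sock.settimeout(timeout_s)
    local_port = sock.getsockname()[1]
    start = time.monotonic()
    sock.sendall(request)
    chunk = sock.recv(4096)  # raises socket.timeout if nothing arrives in time
    return local_port, time.monotonic() - start, chunk


def run_client(host: str, port: int, count: int = 10,
               interval_s: float = 1.0, path: str = "/") -> List[Tuple[int, float]]:
    """Send `count` GET requests over a single connection and log port and latency."""
    request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
               "Connection: keep-alive\r\n\r\n").encode()
    results = []
    with socket.create_connection((host, port)) as sock:
        for _ in range(count):
            local_port, latency, _chunk = timed_request(sock, request)
            results.append((local_port, latency))
            time.sleep(interval_s)
    return results
```

If the local port in the results stays the same while the latency occasionally goes above 0.9 s, the connection has survived the slow responses, which is what we saw with the PC.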
  2. In log WIZ-3GRouter.pcapng,

You can pick up the thread of the transfer at packet 4, which is a GET request to the server.

4 HTTP GET to server
5 TCP ACK, from server, 67 ms later
6 HTTP OK from server, 118 ms after packet 4
7 TCP ACK from Client, almost instantly after packet 6 as you would expect
8 TCP Window update – client is telling the server that it has a 2K buffer

That completes the first sequence without problems or retransmissions: 118 ms start to finish. The key point to note is the TCP ACK that came back in 67 ms. More on that later…

145 HTTP GET to server
146 TCP retransmission. Client resends packet 145 after 88 ms
147 TCP retransmission. Client resends packet 145 again 176 ms later, 264 ms after packet 145
148 TCP retransmission. Client resends packet 145 again 352 ms later, 617 ms after packet 145
149 TCP ACK from server, 963 ms after packet 145
150 & 151 Duplicate TCP ACK from server in response to packets 146 & 147
152 HTTP OK from server, 1000 ms after packet 145
153 TCP ACK from Client, almost instantly after packet 152 as you would expect
154 TCP Window update – client is telling the server that it has a 2K buffer

That completes the next sequence without problems: 1000 ms start to finish. The key point to note is the TCP ACK that came back in 963 ms.

163 HTTP GET to server, port no 5008
164 TCP retransmission. Client resends packet 163 after 88 ms
165 TCP retransmission. Client resends packet 163 again 176 ms later, 264 ms after packet 163
166 TCP retransmission. Client resends packet 163 again 352 ms later, 616 ms after packet 163
167 HTTP OK from server, 931 ms after packet 163
168 New connection is initiated by sending SYN from Client with new port no 5009 to the server
169 TCP ACK from server, 991 ms after packet 163 in response to packet 163
170 Malformed packet from server
171 & 172 Duplicate ACKs from server in response to packets 164 & 165
173 SYN-ACK from server in response to packet 168
174 ACK from Client in response to 173
175 Server retransmission of HTTP OK to old port 5008
176 Client reset connection on port 5008

So, this sequence does show an apparent problem with TCP receive timeout and retransmission. The HTTP OK response takes only 931 ms to arrive, but the connection is still reset and the response is not sent on to the serial port.
In the previous sequence, however, the ACK arrived after 963 ms, which is longer than 931 ms, and the connection went through successfully.

The bottom line is that the Wiznet device is not obeying RFC 1122, section 4.2.3.1, which clearly states that the retransmission timeout (RTO) should initially be set to 3 seconds. The attached log clearly shows that this is not the case when it is connected to the 3G router. Now, the 3 seconds is a default starting value, and it is allowed to be reduced to match the data link. I suspect that part of the problem is that a cellular network is not as deterministic as a broadband network, and the latency can go up and down with interference. But the PC-3GRouter-2.pcapng log shows that the PC still sees latency yet manages to retain the connection, with sufficient receive timeout and RTO values.
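
The retransmission gaps in both sequences (88, 176, 352 ms) double each time, which looks like exponential backoff starting from an 88 ms timer. A small Python check of that arithmetic, where the function name is made up for the example and the doubling rule is my reading of the capture, not something confirmed by Wiznet:

```python
from itertools import accumulate
from typing import List, Tuple


def retransmit_schedule(first_rto_ms: int, retries: int) -> Tuple[List[int], List[int]]:
    """Return (gaps between sends, offsets from the original send), in ms,
    assuming the timer doubles after every retransmission."""
    gaps = [first_rto_ms * 2 ** i for i in range(retries)]
    return gaps, list(accumulate(gaps))


gaps, offsets = retransmit_schedule(88, 3)
print(gaps)     # [88, 176, 352]
print(offsets)  # [88, 264, 616]
```

The computed offsets agree with the capture to within 1 ms (616/617 ms for the third retransmission). If the doubling had simply continued, a fourth timer would have expired 1320 ms after the request, but the new SYN in packet 168 appears between 931 ms and 991 ms, so the point at which the module gives up seems to follow some other rule.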

In RFC 1122, under 4.2.3.1 Retransmission Timeout Calculation, it is stated that,

"There were two known problems with the RTO calculations specified in RFC-793. First, the accurate measurement of RTTs is difficult when there are retransmissions."

Did the 67 ms latency of the ACK in the first transfer influence how quickly the Wiznet device decided to time out and retransmit?
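
For reference, this is how the standard RTO estimator would treat that 67 ms sample. The sketch below uses the smoothing constants from RFC 6298 (the later update of the calculation RFC 1122 refers to), with the 3 s initial value from RFC 1122. The class and parameter names are my own, clock granularity is left out, and I do not know whether the Wiznet firmware implements anything like this.

```python
from typing import Optional


class RtoEstimator:
    """Standard smoothed-RTT RTO estimator; NOT the Wiznet firmware's algorithm."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, initial_rto_s: float = 3.0, min_rto_s: float = 1.0) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.min_rto_s = min_rto_s
        self.rto_s = initial_rto_s  # used until the first RTT sample arrives

    def sample(self, rtt_s: float) -> float:
        """Feed one RTT measurement and return the new RTO in seconds.
        Samples must not be taken from retransmitted segments (Karn's rule)."""
        if self.srtt is None:
            self.srtt = rtt_s
            self.rttvar = rtt_s / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt_s)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_s
        self.rto_s = max(self.min_rto_s, self.srtt + self.K * self.rttvar)
        return self.rto_s

    def backoff(self) -> float:
        """Double the RTO after a retransmission timeout."""
        self.rto_s *= 2
        return self.rto_s
```

With no lower bound (min_rto_s=0.0), a single 67 ms sample gives an RTO of about 201 ms; with the 1 s floor recommended by RFC 6298 it stays at 1 s. The 88 ms first retransmission in our capture matches neither figure, so the module may be deriving its timer in some other way.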

logs.zip (16.7 KB)

Can you please share the email address of the support team so that we can share details and the original logs if you need them?
It would be more helpful if you could arrange a virtual meeting over phone or Teams at a time convenient for both of us.

Hi Wiznet Team, I am a work colleague of Praveena’s. It would be most helpful to have a call to discuss the problem which Praveena has set out above; we could also invite the 3G router supplier to provide input directly. This issue is affecting a key project for us and we seek to resolve it as soon as possible. Please let us know if the issue is being actively looked at and your availability for a call. Many Thanks !

Hi Praveena,

Sorry for the late reply.

The test we did last time, connecting a PC to the WIZnet module and the module to an Ethernet router, did not show the problem you are encountering.

Could you run the same test for us so that we can analyze the problem more thoroughly? We would like to find out whether the frequent retransmissions arise solely from your WIZnet module or from using your 3G router and WIZnet module together.

As for the RTO and RTT problems, we are looking into them. Thanks for reporting the issue.

The email address of our support team: support @ wiznet.hk

Hi Willznet,

I have just sent a reply to this thread to support@wiznet.hk, in which I shared our 3G router details.
Thanks.