Web server unreliable when queried from foreign country

Hi,
I have a problem with an Ethernet shield attached to an Arduino-like board fitted with an STM32 (Olimexino board). It happens with two different shields fitted with a W5200, and also with a shield fitted with a W5100.
My application is a heating regulation system with a web interface that lets me monitor its operation away from home and set the temperature remotely.
This system is installed in France. It is connected by an Ethernet cable to an optical fiber modem, which gives it internet access. The appropriate port forwarding rules have been set on the router to allow external access to the application.
From France it works well, both from a computer and from a smartphone over a 3G or 4G connection.
However, each time I go to a foreign country and connect to the application's website with my smartphone, it works a few times (maybe a dozen), then I get the message “impossible to connect to host…”. When I am back home, I can no longer connect to the application, even using the local IP address (192.168.1.55).
The application has not crashed, however, as it still regulates the heating; moreover, the application uses the Ethernet shield to periodically query an NTP server for logging purposes, and this querying still works while the web server is blocked.
I need to reboot the application to recover the web server functionality.
What could cause this problem? Is there something different in the internet routing or in the protocol when queried from a foreign country?
Here at home, I am running a stress test in which several computers query the web page once every 5 seconds. I have run this test for several days without any failure.
How could I simulate, from home, a query coming from a foreign country? I have tried the Tor Browser, since Tor routes traffic over a longer than usual path, but so far I have not observed a failure.

Sounds like all your TCP sockets are stuck open for some reason. Most probably the software cannot close them properly, or thinks that the connection is still established.

Hard to advise anything specific, as a code review is required.

The difference when connecting from abroad could be:

  • packet loss;
  • packets arriving out of order.

For each TCP connection there must be a general software timeout limiting how long the socket is allowed to stay connected or in a transient closing state. If the timeout expires, you must forcefully close the connection and reopen the socket for listening.
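The software timeout described above could be sketched roughly as follows. This is only an illustration: the `Phase` names, the `SocketSlot` struct and the two time limits are assumptions made up for the example, not part of any WIZnet library. A real sketch would read the chip's socket status and issue its close and listen commands where the comment indicates.

```cpp
#include <cstdint>

// Hypothetical socket phases; a real driver would derive these from the
// chip's socket status register.
enum class Phase { Listening, Connected, Closing };

struct SocketSlot {
    Phase phase = Phase::Listening;
    std::uint32_t phaseStartMs = 0;  // time at which the current phase began
};

// Assumed limits, to be tuned for the application.
constexpr std::uint32_t kMaxConnectedMs = 10000;
constexpr std::uint32_t kMaxClosingMs = 2000;

// Record a phase change and restart the timer for that phase.
void enterPhase(SocketSlot& slot, Phase phase, std::uint32_t nowMs) {
    slot.phase = phase;
    slot.phaseStartMs = nowMs;
}

// Returns true when the slot overstayed its phase and was forced back to
// listening. Unsigned subtraction keeps the elapsed time correct even when
// a millisecond counter such as millis() wraps around.
bool enforceTimeout(SocketSlot& slot, std::uint32_t nowMs) {
    if (slot.phase == Phase::Listening) {
        return false;
    }
    const std::uint32_t limit =
        (slot.phase == Phase::Connected) ? kMaxConnectedMs : kMaxClosingMs;
    if (nowMs - slot.phaseStartMs < limit) {
        return false;
    }
    // A real implementation would force-close the hardware socket here,
    // then re-open it and put it back into listen mode.
    enterPhase(slot, Phase::Listening, nowMs);
    return true;
}
```

The idea is to call `enforceTimeout` for every socket on each pass of the main loop, so that no socket can remain stuck regardless of what the remote side does.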

It is a bad idea to rely fully on the W5x00 timeout mechanism, for two reasons:

  1. the timeout can be very long, up to tens of minutes if you set the timeout registers to large values;
  2. some operations have no timeout at all. Example: the socket is connected, and the W5x00 waits for incoming data. The remote device may be long gone, but as the W5x00 does not get anything from it, it still thinks the device is there and keeps the socket open.
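To get a feel for how long the hardware timeout in point 1 can become, here is a rough calculation. It assumes the retransmission scheme as I understand it from the W5100 datasheet: the retry interval starts at RTR (in units of 100 µs), doubles after each attempt while it still fits in 16 bits, and RCR + 1 intervals elapse before the socket times out. Please check the datasheet of your chip before relying on the numbers; the function names are made up for the example.

```cpp
#include <cstdint>

// Total retransmission timeout in units of 100 us, under the assumptions
// stated above (doubling interval capped at 16 bits, RCR + 1 intervals).
std::uint64_t tcpTimeoutUnits(std::uint16_t rtr, std::uint8_t rcr) {
    std::uint64_t total = 0;
    std::uint64_t interval = rtr;
    for (unsigned attempt = 0; attempt <= rcr; ++attempt) {
        total += interval;
        if (interval * 2 <= 0xFFFF) {
            interval *= 2;  // back off until doubling would overflow 16 bits
        }
    }
    return total;
}

// Same value converted to seconds (one unit is 100 microseconds).
double tcpTimeoutSeconds(std::uint16_t rtr, std::uint8_t rcr) {
    return static_cast<double>(tcpTimeoutUnits(rtr, rcr)) * 100e-6;
}
```

With RTR = 2000 and RCR = 8 (which I believe are the W5100 defaults) this gives 318000 units, about 31.8 seconds. With both registers at their maximum (RTR = 0xFFFF, RCR = 255) it gives roughly 1678 seconds, close to 28 minutes, which is in line with the "tens of minutes" mentioned above.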

Hi,

I think Engeny gave you the correct information.

I guess that you tried to close the connection and a FIN packet was sent to the system in France, but it was not delivered due to a network problem. So the web server socket is still in SOCK_ESTABLISHED, which means there is no socket listening for the next connection request.

There are two possible solutions.
One is to add a socket closing mechanism for the web server socket in your application layer; the other is to use the KEEP-ALIVE function on the web server socket. I recommend using KEEP-ALIVE with all the sockets you use.
KEEP-ALIVE checks whether your socket is still valid at a layer below your application.
If you use the KEEP-ALIVE function, the socket will be closed automatically after the KEEP-ALIVE timeout when the connection is no longer valid.
Then you can make a new connection to the web server.
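For the first solution (a closing mechanism in the application layer), the decision logic could look something like the sketch below. The `Status` and `Action` names are hypothetical and only loosely modelled on the socket states mentioned above (SOCK_ESTABLISHED and so on); a real program would map them to the chip's socket status register and socket commands.

```cpp
// Hypothetical status and action names; a real driver would map them to the
// chip's socket status register values and socket commands.
enum class Status { Closed, Listen, Established, CloseWait };
enum class Action { None, OpenAndListen, ServeAndDisconnect, Disconnect };

// Decide what the main loop should do with the web server socket on each pass.
Action nextAction(Status status, bool requestReceived) {
    switch (status) {
        case Status::Closed:
            // Nothing is listening: re-open the socket so clients can connect.
            return Action::OpenAndListen;
        case Status::Listen:
            return Action::None;
        case Status::Established:
            // Serve the page once a request has arrived, then disconnect.
            return requestReceived ? Action::ServeAndDisconnect : Action::None;
        case Status::CloseWait:
            // The peer has closed its side: finish closing ours.
            return Action::Disconnect;
    }
    return Action::None;
}
```

Note that an established socket with no request returns `Action::None` here, so this logic alone does not rescue a connection whose peer has silently disappeared. That case still needs either KEEP-ALIVE or a software timeout.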

I hope you solve this issue soon.

Thank you for this analysis. Please give me more details.

Currently, my server waits for the request, then it sends the web page in 23 pieces, I mean in 23 successive IP frames, then waits 5 ms and closes the connection. Do you mean that the confirmation of the closing from the client part may be lost, and the closing never performed?

About Keep-Alive, do you confirm that this is part of the HTTP message, that for example must start as follows:

  HTTP/1.1 200 OK
  Content-Type: text/html
  Connection: Keep-Alive
  Content-Encoding: gzip
  Content-Type: text/html; charset=utf-8
  Keep-Alive: timeout=5, max=1000

  <!DOCTYPE HTML PUBLIC "-/W3C/DTD HTML 4.0/EN">
  <html><head>
  etc…

Should I add in my web page the lines about Keep-alive as here above?

Yes, for example. In general, the closing process gets broken somewhere and the socket is kept open. You must dig into your code to identify exactly what is happening, and adjust the algorithm accordingly.

Do not confuse the keep-alive packet with the keep-alive HTTP header.

Keep-alive in the W5x00 network stack is a mechanism that sends a spare packet with the last sequence number and expects the remote device to reply with very specific content. If the remote device does not reply within the timeout period, or replies with RST, the socket gets closed automatically with the relevant flags set.

The keep-alive HTTP header is a request to the web server to keep the TCP/HTTP connection open for a predefined period of time after it serves an HTTP request.
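Since the server described in this thread closes the connection right after sending the page, the HTTP header that matches this behaviour is `Connection: close` rather than `Connection: Keep-Alive`. A minimal sketch of building such a response header is shown below. The function name is made up for the example, and it assumes `std::string` is available on the target; on a small board a fixed character buffer may be more appropriate.

```cpp
#include <cstddef>
#include <string>

// Build a minimal HTTP/1.1 response header that tells the client the server
// will close the TCP connection after this response.
std::string buildResponseHeader(std::size_t bodyLength) {
    std::string header = "HTTP/1.1 200 OK\r\n";
    header += "Content-Type: text/html; charset=utf-8\r\n";
    header += "Content-Length: " + std::to_string(bodyLength) + "\r\n";
    header += "Connection: close\r\n";
    header += "\r\n";  // an empty line ends the header section
    return header;
}
```

This header only tells the browser what to expect. It does nothing to detect a dead peer at the TCP level; that is the job of the keep-alive packet or of a software timeout.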