WIZnet Developer Forum

Line break recovery fails

Hi,
I have an application as follows:
STM32 board with W5200 shield acting as a TCP client
STM32 board with W5500 shield acting as a TCP server.
I am working to improve the fault tolerance of the system. During normal operation, frames are exchanged between the two boards.
I am focussing on the server side.
When I disconnect the RJ45 plug from one board, a timeout occurs. My software then makes repeated attempts to reconnect. During this time, I monitor the Ethernet traffic with Wireshark: the client repeatedly sends ARP requests to resolve the MAC address of the server.
When I reconnect the plug, the client sends a SYN frame, to which the server responds with an RST, ACK frame.
From then on, the client is trapped in an endless loop. By generating a pulse on a port pin, I was able to locate this loop.
It is in the function EthernetClient::connect(IPAddress ip, uint16_t port),
and it reads as follows:

while (uint8_t s = status() != SnSR::ESTABLISHED)
{
    DELAY(1);
    digitalWrite(4, HIGH); // debug
    digitalWrite(4, LOW);  // debug
    if (s == SnSR::CLOSED)
    {
        _sock = MAX_SOCK_NUM;
        return 0;
    }
    if (W5100.readSnIR(_sock) & SnIR::TIMEOUT)
    {
        W5100.writeSnIR(_sock, SnIR::TIMEOUT); /* clear TIMEOUT */
        close(_sock); // free socket
        _sock = MAX_SOCK_NUM;
        return 0;
    }
}

This means that the state of the W5200 socket is neither ESTABLISHED nor CLOSED, and that the timeout flag is not raised.
I use the pulse generation trick since I do not have an emulator.
According to the W5200 documentation, if a connect fails for any reason, the state changes to CLOSED. Apparently this is not the case here, and I have no idea what the state could be. In any case, the code is stuck in this loop and I must reboot the client to recover.
Why does the W5200 not go back to the CLOSED state on receiving the RST, ACK frame? Which state is it in then?

One further piece of information: I added a software SPI output to the loop code so that the value of the Status Register is transmitted serially while the code is blocked in the loop. The SR appears to be 1 in this state, which means an ARP request is being sent. Since there is no traffic at that time (according to Wireshark), the timeout mechanism does not seem to work: the status never changes to CLOSED, and the Timeout flag is never raised in the interrupt register.
Is this a bug in the W5200 chip?
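
One thing worth checking in the loop quoted above: in C++, `!=` is evaluated before the initialisation of `s`, so `uint8_t s = status() != SnSR::ESTABLISHED` stores the result of the comparison (0 or 1) in `s`, not the status byte. If SnSR::CLOSED is 0x00, the test `s == SnSR::CLOSED` can then never be true inside the loop, and a value of 1 read from `s` is what the expression produces for any status other than ESTABLISHED. It may therefore be worth confirming that the 1 observed was read directly from the status register. The sketch below uses stand-in constants and a fake `status()` (both hypothetical, not the library code) to show the difference:

```cpp
#include <cstdint>

// Stand-in status codes; the values are assumed to mirror the W5200 socket
// status register (CLOSED = 0x00, ESTABLISHED = 0x17).
namespace SnSR {
const uint8_t CLOSED = 0x00;
const uint8_t ESTABLISHED = 0x17;
}

// Hypothetical stand-in for EthernetClient::status(): the socket is CLOSED.
static uint8_t status() { return SnSR::CLOSED; }

// Same shape as the quoted loop condition: s receives the comparison result,
// because the initialiser is parsed as (status() != SnSR::ESTABLISHED).
static uint8_t capturedAsWritten() {
    uint8_t s = status() != SnSR::ESTABLISHED;
    return s;
}

// Read the status byte first; compare it afterwards.
static uint8_t capturedSeparately() {
    uint8_t s = status();
    return s;
}
```

With the fake socket reporting CLOSED, `capturedAsWritten()` yields 1 while `capturedSeparately()` yields 0x00, so only the second form lets a later `s == SnSR::CLOSED` test succeed.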

I added a workaround to my code: a timeout while the state is ARP. When it expires, the loop exits and the whole initialisation process is restarted. I have tested this and, so far, it recovers as expected.
I am still curious to know what the W5200 is actually doing in this situation.
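
For readers who want to try the same workaround, here is a minimal sketch of a connect-wait loop with its own deadline. It is an illustration under assumptions only: `waitForConnect`, `readStatus`, `nowMs` and the status values are hypothetical stand-ins, not the Ethernet library API. On real hardware the two callbacks would wrap the socket status read and a millisecond tick counter.

```cpp
#include <cstdint>
#include <functional>

// Assumed W5200 socket status values (check them against the datasheet).
const uint8_t SOCK_CLOSED = 0x00;
const uint8_t SOCK_ESTABLISHED = 0x17;

enum class ConnectResult { Established, Closed, TimedOut };

// Polls the socket status until it is ESTABLISHED or CLOSED, or until
// timeoutMs has elapsed. readStatus and nowMs are injected so that the
// logic can be exercised without hardware.
ConnectResult waitForConnect(const std::function<uint8_t()>& readStatus,
                             const std::function<uint32_t()>& nowMs,
                             uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    for (;;) {
        const uint8_t s = readStatus();  // read first, compare afterwards
        if (s == SOCK_ESTABLISHED) return ConnectResult::Established;
        if (s == SOCK_CLOSED) return ConnectResult::Closed;
        // Unsigned subtraction stays correct across counter wrap-around.
        if (static_cast<uint32_t>(nowMs() - start) >= timeoutMs) {
            return ConnectResult::TimedOut;  // caller closes and re-initialises
        }
    }
}
```

On `TimedOut` the caller would close the socket and restart the initialisation, as described above. The deadline should be longer than the chip's own retry timeout so that the hardware gets a chance to report the failure first.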

You must troubleshoot both sides, not only the client side.

When reconnecting, your client seems to use the same local (source) port number, and of course the server will respond with RST/ACK, because a connection with that local port is already established. You must use a different port number for the reconnection, and the server must have hardware or software means to monitor hung TCP sockets, that is, sockets open in TCP mode that have shown no activity within a specified period of time.
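
A minimal sketch of rotating the local source port on each reconnection attempt, as suggested above. The class name and the 49152 to 65535 range (the IANA dynamic port range) are assumptions for illustration; how the chosen port is handed to the socket depends on the library in use.

```cpp
#include <cstdint>

// Assumed port range: the IANA dynamic/private range.
const uint16_t kMinLocalPort = 49152;
const uint16_t kMaxLocalPort = 65535;

// Hands out a different local (source) port for every connection attempt,
// wrapping around at the end of the range.
class LocalPortRotator {
public:
    explicit LocalPortRotator(uint16_t first = kMinLocalPort)
        : next_(first < kMinLocalPort ? kMinLocalPort : first) {}

    // Returns the port to use now and advances to the following one.
    uint16_t nextPort() {
        const uint16_t port = next_;
        next_ = (next_ == kMaxLocalPort) ? kMinLocalPort
                                         : static_cast<uint16_t>(next_ + 1);
        return port;
    }

private:
    uint16_t next_;
};
```

Calling `nextPort()` before every connect attempt means a retry never reuses the port of the connection that the server may still consider established. Seeding the first port from something that varies across resets would extend this to reboots of the client.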

Did your code wait long enough for the timeout to occur before doing anything else? See the formula for the timeout calculation in the datasheet; the calculation is not that straightforward.
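
As a rough illustration of that calculation, here is a sketch of the retry timeout arithmetic as commonly described in the WIZnet datasheets: RTR is in units of 100 microseconds, an ARP request is repeated at a fixed RTR interval, and a TCP segment is retransmitted with an interval that doubles each time until it would no longer fit in 16 bits. In both cases RCR + 1 intervals elapse before the TIMEOUT flag is set. The formula, the function names and the default values used in the example (RTR = 2000, RCR = 8) are assumptions to be checked against the datasheet of your chip.

```cpp
#include <cstdint>

// Both results are in units of 100 us, the unit of the RTR register.

// ARP retry: fixed interval, RCR + 1 intervals in total.
uint32_t arpTimeoutUnits(uint16_t rtr, uint8_t rcr) {
    return static_cast<uint32_t>(rtr) * (static_cast<uint32_t>(rcr) + 1u);
}

// TCP retransmission: the interval doubles after each attempt but is held
// at the last value that still fits in 16 bits (<= 65535).
uint32_t tcpTimeoutUnits(uint16_t rtr, uint8_t rcr) {
    uint32_t interval = rtr;
    uint32_t total = 0;
    for (uint32_t attempt = 0; attempt <= rcr; ++attempt) {
        total += interval;
        if (interval * 2u <= 65535u) {
            interval *= 2u;
        }
    }
    return total;
}
```

With RTR = 2000 and RCR = 8 this gives 18000 units (1.8 s) for ARP and 318000 units (31.8 s) for TCP, so a software timeout shorter than these values would fire before the chip reports anything.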

Evgueny, initially my software did not have a timeout mechanism, and it would wait for hours because the chip never raised the timeout flag, nor did it return the socket to the CLOSED state. It was stuck in the ARP state, hence my workaround of adding a timeout for this state.

If you unplug the cable while each device has data to send, a timeout may occur on each device.
The reason the server sent an RST is that there is no longer a listening socket on the server.

Both the server and the client can check that the connection is alive by sending a keep-alive packet.
On the W5200, refer to Sn_CR_KEEPSEND.
On the W5500, refer to Sn_CR_KEEPSEND or Sn_KPALVTR.
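
For the W5500 register, a small helper can convert a desired keep-alive period into a register value. This sketch assumes that Sn_KPALVTR counts in 5-second units and that 0 leaves automatic keep-alive disabled; the function name is hypothetical, and writing the value to the chip is left to whichever driver is in use.

```cpp
#include <cstdint>

// Converts a keep-alive period in seconds to a value for the W5500
// Sn_KPALVTR register (assumed: 8-bit, 5-second unit, 0 = auto keep-alive off).
// Rounds up to the next 5-second step and saturates at 255.
uint8_t keepAliveRegisterValue(uint32_t periodSeconds) {
    if (periodSeconds == 0) return 0;                // keep auto keep-alive off
    if (periodSeconds >= 255u * 5u) return 255;      // saturate at the maximum
    return static_cast<uint8_t>((periodSeconds + 4u) / 5u);  // round up
}
```

For example, a requested period of 50 seconds maps to a register value of 10. On the W5200, which the thread only associates with the manual command, the application would instead have to issue the keep-alive command itself at the chosen interval.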

Thank you.

Copyright © 2017 WIZnet Co., Ltd. All Rights Reserved.