W5300 does not respond to SYN request from peer


#1

Hello everyone!

Recently we have had an Ethernet connection issue with our product, which uses the W5300. After analyzing the packets, we found that when the issue happens, the W5300 does not respond to SYN requests from the peer.

You can see several occurrences in the attached capture, e.g. packet IDs 216, 256, 338 and 369.
They are SYN requests from 10.54.53.240 (a remote peer) to 10.15.1.80 (the W5300), but the W5300 does not reply with a SYN ACK.


We believe there are still free sockets on the W5300. In our tests, even when all sockets are occupied, the W5300 sends an RST instead of staying silent. As I understand the specification, the transition from SOCK_LISTEN through SOCK_SYNRECV to SOCK_ESTABLISHED is handled by the W5300 itself, not by the application.

Interestingly, the W5300 is still working when this issue happens: another remote site, 10.10.28.240, is still talking to it. It seems the W5300 simply cannot respond to any new SYN request at that moment. A software reset (the RST bit in MR1 of the W5300) restores normal operation: afterwards it accepts new SYNs and talks to both sites correctly.

Frames from both remote sites (10.54.53.240 and 10.10.28.240) carry the same MAC address. This is because both sites are connected to the local site via the WAN and a router. The router is connected to one port of a local switch, and 10.15.1.80 is connected to another port of the same switch. This should be a normal topology.

Has anyone come across a similar issue?
Thanks in advance!
Wiznet packets capture.zip (11.2 KB)


#2

Hi

This problem is due to your network configuration.
10.54.53.240 is in a different network band from 10.15.1.80, and it appears that packets transmitted from 10.54.53.240 do not reach 10.15.1.80.


#3

Hi Becky,

Thanks a lot for your reply.
Do you mean subnet by ‘network band’?
Yes, 10.54.53.240 and 10.10.28.240 are indeed not in the same subnet as 10.15.1.80; they are the source IP addresses of the remote sites. But they are correctly routed into the local network where 10.15.1.80 resides. There is one router between 10.15.1.80 and the WAN. The router's LAN port has the IP address 10.15.1.250, which the local host uses as its gateway. I think this is a typical routing scenario.
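The routing question here comes down to the standard IPv4 rule: a host delivers directly when the destination is in its own subnet, and otherwise hands the frame to its gateway. A minimal C sketch of that rule follows. The function names are hypothetical, and the /24 mask used in the example is an assumption, since the thread never states the local subnet mask.

```c
#include <stdint.h>

/* Pack four dotted-decimal octets into a host-order 32-bit IPv4 address. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Same subnet as the local address? Then the host delivers directly. */
static int is_on_link(uint32_t local, uint32_t mask, uint32_t dst)
{
    return (local & mask) == (dst & mask);
}

/* Next hop: the destination itself if on-link, otherwise the gateway. */
static uint32_t next_hop(uint32_t local, uint32_t mask,
                         uint32_t gateway, uint32_t dst)
{
    return is_on_link(local, mask, dst) ? dst : gateway;
}
```

With the assumed 255.255.255.0 mask, 10.54.53.240 is off-link from 10.15.1.80, so `next_hop()` returns the gateway 10.15.1.250. On the W5300 the corresponding inputs are SIPR, SUBR and GAR.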

I attached another Wireshark capture that shows a successful case in the same scenario, so the network architecture should be fine. Note the SYN and SYN ACK at packet IDs 296 and 297.
Wiznet packets capture when good condition.zip (12.7 KB)

Sorry, I missed one point in the original description. In our project, both remote IP addresses communicate with 10.15.1.80 successfully at first, but if the system is left running for about two weeks, the issue described above occurs.
The two capture files show the differences clearly.

Thanks!


#4

It really looks like the W5300 ignores these packets without responding to them, even to the retransmitted ones. So this is not just some kind of “packet loss”; it is steady behavior in a particular W5300 state.

You need to prove that, when this happens, at least one socket is in the LISTEN state. If you have a diagnostic terminal on the device, program the W5300 driver to print all common and socket registers to this diagnostic output (e.g. when a key is pressed on the diagnostic terminal), so that you can see the whole configuration at the moment the issue occurs.
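A minimal sketch of such a diagnostic dump, assuming the driver has already copied each socket's Sn_MR, Sn_CR and Sn_SR into a snapshot (for example with ioLibrary's `getSn_MR()`, `getSn_CR()` and `getSn_SR()`). The `sock_snapshot` struct, the `SOCK_LISTEN_VAL` name and `dump_sockets()` are hypothetical. Besides printing, it counts the sockets in LISTEN, which is the fact that needs proving.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SOCKETS     8
#define SOCK_LISTEN_VAL 0x14  /* Sn_SR value for LISTEN (ioLibrary's SOCK_LISTEN) */

/* Hypothetical per-socket snapshot, filled by the driver when the
 * diagnostic key is pressed. */
struct sock_snapshot {
    uint16_t mr;  /* Sn_MR */
    uint16_t cr;  /* Sn_CR */
    uint8_t  sr;  /* Sn_SR status as returned by getSn_SR() */
};

/* Print one line per socket and return how many are in LISTEN. */
static int dump_sockets(FILE *out, const struct sock_snapshot s[NUM_SOCKETS])
{
    int listening = 0;
    for (int sn = 0; sn < NUM_SOCKETS; sn++) {
        fprintf(out, "SOCK%d: MR 0x%02X, CR 0x%02X, SR 0x%02X\n", sn,
                (unsigned)s[sn].mr, (unsigned)s[sn].cr, (unsigned)s[sn].sr);
        if (s[sn].sr == SOCK_LISTEN_VAL)
            listening++;
    }
    return listening;
}
```

If the returned count is zero when the SYNs go unanswered, the silence is explained by the socket configuration rather than by the chip.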

I suspect the assumption that the W5300 always sends an RST when all sockets are occupied is not correct. In my web server application, when a client's web browser issues more requests than there are sockets available, all new requests “hang”: the browser keeps waiting until it times out. To me this means the chip (I use a W5100) simply does not respond; if the browser's stack had received an RST, it would have stopped waiting immediately.

Yes, the transition is controlled by the W5300's TCP/IP stack. Therefore, if something stops working, you must first look carefully and thoroughly at the chip's effective configuration at the time of the issue.

That the chip is still talking to 10.10.28.240 confirms that it is still operational, and that something other than faulty hardware is involved.

That the software reset restores operation is also logical, since it resets the chip's whole configuration.

My conclusion: further troubleshooting is needed, dumping all meaningful registers after the issue starts happening. I am sure we will find something interesting in the output.


#5

Hi Eugeny,

Sorry for the late reply.
I very much appreciate your detailed comments.

It is not possible to debug anything on site at the moment, so I have been trying to reproduce the issue on my own test bench. Dumping all the registers is a good idea and I will try it.

I hope to come back with some useful information in the next few days.
Thanks again for your help!


#6

Hi Eugeny,

I did some tests on my test bench. The only way I could reproduce the issue was to set up two devices with different IP addresses but the same MAC address:

Device 1 (TCP client), IP 192.168.30.10, MAC 00:01:19:01:01:01

Device 2 (TCP client), IP 192.168.30.11, MAC 00:01:19:01:01:01

Device 3 (TCP server), IP 192.168.30.17, MAC 00:01:19:64:01:02

All devices are connected to one switch to form a LAN.
Devices 1 and 2 send requests to device 3, and device 3 responds.

I let the system run for some time (about 10 minutes) and found that device 3 no longer sent any response to SYN requests. The only way to recover communication was to reset the W5300. This is similar to the phenomenon our customer reports.

I realize this is not a desirable LAN configuration, because MAC addresses on a LAN should be unique. But at the moment it is the only way to reproduce the issue reliably.

And even if two devices on the LAN share a MAC address, I would expect this to confuse only the switch. As for the device's ARP table, I think it should be fine for several target IP addresses to map to the same MAC address. Correct me if I am wrong.
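The “confuses the switch” part can be illustrated with a toy model of address learning: a switch records the port on which each source MAC was last seen and forwards frames for that MAC only to that port. The sketch below is a simplified, hypothetical forwarding table (fixed size, no aging), not any particular switch's implementation. With devices 1 and 2 sharing 00:01:19:01:01:01 on different ports, every frame either one sends moves the entry, so frames from device 3 addressed to that MAC can be delivered to the wrong device.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 16

/* One entry of a toy learning-switch table: MAC -> port. */
struct fdb_entry {
    uint8_t mac[6];
    int     port;
    int     used;
};

static struct fdb_entry fdb[TABLE_SIZE];

/* Learn the source MAC of a frame that arrived on in_port. A MAC that is
 * already known is simply moved to the new port: the last sender wins. */
static void learn(const uint8_t mac[6], int in_port)
{
    int free_slot = -1;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (fdb[i].used && memcmp(fdb[i].mac, mac, 6) == 0) {
            fdb[i].port = in_port;
            return;
        }
        if (!fdb[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        memcpy(fdb[free_slot].mac, mac, 6);
        fdb[free_slot].port = in_port;
        fdb[free_slot].used = 1;
    }
}

/* Port for a destination MAC, or -1 (flood) if it is unknown. */
static int lookup(const uint8_t mac[6])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (fdb[i].used && memcmp(fdb[i].mac, mac, 6) == 0)
            return fdb[i].port;
    return -1;
}
```

By contrast, in the customer topology from the first post the shared MAC belongs to one router on a single switch port, so this kind of entry flapping should not occur there.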

When the issue happens, the register values I dumped from the firmware are:

MR: 0x01
IR: 0x00
IMR: 0x00
SHAR: 00:01:19:64:01:02
GAR: 0x0000
SUBR: 0x0fff
SIPR: 0xc0a81e11
RTR: 0xc350
RCR: 0x0008
TMSR: 0x0808080808080808
RMSR: 0x0808080808080808
MTYPER: 0x00ff
UIPR: 0x00000000
UPORTR: 0x0000
FMTUR: 0x0000

SOCK0: MR 0x21, CR 0x00, SR 0x14
SOCK1: MR 0x21, CR 0x00, SR 0x14
SOCK2: MR 0x21, CR 0x00, SR 0x14
SOCK3: MR 0x21, CR 0x00, SR 0x14
SOCK4: MR 0x21, CR 0x00, SR 0x14
SOCK5: MR 0x21, CR 0x00, SR 0x14
SOCK6: MR 0x21, CR 0x00, SR 0x14
SOCK7: MR 0x21, CR 0x00, SR 0x14

I cannot see any obviously abnormal values above. All sockets are in the LISTEN state (0x14).
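When reading dumps like the one above, a small decoder for Sn_SR values helps. The table below follows the SOCK_* state values as I understand them from the ioLibrary W5300 header; please verify them against the datasheet. The function names are hypothetical, and values outside the table (such as the 0x1E and 0x11 mentioned below) are reported as undocumented rather than guessed.

```c
#include <stdint.h>
#include <string.h>

/* Map a W5300 Sn_SR value to its ioLibrary SOCK_* name. */
static const char *sn_sr_name(uint8_t sr)
{
    switch (sr) {
    case 0x00: return "SOCK_CLOSED";
    case 0x13: return "SOCK_INIT";
    case 0x14: return "SOCK_LISTEN";
    case 0x15: return "SOCK_SYNSENT";
    case 0x16: return "SOCK_SYNRECV";
    case 0x17: return "SOCK_ESTABLISHED";
    case 0x18: return "SOCK_FIN_WAIT";
    case 0x1A: return "SOCK_CLOSING";
    case 0x1B: return "SOCK_TIME_WAIT";
    case 0x1C: return "SOCK_CLOSE_WAIT";
    case 0x1D: return "SOCK_LAST_ACK";
    case 0x22: return "SOCK_UDP";
    case 0x32: return "SOCK_IPRAW";
    case 0x42: return "SOCK_MACRAW";
    case 0x5F: return "SOCK_PPPOE";
    default:   return "UNDOCUMENTED";
    }
}

/* Nonzero if the status value appears in the table above. */
static int sr_is_documented(uint8_t sr)
{
    return strcmp(sn_sr_name(sr), "UNDOCUMENTED") != 0;
}
```

`sr_is_documented()` makes it easy to log only unexpected values when polling the sockets in a loop.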

I also captured a situation in which all sockets were in the ESTABLISHED state (0x17), and the W5300 answered further SYN requests with RST. That is reasonable.

During the test I also occasionally read socket status values of 0x1E and 0x11, which are not described in the manual. What do they mean?

Thanks!


#7

Dear sensiwood & Eugeny!
Thanks for your interest in W5300.

There are two cases in which the W5300 does not respond to any packets:

  1. The network information registers may have been changed for some reason.
    => Check that SIPR, SUBR, GAR and SHAR hold the correct values.
  2. The W5300 may have entered the Erratum #1 condition (refer to the errata sheet).
    => Refer to the close() function in the ioLibrary (https://github.com/Wiznet/ioLibrary_Driver/blob/master/Ethernet/socket.c):

```c
int8_t close(uint8_t sn)
{
    CHECK_SOCKNUM();
    //A20160426 : Applied the erratum 1 of W5300
#if (WIZCHIP == 5300)
    //M20160503 : Wrong socket parameter. s -> sn
    //if( ((getSn_MR(s)& 0x0F) == Sn_MR_TCP) && (getSn_TX_FSR(s) != getSn_TxMAX(s)) )
    if( ((getSn_MR(sn)& 0x0F) == Sn_MR_TCP) && (getSn_TX_FSR(sn) != getSn_TxMAX(sn)) )
    {
        uint8_t destip[4] = {0, 0, 0, 1};
        // TODO
        // You can wait for the data to finish sending;
        // wait about 1 second;
        // if the data has been sent completely, skip the erratum 1 code
        // ex> wait_1s();
        //     if (getSn_TX_FSR(s) == getSn_TxMAX(s)) continue;
        //
        //M20160503 : The socket() in close() calls close() itself again, causing an infinite loop: close()->socket()->close()->socket()-> ~
        //socket(s,Sn_MR_UDP,0x3000,0);
        //sendto(s,destip,1,destip,0x3000); // send the dummy data to an unknown destination(0.0.0.1).
        setSn_MR(sn,Sn_MR_UDP);
        setSn_PORTR(sn, 0x3000);
        setSn_CR(sn,Sn_CR_OPEN);
        while(getSn_CR(sn) != 0);
        while(getSn_SR(sn) != SOCK_UDP);
        sendto(sn,destip,1,destip,0x3000); // send the dummy data to an unknown destination(0.0.0.1).
    }
#endif
    setSn_CR(sn,Sn_CR_CLOSE);
    /* wait to process the command... */
    while( getSn_CR(sn) );
    /* clear all interrupts of the socket. */
    setSn_IR(sn, 0xFF);
    //A20150401 : Release the sock_io_mode of socket n.
    sock_io_mode &= ~(1<<sn);
    //
    sock_is_sending &= ~(1<<sn);
    sock_remained_size[sn] = 0;
    sock_pack_info[sn] = 0;
    while(getSn_SR(sn) != SOCK_CLOSED);
    return SOCK_OK;
}
```
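For the first case in the list above, one way to make the check routine is to read the network information registers back from the chip and compare them with what the firmware configured. A minimal sketch follows. The `net_info` struct, the `BAD_*` flags and `check_net_info()` are hypothetical helpers, not part of ioLibrary; the readback itself would use the driver's own accessors (ioLibrary provides getter helpers for SHAR, SIPR, SUBR and GAR).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical copy of the network information registers. */
struct net_info {
    uint8_t shar[6];  /* source hardware (MAC) address */
    uint8_t sipr[4];  /* source IP address             */
    uint8_t subr[4];  /* subnet mask                   */
    uint8_t gar[4];   /* gateway address               */
};

/* Bit flags naming the registers that differ from the expected values. */
enum { BAD_SHAR = 1, BAD_SIPR = 2, BAD_SUBR = 4, BAD_GAR = 8 };

/* Compare the values read back from the chip with the configured ones.
 * Returns 0 if everything matches, otherwise a mask of BAD_* flags. */
static int check_net_info(const struct net_info *chip,
                          const struct net_info *expected)
{
    int bad = 0;
    if (memcmp(chip->shar, expected->shar, sizeof chip->shar)) bad |= BAD_SHAR;
    if (memcmp(chip->sipr, expected->sipr, sizeof chip->sipr)) bad |= BAD_SIPR;
    if (memcmp(chip->subr, expected->subr, sizeof chip->subr)) bad |= BAD_SUBR;
    if (memcmp(chip->gar,  expected->gar,  sizeof chip->gar))  bad |= BAD_GAR;
    return bad;
}
```

Logging a nonzero result together with a timestamp would show whether the registers change before the SYNs start going unanswered.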

Thank you.

Please check