W5300 does not respond to SYN request from peer


#1

Hello everyone!

Recently we have had an Ethernet connection issue with our product, which uses the W5300. After analyzing the packets, we found that when the issue happens, the W5300 does not respond to SYN requests from the peer.

You can see several occurrences in the attachment, for example packet IDs 216, 256, 338 and 369.
They are SYN requests from 10.54.53.240 (a remote peer) to 10.15.1.80 (the W5300), but the W5300 does not reply with a SYN ACK.


We believe there are still free sockets on the W5300. In our tests, even when all sockets are occupied, the W5300 sends a RST instead of staying silent. As I understand the specification, the transition from SOCK_LISTEN through SOCK_SYNRECV to SOCK_ESTABLISHED is controlled by the W5300 itself, not by the application.
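As a minimal sketch of that split of responsibilities, assuming a polling TCP server and the W5300 Sn_SR status codes as defined in WIZnet's ioLibrary (`w5300.h`), the application only has to act in a few states. The `server_action` enum and `next_action()` are hypothetical names, and the actual register reads and socket commands are left out:

```c
#include <stdint.h>

/* W5300 Sn_SR values for TCP sockets, as defined in WIZnet's ioLibrary. */
#define SOCK_CLOSED      0x00
#define SOCK_INIT        0x13
#define SOCK_LISTEN      0x14
#define SOCK_SYNRECV     0x16
#define SOCK_ESTABLISHED 0x17
#define SOCK_CLOSE_WAIT  0x1C

/* What the application does for a given socket status (hypothetical names). */
typedef enum {
    ACT_NONE,       /* chip handles it: LISTEN, SYNRECV and other transient states */
    ACT_OPEN,       /* CLOSED: re-open the socket in TCP mode */
    ACT_LISTEN,     /* INIT: issue the LISTEN command */
    ACT_SERVE,      /* ESTABLISHED: exchange data with the peer */
    ACT_DISCONNECT  /* CLOSE_WAIT: peer sent FIN, issue DISCON */
} server_action;

server_action next_action(uint8_t sr)
{
    switch (sr) {
    case SOCK_CLOSED:      return ACT_OPEN;
    case SOCK_INIT:        return ACT_LISTEN;
    case SOCK_ESTABLISHED: return ACT_SERVE;
    case SOCK_CLOSE_WAIT:  return ACT_DISCONNECT;
    default:
        /* SOCK_LISTEN -> SOCK_SYNRECV -> SOCK_ESTABLISHED is driven by the
         * chip: answering the SYN needs no action from the application. */
        return ACT_NONE;
    }
}
```

In a real main loop, `next_action(getSn_SR(sn))` would decide whether to call the ioLibrary's `socket()`, `listen()`, `recv()`/`send()` or `disconnect()`. Nothing in such a loop sends a SYN ACK, so as long as a socket really is in SOCK_LISTEN, a missing SYN ACK is not something the application loop causes directly.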

Interestingly, the W5300 is still working when this issue happens, because another remote site, 10.10.28.240, is still talking to it. It seems the W5300 just cannot respond to any new SYN request at that moment. Writing RST to MR1 of the W5300 (a software reset) restores the situation: the chip then accepts new SYNs and talks to both sites correctly.

Both remote sites (10.54.53.240 and 10.10.28.240) use the same MAC address. This is because both sites are connected to the local site via the WAN and a router. The router is connected to one port of a local switch, and 10.15.1.80 is connected to another port of the same switch. It should be a normal topology.

Has anyone come across a similar issue?
Thanks in advance!
Wiznet packets capture.zip (11.2 KB)


#2

Hi

This problem is due to your network configuration.
10.54.53.240 is in a different network band from 10.15.1.80, so we judge that the packets transmitted from 10.54.53.240 do not reach 10.15.1.80.


#3

Hi Becky,

Thanks a lot for your reply.
Do you mean subnet by ‘network band’?
Yes, clearly neither 10.54.53.240 nor 10.10.28.240 is in the same network as 10.15.1.80; they are the source IP addresses of the remote sites. But they are correctly routed into the local network where 10.15.1.80 is. There is one router between 10.15.1.80 and the WAN. The LAN port of the router has the IP address 10.15.1.250, which the local host uses as its gateway. I think it is a typical routing scenario.

I attached another Wireshark capture which shows a successful case in the same scenario, so the network architecture should be good. Note the SYN and SYN ACK at packet IDs 296 and 297.
Wiznet packets capture when good condition.zip (12.7 KB)

Sorry, I missed one point in the original description. In our project, both remote IP addresses can communicate with 10.15.1.80 successfully at the start. But if you leave the system running for about 2 weeks, the issue described above happens.
The two capture files show the differences clearly.

Thanks!


#4

It really looks like the W5300 ignores the packets without responding to them, even the retransmitted ones. So this is not just a kind of “packet loss”; it is steady behavior in some W5300 state.

You must prove that, when this happens, you have at least one socket in LISTEN state. If you have a diagnostic terminal to the W5300 device, program its driver to print all common and socket registers to this diagnostic output (e.g. when a key is pressed on the diag terminal), so that you can see the whole configuration when the issue happens.

I suspect that is not a correct assumption. In my web server application, when the client’s web browser issues more requests than there are sockets available, all new requests “hang”: the browser keeps waiting until it times out. For me this means the chip (I use the W5100) simply does not respond; if the browser’s stack received a RST, it would stop waiting immediately.

Yes, it is controlled by the W5300 TCP/IP stack. So if something stops working, you must first look carefully and thoroughly at the chip's effective configuration at the time of the issue.

It confirms that the chip is still operational and that something other than faulty hardware is involved.

That is also logical, since you reset the whole configuration of the chip.

My conclusion: further troubleshooting is needed, dumping all meaningful registers after the issue starts happening. I am sure we will find something interesting in the output.
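A minimal sketch of such a dump, assuming the registers have already been read into a snapshot structure (the reading itself depends on the bus interface and driver, so it is left out). `w5300_snapshot`, `count_listening()` and `print_snapshot()` are hypothetical names. Counting the sockets whose Sn_SR is 0x14 (SOCK_LISTEN) directly answers the first question: is any socket actually listening when the SYN goes unanswered?

```c
#include <stdint.h>
#include <stdio.h>

#define W5300_NUM_SOCKETS 8
#define SOCK_LISTEN       0x14

/* Hypothetical snapshot of the registers worth printing when the issue occurs. */
typedef struct {
    uint16_t mr, ir, imr, rtr;
    uint8_t  rcr;
    uint8_t  shar[6];  /* source MAC address */
    uint8_t  sipr[4];  /* source IP address */
    uint8_t  subr[4];  /* subnet mask */
    uint8_t  gar[4];   /* gateway address */
    struct { uint16_t mr; uint8_t cr, sr; } sn[W5300_NUM_SOCKETS];
} w5300_snapshot;

/* Number of sockets whose status register reads SOCK_LISTEN. */
int count_listening(const w5300_snapshot *s)
{
    int n = 0;
    for (int i = 0; i < W5300_NUM_SOCKETS; i++)
        if (s->sn[i].sr == SOCK_LISTEN)
            n++;
    return n;
}

/* Prints the snapshot to a diagnostic stream (e.g. a UART wrapped as FILE*). */
void print_snapshot(FILE *out, const w5300_snapshot *s)
{
    fprintf(out, "MR=0x%04X IR=0x%04X IMR=0x%04X RTR=0x%04X RCR=0x%02X\n",
            s->mr, s->ir, s->imr, s->rtr, s->rcr);
    fprintf(out, "SHAR=%02X:%02X:%02X:%02X:%02X:%02X\n",
            s->shar[0], s->shar[1], s->shar[2], s->shar[3], s->shar[4], s->shar[5]);
    fprintf(out, "SIPR=%d.%d.%d.%d SUBR=%d.%d.%d.%d GAR=%d.%d.%d.%d\n",
            s->sipr[0], s->sipr[1], s->sipr[2], s->sipr[3],
            s->subr[0], s->subr[1], s->subr[2], s->subr[3],
            s->gar[0], s->gar[1], s->gar[2], s->gar[3]);
    for (int i = 0; i < W5300_NUM_SOCKETS; i++)
        fprintf(out, "SOCK%d: MR=0x%04X CR=0x%02X SR=0x%02X\n",
                i, s->sn[i].mr, s->sn[i].cr, s->sn[i].sr);
    fprintf(out, "sockets in LISTEN: %d\n", count_listening(s));
}
```

Printing network addresses as dotted bytes, rather than as raw 16-bit reads, makes a wrong mask or gateway easier to spot at a glance.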


#5

Hi Eugeny,

Sorry for the late reply.
I very much appreciate your detailed comments.

At the moment it is not possible to debug anything on site, so I am still trying to reproduce the issue on my own test bench. Dumping all the registers is a good idea and I will try it.

I hope I can come back with some useful information in the following days.
Thanks again for your help!


#6

Hi Eugeny,

I did some tests on my test bench. The only way I could reproduce the issue is to set up two devices with different IP addresses but the same MAC address.

Device 1 (TCP client), IP 192.168.30.10, MAC 00:01:19:01:01:01

Device 2 (TCP client), IP 192.168.30.11, MAC 00:01:19:01:01:01

Device 3 (TCP server), IP 192.168.30.17, MAC 00:01:19:64:01:02

All devices are connected to one switch to form a LAN.
Devices 1 and 2 send requests to device 3, and device 3 responds.

I let the system run for some time (10 minutes) and found that device 3 no longer sends any response to SYN requests. The only way to recover communication is to reset the W5300. This is similar to the phenomenon our customer reports.

I reckon this is not a desirable LAN configuration, because MAC addresses on a LAN should normally be unique. But at the moment it is the only way to reproduce the issue reliably.

And even if two devices share a MAC address on the LAN, I think this should only confuse the switch. For the device's ARP table, it should be OK for several target IP addresses to map to the same MAC address. Correct me if I am wrong.

When the issue happens, the register values I dumped from the firmware are:

MR: 0x01
IR: 0x00
IMR: 0x00
SHAR: 00:01:19:64:01:02
GAR: 0x0000
SUBR: 0x0fff
SIPR: 0xc0a81e11
RTR: 0xc350
RCR: 0x0008
TMSR: 0x0808080808080808
RMSR: 0x0808080808080808
MTYPER: 0x00ff
UIPR: 0x00000000
UPORTR: 0x0000
FMTUR: 0x0000

SOCK0: MR 21, CR 0, SR 14
SOCK1: MR 21, CR 0, SR 14
SOCK2: MR 21, CR 0, SR 14
SOCK3: MR 21, CR 0, SR 14
SOCK4: MR 21, CR 0, SR 14
SOCK5: MR 21, CR 0, SR 14
SOCK6: MR 21, CR 0, SR 14
SOCK7: MR 21, CR 0, SR 14

I cannot see any obviously abnormal values above. All sockets are in LISTEN status.

I also captured one situation in which all sockets were in status 0x17 (ESTABLISHED), and further SYN requests were answered with RST by the W5300. That is reasonable.

During the test I also occasionally read socket status values 0x1e and 0x11, which are not described in the manual. What do they mean?
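For reading such dumps, a small decoder helps. This sketch maps the Sn_SR values defined for the W5300 in WIZnet's ioLibrary (`w5300.h`) to their names, to the best of my knowledge; they should be double-checked against the datasheet. `sn_sr_name()` is a hypothetical helper. Values outside the table, such as the 0x11 and 0x1E readings mentioned above, return NULL so that they stand out in a log instead of being silently misread:

```c
#include <stddef.h>
#include <stdint.h>

/* Name of a documented W5300 Sn_SR value, or NULL if it is not in this table. */
const char *sn_sr_name(uint8_t sr)
{
    switch (sr) {
    case 0x00: return "SOCK_CLOSED";
    case 0x13: return "SOCK_INIT";
    case 0x14: return "SOCK_LISTEN";
    case 0x15: return "SOCK_SYNSENT";
    case 0x16: return "SOCK_SYNRECV";
    case 0x17: return "SOCK_ESTABLISHED";
    case 0x18: return "SOCK_FIN_WAIT";
    case 0x1A: return "SOCK_CLOSING";
    case 0x1B: return "SOCK_TIME_WAIT";
    case 0x1C: return "SOCK_CLOSE_WAIT";
    case 0x1D: return "SOCK_LAST_ACK";
    case 0x22: return "SOCK_UDP";
    case 0x32: return "SOCK_IPRAW";
    case 0x42: return "SOCK_MACRAW";
    case 0x5F: return "SOCK_PPPOE";
    default:   return NULL; /* undocumented or transient value: log it raw */
    }
}
```

Undocumented values seen only occasionally are likely short-lived internal states caught mid-transition, but that is a guess; only WIZnet can confirm what 0x11 and 0x1E mean on the W5300.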

Thanks!


#7

Dear sensiwood & Eugeny!
Thanks for your interest in the W5300.

There are two cases in which the W5300 does not respond to any packets:

  1. The network information registers may have been changed for some reason.
    => Check that SIPR, SUBR, GAR and SHAR are correct.
  2. The W5300 may have entered Erratum #1 (refer to the errata sheet).
    => Refer to the close() function in the ioLibrary:
    https://github.com/Wiznet/ioLibrary_Driver/blob/master/Ethernet/socket.c

```c
int8_t close(uint8_t sn)
{
    CHECK_SOCKNUM();
//A20160426 : Applied the erratum 1 of W5300
#if (WIZCHIP == 5300)
    //M20160503 : Wrong socket parameter. s -> sn
    //if( ((getSn_MR(s)& 0x0F) == Sn_MR_TCP) && (getSn_TX_FSR(s) != getSn_TxMAX(s)) )
    if( ((getSn_MR(sn) & 0x0F) == Sn_MR_TCP) && (getSn_TX_FSR(sn) != getSn_TxMAX(sn)) )
    {
        uint8_t destip[4] = {0, 0, 0, 1};
        // TODO
        // You can wait for completing to sending data;
        // wait about 1 second;
        // if you have completed to send data, skip the code of erratum 1
        // ex> wait_1s();
        //     if (getSn_TX_FSR(s) == getSn_TxMAX(s)) continue;
        //
        //M20160503 : The socket() of close() calls close() itself again. It occurs an infinite loop - close()->socket()->close()->socket()-> ~
        //socket(s,Sn_MR_UDP,0x3000,0);
        //sendto(s,destip,1,destip,0x3000); // send the dummy data to an unknown destination(0.0.0.1).
        setSn_MR(sn, Sn_MR_UDP);
        setSn_PORTR(sn, 0x3000);
        setSn_CR(sn, Sn_CR_OPEN);
        while(getSn_CR(sn) != 0);
        while(getSn_SR(sn) != SOCK_UDP);
        sendto(sn, destip, 1, destip, 0x3000); // send the dummy data to an unknown destination(0.0.0.1).
    }
#endif
    setSn_CR(sn, Sn_CR_CLOSE);
    /* wait to process the command... */
    while( getSn_CR(sn) );
    /* clear all interrupt of the socket. */
    setSn_IR(sn, 0xFF);
    //A20150401 : Release the sock_io_mode of socket n.
    sock_io_mode &= ~(1<<sn);
    //
    sock_is_sending &= ~(1<<sn);
    sock_remained_size[sn] = 0;
    sock_pack_info[sn] = 0;
    while(getSn_SR(sn) != SOCK_CLOSED);
    return SOCK_OK;
}
```
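For case 1 in the list above, one way to catch corrupted network settings is to keep a copy of what the firmware wrote and periodically compare it with what the chip reports. A minimal sketch, assuming SHAR, SIPR, SUBR and GAR have already been read back into bytes by the driver (for example with the ioLibrary's `getSHAR()`, `getSIPR()`, `getSUBR()` and `getGAR()`); `net_info`, `net_info_diff()` and the `NI_*` flags are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Network settings the firmware intended to write (hypothetical layout). */
typedef struct {
    uint8_t shar[6]; /* source MAC address (SHAR) */
    uint8_t sipr[4]; /* source IP address  (SIPR) */
    uint8_t subr[4]; /* subnet mask        (SUBR) */
    uint8_t gar[4];  /* gateway address    (GAR)  */
} net_info;

/* One bit per register group that differs. */
enum { NI_SHAR = 0x1, NI_SIPR = 0x2, NI_SUBR = 0x4, NI_GAR = 0x8 };

/* Compares the intended settings with a copy read back from the chip.
 * Returns 0 if everything matches, otherwise an OR of NI_* flags. */
unsigned net_info_diff(const net_info *expected, const net_info *actual)
{
    unsigned diff = 0;
    if (memcmp(expected->shar, actual->shar, sizeof expected->shar) != 0) diff |= NI_SHAR;
    if (memcmp(expected->sipr, actual->sipr, sizeof expected->sipr) != 0) diff |= NI_SIPR;
    if (memcmp(expected->subr, actual->subr, sizeof expected->subr) != 0) diff |= NI_SUBR;
    if (memcmp(expected->gar,  actual->gar,  sizeof expected->gar)  != 0) diff |= NI_GAR;
    return diff;
}
```

If the result is non-zero, the firmware could log it and rewrite the registers (in the ioLibrary, with `setSHAR()`, `setSIPR()`, `setSUBR()` and `setGAR()`). Whether that clears the missing-SYN-ACK condition discussed in this thread is untested.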

Thank you.

Please check


#8

Dear support team of Wiznet,

I am still waiting for your response on my query.

Our customer keeps seeing a similar phenomenon every 2 weeks or so.

Since the workaround you provided does not work on my setup, I cannot give our customer a positive answer yet.

The pressure is growing as time passes without progress.

I hope you can understand it.

I think I have provided all the details I tested and all the dumped register values when the issue happened.

If you think that is not good enough for you to start working with, please let me know. I will provide more if possible.

Thanks!


#9

Dear sensiwood!
I'm sorry my answer couldn’t help you.

Assuming the W5300’s registers are correct: if the W5300 does not send a SYN/ACK, it may not have received the SYN packet from the peer.

Regarding your first question: your two devices share the same MAC address.

Can you check that the SYN packet is correctly sent to the W5300, using a dumb hub or a switch's port-mirroring function?

I wonder whether the W5300 can receive the SYN packet from your network.

Also, as far as I know, the MAC address should be unique for every device. Why is the same MAC shared?

Thank you.


#10

Dear support team,

Thanks for your reply.

I believe the SYN packet has been sent from the peer device, and this SYN has also been captured by Wireshark using the port-mirroring switch, as I shared in my original message.

I have re-pasted it below for your reference.

And I am sure that when the W5300 does not reply to a new SYN packet, the existing established socket on the W5300 still works.


As illustrated above, at packet @216 there is a new SYN request from 10.54.53.240 to 10.15.1.80, but 10.15.1.80 (W5300) does not reply. Yet one existing socket on the W5300 still works, because 10.15.1.80 can still reply to 10.10.28.240.

Regarding your second question about the MAC address: yes, MAC addresses should normally be unique within one LAN. In this test, I only used this special configuration to reproduce the problem faster. In this configuration the W5300 opens and closes sockets very frequently, which produces the same phenomenon (W5300 not replying to SYN requests) within several minutes.

I think the errata sheet you shared is helpful; I need a workable method to clear the buffer and close the socket safely. With your suggested workaround, the socket does not change into UDP mode, so my test program locks up there. I have re-attached my original email for your reference; it also includes the dumped register values.

Please help. If you need further verification or assistance from my side, please let me know.
Re Ticket#2018111201000127 W5300 does not respond to SYN request from peer.zip (114.8 KB)


#11

In this setup, if the routing device is intelligent, it has to decide which route to send the packets on. It may simply turn out that the SYN packets are sent down the wrong wire.

I would do it differently: put a PC (Windows or Linux) with two LAN cards in bridge mode between the switch and the W5300, and run Wireshark on it. In this setup you will be able to see both of the PC’s interfaces and be sure that the respective packets are travelling along the wire all the way to the W5300’s MagJack.


#12

I have to clarify that on my setup I don’t use a router; I only use a switch. The diagram is below.

Let the system run for about 10 minutes, and device 3 can no longer respond to any SYN request from device 1 or device 2. I reckon that two devices with the same MAC address on the same LAN will confuse the switch, but this is the only scenario in which I can reproduce the phenomenon reported by the customer. After I removed the device with the duplicated MAC address from the network and rebooted the switch, device 3 still could not reply to incoming SYN requests. So I suppose the W5300 ran into an unexpected state.

Writing RST to the MR1 register restores the W5300, but that also resets the other working sockets on the same W5300. So I am asking whether there is any method to prevent the W5300 from going into that state.


#13

Thanks for sharing this good idea. I will try to figure out how to do it on my setup.


#14

That does not guarantee that this switching device performs no packet filtering.

I suspect it is normal: after some time of operation, the intermediate network device identifies the illegal devices and “disconnects” them.

Depending on the device, rebooting it may not clear its cache. I think you must manually run commands to reinitialize or clear the tables.

Your diagram shows 3 W5300 devices, but you say only device 3 is affected. Clear everything and swap the switch ports of device 1 and device 2: does another physical device become unresponsive?


#15

I didn’t observe device 1 or 2 becoming unresponsive, because in this test devices 1 and 2 keep initiating requests to device 3.
Device 3 replies to the requests from devices 1 and 2.

When the issue happened, devices 1 and 2 were still sending SYN requests to device 3.


#16

Dear Sensiwood!

You mentioned that it is not a problem for two devices to use the same MAC address.
But Eugeny and I think differently.

To check the same-MAC issue, you can try turning your switch off and on while the issue is present.
If the issue is still present after the switch reboots, then the same MAC address is not the problem.

Anyway, I wonder why the socket is not opened as UDP and gets stuck.

So I have checked your source code again. Please make sure you have applied the approach of my previous code to your code.

For every SOCKET command, such as OPEN, SEND, RECV, etc., you should check that Sn_CR has been cleared to 0.

For example:

```c
setSn_CR(sn, Sn_CR_OPEN);
while(getSn_CR(sn) != 0); // <- must
```

Please apply this throughout your code.
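Where a test program can hang (waiting for Sn_CR to clear, or for Sn_SR to reach SOCK_UDP in the erratum path of close()), a bounded wait turns a lock-up into a detectable error. A sketch under these assumptions: the register read is passed in as a function pointer (on the target it would wrap `getSn_CR()` or `getSn_SR()`), the limit is a simple poll count rather than a real timer, and `wait_sock_reg()`, `sim_read_cr()` and `sim_busy_reads` are hypothetical names for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Platform hook that reads one 8-bit socket register (Sn_CR or Sn_SR)
 * of socket sn. Hypothetical type for this sketch. */
typedef uint8_t (*sock_reg_reader)(uint8_t sn);

/* Polls a socket register until it equals `expected`, giving up after
 * max_polls reads instead of spinning forever. Returns true on success. */
bool wait_sock_reg(sock_reg_reader read, uint8_t sn, uint8_t expected,
                   uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (read(sn) == expected)
            return true;
    }
    return false;
}

/* Off-target stand-in for the chip, for illustration only: reports a
 * non-zero (busy) Sn_CR for a set number of reads, then 0x00. */
static uint32_t sim_busy_reads;

uint8_t sim_read_cr(uint8_t sn)
{
    (void)sn;
    if (sim_busy_reads > 0) {
        sim_busy_reads--;
        return 0x01; /* command still being processed */
    }
    return 0x00;
}
```

On the target, `setSn_CR(sn, Sn_CR_OPEN);` would be followed by a call with expected value 0 for Sn_CR and then a call with SOCK_UDP for Sn_SR; on timeout the firmware can dump the registers instead of locking up. A deadline based on a hardware timer is preferable to a poll count, because the time one poll takes depends on the bus and the CPU.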
Also, to avoid misunderstanding the situation, please don't use the same MAC address.

And I need the original capture file, not a figure.

Thank you.


#17

Dear support team,

Thanks a lot for your reply and support.

Actually I agree that a duplicated MAC address within one LAN is not a correct configuration. The duplicate MAC address confuses the switch: it sees the same MAC address on two ports, so when it receives a packet destined for that MAC, it does not know which port to forward it to. I fully understand the comments from you and Eugeny. However, unluckily, it is the only scenario in which I can reproduce the issue so far.
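A simplified model of a learning switch's forwarding table shows what the duplicate MAC does: the table is keyed by MAC, so every frame from device 1 or device 2 re-learns the port, and frames addressed to that MAC go only to whichever device sent last. This is an illustrative sketch with hypothetical names (`fdb`, `fdb_learn()`, `fdb_lookup()`); real switches also age entries and may apply MAC-flap detection or port security, which can block a port outright:

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16

/* One forwarding-table entry: frames from this MAC were last seen on this port. */
typedef struct {
    uint8_t mac[6];
    int port;
    int used;
} fdb_entry;

typedef struct {
    fdb_entry entry[FDB_SIZE];
    unsigned moves; /* how often a known MAC was re-learned on another port */
} fdb;

/* Called for every received frame with its source MAC and ingress port. */
void fdb_learn(fdb *t, const uint8_t mac[6], int port)
{
    int free_slot = -1;
    for (int i = 0; i < FDB_SIZE; i++) {
        if (t->entry[i].used && memcmp(t->entry[i].mac, mac, 6) == 0) {
            if (t->entry[i].port != port) {
                t->entry[i].port = port; /* MAC "moved": overwrite the old port */
                t->moves++;
            }
            return;
        }
        if (!t->entry[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        memcpy(t->entry[free_slot].mac, mac, 6);
        t->entry[free_slot].port = port;
        t->entry[free_slot].used = 1;
    }
    /* Table full: this toy model simply does not learn the MAC. */
}

/* Port a frame for `mac` is forwarded to, or -1 if unknown (it would be flooded). */
int fdb_lookup(const fdb *t, const uint8_t mac[6])
{
    for (int i = 0; i < FDB_SIZE; i++)
        if (t->entry[i].used && memcmp(t->entry[i].mac, mac, 6) == 0)
            return t->entry[i].port;
    return -1;
}
```

In the test bench above, replies from device 3 to 00:01:19:01:01:01 would alternate between the ports of device 1 and device 2, so each client loses some SYN ACKs and data segments. That explains lost traffic while the duplicate MAC is present, but not why device 3 stays silent after the duplicate is removed.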

I have uploaded the Wireshark file in my original message of this topic. Please check.
That is the original Wireshark file captured on the real site and returned by my customer.
In this capture file, 10.15.1.80 is the W5300; normally it receives data from two IP addresses:
10.10.28.240
10.54.53.240
They are not in the same subnet, but that is OK because the routers have been configured correctly to forward the packets.

When the problem happened, 10.15.1.80 could no longer reply to SYN requests from 10.54.53.240. The only way to restore communication is rebooting our device, which also resets the W5300 at the hardware level. This is the origin of the whole story.

It is not possible for me to set up the same network as our customer, so I used a simpler system to reproduce the same phenomenon faster (that is, the duplicate MAC scenario).

I will double-check whether the W5300 recovers when I switch off the switch and wait long enough.
I will also check my code for the issues you pointed out. I appreciate it.


#18

The symptoms may look the same as at your customer site, but the cause might be different.

Try simply disconnecting the network cable from the W5300 for a minute and then reconnecting it, so that the switch port goes down and then up.


#19

I have checked the original capture file you sent previously, but it does not help to solve the issue.
I need a capture in which the device 10.54.53.240 works normally and then enters the no-reply situation.
But I didn’t find any normal-operation packets, as shown in the following picture.

I need a capture file in which the device 10.54.53.240 changes from normal to abnormal operation.

And I wonder whether all 3 devices are W5300.

Can you explain in more detail the network environment and your board with two W5300 chips?

Thank you.


#20

Hi,

I have attached a diagram of the network we were talking about.
You can see that 10.15.1.80 is the master device, which sits in the control center.
10.54.53.240 is a port of WAUCHOPE and 10.10.28.240 is a port of TELEGRAPH. Both are remote stations connected to the master via a public network. One router, BMD_SIGS_RTR1, is the interface between the control room and the WAN. You can see from the Wireshark capture file that 10.54.53.240 and 10.10.28.240 share the same MAC, but I think that is correct: both come through the same router, so that MAC address should be the router's LAN port. That LAN port is connected to the switch in the control room and thus to the master device.


All these devices use W5300.

It is not easy to capture the data exactly when the event happens. The phenomenon occurs every couple of weeks and the customer cannot set up Wireshark for that long period of time. But I will try to find opportunities.

And I confirmed again on my setup that after the issue happens, the W5300 does not recover by itself. I removed the devices with the duplicate MAC, waited at least 24 hours and even factory-reset my switch. The W5300 still refused to reply to SYN requests until I restarted the system.