W5300 does not respond to SYN request from peer

Your nice picture does not show it, but I suspect this means that one of the devices (10.54.53.240) on the subnetwork with the satellite router (10.54.53.250) and RTU127 (10.10.53.240??) sends a SYN to 10.15.1.80 (the master switch, or something connected to the master switch). You see this SYN packet on the Wireshark PC connected to BMD_SIGS_SW07, but you do not see the SYN/ACK packet returning from 10.15.1.80 to 10.54.53.240. Is that right?

Where is the W5300 device in the picture? Instead of trying to guess what is going on, I propose (again) inserting a network bridge PC between the W5300 and the rest of the network, and capturing there when the symptoms appear, to prove that:

  1. the W5300 still receives the SYN packet it should reply to;
  2. the W5300 does not respond with a SYN/ACK packet even though it receives that SYN.

Dear sensiwood!

As you mentioned, you do not use the same MAC address in the real deployment; you used it only for the test.
But your diagram shows otherwise.

In your environment, it is possible that communication does not work well.

The SYN/ACK packet sent by the server may be delivered to another client, and that packet may put the W5300 into an erroneous state, because on the client side it carries the same server IP address and server port. If the packet also has the sequence number and ACK number that the client is waiting for, the connection becomes abnormal and the W5300 cannot work properly.

In addition, because the server has already sent the SYN/ACK packet to the wrong client and has entered the state of waiting for the ACK packet from a client, the W5300 may not respond to the SYN packet from a normal client.
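
The effect described above can be sketched with the standard TCP rule for a client in the SYN_SENT state (RFC 793): a SYN/ACK is acceptable only if it acknowledges exactly the client's own SYN. This is a minimal illustration in C, not a description of W5300 internals, which are not documented in this thread. All type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a client connection in SYN_SENT. */
struct syn_sent_client {
    uint32_t server_ip;    /* peer address the client connected to */
    uint16_t server_port;  /* peer port the client connected to */
    uint16_t local_port;   /* client's own source port */
    uint32_t iss;          /* initial send sequence number of our SYN */
};

/* Fields of an incoming SYN/ACK segment that matter for the check. */
struct syn_ack {
    uint32_t src_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t ack;          /* acknowledgment number in the segment */
};

enum syn_ack_result { SEG_IGNORED, SEG_RESET, SEG_ACCEPTED };

static enum syn_ack_result classify_syn_ack(const struct syn_sent_client *c,
                                            const struct syn_ack *seg)
{
    /* The segment does not belong to this connection at all. */
    if (seg->src_ip != c->server_ip ||
        seg->src_port != c->server_port ||
        seg->dst_port != c->local_port)
        return SEG_IGNORED;

    /* Same server IP and port, but it acknowledges a SYN we never sent:
       RFC 793 answers such a segment with a reset. */
    if (seg->ack != c->iss + 1u)
        return SEG_RESET;

    /* Only ISS + 1 completes the handshake for this client. */
    return SEG_ACCEPTED;
}
```

A SYN/ACK meant for a different client normally falls into the first or second branch. The problematic case in the explanation above is a misdelivered segment that happens to pass both checks.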

Even if my answer does not solve your problem, do not use the same MAC address.
Using the same MAC address will not help solve your problem.

Anyway, I would like to know the test result after you modify your code to clear the command.

Thank you.

Sorry, I do not fully understand your comment. My customer’s setup can be simplified as below.


Remote device 1 and device 2 reach device 3 via the public network. One Cisco router acts as the gateway between the local network and the public network. So on the local network (where device 3 is), frames from device 1 and device 2 both carry the MAC address (00:24:97:8a:7e:58) of the router’s LAN port, because that is the only access point to the local network.
In the switch’s MAC address table, this MAC address is always mapped to port 1. I think this is a very typical scenario, and I will not (cannot) change it.
Consider an example where you configure two IP addresses on a single network interface of your computer, which is connected to the local network via a switch. The two IP addresses share the same MAC address, and that is a correct configuration.
A MAC address is only significant within the LAN.
I hope my explanation makes sense to you.
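
The reason remote hosts appear behind the router’s MAC address can be sketched as the usual IPv4 next-hop decision. This is a minimal illustration in C; the `IPV4` macro, the function names, the /24 mask and all addresses in the usage note are assumptions for the example, not values from the customer’s network.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (illustrative helper). */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* True when both addresses fall in the same subnet under the given mask. */
static bool same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) == (b & mask);
}

/* Returns the IP address whose MAC goes into the Ethernet header:
   the destination itself when it is on-link, otherwise the gateway. */
static uint32_t next_hop(uint32_t local_ip, uint32_t mask,
                         uint32_t gateway_ip, uint32_t dest_ip)
{
    return same_subnet(local_ip, dest_ip, mask) ? dest_ip : gateway_ip;
}
```

For a local device at 192.168.1.30/24 with gateway 192.168.1.1, `next_hop` returns the gateway for both 198.51.100.10 and 203.0.113.20, so traffic to and from either remote device uses the same router MAC on the LAN.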

My own test setup is different: there, the switch sees a duplicated MAC address on two of its ports. I recognize that this is not a typical scenario, and I only used it to reproduce the issue.
I am not sure whether you can set up the same configuration as I did, reproduce the issue, and look at what state the chip has run into and how to prevent it, if possible.

Sorry, I could not test the code change further because last week was busy. I will let you know when I make progress.

Thanks for your support.

Thanks for catching the IP address typo in the diagram. It is now updated as below.


What you understood is basically correct.
10.15.1.80 is one Ethernet port of MASTER RTU 15.
All the network interfaces behind each IP address of RTU devices (RTU 15, 125, 127, 128, 129) are W5300.

I appreciate your suggestion about network bridging. Unfortunately I cannot do it on the customer’s site; it is not easy to do anything on the customer’s site at the moment.

On my own setup I can do it, but maybe at a later time.
I am pretty sure that the SYN request was sent to the W5300 and that the W5300 did not reply to it. Whether the SYN request was not received by the W5300, or was received but not processed, I don’t know. That is why I posted the question here. I previously mailed the register values dumped when I reproduced the issue on my own setup, and I have attached them again here. They show that when the W5300 did not answer the SYN request, all the sockets were free.
dumped W5300 registers in error situation.zip (491 Bytes)

Dear Sensiwood!

How are you? I am sorry for checking your issue so late.

Your dumped register values all look correct, so I do not know why the W5300 is not working properly.

I wonder whether the W5300 can still reply to a ping request in this situation, and whether the modified code has been tested.

Anyway, I checked your code again!

When Sn_IR_SENDOK is checked within the function w5300_send_data(), the loop count value seems to be too small.
So I suggest two trials.

  1. Remove loop count

  2. Check Sn_IR_SENDOK after Sn_CR_SEND is performed.

static int w5300_send_data(CPU_PORT port, int s, unsigned char *buf, unsigned int len)
{
    unsigned int i;

    if (ethernet[port].ether_protocol == TCP_PROTOCOL)
        w5300UpdateConnTimeout(port, s, CTO_RESET);

    bb_disable_int();

    // Copy the payload into the TX FIFO one 16-bit word at a time.
    SET_INDIRECT_REG(port, Sn_TX_FIFO(s));
    for (i = 0; i < len; i += 2)
    {
        writeByte(port, IDM_DR, buf[i]);
        // Pad with zero when len is odd so we never read past the end of buf.
        writeByte(port, IDM_DR1, (i + 1 < len) ? buf[i + 1] : 0);
    }

    // Notify the packet length, then transmit by issuing the SEND command.
    w5300IndirectWriteWord(port, Sn_TX_WRSR(s), len >> 16);
    w5300IndirectWriteWord(port, Sn_TX_WRSR2(s), len);

    w5300IndirectWriteWord(port, Sn_CR(s), Sn_CR_SEND);
    while (w5300IndirectReadWord(port, Sn_CR(s)) != 0)
        ;   // Should wait until the command register is cleared.

    // Wait for SEND_OK with no loop count; give up only if the socket closes.
    while ((w5300IndirectReadWord(port, Sn_IR(s)) & Sn_IR_SEND_OK) == 0)
    {
        if (w5300_get_status(port, s) == SOCK_CLOSED)
        {
            closeSocket(port, s);
            bb_enable_int();    // Re-enable interrupts on the error path too.
            return FAIL;
        }
    }
    w5300IndirectWriteWord(port, Sn_IR(s), Sn_IR_SEND_OK);  // Clear the SEND_OK flag.
    bb_enable_int();
    return OK;
}

Can you test these two trials?
I will wait for your good news.

Thank you.

Hello midnightcow,

Happy new year to you!

I am just back in the office from the holiday and will look into and try your suggestions in the following days.
Thanks for your support!

Hi team,

Sorry for coming back so late on this thread.

Actually, I did some work on this as a background task over the past month. Based on your suggestions, I think I have made progress in working around the issue. The two key changes I made were:

  1. Before closing the socket, always check if the send buffer has been emptied. Below are the functions I used.
    static void closeSocket(CPU_PORT port, int s)
    {
        T_SOCK *sock = GET_SOCK(port, s);
        int closing = waitForTxDone(port, s);

        if (closing == FAIL)
            applyErratum1(port, s);

        sock->timeWaitTimeout = 0;
        sock->sourcePort = 0;
        sock->remoteRtuAddress = 0;
    }

    static int waitForTxDone(CPU_PORT port, int s)
    {
        T_SOCK *sock = GET_SOCK(port, s);
        USHORT loop;

        if (sock->txDone)
            return OK;

        loop = 1000; // approx 50 ms.
        while ((w5300IndirectReadWord(port, Sn_IR(s)) & Sn_IR_SEND_OK) == 0)
        {
            if (--loop == 0)
                return FAIL;
        }

        w5300IndirectWriteWord(port, Sn_IR(s), Sn_IR_SEND_OK);
        return OK;
    }

That means every time I close the socket, I check whether the last send operation has finished.

  2. If the send is not finished within a certain period of time (50 ms), I use the following function to purge the output buffer and close the socket forcefully.
    void applyErratum1(CPU_PORT port, int s)
    {
        BYTE dummyData[2] = { 0, 0 };
        union IPV4_ADDR ipAddr;

        ipAddr.addrVal = 1;
        w5300_open_socket_mode(port, s, MIN_PORT_NUM - 1, Sn_MR_UDP);
        w5300_set_dipaddr(port, s, ipAddr, TCP_KINGFISHER);

        w5300_write_tx_data(port, s, dummyData, 1);
        closeSocketWithoutWait(port, s);
    }

After changing the socket type to UDP, I do not wait in an endless loop for the socket status to become SOCK_UDP; instead I just wait for a period of time and then directly send the rubbish bytes in the output buffer to a dummy target IP address.
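
A bounded poll is one alternative to both an endless loop and a fixed delay. The sketch below is not the code used above. `wait_for_status`, `status_reader`, `fake_socket` and `fake_read_status` are hypothetical names, and the fake reader only stands in for a real socket status register read so the example can run on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback that returns the current socket status byte. */
typedef uint8_t (*status_reader)(void *ctx);

/* Poll until the status equals `wanted`, or give up after `max_polls` reads. */
static bool wait_for_status(status_reader read_status, void *ctx,
                            uint8_t wanted, unsigned max_polls)
{
    while (max_polls-- > 0) {
        if (read_status(ctx) == wanted)
            return true;
    }
    return false;   /* caller decides how to recover, e.g. force a close */
}

/* Stand-in for the hardware: reports pending_status for polls_left reads,
   then final_status on every later read. */
struct fake_socket {
    uint8_t  pending_status;
    uint8_t  final_status;
    unsigned polls_left;
};

static uint8_t fake_read_status(void *ctx)
{
    struct fake_socket *fs = ctx;

    if (fs->polls_left > 0) {
        fs->polls_left--;
        return fs->pending_status;
    }
    return fs->final_status;
}
```

With a real driver, the callback would wrap the socket status read, and a `false` result would lead to the same forced-close path instead of hanging the firmware.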

As tested on my own setup, the new firmware does not have the permanent socket lock-up issue any more. Sometimes, for a certain period of time, one socket still cannot respond to SYN requests, as I observed, but the situation recovers automatically. Anyway, the changes really made a difference. Coordinating with the customer to validate this on site is another story, and it might not be easy to deploy in the near future.

I appreciate your help all through this ticket, and wish all the best to the team.

Thanks!