UDP server stops working after a while


#1

Hello,

I have set up 4 sockets: 2 TCP servers, 1 TCP client and 1 UDP server, using a W5500 and an ARM M4.

On a PC I have a UDP client and another UDP server. The client sends a packet to 255.255.255.255, i.e. a broadcast on a specified port. The W5500 UDP server is supposed to get that packet and close the connection.

Everything works just fine, but at some point the UDP server stops working. The client keeps broadcasting, but the W5500 side no longer acknowledges the packets.

This is what is in the while loop on the W5500 side:

    ...
    uint8_t udp_socket_message = getSn_SR(udpServerSocket);
    switch(udp_socket_message)
    {
      case SOCK_UDP:;        
        uint32_t udpRecvSize = getSn_RX_RSR(udpServerSocket);
        if (udpRecvSize > 0) 
        ....
     }
     disconnect(udpServerSocket);
     ...   

The problem is that udpRecvSize is being reported as 0 when the problem occurs.

A board reset fixes the issue and repeated broadcast messages are then processed correctly, but later it stops working again.

Should I do a specific check prior to calling getSn_SR?

Thanks.


#2

So the problem is that at some point the W5500 stops receiving broadcast packets, right?

Do W5500-based TCP servers continue working when it happens?
If yes, please dump all the registers of the UDP socket to see whether they are intact and still reflect the proper operating mode. Also please dump Sn_RXBUF_SIZE for all sockets to make sure they are all still configured properly and that the total does not exceed the size of the RX memory.


#3

Yes, all the other TCP sockets are working fine.

I get your point, but the question is when to check the registers. Shall I assume that if getSn_SR returns SOCK_UDP there is always something in the receive buffer, so the size returned by getSn_RX_RSR should always be greater than 0? And if it’s 0, then something is wrong and I should check the registers?

Thanks.


#4

Works = W5500 receives packets. Stops working = remote device sends packets, but W5500 does not receive them (RSR is 0).
Check when it stops working: something has changed at that point, and we need to make sure that everything on the W5500 side is intact.
If the W5500’s registers are OK, then something is probably wrong with the packets; check them using Wireshark to see that they contain what you expect the W5500 to get (in terms of UDP header and packet contents).
You send broadcast packets, so you will be able to see them from any device on the same physical network.


#5

I am using Wireshark for debugging; that’s how I see what’s being sent and received. I’ll add the register dump code and hopefully capture a failure to check what’s in there.

I’ll keep you posted.


#6

Not sure about that. I got a received length of 0 at startup, before sending anything. Register dump:

getSn_MR 02
getSn_CR 00
getSn_IMR 1F
getSn_IR 00
getSn_SR 22
getSn_MSSR C0
getSn_TX_RD 00
getSn_RX_RD 00
getSn_TX_FSR 00
getSn_KPALVTR 00

UDP getSn_RXBUF_SIZE 01
UDP getSn_TXBUF_SIZE 01

FW getSn_RXBUF_SIZE 02
FW getSn_TXBUF_SIZE 02

WEB SERVER getSn_RXBUF_SIZE 04
WEB SERVER regWEB_Sn_TXBUF_SIZE 04

TCP Client getSn_RXBUF_SIZE 00
TCP Client getSn_TXBUF_SIZE 00

TCP Server getSn_RXBUF_SIZE 08
TCP Server getSn_TXBUF_SIZE 08

TCP client was not initialized at that point.

All buffers are fine, as per socket buffer definition:

uint8_t txSize[WIZCHIP_MAX_SOC_NUM] = {0,0,8,4,2,1,1,0};
uint8_t rxSize[WIZCHIP_MAX_SOC_NUM] = {0,0,8,4,2,1,1,0};

However, it works and does reply to packets. When no packets are sent by the PC client, getSn_RX_RSR returns 0.
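
The buffer table above can be checked mechanically. A minimal sketch, assuming the W5500's 8 sockets and 16 KB of RX memory (TX has its own 16 KB); buffer_sizes_valid is a hypothetical helper, not part of the WIZnet library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SOCKET_COUNT 8        /* the W5500 has 8 sockets */
#define BUFFER_MEMORY_KB 16   /* 16 KB of RX memory, and separately 16 KB of TX */

/* Returns 1 if every entry is a size the chip accepts (0, 1, 2, 4, 8 or 16 KB)
 * and all entries together fit in the 16 KB memory, otherwise 0. */
int buffer_sizes_valid(const uint8_t sizes_kb[SOCKET_COUNT])
{
    unsigned total = 0;

    assert(sizes_kb != NULL);
    for (size_t i = 0; i < SOCKET_COUNT; i++) {
        uint8_t s = sizes_kb[i];
        if (s != 0 && s != 1 && s != 2 && s != 4 && s != 8 && s != 16)
            return 0;             /* not a valid Sn_RXBUF_SIZE value */
        total += s;
    }
    return total <= BUFFER_MEMORY_KB;
}
```

With the table from this post, {0,0,8,4,2,1,1,0}, the sizes add up to exactly 16 KB, so the configuration fits.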


#7

I am sorry, I do not understand you. RSR is the received data size. If nothing is received, it is 0. That is the normal state, nothing special about it.
If the chip did not receive anything, its RSR is 0, and this is how you tell whether it received anything.
In your dump you show the TX free size (getSn_TX_FSR) as 00.

This is strange because if the TX buffer is empty it is set to the maximum, 0x800, and it is a two-byte (word) value, yet you display it as one byte. I would expect you to show it as 0000 or 0800 (this is how it normally looks if there’s nothing in the TX buffer to send).
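
One possible explanation for the one-byte values, stated as an assumption because the dump code is not shown: a 16-bit register printed through an 8-bit variable keeps only its low byte. The later dump in this thread reports getSn_MSSR as 05C0 and getSn_TX_FSR as 0400, whose low bytes are C0 and 00, which matches the first dump. format_reg16 and format_reg16_truncated are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 16-bit register value as four hex digits. */
void format_reg16(char out[5], uint16_t value)
{
    assert(out != NULL);
    snprintf(out, 5, "%04X", (unsigned)value);
}

/* What a dump shows if the value is first stored in one byte:
 * only the low byte survives. */
void format_reg16_truncated(char out[3], uint16_t value)
{
    uint8_t low = (uint8_t)value;   /* the high byte is lost here */

    assert(out != NULL);
    snprintf(out, 3, "%02X", (unsigned)low);
}
```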


#8

Obviously we were talking about different things. I was asking whether a recv value of 0 means there is a fault, and I thought it did.

I shall RTFM, but I wouldn’t exactly call the W5500 datasheet a good source of info. Same for the W5100: just the basics.

Is anyone using this chip in a commercial product, or is it all hobby development?


#9

Yes, please study the W5100 datasheet; it seems to have more useful information on how to use the chip family.

Let’s think about it differently: can the chip family satisfy a commercial product’s requirements? I do not know WIZnet’s strategy regarding product development or their roadmap, but it is clear that the products have hardware limitations (such as buffer memory and number of sockets) which cannot be overcome by additional hardware design (e.g. by adding an SRAM chip to get more socket memory). At the same time, WIZnet chips make product development extremely easy if you know how to apply them properly.


#10

Let’s read between the lines and agree it’s not suitable for commercial products: a good commercial product has good support, and this one does not. I am quite certain you can’t get support even if you want to pay for it. Should be good enough for a hobby project tho.

About “if you know how to”: well, people don’t get knowledge from the sky. They read proper documents with proper examples and case studies. When they don’t have these, they spend time debugging and working out how it works. They may as well ask in a forum where no one from the manufacturer bothers to reply. Rant over, going back to the datasheet :slight_smile:


#11

[quote=“cio74”]I am quite certain you can’t get support even if you want to pay for it. Should be good enough for a hobby project tho.

When they don’t have these they spend time debugging and understanding how it works.
[/quote]
That’s why I am here… I got to know how it works - you can look at the history of my posts - we have eaten a ton of sh*t together with WIZnet people until we found out what was wrong with my design.


#12

I have managed to replicate the issue. These are the registers at the time the fault occurs:

[ul]
getSn_MR 02
getSn_CR 00
getSn_IMR 1F
getSn_IR 04
getSn_SR 22
getSn_MSSR 05C0
getSn_TX_RD 0980
getSn_TX_WR 0980
getSn_RX_RD 00DC
getSn_RX_WR 00DC
getSn_TX_FSR 0400
getSn_KPALVTR 00
UDP getSn_RXBUF_SIZE 01
UDP getSn_TXBUF_SIZE 01
FW getSn_RXBUF_SIZE 02
FW getSn_TXBUF_SIZE 02
WEB SERVER getSn_RXBUF_SIZE 04
WEB SERVER regWEB_Sn_TXBUF_SIZE 04
TCP Client getSn_RXBUF_SIZE 00
TCP Client getSn_TXBUF_SIZE 00
TCP Server getSn_RXBUF_SIZE 08
TCP Server getSn_TXBUF_SIZE 08
[/ul]


#13

I do not see the source port number (Sn_PORT) or the RX received size (Sn_RX_RSR) in the dump. If the port number is wrong, I think the chip will not receive anything, even broadcasts. Does your UDP access algorithm follow pages 50-55 of the W5100 datasheet? Are you sure you update the RX_RD pointer and issue the RECV command after you process the received packet(s)?
Edit: although you have only a 1K buffer for the UDP socket (0-3ff), all pointers are updated using 16-bit math (e.g. RX_RD=03e5, data received is 0040 -> new RX_RD is 0425 and not 0025), while the data buffer position within its window will be 025.
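
The pointer arithmetic described in the edit above can be sketched as follows. This is an illustration only: advance_ptr and offset_in_buffer are hypothetical names, and the numbers are the ones from the post (1 KB buffer, RX_RD=03e5, 0040 bytes received).

```c
#include <assert.h>
#include <stdint.h>

/* Advance a 16-bit socket pointer (e.g. Sn_RX_RD) by len bytes.
 * The pointer is a free-running 16-bit counter: it wraps at 0x10000,
 * not at the socket buffer size. */
uint16_t advance_ptr(uint16_t ptr, uint16_t len)
{
    return (uint16_t)(ptr + len);
}

/* Position of ptr inside a socket buffer of buf_size bytes.
 * buf_size must be a power of two (0x0400 for a 1 KB buffer). */
uint16_t offset_in_buffer(uint16_t ptr, uint16_t buf_size)
{
    assert(buf_size != 0 && (buf_size & (buf_size - 1)) == 0);
    return (uint16_t)(ptr & (buf_size - 1));
}
```

advance_ptr(0x03E5, 0x0040) gives 0x0425, and offset_in_buffer(0x0425, 0x0400) gives 0x0025.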


#14

I do not manually update any register, and I am not sure why I would have to do that anyway. Do you happen to know the answer?

Sn_RX_RSR is 0, see the initial post; the dump was taken when the recv value was 0.


#15

Simple question: how do you think the W5500 will know that you have grabbed the data it received before it reuses the same memory cells for a new packet? Download the W5100 datasheet and thoroughly read pages 50-55 about UDP. It explains what should happen to the W5100 (W5500) registers for it to continue receiving and sending data properly. Please note that I said “what should happen” and not “what you should do” :blush: because I do not know how you program the chip or which routines you use (the code is “…” in your first post in this thread).
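
To illustrate the question above, here is a toy model of the bookkeeping, not the real chip: the chip only learns that buffer space is free when the host moves the read pointer and issues RECV. All names (rx_model_t, model_rsr, model_packet_in, model_host_read) are hypothetical, and dropping a packet that does not fit is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one socket's RX bookkeeping. */
typedef struct {
    uint16_t buf_size;    /* socket RX buffer size in bytes, e.g. 1024 */
    uint16_t rx_wr;       /* chip side: where the next received byte goes */
    uint16_t rx_rd;       /* host side: value written to Sn_RX_RD */
    uint16_t rx_rd_seen;  /* read pointer the chip took over at the last RECV */
} rx_model_t;

/* Bytes the chip still considers unread (what Sn_RX_RSR would report). */
uint16_t model_rsr(const rx_model_t *m)
{
    return (uint16_t)(m->rx_wr - m->rx_rd_seen);
}

/* Chip side: a packet of len bytes arrives.
 * Returns 1 if it was stored, 0 if there was no room for it. */
int model_packet_in(rx_model_t *m, uint16_t len)
{
    uint16_t free_bytes = (uint16_t)(m->buf_size - model_rsr(m));
    if (len > free_bytes)
        return 0;
    m->rx_wr = (uint16_t)(m->rx_wr + len);
    return 1;
}

/* Host side: consume len bytes and move the read pointer.
 * The chip only notices if issue_recv is non-zero. */
void model_host_read(rx_model_t *m, uint16_t len, int issue_recv)
{
    assert(len <= (uint16_t)(m->rx_wr - m->rx_rd));
    m->rx_rd = (uint16_t)(m->rx_rd + len);   /* like setSn_RX_RD() */
    if (issue_recv)
        m->rx_rd_seen = m->rx_rd;            /* like the RECV command */
}
```

In this model, reading a 600-byte packet from a 1 KB buffer without the RECV step leaves the reported size at 600, so a second 600-byte packet does not fit; once RECV is issued, the space is released.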


#16

The chip logic knows I have issued a read command and can update its offset; it returns the length of the read, so it knows where to move the pointer.

Yes, I have seen the datasheet; I am asking what the logic behind it is.

Edit: I have read the library code and it does update Sn_RX_RD:

int32_t recv(uint8_t sn, uint8_t * buf, uint16_t len)
{
   uint8_t  tmp = 0;
   uint16_t recvsize = 0;
   CHECK_SOCKNUM();
   CHECK_SOCKMODE(Sn_MR_TCP);
   CHECK_SOCKDATA();
   
   recvsize = getSn_RxMAX(sn);
   if(recvsize < len) len = recvsize;
   while(1)
   {
      recvsize = getSn_RX_RSR(sn);
      tmp = getSn_SR(sn);
      if (tmp != SOCK_ESTABLISHED)
      {
         if(tmp == SOCK_CLOSE_WAIT)
         {
            if(recvsize != 0) break;
            else if(getSn_TX_FSR(sn) == getSn_TxMAX(sn))
            {
               close(sn);
               return SOCKERR_SOCKSTATUS;
            }
         }
         else
         {
            close(sn);
            return SOCKERR_SOCKSTATUS;
         }
      }
      if((sock_io_mode & (1<<sn)) && (recvsize == 0)) return SOCK_BUSY;
      if(recvsize != 0) break;
   };
   if(recvsize < len) len = recvsize;
   wiz_recv_data(sn, buf, len);
   setSn_CR(sn,Sn_CR_RECV);
   while(getSn_CR(sn));
   return len;
}
void wiz_recv_data(uint8_t sn, uint8_t *wizdata, uint16_t len)
{
   uint16_t ptr = 0;
   uint32_t addrsel = 0;
   
   if(len == 0) return;
   ptr = getSn_RX_RD(sn);
   addrsel = ((uint32_t)ptr << 8) + (WIZCHIP_RXBUF_BLOCK(sn) << 3);
   WIZCHIP_READ_BUF(addrsel, wizdata, len);
   ptr += len;
   
   setSn_RX_RD(sn,ptr);
}

So it’s all there.


#17

Something else is the issue, not ordinary read/write. This works, and it only stops working under a specific scenario.

I have managed to reproduce the issue easily: if I access the web server socket, the UDP server issue is triggered; recv always returns 0 and the previous register dump applies.


#18

Which source ports do you use for the web server and the UDP server?


#19

No port for server, just socket library used.

All socket servers are following the below basic implementation.

Create socket, init socket, then a loop made of:

  while (1) {
    switch(getSn_SR(tcpSocket)){
      case SOCK_LISTEN: break;
      case SOCK_ESTABLISHED: /* do work here */ break;
      case SOCK_CLOSE_WAIT: disconnect(tcpSocket); break;
      case SOCK_INIT: listen(tcpSocket); break;
      case SOCK_CLOSED: if( !socket(tcpSocket, Sn_MR_TCP, netIPPort, 0x00) ) ...; break; //re-use socket
      default: break;
    }
  }

and on the UDP server


   while(1){
     uint8_t udp_socket_message = getSn_SR(udpServerSocket);
     switch(udp_socket_message){
       case SOCK_UDP: /* do work here */ disconnect(udpServerSocket); break;
       case SOCK_LISTEN: break;
       case SOCK_ESTABLISHED: break;
       case SOCK_CLOSE_WAIT: disconnect(udp_socket_message); break;
       case SOCK_INIT: listen(udp_socket_message); break;
       case SOCK_CLOSED: if(!socket(udp_socket_message, Sn_MR_UDP, UDP_SERVER_PORT, 0x00)) ...; break; //re-use socket
     }
   }

While copying and pasting this code, the name of the socket variable struck me: it should be ‘udpServerSocket’ and not ‘udp_socket_message’.

I changed it and of course now it all works just fine. Code completion in IDEs is not always helpful, and neither are poorly chosen variable names that are ambiguous or similar.
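
Both the socket number and the status are plain uint8_t values, so the compiler cannot tell them apart. One way to make this kind of mix-up a compile error, sketched under assumptions: socket_id_t, socket_status_t, make_socket_id and reopen_socket are hypothetical names, not part of the WIZnet library, and socket index 5 in the usage note is only an example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SOCKETS 8   /* the W5500 provides sockets 0..7 */

/* Distinct wrapper types: a status cannot be passed where a socket is expected. */
typedef struct { uint8_t index; } socket_id_t;
typedef struct { uint8_t value; } socket_status_t;

/* Build a socket handle, refusing indexes the chip does not have. */
socket_id_t make_socket_id(uint8_t index)
{
    assert(index < MAX_SOCKETS);
    socket_id_t id = { index };
    return id;
}

/* Hypothetical stand-in for a library call that takes a socket number.
 * Returns the socket index it would operate on. */
uint8_t reopen_socket(socket_id_t sock)
{
    return sock.index;
}
```

With these types, reopen_socket(make_socket_id(5)) compiles, while passing a socket_status_t to reopen_socket is rejected by the compiler.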

Anyway, looking at the above as a generic server implementation: are the cases doing what they should? Am I missing a specific case? Any recommendations?

Thanks.
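
On the question of which cases a UDP loop needs, a minimal sketch, assuming the usual W5500 behaviour that a socket opened with Sn_MR_UDP is either closed (Sn_SR = 0x00) or open in UDP mode (Sn_SR = 0x22, as in the dumps above), and that the TCP states together with listen() and disconnect() belong to TCP sockets. udp_next_action and udp_action_t are hypothetical names, and the two status constants are defined locally only so the sketch compiles on its own:

```c
#include <assert.h>
#include <stdint.h>

/* Sn_SR values; defined here only so the sketch is self-contained.
 * Drop these two lines when building against the WIZnet headers. */
#define SOCK_CLOSED 0x00
#define SOCK_UDP    0x22

typedef enum {
    UDP_ACT_OPEN,   /* socket is closed: (re)open it in UDP mode */
    UDP_ACT_IDLE,   /* open, nothing received: normal, poll again later */
    UDP_ACT_READ,   /* open and data is waiting: read it */
    UDP_ACT_ERROR   /* state not expected on a UDP socket: close and reopen */
} udp_action_t;

/* Decide what the loop should do from the status register and the
 * received size (Sn_RX_RSR). */
udp_action_t udp_next_action(uint8_t status, uint16_t rx_size)
{
    switch (status) {
    case SOCK_CLOSED: return UDP_ACT_OPEN;
    case SOCK_UDP:    return rx_size > 0 ? UDP_ACT_READ : UDP_ACT_IDLE;
    default:          return UDP_ACT_ERROR;
    }
}
```

Keeping the decision in a pure function makes it easy to test without hardware; a received size of 0 on an open socket maps to "idle", not to a fault.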


#20

Do I understand correctly that your application now works properly, and that the cause was that you forgot to initialize the source port numbers on the W5500 side?
I ask because I do not see your answer regarding the source port numbers of the sockets. This is one of the weak points of the standard libraries, and some people have had problems with it.
Or was it simply a matter of using the wrong variable because the names look similar…