W5500 maximum number of successful TCP connections

Dear Everybody!

In one of my applications I noticed the following:
I open two TCP listening sockets on different ports (for example, ports H and J).
After ~85,000 successful TCP sessions on socket H, the socket no longer responds to any packet.
BUT the chip itself and socket J still work fine without any problem.
I have observed this several times, on several different instances of our hardware.

Has anybody else seen this problem?
Any idea for a workaround?

The one workaround I know of is a chip reset every week, but that is unworthy of Wiznet.

Thanks for any advice!

kzsolt

On my W5500-based project, I have 4 sockets used for an HTTP web server, 1 socket for an HTTP client, 1 socket for an MQTT client, 1 socket for a DHCP client and 1 socket for a DNS client. So the answer to your question is no.

Do you have a Wireshark capture that shows the problem?

And have you counted more than 85,000 successful TCP connections (open-data-close) on any of these sockets?

Also, we use tcpdump.
But as I wrote, I see the incoming TCP open from a remote host, but no answer from the W5500 on this socket (after ~85,000 successful connections).

There is new information.

The socket hangs in the ESTABLISHED state.

Also, the number of socket open-data-close cycles needed to reach this problem is about 28,900.
The number of socket-related interrupts is ~86,000.

That may be correct behavior. As your implementation is a TCP server, it will wait for incoming data, then respond with an ACK, send some data if needed, and wait for an ACK.

If your W5500 server is just waiting for incoming data, it will sit and wait for it without taking any action, and the chip's internal timeout has no effect here.

In general, your socket is stuck because it does not get a FIN packet from the client, which would tell the server to finish the communication on this socket, close it, and reset it for listening.

Thus the only options for you are:

  1. Build some kind of heartbeat into the protocol that runs over the socket. This is not suitable if you serve HTTP (for example), as you cannot send arbitrary data onto the wire in that case;
  2. Build a timeout into the application driving the W5500 server. For example, if the server sees that the socket is open, but there has been no incoming data for 15 seconds and the server has nothing to say, then just terminate the socket.
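
Option 2 could look roughly like the sketch below. It is only an illustration under some assumptions: `idle_watch_t` and `idle_watch_poll` are hypothetical names, the application is assumed to read Sn_SR and Sn_RX_RSR itself on every poll, and a free-running millisecond tick is assumed to be available. The function only makes the decision; terminating the socket is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sn_SR value for an established TCP connection (W5500 datasheet). */
#define SOCK_ESTABLISHED 0x17u

/* Hypothetical per-socket bookkeeping kept by the application. */
typedef struct {
    uint32_t last_activity_ms;  /* tick of the last observed activity */
    bool     was_established;   /* state seen on the previous poll */
} idle_watch_t;

/*
 * Call once per poll of a socket.
 *   status     : current Sn_SR value
 *   rx_pending : bytes waiting in the RX buffer (Sn_RX_RSR)
 *   now_ms     : free-running millisecond tick (wrap-around is tolerated)
 *   timeout_ms : allowed idle time, e.g. 15000 for 15 seconds
 * Returns true when the socket has been ESTABLISHED without incoming
 * data for at least timeout_ms, i.e. the caller should terminate it.
 */
bool idle_watch_poll(idle_watch_t *w, uint8_t status, uint16_t rx_pending,
                     uint32_t now_ms, uint32_t timeout_ms)
{
    if (status != SOCK_ESTABLISHED) {
        w->was_established = false;
        return false;
    }
    if (!w->was_established || rx_pending > 0u) {
        /* New connection or fresh data: restart the idle timer. */
        w->was_established = true;
        w->last_activity_ms = now_ms;
        return false;
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    return (uint32_t)(now_ms - w->last_activity_ms) >= timeout_ms;
}
```

If the server also sends data on its own, the caller can refresh `last_activity_ms` after each transmission so that an active connection is not dropped.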

Dear Mr. Eugeny!

In our implementation the socket keep-alive time is set.
This means that keep-alive can and must take the socket to the disconnected state in this situation.

Moreover, why does this problem occur only, and always, after ~28,000 successful connections?

Remark: testing this problem normally takes nearly 3 months because of the TCP timeout.

Best regards, kzsolt

Did you see that:

KA packet is transmittable after Sn_SR is changed to SOCK_ESTABLISHED and after the data is transmitted or received to/from a peer at least once.

Thus just establishing a connection without sending any data (or with the sent data lost) will keep the connection hung.

A coincidence? A packet was lost, or something happened to the client? It is too early to blame the W5500 for being the cause of your issues.

The only way I see is to turn off the auto keep-alive and use the regular SEND_KEEP command on the socket.
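
A rough sketch of the manual approach, under assumptions: per the datasheet, Sn_KPALVTR counts in units of 5 seconds and a value of 0 turns the automatic process off, after which the host sends probes with the SEND_KEEP command (0x22 in Sn_CR). The names `kpalvtr_from_seconds`, `manual_ka_t` and `manual_ka_due` are hypothetical, and the register accesses themselves are left to whatever driver you use. Keep in mind the datasheet note quoted above: a probe may not go out before the first data exchange.

```c
#include <stdbool.h>
#include <stdint.h>

#define SOCK_ESTABLISHED 0x17u  /* Sn_SR: connection established */

/*
 * Convert seconds to an Sn_KPALVTR value (5-second units, rounded up,
 * clamped to 255). 0 stays 0, which leaves auto keep-alive switched off
 * so that SEND_KEEP has to be issued by the host.
 */
uint8_t kpalvtr_from_seconds(uint32_t seconds)
{
    uint32_t units = seconds / 5u + ((seconds % 5u) != 0u ? 1u : 0u);
    return (uint8_t)(units > 255u ? 255u : units);
}

/* Hypothetical scheduler state for manual keep-alive on one socket. */
typedef struct {
    uint32_t last_probe_ms;
} manual_ka_t;

/*
 * Returns true when the caller should write SEND_KEEP to Sn_CR now.
 * While the socket is not ESTABLISHED the interval is simply restarted.
 */
bool manual_ka_due(manual_ka_t *ka, uint8_t status,
                   uint32_t now_ms, uint32_t interval_ms)
{
    if (status != SOCK_ESTABLISHED) {
        ka->last_probe_ms = now_ms;
        return false;
    }
    if ((uint32_t)(now_ms - ka->last_probe_ms) < interval_ms) {
        return false;
    }
    ka->last_probe_ms = now_ms;
    return true;
}
```

In the main loop this would be used as "if `manual_ka_due(...)` then issue SEND_KEEP"; if the peer does not answer, the chip is expected to raise the socket timeout interrupt.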

Dear Mr. Eugeny!

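As I see it (the W5500 reports it), the socket hangs in the ESTABLISHED state.

Keep-alive works like this: a peer sends an ACK whose sequence number is the current sequence number minus 1. In response, the remote peer replies with an ACK that carries the current sequence number in its acknowledgment number field.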
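
To make the numbers concrete, here is a tiny sketch of that exchange. The function names are hypothetical and `snd_nxt` stands for the next sequence number the probing side would send; this only illustrates the arithmetic, not how the W5500 implements it internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence number placed in a keep-alive probe: one below the next
 * sequence number to send. Unsigned arithmetic wraps modulo 2^32,
 * matching TCP sequence space. */
uint32_t keepalive_probe_seq(uint32_t snd_nxt)
{
    return snd_nxt - 1u;
}

/* A live peer answers with an ACK whose acknowledgment number is the
 * sequence number it expects next, which equals our snd_nxt. */
bool keepalive_reply_ok(uint32_t snd_nxt, uint32_t reply_ack)
{
    return reply_ack == snd_nxt;
}
```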

RFC 793 describes the TCP protocol. It does not restrict sending an ACK to the moment just after receiving data. For example, peer S can send one at any time and peer C must respond to it. You may think a peer cannot send an ACK containing a sequence number lower than the initial sequence number, but RFC 793 defines no such restriction.

Moreover, RFC 1122, section 4.2.3.6, provides additional rules for keep-alive. But it contains no restriction that keep-alive may be sent only after the first data exchange.

Therefore it is a vendor-specific implementation of keep-alive. I tested it in the lab and the W5500 works this way. On the socket server side this keeps the connection hung if no TCP packet is received on the established socket. As a result, keep-alive sometimes works for a socket server and sometimes does not.

This means a manual keep-alive has to be added to the socket keep-alive mechanism. As I see it, SEND_KEEP has the same problem if the sequence number equals the initial sequence number.

Another (possible?) solution: if I detect inactivity on the server socket, I disconnect it using the “DISCON” socket command. This way (I hope) the socket does not have to be set up again.

As you can see in my post, I am not “blaming” anybody; I am asking for related experience and a solution (workaround).

kzsolt

Thank you for this information!

Seems like a logical way to design it.

You will have to close the socket and then reopen it for listening anyway. That should not be a problem.

No worries.

Dear Mr. Eugeny!

Unfortunately there is one more major problem.

Some professional protocols (which I use with the W5500) are more complex than the web. With these protocols the client opens the socket at startup and keeps it open. The real communication is just a quick query-response pair and may start 10 minutes after the connection was opened. The only way to check this kind of connection is keep-alive. Inactivity detection is useless there.

So I’m in trouble. But I never give up, and I will try to come up with a solution.

Best Regards, kzsolt

Agreed!

This assumes you already know the type of protocol you will use. Does it allow you to send some heartbeat (a packet or data that the application will discard)? Then you can use this dummy data as a heartbeat.

Dear Mr. Eugeny!

Unfortunately these protocols are very simple and robust. They have a client side and a server side. The client can request information and the server acknowledges it. There are no other protocol elements at this level. The server has no way to poll the client.

I see one easy solution. I switch on socket keep-alive. In addition, I poll every socket, and if a socket is established and has received no packet for over 30 minutes, I reset the chip. This is not elegant, but it is easy and effective. This solution needs only the initial chip and socket setup, with no need to develop socket disconnect and re-setup code.

One disadvantage of this solution is that the socket(s) stay blocked for over 30 minutes. Another is that it breaks the other connected sockets. But this happens less than once a month, so it can be an acceptable compromise.

Best Regards, kzsolt

The client may regularly perform some dummy request to the server, one that does not require much processing power from the server and is answered with a very simple message.

Not a chip reset, but a socket disconnect followed immediately by a close, without waiting for the auto-close triggered by the client returning FIN.

Performing disconnect, close, open, and then listen is the correct way to proceed. If you reset the chip, all other sockets are reset too, and you lose some time while the chip resets and you set it up again. During this lost time a client may try to reach the chip.
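
As a rough sketch of that sequence: the function below picks the next Sn_CR command from the current Sn_SR value. The status and command codes are taken from the W5500 datasheet; `recover_t`, `recover_step` and `CMD_NONE` are hypothetical names, and the caller is assumed to read Sn_SR, write the returned command to Sn_CR, and have Sn_MR (TCP mode) and Sn_PORT already configured before OPEN is issued. How `stuck` is determined (for example by an inactivity timer) is up to the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sn_SR status values (W5500 datasheet). */
#define SOCK_CLOSED      0x00u
#define SOCK_INIT        0x13u
#define SOCK_LISTEN      0x14u
#define SOCK_ESTABLISHED 0x17u

/* Sn_CR command codes (W5500 datasheet); CMD_NONE means "do nothing". */
#define CMD_NONE     0x00u
#define SN_CR_OPEN   0x01u
#define SN_CR_LISTEN 0x02u
#define SN_CR_DISCON 0x08u
#define SN_CR_CLOSE  0x10u

/* Hypothetical recovery state for one server socket. */
typedef struct {
    bool discon_sent;  /* DISCON was issued, CLOSE comes next */
} recover_t;

/*
 * Call once per poll with the current Sn_SR value and the application's
 * verdict on whether the connection is stuck. Returns the command to
 * write to Sn_CR: DISCON, then CLOSE without waiting for the peer's FIN,
 * then OPEN and LISTEN so the socket accepts connections again.
 */
uint8_t recover_step(recover_t *r, uint8_t status, bool stuck)
{
    switch (status) {
    case SOCK_CLOSED:
        r->discon_sent = false;
        return SN_CR_OPEN;
    case SOCK_INIT:
        return SN_CR_LISTEN;
    case SOCK_LISTEN:
        return CMD_NONE;
    default:
        break;
    }
    if (r->discon_sent) {
        return SN_CR_CLOSE;  /* do not wait for the client to answer */
    }
    if (status == SOCK_ESTABLISHED && stuck) {
        r->discon_sent = true;
        return SN_CR_DISCON;
    }
    return CMD_NONE;
}
```

Only the affected socket is touched, so the other sockets keep running while it returns to the listening state.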

Make sure you reproduce and test this scenario in a test environment before going live.

Dear Mr. Brychkov!

Unfortunately the clients can come from many different companies. I have no permission to touch their source code or physical hardware. Moreover, I have no way to get any modification made to them.

It looks like you are right. I need to develop socket disconnect code. That is the more correct way.

kzsolt