WIZnet Developer Forum

W5500 gets stuck on connect(), but only sometimes (infinite loop stuck on SOCK_SYNSENT)

I have an issue with my board based on the W5500. I have successfully implemented a TCP client that sends an HTTP message and then waits for a response. The behavior, however, is quite puzzling. About half the time everything works perfectly; the other half, the TCP socket opens, connects without error and then hangs on send().

There is no return from that function, and it seems like the program is stuck in some sort of infinite loop.

When checking wizchip_gettimeout, I have a value of 2000 for timeout and 8 for retries. If I’m not mistaken 2000 means 0.2 seconds?

I’m not sure why it hangs, and why only half the time, and not sure where to start debugging.

Any help would be greatly appreciated.

First check the circuit. Ensure that you used 49.9 Ohm resistors and not, for example, 49.9 kOhm resistors. My suspicion is that the device has difficulty finding a timeslot on the TX wire to send the packet, and hangs in the loop that waits for CR to clear to 0. Try connecting the W5500 to another network device (e.g. replace the hub/switch, or change the port on the hub/switch). But this must be proven: compile the code with checkpoints in the send() function (outputting some characters to the diagnostic console or anywhere else you can see them) to find out where the code gets stuck.

The circuit looks OK. I will double-check it, of course, and I will also add checkpoints. But it seems weird to me that DHCP and DNS are 100% reliable, without a single failure. I would imagine a hardware error would affect those too, no?

I can lease IP, and resolve DNS 100% of the time. It’s send() for TCP that gets stuck.

Yeah, the circuit looks fine. 49.9 Ohm resistors are the proper value.

It’s TCP that has a 50% failure rate… I’m gonna try to put checkpoints in send().

Am I wrong to assume that a hardware issue would result in failures with DHCP leasing and DNS resolution as well as TCP? DHCP/DNS work absolutely flawlessly.

Did you use the ioLibrary code from WIZnet (https://github.com/Wiznet/ioLibrary_Driver)?

Yup, I did use the ioLibrary code from that GitHub repo! As I said, the behavior is super odd to me, because DHCP works 100%, DNS works 100%, but TCP hangs 50% of the time. I’m still debugging it, but was wondering if someone knows where to point me.

In my understanding if there was something wrong with the physical TX line, DNS & DHCP would hang just as much, am I wrong?

You’re right.
Very odd… Would you please give the exact line number where it gets stuck?
(At the free size check? Before CMD_SEND? Or after CMD_SEND?)

So, after some debugging, it turns out it gets stuck on socket connect() and not send(). Specifically, it gets stuck inside this loop:

while (getSn_SR(sn) != SOCK_ESTABLISHED) {
	if (getSn_IR(sn) & Sn_IR_TIMEOUT) {
		setSn_IR(sn, Sn_IR_TIMEOUT);
		return SOCKERR_TIMEOUT;
	}

	if (getSn_SR(sn) == SOCK_CLOSED) {
		return SOCKERR_SOCKCLOSED;
	}
}

That’s the spot where it ends up stuck, about 50% of the time (the rest of the time everything works great).

getSn_SR seems to be stuck on SOCK_SYNSENT, and never changes to anything else. Which is weird, because I have a value of 2000 for timeout and 8 for retries. So at the very least it should time out?

This is the relevant code in my application:

	// create a TCP socket
	if ((sck_status = socket(HTTP_DATA_SOCKET, Sn_MR_TCP, 0, 0)) == HTTP_DATA_SOCKET) {
		// connect to the server
		if ((con_status = connect(HTTP_DATA_SOCKET, (uint8_t*) dest_ip, (uint16_t) dest_port)) == SOCK_OK) {
			// send the HTTP message; send() returns the number of bytes sent
			if ((send_status = send(HTTP_DATA_SOCKET, (uint8_t*) msg, strlen(msg))) != strlen(msg)) {
				printf_P(PSTR("[error] unable to send: %" PRIi32 "\r\n"), send_status);
			}
		} else {
			printf_P(PSTR("[error] connection not available: %i\r\n"), con_status);
		}
	} else {
		printf_P(PSTR("[error] unable to create socket: %" PRIi8 "\r\n"), sck_status);
	}

When it works, it works flawlessly, but when it doesn’t, it reliably gets stuck in connect(), with the socket sitting in SOCK_SYNSENT.
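A blocking wait like the loop quoted above only returns once the chip reports established, closed, or its own retransmission timeout. As a debugging aid, one option is to poll the socket status yourself with an application-side deadline, so the firmware can log the stall instead of appearing hung. The following is only a minimal sketch under assumptions: wait_established, the status_fn/millis_fn callbacks and the fake_* stand-ins are hypothetical names (not part of ioLibrary), and the ST_* values mirror the Sn_SR codes from the W5500 datasheet. On real hardware, read_status would wrap getSn_SR(sn) and now_ms would wrap your board's millisecond tick.

```c
#include <stdint.h>

/* Sn_SR status codes as listed in the W5500 datasheet (renamed here to
 * avoid clashing with the ioLibrary macros of the same meaning). */
#define ST_CLOSED       0x00u
#define ST_SYNSENT      0x15u
#define ST_ESTABLISHED  0x17u

/* Hypothetical callbacks supplied by the application. */
typedef uint8_t  (*status_fn)(void);  /* e.g. wraps getSn_SR(sn)      */
typedef uint32_t (*millis_fn)(void);  /* e.g. wraps a 1 ms tick count */

enum wait_result { WAIT_OK = 0, WAIT_CLOSED = -1, WAIT_DEADLINE = -2 };

/* Poll the socket status until it is established, closed, or until
 * limit_ms milliseconds have passed. Unsigned subtraction keeps the
 * elapsed-time check correct across tick counter wrap-around. */
static enum wait_result wait_established(status_fn read_status,
                                         millis_fn now_ms,
                                         uint32_t limit_ms)
{
    uint32_t start = now_ms();
    for (;;) {
        uint8_t sr = read_status();
        if (sr == ST_ESTABLISHED) {
            return WAIT_OK;
        }
        if (sr == ST_CLOSED) {
            return WAIT_CLOSED;
        }
        if ((uint32_t)(now_ms() - start) >= limit_ms) {
            return WAIT_DEADLINE;
        }
    }
}

/* Host-side stand-ins so the sketch can be exercised without hardware. */
static uint32_t fake_clock_ms;
static uint8_t  fake_status = ST_SYNSENT;
static uint32_t fake_now(void)  { return fake_clock_ms += 10u; } /* 10 ms per call */
static uint8_t  fake_read(void) { return fake_status; }
```

With the stand-ins, wait_established(fake_read, fake_now, 35000) returns WAIT_DEADLINE while fake_status stays at ST_SYNSENT. The deadline only tells you that the peer never answered in time; it does not explain why, so a packet capture is still the next step.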

Do you have a timeout in your SPI read/write driver?

How about checking the firewall settings, port number and server application status on your test server (PC)?
I think there’s no “SYN+ACK” reply packet from your test server (PC).

For these values the TCP timeout will be 31.8 seconds; see the RCR section of the datasheet, which has this value calculated. You just need to wait a little longer :)

As @Bong said, most probably some device on the packet path filters out or loses the returning SYN/ACK packet.
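To see where a figure like 31.8 seconds comes from, here is a small sketch of the arithmetic as I read the RTR/RCR description in the W5500 datasheet: RTR counts in units of 100 µs, the TCP retransmission interval doubles after each attempt until doubling again would overflow the 16-bit timer, and RCR + 1 intervals elapse before the timeout interrupt. The function names are hypothetical helpers, not part of ioLibrary, and the ARP variant assumes a fixed (non-doubling) interval.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of ioLibrary.
 * All results are in RTR units of 100 us. */

/* TCP: the interval starts at RTR and doubles after each attempt until
 * doubling again would exceed 0xFFFF; from then on it stays constant.
 * A total of RCR + 1 intervals is summed. */
static uint32_t w5500_tcp_timeout_ticks(uint16_t rtr, uint8_t rcr)
{
    uint32_t total = 0;
    uint32_t interval = rtr;
    /* n is 16 bits wide so the loop also terminates for rcr == 255 */
    for (uint16_t n = 0; n <= rcr; n++) {
        total += interval;
        if (interval * 2u <= 0xFFFFu) {
            interval *= 2u;
        }
    }
    return total;
}

/* ARP (assumed fixed interval): simply (RCR + 1) * RTR. */
static uint32_t w5500_arp_timeout_ticks(uint16_t rtr, uint8_t rcr)
{
    return ((uint32_t)rcr + 1u) * rtr;
}
```

With the values from this thread, w5500_tcp_timeout_ticks(2000, 8) sums 2000 + 4000 + 8000 + 16000 + 32000 + 4 × 64000 = 318000 ticks, i.e. 31.8 seconds, and w5500_arp_timeout_ticks(2000, 8) gives 18000 ticks, i.e. 1.8 seconds.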

I saw a similar issue while developing code for the W5500 and used Wireshark to debug it. I found the problem was that the W5500 wasn’t seeing ARP responses from the wireless device (an old Linksys router running DD-WRT as a bridge) I was trying to connect to. I saw the ARP request from the W5500 repeated over and over, but there was no response. I never figured out whether the problem was the Linksys router or the ASUS-WRT Merlin-based router it was connected to. Sometimes it worked just fine, other times it didn’t. I’ve never seen this with any other device.

Copyright © 2017 WIZnet Co., Ltd. All Rights Reserved.