Link is stable for about 2 hrs and then jams up

Hi,

Can someone please help with some ideas?

The Ethernet link on the W5100 is stable for about 2 hours and then jams up, requiring a reboot.

Any ideas are welcome. I mostly use a stripped-down version of the development kit code for the W5100.
I am not sure whether this is a hardware issue; I think it is more likely a firmware issue.

This needs further clarification.

What is the Link LED state when it happens?
Is any activity shown by the RX/TX LEDs?
Does the remote device see the port as up?
What is the temperature of the chip?
Which network protocol do you use?
Does the software try to reopen and reconnect (TCP) / reopen the socket (UDP)?
What does a Wireshark capture show? Share logs.

Hi,

Thanks for the quick reply. Unfortunately I don’t have access to that information yet.
I am ordering parts to try to simulate the problem the client is having…
We had a heat issue on the regulator during development, so it is possible that this is heat related. However, I am not sure why that would only cause a problem after 2 hours?

I think it might be a firmware issue, but I am not sure. The firmware from the dev kit is used…

Could it possibly be a grounding issue, or some other hardware issue?

Would you mind carefully checking the attached circuit, please?
W5100 Schematic.pdf (35.0 KB)
I am also carefully looking at the circuit again…

I do not see anything immediately wrong except the reset circuitry. I do not think it will affect the device while it is already in operation, but I would rewire the reset pin anyway to a digital signal able to provide at least some current. With the 10K resistor the maximum is 3.3 V / 10 kΩ = 330 µA, and that figure assumes the reset pin is internally shorted to ground, which it is not, so the real current is even less. See here the part called "Bug 3 – The Funduino Reset Bug", where the reset issue is explained. The circuit assumed in that article seems to be missing the capacitor, which is totally wrong. Still, here is one far-fetched idea: when the chip heats up (and it does), it may draw excessive current through its reset pin, so your C21 discharges until the reset signal is out of spec, and of course the circuit stops working properly. Again, very unlikely, but who knows.

Thank you!

I will look at that.

Please let me know if you think of anything else…

Changing the 10k resistor to 1k will allow 3.3/1000 = 3.3 mA. Will that be enough? Or do I drop the 10k and cap at reset pin 59 and only add the 10k with a Schmitt trigger at pins 29 and 31?

The reset circuitry must provide proper reset timing (see the datasheet), provide the required logic voltage, and supply some current in case there is any leakage. To begin with, you can keep your RC-based circuit but, as you proposed, put a Schmitt trigger in between (powered from 3V3D). Then you can be sure the voltage will be sufficient; you must still check the reset timing at the trigger output with a scope to ensure its duration meets the specification. You must also check that your code does not start configuring the chip immediately: it must wait for some time (e.g. 100 ms) after the chip is brought out of reset before trying to access it. Ideally you also perform a software reset and wait for its completion by polling bit 7 of MR.
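The start-up order described above could be sketched roughly like this. It is only a sketch under stated assumptions, not the development kit's code: `reg_read`, `reg_write` and `delay_ms` are hypothetical stand-ins for whatever bus access and delay routines your firmware already has, and here they are backed by a simulated register so the logic can be tried on a PC. The MR address (0x0000) and the RST bit (bit 7) are taken from the W5100 datasheet; the 100 ms wait and the poll limit are just example values.

```c
#include <stdint.h>

#define W5100_MR      0x0000u  /* Mode Register address (W5100 datasheet) */
#define W5100_MR_RST  0x80u    /* bit 7: software reset, cleared by the chip when done */

/* --- Simulated hardware (assumption): replace with your real bus access --- */
static uint8_t       sim_mr;          /* pretend MR register */
static int           sim_busy_reads;  /* reads during which RST stays set */
static unsigned long sim_elapsed_ms;  /* total simulated delay */

static void reg_write(uint16_t addr, uint8_t val)
{
    if (addr == W5100_MR) {
        sim_mr = val;
        if (val & W5100_MR_RST)
            sim_busy_reads = 3;       /* simulated chip needs a few polls */
    }
}

static uint8_t reg_read(uint16_t addr)
{
    if (addr != W5100_MR)
        return 0;
    if ((sim_mr & W5100_MR_RST) && sim_busy_reads-- <= 0)
        sim_mr &= (uint8_t)~W5100_MR_RST;  /* reset finished, bit self-clears */
    return sim_mr;
}

static void delay_ms(unsigned ms)
{
    sim_elapsed_ms += ms;             /* a real build would actually wait here */
}

/* Wait after hardware reset, issue a software reset, poll MR bit 7.
 * Returns 0 on success, -1 if the RST bit never cleared. */
int w5100_startup(void)
{
    unsigned tries;

    delay_ms(100);                          /* do not touch the chip right after reset */
    reg_write(W5100_MR, W5100_MR_RST);      /* request software reset */

    for (tries = 0; tries < 1000; tries++) {
        if ((reg_read(W5100_MR) & W5100_MR_RST) == 0)
            return 0;                       /* reset complete, safe to configure */
        delay_ms(1);
    }
    return -1;                              /* chip did not respond */
}
```

Only after `w5100_startup()` returns 0 would the firmware go on to write the MAC, IP and socket configuration. A return of -1 is worth logging, because it points at the bus or the reset circuit rather than at the network side.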

Hi Eugeny,

Hope you are well ?

So our system is simple and the controlling MCU receives single byte commands via W5100 and then responds by sending a single byte back via W5100.
I am using your development kit code with most of the code not used and commented out.
In the for loop statement almost all is commented out except this case:
    case LB_TCPS:
        register_channel_handler(i,loopback_tcps);
        break;

So the problem our customer is having is that after about 2 hours commands via Ethernet no longer work. They can, however, still ping the W5100. After they reboot, it works again for about 2 hours, and so on…

So I am trying to simulate the problem:

  1. Left everything running for 24 hours, sending commands every hour or so and otherwise leaving it
    idle for hours. All works well.
  2. Repeated step 1 at about 80 degrees Celsius. All works well.

So now I am thinking that maybe it gets overloaded when commands are sent very fast. Is this a possibility?

So I am sending the command 1 000 000 times via RealTerm to see if it can break. But I am not sure whether I am breaking RealTerm or the product. After RealTerm has finished sending (if it survives; sometimes RealTerm hangs and needs to be stopped, as I think its own buffers overflow), I do not see the response byte coming back when I send a single byte via RealTerm.
I do, however, see the Tx and Rx LEDs flash on the W5100 device, so I am thinking the byte must be received by the MCU and sent back, or else the LEDs would not both flash?
I then close and reopen my RealTerm connection (without any power reset on the electronics) and all works well again.

Any advice or tips?

Do you perhaps have some software with which I can open an IP address and port, repeatedly send single bytes, and display the received bytes?

It should not be mine, because I write code in another language :slight_smile:
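For anyone who wants such a tool anyway, a small lock-step test client for a POSIX system could look roughly like the sketch below. It is an illustration rather than tested tooling for this board: the function names are made up, and the port number is whatever your firmware listens on. Unlike a terminal in repeat mode, it waits for each reply before sending the next byte, so when the echo stops you know exactly at which count it happened.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP connection to ip:port. Returns a socket fd, or -1 on failure. */
int tcp_connect(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one command byte and wait up to timeout_ms for one reply byte.
 * Returns the reply (0..255), or -1 on error, timeout or closed connection. */
int exchange_byte(int fd, uint8_t cmd, int timeout_ms)
{
    struct pollfd pfd;
    uint8_t reply;

    if (send(fd, &cmd, 1, 0) != 1)
        return -1;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;                      /* no answer in time, or poll error */
    if (recv(fd, &reply, 1, 0) != 1)
        return -1;                      /* peer closed or reset the connection */
    return reply;
}

/* Repeat the exchange up to count times; stop at the first missing or
 * wrong echo. Returns the number of successful exchanges. */
long run_echo_test(int fd, uint8_t cmd, long count, int timeout_ms)
{
    long i;

    for (i = 0; i < count; i++)
        if (exchange_byte(fd, cmd, timeout_ms) != cmd)
            break;
    return i;
}
```

A caller would do something like `fd = tcp_connect("192.168.20.25", port)` followed by `run_echo_test(fd, 0x70, 1000000, 1000)` and print the returned count. If the count is lower than requested, the Wireshark capture around that moment is the interesting part.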

Do I understand you correctly that you applied the recommendations and it still behaves the same “wrong” way?

I consider the probability of a fault on the W5100 side much higher than on the side of the PC with its software-implemented network stack.

This is a very good observation. What you must do is run Wireshark and capture what is going on on the network. I suspect the W5100 thinks that the socket is closed (the reason can be anything, e.g. its socket timeout) and responds with RST packets.
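If the capture does show the socket going away, the firmware side of the recovery is normally a small state machine that looks at the socket status register on every pass and reopens the listener when needed. The sketch below shows only that decision logic, as a pure function that can be tried on a PC. The Sn_SR status values and Sn_CR command codes are from the W5100 datasheet; the function name and the `CR_NONE` placeholder are made up for illustration, and I have not checked how `loopback_tcps` in your kit version handles each state.

```c
#include <stdint.h>

/* Sn_SR status values (W5100 datasheet) */
#define SR_CLOSED       0x00u
#define SR_INIT         0x13u
#define SR_LISTEN       0x14u
#define SR_ESTABLISHED  0x17u
#define SR_CLOSE_WAIT   0x1Cu

/* Sn_CR command codes (W5100 datasheet) */
#define CR_OPEN         0x01u
#define CR_LISTEN       0x02u
#define CR_DISCON       0x08u
#define CR_NONE         0x00u  /* placeholder: "issue no command", not a chip command */

/* Decide which command a TCP server socket needs, given its current status.
 * Call this on every pass of the main loop and write the result to Sn_CR
 * when it is not CR_NONE. */
uint8_t tcps_next_command(uint8_t status)
{
    switch (status) {
    case SR_CLOSED:       /* timed out, reset by peer, or never opened */
        return CR_OPEN;   /* reopen (Sn_MR and Sn_PORT must be set first) */
    case SR_INIT:         /* opened but not yet listening */
        return CR_LISTEN;
    case SR_CLOSE_WAIT:   /* peer sent FIN: finish the close from our side */
        return CR_DISCON;
    case SR_LISTEN:       /* waiting for a client: nothing to do */
    case SR_ESTABLISHED:  /* connected: handle data, no socket command */
    default:              /* transient states: let the chip finish */
        return CR_NONE;
    }
}
```

The point of writing it this way is that a socket which has fallen back to the closed state is picked up again on the next loop pass without a reboot. If the handler never revisits the closed state, the device keeps answering ping while the TCP port stays dead, which would match the reported symptom.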

With terminal application reconnecting to the W5100.

You already have it and are able to reproduce the issue. Use Wireshark to capture the conversation and share the log here.

I mean the W5100 Development kit code…

No, I built a board the same as the one that was sent to the customer. So I am trying to simulate their problem first, before applying the recommendations.

OK, I will try Wireshark.
Can you elaborate on the socket timeout? Is this a timer on the MCU? If this is the problem, how do I fix it? Also, why would the socket time out while data is being sent continuously?

OK, this is what I have done so far; maybe it explains things better and gives another clue…
I will then try Wireshark…
Before any commands are sent (nothing sent):
Link LED is mostly on (flashes off regularly).
Rx LED flashes on regularly.

When sending one-byte commands by hand via RealTerm:
Link LED flashes off, both Rx and Tx LEDs flash.
RealTerm picture 01: the green byte is what I sent, yellow is what was received back.

When sending the same command 1000000 times via RealTerm:
Link LED flashes continuously.
Rx and Tx LEDs flash so fast that they look permanently on.
RealTerm picture 02: not sure what RealTerm is doing. Only the green sent bytes are showing.

Once finished with the 1000000 sends, I manually send single bytes again:
Link LED flashes off, both Rx and Tx LEDs flash on each send. But RealTerm does not show
any received byte in yellow.
RealTerm picture 03.

I then close and reopen the connection in RealTerm (picture 04) and it continues working (picture 05) without having to hard or soft reset the MCU etc.

A socket timeout occurs when the W5100 waits for a response from the other device but does not get it. There is a relatively complex formula for it in the datasheet. If the W5100 is not waiting for any response, it should not experience a timeout, unless your code driving the W5100 performs some action on the socket or the W5100 gets something from the remote device (e.g. an RST or FIN packet).
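To give a feeling for the numbers, here is a small sketch of that formula as I read it from the datasheet: the first wait is RTR (in units of 100 µs), each retry doubles the wait until doubling would exceed 16 bits, and after RCR retries the socket is closed. The function name is made up, and you should check the result against the datasheet revision you have.

```c
#include <stdint.h>

/* Total TCP retransmission timeout of a W5100 socket, in units of 100 us.
 * rtr: Retry Time-value Register (units of 100 us)
 * rcr: Retry Count Register (number of retransmissions) */
uint32_t w5100_tcp_timeout_units(uint16_t rtr, uint8_t rcr)
{
    uint32_t total = 0;
    uint32_t interval = rtr;   /* wait before the first retransmission */
    unsigned n;

    /* one initial wait plus rcr retries */
    for (n = 0; n <= rcr; n++) {
        total += interval;
        /* the interval doubles each time, but only while it still fits in 16 bits */
        if (interval * 2u <= 65535u)
            interval *= 2u;
    }
    return total;
}
```

With the reset defaults (RTR = 2000, i.e. 200 ms, and RCR = 8) this gives 318000 units, which is 31.8 s. So an unanswered packet makes the socket close after roughly half a minute, not after two hours; a two-hour period points more towards something outside this timer.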

These LEDs flash when the W5100 receives packets (I assume only those targeted at it, but I am not sure; they may blink for any activity on the network). Does "regularly" mean once per second, or constantly?

Normal.

The W5100 receives something, but it is not a data response. That is why you need Wireshark to see what it actually receives. You have the status pane on the right: does it still say you are connected? What is the status?

When I manually send one-byte commands by clicking in RealTerm, even after bashing it 1000000 times with the RealTerm repeat setting (in other words: click to send a byte, wait to see if it is received back, click again…), I see the Tx and Rx LEDs flashing only when I click…

Also, I only have my PC connected to the W5100 product. No other devices are connected. So when I don’t send anything and just look at the LEDs, all that happens is that about every 5 seconds the Link LED dims briefly and the Rx LED flashes quickly. The Tx LED just stays off.

If I then click to send single-byte commands manually, the Tx and Rx LEDs both flash every time I click to send.

I am also able to set my W5100 product to not send a response byte back when a byte is received.
What I then see is that, every time I click to send a byte to the W5100 product, there is a small delay before the Tx LED flashes (so I guess it is some Ethernet protocol stuff). When I enable the return byte on my W5100 product, that delay is gone and the Tx LED flashes instantaneously with the Rx LED. So it seems the Tx LED flashes for the return byte and then for some additional Ethernet stuff…

That is very good: it means that Wireshark installed on your PC will show everything in its log.

There are some other types of packets travelling on the wire towards the W5100, and that is normal.

How small? The delay mostly depends on the algorithm handling the W5100, and maybe on the delayed ACK setting.

Stop guessing, run Wireshark.

Ok ok…

So here is the Wireshark file I saved and will try to figure out…

0 to 20 seconds
I send single commands via RealTerm. So a single byte is sent and a single byte should be received back.

20 to 30 seconds
Left it by itself

30 to 93 seconds
Single byte sent, and a single byte should be received back, via RealTerm with 1000000 repeats.

93 to 105 seconds
Left it by itself

105 to 130 seconds
Again I send single commands via RealTerm. So a single byte is sent and a single byte should be received back.

130 seconds to end
Left it by itself

I can’t share the Wireshark file… how do I share this?

People usually rename it with another extension, e.g. change it to PDF. Or I am sure a ZIP archive will attach properly.

All.zip (438.0 KB)

Seems like it is OK? The command I am sending from the PC (192.168.20.3) is 0x70 hex,
and the byte I should receive back from the W5100 (192.168.20.25) is 0x70 hex.

The data flow seems strange. Where is the TCP handshake at the beginning? I do not see it after the PC (.3) sends an ARP request to learn the MAC address of the W5100 (.25).

Do you use the W5100 socket in TCP mode?

At the end of the log it looks like the PC simply stops sending data. No termination of the connection, no errors, just nothing.

I think so…

Using the code from the W5100 development board… main function:

int main (void)
{
    // u_char key;
    SOCKET i;

    MCU_Init();

    evb_init();

    // check_manage(); /* administration mode check */
    init_timer();

    for (i = 0; i < MAX_SOCK_NUM; i++)
    {
        switch(ChConf.ch[i].type)
        {

        // case NOTUSE: unregister_channel_handler(i);
        //     break;

        // case DHCP_CLIENT: //PRINTLN1("%d : DHCP Client Start.",i);
        ////     evb_set_lcd_text(0,(u_char*)"< DHCP CLIENT >");
        ////     evb_set_lcd_text(1,(u_char*)" Wait a minute ");
        //     get_netconf(&NetConf);
        //     memcpy(SRC_MAC_ADDR,NetConf.mac,6);
        //     init_dhcp_client(i, evb_soft_reset,evb_soft_reset);
        //     if(!getIP_DHCPS())
        //     {
        ////         evb_set_lcd_text(1,(u_char*)" Fail to get IP ");
        ////         PRINTLN("Fail to get a IP adress from DHCP server");
        ////         PRINTLN("Apply the default network information!!!");
        //         ChConf.ch[i].type = NOTUSE; // Disable DHCPC;
        //         unregister_channel_handler(i);
        //         wait_10ms(100);
        //     }
        //     else
        //     {
        //         NetConf.sip = ((u_long)GET_SIP);
        //         NetConf.gwip = ((u_long)GET_GW_IP);
        //         NetConf.sn = ((u_long)GET_SN_MASK);
        //         NetConf.dns = ((u_long)GET_DNS_IP);
        //
        ////         PRINTLN("Get network information from DHCP Server…");
        //         register_channel_handler(i,check_DHCP_state);
        //     }
        ////     set_netconf(&NetConf);
        //     break;
        case LB_TCPS:
            register_channel_handler(i,loopback_tcps);
            break;
        // case LB_TCPC:
        //     register_channel_handler(i,loopback_tcpc);
        //     break;
        // case LB_UDP:
        //     register_channel_handler(i,loopback_udp);
        //     break;
        // case WEB_SERVER:
        //     register_channel_handler(i,web_server);
        //     break;
        default :
            break;
        }
    }

    net_init();
    evb_logo();

    while(1)
    {
        //Read any data coming in.
        UART0_getData();
        UART1_getData();

        //Do ethernet stuff
        for (i = 0 ; i < MAX_SOCK_NUM ; i++ )
            if(ChannelHandler[i].Handler) (*ChannelHandler[i].Handler)(i);

        //Process a packet from UART receive buffers.
        UART_Rx_Process();

        //Check what is going on with the hard lineS
        DF_Switch_HardLine();

        DF_Switch_Update_ShiftRegisters();

        //for ISR test

        // if(uart_keyhit(0))
        // {
        //
        //     key = uart0_getchar(NULL);
        //     if(key == 'c')
        //     {
        //         for(i=0;i<4;i++) IINCHIP_WRITE(Sn_IR(i), 0xFF);
        //         IINCHIP_WRITE(IR, 0xFF);
        //         PRINTLN("all interrupt register cleared!!!");
        //     }
        //
        // }

        asm("WDR");
    }

}

I am surprised that this code works at all. Did you try to understand how it works?

I did not have much time…

Any suggestions?