UDP send 'stalled' until another UDP packet received

We’re running the W5500 chip from an RP2040 processor using the RP2040-HAT-C library code. The host PC sends ‘Modbus formatted’ UDP packets (~60 bytes on the wire) to the W5500/RP2040 system for state updates and expects a UDP reply to each request. The library functions ‘recvfrom’ and ‘sendto’ are used for UDP transmission.

Normally, the reply packet is received ~1ms after the request is sent (wire time + W5500-to-RP2040 transfer + RP2040 processing + RP2040-to-W5500 transfer + wire time). If a reply is not received within 10ms, a ‘retry’ packet is sent. The Wireshark trace for a retry event shows both replies: the first arrives ~85us after the retry request, and the second arrives at the normal reply time of ~1ms. This timing indicates that the reply to the original request got ‘stuck’ in the W5500 until another message came in. The event happens around once per 10,000 transactions (or once per 20min at our update rates).

Is there a method to ‘force’ the stuck UDP packet to be sent?

Why do you think something is “stuck”? Could the original request have been discarded by the W5500 (e.g. due to an RX buffer overflow) or in software (e.g. due to a problem with the data format)? You need to match the packets sent from the host against the ones appearing on the wire, the ones displayed in the diagnostic window of the processor driving the W5500, and the replies going back to the host.
Ideally, put a marker into each packet so you can definitively identify which packet (the original request or the retry request) a reply belongs to.
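One way to use such a marker is for the host to classify every reply against the request currently outstanding. This is a hypothetical host-side sketch that assumes the ‘Modbus formatted’ payload begins with a Modbus/TCP MBAP header (first two bytes: big-endian transaction identifier), that the RP2040 firmware echoes that identifier in its reply, and that the host gives each retry a fresh identifier. `classify_reply`, `mbap_txn_id` and the `window` parameter are illustrative names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions (not confirmed in the thread): each UDP payload starts with a
 * Modbus/TCP MBAP header whose first two bytes are a big-endian transaction
 * ID, the RP2040 echoes that ID in its reply, and the host gives every
 * request, including each retry, a fresh ID. */

typedef enum {
    REPLY_MALFORMED, /* payload too short to carry a transaction ID */
    REPLY_EXPECTED,  /* answers the request currently outstanding */
    REPLY_STALE,     /* answers a recent earlier request, e.g. the original before a retry */
    REPLY_UNKNOWN    /* ID outside the recent window */
} reply_kind_t;

/* Extract the 16-bit big-endian transaction ID; returns 0 if too short. */
static int mbap_txn_id(const uint8_t *buf, size_t len, uint16_t *id) {
    if (len < 2) return 0;
    *id = (uint16_t)((buf[0] << 8) | buf[1]);
    return 1;
}

/* Classify a reply against the ID of the outstanding request. Up to `window`
 * earlier IDs count as stale; the unsigned 16-bit subtraction keeps this
 * correct when IDs wrap from 0xFFFF to 0x0000. */
static reply_kind_t classify_reply(const uint8_t *buf, size_t len,
                                   uint16_t outstanding_id, uint16_t window) {
    uint16_t id;
    if (!mbap_txn_id(buf, len, &id)) return REPLY_MALFORMED;
    if (id == outstanding_id) return REPLY_EXPECTED;
    uint16_t age = (uint16_t)(outstanding_id - id); /* >= 1 here */
    return age <= window ? REPLY_STALE : REPLY_UNKNOWN;
}
```

Under these assumptions, if the ‘stuck reply’ hypothesis is right, the ~85us reply in a retry event would classify as REPLY_STALE (it carries the original request’s ID) and the later reply as REPLY_EXPECTED; any other combination would point away from that hypothesis.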

A reply time of 85us indicates that the original reply data was already inside the W5500. A reply that goes through the entire cycle (wire time + W5500-to-RP2040 transfer + RP2040 processing + RP2040-to-W5500 transfer + wire time) has never taken less than 500us.
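As a rough check on the ‘wire time’ terms in that budget, this sketch computes how long one UDP datagram occupies an Ethernet link, counting preamble, headers, minimum-frame padding, FCS and inter-frame gap. It assumes IPv4 without options, no VLAN tag, and a 100 Mbit/s link (the negotiated link speed is not stated in the thread); `udp_wire_time_us` is a hypothetical helper.

```c
#include <stddef.h>

/* Per-frame overhead in bytes (IPv4 without options, no VLAN tag). */
#define ETH_HEADER     14u /* destination MAC, source MAC, EtherType */
#define IPV4_HEADER    20u
#define UDP_HEADER      8u
#define ETH_FCS         4u /* frame check sequence */
#define ETH_MIN_FRAME  64u /* header through FCS is padded up to this size */
#define PREAMBLE_SFD    8u /* preamble + start frame delimiter */
#define INTERFRAME_GAP 12u /* minimum idle time, expressed in byte times */

/* Microseconds one UDP datagram occupies the link, from the first preamble
 * bit to the end of the inter-frame gap. */
static double udp_wire_time_us(size_t payload_bytes, double link_mbps) {
    size_t frame = ETH_HEADER + IPV4_HEADER + UDP_HEADER + payload_bytes + ETH_FCS;
    if (frame < ETH_MIN_FRAME) frame = ETH_MIN_FRAME;
    size_t on_wire = frame + PREAMBLE_SFD + INTERFRAME_GAP;
    return (double)(on_wire * 8u) / link_mbps; /* bits / (Mbit/s) = microseconds */
}
```

For a 12-byte payload (e.g. a 7-byte MBAP header plus a 5-byte read request, used here only as an example) the frame is padded to the 64-byte minimum and takes 6.72us at 100 Mbit/s, so each wire-time term is a small fraction of the 85us reply. At 10 Mbit/s the same frame takes 67.2us, so confirming the negotiated link speed is worthwhile when interpreting the 85us figure.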

It does not look like the packets were ‘lost’: two replies are returned after the ‘retry’ message is sent, one after ~85us (the one ‘stuck’ from the original message?) and another ~1-2ms later (the normal round trip through the RP2040).

Here is Wireshark data from the host for a ‘retry’ event (highlighted) plus two ‘normal’ transactions (before and after the highlighted rows). The host is at 192.168.0.20, and the W5500/RP2040 board is at 192.168.0.50. There are no other devices on this network segment.

packet 1918 is a normal request; packet 1919 is the W5500/RP2040 reply, 509us later (2nd column)
packet 1920 is a request; there was no response after 10ms, so packet 1921 is the retry packet
packet 1922 is received 83us after the retry (the proposed ‘original stuck reply’ packet in the W5500?)
packet 1923 is received 1.8ms later (nominal processing for a full trip through the RP2040; the response to the retry packet?)
packets 1924/1925 show a following ‘normal’ transaction 2ms later, with 853us for the full round trip through the RP2040

[Image: Wireshark capture from the host]

The data you have provided is insufficient: it is not possible to prove or disprove the hypothesis, in other words, to troubleshoot the problem. In addition to the Wireshark log, you also need to show the processor’s log, with microsecond timestamps, of when the SEND command is executed on the involved socket.
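A minimal sketch of such a log, assuming the firmware calls the WIZnet ioLibrary ‘sendto’ and can read the Pico SDK microsecond timer (`time_us_64()`). Entries go into a RAM ring buffer instead of being printed immediately, since printing over UART or USB inside the send path would itself shift the timing being measured; the buffer can be dumped when the host reports a retry. `send_log_record`, `send_log_dump` and `SEND_LOG_LEN` are hypothetical names, and the target-side calls appear only in a comment so the block compiles on a PC.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical log size; must be a power of two for the index mask. */
#define SEND_LOG_LEN 256u

typedef struct {
    uint64_t t_start_us; /* timestamp taken just before sendto() */
    uint64_t t_end_us;   /* timestamp taken when sendto() returns */
    uint16_t txn_id;     /* request ID being answered (assumed MBAP field) */
    int32_t  result;     /* sendto() return value: length sent or error code */
} send_log_entry_t;

static send_log_entry_t send_log[SEND_LOG_LEN];
static uint32_t send_log_count; /* total entries ever recorded */

/* Store one entry, overwriting the oldest once the buffer is full. */
static void send_log_record(uint64_t t_start_us, uint64_t t_end_us,
                            uint16_t txn_id, int32_t result) {
    send_log_entry_t *e = &send_log[send_log_count & (SEND_LOG_LEN - 1u)];
    e->t_start_us = t_start_us;
    e->t_end_us = t_end_us;
    e->txn_id = txn_id;
    e->result = result;
    send_log_count++;
}

/* Print the retained entries, oldest first. */
static void send_log_dump(FILE *out) {
    uint32_t n = send_log_count < SEND_LOG_LEN ? send_log_count : SEND_LOG_LEN;
    uint32_t first = send_log_count - n;
    for (uint32_t i = 0; i < n; i++) {
        const send_log_entry_t *e = &send_log[(first + i) & (SEND_LOG_LEN - 1u)];
        fprintf(out, "t=%llu us  sendto took %llu us  txn=%u  result=%ld\n",
                (unsigned long long)e->t_start_us,
                (unsigned long long)(e->t_end_us - e->t_start_us),
                (unsigned)e->txn_id, (long)e->result);
    }
}

/* Intended use on the RP2040 (not compiled here). time_us_64() is from the
 * Pico SDK and sendto() from the WIZnet ioLibrary; sn, reply, reply_len,
 * host_ip, host_port and txn_id are placeholders for the firmware's own.
 *
 *   uint64_t t0 = time_us_64();
 *   int32_t r = sendto(sn, reply, reply_len, host_ip, host_port);
 *   send_log_record(t0, time_us_64(), txn_id, r);
 */
```

Comparing the start and end timestamps logged for the original request’s reply against the Wireshark arrival time of the ~85us reply would show whether ‘sendto’ had returned long before that packet appeared on the wire.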

Agreed, a log of the send timing would give a more complete picture, and I’ll work on producing it. However, it seems unlikely that anything other than the W5500 could get data back out on the wire in ~80us, considering that all the other replies involving the RP2040 normally take 500-1500us.

I just finished running a test setup for 23hrs: 3.1M send/reply transactions, with only 289 ‘retry’ events. Whatever is going on has a very low rate (about 0.009%, or roughly 1 in 10,700 transactions) but happens on average about every 5 minutes.

Thanks for looking and providing suggestions!

Due to other factors, I will not be able to provide further information for a couple of weeks. I will provide the requested logs asap.