W5300 delay in sending real-time DATA packets over TCP socket

We are seeing an issue with the W5300 device where, during transmission of DATA packets in TCP mode, the Wiznet temporarily stops sending packets that have already been copied to the TX buffer through the TX FIFO register.

We are using the W5300 device to send DATA packets to a PC over an Ethernet LAN at 850 Hz. Each DATA packet is 348 bytes. This requires 348 * 8 * 850 = 2.4 Mbps of throughput, well within the spec of the W5300. A hardware interrupt at the 850 Hz rate triggers a handler where instrumentation data is sampled, the DATA packet is assembled, and the packet is sent over a connected TCP socket on the Wiznet.
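For reference, the arithmetic above as a stand-alone check (a throwaway snippet, not from the device code):

```c
/* Sanity-check the numbers above: 348-byte frames at 850 Hz. */
enum { PACKET_BYTES = 348, RATE_HZ = 850 };

/* Required TCP payload throughput in bits per second (~2.4 Mbps). */
long required_bps(void) { return (long)PACKET_BYTES * 8 * RATE_HZ; }

/* Nominal inter-packet interval in microseconds (~1.18 ms). */
long interval_us(void) { return 1000000L / RATE_HZ; }
```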

The TCP socket TX buffer is configured as 8KB (0x2000). The send routine is implemented as in the example code in the DS and in the ioLibrary. I am checking Sn_FSR and waiting for the Sn_CR to clear the SEND command at each send.

At some point in the transmission (sometimes at 10 min, other times 30 min, 2 hrs, or several hours) there is a delay in the transmission of packets that my application is not able to recover from (due to the real-time DAQ nature of the application, dropping frames is not an option). I have captured the event in Wireshark and the characteristics are always similar. Immediately preceding the event, transmission seems normal: the Wiznet receives one ACK from the remote peer for each two DATA packets sent, so the RTT alternates between 50 us for the second DATA packet in a pair and 1.2 ms for the first. This is expected at an 850 Hz transmission rate, as the time delta between DATA packets being copied into the TX buffer should be roughly 1.17 ms.

During the event there is a delay before DATA packets are seen on the wire. Sometimes the delay is small (~3 ms) and transmission recovers, but eventually the delay grows large enough that the application stops. Immediately after the delay, several DATA packets (and their respective ACKs) are seen on the wire in rapid succession. I have modified the DATA packet to contain the values of Sn_SR (including the reserved MSB) and Sn_TX_FSR during the transmission (the register values are read just prior to the send() routine that copies the DATA packet into the TX buffer). Each packet shows Sn_SR = 0x0117 (SOCK_ESTABLISHED, with bit 0 of the reserved MSB set). For the first delayed DATA packet, Sn_TX_FSR reports MAX or MAX - len free size, where MAX = 8 KB and len is the size of one DATA packet. Each successive packet on the wire shows Sn_TX_FSR shrinking by len until it reaches 0x0b9c (2972 bytes) after 15 DATA packets have been "sent". The 16th DATA packet shows Sn_TX_FSR = 0x0000, with Sn_SR = 0x0917 (SOCK_ESTABLISHED, with bits 3 and 0 of the reserved MSB set).

  • No retransmissions are seen on the wire, since ACKs are eventually seen by the Wiznet before the RTO expires, but this is not fast enough for the real-time application, which is always producing more DATA frames to send.

  • The TCP receive window on the receiving host is ~64 KB and does not shrink before or during the event, and all ACKs, once returned, are seen on the order of tens of usecs after the preceding DATA packets.

  • I usually run the test with a desktop Ethernet switch between my sending device and my PC, but I can reliably reproduce the issue with a wire connected directly to the PC.

Are there any other registers or indicators that I should look into to continue debugging this issue? Any help would be appreciated.

Thanks

My send() source code is here:

int diag_ntwSendBin(short nFd, unsigned short *buf, unsigned long len)
{
    uint8 tmp = 0;
    uint32 freesize = 0;
    uint16 i;

    // check socket
    tmp = getSn_SSR(nFd) & 0xff;
    if ((tmp != SOCK_ESTABLISHED) && (tmp != SOCK_CLOSE_WAIT))
        return SOCKERR_SOCKSTATUS;

    // check previous send
    if (sock_is_sending & (1<<nFd))
    {
        tmp = getSn_IR(nFd);
        if (tmp & Sn_IR_SENDOK)
        {
            setSn_IR(nFd, Sn_IR_SENDOK);
            sock_is_sending &= ~(1<<nFd);
        }
        else if (tmp & Sn_IR_TIMEOUT)
        {
            return SOCKERR_TIMEOUT;
        }
        else
            return SOCK_BUSY;
    }

    while (1)
    {
        freesize = getSn_TX_FSR(nFd);
        tmp = getSn_SSR(nFd) & 0xff;
        if ((tmp != SOCK_ESTABLISHED) && (tmp != SOCK_CLOSE_WAIT))
            return SOCKERR_SOCKSTATUS;
        if ((sock_io_mode & (1<<nFd)) && (len > freesize))
            return SOCK_BUSY;
        if (len <= freesize)
            break;
    }

    // copy new data to internal TX memory
    for (i = 0; i < (len / 2); i++)
    {
        *((volatile unsigned short *)Sn_TX_FIFOR(nFd)) = *buf;
        buf++;
    }

    // set WRSR
    setSn_TX_WRSR(nFd, len);

    setSn_CR(nFd, Sn_CR_SEND);
    while (getSn_CR(nFd));
    sock_is_sending |= (1<<nFd);

    return len;
}

I have not programmed the W5300 myself, but it seems you are skipping one important step:

/* check previous SEND command completion */

if (is first send ?) ; /* skip check Sn_IR(SENDOK) */
else
{
    while (Sn_IR(SENDOK) == '0')
    {
        if (Sn_SSR == SOCK_CLOSED) goto CLOSED state; /* check connection establishment */
    }
    Sn_IR(SENDOK) = '1'; /* clear previous interrupt of SEND completion */
}
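If polling rather than blocking fits the application better, the same check can be expressed as a pure function of register snapshots. This is only a sketch: the function name and return convention are my own, and the constant values are taken from the W5300 datasheet (Sn_IR_SENDOK = 0x10, SOCK_CLOSED = 0x00).

```c
#include <stdint.h>

#define IR_SENDOK   0x10  /* Sn_IR_SENDOK bit */
#define SSR_CLOSED  0x00  /* SOCK_CLOSED state */

/* Decide the SEND-completion state from snapshots of Sn_IR and Sn_SSR.
 * Returns 1 when the previous SEND completed (caller then clears
 * Sn_IR_SENDOK), -1 when the socket closed while waiting, and 0 when the
 * SEND is still pending. */
int send_complete_step(uint8_t ir, uint8_t ssr)
{
    if (ir & IR_SENDOK)
        return 1;
    if (ssr == SSR_CLOSED)
        return -1;
    return 0;
}
```

The main loop would call this each iteration and only issue a new SEND after it returns 1.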

I think that in your case you try to launch another SEND process while the previous one has not finished yet.

While TCP is reliable communication, it is not always fast, and this is what you see. The delay may be caused by any network device; it is a reality. On heavily loaded networks you will see an even worse picture, with bigger delays and other transmission problems.

If dropping frames is not an option, then you must check for SEND command completion as described in the datasheet, and store pending data in the MCU's own buffers, implementing simple memory management so that when the SEND command finally completes you send all waiting data at once (the packets will then be larger than 348 data bytes). Timing will not be accurate, but at least no data is lost.
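A minimal sketch of such a staging buffer (all names and the capacity are assumptions; the real send path would drain it through the W5300 send routine, splitting on the 8 KB TX buffer if needed):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical staging buffer: frames are appended as they are produced
 * and drained in one burst when the socket is ready.  Capacity of ~64
 * frames of 348 bytes is an assumption; size it to the MCU's free RAM. */
#define STAGE_CAP (64u * 348u)

typedef struct {
    uint8_t data[STAGE_CAP];
    size_t  used;
} frame_stage;

/* Append one frame; returns 0 on success, -1 when the stage is full
 * (at that point data loss is unavoidable and should be logged). */
int stage_push(frame_stage *s, const uint8_t *frame, size_t len)
{
    if (s->used + len > STAGE_CAP)
        return -1;
    memcpy(s->data + s->used, frame, len);
    s->used += len;
    return 0;
}

/* Hand the caller everything accumulated so far (to pass to the send
 * routine as one larger packet) and reset the stage. */
size_t stage_take(frame_stage *s, const uint8_t **out)
{
    size_t n = s->used;
    *out = s->data;
    s->used = 0;
    return n;
}
```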

If dropping sampled data is allowed and timing is more important, then you would be better off using UDP. It does not require ACKs, but your application will not know whether a frame was delivered, or when.

Eugeny, thank you for your response.

The following code is intended to check Sn_IR to confirm that the SEND command has completed, and then clear the SENDOK bit:

tmp = getSn_IR(nFd);
if (tmp & Sn_IR_SENDOK)
{
    setSn_IR(nFd, Sn_IR_SENDOK);
    sock_is_sending &= ~(1<<nFd);
}

I should add that the LAN I am using to transmit the DATA has only two hosts (my PC and the MCU device), so I feel it could not be characterized as a "heavily loaded network".

I think that unless I find some other root cause of the issue, my next step would be to buffer up frames in the MCU, as you suggested, when I detect the event starting to occur, and then send them in larger packets in hopes of recovering from the delay.

I see now, apologies.

Could it be that an interrupt occurs while the previous service has not finished yet, and you corrupt the workflow? Do you disable interrupts in the ISR?

This is possible, as the ISR is complex: I am sampling 16 ADC channels (4 at a time), and I am disabling/enabling interrupts so as not to re-enter the ISR handler function.

Perhaps what I need to do as a test is attempt to send similar-sized packets at the same rate from a large static file, to see whether the issue is with the W5300 or a result of the ISR handling.

Just set a flag when you are currently in the ISR, check it on entry, and log an event if you enter the ISR while the flag is already set.
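A sketch of such a guard flag (names are hypothetical; on a real MCU the ISR itself would call these on entry and exit):

```c
#include <stdint.h>

/* Hypothetical re-entry guard: the ISR sets a flag on entry and clears it
 * on exit.  If the flag is already set on entry, a nested interrupt
 * occurred; the event is counted for later inspection. */
static volatile uint8_t  in_isr = 0;
static volatile uint32_t reentry_count = 0;

/* Call at the top of the ISR; returns 1 if this entry is a re-entry. */
int isr_enter(void)
{
    if (in_isr) {
        reentry_count++;   /* previous service had not finished */
        return 1;
    }
    in_isr = 1;
    return 0;
}

/* Call just before returning from the ISR. */
void isr_exit(void) { in_isr = 0; }

uint32_t isr_reentries(void) { return reentry_count; }
```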

Update: I propose changing the architecture of the software. Log data from the ISR into the MCU's RAM, and in the main loop send the accumulated data when the socket becomes ready. Most of the time it will send a 348-byte packet, but if two or more interrupts occur in between, you send 348 * (number of interrupts) bytes at once.