Corrupt or incomplete UDP packets with W5500

Hi there,

I’m using an ESP32 with ESP-IDF and a W5500 module to receive ArtNet packets streamed over Ethernet. The SPI interface is running at 20Mhz and each of two sockets is set to 8K.

While I’m able to get some packets, it seems the very frequently UDP packets are received incomplete or corrupt, which results in wrong values in the UDP packet header and the addressing to get our of sync with actual packet starts and ends.

I’m quite to using the W5500 but tried to follow the datasheet and examples I saw in both the Arduino Ethernet library and your IoLibrary on GitHub.

My reading code is quite simple and follows your example’s workflow for UDP:

esp_err_t EthernetDriverClass::read(int socketIndex, uint16_t port, uint8_t* buffer, size_t length, size_t* received, in_addr_t* remoteIp) {
	if(this->sockets[socketIndex] == -1) {
		ESP_LOGE(ETH_TAG, "Attempted to read from closed socket");
		return ESP_FAIL;
	}

	// take semaphore
	W5100.lock();

	/*
		1. Check available data
	*/

	uint16_t rd = getSnRX_RSR(socketIndex);

	if (rd > 0) {
		ESP_LOGI(ETH_TAG, "Available: %d", rd);
	} 

	rd %= SSIZE;

	// // nothing to do, no new data
	if (rd == 0) {
		W5100.unlock();
		return ESP_OK;
	}


	/*
		2. Read packet header
	*/

	uint8_t header[8];
	
	// read packet header from current RX address
	// read_data(socketIndex, state[socketIndex].address, (uint8_t*)&header, sizeof(header));
	socketRead(socketIndex, (uint8_t*)&header, sizeof(header));


	/*
		3. Parse UDP packet header
	*/

	// get remote IP
	memcpy(&state[socketIndex].remoteIp, (uint8_t*)&header, 4);

	// parse UDP payload size
	uint16_t payloadSize = (header[6] << 8) + header[7];


	/*
		4. Read data
	*/

	// check for invalid packets
	if (payloadSize == 0 || payloadSize > SOCKET_MAX_PAYLOAD) {
		ESP_LOGI(ETH_TAG, "Invalid packet: %d, flushing", payloadSize);
		ESP_LOGI(ETH_TAG, "Payload: %d", payloadSize);

		// state[socketIndex].address += rd;
		payloadSize = 0;
	} else {
		// read_data(socketIndex, state[socketIndex].address + sizeof(header), buffer, payloadSize);
		socketRead(socketIndex, buffer, payloadSize);
		// state[socketIndex].address += sizeof(header) + payloadSize;
	}
	

	// write out data and remote IP
	*received = payloadSize;
	*remoteIp = state[socketIndex].remoteIp;


	/*
		Update RX address
	*/

	// apply mask
	// state[socketIndex].address = state[socketIndex].address & SMASK;


	// give semaphore
	W5100.unlock();

	vTaskDelay(1);

	// If we get here, there's no data available or recv failed
	return ESP_OK;
}

with socketRead implemented as:

void EthernetDriverClass::socketRead(int socketIndex, uint8_t* buffer, size_t length) {
        uint16_t ptr;
        ptr = W5100.readSnRX_RD(socketIndex);
        read_data(socketIndex, ptr, buffer, length);

        ESP_LOGI(ETH_TAG, "Reading: %d from %d", length, ptr);

        ptr += length;

        W5100.writeSnRX_RD(socketIndex, ptr);
        W5100.execCmdSn(socketIndex, Sock_RECV);
    }

With this code however, after a short amount of time (and variable number of packets received) the data seems to get either corrupted or incomplete, and the UDP headers are wrongly parsed.

Here is a log of a stream of equally sized ArtNet packets, each packet contains 512 bytes of data and 18 bytes of header, and the UDP header is 8 bytes:

I (10070) ETHERNET: IP address: 192.168.2.7
I (10120) NETWORK: Got IP
I (10124) ETHERNET: Available: 1076
I (10124) ETHERNET: Reading: 8 from 0
I (10124) ETHERNET: Reading: 530 from 8
ARTNET
I (10127) ETHERNET: Available: 538
I (10130) ETHERNET: Reading: 8 from 538
I (10135) ETHERNET: Reading: 530 from 546
ARTNET
I (10158) ETHERNET: Available: 1076
I (10158) ETHERNET: Reading: 8 from 1076
I (10158) ETHERNET: Reading: 530 from 1084
ARTNET
I (10162) ETHERNET: Available: 538
I (10165) ETHERNET: Reading: 8 from 1614
I (10170) ETHERNET: Reading: 530 from 1622
ARTNET
I (10176) ETHERNET: Available: 2048
I (10179) ETHERNET: Reading: 8 from 104
I (10184) ETHERNET: Reading: 1 from 112
I (10190) ETHERNET: Available: 3115
I (10192) ETHERNET: Reading: 8 from 113
I (10197) ETHERNET: Invalid packet: 0, flushing

The first few calls receive all data correctly, with the first available call returning 1076 (538*2) and several subsequent calls also working well. The reading is split to reading the UDP header and following up with another read of the UDP datagram’s size which is 530 on all successful calls.

Notice how in the end of the log, available reports 2048, which is incorrect and is not divisible by 538, meaning data is either missing or the report is wrong?

From then on the reading loses track of the actual UDP datagrams in the buffer and tried to read headers from wrong places and can’t recover unless I reset the program or the sockets.

This happens within seconds of the program staring to run, but not in consistent places. As you can see from the pointer in the log, the datagram size is wrong when parsed, 1 then 0, all random numbers.
The issue doesn’t seem to be related to the buffer wrap-around, which was my original assumption and I’m at wit’s end in trying to figure this out.

Would appreciate your help!

Edit:
Just wanted to add some additional details:

  1. this seems related to UDP packet RX pointer not at header but I seems to be unresolved there as well.

  2. Resetting the socket by closing and opening it extends runtime but happens very frequently and causes the W5500 to eventually freeze and not be able to read until the board is power-cycled.

Running wireshark for a few minutes showed exactly the same packet size being send repeatedly:

217 0.267572 192.168.2.30 192.168.2.255 DMX Channels 572 ArtDMX (0x5000)

length is always correct at 572 and it’s the only packet sent on this interface, which eliminates any extraneous reasons for data being off when parsed.

Another look at the logs reveals a potential source for this issue, notice the changing numbers logged in available data, current RX pointer and parsed UDP packet size:

I (12923) ETHERNET: Available: 6994
I (12924) ETHERNET: Reading: 8 from 416
I (12930) ETHERNET: Payload: 530
I (12933) ETHERNET: Reading: 530 from 424
I (12940) ETHERNET: Available: 6456
I (12941) ETHERNET: Reading: 8 from 954
I (12947) ETHERNET: Payload: 530
I (12950) ETHERNET: Reading: 530 from 962
I (12958) ETHERNET: Available: 416
I (12959) ETHERNET: Reading: 8 from 1492
I (12964) ETHERNET: Payload: 769
I (12967) ETHERNET: Reading: 769 from 1500
I (12975) ETHERNET: Available: 7831
I (12976) ETHERNET: Reading: 8 from 2269
I (12982) ETHERNET: Payload: 0
I (12986) ETHERNET: Available: 7823
I (12989) ETHERNET: Reading: 8 from 2277
I (12994) ETHERNET: Payload: 0
I (12999) ETHERNET: Available: 2313
I (13001) ETHERNET: Reading: 8 from 2285
I (13007) ETHERNET: Payload: 3601
I (13010) ETHERNET: Invalid packet: 3601, flushing

The first few reads return available numbers that are all divisible by 538, and result in correct packet parsing - 8 bytes of header and a parsed payload size of 530. Packets are parsed one at a time so you’d expect numbers to increment slowly by 8 and 530 on the pointer, and decrease as slowly on the available side.

However, timestamp 12958 reveals a wrong available data value returned, 416, despite the code correctly reading only 530 bytes in the line before, the available data dropped from 6456 to 416, and corrupted packets follow - payloads of 769, 0 3601, resulting in flush since it catches that the wrong value finally exceeds the max packet size.

The available value is returned following the ioLibrary:

static uint16_t getSnRX_RSR(uint8_t s) {
        uint16_t val, prev;
        prev = W5100.readSnRX_RSR(s);
        while (1) {
            val = W5100.readSnRX_RSR(s);


            if (val == prev) {
                return val;
            }

            prev = val;
        }
    }

and increment using similar logic to the library adjusted for 8K buffers, which works most of the time when there is less data throughput so I imagine the logic is sound.