W5500 MQTT and TLS

I’m trying to set up a secure MQTT connection using a W5500 and I’m having some problems. The MQTT client works perfectly over an unsecured connection, but as soon as I connect over TLS, it looks like I don’t receive any data from the broker once the connection is established. The broker then closes my MQTT connection when the keepalive timer expires (I don’t receive the ping from the broker so I can’t reply to the keepalive request).

Messages sent from the broker also don’t come in. What confuses me is that the TLS handshake is successful so this means that the module actually receives data but it looks like when the TLS connection is established, the data stops coming in. The connection to the MQTT broker is also successful as I can send messages while the connection is active (and I receive those messages on MQTT explorer on my PC so I’m sure it works).

To enable TLS on the W5500, I followed the example I found on

But I’m clearly missing something and I can’t find what it is.

I’m using an STM32F429ZI MCU with a WIZ850io module. I’m working in STM32CubeIDE (Eclipse-based).

Does anyone have a complete W5500 MQTT + TLS example they wouldn’t mind sharing?

Any help would be greatly appreciated.

Thanks.

Where is the broker running, on the PC? First I would confirm that the secure channel (socket) is actually being set up successfully; for this you can use Wireshark running on the broker side. Analyzing the whole packet exchange would help to see what is wrong.

This is confusing. Are you talking about the unsecured connection? Or do you connect to the broker with the PC, and that connection is successful?

The broker is paid hosting at CloudMQTT, running mosquitto 1.5.7. I’ll set up a local broker and see if I get different results. I will also install Wireshark to capture the traffic, because all I have right now is a serial debug terminal.

Some clarifications:

When using an unsecure connection (port 1883), I have no problem at all. My IoT device can send and receive messages to/from the broker and the connection doesn’t drop. It works as expected.

When I use a secure connection on port 8883:

My PC can connect to the MQTT broker using TLS, so my certificates are set up properly.

My IoT device can open a connection to the broker using TLS. When I run wiz_tls_connect() from the GitHub example, I receive the following debug messages:

init connect[1] 
  . Performing the SSL/TLS handshake...
=> handshake

And then I get about 300 lines of debug messages about the ongoing handshake process, the exchange of keys, random numbers, etc and it ends like so:

<= handshake wrapup
<= handshake
ok
    [ Ciphersuite is TLS-RSA-WITH-AES-256-CBC-SHA256 ]
Connected to Socket 0

I can post the entire debug output if it’s useful, but everything I read seems normal; I get no error messages up to this point.

After that I get to run MQTTConnect() and that’s where it fails. My debug messages show some data is being sent and received (about 40 lines of debug messages, no error messages), and after 5 seconds I get:

MQTTConnect() returned: 255
Connection closed

The CONNACK is never received and MQTTConnect() times out and returns 255.

At this point I only have server side certificates enabled. My broker doesn’t require clients to use certificates. I’ve read I can still use username and password to login to the mqtt broker while using server certificates only for authentication. My connectData structure uses the same ID, username and password as my unsecure connection. Is this why it fails? Because on my broker I get the following messages in the log:

2022-10-05 12:15:09: New connection from ***.***.***.*** on port 8883.
2022-10-05 12:15:15: New client connected from ***.***.***.*** as 002200375456500320383743 (c1, k60, u'********').
2022-10-05 12:15:18: Client 002200375456500320383743 disconnected.

(IP and username were hidden with ***, I get the correct info there, Client ID is a random number)
So the broker gets the connection info, the username/password are accepted and connection is dropped from the client side (no CONNACK timeout)

Thanks for the help

Looked into the Mosquitto 1.5.7 sources:

/* Error values */
enum mosq_err_t {
	MOSQ_ERR_CONN_PENDING = -1,
	MOSQ_ERR_SUCCESS = 0,
	MOSQ_ERR_NOMEM = 1,
	MOSQ_ERR_PROTOCOL = 2,
	MOSQ_ERR_INVAL = 3,
	MOSQ_ERR_NO_CONN = 4,
	MOSQ_ERR_CONN_REFUSED = 5,
	MOSQ_ERR_NOT_FOUND = 6,
	MOSQ_ERR_CONN_LOST = 7,
	MOSQ_ERR_TLS = 8,
	MOSQ_ERR_PAYLOAD_SIZE = 9,
	MOSQ_ERR_NOT_SUPPORTED = 10,
	MOSQ_ERR_AUTH = 11,
	MOSQ_ERR_ACL_DENIED = 12,
	MOSQ_ERR_UNKNOWN = 13,
	MOSQ_ERR_ERRNO = 14,
	MOSQ_ERR_EAI = 15,
	MOSQ_ERR_PROXY = 16,
	MOSQ_ERR_PLUGIN_DEFER = 17,
	MOSQ_ERR_MALFORMED_UTF8 = 18,
	MOSQ_ERR_KEEPALIVE = 19,
	MOSQ_ERR_LOOKUP = 20,
};

and 255 must be the byte representation of MOSQ_ERR_CONN_PENDING (-1). If I am not mistaken, the returned value is an int. Could it be that you use the wrong bit size for variables when working with the mosquitto library?
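To illustrate the hypothesis, here is a minimal sketch of how an int value of -1 turns into 255 when it is stored in an 8-bit unsigned variable. The helper name narrow_to_byte is made up for this example and is not part of mosquitto or paho; the sketch assumes 8-bit bytes.

```c
#include <limits.h>

/* Hypothetical helper (not a library function): mimics storing an int
 * return code in an unsigned 8-bit variable. Conversion to an unsigned
 * type is defined as reduction modulo 2^CHAR_BIT, i.e. modulo 256 here. */
static unsigned char narrow_to_byte(int rc)
{
    return (unsigned char)rc;
}

/* Hypothetical helper: true when the narrowed value no longer equals the
 * original code, i.e. information was lost by the narrowing. */
static int narrowing_changed_value(int rc)
{
    return (int)narrow_to_byte(rc) != rc;
}
```

With this, narrow_to_byte(-1) yields 255, which is why a 255 in a debug print can hide a -1 error code.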

I am afraid we need to troubleshoot further and gather more info on why this happens.

Edit: this is the only place where this error code is set:

		/* Set non-blocking */
		if(net__socket_nonblock(sock)){
			continue;
		}

		rc = connect(*sock, rp->ai_addr, rp->ai_addrlen);
#ifdef WIN32
		errno = WSAGetLastError();
#endif
		if(rc == 0 || errno == EINPROGRESS || errno == COMPAT_EWOULDBLOCK){
			if(rc < 0 && (errno == EINPROGRESS || errno == COMPAT_EWOULDBLOCK)){
				rc = MOSQ_ERR_CONN_PENDING;
			}

This requires errno to be set to one of these two values and rc to be < 0. AFAIK EINPROGRESS and EWOULDBLOCK mean that you are trying to connect a socket in non-blocking mode.

EINPROGRESS
The socket socket is non-blocking and the connection could not be established immediately. You can determine when the connection is completely established with select; see Waiting for Input or Output. Another connect call on the same socket, before the connection is completely established, will fail with EALREADY.

But at the same time, the code

		/* Set non-blocking */
		if(net__socket_nonblock(sock)){
			continue;
		}

At first glance this looks as if the connect code is not executed when the socket is in non-blocking mode. In fact it assumes that the socket was successfully set into non-blocking mode, so the issue is connect() returning a non-success code and setting errno to EINPROGRESS/COMPAT_EWOULDBLOCK. Probably some issue in the initialization of the arguments’ structures?

The broker runs mosquitto 1.5.7, but I’m not using the mosquitto source code in my client.

For the client I’m using the paho MQTT port included in the ioLibrary driver from Wiznet.

I did some more debugging and I think I found where this is coming from. In MQTTConnect():

    if (waitfor(c, CONNACK, &connect_timer) == CONNACK) // This is true so CONNACK is received
    {
        unsigned char connack_rc = 255;
        unsigned char sessionPresent = 0;
        if (MQTTDeserialize_connack(&sessionPresent, &connack_rc, c->readbuf, c->readbuf_size) == 1) // This returns 1 (success)
            rc = connack_rc; // <--- Why is connack_rc not changed in MQTTDeserialize_connack ?
        else
            rc = FAILURE;
    }
    else
        rc = FAILURE;

exit:
    if (rc == SUCCESS)
        c->isconnected = 1;

    return rc;

MQTTDeserialize_connack returns 1 (success), so rc = connack_rc. But connack_rc, which is passed to MQTTDeserialize_connack, is left untouched and keeps its initial value of 255, which becomes the return value. So that’s where the 255 comes from.

int MQTTDeserialize_connack(unsigned char* sessionPresent, unsigned char* connack_rc, unsigned char* buf, int buflen)
{
	MQTTHeader header = {0};
	unsigned char* curdata = buf;
	unsigned char* enddata = NULL;
	int rc = 0;
	int mylen;
	MQTTConnackFlags flags = {0};

	FUNC_ENTRY;
	header.byte = readChar(&curdata);
	if (header.bits.type != CONNACK)  // header.bits.type is equal to CONNACK this is OK
		goto exit;

	curdata += (rc = MQTTPacket_decodeBuf(curdata, &mylen)); // rc == 1, mylen == 0
	enddata = curdata + mylen;
	if (enddata - curdata < 2) // mylen == 0 so this evaluates to true and exits
		goto exit;

	flags.all = readChar(&curdata);
	*sessionPresent = flags.bits.sessionpresent;
	*connack_rc = readChar(&curdata); // <--- never happens so connack_rc is left unchanged

	rc = 1;
exit:
	FUNC_EXIT_RC(rc);
	return rc;
}

The variable mylen is 0 after MQTTPacket_decodeBuf() returns. So enddata is equal to curdata and the function exits. The code where connack_rc is assigned never gets executed.

When MQTTDeserialize_connack is called, c->readbuf only contains {0x20, 0x00}. Theoretically this looks like a CONNACK packet to me, according to the specs I have for MQTT: 0x20 means a CONNACK message type and 0x00 is the remaining length. So my received CONNACK packet looks good, but why do I get 255 as the return value of connect?

And why does it work when I use an unsecured connection? (I didn’t try to debug the unsecured connection yet, maybe I should.)
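For comparison, the MQTT 3.1.1 specification defines CONNACK as a four-byte packet: 0x20, a remaining length of 0x02, the acknowledge flags, then the return code. A buffer holding only {0x20, 0x00} may therefore be an incomplete or misaligned read rather than a full CONNACK. Note also that in the function above rc still holds the 1 returned by MQTTPacket_decodeBuf() when it jumps to exit, which would explain why the early exit is reported as success. The sketch below is a stricter check under those assumptions; parse_connack is a hypothetical helper, not a paho function.

```c
#include <stddef.h>

enum { MQTT_TYPE_CONNACK = 2 }; /* control packet type carried in the upper nibble */

/* Hypothetical strict parser: returns 1 and fills the outputs only for a
 * complete 4-byte CONNACK; returns 0 and leaves the outputs untouched
 * otherwise, so a short packet is never reported as success. */
static int parse_connack(const unsigned char *buf, size_t len,
                         unsigned char *session_present,
                         unsigned char *return_code)
{
    if (buf == NULL || session_present == NULL || return_code == NULL)
        return 0;
    if (len < 2 || (buf[0] >> 4) != MQTT_TYPE_CONNACK)
        return 0;
    if (buf[1] != 2 || len < 4) /* remaining length must be exactly 2 */
        return 0;

    *session_present = buf[2] & 0x01; /* bit 0 of the acknowledge flags */
    *return_code = buf[3];            /* 0 means connection accepted */
    return 1;
}
```

Fed with {0x20, 0x00}, this returns 0 instead of 1, so the caller would fall into its failure branch rather than return the initial value of connack_rc.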

Could it be a bug in the mqtt library? I doubt it, this is old code, someone would have figured it out by now.


Great research. Try also asking at Cedalo forum.

I went and checked the most recent paho MQTT code, and in their version the value of connack_rc is initialized to 0 (connack_rc has been replaced by data->rc, but it serves the same purpose).

The code from paho mqtt embedded_c / MQTTConnect() :

if (waitfor(c, CONNACK, &connect_timer) == CONNACK)
    {
        data->rc = 0;
        data->sessionPresent = 0;
        if (MQTTDeserialize_connack(&data->sessionPresent, &data->rc, c->readbuf, c->readbuf_size) == 1)
            rc = data->rc;
        else
            rc = FAILURE;
    }
    else
        rc = FAILURE;

exit:
    if (rc == SUCCESS)
    {
        c->isconnected = 1;
        c->ping_outstanding = 0;
    }

#if defined(MQTT_TASK)
	  MutexUnlock(&c->mutex);
#endif

    return rc;

So I changed my code to initialize connack_rc = 0 and voilà, everything works fine. Well, sort of. I can now connect to the MQTT broker and keep the connection up, I can send messages from the IoT device and receive them on my PC, and I’m no longer disconnected after the keepalive timer. The only thing not working at the moment is subscribe: MQTTSubscribe() returns -1 (failure), yet the broker does receive the subscribe message, since it tries to send messages to my IoT device (I see data coming in on my debug output).

I still have some debugging work to do. I also want to check the effects of these changes on the unsecured connection. I’ll keep updating this post as it may be helpful to someone else here, and I will submit a bugfix to the ioLibrary GitHub when everything works fine.

Thanks for your support! Sometimes just talking about your problem rewires your brain and you find the answer!

I didn’t know about Cedalo, thanks for the tip, I’ll keep it in my bookmarks for future reference.


Finally found the culprit!

So first I had this weird bug described above with the MQTTConnect() return value of 255… I went on and completely replaced the MQTT library supplied with Wiznet’s ioLibrary with the latest paho-embedded-c library. I replaced the MQTTClient.c and .h files and the content of the MQTTPacket/src folder, and I kept the mqtt_interface.c and .h files that are part of Wiznet’s ioLibrary.

Then I had this second bug where I received fragmented data that didn’t make any sense. It seemed that when I called recv(), I received the previous packet missing its first byte, plus the first byte of the current packet. (MQTT packets, that is.)

I found the following code in the file mqtt_interface.c from Wiznet’s ioLibrary:

int w5x00_read(Network* n, unsigned char* buffer, int len, long time)
{

	if((getSn_SR(n->my_socket) == SOCK_ESTABLISHED) && (getSn_RX_RSR(n->my_socket)>0))
		return recv(n->my_socket, buffer, len);

	return SOCK_ERROR;
}

This works flawlessly when you don’t use TLS. The function first checks if the W5x00 has a socket open, and if there’s data in the RX buffer of the W5x00 chip and calls recv() accordingly. I modified the code to use TLS like so:

int w5x00_read(Network* n, unsigned char* buffer, int len, long time)
{

	if((getSn_SR(n->my_socket) == SOCK_ESTABLISHED) && (getSn_RX_RSR(n->my_socket)>0))
		return wiz_tls_read(&tlsContext, buffer, len);

	return SOCK_ERROR;
}

The wiz_tls_read() function comes from the W5x00 TLS example linked previously in this post. It’s basically just a wrapper around mbedtls_ssl_read(). The problem is that the MQTT library starts by reading the received data byte by byte, as each byte determines the next action to take. But when the first byte is read through TLS, mbedtls reads a whole chunk of data from the W5x00 chip to fill its own SSL Rx buffer. When the second byte is read, w5x00_read() checks if there’s data in the chip, but the chip is empty: all the data is now in the TLS buffer! So wiz_tls_read() is not called until more data comes in to fill the W5x00 buffer. I get only the first byte of the packet, and I get the rest only once the W5x00 receives new data.
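The effect can be reproduced without any hardware. The sketch below uses made-up stand-ins: chip models the W5x00 RX buffer and mock_tls_read() models a TLS layer that drains the chip into its own buffer on the first read. None of these names come from ioLibrary or mbedtls, and the real buffering in mbedtls is more involved; this only shows why gating on the chip’s RX size starves the reader.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type for both the chip and the TLS layer. */
typedef struct { unsigned char data[64]; size_t len; } MockBuf;

static MockBuf chip;   /* stands in for the W5x00 RX buffer */
static MockBuf tls;    /* stands in for the TLS layer's decrypted data */
static size_t tls_pos; /* next unread byte in tls.data */

/* Simulates a packet arriving in the chip (assumes n <= sizeof chip.data). */
static void mock_chip_receive(const unsigned char *p, size_t n)
{
    memcpy(chip.data, p, n);
    chip.len = n;
}

/* Simulates a TLS read: when its own buffer is exhausted it drains the
 * whole chip buffer, then hands out at most n bytes. */
static int mock_tls_read(unsigned char *out, size_t n)
{
    if (tls_pos == tls.len) {
        if (chip.len == 0)
            return 0;      /* nothing buffered anywhere */
        tls = chip;        /* the chip is emptied in one go */
        tls_pos = 0;
        chip.len = 0;
    }
    size_t avail = tls.len - tls_pos;
    if (n > avail)
        n = avail;
    memcpy(out, tls.data + tls_pos, n);
    tls_pos += n;
    return (int)n;
}

/* Mirrors the chip-level check: only read when the chip has data. */
static int gated_read(unsigned char *out, size_t n)
{
    return (chip.len > 0) ? mock_tls_read(out, n) : -1;
}

/* Always asks the TLS layer, which knows what it has buffered. */
static int ungated_read(unsigned char *out, size_t n)
{
    return mock_tls_read(out, n);
}
```

After one packet arrives, the first gated_read() of one byte succeeds, the second fails even though three bytes are still waiting in the TLS buffer, and ungated_read() returns them.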

So I simply removed the check to see if there’s data in the W5x00 chip like so:

int w5x00_read(Network* n, unsigned char* buffer, int len, long time)
{
	if(getSn_SR(n->my_socket) == SOCK_ESTABLISHED)
		return wiz_tls_read(&tlsContext, buffer, len);

	return SOCK_ERROR;
}

And magically everything started to work. Unfortunately I didn’t find a way with mbedtls to check how much data is in the SSL context’s Rx buffer. If someone has insights on this, please let me know. For now, with the code above, mbedtls keeps feeding data from its own buffer on each single-byte read from the MQTT library, and refills this buffer from the W5x00 chip as needed.
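One possible lead: mbedtls has mbedtls_ssl_get_bytes_avail(), which is documented as returning the number of application data bytes still unread in the current record. If the mbedtls_ssl_context inside tlsContext is reachable (an assumption, it depends on how the TLS example defines its context struct), the check could be restored by testing both buffers. The decision itself is sketched below as a pure function; should_call_tls_read and its parameter names are made up for illustration and this has not been tested on hardware.

```c
#include <stddef.h>

/* Hypothetical decision helper. In real code the inputs would come from:
 *   established     : getSn_SR(sock) == SOCK_ESTABLISHED
 *   chip_bytes      : getSn_RX_RSR(sock)
 *   tls_bytes_avail : mbedtls_ssl_get_bytes_avail(ssl)  (assumed accessible)
 * Returns 1 when a TLS read can make progress, 0 otherwise. */
static int should_call_tls_read(int established,
                                size_t chip_bytes,
                                size_t tls_bytes_avail)
{
    if (!established)
        return 0;
    /* Data may sit in either place: still encrypted in the chip, or
     * already decrypted inside the TLS layer. */
    return (chip_bytes > 0 || tls_bytes_avail > 0) ? 1 : 0;
}
```

This keeps the early return when nothing is pending, while no longer skipping bytes that mbedtls has already pulled out of the chip.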
