W5500 MQTT and TLS

Sevinz · October 4, 2022, 8:12pm

I’m trying to set up a secure MQTT connection using a W5500 and I’m having some problems. The MQTT client works perfectly over an unsecure connection, but as soon as I try to connect over TLS, it looks like I don’t receive any data at all from the broker after the connection is established, and my MQTT connection is closed by the broker when the keepalive timer expires (I don’t receive the ping from the broker so I can’t reply to the keepalive request).

Messages sent from the broker also don’t come in. What confuses me is that the TLS handshake is successful so this means that the module actually receives data but it looks like when the TLS connection is established, the data stops coming in. The connection to the MQTT broker is also successful as I can send messages while the connection is active (and I receive those messages on MQTT explorer on my PC so I’m sure it works).

To enable TLS on the W5500, I followed the example I found on

But I’m clearly missing something and I can’t find what it is.

I’m using an STM32F429ZI MCU with a WIZ850io module. I’m working on STM32CubeIDE (eclipse).

Does anyone have a complete W5500 MQTT + TLS example they don’t mind sharing?

Any help would be greatly appreciated.

Thanks.

Eugeny · October 4, 2022, 8:26pm

Where is the broker running, on the PC? First I would confirm that the secure channel (socket) is actually being set up successfully; for this you can use Wireshark running on the broker side. Analyzing the whole packet exchange would help to see what is wrong.

This is confusing. Are you talking about the unsecure connection? Or do you connect to the broker with the PC, and that connection is successful?

Sevinz · October 5, 2022, 1:31pm

The broker is paid hosting at CloudMQTT, running mosquitto 1.5.7. I’ll set up a local broker and see if I get different results. I will also install Wireshark to capture some traffic, because all I have right now is a serial debug terminal.

Some clarifications:

When using an unsecure connection (port 1883), I have no problem at all. My IoT device can send and receive messages to/from the broker and the connection doesn’t drop. It works as expected.

When I use a secure connection on port 8883:

My PC can connect to the MQTT broker using TLS, so my certificates are set up properly.

My IoT device can open a connection to the broker using TLS. When I run the wiz_tls_connect() from the github example, I receive the following debug message:

init connect[1] 
  . Performing the SSL/TLS handshake...
=> handshake

And then I get about 300 lines of debug messages about the ongoing handshake process, the exchange of keys, random numbers, etc and it ends like so:

<= handshake wrapup
<= handshake
ok
    [ Ciphersuite is TLS-RSA-WITH-AES-256-CBC-SHA256 ]
Connected to Socket 0

I can post the entire debug message if it’s useful, but all I read seems normal, I get no error messages up to this point

After that I get to run MQTTConnect() and that’s where it fails. My debug messages show some data is being sent and received (about 40 lines of debug messages, no error messages), and after 5 seconds I get:

MQTTConnect() returned: 255
Connection closed

The CONNACK is never received and MQTTConnect() times out and returns 255.

At this point I only have server side certificates enabled. My broker doesn’t require clients to use certificates. I’ve read I can still use username and password to login to the mqtt broker while using server certificates only for authentication. My connectData structure uses the same ID, username and password as my unsecure connection. Is this why it fails? Because on my broker I get the following messages in the log:

2022-10-05 12:15:09: New connection from ***.***.***.*** on port 8883.
2022-10-05 12:15:15: New client connected from ***.***.***.*** as 002200375456500320383743 (c1, k60, u'********').
2022-10-05 12:15:18: Client 002200375456500320383743 disconnected.

(IP and username were hidden with ***, I get the correct info there, Client ID is a random number)
So the broker gets the connection info, the username/password are accepted and connection is dropped from the client side (no CONNACK timeout)

Thanks for the help

Eugeny · October 5, 2022, 2:32pm

Looked into the Mosquitto 1.5.7 sources:

/* Error values */
enum mosq_err_t {
	MOSQ_ERR_CONN_PENDING = -1,
	MOSQ_ERR_SUCCESS = 0,
	MOSQ_ERR_NOMEM = 1,
	MOSQ_ERR_PROTOCOL = 2,
	MOSQ_ERR_INVAL = 3,
	MOSQ_ERR_NO_CONN = 4,
	MOSQ_ERR_CONN_REFUSED = 5,
	MOSQ_ERR_NOT_FOUND = 6,
	MOSQ_ERR_CONN_LOST = 7,
	MOSQ_ERR_TLS = 8,
	MOSQ_ERR_PAYLOAD_SIZE = 9,
	MOSQ_ERR_NOT_SUPPORTED = 10,
	MOSQ_ERR_AUTH = 11,
	MOSQ_ERR_ACL_DENIED = 12,
	MOSQ_ERR_UNKNOWN = 13,
	MOSQ_ERR_ERRNO = 14,
	MOSQ_ERR_EAI = 15,
	MOSQ_ERR_PROXY = 16,
	MOSQ_ERR_PLUGIN_DEFER = 17,
	MOSQ_ERR_MALFORMED_UTF8 = 18,
	MOSQ_ERR_KEEPALIVE = 19,
	MOSQ_ERR_LOOKUP = 20,
};

and 255 must be the byte representation of MOSQ_ERR_CONN_PENDING (-1). If I am not mistaken, the returned value is an int. Could it be that you are using the wrong bit size for variables when working with the mosquitto library?
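
To illustrate the conversion being described here: storing a signed status of -1 in a single unsigned byte yields 255. This is only a minimal sketch in C; status_as_byte is a hypothetical helper (not part of mosquitto or paho), and it assumes 8-bit bytes.

```c
/* Hypothetical helper: narrows a signed status code to one byte, as happens
   when an int return value ends up in an unsigned char. */
unsigned char status_as_byte(int status)
{
    /* Conversion to an unsigned type wraps modulo 256 (with 8-bit bytes). */
    return (unsigned char)status;
}
```

With 8-bit bytes, status_as_byte(-1) evaluates to 255, which is why a 255 can look like a truncated -1.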

I am afraid we need to troubleshoot further gathering more info on why it happens.

Edit: this is the only place where this error code is set:

		/* Set non-blocking */
		if(net__socket_nonblock(sock)){
			continue;
		}

		rc = connect(*sock, rp->ai_addr, rp->ai_addrlen);
#ifdef WIN32
		errno = WSAGetLastError();
#endif
		if(rc == 0 || errno == EINPROGRESS || errno == COMPAT_EWOULDBLOCK){
			if(rc < 0 && (errno == EINPROGRESS || errno == COMPAT_EWOULDBLOCK)){
				rc = MOSQ_ERR_CONN_PENDING;
			}

This assumes errno is set to one of these two values and rc is < 0. AFAIK EINPROGRESS and WOULDBLOCK mean you are trying to open the socket in non-blocking mode.

EINPROGRESS
The socket socket is non-blocking and the connection could not be established immediately. You can determine when the connection is completely established with select; see Waiting for Input or Output. Another connect call on the same socket, before the connection is completely established, will fail with EALREADY.

But at the same time code

		/* Set non-blocking */
		if(net__socket_nonblock(sock)){
			continue;
		}

assumes that the socket was successfully set to non-blocking mode, so the issue is connect() returning a non-success code and setting errno to EINPROGRESS/COMPAT_EWOULDBLOCK. Probably some issue in the initialization of the argument structures?

Sevinz · October 5, 2022, 4:26pm

The broker runs mosquitto 1.5.7, but I’m not using the mosquitto source code in my client.

For the client I’m using the paho mqtt port included in the iolibrary driver from wiznet.

I did some more debugging and I think I found where this is coming from. in MQTTConnect():

    if (waitfor(c, CONNACK, &connect_timer) == CONNACK) // This is true so CONNACK is received
    {
        unsigned char connack_rc = 255;
        unsigned char sessionPresent = 0;
        if (MQTTDeserialize_connack(&sessionPresent, &connack_rc, c->readbuf, c->readbuf_size) == 1) // This returns 1 (success)
            rc = connack_rc; // <--- Why is connack_rc not changed in MQTTDeserialize_connack ?
        else
            rc = FAILURE;
    }
    else
        rc = FAILURE;

exit:
    if (rc == SUCCESS)
        c->isconnected = 1;

    return rc;

MQTTDeserialize_connack returns 1 (success), so rc = connack_rc, but connack_rc, which is passed to MQTTDeserialize_connack, is left untouched and keeps its initialized value of 255, which is the value returned. So that’s where the 255 comes from.

int MQTTDeserialize_connack(unsigned char* sessionPresent, unsigned char* connack_rc, unsigned char* buf, int buflen)
{
	MQTTHeader header = {0};
	unsigned char* curdata = buf;
	unsigned char* enddata = NULL;
	int rc = 0;
	int mylen;
	MQTTConnackFlags flags = {0};

	FUNC_ENTRY;
	header.byte = readChar(&curdata);
	if (header.bits.type != CONNACK)  // header.bits.type is equal to CONNACK this is OK
		goto exit;

	curdata += (rc = MQTTPacket_decodeBuf(curdata, &mylen)); // rc == 1, mylen == 0
	enddata = curdata + mylen;
	if (enddata - curdata < 2) // mylen == 0 so this evaluates to true and exits
		goto exit;

	flags.all = readChar(&curdata);
	*sessionPresent = flags.bits.sessionpresent;
	*connack_rc = readChar(&curdata); // <--- never happens so connack_rc is left unchanged

	rc = 1;
exit:
	FUNC_EXIT_RC(rc);
	return rc;
}

The variable mylen is at 0 after MQTTPacket_decodeBuf() returns. So enddata is equal to curdata and the function exits. The code where connack_rc is assigned never gets executed.

When MQTTDeserialize_connack is called, c->readbuf only contains {0x20, 0x00}. Theoretically this looks like a CONNACK packet to me, according to the specs I have for MQTT. 0x20 means a CONNACK message type and the 0x00 is the remaining length. So my received CONNACK packet looks good, but why do I get 255 as the return value of connect?

And why does it work when I use an unsecure connection? (I didn’t try to debug the unsecure connection yet, maybe I should)
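
For comparison, the MQTT 3.1.1 specification gives CONNACK a remaining length of 2 (one flags byte plus one return code byte), so a complete packet looks like {0x20, 0x02, flags, rc}. Going by the comments in the function quoted above, rc still holds the byte count from MQTTPacket_decodeBuf (1) when the length check bails out, which would explain a "success" return with connack_rc untouched. The sketch below only mirrors that control flow; every sketch_ name is hypothetical and this is not the paho source.

```c
#include <stddef.h>

#define SKETCH_CONNACK 2  /* MQTT control packet type for CONNACK */

/* Decodes the MQTT "remaining length" field. Returns the number of bytes
   consumed (1..4), or 0 if the field is malformed or the buffer ends early. */
static int sketch_decode_len(const unsigned char *p, size_t avail, int *value)
{
    int multiplier = 1;
    int consumed = 0;

    *value = 0;
    while (consumed < 4 && (size_t)consumed < avail) {
        unsigned char b = p[consumed++];
        *value += (b & 0x7F) * multiplier;
        if ((b & 0x80) == 0)
            return consumed;
        multiplier *= 128;
    }
    return 0;
}

/* Mirrors the quoted control flow: rc is reused for the byte count, so an
   early exit after the length field still returns a non-zero value. */
int sketch_deserialize_connack(unsigned char *connack_rc,
                               const unsigned char *buf, size_t buflen)
{
    int rc = 0;
    int remaining = 0;

    if (buflen < 2 || (buf[0] >> 4) != SKETCH_CONNACK)
        goto exit;

    rc = sketch_decode_len(buf + 1, buflen - 1, &remaining);
    if (rc == 0 || remaining < 2 || 1 + (size_t)rc + 2 > buflen)
        goto exit;                 /* rc may still hold the byte count here */

    *connack_rc = buf[1 + rc + 1]; /* skip the flags byte, read return code */
    rc = 1;
exit:
    return rc;
}

/* Stricter variant: reports success only when the return code was read. */
int sketch_deserialize_connack_strict(unsigned char *connack_rc,
                                      const unsigned char *buf, size_t buflen)
{
    int remaining = 0;
    int n;

    if (buflen < 2 || (buf[0] >> 4) != SKETCH_CONNACK)
        return 0;
    n = sketch_decode_len(buf + 1, buflen - 1, &remaining);
    if (n == 0 || remaining != 2 || 1 + (size_t)n + 2 > buflen)
        return 0;
    *connack_rc = buf[1 + n + 1];
    return 1;
}
```

Fed {0x20, 0x00}, the first function returns 1 and leaves the caller's 255 in place, while the strict variant returns 0. That suggests the two-byte buffer is a truncated read and not a complete CONNACK.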

Could it be a bug in the mqtt library? I doubt it, this is old code, someone would have figured it out by now.

Eugeny · October 5, 2022, 6:28pm

Great research. Try also asking at Cedalo forum.

Sevinz · October 5, 2022, 7:04pm

I went and checked the most recent paho mqtt code and on their version the value of connack_rc is initialized at 0 (connack_rc is changed to data->rc but it serves the same purpose).

The code from paho mqtt embedded_c / MQTTConnect() :

if (waitfor(c, CONNACK, &connect_timer) == CONNACK)
    {
        data->rc = 0;
        data->sessionPresent = 0;
        if (MQTTDeserialize_connack(&data->sessionPresent, &data->rc, c->readbuf, c->readbuf_size) == 1)
            rc = data->rc;
        else
            rc = FAILURE;
    }
    else
        rc = FAILURE;

exit:
    if (rc == SUCCESS)
    {
        c->isconnected = 1;
        c->ping_outstanding = 0;
    }

#if defined(MQTT_TASK)
	  MutexUnlock(&c->mutex);
#endif

    return rc;

So I changed it in my code to initialize connack_rc = 0 and voilà, everything works fine, well sort of. I can now connect to mqtt broker and keep the connection up, I can send messages from the IoT device and I receive those messages on my PC and I’m not disconnected after the keepalive timer. The only thing not working at the moment is subscribe. MQTTSubscribe() returns -1 (failure) but the broker receives the subscribe message as it tries to send messages to my IoT device (I see data coming in on my debug).

I still got some debugging work to do. I also want to check the effects of these changes on the unsecure connection. I’ll keep updating this post as it may be helpful to someone else here and I will submit a bugfix to the ioLibrary github when everything works fine.

Thanks for your support! Sometimes just talking about your problem rewires your brain and you find the answer!

I didn’t know about Cedalo, thanks for the tip, I’ll keep it in my bookmarks for future reference.

Sevinz · October 6, 2022, 6:33pm

Finally found the culprit!

So first I had this weird bug described above with the MQTTConnect() return value of 255… I went on and completely upgraded the MQTT library supplied with Wiznet’s ioLibrary to the latest paho-embedded-c library. I replaced the MQTTClient.c and .h files and the content of the MQTTPacket/src folder, and I kept the mqtt_interface.c and .h that are part of Wiznet’s ioLibrary.

Then I had this second bug where I received fragmented data that didn’t make any sense. It seemed when I called recv(), I received the previous packet missing the first byte, plus the first byte of the current packet. (MQTT packets that is)

I found out, in the file mqtt_interface.c from Wiznet’s ioLibrary, the following code:

int w5x00_read(Network* n, unsigned char* buffer, int len, long time)
{

	if((getSn_SR(n->my_socket) == SOCK_ESTABLISHED) && (getSn_RX_RSR(n->my_socket)>0))
		return recv(n->my_socket, buffer, len);

	return SOCK_ERROR;
}

This works flawlessly when you don’t use TLS. The function first checks if the W5x00 has a socket open, and if there’s data in the RX buffer of the W5x00 chip and calls recv() accordingly. I modified the code to use TLS like so:

int w5x00_read(Network* n, unsigned char* buffer, int len, long time)
{

	if((getSn_SR(n->my_socket) == SOCK_ESTABLISHED) && (getSn_RX_RSR(n->my_socket)>0))
		return wiz_tls_read(&tlsContext, buffer, len);

	return SOCK_ERROR;
}

The wiz_tls_read() function comes from the W5x00 TLS example linked previously in this post. It’s basically just a wrapper to call mbedtls_ssl_read(). The problem is, the MQTT library starts by reading the received data byte by byte, as each byte determines the next action to take. But when reading the first byte through TLS, mbedtls reads a whole chunk of data from the W5x00 chip to fill its own SSL Rx buffer. When the second byte is read, the w5x00_read() function checks if there’s data in the chip, but the chip is empty: all the data is now in the TLS buffer! So recv() is not called until more data comes in to fill the W5x00 buffer. I get only the first byte of the buffer, and I get the rest only when the W5x00 receives new data.

So I simply removed the check to see if there’s data in the W5x00 chip like so:

int w5x00_read(Network* n, unsigned char* buffer, int len, long time)
{
	if(getSn_SR(n->my_socket) == SOCK_ESTABLISHED)
		return wiz_tls_read(&tlsContext, buffer, len);

	return SOCK_ERROR;
}

And magically everything started to work. Unfortunately I didn’t find a way with mbedtls to check how much data is in the SSL Context Rx Buffer. If someone has insights on this, please let me know. But for now with this code above, mbedtls keeps feeding data from its own buffer on each single byte read from the mqtt library, and refills this buffer as needed from the W5x00 chip.
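
On the open question, mbedtls does provide mbedtls_ssl_get_bytes_avail(), which reports how many application data bytes are still unread in the current record; whether it suits this port has not been tested here. The stall itself can be reproduced without any hardware. The sketch below is a simulation only: FakeLink and the fake_/read_ functions are hypothetical stand-ins for the W5x00 RX buffer and the TLS layer, not ioLibrary or mbedtls code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: bytes waiting in the chip, and bytes already pulled
   into the TLS layer's own buffer. */
typedef struct {
    unsigned char chip[64];
    size_t chip_len;
    unsigned char tls[64];
    size_t tls_len;
    size_t tls_pos;
} FakeLink;

/* Loads a packet into the simulated chip buffer and clears the TLS buffer. */
void fake_link_load(FakeLink *l, const unsigned char *data, size_t n)
{
    memset(l, 0, sizeof *l);
    if (n > sizeof l->chip)
        n = sizeof l->chip;
    memcpy(l->chip, data, n);
    l->chip_len = n;
}

/* Models a buffered TLS read: when its buffer is empty it drains the chip in
   one chunk, then serves the caller from that buffer. */
static int fake_tls_read(FakeLink *l, unsigned char *out, size_t len)
{
    size_t n;

    if (l->tls_pos == l->tls_len) {
        if (l->chip_len == 0)
            return -1;                      /* nothing buffered anywhere */
        memcpy(l->tls, l->chip, l->chip_len);
        l->tls_len = l->chip_len;
        l->tls_pos = 0;
        l->chip_len = 0;
    }
    n = l->tls_len - l->tls_pos;
    if (n > len)
        n = len;
    memcpy(out, l->tls + l->tls_pos, n);
    l->tls_pos += n;
    return (int)n;
}

/* Gated read: refuses to call into TLS unless the chip reports data. */
int read_gated(FakeLink *l, unsigned char *out, size_t len)
{
    if (l->chip_len == 0)
        return -1;
    return fake_tls_read(l, out, len);
}

/* Ungated read: always lets the TLS layer decide. */
int read_ungated(FakeLink *l, unsigned char *out, size_t len)
{
    return fake_tls_read(l, out, len);
}
```

Reading a four-byte packet one byte at a time, read_gated() delivers the first byte and then fails, because the remaining three bytes sit in the simulated TLS buffer. read_ungated() delivers all four.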

joel · April 6, 2023, 11:14pm

I followed the procedure described above and was able to communicate with both Mosquitto and AWS IoT. However, MQTTYield() was not working: it was stuck in a loop trying to read, so I made a small fix and it worked like a charm.

Rene · August 31, 2023, 1:57am

Hi Sevinz,

I’m working on a similar solution, but I need to send data to AWS. Have you connected to AWS, or are you using another broker?

Rene · August 31, 2023, 2:08am

Hi Joel,

Could you show me how you set the AWS parameters in the code? This includes the AWS endpoint, client ID, client certificate, CA certificate, and private key. Thanks!

Sevinz · August 31, 2023, 3:13pm

I have not tried AWS, I’m just using a basic “cloud compute” instance running linux. I am hosting my own mosquitto broker on it, along with a web app and a database.

MQTT is secured through mbedtls, using self signed certificates for mutual client / server authentication.
Been working like a charm for almost a year.

joel · September 3, 2023, 12:04pm

[Image: Capture1]

Use the WIZnet TLS example and edit it according to these images.
Hope this works. You can ask me further questions after following these steps.

Rene · September 3, 2023, 10:08pm

Thanks for your reply, Sevinz, and Joel.

Joel,

I followed the steps and I loaded the certificates, but I am unable to connect. I got a time-out error from the connect function. Please see the log below.

I got the network log, and the module tried to connect using the IP instead of the endpoint.

I was just wondering, do I need to use DNS to obtain the IP from the endpoint, or does the library handle this internally? How did you do this? Also, do you know where to configure the ClientID in the code? I need to publish data with a specific client ID.

Thank you!
Rene.

Rene · September 4, 2023, 3:25pm

Hi Joel,

I included the DNS in my code before trying to connect, and it seems to work. But now I have a handshake issue. Do you have any suggestions on how I can fix that? Please look at the log below.

IP address : 192.168.18.112
SM Mask : 255.255.255.0
Gate way : 192.168.18.1
DNS Server : 8.8.8.8
Loading the CA root certificate
mbedtls_ssl_setup : 0
init [1]
dns while
dns while
dns while
dns while

DNS: [a9dqi14kxnf7l-ats.iot.us-east-1.amazonaws.com] Get Server IP - 52.23.141.92
socket open port : 0
socket[0]
server ip : 52.23.141.92 port : 8883
init connect[1]
. Performing the SSL/TLS handshake…=> handshake

handshake: 536882496
client state: 0

=> flush output

<= flush output

handshake: 536882496
client state: 1

=> flush output

<= flush output

=> write client hello

=> write record

=> flush output

message length: 66, out_left: 66

Port:[0]/Send(66) :
16 03 03 00 3D 01 00 00 39 03 03 CB 4E 3E A4 39
A1 32 11 92 4F 3B D8 2E AE 01 02 F7 9F 85 B8 F0
C8 18 E1 AB 10 E9 CE C9 47 60 CC 00 00 06 00 9C
00 3D 00 FF 01 00 00 0A 00 0D 00 06 00 04 04 01
03 01
ssl->f_send() returned 66 (-0xffffffbe)

<= flush output

<= write record

<= write client hello

handshake: 536882496
client state: 2

=> flush output

<= flush output

=> parse server hello

=> read record

=> fetch input

in_left: 0, nb_want: 5

Port:[0]/Recv(5)[5]:
16 03 03 00 51
in_left: 0, nb_want: 5

ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)

<= fetch input

=> fetch input

in_left: 5, nb_want: 86

Port:[0]/Recv(81)[81]:
02 00 00 4D 03 03 45 F8 D2 A6 04 6E E4 EA 99 E3
E8 23 D5 6E FF 9C 77 8F 1E 92 15 9F 0F 7A 58 7E
3F 51 E8 9C C8 6A 20 45 BD 72 1F 9B 73 C1 2A A1
C7 10 C1 22 9E 11 61 9F 9E 09 CF A1 D6 A0 4B 4C
69 90 29 72 F3 59 2F 00 9C 00 00 05 FF 01 00 01
00
in_left: 5, nb_want: 86

ssl->f_recv(_timeout)() returned 81 (-0xffffffaf)

<= fetch input

<= read record

server hello, total extension length: 5

<= parse server hello

handshake: 536882496
client state: 3

=> flush output

<= flush output

=> parse certificate

=> read record

=> fetch input

in_left: 0, nb_want: 5

Port:[0]/Recv(5)[5]:
16 03 03 13 8E
in_left: 0, nb_want: 5

ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)

<= fetch input

bad message length

mbedtls_ssl_read_record_layer() returned -29184 (-0x7200)

mbedtls_ssl_read_record() returned -29184 (-0x7200)

<= handshake

failed
! mbedtls_ssl_handshake returned -29184: SSL - An invalid SSL record was received
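
The five bytes received just before the failure are a TLS record header, and decoding them shows what mbedtls was asked to accept. This is a minimal sketch, assuming the standard TLS 1.2 record layout (one byte content type, two bytes version, two bytes big-endian length); TlsRecordHeader and tls_parse_record_header are hypothetical names, not mbedtls APIs.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  content_type;   /* 0x16 = handshake */
    uint8_t  version_major;  /* 0x03 */
    uint8_t  version_minor;  /* 0x03 = TLS 1.2 */
    uint16_t length;         /* payload bytes that follow the header */
} TlsRecordHeader;

/* Parses a 5-byte TLS record header. Returns 0 on success, or -1 if an
   argument is NULL or fewer than 5 bytes are supplied. */
int tls_parse_record_header(const uint8_t *buf, size_t len,
                            TlsRecordHeader *out)
{
    if (buf == NULL || out == NULL || len < 5)
        return -1;
    out->content_type  = buf[0];
    out->version_major = buf[1];
    out->version_minor = buf[2];
    out->length = (uint16_t)(((unsigned)buf[3] << 8) | buf[4]);
    return 0;
}
```

Applied to the log above, 16 03 03 00 51 announces 81 bytes, matching the Recv(81) that follows, while 16 03 03 13 8E announces 5006 bytes. One possible explanation, which is an assumption and not confirmed in this thread, is that this record is larger than the input buffer mbedtls was built with (MBEDTLS_SSL_MAX_CONTENT_LEN in mbedtls 2.x), in which case mbedtls rejects it as an invalid record.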

Sevinz · September 7, 2023, 3:36pm

I remember having a similar problem where the handshake would terminate abnormally.

The example I was using had these 2 functions in SSLInterface.c:

/*Shell for mbedtls recv function*/
int WIZnetRecv(void *ctx, unsigned char *buf, unsigned int len )
{
	int32_t ret;
	ret = recv(*((int *)ctx),buf,len);
	return ret;
}

/*Shell for mbedtls send function*/
int WIZnetSend(void *ctx, const unsigned char *buf, unsigned int len )
{
    return (send(*((int *)ctx),buf,len));
}

For this example to work with my project, I had to modify these 2 functions to avoid getting handshake failures. I’m not sure if this is only a problem with connecting to mosquitto and the way the handshakes are handled, I didn’t have time to do more testing.

IIRC the recv() function returns 0 if the buffer is empty which basically tells mbedtls “success, there is no more data to receive” and this results in an invalid SSL record among other things. Same thing for the send() function. Don’t take my word for it though and test it yourself to be sure!

So I had to avoid returning a 0 and instead return an error code mbedtls is expecting. I modified those 2 functions like so:

int WIZnetRecv(void *ctx, unsigned char *buf, unsigned int len)
{
	int32_t ret = 0;
	if(getSn_RX_RSR(*((int *)ctx)) == 0)
	  return MBEDTLS_ERR_SSL_WANT_READ;

	ret = recv(*((int *)ctx),buf,len);
	return ret;
}

/*Shell for mbedtls send function*/
int WIZnetSend(void *ctx, const unsigned char *buf, unsigned int len)
{
	int32_t ret = 0;
	ret = send(*((int *)ctx),buf,len);
    if(ret == 0) return MBEDTLS_ERR_SSL_WANT_WRITE;
	return ret;
}

For the receiving part, this will first check if the buffer is empty and if so, return the correct mbedtls error code instead of 0. If there’s data in the W5500 buffer, recv() is called normally and the number of received bytes is returned.

For the sending part, this is pretty much self explanatory.
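
One consequence of returning the WANT_READ / WANT_WRITE codes is that the caller has to be ready to see them and call again, since in mbedtls they mean "retry later" and not "failed". The sketch below shows one way to do that with a bounded retry loop. The sketch_ names are hypothetical, the two constants are redefined locally with the values mbedtls uses so the block compiles on its own, and the step function stands in for a call such as mbedtls_ssl_handshake().

```c
#define SKETCH_WANT_READ  (-0x6900) /* same value as MBEDTLS_ERR_SSL_WANT_READ */
#define SKETCH_WANT_WRITE (-0x6880) /* same value as MBEDTLS_ERR_SSL_WANT_WRITE */

/* One attempt at a TLS operation; returns 0, a WANT_* code, or an error. */
typedef int (*sketch_step_fn)(void *ctx);

/* Calls step() until it returns something other than a WANT_* code, or until
   max_attempts is used up. Returns the last value seen. */
int sketch_run_until_done(sketch_step_fn step, void *ctx, int max_attempts)
{
    int ret = SKETCH_WANT_READ;

    for (int i = 0; i < max_attempts; i++) {
        ret = step(ctx);
        if (ret != SKETCH_WANT_READ && ret != SKETCH_WANT_WRITE)
            return ret;            /* finished, or failed with a real error */
    }
    return ret;                    /* still pending: treat as a timeout */
}

/* Hypothetical stand-in for a handshake that needs a few polls before the
   peer's data arrives; ctx points to the number of polls still required. */
int sketch_fake_step(void *ctx)
{
    int *polls_left = (int *)ctx;

    if (*polls_left > 0) {
        (*polls_left)--;
        return SKETCH_WANT_READ;
    }
    return 0;
}
```

On real hardware the attempt limit would more likely be a timer, so that a silent peer does not hang the loop forever.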

Like I said I didn’t have time to push the tests so I don’t know if this error is “generalized” or if it just applies to my situation. Feel free to post your results to compare.

Rene · September 11, 2023, 1:08pm

Thank you, Sevinz, for your reply. I decided to use the “aws-iot-device-sdk-embedded-C”, and it worked well. I found an example with W5500 + AWS SDK + mbedTLS, which I ported to my project. Thank you for your help.

alien · January 12, 2024, 11:22am

Hi Rene! Could you please tell me where you found an example with W5500 + AWS SDK + mbedTLS? Is it somewhere on GitHub? Thanks!

Rene · January 12, 2024, 4:19pm

Hi Alien,

Check the link to the GitHub

https://github.com/scarletwiz/W5300-AWS-C.git

Regards.
