Connection drop mqtt AWS


#1

Hi There,

Even with the latest Zerynth release, I encounter connection drops with my product. I’m able to connect to AWS, update my shadow, and so on. FOTA also works just fine. But after a few minutes, the product reports:

Establishing Wi-Fi connection…
MQTTConnectionError @[00CC:0067:00CC:00DE:00D3:00E9:0000:0000]

It seems the Wi-Fi connection is lost. I have implemented a reconnection procedure, but MQTT still raises an error.

My code (thread) for the connectivity part:

###############################################################################
#                                                                             #
#                   Define general connectivity thread                        #
#                                                                             #
###############################################################################

def connectivity_thread():
    
    global myjobs
    global thing_update_queue
    global motor_level
    global light_level
    global timer_state
    global firmware_version
    global fota_cntr
    global CHECK_FOR_STATE_CHANGED_MS
    global wifi_ssid
    global wifi_pass
    global LED_INDICATOR

    mqtt_setup_run_once_flag = 0
    only_run_once_flag = 0

    while True:
    
        if wifi.is_linked() != True: # try to establish Wi-Fi connection
            
            lock.acquire()
            print("Establishing Wi-Fi connection...")
            lock.release()
            
            while wifi.is_linked() != True:
                try:
                    wifi.link(wifi_ssid, wifi.WIFI_WPA2, wifi_pass)
                    break
                except Exception as e:
                    lock.acquire()
                    print("Can't connect to network: " + wifi_ssid)
                    lock.release()
                    
                    # only run once to prevent display flicker
                    if only_run_once_flag == 0:
                        update_indicators()
                        only_run_once_flag = 1
                    
            lock.acquire()
            print("Connected to network: " + wifi_ssid)
            lock.release()
            
            update_indicators()
            
            # Run mqtt connection establishment once, reconnection method is already implemented
            if mqtt_setup_run_once_flag == 0:
                # create aws iot thing instance, connect to mqtt broker, set shadow update callback and start mqtt reception loop
                lock.acquire()
                print('Connecting to mqtt broker...')
                lock.release()
                thing.mqtt.connect()
                thing.on_shadow_request(shadow_callback)
                
                thing.mqtt.loop()

                #create an IoT Jobs object
                myjobs = jobs.Jobs(thing)
                # check if there are FOTA jobs waiting to be performed
                # This function executes a FOTA update if possible
                # or confirms an already executed FOTA update
                awsfota.handle_fota_jobs(myjobs,force=True)
                lock.acquire()
                print('Connected to mqtt broker!')
                lock.release()
                mqtt_setup_run_once_flag = 1
                
            thing_update_queue = 1 # reset queued to prevent spamming (only update new data once)
                
        else: # Wi-Fi connection OK, wait for publish to Cloud flag
            
            # reset var
            only_run_once_flag = 0
            
            if thing_update_queue > 0:
                lock.acquire()
                print('Update thing shadow')
                lock.release()
                
                thing.mqtt.publish("$aws/things/"+ thing_conf['thingname'] +"/shadow/update", json.dumps({"state": {"desired": None, "reported": {'motor_level': motor_level, 'light_level' : light_level, 'timer_state' : timer_state, 'firmware_version' : firmware_version}}}))

                thing_update_queue += -1
            
            if fota_cntr < (CHECK_FOR_FOTA_SEC/(CHECK_FOR_STATE_CHANGED_MS*0.001)):
                fota_cntr += 1
            else:
                # check for new incoming jobs
                # again, FOTA is executed if a correct FOTA job is queued
                lock.acquire()
                print('Checking for fota job...')
                lock.release()
                awsfota.handle_fota_jobs(myjobs)
                fota_cntr = 0

            sleep(CHECK_FOR_STATE_CHANGED_MS) # check every xx ms for new data to update CLP

The problem seems somewhat similar to: Amazon Mqtt connection lost


#2

A small update: this happens especially often at my work. When I connect elsewhere it seems to work fine for at least 12 hours, but after around 24 hours it doesn’t accept shadow updates anymore…

After the MQTT error the ESP32 restarts.


#3

Hello @Marcel,

the Zerynth MQTT module handles reconnection automatically. However, the MQTT session is cleared after each reconnection, so you should subscribe again to all topics, shadow included.

Automatic subscription after reconnection can be achieved using the aconnect_cb parameter of the mqtt.connect function.

def subtoshadow(mqttclient):
    thing._shadow_cbk = None
    thing.on_shadow_request(shadow_callback)
...
thing.mqtt.connect(aconnect_cb=subtoshadow)
...

This parameter registers a callback to be called every time a connection is successfully performed.
Let me know
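
To illustrate why the callback matters, here is a minimal sketch in plain Python. FakeClient is a hypothetical stand-in, not the Zerynth client: it only imitates a client that starts every (re)connection with an empty session and calls aconnect_cb afterwards. The topic name is an example as well.

```python
# Hypothetical stand-in for an MQTT client. It only imitates the
# "session is cleared on every (re)connection" behaviour described above.
class FakeClient:
    def __init__(self):
        self.subscriptions = set()
        self.aconnect_cb = None

    def connect(self, aconnect_cb=None):
        # Remember the callback so that later reconnections call it again.
        self.aconnect_cb = aconnect_cb
        self._establish()

    def reconnect(self):
        self._establish()

    def _establish(self):
        # A cleared session starts without any subscriptions.
        self.subscriptions = set()
        if self.aconnect_cb is not None:
            self.aconnect_cb(self)

    def subscribe(self, topic):
        self.subscriptions.add(topic)


# Example topic name, chosen for illustration only.
SHADOW_TOPIC = "$aws/things/example-thing/shadow/update/delta"


def subscribe_to_shadow(client):
    client.subscribe(SHADOW_TOPIC)


# Subscribing once after connect(): the subscription is lost on reconnection.
one_shot = FakeClient()
one_shot.connect()
one_shot.subscribe(SHADOW_TOPIC)
one_shot.reconnect()

# Subscribing from the callback: the subscription is restored every time.
with_cb = FakeClient()
with_cb.connect(aconnect_cb=subscribe_to_shadow)
with_cb.reconnect()

print(SHADOW_TOPIC in one_shot.subscriptions)
print(SHADOW_TOPIC in with_cb.subscriptions)
```

The first client ends up without the shadow subscription after reconnecting, the second one keeps it.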


#4

Thanks! Running a test now, I’ll let you know the outcome. It is still strange that the Wi-Fi connection drops several times. I will investigate this in more detail after this fix.


#5

Hi @LorenzoR,

When I switch off the Wi-Fi network while being connected and check the UART log I see this result:

Connected to network: WIRELESS<LF>
Connecting to mqtt broker...<LF>
Connected to mqtt broker!<LF>
Update thing shadow<LF>

*SWITCH OFF NETWORK FROM HERE*

Establishing Wi-Fi connection...<LF>
MQTTConnectionError @[00CC:0067:00CB:01CC:00CC:00DE:00D3:00E9]<LF>

It doesn’t seem to make a difference with or without your code snippet.
So it seems MQTT is trying to do something while disconnected? Is this something running in the background? I don’t do anything with MQTT when Wi-Fi is disconnected, see the code above.


#6

Hello @Marcel,

yes, MQTT tries to reconnect automatically in the background.
The snippet I posted is needed to add automatic subscription to topics when reconnection is performed.


#7

Thanks @LorenzoR

But it tries to reconnect even when the Wi-Fi connection is not re-established. How can I prevent this? Right now it raises an MQTT error, which causes a crash and reboot. Please point me in the right direction and keep up the good work :)


#8

Hello @Marcel,

at the moment it is not possible to prevent reconnection.
You could try overriding the reconnect method with something like client.reconnect = custom_reconnect, but it is not a clean solution…

Is it important for you to avoid rebooting?


#9

Hi @LorenzoR

Yes, that’s important to me. I would like to build a robust reconnection mechanism. For the Wi-Fi part the code above works just fine, but I want to achieve the same for MQTT. Can you give me a code example showing how to reset all MQTT variables when Wi-Fi is disconnected, so that I can establish the MQTT connection again when Wi-Fi is back?

Will this issue be solved in a future update?


#10

Hi @Marcel,

actually there is a callback inside the mqtt client which is not exposed, but might be good to achieve your goal.
After creating your thing, you can set the callback like this:

thing.mqtt._before_reconnect = perform_actions_before_trying_to_reconnect

and you can define it to wait for WiFi connection to be established again:

def perform_actions_before_trying_to_reconnect(mqtt_client):
    wifi_connection_event.wait()

This way you do not have to reset all MQTT parameters and MQTT will automatically reconnect when WiFi connection is up again.
If this works, we will try to add this feature to the AWS MQTT Client ASAP.
Let me know
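
The snippet above uses wifi_connection_event without defining it. A minimal sketch of how it could be set up, assuming an Event object with the same set/clear/wait methods as in standard Python threading (the helper names on_wifi_linked and on_wifi_lost are hypothetical):

```python
import threading

# Set while Wi-Fi is linked, cleared while it is down.
wifi_connection_event = threading.Event()


def on_wifi_linked():
    # Hypothetical helper: call this right after wifi.link() succeeds.
    wifi_connection_event.set()


def on_wifi_lost():
    # Hypothetical helper: call this when wifi.is_linked() reports False.
    wifi_connection_event.clear()


def perform_actions_before_trying_to_reconnect(mqtt_client):
    # Blocks the MQTT reconnection attempt until the event is set.
    wifi_connection_event.wait()
```

In the connectivity thread, on_wifi_lost() would go at the start of the "establish Wi-Fi" branch and on_wifi_linked() right after the link is up again.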


#11

Hello @Marcel

we included reconnect callback support for AWS IoT within the latest patch.


#12

Hi @LorenzoR,

Thanks! I will run a test ASAP :)


#13

@LorenzoR,

Just tried it, but things don’t seem to have improved. It’s still the same behavior as described above. I would like to create a very solid connection with a reconnection mechanism that won’t cause any resets.

I would really appreciate it if you could provide me with some examples.


#14

Hello @Marcel,

can I have your code of the reconnect callback?


#15

@LorenzoR, see:

Connected to network: NETWORK-SSID
Connecting to mqtt broker...
Connected to mqtt broker!
Update thing shadow

FROM HERE I TURN OFF THE WiFi NETWORK

Establishing Wi-Fi connection...
Thread 7 exited with exception MQTTConnectionError @[00CC:0053:00CC:00CA:00D3:00E9:0000:0000]

I use exactly the same code as in the first post.

If I use this code and wait till Wi-Fi connection is back:
thing.mqtt._before_reconnect = perform_actions_before_trying_to_reconnect

Then the thing doesn’t crash, but the shadow callback isn’t reconnected either when the Wi-Fi connection is re-established. I’m able to publish updates from the thing to AWS, but can’t receive anything via the callback.


#16

Hello @Marcel,

I tried to insert what I meant directly in your code:

###############################################################################
#                                                                             #
#                   Define general connectivity thread                        #
#                                                                             #
###############################################################################

def check_link_before_reconnecting(mqtt_client):
    while not wifi.is_linked():
        sleep(100) # wait to be linked again before reconnecting

def connectivity_thread():
    
    global myjobs
    global thing_update_queue
    global motor_level
    global light_level
    global timer_state
    global firmware_version
    global fota_cntr
    global CHECK_FOR_STATE_CHANGED_MS
    global wifi_ssid
    global wifi_pass
    global LED_INDICATOR

    mqtt_setup_run_once_flag = 0
    only_run_once_flag = 0

    while True:
    
        if wifi.is_linked() != True: # try to establish Wi-Fi connection
            
            lock.acquire()
            print("Establishing Wi-Fi connection...")
            lock.release()
            
            while wifi.is_linked() != True:
                try:
                    wifi.link(wifi_ssid, wifi.WIFI_WPA2, wifi_pass)
                    break
                except Exception as e:
                    lock.acquire()
                    print("Can't connect to network: " + wifi_ssid)
                    lock.release()
                    
                    # only run once to prevent display flicker
                    if only_run_once_flag == 0:
                        update_indicators()
                        only_run_once_flag = 1
                    
            lock.acquire()
            print("Connected to network: " + wifi_ssid)
            lock.release()
            
            update_indicators()
            
            # Run mqtt connection establishment once, reconnection method is already implemented
            if mqtt_setup_run_once_flag == 0:
                # create aws iot thing instance, connect to mqtt broker, set shadow update callback and start mqtt reception loop
                lock.acquire()
                print('Connecting to mqtt broker...')
                lock.release()
                thing.mqtt.connect(breconnect_cb=check_link_before_reconnecting)
                thing.on_shadow_request(shadow_callback)
                
                thing.mqtt.loop()

                #create an IoT Jobs object
                myjobs = jobs.Jobs(thing)
                # check if there are FOTA jobs waiting to be performed
                # This function executes a FOTA update if possible
                # or confirms an already executed FOTA update
                awsfota.handle_fota_jobs(myjobs,force=True)
                lock.acquire()
                print('Connected to mqtt broker!')
                lock.release()
                mqtt_setup_run_once_flag = 1
                
            thing_update_queue = 1 # reset queued to prevent spamming (only update new data once)
                
        else: # Wi-Fi connection OK, wait for publish to Cloud flag
            
            # reset var
            only_run_once_flag = 0
            
            if thing_update_queue > 0:
                lock.acquire()
                print('Update thing shadow')
                lock.release()
                
                thing.mqtt.publish("$aws/things/"+ thing_conf['thingname'] +"/shadow/update", json.dumps({"state": {"desired": None, "reported": {'motor_level': motor_level, 'light_level' : light_level, 'timer_state' : timer_state, 'firmware_version' : firmware_version}}}))

                thing_update_queue += -1
            
            if fota_cntr < (CHECK_FOR_FOTA_SEC/(CHECK_FOR_STATE_CHANGED_MS*0.001)):
                fota_cntr += 1
            else:
                # check for new incoming jobs
                # again, FOTA is executed if a correct FOTA job is queued
                lock.acquire()
                print('Checking for fota job...')
                lock.release()
                awsfota.handle_fota_jobs(myjobs)
                fota_cntr = 0

            sleep(CHECK_FOR_STATE_CHANGED_MS) # check every xx ms for new data to update CLP 

Let me know


#17

@LorenzoR,
Thanks, but same result:

  1. Connected to AWS
  2. Turn off Wi-Fi, Turn on Wi-Fi again
  3. Wait…result:

Thread 7 exited with exception MQTTConnectionError @[00CC:0053:00CC:00CA:00D3:00E9:0000:0000]

I even excluded the FOTA part when testing.


#18

Hello @Marcel,

can you try to add some print calls inside check_link_before_reconnecting to understand what happens?
For example, to see whether the function actually gets called when you turn off the Wi-Fi, how long it waits before trying to reconnect, etc.


#19

Hi @LorenzoR,

When I modify the code:

def check_link_before_reconnecting(mqtt_client):
    while not wifi.is_linked():
        sleep(1000) # wait to be linked again before reconnecting 
        print('Still no Wi-Fi...')
    sleep(1000)
    print('Yes, there is Wi-Fi again!...')

The result:

Still no Wi-Fi…
Still no Wi-Fi…
Can’t connect to network: NETWORK
Still no Wi-Fi…
Still no Wi-Fi…
Yes, there is Wi-Fi again!..
Connected to network: NETWORK
Update thing shadow

It seems the thing doesn’t crash now that I added an additional delay. Probably there is more time to re-establish the Wi-Fi connection?

But I’m still not able to update the thing via AWS. The thing does, however, update the shadow when I press a button. So it still seems that the shadow callback isn’t reinitialized correctly.

If I change the delay to 100 again, I get: “thread 7 exited with exception MQTTConnectionError @[00CC:0053:00CB:01CC:00CC:00CA:00D3:00E9]”
This error is raised just after the message that Wi-Fi is connected again.


#20

Hi @Marcel,

probably is_linked returns true as soon as the device is linked to the AP, without waiting for an IP to be assigned. We will try to fix it ASAP; in the meantime, keep the extra sleep after the while loop.

Concerning the shadow, the device should subscribe again after reconnecting. Try the following:
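
A minimal sketch of that workaround as a reusable helper. The name wait_for_stable_link and its parameters are hypothetical. The link check and the sleep function are passed in, so the logic can be tried in plain Python; the 1000 ms settle time is simply the value that worked in the test above, not a guaranteed figure.

```python
def wait_for_stable_link(is_linked, sleep_ms, poll_ms=100, settle_ms=1000):
    """Block until is_linked() is true and still true after settle_ms.

    Returns the total number of milliseconds spent sleeping.
    """
    waited = 0
    while True:
        # Poll until the link is reported as up.
        while not is_linked():
            sleep_ms(poll_ms)
            waited += poll_ms
        # Extra delay after the loop: the link may be reported as up
        # before an IP address has been assigned.
        sleep_ms(settle_ms)
        waited += settle_ms
        # Only return if the link survived the settle delay.
        if is_linked():
            return waited
```

On the device this could be called from the reconnect callback as wait_for_stable_link(wifi.is_linked, sleep), assuming sleep takes milliseconds as in the code above.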

def reinit_shadow(client):
    thing._shadow_cbk = None
    print('Subscribe again to shadow topic')
    thing.on_shadow_request(shadow_callback)

...
thing.mqtt.connect(breconnect_cb=check_link_before_reconnecting, aconnect_cb=reinit_shadow)