[error] An issue on ZDM FOTA

Hi @karimhamdy1

I’m trying to implement the FOTA feature in my Zerynth application. I added FOTA using ZDM to my firmware and it works. However, the board slows down over time and reboots about once every two hours.
The board runs without rebooting when FOTA is disabled.
To narrow this down, I disabled every function except FOTA. Even with the firmware reduced to only the FOTA function, the reboot still happens about once every four hours.
The ESP32 DevKitC has a VM installed with FOTA and secure firmware enabled. I’ve set a long watchdog timeout (about 15 hours).
Operating speed is an important factor in my application, and the board must run stably for long periods without rebooting. How can I solve the rebooting and slowdown issues?
I already posted this problem a month ago, but I haven’t received any reply from you.
I really hope you can suggest a good solution.

Thanks in advance.
Anatoli

Hi @karimhamdy1

How are you? I’m waiting for your reply.
Did you check my post about the ZDM FOTA issue?
It’s been 3 days since I posted it. Isn’t this a Zerynth problem? Or is it my mistake?
I’d really appreciate a response either way.

Anatoli

hi
Are you kicking the watchdog timers periodically?
Can you also check the reason for the reset with the watchdog_triggered() method, as in the example?
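For reference, here is a rough sketch of that kind of check. It runs on a PC against a stand-in object, so `FakeSfw` and `boot_report` are made-up names for illustration only. On the board you would pass the real `sfw` module instead, assuming it exposes `watchdog()` and `watchdog_triggered()` the way they are used in this thread.

```python
class FakeSfw:
    """Host-side stand-in for the board's sfw module (hypothetical, illustration only)."""

    def __init__(self, reset_by_wdt=False):
        self._reset_by_wdt = reset_by_wdt
        self.timeout_ms = None

    def watchdog(self, normal_ms, max_ms):
        # Mirrors the call shape sfw.watchdog(0, 36000000) used in this thread.
        self.timeout_ms = max_ms

    def watchdog_triggered(self):
        # True means the previous run ended with a watchdog reset.
        return self._reset_by_wdt


def boot_report(sfw_mod, timeout_ms):
    """Arm the watchdog and describe how the previous run ended."""
    sfw_mod.watchdog(0, timeout_ms)
    if sfw_mod.watchdog_triggered():
        return "previous reset: watchdog"
    return "previous reset: not watchdog"


print(boot_report(FakeSfw(reset_by_wdt=True), 36000000))
```

Printing or logging this string once at every start makes it easy to see from the serial output whether each reboot is attributed to the watchdog.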

Thanks for your reply!

I didn’t kick the watchdog timer periodically because I set a long watchdog timeout (15 hours). The reboots occur well within that timeout, about once every 4 hours.
In other words, the reboots were not caused by the WDT.
Is it necessary to kick the watchdog timer periodically?

I haven’t checked the reset reason yet. I’ll test again and let you know the result.
Thanks.

Anatoli

I just finished the test.
I set the WDT timeout to 10 hours.

sfw.watchdog(0, 36000000)
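A quick check of the units, assuming the second argument is a timeout in milliseconds (which is how it is treated in this thread). `hours_to_ms` is just a helper name for this sketch, not part of any library.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 ms in one hour


def hours_to_ms(hours):
    """Convert a timeout in hours to milliseconds."""
    return int(hours * MS_PER_HOUR)


print(hours_to_ms(10))  # 36000000
```

So the value passed above does correspond to 10 hours under that assumption.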

The result of watchdog_triggered() was False when the board first started.
The first reboot happened after about 2 hours. After that, the result of watchdog_triggered() was True.


Does that mean the reboot was caused by the WDT? As you can see, the WDT timeout is set to 10 hours.
I attached my code.
main.zip (2.0 KB)

Hi @karimhamdy1
I added the kick() function to my code.
It kicks the watchdog timer once every five seconds, but the ESP32 still reboots before the WDT timeout expires. What still puzzles me is that the reboot seems to be caused by the WDT.
watchdog_triggered() returns True after the reboot occurs.
How can this be explained?

Hi @karimhamdy1
I’m really looking forward to your solution for the ZDM FOTA issue!
I didn’t find any problem in my ZDM test code. It’s very simple and only includes the ZDM and RTC features.
I run the kick function periodically, but the ESP32 still restarts before the sfw timeout. In addition, the more functions I add to the code, the shorter the interval between reboots. watchdog_triggered() still returns True.
Is it an issue in the ZDM FOTA library?
I have been struggling with this problem for two weeks. :exploding_head:
I’d like to hear from you on this matter as soon as possible.
I have to decide whether I should add this feature.

Very strange behavior. I could not replicate your problem with my ESP32. Please let us look into this and get back to you.

Thanks for your reply.
Yes, it’s very weird. But that’s a real problem.
I attached my test code (main.zip). You can use it to reproduce the problem. You will probably see a reboot once every 3~4 hours.
Looking forward to your test result.
Thanks in advance.

Anatoli
main.zip (2.0 KB)

Hi @karimhamdy1

How are you?
I’m still waiting for your reply. How is it going? Did you test the code I sent before?
One more thing to note is that the ESP32 becomes very unstable when MQTT and ZDM are used together in the code. I’m using MQTT to receive commands from the server and the HTTP API to send data to the server.
However, when those functions (HTTP requests, MQTT and ZDM FOTA) are all enabled in the firmware, the ESP32 reboots frequently. If ZDM is disabled, the ESP32 works well. There certainly seems to be a problem with ZDM FOTA.
I hope you can get back to me as soon as possible.
Thanks in advance.

Anatoli

Hi @Anatoli_Juny
I tested the ZDM simple example with the watchdog running and could not replicate your error, i.e. the board didn’t reset.
We will now try to test your code with the hardware components you are using, but that will take some time.
In the meantime, I think the problem is not in ZDM or the watchdog, but rather in the RTC or the sockets used to get the timestamp.
I suggest you test each component separately (RTC, NTP socket, FOTA). If they all work on their own, try the ZDM simple example with watchdogs plus one of them, then add features one by one.
If you are using custom hardware, please also check the power supply and make sure the microcontroller is getting enough power, especially when the Wi-Fi is on.
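One way to lay out the test order suggested above is sketched here. It only builds a list of configurations to flash and run. `isolation_plan` and the component labels are placeholders for this sketch, not anything from the Zerynth libraries.

```python
def isolation_plan(components, base="ZDM simple + watchdog"):
    """Return the test stages: each component alone, then the base example
    with the components added one at a time."""
    # Stage group 1: every component tested on its own.
    stages = [[name] for name in components]
    # Stage group 2: start from the base example and add one feature per stage.
    enabled = [base]
    for name in components:
        enabled = enabled + [name]
        stages.append(list(enabled))
    return stages


for number, stage in enumerate(isolation_plan(["RTC", "NTP socket", "FOTA"]), start=1):
    print(number, " + ".join(stage))
```

The first stage in which a reboot shows up points at the component (or combination) to look at more closely.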

Thanks for your reply.
I’ll test the ZDM simple example with watchdogs. However, my code has to include many functions, including the RTC function.
As for the RTC function, the NTP socket is used once to get the current time and set the RTC chip from it. After that, the NTP socket is not used again unless the board restarts.
Anyway, please let me know when you have tested my code.

Thanks in advance.
Anatoli

Hi @karimhamdy1
I tested the ZDM simple example with watchdogs on an ESP32 DevKitC. I set the WDT timeout to 10 hours. The test results are as follows.
The board reboots about once every hour and 40 minutes when I don’t call kick().
The board reboots about once every 4 hours when I add the kick() function to the code (kick runs every five seconds).
As far as I am aware, a VM with secure firmware enabled does not reboot unless the WDT counter reaches the configured timeout. In other words, if I set the WDT timeout to 10 hours, I shouldn’t need to call kick for 10 hours to prevent reboots.
However, as the test above shows, when I run the kick function periodically, regardless of the timeout, the reboot interval becomes longer. Very strange.
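To make the expectation described above concrete, here is a tiny host-side model of a countdown watchdog. It is only an idealized sketch of how I understand a watchdog to behave, not the Zerynth or ESP32 implementation, and `ModelWatchdog` is a made-up name.

```python
class ModelWatchdog:
    """Idealized countdown watchdog: it fires only if no kick arrives within timeout_ms."""

    def __init__(self, timeout_ms, now_ms=0):
        self.timeout_ms = timeout_ms
        self.last_kick_ms = now_ms

    def kick(self, now_ms):
        # A kick restarts the countdown from the current time.
        self.last_kick_ms = now_ms

    def expired(self, now_ms):
        return now_ms - self.last_kick_ms >= self.timeout_ms


TEN_HOURS_MS = 10 * 60 * 60 * 1000

# Case 1: never kicked. The model is still quiet at 1 h 40 min and fires only at 10 h.
wd = ModelWatchdog(TEN_HOURS_MS)
print(wd.expired(100 * 60 * 1000))  # False
print(wd.expired(TEN_HOURS_MS))     # True

# Case 2: kicked every 5 s for 4 h. The model never fires.
wd = ModelWatchdog(TEN_HOURS_MS)
fired = False
for now_ms in range(5000, 4 * 60 * 60 * 1000 + 1, 5000):
    fired = fired or wd.expired(now_ms)
    wd.kick(now_ms)
print(fired)  # False
```

In this model neither of the two reboots I measured should happen, which is why the results look wrong to me.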
I attached my test code (ZDM simple example with WDT).
I’ll look forward to your next reply.
Thanks in advance.
fota_test.zip (1.1 KB)

Hi @Anatoli_Juny,

We are running tests to try to replicate the odd behavior you described.
There are 2 things we can test in parallel:

  • first of all, let’s rule out a watchdog issue: you and I can each run a test with only the watchdog set to 10 hours, without the connection, ZDM, etc.
    If the bad behavior (reset after 1h 40m without kicking, and reset after 4 hours with kicking) persists, the bug is in the WDT or in the very long timeout you set (10h). Otherwise, the issue is in the connection/MQTT and we can move on to the second point
  • the second test is to add this new connection parameter before the zdm device is created and pass zdm_cfg in the arguments:

zdm_cfg = zdm.Config(keepalive=30)
device = zdm.Device(cfg=zdm_cfg,fota_callback=fota_callback)

Please, let me know the test results on your board and in the meantime I’ll try to test mine as well.

Hi @Matteo_Cipriani

Thanks for your reply.
Ok, I’ll test the first and second on my ESP32 custom board and ESP32 DevKit.
And I’ll let you know the results!

Thanks again!
Anatoli

Hi @Matteo_Cipriani

I just finished the first test on my custom board and the ESP32 DevKitC.
Both boards restart after about an hour and 50 minutes. I set the WDT timeout to 10 hours and included only the WDT function.
How is your test going? I am really curious.
I’m going to test it again with a 5-hour timeout. I’ll let you know that result as well.

Hi @Matteo_Cipriani
I just tested the WDT with a 5-hour timeout.
Both the ESP32 DevKit and my ESP32 custom board restarted after 4 hours and 30 minutes.
My test results suggest that there is a problem in the WDT function. Right?
How was your test?
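To put the numbers reported so far in this thread in one place, here is a short script that compares each observed reboot time with the configured timeout. The figures are the approximate ones I posted above, and `fraction_of_timeout` is just a helper name for this sketch.

```python
# (setup, configured timeout in hours, observed time to reboot in minutes)
OBSERVATIONS = [
    ("ZDM simple example, no kick", 10, 100),
    ("ZDM simple example, kick every 5 s", 10, 240),
    ("WDT only", 10, 110),
    ("WDT only", 5, 270),
]


def fraction_of_timeout(timeout_h, observed_min):
    """Share of the configured timeout that had elapsed when the board rebooted."""
    return observed_min / (timeout_h * 60)


for setup, timeout_h, observed_min in OBSERVATIONS:
    pct = 100 * fraction_of_timeout(timeout_h, observed_min)
    print(f"{setup:36s} timeout {timeout_h:2d} h, "
          f"reboot after {observed_min:3d} min ({pct:.0f}% of timeout)")
```

In every case the reboot came before the configured timeout had elapsed.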

Hi @Matteo_Cipriani
How is it going?
I’m waiting for your test results.

Hi @Anatoli_Juny
We are currently running tests on the watchdog and the ZDM module.
We will keep you updated on this topic.
Thank you.

Thanks for your reply.
Looking forward to solving this issue as soon as possible.