Can't restart mesh once stopped (IDFGH-14433) #15213
Comments
@leenowell This log indicates that the mesh network has not stopped. Could you provide the log after calling mesh stop? |
@zhangyanjiaoesp As requested, this is the log from when I received MESH_EVENT_NO_PARENT_FOUND:
`
I (21788) mesh: 452
I (21898) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:62
I (22068) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:63
I (22418) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:64
I (22588) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:65
I (22938) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:66
I (23098) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:67
I (23448) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:68
I (23618) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:69
I (23968) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:70
` |
This log suggests that the mesh has stopped successfully. It's strange that you encountered a failure when setting the config. I tested it locally, and after stopping, calling init -> config -> start() successfully starts the mesh. Below is my log. Could you provide your test demo for further investigation?
|
In your example did you restart immediately after the esp_mesh_stop() call returns or wait for the mesh to be stopped (e.g. when received MESH_EVENT_STOPPED)? Reason I ask is that in my scenario I did the following
Therefore it would have had sufficient time to fully stop whilst SC was running. I need to get my code from backup to reproduce it, so I quickly changed the internal communication mesh example instead. Stopping the mesh on MESH_EVENT_NO_PARENT_FOUND and then trying to start it immediately afterwards, I intermittently get:
`
I (19517) mesh: 452
Core 0 register dump:
Backtrace: 0x401716d4:0x3ffbf5a0 0x401717d7:0x3ffbf5f0
ELF file SHA256: 54368c912
Rebooting...
`
Waiting to restart until after the mesh stopped event, I get:
`
I (14817) mesh: 452
Backtrace: 0x400D9DC8:0x3FFB20A0 0x400831B1:0x3FFB20D0 0x40171767:0x3FFBF5A0 0x401717D3:0x3FFBF5F0
` |
I have double checked the code and I actually restarted the mesh in the Wi-Fi event handler when I received WIFI_EVENT_STA_CONNECTED. ` // Setup Mesh config
` |
@leenowell So this issue has been solved? |
@zhangyanjiaoesp Unfortunately not, the issue is still there. As a temporary workaround so I can continue my project, I have to do the following
The other side of this is that it seems you can't run SmartConfig whilst the mesh is running. I am not sure if this is another bug or if it is intended and the only way to use SmartConfig and Mesh together is to fully stop the Mesh and restart it. If it is intended, I think there is probably a feature request to better integrate the two. |
@leenowell Are there more logs after this? |
@zhangyanjiaoesp Happy to redo the code to run the test. Just want to double check what you want me to test. Is the following correct?
Is that correct? |
This is just what I saw in your comment at this link (http://esp32.io/viewtopic.php?f=21&p=142998). I don't know what your testing scenario was at that time.
This seems to be ok |
In my test I gave the correct SSID and channel but the wrong password. Oddly, it generated MESH_EVENT_PARENT_DISCONNECTED instead of MESH_EVENT_NO_PARENT_FOUND. This is the full log:
`
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (5963) mesh_node: MeshStart: SSID len [6]
I (6253) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x1, need_scan_router:0x0, look_for_nwk_count:1
I (6583) mesh_node: MeshNodeEvent: <MESH_EVENT_FIND_NETWORK>new channel:11, router BSSID:00:00:00:00:00:00
I (7283) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (7603) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (7933) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (8263) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (8593) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (8923) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (9253) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (9583) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (9913) wifi_node: WIFINodeEvent: Unknown WiFi event [1]
I (9943) mesh: [DONE]connect to router:Lounge, channel:11, rssi:-47, f0:86:20:13:b4:33[layer:0, assoc:0], my_vote_num:1/voter_num:1, rc[24:0a:c4:82:6f:1d/-47/1]
abort() was called at PC 0x400897af on core 1
Backtrace: 0x40081add:0x3ffd7ad0 0x400897b9:0x3ffd7af0 0x4009145d:0x3ffd7b10 0x400897af:0x3ffd7b80 0x400d9b92:0x3ffd7bb0 0x4008a25d:0x3ffd7be0
ELF file SHA256: 9bfc081f3
I (12788) esp_core_dump_flash: Save core dump to flash...
` |
@leenowell The log is:
|
@zhangyanjiaoesp Thanks for getting back to me. I thought the internal network example doesn't connect to a router? If so, are you able to test with one that does, please? Also, I notice you call esp_mesh_disconnect before esp_mesh_set_self_organized. Is this required? I think the docs just say to call esp_mesh_set_self_organized. Finally, where did you call esp_mesh_set_self_organized to put it back into self-organized mode? Oddly, the smartconfig errors are still logged too, but as you say it does seem to get the smartconfig info. Does it then reconnect to the router with it, and does the mesh continue to work? |
We need to analyze specific problems on a case-by-case basis, as the examples in the documentation cannot cover all scenarios.
The code that switches back to self-organized mode is here; it is the same as yours. |
Feels like we are making some progress - thanks. I added esp_mesh_disconnect() as per your example and that seems to have fixed the core dump issue. However similar to what you are seeing (I think?) the whole thing seems to hang once the SmartConfig has received the SSID/ PWD and I never get a SC_EVENT_SEND_ACK_DONE event. For ease I moved the code into the smart config task function as follows `
} This is the log (I removed a load of Wifi Event [43] lines to save space): I (20309) mesh_node: MeshNodeEvent: <MESH_EVENT_NO_PARENT_FOUND>scan times:60 |
I have been doing some more testing and discovered the following
`
} `
` and the log
It hangs here; nothing else is logged. 2. I tried calling esp_smartconfig_stop and that didn't seem to make any difference; it continued to hang
However none connect to the router and end up in a stream of the following I (29798) wifi_node: WIFINodeEvent: Unknown WiFi event [1] backoff:0 |
@leenowell
The log is here; after 60 failed attempts, the mesh connects successfully.
|
Great news that you have got it working. Which example did you use? I have tried your fix on the internal_communication example, because this was most like my scenario, and it doesn't work. Mine doesn't connect after the 60 attempts and triggers smart config again. I have attached the source code; I had to give the files .txt extensions as it wouldn't let me upload them otherwise.
smart_config_h.txt
It feels like we are close to finding the problem if we compare what you did with mine. One difference I noticed is that your log shows a MESH_EVENT_PARENT_DISCONNECTED when the first attempt fails, whereas mine gets a MESH_EVENT_NO_PARENT_FOUND. Then yours gets a MESH_EVENT_NO_PARENT_FOUND after you set the new SSID/password; mine also gets MESH_EVENT_NO_PARENT_FOUND, but this then triggers smart config again! This is my log; I have removed the repeated entries during the 60 retries.
`
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (804) wifi:Total power save buffer number: 16
I (1394) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:2
I (14344) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:59
I (14654) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:60
I (14794) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:61
backoff:0 |
|
Yes, exactly, very odd. Would you be able to try it with the internal_communication example on your side, please, so we can rule out environmental/SDK version issues, or find out whether a difference between the two examples is causing the problem? For the ip_internal_network example, did you simply start the smart config task on PARENT_DISCONNECTED and then add the code that changes the SSID/password in the smart config handler? If you can confirm, I can try your test at my end too to see what happens. |
@leenowell here is my test code |
@zhangyanjiaoesp is this the correct file as I can't find any smart config or the code you mentioned above? |
Sorry, have updated with the correct file |
Thanks. I will run some tests on this one and let you know how I do. Are you able to test the internal_communication one your side please? I wonder if the difference is that your test doesn't attempt to connect to the router prior to doing the smart config? |
@zhangyanjiaoesp I created a clean version of ip_internal_network and copied your mesh_main code into main.c. I did not set any of the config and just used the defaults. Curiously, I still get MESH_EVENT_NO_PARENT_FOUND:
`
I (14808) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:61
I (15158) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x1, look_for_nwk_count:62
`
Logically, though, if it is a single node with neither a router nor another ESP mesh node to connect to, wouldn't we expect MESH_EVENT_NO_PARENT_FOUND to be the event on start-up? I wonder if this is a version difference. For me, idf.py --version returns ESP-IDF v5.5-dev-1050-gb5ac4fbdf9 |
I have done a lot more testing to try and narrow down where the problem is and suspect there may be a few underlying bugs/ issues here. Firstly the good news, I added a flag to stop the smart config being kicked off when the second MESH_EVENT_NO_PARENT_FOUND is received and it does the second 60 attempt scan then connects to the router. Not a real fix but at least we are getting somewhere :). I have tried a number of things out and gone through the various logs and found the following
Have you been able to find anything your side? |
@leenowell
If you use the default setting, the router ssid is However, when you initially described the issue, you only mentioned using an invalid password. So, during my tests, I always set the correct SSID but an incorrect password, which resulted in a
The |
@leenowell
|
Thanks very much for your reply. I have applied your suggestions to the internal communications examples. Firstly the good news is removing the smart config stop means I get the smart config ACK event and that comes after the node has successfully connected to the router which is probably why it never appeared before. Having said that, the ESPTouch app never gets the ack and fails. I added the BSSID code you suggested but for me bssid_set is false `
` Log is ` In the log I see the AP MAC address coming through although this isn't quite the same as the BSSID shown on the ESPTouch app (the last byte is 32 rather than 33 on the app). I am using the ESPTouch app on Android (app version 2.3.3 and ESP Touch version 1.1.1) what are you using? |
I have tried a few more tests to help get to the bottom of the bug. If I hard code the router config BSSID to be the same as the one logged by smart config, then I get the same results as you and the extra scan does not happen. If I set it to a different BSSID, then it does the extra scan. So, I think we are looking at a few different potential framework bugs
What do you think? |
@leenowell
I'm using the ESPTouch app on iOS (app version 2.3.0 and ESP Touch version 1.1.0)
No, the BSSID is important, because there might be multiple APs with the same SSID in the environment. We rely on the BSSID to ensure that the AP we connect to is the one we actually want to connect to.
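To make the point about several APs sharing one SSID concrete, here is a small host-side sketch of the selection logic. It is not the mesh stack's actual implementation: `ap_info_t` and `pick_ap` are hypothetical, and the fallback to the strongest signal when no BSSID is given is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified scan record (not the ESP-IDF wifi_ap_record_t). */
typedef struct {
    char    ssid[33];   /* NUL-terminated SSID */
    uint8_t bssid[6];   /* MAC address of the access point */
    int8_t  rssi;       /* signal strength in dBm */
} ap_info_t;

/*
 * Return the index of the AP to join, or -1 if none matches.
 * With want_bssid == NULL, the strongest AP carrying the SSID wins.
 * With want_bssid set, only an exact BSSID match on that SSID is accepted.
 */
int pick_ap(const ap_info_t *aps, size_t count,
            const char *want_ssid, const uint8_t *want_bssid)
{
    assert(aps != NULL && want_ssid != NULL);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(aps[i].ssid, want_ssid) != 0) {
            continue;                   /* different network name */
        }
        if (want_bssid != NULL) {
            if (memcmp(aps[i].bssid, want_bssid, 6) == 0) {
                return (int)i;          /* exactly the AP that was requested */
            }
            continue;                   /* same SSID, but a different AP */
        }
        if (best < 0 || aps[i].rssi > aps[best].rssi) {
            best = (int)i;              /* keep the strongest candidate so far */
        }
    }
    return best;
}
```

In this model, a BSSID that differs in even one byte from every AP carrying the SSID yields no match at all.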
The BSSID displayed in the ESPTouch app is the one that is sent to the node. I don't know why the BSSID displayed in your ESPTouch app is not the same as the BSSID in the logs.
If the SSID is correct, but the password is incorrect, the event should be |
I tried the smart config example and my ESP Touch app doesn't get the ack back and fails despite sending the details correctly, the node connecting to the router and I received the ACK event. Does yours do the same or is it a bug in the Android version?
I agree it is needed if you only want to attach to a specific router but mesh doesn't require it on either initial start or when you give it new router information. Without the BSSID on start when you give it the correct SSID and password it just connects to any of the ones it finds. The same should happen when we change the router information rather than it doing a random scan on channel 1. The wifi docs seem to confirm this - https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html. Also, I checked the smart config example and it doesn't set the BSSID at all and connects immediately to the router - doesn't even do the short scan the mesh does (i.e. the one that actually connects not the 60 attempt one)
After some further testing I think I know what the issue is. If the phone switches to a different BSSID, then I believe the screen updates to the new BSSID but what gets sent is the old BSSID. It could be the other way around, but I am pretty sure that is the problem. Should I raise this as a separate issue or are you able to raise it internally?
I double checked the documentation (https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/network/esp_smartconfig.html) and it says that bssid_set on the smartconfig_event_got_ssid_pswd_t event structure will be set if the bssid is set rather than it is set and it is the correct one. At this point, we haven't called esp_mesh_set_router yet so in theory it wouldn't be able to check it until after the call. Having said that, even with a correct BSSID being sent from the ESPTouch app (seen in the smart config log) bssid_set isn't set. Therefore if the mesh has already checked the BSSID one assumes it has also confirmed the SSID if not the password at this point. So it seems odd that it doesn't just connect automatically rather than us having to call esp_mesh_set_router and then that will check the details?
Is there a document that describes which event is triggered when? The documentation is a bit ambiguous MESH_EVENT_PARENT_DISCONNECTED - parent is disconnected on station interface |
Sorry I forgot to ask. Have you managed to look into the errors we see in our smart config logs?
I have checked the smart config example and it doesn't have them so must be something to do with the mesh integration? |
I have updated my earlier comment on bssid_set. Sorry, I had forgotten that I had hard coded the BSSID to test whether it stopped the extra 60 scans. I ran some further tests and can confirm that, whilst the BSSID appears in the ESPTouch app, comes through to the smart config logs and is one of the two BSSIDs of the router (the ESPTouch issue of sending a different one is still there), bssid_set is not set and bssid is all zeros |
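Given that observation, one defensive pattern is to pin the router to a BSSID only when the provisioning result really carries one. The sketch below is host-side and uses hypothetical stand-in types (`sc_result_t`, `router_target_t`) that mirror only the `bssid_set` and `bssid` fields discussed here; the real ESP-IDF structures have more fields, and treating an all-zero BSSID as "not set" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the provisioning result discussed in the thread. */
typedef struct {
    bool    bssid_set;
    uint8_t bssid[6];
} sc_result_t;

/* Hypothetical stand-in for the router settings handed to the mesh. */
typedef struct {
    bool    use_bssid;
    uint8_t bssid[6];
} router_target_t;

static bool bssid_is_zero(const uint8_t *bssid)
{
    for (int i = 0; i < 6; i++) {
        if (bssid[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Copies the BSSID only when it is flagged as set and is not all zeros.
 * Returns true if the router target was pinned to a specific AP. */
bool apply_bssid(router_target_t *dst, const sc_result_t *src)
{
    assert(dst != NULL && src != NULL);
    if (src->bssid_set && !bssid_is_zero(src->bssid)) {
        memcpy(dst->bssid, src->bssid, 6);
        dst->use_bssid = true;
        return true;
    }
    memset(dst->bssid, 0, 6);   /* fall back to matching on SSID only */
    dst->use_bssid = false;
    return false;
}
```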
@leenowell
These two errors are not significant and can be ignored. When SmartConfig starts, it first stops scanning and disconnects the Wi-Fi connection. These two logs indicate that the scan has already been stopped and the connection has been disconnected beforehand.
I think you're confused. The
For the
We have never encountered this issue before, and I have tested it with the Android version without any problem. And even if this bssid is not set, it doesn't matter, the subsequent connection of the mesh can still be successful, it just takes some extra time. |
Thanks very much for your reply it helped a lot. I have run some more tests and think we are getting close :)
` I (13882) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us |
Answers checklist.
IDF version.
ESP-IDF v5.5-dev-1050-gb5ac4fbdf9
Espressif SoC revision.
ESP32-D0WDQ6 (revision v0.0)
Operating System used.
Linux
How did you build your project?
Eclipse IDE
If you are using Windows, please specify command line type.
None
Development Kit.
ESP WROOM 32
Power Supply used.
USB
What is the expected behavior?
If I stop the mesh I can restart it later
What is the actual behavior?
If I call esp_mesh_start I get an ESP_ERR_MESH_NOT_INIT error. If I call esp_mesh_init before esp_mesh_start, I then get ESP_ERR_MESH_NOT_CONFIG.
So I think the sequence should be
esp_mesh_init
esp_mesh_set_config
esp_mesh_start
However, now I get ESP_ERR_MESH_NOT_ALLOWED when I call esp_mesh_set_config.
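When chasing which call in such a sequence fails, a tiny step runner can help. The sketch below is host-side and hypothetical: `step_t`, `run_steps`, `stub_ok` and `stub_fail` are illustrative names, the stubs merely stand in for esp_mesh_init, esp_mesh_set_config and esp_mesh_start, and the non-zero value returned by `stub_fail` is arbitrary rather than a real ESP-IDF error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*step_fn)(void);   /* 0 means success, any other value is an error */

typedef struct {
    const char *name;   /* label printed when the step fails */
    step_fn     fn;     /* the call to make */
} step_t;

/* Runs the steps in order and stops at the first failure.
 * Returns the index of the failing step, or -1 if every step succeeded.
 * On failure, the error value is stored in *err_out when err_out is not NULL. */
int run_steps(const step_t *steps, size_t count, int *err_out)
{
    assert(steps != NULL);
    for (size_t i = 0; i < count; i++) {
        int err = steps[i].fn();
        if (err != 0) {
            fprintf(stderr, "%s failed: 0x%x\n", steps[i].name, (unsigned)err);
            if (err_out != NULL) {
                *err_out = err;
            }
            return (int)i;
        }
    }
    return -1;
}

/* Stubs for a host build; on target these would wrap the real mesh calls. */
int stub_ok(void)   { return 0; }
int stub_fail(void) { return 7; }   /* arbitrary non-zero value, not a real error code */
```

On target, each entry would wrap one of the three calls listed above, so the log states directly which step was rejected.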
Steps to reproduce.
...
Debug Logs.
No response
More Information.
This thread I posted on the forum has more detail if needed http://esp32.io/viewtopic.php?f=21&p=142998.
Essentially, I am trying to get smart config and mesh to work together so that smart config is invoked if no parent is found. Since you can't call any Wi-Fi APIs while the mesh is running, I am stopping the mesh, running smart config to get the SSID/password and then trying to start the mesh again with the new details. This is not the best scenario, as the two should work together, but it seems like the only option other than my workaround, which is to restart the device.