Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RIOT federated CoAP example and fix CoapUdpIp channel implementation #142

Merged
merged 17 commits into from
Jan 10, 2025

Conversation

LasseRosenow
Copy link
Collaborator

For some reason the receive example fails to sleep when PlatformRiot_wait_for is called.
I haven't found out why this is the case. But it must be caused somewhere in the runtime. Maybe the thread gets woken up.
With the main.c code from the sender it sleepy correctly.

I also fixed some bugs in the CoAP channel implementation.

Copy link
Contributor

github-actions bot commented Nov 29, 2024

Memory usage after merging this PR will be:

Memory Report

action_empty_test_c

from to increase (%)
text 59715 59715 0.00
data 744 744 0.00
bss 10112 10112 0.00
total 70571 70571 0.00

action_microstep_test_c

from to increase (%)
text 60586 60586 0.00
data 752 752 0.00
bss 10112 10112 0.00
total 71450 71450 0.00

action_overwrite_test_c

from to increase (%)
text 60423 60423 0.00
data 744 744 0.00
bss 10112 10112 0.00
total 71279 71279 0.00

action_test_c

from to increase (%)
text 60327 60327 0.00
data 752 752 0.00
bss 10112 10112 0.00
total 71191 71191 0.00

deadline_test_c

from to increase (%)
text 55913 55913 0.00
data 760 760 0.00
bss 10784 10784 0.00
total 67457 67457 0.00

delayed_conn_test_c

from to increase (%)
text 61342 61342 0.00
data 744 744 0.00
bss 10112 10112 0.00
total 72198 72198 0.00

event_payload_pool_test_c

from to increase (%)
text 18330 18330 0.00
data 624 624 0.00
bss 320 320 0.00
total 19274 19274 0.00

event_queue_test_c

from to increase (%)
text 27597 27597 0.00
data 736 736 0.00
bss 480 480 0.00
total 28813 28813 0.00

nanopb_test_c

from to increase (%)
text 42888 42888 0.00
data 904 904 0.00
bss 320 320 0.00
total 44112 44112 0.00

port_test_c

from to increase (%)
text 61290 61290 0.00
data 744 744 0.00
bss 10112 10112 0.00
total 72146 72146 0.00

reaction_queue_test_c

from to increase (%)
text 27319 27319 0.00
data 736 736 0.00
bss 480 480 0.00
total 28535 28535 0.00

request_shutdown_test_c

from to increase (%)
text 60558 60558 0.00
data 744 744 0.00
bss 10112 10112 0.00
total 71414 71414 0.00

startup_test_c

from to increase (%)
text 55612 55612 0.00
data 752 752 0.00
bss 10784 10784 0.00
total 67148 67148 0.00

tcp_channel_test_c

from to increase (%)
text 93846 93837 -0.01
data 1224 1224 0.00
bss 21312 21280 -0.15
total 116382 116341 -0.04

timer_test_c

from to increase (%)
text 55503 55503 0.00
data 744 744 0.00
bss 10784 10784 0.00
total 67031 67031 0.00

Copy link
Contributor

github-actions bot commented Nov 29, 2024

Benchmark results after merging this PR:

Benchmark results

Performance:

PingPongUc:
Best Time: 135.874 msec
Worst Time: 144.583 msec
Median Time: 138.877 msec

PingPongC:
Best Time: 170.662 msec
Worst Time: 178.068 msec
Median Time: 171.485 msec

ReactionLatencyUc:
Best latency: 22369 nsec
Median latency: 59775 nsec
Worst latency: 856280 nsec

ReactionLatencyC:
Best latency: 33003 nsec
Median latency: 59768 nsec
Worst latency: 175868 nsec

Memory usage:

PingPongUc:
text data bss dec hex filename
39564 752 8496 48812 beac bin/PingPongUc

PingPongC:
text data bss dec hex filename
46044 872 360 47276 b8ac bin/PingPongC

ReactionLatencyUc:
text data bss dec hex filename
29689 736 2112 32537 7f19 bin/ReactionLatencyUc

ReactionLatencyC:
text data bss dec hex filename
41666 840 360 42866 a772 bin/ReactionLatencyC

@erlingrj
Copy link
Collaborator

Where is wait_for called? Is it in the code connecting the federates doing the wait_for between each try_connect? I don't think wait_for has really been tested on RIOT yet, but the implementation is straight-forward so its hard to imagine a bug there? You could try inserting a wait_for in a normal program and see that it works?

@LasseRosenow
Copy link
Collaborator Author

Where is wait_for called? Is it in the code connecting the federates doing the wait_for between each try_connect? I don't think wait_for has really been tested on RIOT yet, but the implementation is straight-forward so its hard to imagine a bug there? You could try inserting a wait_for in a normal program and see that it works?

"Wait for" itself works at least for the sender strangely. I played around with it for quite a bit but haven't found out why it works there but not for the receiver .. I will see further debug it tomorrow

@LasseRosenow LasseRosenow force-pushed the add-riot-coap-federated-things branch 2 times, most recently from 135f8b0 to ad40dcc Compare December 9, 2024 22:58
@LasseRosenow LasseRosenow changed the title DRAFT: Add RIOT federated CoAP example and fix CoAP channel DRAFT: Add RIOT federated CoAP example Dec 9, 2024
@LasseRosenow LasseRosenow force-pushed the add-riot-coap-federated-things branch 2 times, most recently from 3b284a3 to 708bf90 Compare December 13, 2024 14:06
@LasseRosenow LasseRosenow force-pushed the add-riot-coap-federated-things branch from 708bf90 to b17d4d8 Compare December 16, 2024 13:02
@LasseRosenow
Copy link
Collaborator Author

@erlingrj So I found out why the wait_for doesn't work here.
The wait_for is used within a critical section inside lf_start and since interrupts are disabled in critical sections the wait_for also does not work.

Is it really needed to have all these things inside of a critical section?
Or should we maybe turn the critical section of when doint wait_for and turn it on after?

I also have some other issues caused by this: I now create a thread to manage the connections of all the CoAP channels.
But that thread is created in the constructor of the Coap channel. But that is also within the critical section, so the thread is not created until the critical section is closed.

But the problem is the critical section cannot be closed because it first waits until all channels are connected.

But that is not possible because the coap thread for handling the connection has not started yet. So we are going in circles :D

@erlingrj
Copy link
Collaborator

@erlingrj So I found out why the wait_for doesn't work here. The wait_for is used within a critical section inside lf_start and since interrupts are disabled in critical sections the wait_for also does not work.

That's a great find! Important detail that PlatformRiot_wait_for does not work when called from a critical section. This might cause problems in the future also. Either we have to decide that wait_for cannot be called from a critical_section and possibly test this and throw an error. Or we have to use a different underlying RIOT API for wait_for. I think we should go with the latter, make sure that wait_for also can be called from a critical section on RIOT.

Is it really needed to have all these things inside of a critical section? Or should we maybe turn the critical section of when doint wait_for and turn it on after?

I think we can safely reduce the critical_section here. I am trying to remember why I added it. It must've been to protect against the NetworkChannel thread doing anything inside the runtime (which would be through the callback function passed to it). This callback FederatedConnectionBundle_msg_received_cb in federated.c.

I think we can safely remove the critical section but we have to be careful with the order in which things are done:

  1. Construct federate and reactors, this will spawn any network threads which will be busy-spinning waiting for open_connection().
  2. Setup net_bundles pointer on Environment to point to _bundles on federated reactor
  3. Call Environment_assemble(), this will release the network threads.
  4. Call Environment_start() to wait for StartTagSignal and start the scheduler.

It is important that (2) comes before (3).

I also have some other issues caused by this: I now create a thread to manage the connections of all the CoAP channels. But that thread is created in the constructor of the Coap channel. But that is also within the critical section, so the thread is not created until the critical section is closed.

But the problem is the critical section cannot be closed because it first waits until all channels are connected.

But that is not possible because the coap thread for handling the connection has not started yet. So we are going in circles :D

That is a good point! I think we can remove the critical section if we are careful in constructing the lf_start() function

@LasseRosenow
Copy link
Collaborator Author

image

Hello from sender through CoAP! yayy :)

@LasseRosenow LasseRosenow marked this pull request as ready for review December 19, 2024 15:34
Copy link
Collaborator

@erlingrj erlingrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, if you could just provide some info for how to run the example we can merge this! 👍

src/platform/riot/riot.c Show resolved Hide resolved

// Inform runtime about new state if it changed from or to NETWORK_CHANNEL_STATE_CONNECTED
if ((old_state == NETWORK_CHANNEL_STATE_CONNECTED && new_state != NETWORK_CHANNEL_STATE_CONNECTED) ||
(old_state != NETWORK_CHANNEL_STATE_CONNECTED && new_state == NETWORK_CHANNEL_STATE_CONNECTED)) {
_env->platform->new_async_event(_env->platform);
}

// Let connection thread evaluate new state of this channel
msg_t msg = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this writes a "msg" to the network thread to notify that the state has been updated. This implies that both runtime and network thread might update the state, I think we discussed earlier to just have the network thread update the state, but this was perhaps only possible for the TcpIp channel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the issue here is that the TcpIpChannel is polling. We have a main loop and wait for incoming messages and then depending on what comes in we update the state.

But the RIOT CoAPChannel is async. Messages come in from the internal gcoap thread.
So we have a lot of callbacks such as _client_open_connection_callback etc.
And there is also one place where the runtime changes state which is in open_connection. This also exists in the current implementation of the TcpIpChannel from what I can see. So if we want to really set it only from one thread maybe we need to fix this also.

A solution here could be to maybe have an extra state variable that is only set by the runtime. (connection_open: bool) or something like that. This one would only be read by the coap / tcpip thread and then they would set their internal state accordingly?

I opened a new Issue (#183). I think this PR is already quite large and since this anyways not only affects the Coap channel alone, maybe we can fix this in a follow up PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Yes open_connection is there, in the Federated branch I use a mutex now to protect the shared state. I think that is necessary

@LasseRosenow
Copy link
Collaborator Author

Okay I have added a readme to the examples folder that explains how to build the various examples including the federated coap example.

@LasseRosenow
Copy link
Collaborator Author

I am currently also in parallel working on a test that we can run in the CI that uses the tap interface and sends some messages between two devices, but I think we should merge this first. I still have some issues with the test that I need to figure out

@LasseRosenow LasseRosenow requested a review from erlingrj January 7, 2025 19:06
@erlingrj erlingrj changed the title DRAFT: Add RIOT federated CoAP example Add RIOT federated CoAP example Jan 7, 2025
Copy link
Collaborator

@erlingrj erlingrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing the README, I was still not able to follow it, but it is probably just some minor mistake. I think it might be in the Makefiles?

Get the IP address of the `receiver` by specifying the `PORT=tap1` and `ONLY_PRINT_IP=1` environment variables:

```shell
make ONLY_PRINT_IP=1 BOARD=native PORT=tap1 all term
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This did not work for me:

$ make ONLY_PRINT_IP=1 BOARD=native PORT=tap1 term
Makefile:36: *** REMOTE_ADDRESS is not defined. Please define it!.  Stop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ONLY_GET_IP

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still confused by the output:

make ONLY_GET_IP=1 BOARD=native PORT=tap1 term
/home/erling/tools/RIOT/dist/tools/pyterm/pyterm -ps /home/erling/dev/reactor-uc/examples/riot/coap_federated/sender/bin/native/lf-coap-federated-sender.elf --process-args tap1 
Twisted not available, please install it if you want to use pyterm's JSON capabilities
Welcome to pyterm!
Type '/exit' to exit.
2025-01-08 17:00:25,982 # RIOT native interrupts/signals initialized.
2025-01-08 17:00:25,983 # RIOT native board initialized.
2025-01-08 17:00:25,983 # RIOT native hardware initialization complete.
2025-01-08 17:00:25,983 # 
2025-01-08 17:00:25,983 # main(): This is RIOT! (Version: 2024.10)
2025-01-08 17:00:25,983 # [INFO] [NET] IPv6 address: fe80::b020:26ff:fef2:2497
2025-01-08 17:00:25,983 # [INFO] [NET] IPv6 address: f4ff:708:f4ff:708:60bd:608:f3:41dc
2025-01-08 17:00:25,983 # [DEBUG] [NET] CoapUdpIpChannel: Start connection thread
2025-01-08 17:00:25,983 # [INFO] [NET] CoapUdpIpChannel: Register receive callback
2025-01-08 17:00:25,983 # [DEBUG] [ENV] Calculating levels for Reactor MainSender
2025-01-08 17:00:25,983 # [DEBUG] [ENV] Calculating levels for Reactor Sender
2025-01-08 17:00:25,983 # [DEBUG] [ENV] Reaction 0 has level 0
2025-01-08 17:00:25,983 # [DEBUG] [NET] CoapUdpIpChannel: Open connection
2025-01-08 17:00:25,983 # [DEBUG] [NET] CoapUdpIpChannel: Update state: UNINITIALIZED => OPEN
2025-01-08 17:00:25,984 # 
2025-01-08 17:00:25,984 # [DEBUG] [NET] CoapUdpIpChannel: Sending 14 bytes
2025-01-08 17:00:25,984 # [DEBUG] [NET] CoapUdpIpChannel: CoAP Message sent
2025-01-08 17:00:25,984 # [DEBUG] [NET] CoapUdpIpChannel: Update state: OPEN => CONNECTION_IN_PROGRESS
2025-01-08 17:00:25,984 # 
2025-01-08 17:00:27,386 # [DEBUG] [NET] CoapUdpIpChannel: Client open connection callback
2025-01-08 17:00:27,387 # [ERROR] [NET] CoapUdpIpChannel: TIMEOUT => Try to connect again
2025-01-08 17:00:27,387 # [DEBUG] [NET] CoapUdpIpChannel: Update state: CONNECTION_IN_PROGRESS => CONNECTION_FAILED
2025-01-08 17:00:27,387 # 
2025-01-08 17:00:27,387 # [DEBUG] [NET] CoapUdpIpChannel: Sending 14 bytes
2025-01-08 17:00:27,387 # [DEBUG] [NET] CoapUdpIpChannel: CoAP Message sent
2025-01-08 17:00:27,387 # [DEBUG] [NET] CoapUdpIpChannel: Update state: CONNECTION_FAILED => CONNECTION_IN_PROGRESS
2025-01-08 17:00:27,387 # 
2025-01-08 17:00:29,773 # [DEBUG] [NET] CoapUdpIpChannel: Client open connection callback
2025-01-08 17:00:29,773 # [ERROR] [NET] CoapUdpIpChannel: TIMEOUT => Try to connect again
2025-01-08 17:00:29,773 # [DEBUG] [NET] CoapUdpIpChannel: Update state: CONNECTION_IN_PROGRESS => CONNECTION_FAILED

...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ONLY_GET_IP

Oups I have renamed it to ONLY_PRINT_IP everywhere now.

Still confused by the output:

This is very strange. On my computer it only prints the IP Address if ONLY_PRINT_IP is enabled.
I have rewritten the makefile code again, but it should not change the logic, but is more readable now. Not sure how your system is able to print the IP and the lf_start() code. it should only print the one or the other:

int main() {
#ifdef ONLY_PRINT_IP
  print_ip_addresses();
#else
  lf_start();
#endif
  return 0;
}

As can be seen in the main.c file of the sender and receiver examples. Is your main.c maybe outdated for whatever reason?

examples/riot/coap_federated/receiver/Makefile Outdated Show resolved Hide resolved
examples/riot/coap_federated/sender/Makefile Show resolved Hide resolved
@LasseRosenow
Copy link
Collaborator Author

Okay should be ready for review again :) 🤓

@LasseRosenow LasseRosenow requested a review from erlingrj January 9, 2025 15:46
@erlingrj
Copy link
Collaborator

OK I have it running now! It works please check my updates in the README.

Hopefully we can find a less manual and error prone way of spinning them up. Easy to mix up how to set PORT=tap0 or tap1 which changes depending on sender, receiver and whether you launch it or you want to find the IP address of the other.

Copy link
Contributor

Coverage after merging add-riot-coap-federated-things into main will be

71.19%

Coverage Report
FileStmtsBranchesFuncsLinesUncovered Lines
src
   action.c77.69%65.63%100%81.18%134–135, 24, 42–45, 48, 50–51, 54–56, 62–63, 70–72, 72, 72–75, 81–82, 93–94
   builtin_triggers.c90.91%70%100%96.77%14, 18, 40, 43
   connection.c78.52%51.16%100%88.66%10, 104, 11, 110, 123–124, 136–137, 14, 14, 143, 145, 16–17, 21–22, 22, 22–23, 25, 27–28, 33, 48, 48, 48–49, 55, 60–62, 97
   environment.c78.35%55.56%84.62%83.33%12–13, 18, 20–21, 31, 35–36, 42–43, 51–52, 55–56, 60–61, 93–95
   event.c95.35%92.86%100%96.15%14–15
   federated.c5.61%2.97%7.69%6.76%101, 104–105, 105, 105–106, 108–109, 11, 111, 115–116, 118–120, 123, 125–129, 13, 13, 13, 130, 132–134, 137–139, 139, 139, 14, 140, 140, 140–142, 144, 147–148, 15, 150–154, 156–159, 16, 160–161, 163, 163, 163–166, 168, 168, 168–169, 17, 17, 17, 170, 170, 170–171, 175–176, 176, 176, 179–180, 184–186, 188, 188, 188, 190–194, 197, 197, 197–199, 20, 200, 203–204, 204, 204–205, 207–208, 21, 211–212, 217–218, 218, 218–219, 22, 22, 22, 221, 223, 223, 223–226, 226, 226, 226, 226–229, 23, 230–238, 24, 24, 24, 242, 245, 245, 245–247, 25, 251, 254–255, 255, 255, 255–259, 26, 260–263, 265, 27, 27, 27, 271–273, 28, 28, 28, 28, 28, 285–286, 289, 29, 290–292, 294, 294, 294–295, 299–300, 300, 300, 302, 304–305, 305, 305–306, 306, 306–307, 307, 307–308, 308, 308–309, 309, 309, 31, 310, 310, 310, 312, 312, 312–313, 313, 313–314, 314, 314–315, 315, 315, 317, 34, 34, 34, 34, 34–35, 38, 42–43, 45–48, 50, 50, 50–51, 51, 51, 53, 53, 53–55, 55, 55–57, 61–62, 66–67, 69–72, 74, 76, 76, 76–77, 77, 77–78, 78, 78–79, 79, 79, 82–83, 85–88, 9, 90, 90, 90–93, 95, 97, 97, 97–98
   logging.c87.50%80%100%88.64%24, 37–39, 46, 59–60
   network_channel.c57.69%50%100%58.82%40, 40, 40, 40, 45–48, 53–54, 57
   port.c78.08%45.83%100%93.33%10, 10, 10, 16, 20, 25, 25–27, 27, 27–28, 39, 39, 39–40
   queues.c89.94%80.36%100%94.06%108, 113, 119, 21–23, 47–48, 60–61, 84–88, 91–92
   reaction.c70.34%54.55%100%78.26%15, 17, 21–22, 28–31, 31, 31–32, 42, 45, 47, 52–53, 53, 53–55, 55, 55–56, 73, 89–91, 91, 91–94, 94, 94–95
   reactor.c69.33%51.52%100%82.28%10, 101–102, 14–19, 22, 28, 30, 32–37, 37, 37–38, 38, 38, 43, 55, 58–59, 59, 59–60, 60, 60–61, 63, 77–78, 81–82, 82, 82–83, 83, 83–84, 86, 91
   serialization.c50%50%50%50%16–17, 26–27, 33–35, 38–40
   tag.c40.19%31.48%60%47.92%14, 14–15, 17, 17–18, 23–24, 24, 24, 24, 24–25, 27, 27, 27, 27, 27–28, 30, 30, 30–31, 33–34, 34, 34–35, 37, 37, 37, 37, 37–38, 40, 40, 40, 40, 40–41, 43, 53–54, 63, 63–64, 83–85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85–87, 89
   timer.c95%66.67%100%100%14, 25
   trigger.c100%100%100%100%
   util.c0%0%0%0%10, 3–4, 4, 4–5, 5, 5–6, 8–9
src/platform/posix
   posix.c52.73%30%66.67%56%100, 100, 100–102, 106, 16, 18, 20–21, 34–36, 38–40, 48–49, 54–59, 59, 59–62, 62, 62–64, 67, 73–74, 78, 81, 92–94, 94, 94–96, 98–99
   tcp_ip_channel.c66.41%51.61%100%75.99%100, 102, 105–106, 116–117, 117, 117–118, 123–124, 124, 124–126, 130–131, 131, 131–133, 147, 153, 153, 153, 153, 153–154, 154, 154–155, 157, 157, 157–158, 167, 174–175, 179, 183, 183, 183–185, 185, 185–186, 188, 188, 188–190, 194, 194, 194–195, 215, 228–229, 229, 229–230, 236, 241–242, 242, 242–243, 243, 243–244, 246–247, 247, 247–248, 248, 248, 250–253, 263, 263–264, 264, 264–265, 289–290, 290, 290–292, 292, 292–293, 293, 293–294, 303, 303, 308, 310, 312, 314, 326–327,

@LasseRosenow
Copy link
Collaborator Author

Hopefully we can find a less manual and error prone way of spinning them up. Easy to mix up how to set PORT=tap0 or tap1 which changes depending on sender, receiver and whether you launch it or you want to find the IP address of the other.

I 100% agree! :)

@LasseRosenow
Copy link
Collaborator Author

Your changes look good, thank you!

@LasseRosenow LasseRosenow changed the title Add RIOT federated CoAP example Add RIOT federated CoAP example and fix CoapUdpIp channel implementation Jan 10, 2025
@LasseRosenow LasseRosenow merged commit b09c2ed into main Jan 10, 2025
8 checks passed
@LasseRosenow LasseRosenow deleted the add-riot-coap-federated-things branch January 10, 2025 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants