
draft: debugging PTAG and ABS messages #312

Closed
wants to merge 15 commits into from

Conversation

byeonggiljun
Collaborator

This PR aims to debug the PTAG and ABS messages. Currently, on this branch (reactor-c/enclaves3), a PTAG is not sent when a federate is not in a zero-delay cycle (ZDC). However, the semantics of PTAG should be correct even for federates that are not in a ZDC, and there is currently a problem with PTAG handling. I made this PR to preserve the behavior that does not skip PTAGs so that it can be debugged.

@byeonggiljun byeonggiljun added bug Something isn't working federated labels Nov 24, 2023
@byeonggiljun byeonggiljun marked this pull request as draft November 24, 2023 08:33
@byeonggiljun byeonggiljun changed the title draft: debugging PRAG and ABS messages draft: debugging PTAG and ABS messages Nov 24, 2023
@byeonggiljun
Collaborator Author

byeonggiljun commented Dec 1, 2023

@edwardalee @hokeun @lhstrh I tried to resolve the problem in AfterDelays.lf and encountered additional difficulties. Here is a summary of my efforts; I will explain it in the next meeting.

The reason a federate sends a wrong LTC (described in this issue) is that federates only consider zero-delay actions when advancing the MLAA (Max Level Allowed to Advance).

So I changed federates to look up every network action in this commit (reactor-c/pull/312/commits/9307777). However, this commit exposed other, previously masked problems.

  • Problem on AfterDelays.lf

    There is a non-deterministic error in AfterDelays.lf. Let's compare the two cases in the images below. We only need to look at NET(100 msec, 0) from sw2 (second from the right). In the successful trace, NET(100 msec, 0) is sent; in the failed trace, it is not. This is caused by the timing of the PTAG.

    if (lf_tag_compare(_fed.last_TAG, tag) >= 0) {
        LF_PRINT_DEBUG("Granted tag " PRINTF_TAG " because TAG or PTAG has been received.",
                       _fed.last_TAG.time - start_time, _fed.last_TAG.microstep);
        return _fed.last_TAG;
    }

    In the failed trace, a federate received PTAG(82 msec) before it completed tag 81 msec. It then tried to send NET(82 msec) but did not, because last_TAG = 82 msec >= tag = 82 msec. However, NET(82 msec) must be sent, because the federate is stalled by the MLAA and is still waiting for TAG(82 msec).

    Successfully executed trace
    image

    Deadlock occurred trace
    image

  • Solution of the problem on AfterDelays.lf
    Thus, I changed the code above as follows.

    if (lf_tag_compare(_fed.last_TAG, tag) > 0
            || (!_fed.is_last_TAG_provisional && lf_tag_compare(_fed.last_TAG, tag) == 0)
            || (_fed.is_last_TAG_provisional && lf_tag_compare(env->current_tag, _fed.last_TAG) < 0)) {
        LF_PRINT_DEBUG("Granted tag " PRINTF_TAG " because TAG or PTAG has been received.",
                       _fed.last_TAG.time - start_time, _fed.last_TAG.microstep);
        return _fed.last_TAG;
    }

    This code will not send a NET when:

    1. TAG >= intended NET (a full TAG at or beyond the intended NET has been received),
    2. PTAG > intended NET (a PTAG strictly beyond the intended NET has been received), or
    3. PTAG > current tag (a PTAG beyond the federate's current tag has been received).

    In the trace below, you can see that a federate sends NET(62 msec) even though it had received PTAG(62 msec) earlier. That NET(62 msec) allowed the RTI to send TAGs, so the deadlock did not happen. I suspect this is not an optimal solution, although it works, because duplicate NETs are being sent. I'm trying to devise a more elegant solution.

    The trace after fixing the problem
    image

  • Problem on ChainWithDelay.lf
    A problem occurred after I made this commit (reactor-c/pull/312/commits/9307777) to make federates consider every network action.
    image
    Let's assume we're at tag 33 msec. The RTI sent PTAG(33 msec) to everyone, so p:PhysicalPlant sends a T_MSG with tag 33 msec. However, c:Controller cannot execute reaction_2, because reaction_1 comes first. We know there is no message with tag 33 msec for reaction_1, since pl:Planner's NET is 33 msec and there is a delay on the connection; but c:Controller has no way to know this, so it waits for an ABS, T_MSG, or TAG at 33 msec.
    Note that when we only consider zero-delay actions when advancing the MLAA, this isn't a problem, because we don't wait for reaction_1.

  • Solution of the problem on ChainWithDelay.lf

    I'm still trying to shape a solution.

    The bottom line is that c:Controller cannot know the status of reaction_1. The RTI has that information but has no way to communicate it. pl:Planner also has that information, and sending ABS(33 msec) would be the simplest solution. However, pl:Planner cannot recognize that c:Controller has an event at 33 msec and is waiting for its input at 33 msec. Maybe NDT could help to solve this problem?

Base automatically changed from enclaves3 to main December 2, 2023 00:03