Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the JSON-RPC connection #52

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

sylane
Copy link
Contributor

@sylane sylane commented Nov 26, 2024

Changed

  • The name of the grisp_connect configuration key to control the timout of
    individual JSON-RPC requests changed from ws_requests_timeout ot
    ws_request_timeout.
  • Le default log filter changed to trying to filter out only some messages to
    filtering out all progress messages, as it wasn't working reliably.
  • The connection is not a persistent process anymore, it is now a transiant
    process handling a connection and dying when the connection is closed.
  • Internally, the JSON-RPC is parsed into a list of atom or binaries to pave the
    road for namespaces. foo.bar.Buz is parsed into [foo, bar, <<"Buz">>] (if foo
    and bar are already existing atoms, but 'Buz' is not).

Fixed

  • The client is now waiting 1 second before trying to reconnect when it gets
    disconnected fomr the server.

@sylane sylane requested review from maehjam and ziopio November 26, 2024 15:46
@sylane sylane force-pushed the sylane/cleanup-connection branch from 6855b25 to 755c7f5 Compare November 26, 2024 15:46
@maehjam
Copy link
Member

maehjam commented Nov 26, 2024

Can you explain to me the different purposes of the modules grisp_connect_connection, grisp_connect_client and grisp_connect_jsonrpc. First impression is that it's all mixed up. This needs better documentation.

@sylane
Copy link
Contributor Author

sylane commented Nov 26, 2024

@maehjam It is less mixed up than it was, but I get that it may still be under documented. grisp_connect_jsonrpc is only doing JSON-RPC encoding/decoding, grisp_connect_connection encapsulate a JSON-RPC connection, and grisp_connect_client encapsulate the girps_connect client protocol.

Copy link
Member

@maehjam maehjam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The logger changes and architecture changes of the websocket connection shouldn't be in the same PR. Both are changes in parts that are hard to test. They each need very careful testing and probably manual testing in addition to the automatic testing.
  2. Is this working with the current grisp_manager implementation or do we need to adjust grisp_manager as well?

reset_log() ->
reset_log(undefined).

reset_log(LasSeq) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have just fixed last_seq/0 to do that. Now we do more or less the same twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it ended up like this because I tried a lot of different ways to fix the issue, and the last working one could have been done in last_seq/0. I could change that.

@@ -20,7 +20,6 @@ wait_connection() ->
wait_connection(30_000).

wait_connection(0) ->
ct:pal("grisp_connect_ws state:~n~p~n", [sys:get_state(grisp_connect_ws)]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because now there is not always a permanent connection process, it may not be there and instead of trying to do some error handling and showing the state only when the connections there, given it is only debug information I just removed the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have connect and disconnect in every test case. I would do that in init_per_testcase and end_per_testcase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel with the timeouts and the distinction between connected and disconnected in handle_call this would be much better readable as a gen_statem. The module name is a bit strange, I liked grisp_connect_ws better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does way more than the old grisp_connect_ws. This remove complexity not related to the API from the client and API and could be reused easily. What would be the states if it was a statem ?

@sylane
Copy link
Contributor Author

sylane commented Dec 3, 2024

  1. The logger changes and architecture changes of the websocket connection shouldn't be in the same PR. Both are changes in parts that are hard to test. They each need very careful testing and probably manual testing in addition to the automatic testing.
  2. Is this working with the current grisp_manager implementation or do we need to adjust grisp_manager as well?

I only change the logging int he new code, I didn't touch the logging in the old code.
The change in the logging is due to the connection not being a persistent process anymore, and this is part of the connection architecture change.

What do you mean by these change shouldn't go together ?

 - Small changes in tests from review feedback
@peerst peerst self-requested a review January 8, 2025 16:16
@peerst
Copy link
Member

peerst commented Jan 8, 2025

This PR is stuck for a while now. Adding 1860 Lines and removing 430 is clearly beyond the size of reviewability (max 500 lines). How do we want to proceed with this? @sylane can you split it up in several PRs?

Also from what I can see so far: it looks in parts more like a rewrite than a refactoring. I'm not sure if that is due to too many refactoring steps stacked on top of each other so it only looks like a rewrite?

Rewriting or large refactors have a tendency to sneak in new bugs, so therefore require substantial improvement of testing at the same time: is that the case here?

How is the relationship between this code and DAB? Will this eventually end up in DAB? Parts of it? I don't want to have to maintain two forks of the same thing if that is the actual case.

@sylane
Copy link
Contributor Author

sylane commented Jan 8, 2025

@peerst This is already only some part of a bigger changeset that I split so it was simpler to review... Some changes already got merged in, but there is still more changes pending. The changes here are to go toward having a self-contained independent json-rpc client instead of having json-rpc logic spread everywhere in grisp-connect. So the connection is indeed a rewrite, but all the rest are changes to mostly remove things that are now abstracted by the new connection. I don't think I could easily split the changes further. For the testing, I wrote more tests, and I made the code more test-friendly, as the json-rpc connection can be tested completely independently. If we want more tests for it, I could write more, this shouldn't be complicated.

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
#{formatter => {grisp_connect_logger_bin, #{}},
filter_default => log,
filters => [
{disable_progress, {fun logger_filters:progress/2, stop}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that If this should ship in the default dettings for a grisp_connect based project we need to open a PR for rebar3_grisp/ grisp_tools to update the configuration for new generated user projects;
(rebar3 grisp configure)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure that it make things better. It looked better for me, and this is the common solution I found about our problem with logging certificate, but others should test it too.

src/grisp_connect_client.erl Outdated Show resolved Hide resolved
src/grisp_connect_connection.erl Outdated Show resolved Hide resolved
when Conn =/= undefined ->
ReqCtx = #{on_result => OnResult, on_error => OnError},
Params2 = maps:put(type, Type, Params),
case jarl:post(Conn, Method, Params2, ReqCtx) of
Copy link
Member

@ziopio ziopio Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have jarl:request and jarl:post and you only use jarl:post? Json RPC does not have the concept of a post but only request/notify

Copy link
Member

@ziopio ziopio Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in the jarl application you have this 700+ lines gen_statem that implements JsonRPC message handling and GUN logic together.
Could these 2 concepts be split in 2 modules in Jarl,as it was in grisp_connect before this PR?
The jarl_connection module looks already too big

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the idea was first to consolidate the JSON-RPC in a single place in grisp_connect first, this is what I did with jarl_connection, but then I moved it out of grisp_connect to make the PR more compact. Maybe this could be refactored again to split again, not sure if it is needed now though, I am open to it. The main reason I think it was more important before is because the guns management and the json-rpc was done in different processes before, now it is done in the same process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maehjam What do you think about that @maehjam ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sylane still... why post instead of request ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also confused what post is. There is no documentation in jarl about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post it to asynchronously post a request, request is doing a post and wait for the response so it is synchronous

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using post, you need to handle the asynchronous messages with either a response or an error.

@maehjam
Copy link
Member

maehjam commented Jan 22, 2025

Sorry, I still don't think I'm able to review this. It is too much changes at once. Now, we have changes here and a new app jarl to review together. Splitting it up, didn't reduce the difficulty to review.

I feel before we can review the changes here, we need to review jarl which is 2000+ lines in one commit without any documentation. I looked into it and it's really hard to read. For example a function named jarl:error/5 can do different things: Did we receive a json-rpc error from the server or do we want to send one or something else? And what is the difference between jarl:error/5 and jarl:reply/2, when error is about replying an error? The naming of jarl:post/4 was already discussed inline. The name post is usually associated with HTTP and has no equivalent in JSON-RPC.

Furthermore, I have to note: Even when we move a big chunk of this rewrite in a separate library that is still a big rewrite. We already have two JSON-RPC client implementations here in grisp_connect as well as in dab and should reuse the knowledge and preferably also the implementations. I like the idea of a reusable JSON-RPC client app, but that should be developed together with the people who worked on the implementations in grisp_connect and dab to use their experience. Just putting us here as reviewers after implementing 2000+ lines doesn't work. That is too much to read while trying to compare with the existing work (without readable diffs).

If we want this big rewrite. I would prefer that we first develop jarl starting with the concept and general architecture that we discuss and agree on as a team. Then we can review the implementation in smaller steps (commits).

We should make sure a library like jarl is usable by grisp_connect as well as by dab, because otherwise what is the point of a separate app? Can you tell me whether jarl can be used in dab as well?

@sylane
Copy link
Contributor Author

sylane commented Jan 23, 2025

This PR is only around 330+330- lines if you don't count tests, docs and config, it is big but I don't think it is unrasonable. I understand that moves some review work to jarl, but it is now a self-contains library with limited scope and API. They weren't any documentation with the old jrpc-code, at least now theres is well defined type specifications. I could add more documentation, this is not a problem, but would that change your mind ?
API jarl:error/5 is not doing different things, it is sending an error to the connection, either in the context of a request or not, I am not sure what you mean by it is doing different things. Everything you receive from the json-rpc peer, you get either synchronously when using jarl:request, or asynchronously with messages definied in jarl type specs. Post is used in http but convei some meaning too, it as a strong meaning of asynchronicity this is why it is used there, but this is just naming, if this was the only problem it would be easy to resolve, right ?
I am curious about what would be the smaller steps to implement a json-rpc client library, and I don't see why this couldn't be used by DAB, we could discuss that if you were interested.
Anyway, I am not sure how to react to your comment, the feeling is that there is nothing to do in your opinion to get this to a mergable state, you have issue with some naming and with documentation, all good points, but I don't feel that would change your opinion on the PR... Should I just delete the PR and the branch and forget about it ?

@ziopio
Copy link
Member

ziopio commented Jan 23, 2025

@sylane I think we should discuss jarl usage, architecture and API. This PR would be ok but it relies on us accepting jarl as is.
The best would be:

  • you present jarl to the team
  • we try to understand if and how the design would also help DAB or be improved using DAB design
  • We agree on an API that can be generic and reusable ecc
  • We define what needs to change from what jarl is now.
  • You rewrite Jarl with incremental PRs so we can quickly follow the development.
    So the PR can be parked here until we are fine with jarl.

@maehjam
Copy link
Member

maehjam commented Jan 23, 2025

Just to clear it up: The examples I mentioned about the function naming error and post in jarl were just to point out one of the things that make it difficult to read the code. The names were not self-explanatory to me, and like that I find most of the code hard to understand. The bigger issue is that it's just too much at once. Since it's one big commit, I can't even follow your steps of implementation to get the idea und structure that is probably totally clear to you as the author.

Thanks @ziopio for the proposal on how we can continue here. That is for sure one way. I would like a decision from @peerst and the rest of the team on how we proceed, before we put too much time into it. That can be discussed after the first point of presenting jarl to the team.

@sylane sylane force-pushed the sylane/cleanup-connection branch from 6d741a5 to c8495b4 Compare January 30, 2025 13:34
Copy link
Member

@maehjam maehjam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the first 8 files and found that the JSON-RPC API must be broken, since errors are not handled. I was wondering whether the tests wouldn't catch that. Please extend grisp_connect_api_SUITE to better cover the API. It only tests two methods.

I stop my review here till the tests pass.

There are already some change requests in the inline comments.

One other thing I noticed is that rebar.lock is missing. With the new jarl library as a dependency it should be updated.

CHANGELOG.md Outdated
- The name of the grisp_connect configuration key to control the timeout of
individual JSON-RPC requests changed from ws_requests_timeout to
ws_request_timeout.
- Le default log filter changed to trying to filter out only some messages to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Le" -> "The".
Maybe mention that this is only done for testing and development. (I see log level in prod is notice so we don't need it there.)

]
}}
]}
]}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary newline.

## Fixed

- The client is now waiting 1 second before trying to reconnect when it gets
disconnected from the server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to find this. Is this done here or in jarl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is done in in grisp_connect_client, in the connecting function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did look again and found it in waiting_ip. So that is a counter intuitive place to do it. I was expecting it in connecting somewhere, for example in enter there. Is there some deeper reason to put it in waiting_ip that I don't see?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make much sense to not check again if the IP is available after a connection error, so going back to waiting_ip seems better, and as we want to wait between each ip checks, better leveraging this than behaving differently if we were connected or not before then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's not leveraged. The waiting between waiting_ip is implemented in line 156 and the waiting between connects in line 144. They are not related at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, because by switching to state waiting_ip from either the connecting or connected state you will have the correct delay. Maybe using repeat_state instead of next_state at line 156 and not registering a new state_timeout would make it clearer for you ? This way the timeout would only be setup in waiting_ip enter function.

- Perform synchronous requests
- Start asynchronous requests
- Reply to a request
- Send and error result for a request
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and" -> "an"

- Reply to a request
- Send and error result for a request
- Send asynchronous notifications
- Send generic errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That formulation is a bit unclear. Do you mean the automatic sending for predefined JSON-RPC errors from the JSON-RPC specifications? Then instead of "generic errors" rephrase it to "predefined JSON-PRC errors".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we figured out wasn't even in the JSON-RPC spec, and I removed it. O I will remove the generic error line.

- Send generic errors

When performing an asynchronous request, the caller can give an opaque context
term, that will given back when receiving a response or an error for this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"that will be given back"

{filter_out_progress, {
fun grisp_connect_logger_bin:filter_out/2,
{supervisor, report_progress}}}
{disable_progress, {fun logger_filters:progress/2, stop}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to keep the comment why we do that here and also add it to local.config, test.config.

notify(Method, Type, Params) ->
Rpc = {notification, Method, maps:put(type, Type, Params)},
grisp_connect_jsonrpc:encode(Rpc).
% @doc Handles requests, notifications and errors from grisp.io.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it only handling notifications and requests, not errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't before the refactor IIRC, so I didn't add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was done before the refactoring (you can clearly see it in the diff) and it had a totally different documentation.
So that documentation is about your change and it doesn't fit. Fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, and I remembered that and posted the second comment about where the error is handled. Now the error are handled in a generic way in grisp_connect_client, line 260

{error, already_updating} ->
{error, -11, already_updating, undefined, ID};
{error, already_updating, undefined, undefined, ID};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is not defined in grisp_connect_client.

{error, grisp_updater_unavailable} ->
{error, -10, grisp_updater_unavailable, undefined, ID};
{error, grisp_updater_unavailable, undefined, undefined, ID};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the error tuples are not handled as return values is grisp_connect_client:handle_connection_message/2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants