Feature request: IPC Framework #229

Open
wants to merge 1 commit into
base: main

Conversation

LittleHuba
Contributor

@LittleHuba LittleHuba commented Jan 27, 2025

Provides a proposal for a communication framework feature.

Note:
Only the second commit is actually relevant. The first commit only provides the means to add requirements.

@LittleHuba LittleHuba force-pushed the ulhu_ipc_fr branch 3 times, most recently from 3bdef14 to 8aea350 Compare January 27, 2025 13:34
@arsibo
Contributor

arsibo commented Jan 27, 2025

@qor-lb qor-lb linked an issue Jan 27, 2025 that may be closed by this pull request
@HartmannNico HartmannNico marked this pull request as ready for review January 28, 2025 13:16
@nordanz

nordanz commented Jan 28, 2025

My two cents on the requirement description:

To begin with, I believe the terminology we choose imposes significant constraints on our approach, so we must be mindful.

For instance, if we prioritize SOA (Service-Oriented Architecture) as the main requirement for our communication stack, my understanding is that we should focus on endpoints and provide tooling for those endpoints, including API descriptions, service discovery, and load balancers. Conversely, if our emphasis is on data, as in streaming and MQ (Message Queuing) architectures, we should concentrate on protocols such as publish-subscribe (pub-sub), request-response (req-resp), monitors, and streams.

As I read the revised version, I understand that the intention is to provide a combined architecture. Even in that case, I think it is better to add, for every aspect, a description from the relevant architecture. For example, the following aspects differ depending on the selected architecture:

| Aspect | MQ | SOA |
| --- | --- | --- |
| Communication Style | Asynchronous, messaging-based | Synchronous and asynchronous, service-based |
| Protocols | Pub-sub, req-resp, streaming | gRPC, and other service protocols |
| Tooling and Components | Message brokers (or brokerless), queues, topics | API gateways, service discovery, load balancers |
| Focus | Data streaming, reliable messaging | Service encapsulation, reuse, and orchestration |
| Scalability | Highly scalable with horizontal scaling | Scalable, but depends on service implementations |
| Use Cases | Real-time data processing, event-driven systems | Business services, microservices, enterprise integration |

Another thing that comes to mind off the top of my head is the Quality of Service (QoS) layer, which will be affected significantly too:

QoS in Message Queuing (MQ)
In MQ, QoS refers to the mechanisms that ensure messages are delivered reliably and efficiently between producers and consumers. Key aspects include:

  • Message Durability: Ensuring messages are not lost even if the system crashes.
  • Message Redelivery: Mechanisms to resend messages if delivery fails.
  • Message Ordering: Ensuring messages are processed in the order they were sent.

QoS in Service-Oriented Architecture (SOA)
In SOA, QoS is about ensuring that services meet certain performance and reliability standards. Key aspects include:

  • Availability: Ensuring services are accessible when needed.
  • Performance: Ensuring services respond within acceptable time frames.
  • Security: Ensuring data integrity and confidentiality.
  • Compliance: Ensuring services adhere to relevant policies and regulations.

To summarize, my preferred architecture is something like DDS, because our focus is data and communication protocols. But of course it is a big standard, and it is hard to find an open-source implementation that supports it completely with good performance.

Contributor

@HartmannNico HartmannNico left a comment

Preparation for Merge FRs

Motivation
==========

SCORE is targeting high-performance automotive systems with safety impact. In general, these systems consist of multiple
Contributor

Find-Replace SCORE -> S-CORE
I did the same in the parallel FR, as we decided on this spelling in the last Leadership Circle.

To seamlessly integrate into this architectural approach, the communication framework shall facilitate a
service-oriented approach.

Services in regard to the communication framework consist of a selection of events and remote-procedure-calls, which
Contributor

For the discussion list:

  1. Interface and Service as aggregating entities are useful, but the two-step requirement of 1) finding the service and 2) utilizing the requested item seems complicated to me if I just want an item.
  2. When we look at the information exchange items, I see pure Data Pub/Sub, RPC Request/Response, and Event (no data) Pub/Sub, where an Event is not the same as an empty data item. It should be possible to communicate an event via, e.g., a PCIe interrupt line.

Contributor Author

  1. I agree. Explicit service discovery holds little benefit. We should strive to handle service discovery implicitly in the binding-independent layer. In very rare cases the discovery might require user input, but even in those cases we can and should still keep the majority of the effort hidden from the user.
  2. I'll ask a colleague of mine to join on Friday for our discussion. We want to discuss this further.

Support for mixed criticality is a core feature of a communication framework. Hence, it greatly impacts architectural
decisions and influences many other aspects. One of them being performance, which is the third primary aspect of a
communication framework.

Contributor

Agreed


1. High throughput
2. Low latency

Contributor

Agreed. We should consider KPI values with regards to a reference platform.

Further, reliable low-latency communication is only possible with appropriate scheduling.
Meaning, if a consumer is not scheduled when he receives an event, the latency of the communication is out of the hand
of the communication framework.
Thus, the communication framework must be capable to interact with the scheduler to influence the scheduling behavior.
Contributor

Yes, and we have to discuss which scheduler. From the perspective of Deterministic Runtime Orchestration, asynchronous execution outside the control of such orchestration breaks the determinism. Hence, close interaction between the two is required.
This is one of the reasons why RTO is not part of an application framework, but a core framework like communication.

A method in a service interface is an element that has:

- a name
- a specified application routine with a given set of parameters and a return type
Contributor

Here we should add something about error handling. Invoking a method essentially has two sources of error:

  1. The invocation itself fails. That is an invocation error.
  2. The execution of the method produces an error. That should be a transmittable return type, analogous to Result<T, E>.
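
To make the two error sources concrete, here is a minimal Rust sketch. All names (`InvocationError`, `DivideError`, `MethodResult`, `call_divide`) are hypothetical and not part of the proposal; the `reachable` flag merely simulates whether the transport works. The outer `Result` carries the invocation error, the inner one the transmittable method error.

```rust
/// Hypothetical: the invocation itself failed (transport level).
#[derive(Debug, PartialEq)]
enum InvocationError {
    ServiceUnavailable,
    Timeout,
}

/// Hypothetical: error produced by the method's own logic.
/// It travels back to the caller as part of the return type.
#[derive(Debug, PartialEq)]
enum DivideError {
    DivisionByZero,
}

/// Outer Result: did the call reach the service and come back?
/// Inner Result: what did the method itself return?
type MethodResult<T, E> = Result<Result<T, E>, InvocationError>;

/// Stand-in for a remote method; `reachable` simulates the transport.
fn call_divide(reachable: bool, a: u32, b: u32) -> MethodResult<u32, DivideError> {
    if !reachable {
        return Err(InvocationError::ServiceUnavailable);
    }
    if b == 0 {
        return Ok(Err(DivideError::DivisionByZero));
    }
    Ok(Ok(a / b))
}

/// The caller can tell the two error sources apart by pattern matching.
fn describe(result: MethodResult<u32, DivideError>) -> String {
    match result {
        Err(e) => format!("invocation failed: {:?}", e),
        Ok(Err(e)) => format!("method failed: {:?}", e),
        Ok(Ok(value)) => format!("ok: {}", value),
    }
}
```

Whether the two layers should be nested like this or flattened into a single error enum is an API design choice that this sketch does not settle.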

Contributor

Agreed. Error handling should be well defined somewhere (or at least which kind of error handling we want to support).

Contributor Author

I did not mention this explicitly, since in my mind it comes with any safety-aware library.
But you are absolutely right in pointing it out. I'll add a sentence to the document about error handling. I'll keep it generic, since this will be mainly a discussion when we agree on the API. We will have to do error handling in a lot more places than just RPC.

1. it shall invoke the application routine with the provided parameters, and
2. return its result to the communication partner

A method call shall be possible both synchronously and asynchronously.
Contributor

When async is possible, we also need explicit futures (or compatible structures) that are handled by the communication framework.
As a requirement this is OK here. I propose to further specify an async invoke + future + await logic. This has become a well-proven pattern.
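
As a rough illustration of the invoke + future + await idea, the following Rust sketch uses only standard-library threads and channels. It is an assumption-laden stand-in: `PendingCall`, `invoke_async`, and `invoke_sync` are hypothetical names, the spawned thread takes the place of a remote service, and a real design would more likely build on the language's native mechanism (`std::future::Future` in Rust).

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical handle to a pending method result (a minimal "future").
struct PendingCall<T> {
    rx: Receiver<T>,
}

impl<T> PendingCall<T> {
    /// Block until the result arrives; `None` if the callee vanished.
    fn wait(self) -> Option<T> {
        self.rx.recv().ok()
    }

    /// Non-blocking poll; `None` if no result is available (yet).
    fn try_get(&self) -> Option<T> {
        self.rx.try_recv().ok()
    }
}

/// Asynchronous invocation: run `routine` elsewhere and return a handle at once.
fn invoke_async<T, F>(routine: F) -> PendingCall<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The caller may have dropped the handle; ignore the send error.
        let _ = tx.send(routine());
    });
    PendingCall { rx }
}

/// Synchronous invocation is simply "invoke, then wait".
fn invoke_sync<T, F>(routine: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    invoke_async(routine).wait()
}
```

A caller would write `let pending = invoke_async(|| 2 + 3);`, continue with other work, and later call `pending.wait()`.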

Contributor Author

I did keep this on a high level intentionally. What I want to avoid is having some pattern that comes from one programming language and we try to fit that into all other languages with force.

Let's use the paradigms of the programming language. At least C++ and Rust have their own ideas of what asynchronicity should look like. Let's use those and not reinvent the wheel.

Contributor

Precisely.

1. it shall invoke the application routine with the provided parameters, and
2. return its result to the communication partner

A method call shall be possible both synchronously and asynchronously.
Contributor

Further consideration:
If a method can be called asynchronously and synchronously, the method itself must be asynchronous by nature. The implementation at the service itself may be sync, but then there must be an async wrapper around it to allow async invocation.
Would we therefore require further capabilities of async routines, like

  • yielding a result to the future, making it an asymmetric coroutine (my vote: no, as it would require a reusable future)
  • providing progress to the future (my vote: yes, useful for lifelines anyway, and com QoS will have it anyway)
  • receiving cancellation through the future (my vote: yes, optionally, to override long running tasks blocking methods)
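
The progress and cancellation capabilities could look roughly like the Rust sketch below. Everything here is hypothetical (`Update`, `CallHandle`, `invoke_sum`); a worker thread stands in for the remote method, and a rendezvous channel is used so that the worker advances only as fast as the caller consumes updates.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver};
use std::sync::Arc;
use std::thread;

/// Hypothetical messages a long-running method sends back to its caller.
#[derive(Debug, PartialEq)]
enum Update {
    Progress(u8), // percent completed
    Done(u64),    // final result
    Cancelled,    // the method observed a cancellation request
}

/// Hypothetical caller-side handle: receives updates, can request cancellation.
struct CallHandle {
    updates: Receiver<Update>,
    cancel_flag: Arc<AtomicBool>,
}

impl CallHandle {
    fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::SeqCst);
    }

    /// Blocks for the next update; `None` once the method has finished.
    fn next_update(&self) -> Option<Update> {
        self.updates.recv().ok()
    }
}

/// Stand-in for a long-running method: sums 1..=steps, reporting progress.
fn invoke_sum(steps: u32) -> CallHandle {
    // Capacity 0: each send waits until the caller receives it.
    let (tx, rx) = mpsc::sync_channel(0);
    let cancel_flag = Arc::new(AtomicBool::new(false));
    let worker_flag = Arc::clone(&cancel_flag);
    thread::spawn(move || {
        let mut sum: u64 = 0;
        for i in 1..=steps {
            if worker_flag.load(Ordering::SeqCst) {
                let _ = tx.send(Update::Cancelled);
                return;
            }
            sum += u64::from(i);
            let percent = (u64::from(i) * 100 / u64::from(steps)) as u8;
            if tx.send(Update::Progress(percent)).is_err() {
                return; // the caller dropped the handle
            }
        }
        let _ = tx.send(Update::Done(sum));
    });
    CallHandle { updates: rx, cancel_flag }
}
```

Cancellation here is cooperative: the method only stops at the points where it checks the flag, which matches the "optionally" in the vote above.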

Contributor Author

This asynchronicity comes from the communication itself already. The main motivation is to achieve decoupling.

How we treat this in the application should IMHO be up to the application. We can of course provide the tools to do this in any meaningful way of the programming language chosen.

For your specific points: This highly depends on the capabilities of the programming language and how this is implemented. I would keep it simple for now and then extend it later on if we see the need.

:satisfies: STKH_REQ__2,STKH_REQ__282
:status: valid

IPC communication shall be possible without copying to be transferred data.
Contributor

Sentence structure.

:satisfies: STKH_REQ__2
:status: valid

The IPC binding shall ensure confidentiality of its communication.
Contributor

Does that not contradict the zero-copy approach? Confidentiality ultimately requires encryption which is kind of a breach of zero-copy. With appropriate system tools I will always be able to look into the shared memory, hold the execution through hardware debugging interfaces etc. So from a security perspective, only encryption would secure the data here.

Contributor Author

Depends. If the OS is capable of ensuring confidentiality of shared memory objects, then you can do zero-copy within those without encryption.

Of course you are at a certain point able to inspect this. But by that point you have access to the system on a level where you could also just read the encryption key from the ROM.

Contributor

This is why the keys are in the HSM. But of course you have a point.

:satisfies: STKH_REQ__2
:status: valid

The communication framework shall support multiple bindings.
Contributor

Bindings should be defined somewhere.

:satisfies: STKH_REQ__2
:status: valid

The public API of the communication framework shall be binding-agnostic.
Contributor

This is also too short to understand. It needs a short definition of what is meant by binding-agnostic.
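
One possible reading of "binding-agnostic", sketched in Rust under stated assumptions: application code is written against an abstract interface and does not change when the transport is swapped. The `Binding` trait, both toy bindings, and the topic name are hypothetical, and the sketch copies payloads for simplicity, so it says nothing about zero-copy.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical abstraction over a transport ("binding").
trait Binding {
    fn send(&mut self, topic: &str, payload: &[u8]);
    fn receive(&mut self, topic: &str) -> Option<Vec<u8>>;
}

/// Toy binding that queues every sample in order.
#[derive(Default)]
struct LocalQueueBinding {
    queue: VecDeque<(String, Vec<u8>)>,
}

impl Binding for LocalQueueBinding {
    fn send(&mut self, topic: &str, payload: &[u8]) {
        self.queue.push_back((topic.to_string(), payload.to_vec()));
    }

    fn receive(&mut self, topic: &str) -> Option<Vec<u8>> {
        let pos = self.queue.iter().position(|(t, _)| t.as_str() == topic)?;
        self.queue.remove(pos).map(|(_, payload)| payload)
    }
}

/// Toy binding that keeps only the latest sample per topic.
#[derive(Default)]
struct LatestValueBinding {
    latest: HashMap<String, Vec<u8>>,
}

impl Binding for LatestValueBinding {
    fn send(&mut self, topic: &str, payload: &[u8]) {
        self.latest.insert(topic.to_string(), payload.to_vec());
    }

    fn receive(&mut self, topic: &str) -> Option<Vec<u8>> {
        self.latest.get(topic).cloned()
    }
}

/// Application code: identical for every binding.
fn publish_speed<B: Binding>(binding: &mut B, kph: u16) {
    binding.send("vehicle/speed", &kph.to_le_bytes());
}

fn read_speed<B: Binding>(binding: &mut B) -> Option<u16> {
    let bytes = binding.receive("vehicle/speed")?;
    if bytes.len() != 2 {
        return None;
    }
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}
```

`publish_speed` and `read_speed` compile unchanged against either binding; only the behavior behind the trait differs.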

docs/features/communication/ipc/requirements/index.rst Outdated
:satisfies: STKH_REQ__242
:status: valid

The communication framework shall provide infrastructure to enable binding-agnostic zero-copy tracing of
Contributor

Can we add that it also needs tooling, or tooling support, to visualize the communication in real time?

Contributor Author

I would like to discuss this point with the tooling community. Potentially, it makes sense to integrate this into a bigger tracing solution for the overall stack.

Tracing
-------

Based on :need:`STKH_REQ__242` the communication framework must support tracing of communication events.
Contributor

I would say this is for development purposes, not for when the car is in the field. But replay should be supported.

Security Impact
===============

Security of communication is important for the security of the overall system.
Contributor

Should be something like zero trust

:satisfies: STKH_REQ__2
:status: valid

The communication framework shall allow multiple services per SW component.
Contributor

I agree. What a service is in the context of communication should be defined somewhere.

:satisfies: STKH_REQ__2,STKH_REQ__282
:status: valid

An event in a service interface is an element that has:
Contributor

Looks like a great idea, Nico. I would see that as a feature of a future release.

@LittleHuba
Contributor Author

@nordanz
Thank you for your comment.

In general, I agree with you that the right terminology is quite important. Let's dig into why I took SOA as motivation.

Disclaimer:
I'm not advertising that S-CORE will be an absolute SOA implementation where anything you do must go through the communication framework to an abstract service. This FR is about communication only.

I chose SOA for a different reason:
A main goal of S-CORE is to provide an overall stack with a versioned API.
For me communication is also part of the public API. Some of it might be between modules of S-CORE, but we should still consider this as "public API" in the sense of module compatibility. Otherwise, every migration to a new release of S-CORE would be very painful.

A lot of the principles of SOA (as I understand its key principles) point in the direction of a well-structured, stable API.
(SOA helps you to group public API elements into logical units and define versioning on those units while it keeps everything underneath a blackbox).

Since communication is part of the public API, we should also strive to keep it stable. But the communication framework cannot do so on its own. Instead, we must provide the means to do so. This is where a pure data-centric approach (with my limited knowledge of DDS) falls short. It does not take versioning into account.

For versioning, you must have control over the elements an interface contains and how those elements behave. So, a loose collection of events and methods in a namespace does not really suffice. An application would have no way to ensure that the namespace (e.g. the unit of public APIs related to a specific part of the stack) stays stable. Anybody could add further elements to the namespace.
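
To illustrate why a closed, versioned set of elements helps, here is a small Rust sketch. The types and the compatibility rule (same major version, provider minor not lower, every required element present unchanged) are assumptions for illustration only; the FR does not define a versioning scheme.

```rust
/// Hypothetical description of one element of a service interface.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Event { name: String, data_type: String },
    Method { name: String, signature: String },
}

/// Hypothetical closed interface: a fixed element set under one version.
#[derive(Debug, Clone, PartialEq)]
struct InterfaceDescription {
    name: String,
    major: u32,
    minor: u32,
    elements: Vec<Element>,
}

impl InterfaceDescription {
    /// Assumed rule: a provider satisfies a consumer's requirement if the
    /// name and major version match, the provider's minor version is not
    /// lower, and every required element exists unchanged.
    fn satisfies(&self, required: &InterfaceDescription) -> bool {
        self.name == required.name
            && self.major == required.major
            && self.minor >= required.minor
            && required.elements.iter().all(|e| self.elements.contains(e))
    }
}

/// Small helper to keep the examples short.
fn event(name: &str, data_type: &str) -> Element {
    Element::Event {
        name: name.to_string(),
        data_type: data_type.to_string(),
    }
}
```

Because the element list belongs to the versioned description, adding an element is a visible change to the interface rather than a silent addition to a namespace.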

To summarize, I agree that SOA is somewhat misleading, given how loaded that term is. But we need something more than just MQ for S-CORE.

I'll reword the motivation and skip the whole loaded terminology.

@LittleHuba
Contributor Author

@nordanz @FScholPer I decided to fully avoid the buzzword bingo with SOA.
Loose coupling and high cohesion are a goal independent of which overall architecture concept this project or its users choose.
Instead of taking the shortcut with SOA, I decided to logically argue my way to the relevant conclusions.

I'll publish the reworked version after my sync with @HartmannNico today (given that he agrees with my improvements to his points). I hope you can give it another read then and provide me with feedback whether I covered your concerns.

A feature request for a communication framework.
Currently, this is based on IPC communication.

The created documentation from the pull request is available at: docu-html

Successfully merging this pull request may close these issues.

Feature Request for IPC
6 participants