Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAI Proposal TAM stream telemetry #2089

Merged
merged 2 commits into from
Dec 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1,010 changes: 1,010 additions & 0 deletions doc/TAM/SAI-Proposal-TAM-stream-telemetry.md

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions doc/TAM/netlink_dma_channel.drawio.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
14 changes: 14 additions & 0 deletions inc/saiobject.h
Original file line number Diff line number Diff line change
Expand Up @@ -300,6 +300,20 @@ sai_status_t sai_query_stats_capability(
_In_ sai_object_type_t object_type,
_Inout_ sai_stat_capability_list_t *stats_capability);

/**
* @brief Query statistics capability for statistics bound at object level under the stream telemetry mode
*
* @param[in] switch_id SAI Switch object id
* @param[in] object_type SAI object type
* @param[inout] stats_capability List of implemented enum values, the statistics modes (bit mask) supported and minimal polling interval per value
*
* @return #SAI_STATUS_SUCCESS on success, #SAI_STATUS_BUFFER_OVERFLOW if lists size insufficient, failure status code on error
*/
sai_status_t sai_query_stats_st_capability(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should make this more generic ? since very similar function already exists and this is adding only 1 field to structure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi Kamil, we discussed internally using the sai_query_stats_capability directly and add the polling interval in it, but the reason we are not using that API is because of 2 reasons:

  1. Not all counters support using this way to retrieve, so if the counters are not returned in the list, it means it is not supported.
  2. Adding another field in the current stats capability breaks the ABI and making this frequently used function not backward compatible.

So after discussing it, we ended up deciding creating a new API for it.

_In_ sai_object_id_t switch_id,
_In_ sai_object_type_t object_type,
_Inout_ sai_stat_st_capability_list_t *stats_capability);

/**
* @brief Bulk objects get statistics.
*
Expand Down
40 changes: 40 additions & 0 deletions inc/saiswitch.h
Original file line number Diff line number Diff line change
Expand Up @@ -3084,6 +3084,46 @@ typedef enum _sai_switch_attr_t
*/
SAI_SWITCH_ATTR_EXTENDED_PORT_STATE_CHANGE_NOTIFY,

/**
* @brief Tam telemetry reporting byte size of chunk under the stream telemetry
*
* Defines the maximum number of bytes in a single report.
* The total number of bytes in a report should be as close as possible to this value.
* We can increase this value to reduce the number of sys calls.
* If the type of message is IPFIX, this value should not be less than 65535.
Pterosaur marked this conversation as resolved.
Show resolved Hide resolved
* Because we don't expect the IPFIX record to be fragmented.
*
* @type sai_uint32_t
* @flags CREATE_ONLY
* @default 65535
*/
SAI_SWITCH_ATTR_TAM_ST_REPORT_CHUNK_SIZE,

/**
* @brief Tam telemetry chunk count under the stream telemetry
*
* This value indicates how many chunks of reports that can be restored in the buffer.
* If the data structure is a ring buffer, the byte size of ring buffer is chunk count * chunk size.
* The default value, 0, means that this value was determined by the vendor.
* If the buffer is full, new incoming data will be dropped.
*
* @type sai_uint32_t
* @flags CREATE_ONLY
* @default 0
*/
SAI_SWITCH_ATTR_TAM_ST_CHUNK_COUNT,

/**
* @brief Set TAM telemetry type config change event notification callback function passed to the adapter.
*
* Use sai_tam_tel_type_config_change_notification_fn as notification function.
*
* @type sai_pointer_t sai_tam_tel_type_config_change_notification_fn
* @flags CREATE_AND_SET
* @default NULL
*/
SAI_SWITCH_ATTR_TAM_TEL_TYPE_CONFIG_CHANGE_NOTIFY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved to TAM object ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like all notify handlers have to be bound on the switch object. It's the limitation of SAI design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if its SAI design limitation, or just that all callback handlers are at switch level currently. @kcudnik please comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Bandaru, yes, in my understanding, this is basically how the notify handler works. The handler will be used by all objects in that type, hence it is added on upper layers, and the best place is the switch, as it will be setup during the initialization.

For example, fob handler will be used by all fdb entries, hence it is not tied to any fdb entry.

This is why the notifications are added on the switch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if its SAI design limitation, or just that all callback handlers are at switch level currently. @kcudnik please comment.

Currently it's forced that all notifications be in switch, this will unify entire design, and it will help update pointers when switch is put into warm boot mode, currently we don't have specific notifications per object and we are not planning to do so, more explanation is like Riff said in above comment

Copy link
Collaborator

@kcudnik kcudnik Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can return object_id of that tam in notification, and act accordingly, we currently are not planning to decouple pointers from switch to single objects, we generate a lot of metadat which help us organize everygnig, including notification pointers assignments, which are coupled to the switch, at this point it will be up to user to decide differnt action based on notificaion data

do you have any concearns about that? like performance reasons ? will there be expected a lot of tam notifications per second ?

internally in vendor implementation you can still have "differnet" pointers on each created internally tam objects, but assign them the same pointer that was given by switch object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a restriction in the metadata and other aspects, I think it is OK. I will wait for the name edits that I suggested above, before approving this PR. This callback item is closed from my side.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are not hard restrictions, but we're made for convince, and drifting away from current schema would cause a lot of refactoring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats OK, I am not talking about hard restrictions alone. No need to change all the infra for this for now.

However, I request you to consider the possibility that the callbacks may - at sometime in future - may be per-object and plan any changes for this. Thanks @kcudnik

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also another side why all pointers are in switch, during warm boot of the platform, user is calling create_switch, and it can pass new pointers to the switch to receive notifications, this is importnt, since after platform warm boot, new notifications wii most likely have different memory address than previous boot, so notifications would need to be updated. and this could be done in a single call during create switch, if you would have pointers in objects , you would need to go and find that object and assign new notification on it or you would have invalid pointer, or null, depends how platform handles pointers in sdk


/**
* @brief End of attributes
*/
Expand Down
94 changes: 94 additions & 0 deletions inc/saitam.h
Original file line number Diff line number Diff line change
Expand Up @@ -910,6 +910,51 @@ typedef enum _sai_tam_telemetry_type_t

} sai_tam_telemetry_type_t;

typedef enum _sai_tam_tel_type_mode_t
{
/**
* @brief This TAM telemetry type supports to bound only one counter type
*/
SAI_TAM_TEL_TYPE_MODE_SINGLE_TYPE,

/**
* @brief This TAM telemetry type supports to bound multiple counter types
*/
SAI_TAM_TEL_TYPE_MODE_MIXED_TYPE,

} sai_tam_tel_type_mode_t;

/**
* @brief TAM telemetry type state of state machine
*/
typedef enum _sai_tam_tel_type_state_t
{
/**
* @brief Telemetry type is stopped
*
* In this stage, the recording stream should be stopped,
* and the configuration should be cleared.
*/
SAI_TAM_TEL_TYPE_STATE_STOP_STREAM,

/**
* @brief Telemetry type is started
*
* In this stage, the recording stream should be started,
* and the latest configuration should be applied.
*/
SAI_TAM_TEL_TYPE_STATE_START_STREAM,

/**
* @brief Telemetry type configuration is prepared,
*
* We expect the configuration to be generated in the feature,
* And notify the user by sai_tam_tel_type_config_change_notification_fn
*/
SAI_TAM_TEL_TYPE_STATE_CREATE_CONFIG,

} sai_tam_tel_type_state_t;

Copy link
Contributor

@JaiOCP JaiOCP Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpik, names of the attributes are not clear about the intent of the state machine.
I belive what you are trying to do is as follows
Default State: READY
Once all the subscription is completed, orchagent want to indiate to SAI that it can now start the creation of templates
Next State: CREATE
SAI adapter will do out of sync notification of template creation complete and once orchagent reads the templates then must transition to state to that hw should start streaming now
Next State: START
If there is any change in NOS or streaming need to be stopped then HW wil be instructed to stop streaming.
Next State: STOP
Once SAI adapter cleans its own data stucture etc, NOS will transition the state to ready
Next State: READY

FSM will look like this
READY -> CREATE - > [async notifcation] -> START -> STOP -> READY
There are no intermediate state transitions.

/**
* @brief Telemetry type attributes
*/
Expand Down Expand Up @@ -1076,6 +1121,35 @@ typedef enum _sai_tam_tel_type_attr_t
*/
SAI_TAM_TEL_TYPE_ATTR_COUNTER_SUBSCRIPTION_LIST,

/**
* @brief The mode of TAM telemetry type
*
* @type sai_tam_tel_type_mode_t
* @flags CREATE_ONLY
* @default SAI_TAM_TEL_TYPE_MODE_SINGLE_TYPE
* @validonly SAI_TAM_TEL_TYPE_ATTR_TAM_TELEMETRY_TYPE == SAI_TAM_TELEMETRY_TYPE_COUNTER_SUBSCRIPTION
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, it should be marked as:
@validonly SAI_TAM_TEL_TYPE_ATTR_TAM_TELEMETRY_TYPE == SAI_TAM_TELEMETRY_TYPE_COUNTER_SUBSCRIPTION

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea,thanks for your suggestion.

SAI_TAM_TEL_TYPE_ATTR_MODE,
Pterosaur marked this conversation as resolved.
Show resolved Hide resolved

/**
* @brief TAM telemetry type state
*
* @type sai_tam_tel_type_state_t
* @flags CREATE_AND_SET
* @default SAI_TAM_TEL_TYPE_STATE_STOP_STREAM
*/
SAI_TAM_TEL_TYPE_ATTR_STATE,

/**
* @brief Query IPFIX template
* SAI adapter will return error if COUNTER_SUBSCRIPTION_LIST and REPORT_ID is not configured.
* Return the IPFIX template for this telemetry type object.
*
* @type sai_u8_list_t
* @flags READ_ONLY
*/
SAI_TAM_TEL_TYPE_ATTR_IPFIX_TEMPLATES,

/**
* @brief End of Attributes
*/
Expand All @@ -1088,6 +1162,14 @@ typedef enum _sai_tam_tel_type_attr_t
SAI_TAM_TEL_TYPE_ATTR_CUSTOM_RANGE_END
} sai_tam_tel_type_attr_t;

/**
* @brief TAM telemetry state change callback
*
* @param[in] tam_tel_id Create Telemetry Object ID
*/
typedef void (*sai_tam_tel_type_config_change_notification_fn)(
_In_ sai_object_id_t tam_tel_id);

/**
* @brief Create and return a telemetry type object
*
Expand Down Expand Up @@ -2192,13 +2274,23 @@ typedef enum _sai_tam_counter_subscription_attr_t
* @brief Telemetry label
*
* Label to identify this counter in telemetry reports.
* If the report type is IPFIX, this label will be used as the element ID in the IPFIX template.
*
* @type sai_uint64_t
* @flags CREATE_ONLY
* @default 0
*/
SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_LABEL,

/**
* @brief Setting of read-clear or read-only for statistics read.
*
* @type sai_stats_mode_t
* @flags CREATE_ONLY
* @default SAI_STATS_MODE_READ
*/
SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STATS_MODE,

/**
* @brief End of Attributes
*/
Expand Down Expand Up @@ -2370,6 +2462,8 @@ typedef struct _sai_tam_api_t
sai_remove_tam_counter_subscription_fn remove_tam_counter_subscription;
sai_set_tam_counter_subscription_attribute_fn set_tam_counter_subscription_attribute;
sai_get_tam_counter_subscription_attribute_fn get_tam_counter_subscription_attribute;
sai_bulk_object_create_fn create_tam_counter_subscriptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use a tam string as part of the type for consistency.

Copy link
Collaborator Author

@Pterosaur Pterosaur Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's a convention of SAI to use sai_bulk_object_xxxx for bulk operation.

SAI/inc/sailag.h

Lines 395 to 396 in ad20062

sai_bulk_object_create_fn create_lag_members;
sai_bulk_object_remove_fn remove_lag_members;

SAI/inc/sainexthop.h

Lines 335 to 338 in ad20062

sai_bulk_object_create_fn create_next_hops;
sai_bulk_object_remove_fn remove_next_hops;
sai_bulk_object_set_attribute_fn set_next_hops_attribute;
sai_bulk_object_get_attribute_fn get_next_hops_attribute;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

sai_bulk_object_remove_fn remove_tam_counter_subscriptions;
} sai_tam_api_t;

/**
Expand Down
26 changes: 26 additions & 0 deletions inc/saitypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -1835,6 +1835,32 @@ typedef struct _sai_stat_capability_list_t

} sai_stat_capability_list_t;

/**
* @brief Stat capability under the stream telemetry mode
*/
typedef struct _sai_stat_st_capability_t
{
/**
* @brief Typical stat capability
*/
sai_stat_capability_t capability;

/**
* @brief Minimal polling interval in nanoseconds
*
* If polling interval is less than this value, it will be unacceptable.
*/
uint64_t minimal_polling_interval;

} sai_stat_st_capability_t;

typedef struct _sai_stat_st_capability_list_t
{
uint32_t count;
sai_stat_st_capability_t *list;

} sai_stat_st_capability_list_t;

typedef enum _sai_stats_count_mode_t
{
/** Count packet and byte */
Expand Down
2 changes: 2 additions & 0 deletions meta/aspell.en.pws
Original file line number Diff line number Diff line change
Expand Up @@ -222,3 +222,5 @@ Wildcard
www
xconnect
TWAMP
config
sys