
Data Templates Design

harishkumar gangula edited this page Oct 25, 2023 · 2 revisions

Introduction

As Obsrv continued to expand, the demand for demonstrations to clients, the marketing team, and tech teams became increasingly apparent. Given that Obsrv addresses various challenges across multiple domains and use cases, it has become imperative to showcase its vast potential and capabilities in a convenient and efficient manner. This involves providing quick and easily accessible demos, requiring minimal effort.

Background & Problem statement

Currently, the process of creating a demo for a specific use case or domain is time-consuming and resource-intensive. It involves:

  • Analyzing the events.
  • Developing scripts to generate data for all event types.
  • Ingesting the data.
  • Creating the necessary dashboards.

As the number of demos for various use cases continues to increase, the need to create individual scripts for each use case becomes inefficient in terms of both cost and developer time. This redundancy in script creation consumes valuable resources that could be better utilized.

Key Design problems

  • Data generation.
  • Scripts for each use case or domain.
  • Refreshing the data based on time.
  • Handling complex data generation use cases. E.g., within an event, an integer field1 must be less than the value of field2.
  • Data should be generated with denorm values provided.
  • Data should be transformed to show the transformation use cases.
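The cross-field problem above (field1 must be less than field2 within one event) can be illustrated with a minimal sketch. It is not part of the design: the helper name `generate_ordered_pair` and the value bounds are hypothetical, chosen only to show one way such a constraint could be satisfied at generation time.

```python
import random


def generate_ordered_pair(low: int, high: int, rng: random.Random) -> dict:
    """Return an event fragment in which field1 < field2 (hypothetical helper)."""
    # Draw two distinct integers, then sort so the smaller one lands in field1.
    a, b = rng.sample(range(low, high + 1), 2)
    field1, field2 = sorted((a, b))
    return {"field1": field1, "field2": field2}


rng = random.Random(42)  # seeded so repeated demo runs produce the same data
events = [generate_ordered_pair(0, 100, rng) for _ in range(5)]
```

Sampling two distinct values and sorting them guarantees strict inequality without retry loops.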

Assumptions

  • To generate millions of events, a sample event in JSON format is assumed to be provided as the basis for the data template.
  • Any denorm data is also assumed to be provided so that it can be added to the template.

Design

Data templates

A data template contains the following sections:

  • Id

  • Version

  • Name

  • Template

    • Schema
    • Conditions (optional)
    • Denorm config (optional)
{
"id" : uuid,
"name":  string,
"version": string,
"template": {
      "schema": {

       }, // JSON Schema (draft-04)
      "conditions": {
         <path of the field> : {} // conditions for that field
       }, // map of field path to the conditions applied to that field
      "denormConfig": [{
             "path": <string>, // path of the field to resolve
             "values": [{}] // list of values to pick from
      }] 
 }
}
Id: 

Unique id for the template 

Version:

 Semantic version of the template, used to track updates to the template

Name:

 User readable name of the template 

Schema:

 Events are in JSON, so a JSON Schema (draft-04) is created from the sample event. The schema is then adjusted according to the type of event that needs to be generated.

Example: if the schema contains a string field named email, its format is set to email in the schema so that a random email address is generated for each event.
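As a rough illustration of schema-driven generation, the sketch below walks the `properties` of a schema and produces one value per field, honouring `enum` and the `email` format. It is a hand-rolled stand-in covering a tiny subset of JSON Schema, not the generator the design intends to use (the references point at json-schema-faker); the function names and the `example.com` domain are assumptions.

```python
import random
import string


def fake_value(field_schema: dict, rng: random.Random):
    """Generate one value for a very small subset of JSON Schema (hypothetical)."""
    if "enum" in field_schema:
        return rng.choice(field_schema["enum"])
    if field_schema.get("type") == "string":
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        # "format": "email" switches the generator to an email-shaped string.
        if field_schema.get("format") == "email":
            return f"{word}@example.com"
        return word
    raise ValueError(f"unsupported schema: {field_schema}")


def generate_event(schema: dict, rng: random.Random) -> dict:
    """Build one event with a generated value for every declared property."""
    return {name: fake_value(sub, rng) for name, sub in schema["properties"].items()}


schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"},
        "medium": {"type": "string", "enum": ["english", "kannada"]},
    },
}
event = generate_event(schema, random.Random(7))
```

Calling `generate_event` in a loop would yield as many events as a demo needs.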

Conditions:

Conditions are optional. They allow events to be generated with special constraints that JSON Schema does not support, such as a field holding a duration in ISO 8601 format. In that case, the format duration is specified as a condition on the field, and the field is populated with a duration value.
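One way a condition could be applied after schema-based generation is sketched below for a `range` condition on a date field, the shape used for `joinedOn` in the sample template. The function names are hypothetical, and the sketch assumes conditions are keyed by top-level field name and that the field should hold a `YYYY-MM-DD` date string.

```python
import random
from datetime import datetime, timedelta


def parse_iso(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def random_date_in_range(cond: dict, rng: random.Random) -> str:
    """Pick a random date between cond["range"]["min"] and ["max"]."""
    lo = parse_iso(cond["range"]["min"])
    hi = parse_iso(cond["range"]["max"])
    offset = rng.uniform(0, (hi - lo).total_seconds())
    return (lo + timedelta(seconds=offset)).date().isoformat()


def apply_conditions(event: dict, conditions: dict, rng: random.Random) -> dict:
    """Return a copy of the event with range conditions applied (assumed semantics)."""
    out = dict(event)
    for field, cond in conditions.items():
        if "range" in cond:
            out[field] = random_date_in_range(cond, rng)
    return out


event = {"student_id": "s-1", "joinedOn": "1970-01-01"}
conditions = {
    "joinedOn": {
        "range": {"min": "2023-09-01T00:00:00.000Z", "max": "2023-10-01T00:00:00.000Z"}
    }
}
constrained = apply_conditions(event, conditions, random.Random(1))
```

Other condition types, such as the duration format mentioned above, would need their own branch in `apply_conditions`.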

Denorm config:

When the data set requires denormalization, a denorm config is added. It is a list of objects, each containing a path, which locates the field in the event, and values, a list of objects from which one is picked at random to update the event.

The values field could be enhanced to accept a path, so that master data JSON can be read from a local file path or a URL.
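The denorm step described above can be sketched as follows. This assumes that `path` names a top-level field and that the randomly chosen value overwrites the field; the design does not spell out either detail, and `apply_denorm` is a hypothetical name.

```python
import random


def apply_denorm(event: dict, denorm_config: list, rng: random.Random) -> dict:
    """Return a copy of the event with each configured path resolved (assumed semantics)."""
    out = dict(event)
    for entry in denorm_config:
        path = entry["path"]
        # Only fields present in the event are resolved; others are left alone.
        if path in out:
            out[path] = rng.choice(entry["values"])
    return out


denorm_config = [
    {"path": "grade", "values": ["Class 1", "Class 2", "Class 3", "Class 4"]}
]
event = {"student_id": "s-1", "grade": "3"}
denormalized = apply_denorm(event, denorm_config, random.Random(3))
```

Supporting nested paths or loading `values` from a file or URL would extend the lookup and the source of `entry["values"]` respectively.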

Sample Template

{
  "id": "651d5350cb0cfb622d7ce9d6",
  "name": "vsk-students",
  "version": "1.0.0",
  "template": {
    "schema": {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "type": "object",
      "properties": {
        "student_id": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "grade": {
          "type": "string",
          "enum": [
            "1",
            "2",
            "3",
            "4",
            "5"
          ]
        },
        "medium": {
          "type": "string",
          "enum": [
            "english",
            "kannada"
          ]
        },
        "joinedOn": {
          "type": "string",
          "format": "date"
        }
      },
      "required": [
        "student_id",
        "name",
        "grade",
        "medium",
        "joinedOn"
      ]
    },
    "conditions": {
      "joinedOn": {
        "range": {
          "min": "2023-09-01T00:00:00.000Z",
          "max": "2023-10-01T00:00:00.000Z"
        }
      }
    },
    "denormConfig": [
      {
        "path": "grade",
        "values": [
          "Class 1",
          "Class 2",
          "Class 3",
          "Class 4"
        ]
      }
    ]
  }
}

Conclusion

TBD

References

https://github.com/json-schema-faker/json-schema-faker/blob/master/docs/USAGE.md#supported-keywords

https://json-schema.org/specification-links#draft-4