Introducing JSON Schema for Singer Taps and Targets #94

Open
mporracindie opened this issue Jan 31, 2025 · 0 comments

Background & Problem Statement

Singer.io has become a widely used standard for building modular data integration pipelines. However, one of the major challenges in working with Singer taps and targets is the lack of a structured, standardized way to define and validate their configurations.

Currently, configuration files for taps and targets are typically defined as JSON objects with no formal schema. This leads to several issues:

  • Lack of Discoverability: Users and developers often struggle to understand what parameters are required, their expected data types, and their descriptions.

  • Inconsistent Implementations: Each tap or target defines its configuration ad hoc, leading to inconsistencies across implementations.

  • Error-Prone User Experience: Without a standardized schema, misconfigurations are common, and error messages are often uninformative.

  • Limited Tooling Support: It is difficult for tooling and orchestration platforms to automatically generate user interfaces, validation mechanisms, or documentation.

Proposed Solution

To address these challenges, we propose introducing optional JSON Schema support for defining tap and target configurations. JSON Schema is a widely accepted standard for specifying structured JSON data, including validation rules, data types, descriptions, and default values.

Key Features of the Proposed Schema

  • Definition of Required and Optional Fields: Clearly specify which parameters are required and which are optional.

  • Data Type Validation: Ensure correct types (e.g., string, integer, boolean, array) for each configuration parameter.

  • Human-Readable Documentation: Allow adding descriptions to parameters, making configurations self-explanatory.

  • Default Values: Provide sensible defaults to simplify user setup.

  • Enumerations and Constraints: Define constraints on valid inputs (e.g., "log_level" can only be "debug", "info", "warning", or "error").

  • Nested Configuration Support: Support complex configurations with nested structures where applicable.
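One caveat on the Default Values and Nested Configuration items: in JSON Schema, "default" is an annotation, so a validator does not insert the value into the configuration by itself. A tap or tool that wants defaults applied has to do so explicitly. The following is a minimal sketch of that step, assuming a hypothetical helper named apply_defaults and a hypothetical nested "retry" setting; neither is part of any existing Singer library.

```python
import copy


def apply_defaults(config, schema):
    """Return a copy of config with missing settings filled from schema defaults.

    Only the "properties", "default", and "type" keywords are considered.
    A nested object that is absent from config is left absent unless the
    schema gives that object a default of its own.
    """
    result = copy.deepcopy(config)
    for name, subschema in schema.get("properties", {}).items():
        if name not in result:
            if "default" in subschema:
                result[name] = copy.deepcopy(subschema["default"])
        elif subschema.get("type") == "object" and isinstance(result[name], dict):
            # Recurse so that defaults inside nested objects are applied too.
            result[name] = apply_defaults(result[name], subschema)
    return result


# Hypothetical schema used only for illustration.
SCHEMA = {
    "type": "object",
    "properties": {
        "batch_size": {"type": "integer", "default": 100},
        "retry": {
            "type": "object",
            "properties": {
                "max_attempts": {"type": "integer", "default": 3},
            },
        },
    },
}

if __name__ == "__main__":
    print(apply_defaults({"retry": {}}, SCHEMA))
```

Values the user supplies are never overwritten; only missing keys are filled in.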

Example JSON Schema for a Tap

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Example Tap Configuration",
  "type": "object",
  "properties": {
    "api_key": {
      "type": "string",
      "description": "API key for authentication",
      "minLength": 1
    },
    "start_date": {
      "type": "string",
      "format": "date-time",
      "description": "The earliest date for data extraction"
    },
    "batch_size": {
      "type": "integer",
      "description": "Number of records per batch",
      "default": 100,
      "minimum": 1
    },
    "log_level": {
      "type": "string",
      "enum": ["debug", "info", "warning", "error"],
      "description": "Logging level for the tap",
      "default": "info"
    }
  },
  "required": ["api_key", "start_date"]
}
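
To illustrate how a tap could check a user's configuration against the schema above, here is a minimal sketch that uses only the Python standard library. It covers just the keywords that appear in the example ("required", "type", "enum", "minimum", "minLength") and is not a complete JSON Schema implementation; a full validator such as the jsonschema Python package would be the realistic choice. The function name validate_config is an assumption made for this example. The "format": "date-time" constraint is not checked here; in the 2020-12 draft, "format" is treated as an annotation unless the validator is configured to assert it.

```python
# The example tap schema from above, with descriptions omitted for brevity.
TAP_SCHEMA = {
    "type": "object",
    "properties": {
        "api_key": {"type": "string", "minLength": 1},
        "start_date": {"type": "string", "format": "date-time"},
        "batch_size": {"type": "integer", "default": 100, "minimum": 1},
        "log_level": {
            "type": "string",
            "enum": ["debug", "info", "warning", "error"],
            "default": "info",
        },
    },
    "required": ["api_key", "start_date"],
}

# Python checks for the JSON types used here. bool is excluded from
# "integer" because Python treats True/False as ints.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def validate_config(config, schema):
    """Return a list of human-readable error strings (empty if valid)."""
    errors = []
    for name in schema.get("required", []):
        if name not in config:
            errors.append(f"{name}: required setting is missing")
    for name, rules in schema.get("properties", {}).items():
        if name not in config:
            continue
        value = config[name]
        check = _TYPE_CHECKS.get(rules.get("type"))
        if check and not check(value):
            errors.append(
                f"{name}: expected {rules['type']}, got {type(value).__name__}"
            )
            continue  # further checks assume the type is right
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: must be one of {rules['enum']}")
        if "minimum" in rules and isinstance(value, (int, float)):
            if value < rules["minimum"]:
                errors.append(f"{name}: must be >= {rules['minimum']}")
        if "minLength" in rules and isinstance(value, str):
            if len(value) < rules["minLength"]:
                errors.append(
                    f"{name}: must be at least {rules['minLength']} characters"
                )
    return errors


if __name__ == "__main__":
    bad_config = {"api_key": "", "batch_size": 0, "log_level": "verbose"}
    for message in validate_config(bad_config, TAP_SCHEMA):
        print(message)
```

Collecting every problem into a list, rather than stopping at the first one, lets the tap report all misconfigurations in a single run.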

Benefits

  • Improved Developer Experience: New developers can easily understand tap and target configurations.

  • Better Validation & Error Handling: Schema-based validation can prevent misconfigurations before execution.

  • Tooling Integration: UI-based tools can automatically generate configuration forms.

  • Enhanced Standardization: Encourages consistency across taps and targets in the Singer ecosystem.

  • Backward Compatibility: Since this proposal introduces schema support as an optional feature initially, it does not break existing implementations.
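
As a rough illustration of the Tooling Integration benefit, the sketch below turns a configuration schema into one line of documentation per setting. A form generator would walk the schema in the same way. The function name describe_settings and the output format are assumptions for this example, not an existing Singer or Meltano API.

```python
import json

# A two-setting excerpt of the example tap schema.
SCHEMA = {
    "type": "object",
    "properties": {
        "api_key": {
            "type": "string",
            "description": "API key for authentication",
        },
        "log_level": {
            "type": "string",
            "enum": ["debug", "info", "warning", "error"],
            "description": "Logging level for the tap",
            "default": "info",
        },
    },
    "required": ["api_key"],
}


def describe_settings(schema):
    """Return one documentation line per top-level setting in the schema."""
    required = set(schema.get("required", []))
    lines = []
    for name, rules in schema.get("properties", {}).items():
        flag = "required" if name in required else "optional"
        line = (
            f"{name} ({rules.get('type', 'any')}, {flag}): "
            f"{rules.get('description', '')}"
        )
        if "default" in rules:
            # json.dumps shows the default exactly as it would appear in JSON.
            line += f" [default: {json.dumps(rules['default'])}]"
        if "enum" in rules:
            line += f" [choices: {', '.join(map(str, rules['enum']))}]"
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(describe_settings(SCHEMA)))
```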

Implementation Plan

  • Define a Standard Location: Each tap and target can include a config.schema.json file at the root of the repository.

  • Encourage Adoption in Meltano and Singer SDK: Integrate JSON Schema validation in tooling like Meltano to drive adoption.

  • Gradual Adoption: Encourage the community to start adopting JSON Schema without making it mandatory.

  • Enhance Documentation: Update Singer's official documentation with best practices for schema definition.
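
The first and third steps of the plan could look like the following on the tooling side: look for config.schema.json at the repository root, and fall back to today's behavior when it is absent so that taps without a schema keep working. This is a sketch under those assumptions; the helper names load_config_schema and missing_required are hypothetical.

```python
import json
import tempfile
from pathlib import Path

SCHEMA_FILENAME = "config.schema.json"


def load_config_schema(repo_root):
    """Return the parsed schema, or None if the tap/target ships no schema."""
    path = Path(repo_root) / SCHEMA_FILENAME
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def missing_required(config, schema):
    """List required settings absent from config; nothing to check without a schema."""
    if schema is None:
        return []
    return [name for name in schema.get("required", []) if name not in config]


if __name__ == "__main__":
    # Demonstrate both paths using a throwaway directory as the "repository".
    with tempfile.TemporaryDirectory() as root:
        print(load_config_schema(root))  # no schema file present yet
        schema_path = Path(root) / SCHEMA_FILENAME
        schema_path.write_text(
            json.dumps({"required": ["api_key", "start_date"]}), encoding="utf-8"
        )
        print(missing_required({"api_key": "abc"}, load_config_schema(root)))
```

Returning None for a missing file, instead of raising, keeps the schema optional, which matches the gradual-adoption goal.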

Conclusion

By introducing JSON Schema as an optional but recommended standard for defining tap and target configurations, we can significantly improve usability, validation, and tooling support in the Singer ecosystem. This proposal aims to foster a more robust, user-friendly, and standardized approach to configuration management for Singer-based data pipelines.
