-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: In each record to filter and transform, publish a local service field holding the original object the record is extracted from #214
base: main
Are you sure you want to change the base?
Conversation
At record extraction step, in each record add the service field $root holding a reference to: * the root response object, when parsing JSON format * the original record, when parsing JSONL format that each record to process is extracted from. More service fields could be added in future. The service fields are available in the record's filtering and transform steps. Avoid: * reusing the maps/dictionaries produced, thus avoid building cyclic structures * transforming the service fields in the Flatten transformation. Explicitly cleanup the service field(s) after the transform step, thus making them: * local for the filter and transform steps * not visible to the next mapping and store steps (as they should be) * not visible in the tests beyond the test_record_selector (as they should be) This allows the record transformation logic to define its "local variables" to reuse some interim calculations. The contract of body parsing seems irregular in representing the cases of bad JSON, no JSON and empty JSON. Cannot be unified as that that irregularity is already used. Update the development environment setup documentation * to organize and present the setup steps explicitly * to avoid misunderstandings and wasted efforts. Update CONTRIBUTING.md to * collect and organize the knowledge on running the test locally. * state the actual testing steps. * clarify and make explicit the procedures and steps. The unit, integration, and acceptance tests in this exactly version succeed under Fedora 41, while one of them fails under Oracle Linux 8.7. not related to the contents of this PR. The integration tests of the CDK fail due to missing `secrets/config.json` file for the Shopify source. See airbytehq#197
📝 WalkthroughWalkthroughThis pull request introduces a variety of updates across the Airbyte CDK. The JSON decoding logic has been improved with enhanced error logging and simplified JSON body parsing. Several extractor modules have been extended: DpathExtractor now features new methods for updating the response body and individual records, and service key management functions were added. A new extractor, DpathEnhancingExtractor, has been introduced along with its corresponding schema, factory method, parser mappings, and tests. In addition, pagination strategies, documentation, and dependency versions have been refined and updated. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Extractor as DpathEnhancingExtractor
participant Selector as RecordSelector
Client->>Extractor: Receive HTTP response
Extractor->>Extractor: update_body() (set parent context recursively)
Extractor->>Extractor: update_record() (add root service field)
Extractor-->>Client: Yield enhanced record
Client->>Selector: Submit record for filtering and normalization
Selector->>Selector: remove_service_keys() from record
Selector-->>Client: Return normalized record
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (17)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (3)
10-14
: Small typo in the documentationThere's a typo in "transormation" -> "transformation". Otherwise, the documentation clearly explains the convention and lifecycle of service fields.
-- The service fields are kept only during the record's filtering and transormation. ++ The service fields are kept only during the record's filtering and transformation.
21-27
: Consider removing type ignore and adding return type annotationThe
# type: ignore[no-untyped-def]
seems unnecessary here. What do you think about adding a return type annotation ofNone
since the function modifies the dict in place? wdyt?-def remove_service_keys(mapping: dict[str, Any]): # type: ignore[no-untyped-def] +def remove_service_keys(mapping: dict[str, Any]) -> None:
34-35
: Consider removing type ignore and improving error messageSimilar to above, the type ignore seems unnecessary. Also, would it be helpful to include the actual mapping in the assertion message to aid debugging? wdyt?
-def assert_service_keys_exist(mapping: Mapping[str, Any]): # type: ignore[no-untyped-def] - assert mapping != exclude_service_keys(mapping), "The mapping should contain service keys" +def assert_service_keys_exist(mapping: Mapping[str, Any]) -> None: + assert mapping != exclude_service_keys(mapping), f"The mapping should contain service keys, got: {mapping}"airbyte_cdk/sources/declarative/transformations/flatten_fields.py (1)
36-44
: Consider adjusting indentation for better readabilityThe logic looks good! The service keys are correctly preserved. Would you consider adjusting the indentation of the else clause to match the if condition for better readability? wdyt?
if not is_service_key(current_key): new_key = ( f"{parent_key}.{current_key}" if (current_key in transformed_record or force_with_parent_name) else current_key ) stack.append((value, new_key)) - else: # transfer the service fields without change - transformed_record[current_key] = value + else: + # transfer the service fields without change + transformed_record[current_key] = valueunit_tests/sources/declarative/transformations/test_flatten_fields.py (1)
50-67
: Consider adding test cases for nested service keysThe test coverage for service keys looks good! Would you consider adding a test case for nested objects containing service keys to ensure they're handled correctly at any depth? For example:
( { SERVICE_KEY_PREFIX + "root": { "nested": {SERVICE_KEY_PREFIX + "child": "value"} }, "regular": "field" }, { SERVICE_KEY_PREFIX + "root": { "nested": {SERVICE_KEY_PREFIX + "child": "value"} }, "regular": "field" } )unit_tests/sources/declarative/extractors/test_record_extractor.py (2)
62-67
: Consider catching specific assertion errorThe test looks good, but would it be safer to catch the specific AssertionError instead of using a bare except? This would prevent masking other unexpected exceptions. wdyt?
try: assert_service_keys_exist(original) success = False - except: # OK, expected + except AssertionError: # OK, expected success = True
71-76
: Consider adding edge cases for service key detectionThe basic test cases look good! Would you consider adding some edge cases to ensure robust handling of special characters and edge cases? For example:
def test_service_key_edge_cases(): assert is_service_key(SERVICE_KEY_PREFIX) # just the prefix assert is_service_key(SERVICE_KEY_PREFIX + "$nested") # double prefix assert not is_service_key("prefix" + SERVICE_KEY_PREFIX) # prefix in middleairbyte_cdk/sources/declarative/decoders/json_decoder.py (1)
38-41
: Consider making stack trace logging configurable?The debug logging with
stack_info=True
provides great debugging context but might be verbose in production. What do you think about making it configurable through a parameter, wdyt?- logger.debug("Response to parse: %s", response.text, exc_info=True, stack_info=True) + logger.debug( + "Response to parse: %s", + response.text, + exc_info=True, + stack_info=self.parameters.get("debug_stack_trace", False) + )airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (2)
23-29
: Consider using dict constructor for a cleaner copy?The implementation safely handles both dict and non-dict cases. For the dict copy, what do you think about using the dict constructor for a slightly cleaner approach, wdyt?
- copy = {k: v for k, v in record.items()} + copy = dict(record)
92-109
: Consider consolidating the yield statements?The implementation handles all cases well. What do you think about consolidating the yield statements for better maintainability, wdyt?
- if isinstance(extracted, list): - for record in extracted: - yield update_record(record, root_response) - elif isinstance(extracted, dict): - yield update_record(extracted, root_response) - elif extracted: - yield extracted - else: - yield from [] + if not extracted: + return + records = [extracted] if isinstance(extracted, dict) else extracted if isinstance(extracted, list) else [extracted] + yield from (update_record(record, root_response) for record in records)airbyte_cdk/sources/declarative/extractors/record_selector.py (1)
112-113
: Consider adding a docstring to explain the service fields removal step?The addition of service fields removal before normalization is a significant change in the data processing pipeline. Would you consider adding a brief docstring to explain why this step is necessary here? wdyt?
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1)
196-201
: Consider extracting the service key validation into a helper function?The service key validation logic is repeated across test cases. Would it make sense to extract this into a helper function to improve maintainability? wdyt?
+def validate_service_keys(records): + for record in records: + if record != {}: + assert_service_keys_exist(record) + return [exclude_service_keys(record) for record in records] + def test_dpath_extractor(field_path: List, decoder: Decoder, body, expected_records: List): # ... - for record in actual_records: - if record != {}: - assert_service_keys_exist(record) - - actual_records = [exclude_service_keys(record) for record in actual_records] + actual_records = validate_service_keys(actual_records)unit_tests/sources/declarative/extractors/test_record_selector.py (1)
169-171
: Consider consistent whitespace around service key operations?The whitespace around service key operations differs between test methods. Would you like to make it consistent? For example, always having one blank line before and after? wdyt?
Also applies to: 244-245
unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py (1)
325-325
: Nice addition of deduplication logic! 👍The deduplication using dictionary comprehension with orjson.dumps as keys is elegant. Would you consider extracting this into a helper function since it's used twice in the test? Something like
_dedupe_records(records)
, wdyt?+def _dedupe_records(records): + return list({orjson.dumps(record): record for record in records}.values()) # Usage: -cumulative_records_state_deduped = list( - {orjson.dumps(record): record for record in cumulative_records_state}.values() -) +cumulative_records_state_deduped = _dedupe_records(cumulative_records_state)docs/CONTRIBUTING.md (2)
97-115
: Comprehensive test execution instructions! 🎯The detailed breakdown of test commands with explanations is very helpful. Consider adding a note about expected test execution times for the different commands to help developers plan their workflow, wdyt?
🧰 Tools
🪛 LanguageTool
[style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests, which a...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~108-~108: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. -poetry run ruff format
to format your Python code. ### Run Code ...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
167-167
: Minor: Missing comma in the sentence-On the **Settings / Domains** page find the subdomain +On the **Settings / Domains** page, find the subdomain🧰 Tools
🪛 LanguageTool
[uncategorized] ~167-~167: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...(AI_HYDRA_LEO_MISSING_COMMA)
docs/RELEASES.md (1)
60-61
: Clear instructions for testing pre-release versionsThe instructions for testing pre-release versions are well-structured. Consider adding a note about the expected time for the publishing pipeline to complete, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
package-lock.json
is excluded by!**/package-lock.json
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (17)
airbyte_cdk/sources/declarative/decoders/json_decoder.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
(2 hunks)airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/record_selector.py
(3 hunks)airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py
(1 hunks)airbyte_cdk/sources/declarative/transformations/flatten_fields.py
(2 hunks)docs/CONTRIBUTING.md
(5 hunks)docs/RELEASES.md
(2 hunks)package.json
(1 hunks)unit_tests/sources/declarative/decoders/test_json_decoder.py
(1 hunks)unit_tests/sources/declarative/extractors/test_dpath_extractor.py
(5 hunks)unit_tests/sources/declarative/extractors/test_record_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(5 hunks)unit_tests/sources/declarative/partition_routers/test_parent_state_stream.py
(1 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(2 hunks)unit_tests/sources/declarative/test_manifest_declarative_source.py
(2 hunks)unit_tests/sources/declarative/transformations/test_flatten_fields.py
(2 hunks)
✅ Files skipped from review due to trivial changes (2)
- package.json
- unit_tests/sources/declarative/test_manifest_declarative_source.py
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~101-~101: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~108-~108: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~167-~167: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~269-~269: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~287-~287: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
docs/RELEASES.md
[grammar] ~68-~68: Did you mean the noun “publishing”?
Context: ...ng Low-Code Python connectors Once the publish pipeline has completed, set the version...
(PREPOSITION_VERB)
🪛 Markdownlint (0.37.0)
docs/CONTRIBUTING.md
140-140: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
239-239: null
Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
86-86: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (14)
airbyte_cdk/sources/declarative/decoders/json_decoder.py (1)
47-50
: Nice simplification! 👍The use of
yield from
makes the code more elegant and Pythonic. Great improvement!airbyte_cdk/sources/declarative/requesters/paginators/strategies/cursor_pagination_strategy.py (1)
73-73
: Good defensive programming! 👍Adding a default empty dict to
next()
preventsStopIteration
exceptions when the decoder yields no items. This aligns well with the JsonDecoder changes.airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (2)
12-20
: Clean implementation of service field! 👍Good use of the service key prefix convention for the root field. The implementation aligns well with the PR objectives.
88-90
: Clear empty case handling! 👍The explicit empty dict check and the explanatory comment make the contract compliance clear.
unit_tests/sources/declarative/decoders/test_json_decoder.py (2)
16-27
: Test cases align well with the new behavior! 👍The parameter rename to
expected_json
and updated test cases clearly reflect the decoder's behavior.
34-48
: Great test coverage! 👍The additional test cases and descriptive IDs improve the test suite's comprehensiveness and readability.
airbyte_cdk/sources/declarative/extractors/record_selector.py (2)
11-14
: LGTM! Clean import organization.The imports are well-organized and the new
exclude_service_keys
import aligns with the PR's objective of managing service fields.
162-167
: LGTM! Clean implementation of service keys removal.The
_remove_service_keys
method is simple, focused, and effectively uses the importedexclude_service_keys
function. The implementation aligns well with the PR's objective of making service fields local to filtering and transformation.unit_tests/sources/declarative/extractors/test_dpath_extractor.py (2)
18-21
: LGTM! Well-organized imports.The new imports for service key management functions are properly organized.
40-94
: Great test coverage for edge cases!The test cases thoroughly cover various scenarios including empty strings, objects, arrays, and nested structures. The parameterization is well-organized and the test data is clear.
unit_tests/sources/declarative/extractors/test_record_selector.py (2)
12-19
: LGTM! Clean import organization.The imports are well-organized and properly grouped.
50-79
: Great test coverage for nested records!The test cases thoroughly cover scenarios with nested records and properly validate the behavior with and without filters.
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)
247-247
: LGTM! Clear state management assertions.The assertions properly verify that the state is updated to indicate sync completion after reading records.
Also applies to: 362-362
docs/CONTRIBUTING.md (1)
17-23
: Great addition of Python version requirements!The explicit Python version requirements and Fedora-specific installation instructions make it easier for contributors to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (4)
10-14
: Documentation looks great! Would you consider adding an example? 🤔The convention is well-documented. To make it even more helpful for developers, what do you think about adding a simple example of a service field, like
$root
? wdyt?
21-27
: The implementation looks solid! Quick question about typing 🤔The
# type: ignore[no-untyped-def]
comment seems unnecessary since the function has type hints. Could we remove it? wdyt?
30-31
: How about using str.startswith() for better readability? 🤔- return key.find(SERVICE_KEY_PREFIX) == 0 + return key.startswith(SERVICE_KEY_PREFIX)Both work the same way, but
startswith()
might be more intuitive. What do you think?
34-35
: The assertion looks good! A couple of suggestions to consider 🤔
- The
# type: ignore[no-untyped-def]
seems unnecessary since we have type hints- Would you like to make the assertion message more descriptive? Maybe something like:
assert mapping != exclude_service_keys(mapping), f"The mapping should contain at least one service key with prefix '{SERVICE_KEY_PREFIX}'"What do you think?
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (3)
23-29
: The implementation looks solid! A few enhancement ideas 🤔
- Would you like to add a docstring explaining the function's purpose and return type?
- We could narrow the return type using Union:
-def update_record(record: Any, root: Any) -> Any: +def update_record(record: Any, root: Any) -> Union[dict[str, Any], Any]:What are your thoughts on these suggestions?
88-91
: The empty response handling looks good! One suggestion for clarity 🤔What do you think about making the contract more explicit in the comment? Maybe:
- # An empty/invalid JSON parsed, keep the contract + # Return empty dict for empty/invalid JSON to maintain the contract that we always yield a mapping
92-109
: The extraction logic looks solid! A couple of suggestions to consider 🤔
- Would you like to add type hints for
root_response
andextracted
?- The
elif extracted:
condition on line 106 might be more explicit aselif not isinstance(extracted, (list, dict)) and extracted:
to clearly show the intent. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
(2 hunks)airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)airbyte_cdk/sources/declarative/transformations/flatten_fields.py
(2 hunks)unit_tests/sources/declarative/transformations/test_flatten_fields.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- unit_tests/sources/declarative/transformations/test_flatten_fields.py
- airbyte_cdk/sources/declarative/transformations/flatten_fields.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/extractors/record_extractor.py (1)
17-18
: Clean and efficient implementation! 👍Nice use of dictionary comprehension and type hints.
airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (1)
12-20
: Clean imports and clear constant definition! 👍Good practice using the SERVICE_KEY_PREFIX to construct RECORD_ROOT_KEY.
Still #197 is a problem. It blocks the build. |
Created a DPath Enhancing Extractor Refactored the record enhancement logic - moved to the extracted class Split the tests of DPathExtractor and DPathEnhancingExtractor Fix the failing tests: FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_custom_components[test_create_custom_component_with_subcomponent_that_uses_parameters] FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_custom_components_do_not_contain_extra_fields FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_parse_custom_component_fields_if_subcomponent FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_page_increment FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_offset_increment FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[simple_unstructured_scenario] FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[no_file_extension_unstructured_scenario] They faile because of comparing string and int values of the page_size (public) attribute. Imposed an invariant: on construction, page_size can be set to a string or int keep only values of one type in page_size for uniform comparison (convert the values of the other type) _page_size holds the internal / working value ... unless manipulated directly. Merged: feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework (airbytehq#228) fix(low-code): Fix declarative low-code state migration in SubstreamPartitionRouter (airbytehq#267) feat: combine slash command jobs into single job steps (airbytehq#266) feat(low-code): add items and property mappings to dynamic schemas (airbytehq#256) feat: add help response for unrecognized slash commands (airbytehq#264) ci: post direct links to html connector test reports (airbytehq#252) (airbytehq#263) fix(low-code): Fix legacy state migration in SubstreamPartitionRouter (airbytehq#261) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(low-code): add DpathFlattenFields (airbytehq#227) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Trigger a new build. Hopefully, the integration test infrastructure is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
airbyte_cdk/sources/declarative/extractors/dpath_enhancing_extractor.py (2)
72-108
: Performance caution when reshaping large JSON objects.
Recursively constructing nested dictionaries and adding theparent
service key could be computationally expensive for very large or deeply nested JSON objects. Perhaps we could optionally disable this recursive parent assignment for known large data sets, wdyt?
135-142
: Collision avoidance with existing 'root' fields.
update_record
sets a service key named"root"
. If the record already contains a key namedroot
, this might overwrite it or create confusion. Maybe prefix it with a special character or handle collisions differently, wdyt?airbyte_cdk/sources/declarative/extractors/__init__.py (1)
21-21
: Confirm__all__
alignment with project naming conventions.
DpathEnhancingExtractor
is appended to__all__
. This is consistent, though sometimes teams prefer alphabetical order. Would you like to reorder these entries to maintain a standard format, wdyt?airbyte_cdk/sources/declarative/requesters/paginators/strategies/page_increment.py (1)
34-42
: Simplify page_size initialization further?
The updated logic is easier to follow, but the flow around lines 34–42 could be extracted into a helper method for clarity. That might remove the nestedif
s and keep it tidy, wdyt?unit_tests/sources/declarative/extractors/test_dpath_enhancing_extractor.py (1)
43-340
: Great test coverage! A few suggestions to consider.The test suite is comprehensive and well-structured. Really nice job on covering edge cases! 👍
A few minor suggestions to consider:
- Would it be helpful to add docstrings to explain the purpose of complex test cases?
- What about adding some negative test cases to verify error handling?
- Consider grouping related test cases using pytest's
@pytest.mark.group
decorator for better organizationWhat are your thoughts on these suggestions?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1824-1842
: The implementation looks good, but should we add some validation?The implementation follows the pattern of other extractors, but would it be helpful to add validation for the field path to ensure it's not empty? wdyt?
def create_dpath_enhancing_extractor( self, model: DpathEnhancingExtractorModel, config: Config, decoder: Optional[Decoder] = None, **kwargs: Any, ) -> DpathEnhancingExtractor: + if not model.field_path: + raise ValueError("field_path cannot be empty") if decoder: decoder_to_use = decoder else: decoder_to_use = JsonDecoder(parameters={})airbyte_cdk/sources/declarative/extractors/record_extractor.py (1)
29-40
: The recursive implementation looks good, but should we add a depth limit?The
exclude_service_keys
function handles nested structures well, but would it be worth adding a max depth parameter to prevent potential stack overflow with deeply nested structures? wdyt?-def exclude_service_keys(struct: Any) -> Any: +def exclude_service_keys(struct: Any, max_depth: int = 100) -> Any: """ :param struct: any object/JSON structure + :param max_depth: maximum recursion depth to prevent stack overflow :return: a copy of struct without any service fields at any level of nesting """ + if max_depth <= 0: + return struct if isinstance(struct, dict): - return {k: exclude_service_keys(v) for k, v in struct.items() if not is_service_key(k)} + return {k: exclude_service_keys(v, max_depth - 1) for k, v in struct.items() if not is_service_key(k)} elif isinstance(struct, list): - return [exclude_service_keys(v) for v in struct] + return [exclude_service_keys(v, max_depth - 1) for v in struct] else: return structairbyte_cdk/sources/declarative/extractors/dpath_extractor.py (2)
73-75
: Consider adding a docstring to explain the empty body contract?The empty body handling looks good, but adding a docstring might help future maintainers understand why we need to maintain this contract, wdyt?
98-113
: The template methods look great! Consider adding @AbstractMethod?The update_body and update_record template methods are well-documented. Since they're meant to be overridden by subclasses, would you consider marking them with @AbstractMethod to make the contract more explicit, wdyt?
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1)
44-93
: Comprehensive nested data test coverage!The test cases thoroughly cover various nested data structures and extraction patterns. Would you consider adding a test case for very deep nesting to verify any potential recursion limits, wdyt?
airbyte_cdk/sources/declarative/requesters/paginators/strategies/offset_increment.py (1)
55-58
: Consider adding type validation for page_size conversion?The current implementation converts integers to strings without validation. We could add a type check to ensure page_size is either an int or str before conversion, wdyt?
- self.page_size = str(self.page_size) if isinstance(self.page_size, int) else self.page_size + if not isinstance(self.page_size, (int, str, type(None))): + raise ValueError(f"page_size must be an integer or string, got {type(self.page_size)}") + self.page_size = str(self.page_size) if isinstance(self.page_size, int) else self.page_size
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
airbyte_cdk/__init__.py
(2 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/extractors/__init__.py
(2 hunks)airbyte_cdk/sources/declarative/extractors/dpath_enhancing_extractor.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/dpath_extractor.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/record_extractor.py
(1 hunks)airbyte_cdk/sources/declarative/extractors/record_selector.py
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)airbyte_cdk/sources/declarative/requesters/paginators/strategies/offset_increment.py
(1 hunks)airbyte_cdk/sources/declarative/requesters/paginators/strategies/page_increment.py
(1 hunks)airbyte_cdk/sources/declarative/transformations/flatten_fields.py
(2 hunks)docs/CONTRIBUTING.md
(5 hunks)unit_tests/sources/declarative/extractors/test_dpath_enhancing_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_dpath_extractor.py
(4 hunks)unit_tests/sources/declarative/extractors/test_record_extractor.py
(1 hunks)unit_tests/sources/declarative/extractors/test_record_selector.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- airbyte_cdk/sources/declarative/transformations/flatten_fields.py
- airbyte_cdk/sources/declarative/extractors/record_selector.py
- docs/CONTRIBUTING.md
- unit_tests/sources/declarative/extractors/test_record_extractor.py
- unit_tests/sources/declarative/extractors/test_record_selector.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (17)
airbyte_cdk/sources/declarative/extractors/dpath_enhancing_extractor.py (2)
66-71
: Looks clean and straightforward!
The__post_init__
method delegates seamlessly to the parent class, ensuring necessary initialization logic is still applied.
109-134
: Double-check handling ofNone
and empty values.
Currently,_set_parent
filters out values if they are dictionaries or lists (lines 124-125). Ifv
can be zero, empty string, orFalse
, are we confident none of those cases should be filtered out? Might we risk losing relevant boolean or numeric fields, wdyt?airbyte_cdk/sources/declarative/extractors/__init__.py (1)
5-7
: Importing the new extractor looks good!
HavingDpathEnhancingExtractor
in a separate file helps keep the code organization clear.airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1674-1674
: LGTM! The new extractor type is correctly integrated.The addition of
DpathEnhancingExtractor
to theextractor
union type inRecordSelector
is clean and follows the existing pattern.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
80-80
: LGTM! Clean import addition.The import is correctly placed with other extractor imports.
562-562
: LGTM! Mapping is correctly added.The new extractor is properly mapped to its creation method in the factory's initialization.
airbyte_cdk/sources/declarative/extractors/record_extractor.py (4)
10-15
: LGTM! Clear convention documentation.The convention for service fields is well-documented and the prefix constant is appropriately defined.
17-27
: LGTM! Clean utility function implementation.The
add_service_key
function is well-documented and follows immutable pattern by returning a new dict.
42-44
: LGTM! Simple and focused implementation.The
is_service_key
function is clean and follows single responsibility principle.
46-49
: LGTM! Generator pattern is well-utilized.The
remove_service_keys
function efficiently uses generator pattern to process records one at a time.airbyte_cdk/sources/declarative/extractors/dpath_extractor.py (1)
77-78
: LGTM! Smart way to preserve the root response.Storing the original response before modification and passing it through the update chain enables powerful contextual transformations.
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (2)
39-41
: Great test coverage for edge cases!The empty input test cases effectively verify the contract compliance for empty string, object, and array inputs.
125-128
: LGTM! Clear JSONL format test cases.The JSONL-specific test cases effectively verify the handling of array records and line-based parsing.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1458-1488
: Well-structured component definition!The DpathEnhancingExtractor component is well-documented and maintains consistency with other extractors. The service fields ($root, $parent) are clearly explained.
2806-2806
: LGTM! Clean integration with RecordSelector.The addition of DpathEnhancingExtractor to the RecordSelector's extractor property enables seamless integration.
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)
50-51
: LGTM! Clean addition of DpathEnhancingExtractor mappingThe new entry follows the established pattern and correctly maps the decoder to JsonDecoder.
airbyte_cdk/__init__.py (1)
95-95
: LGTM! Clean addition of DpathEnhancingExtractor to public APIThe new extractor is properly exposed in both the imports and all list, following the module's conventions.
Also applies to: 237-237
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/CONTRIBUTING.md (4)
17-17
: The installation instructions are well-organized, but could we make the Python version requirement more prominent?The Python version requirement is crucial for development. What do you think about moving it to the very top of the setup instructions and adding a note about version compatibility? Something like:
-1. Clone the CDK repo. If you will be testing connectors, you should clone the CDK into the same parent directory as `airbytehq/airbyte`, which contains the connector definitions. -1. Make sure your Python version is 3.11 +## Prerequisites + +- Python 3.11 (required) +- Poetry 2.0 or higher + +## Setup Steps + +1. Clone the CDK repo. If you will be testing connectors, you should clone the CDK into the same parent directory as `airbytehq/airbyte`, which contains the connector definitions.wdyt? 🤔
Also applies to: 26-86
111-138
: The testing section is comprehensive, but could we improve the organization?The testing instructions are detailed but could be more structured. What do you think about grouping related commands under subsections with descriptive headers? For example:
### Run Unit Tests +#### Basic Test Commands - `poetry run pytest` to run all unit tests. - `poetry run pytest -k <suite or test name>` to run specific unit tests. + +#### Performance Optimized Testing - `poetry run pytest-fast` to run the subset of PyTest tests, which are not flagged as `slow`. (It should take <5 min for fast tests only.) + +#### Advanced Options - `poetry run pytest -s unit_tests` if you want to pass pytest options.This organization might make it easier for contributors to find the specific command they need. Thoughts? 🤔
🧰 Tools
🪛 LanguageTool
[style] ~116-~116: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests, which a...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~123-~123: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. -poetry run ruff format
to format your Python code. ### Run Code ...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
346-352
: Consider adding a note about container cleanup.When using Docker for MockServer testing, it might be helpful to add a note about cleaning up the container after testing. What do you think about adding:
docker run -d --rm -v $(pwd)/secrets/mock_server_config:/config -p 8113:8113 --env MOCKSERVER_LOG_LEVEL=TRACE --env MOCKSERVER_SERVER_PORT=8113 --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json mockserver/mockserver:5.15.0 +# To stop the MockServer container when done: +docker ps | grep mockserver | awk '{print $1}' | xargs docker stopwdyt? This would help prevent orphaned containers. 🐳
100-100
: Add language specifiers to code blocks.Some code blocks are missing language specifiers. This affects syntax highlighting and markdown linting.
-``` +```yaml connector_image: airbyte/source-shopify:dev to connector_image: airbyte/source-shopify:<version>Also, remove spaces inside code spans:
-` domain: nw0ipt-vr.myshopify.com` +`domain: nw0ipt-vr.myshopify.com`Also applies to: 184-184
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
100-100: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(5 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~116-~116: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~123-~123: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~287-~287: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~305-~305: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 markdownlint-cli2 (0.17.2)
docs/CONTRIBUTING.md
100-100: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
155-155: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
184-184: Spaces inside code span elements
null
(MD038, no-space-in-code)
254-254: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
docs/CONTRIBUTING.md (1)
8-8
: LGTM! Nice improvement in capitalization.The capitalization of "GitHub" has been corrected, which aligns with proper noun conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
docs/CONTRIBUTING.md (6)
16-81
: Setup instructions could be more explicit about Python version requirements, wdyt?The Python version requirement is mentioned but could be more explicit about whether this is a minimum or exact version requirement. Consider clarifying if versions > 3.11 are also supported.
-1. Make sure your Python version is 3.11 +1. Make sure your Python version is 3.11 or higher
93-101
: Consider adding a note about version synchronization, what do you think?The instructions about changing the connector image version could benefit from a note explaining why version synchronization is important and what problems it prevents.
where the **version** comes from airbyte/airbyte-integrations/connectors/source-shopify/metadata.yaml, as -the value of the **dockerImageTag** parameter. +the value of the **dockerImageTag** parameter. This ensures version consistency between the connector image and its metadata, preventing potential compatibility issues.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
95-95: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
106-124
: The testing section looks great, but what about adding examples? 🤔The testing instructions are clear, but could be even more helpful with example commands showing typical usage patterns. For instance, showing how to run a specific test suite or how to use the coverage report.
- `poetry run pytest` to run all unit tests. +Example: +```bash +# Run a specific test suite +poetry run pytest unit_tests/test_source.py + +# Run tests with coverage report +poetry run pytest --cov=airbyte_cdk +```🧰 Tools
🪛 LanguageTool
[style] ~111-~111: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests, which a...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~118-~118: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. -poetry run ruff format
to format your Python code. ### Run Code ...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
179-181
: Fix code block formatting for better readability?The domain and shop examples are using inconsistent formatting. Consider using proper code blocks for better readability.
- domain: nw0ipt-vr.myshopify.com - shop: nw0ipt-vr +```yaml +domain: nw0ipt-vr.myshopify.com +shop: nw0ipt-vr +```🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
179-179: Spaces inside code span elements
null(MD038, no-space-in-code)
339-345
: Consider adding a troubleshooting section for MockServer setup?The MockServer setup instructions are good, but common issues and their solutions could be helpful. For example, what to do if the port is already in use or if Docker can't mount the volume.
HTTP requests to `localhost:8113/data` should now return the body defined in the expectations file. To test this, the implementer either has to change the code that defines the base URL for the Python source or update the `url_base` from low-code. With the Connector Builder running in docker, you will have to use domain `host.docker.internal` instead of `localhost` as the requests are executed within docker. +### Troubleshooting MockServer Setup + +- If port 8113 is already in use, you can change it to another port in both the docker command and your configuration +- If you see volume mount errors, ensure the `secrets/mock_server_config` directory exists and has proper permissions +- For Windows users using WSL2, you might need to use `%cd%` instead of `$(pwd)` in the volume mount
298-298
: Minor style suggestion: consider using American English consistently?For consistency with other documentation, consider using "take a look" instead of "have a look".
-Please have a look at the [Release Management](./RELEASES.md) guide +Please take a look at the [Release Management](./RELEASES.md) guide🧰 Tools
🪛 LanguageTool
[locale-violation] ~298-~298: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....(HAVE_A_LOOK)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(4 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~111-~111: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~118-~118: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~280-~280: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~298-~298: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 markdownlint-cli2 (0.17.2)
docs/CONTRIBUTING.md
95-95: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
150-150: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
179-179: Spaces inside code span elements
null
(MD038, no-space-in-code)
249-249: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
661-675
: Consider adding more descriptive documentation for the new extractor.The new
DpathEnhancingExtractor
class seems to be a key component for implementing the$root
service field feature mentioned in the PR objectives. However, its purpose and behavior aren't immediately clear from the documentation. Would you consider adding:
- A description of how it enhances the extraction process
- An example showing how it adds the
$root
service field- Documentation about any limitations or considerations when using this extractor
class DpathEnhancingExtractor(BaseModel): type: Literal["DpathEnhancingExtractor"] field_path: List[str] = Field( ..., - description='List of potentially nested fields describing the full path of the field to extract. Use "*" to extract all values from an array. See more info in the [docs](https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/record-selector).', + description='''List of potentially nested fields describing the full path of the field to extract. Use "*" to extract all values from an array. + + This extractor enhances each extracted record by adding a `$root` service field containing a reference to: + - The root response object when parsing JSON format + - The original record when parsing JSONL format + + This enables advanced filtering and transformation capabilities by providing access to the complete context. + Note: Service fields are local to filtering and transformation steps and are cleaned afterward. + + See more info in the [docs](https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/record-selector). + ''', examples=[ ["data"], ["data", "records"], ["data", "{{ parameters.name }}"], ["data", "*", "record"], ], title="Field Path", ) parameters: Optional[Dict[str, Any]] = Field(None, alias="$parameters")airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1865-1882
: Consider reducing code duplication with create_dpath_extractor?The implementation is correct, but I notice it shares a lot of logic with
create_dpath_extractor
. Would you consider extracting the common decoder initialization and field path handling into a shared private method to reduce duplication? Something like this perhaps:+ def _create_dpath_extractor_base( + self, + decoder: Optional[Decoder], + field_path: List[str], + config: Config, + parameters: Optional[Dict[str, Any]] = None, + ) -> Tuple[Decoder, List[Union[InterpolatedString, str]]]: + decoder_to_use = decoder if decoder else JsonDecoder(parameters={}) + model_field_path: List[Union[InterpolatedString, str]] = [x for x in field_path] + return decoder_to_use, model_field_path + def create_dpath_extractor( self, model: DpathExtractorModel, config: Config, decoder: Optional[Decoder] = None, **kwargs: Any, ) -> DpathExtractor: - if decoder: - decoder_to_use = decoder - else: - decoder_to_use = JsonDecoder(parameters={}) - model_field_path: List[Union[InterpolatedString, str]] = [x for x in model.field_path] + decoder_to_use, model_field_path = self._create_dpath_extractor_base( + decoder, model.field_path, config, model.parameters + ) return DpathExtractor( decoder=decoder_to_use, field_path=model_field_path, config=config, parameters=model.parameters or {}, ) def create_dpath_enhancing_extractor( self, model: DpathEnhancingExtractorModel, config: Config, decoder: Optional[Decoder] = None, **kwargs: Any, ) -> DpathEnhancingExtractor: - if decoder: - decoder_to_use = decoder - else: - decoder_to_use = JsonDecoder(parameters={}) - model_field_path: List[Union[InterpolatedString, str]] = [x for x in model.field_path] + decoder_to_use, model_field_path = self._create_dpath_extractor_base( + decoder, model.field_path, config, model.parameters + ) return DpathEnhancingExtractor( decoder=decoder_to_use, field_path=model_field_path, config=config, parameters=model.parameters or {}, )This would make the code more maintainable and reduce the risk of divergence in the future. What do you think?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1463-1493
: New Component Addition: DpathEnhancingExtractor
The newDpathEnhancingExtractor
component is added to support the inclusion of local service fields ($root
and$parent
) for use during the filter and transform phases. This aligns nicely with the PR objective of retaining the original object in each record. Would you consider adding more detailed examples or clarifying comments on how these service fields could be utilized (and later cleaned up) to further assist connector developers? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1685-1685
: LGTM!The update to include
DpathEnhancingExtractor
in theRecordSelector
's extractor field union type is correct and necessary for the new feature.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
80-80
: LGTM! Clean import addition.The DpathEnhancingExtractor import is well-placed within the existing extractors import group.
218-220
: LGTM! Clean model mapping integration.The DpathEnhancingExtractorModel import and constructor mapping are properly integrated into the factory's model mapping system.
Also applies to: 562-562
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2808-2811
: RecordSelector Update: Inclusion of DpathEnhancingExtractor
By adding the reference toDpathEnhancingExtractor
in theRecordSelector
component’sextractor
property, the PR now enables users to select this new extractor. Could you also verify that corresponding tests cover this new behavior—ensuring that the$root
field is correctly populated and later removed during transformations? wdyt?
Trigger a new build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (23)
docs/CONTRIBUTING.md (23)
16-22
: Python Version Requirement Added:
Great job specifying that contributors must use Python 3.11. Would it be helpful to include a link to Python 3.11 release notes or a brief note on compatibility issues when using different versions? wdyt?
24-30
: Pip Installation Instructions for Fedora:
The new instructions to install pip on Fedora are clear. Would you consider adding a quick note on verifying the installed pip version (for example, by runningpip --version
) to help less experienced users? wdyt?
32-47
: Poetry Installation Details:
The alternative methods for installing Poetry (via pip and via curl) are nicely detailed. Would you consider including a brief comment on when one method might be preferable over the other or adding a link to further documentation? wdyt?
50-63
: Using the Python 3.11 Environment:
The section showing how to use a Python 3.11 environment with Poetry is very clear. Would you consider adding a troubleshooting tip (for example, what to do if the correct environment isn’t activated) to further assist contributors? wdyt?
65-65
: Cloning the CDK Repository:
The instruction to clone the CDK repo is concise. Do you think adding an examplegit clone
command or a reference link to our Git documentation might further help newcomers? wdyt?
66-70
: Installing Unit Test Prerequisites:
The instruction to runpoetry install --all-extras
is straightforward. Would you consider mentioning briefly why the--all-extras
flag is required, so that contributors understand its purpose? wdyt?
73-81
: RHEL Module Setup Instructions:
These instructions to execute commands for RHEL (or compatible OS) users are clear. Would you consider adding a short explanation of what theiptable_nat
module does and why it’s needed? wdyt?
83-87
: 'See Also' References:
The “See also” section with links (including Dagger/Podman integration and issue #197) is useful. Could you please verify that these links are current and consider removing any redundant references if necessary? wdyt?
91-97
: Editing Shopify Connector Config:
The guidance on editing the acceptance test configuration for the Shopify connector is clear. Would you consider providing a brief sample diff or additional explanation to help users understand the required changes? wdyt?🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
95-95: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
109-113
: Running Unit Tests:
The updated unit test commands are concise and useful. Would you consider varying the sentence structure slightly to avoid repeated phrasing? wdyt?🧰 Tools
🪛 LanguageTool
[style] ~111-~111: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests, which a...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
114-119
: Code Formatting Commands:
In this section, several sentences start similarly. Would you consider rewording a couple of these sentences for improved readability, or adding transitions between them? wdyt?🧰 Tools
🪛 LanguageTool
[style] ~118-~118: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. -poetry run ruff format
to format your Python code. ### Run Code ...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
126-133
: More Tools and Options:
This section is quite helpful. Would it add clarity to include a short explanation of when to use each tool before listing the commands? wdyt?
134-139
: Local Testing Against Connector Changes:
The instructions for testing locally with CDK changes are detailed. Would you consider including a brief troubleshooting tip for common issues encountered during this process? wdyt?
150-189
: Integration Test Preparation Steps:
This lengthy block is very detailed and informative. Would you consider breaking it into sub-sections (for example, “Project Setup,” “Configuration Changes,” and “File Creation”) to enhance readability for contributors? wdyt?🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
150-150: Emphasis used instead of a heading
null(MD036, no-emphasis-as-heading)
179-179: Spaces inside code span elements
null(MD038, no-space-in-code)
199-211
: Running Connector Tests:
The step-by-step commands for executing connector tests are clear. Would adding a note on how to interpret test outcomes or what to do when a test fails be beneficial? wdyt?
213-223
: Shopify Connector Example:
The example provided for the Shopify connector is practical. Would you consider clarifying the placeholder<connector name>
(or providing an example) so that the instructions are completely unambiguous? wdyt?
233-243
: Extended Acceptance Test Examples:
This block provides various command options (including debug and timeout). Would it be helpful to label these options explicitly (e.g., "With Debug Option", "With Timeout Option") for even easier reference? wdyt?
251-267
: airbyte-ci CLI and Build Commands:
The instructions for installing theairbyte-ci
CLI and building the connector image using the local CDK are well organized. Would you consider adding a link or further reference to detailed documentation onairbyte-ci
for contributors who want to dive deeper? wdyt?
272-276
: Working with Poe Tasks:
This section explains Poe Tasks clearly. Would you consider including a short troubleshooting tip or common pitfalls when using Poe tasks to further support contributors? wdyt?
280-285
: Auto-Generating the Declarative Schema File:
The instructions are clear, though the phrasing “you may need to regenerate them” could be more expressive. Would you consider rewording it to, for example, “please regenerate the schemas to ensure they align with the latest changes”? wdyt?🧰 Tools
🪛 LanguageTool
[style] ~280-~280: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...(DO_ACHIEVE)
297-299
: Release Management Section Wording:
In the Release Management section, the instruction “please have a look at the Release Management guide” might sound friendlier if changed to “please take a look at...”. Would you consider this minor tweak to align with American English usage? wdyt?🧰 Tools
🪛 LanguageTool
[locale-violation] ~298-~298: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....(HAVE_A_LOOK)
311-316
: GitHub Slash Commands Formatting:
The list of GitHub slash commands is very useful. Would you consider adjusting the formatting of the/autofix
entry—perhaps by ensuring a consistent separation between the command and its description—as done with/poetry-lock
? wdyt?
347-351
: Connector Acceptance Tests Header:
The header for “Running Connector Acceptance Tests for a single connector in Docker” is clear. Would you consider adding a brief note on any prerequisites (if not mentioned elsewhere) for running these tests? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/CONTRIBUTING.md
(4 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~111-~111: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~118-~118: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~280-~280: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~298-~298: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 markdownlint-cli2 (0.17.2)
docs/CONTRIBUTING.md
95-95: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
150-150: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
179-179: Spaces inside code span elements
null
(MD038, no-space-in-code)
249-249: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (15)
docs/CONTRIBUTING.md (15)
8-8
: Clarify Contribution Avenues:
You updated the bullet point to include “GitHub discussions” (with proper capitalization) as a way to contribute. Would you please confirm that this revised wording aligns with our intended communication and style guidelines? wdyt?
88-88
: Docker Installation Note:
The brief note on ensuring Docker is installed locally is succinct. Is this meant to serve only as a reminder, or would adding a link to more detailed Docker installation instructions improve clarity? wdyt?
120-125
: Code Linting Instructions:
The linting commands are clearly listed and align well with our best practices. Great work here!
140-141
: GitHub Pipelines Integration Note:
The note that the GitHub pipelines run integration tests against the Shopify source is concise and useful. Thank you for the clarity!
144-148
: Assumptions Section:
The assumptions about checkout locations and project structure are clearly laid out. This adds valuable context—great job!
225-231
: Acceptance Tests Locally:
The instructions for running acceptance tests locally are straightforward and well explained. Excellent work!
245-246
: Reverting Test Changes Reminder:
The reminder to revert test changes is a nice touch to ensure cleanup after testing. Looks good to me!
247-250
: Test CDK Changes Section Header:
The header and subsequent preparation steps for testing CDK changes against connectors are clear. Thank you for the detailed guidance!🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
249-249: Emphasis used instead of a heading
null(MD036, no-emphasis-as-heading)
269-271
: Local CDK Build Note:
The explanation that the local CDK is injected at build time is concise and clear. Great job!
277-277
: Duplicate Ruff Configuration Paragraph:
It appears that the paragraph about the Ruff configuration is duplicated. Would you consider removing the redundant copy to improve document clarity? wdyt?
294-296
: docs-generate Task Mapping:
The explanation of how thedocs-generate
task is mapped and what it produces is clear and useful. Nice work here!
302-305
: FAQ on Maintainers:
The FAQ entry explaining who the “maintainers” are is clear and informative. No changes needed here—great job!
323-327
: MockServer Introduction:
The introduction about using MockServer in place of direct API access is clear and sets the context well. Thanks for including this helpful information!
339-343
: Docker Command for MockServer:
The Docker command provided for running MockServer is detailed and should be very useful for our testing. Everything looks good here!
345-346
: HTTP Requests Explanation:
The follow-up explanation that HTTP requests tolocalhost:8113/data
will return the defined body is clear. No issues here—great work!
At record extraction step, in each record add the service field $root holding a reference to: * the root response object, when parsing JSON format * the original record, when parsing JSONL format that each record to process is extracted from. More service fields could be added in future. The service fields are available in the record's filtering and transform steps. Avoid: * reusing the maps/dictionaries produced, thus avoid building cyclic structures * transforming the service fields in the Flatten transformation. Explicitly cleanup the service field(s) after the transform step, thus making them: * local for the filter and transform steps * not visible to the next mapping and store steps (as they should be) * not visible in the tests beyond the test_record_selector (as they should be) This allows the record transformation logic to define its "local variables" to reuse some interim calculations. The contract of body parsing seems irregular in representing the cases of bad JSON, no JSON and empty JSON. Cannot be unified as that that irregularity is already used. Update the development environment setup documentation * to organize and present the setup steps explicitly * to avoid misunderstandings and wasted efforts. Update CONTRIBUTING.md to * collect and organize the knowledge on running the test locally. * state the actual testing steps. * clarify and make explicit the procedures and steps. The unit, integration, and acceptance tests in this exactly version succeed under Fedora 41, while one of them fails under Oracle Linux 8.7. not related to the contents of this PR. The integration tests of the CDK fail due to missing `secrets/config.json` file for the Shopify source. See airbytehq#197 Polish Integrate the DpathEnhancingExtractor in the UI of Airbyte. Created a DPath Enhancing Extractor Refactored the record enhancement logic - moved to the extracted class Split the tests of DPathExtractor and DPathEnhancingExtractor Fix the failing tests: FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_custom_components[test_create_custom_component_with_subcomponent_that_uses_parameters] FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_custom_components_do_not_contain_extra_fields FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_parse_custom_component_fields_if_subcomponent FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_page_increment FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_offset_increment FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[simple_unstructured_scenario] FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[no_file_extension_unstructured_scenario] They faile because of comparing string and int values of the page_size (public) attribute. Imposed an invariant: on construction, page_size can be set to a string or int keep only values of one type in page_size for uniform comparison (convert the values of the other type) _page_size holds the internal / working value ... unless manipulated directly. Merged: feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework (airbytehq#228) fix(low-code): Fix declarative low-code state migration in SubstreamPartitionRouter (airbytehq#267) feat: combine slash command jobs into single job steps (airbytehq#266) feat(low-code): add items and property mappings to dynamic schemas (airbytehq#256) feat: add help response for unrecognized slash commands (airbytehq#264) ci: post direct links to html connector test reports (airbytehq#252) (airbytehq#263) fix(low-code): Fix legacy state migration in SubstreamPartitionRouter (airbytehq#261) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(low-code): add DpathFlattenFields (airbytehq#227) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237) formatted Update record_extractor.py Trigger a new build. Hopefully, the integration test infrastructure is fixed. Update CONTRIBUTING.md Trigger a new build
docs/CONTRIBUTING.md unit_tests/sources/declarative/transformations/test_flatten_fields.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (14)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)
33-35
: LGTM! Would you consider adding a docstring? 🤔The enhanced function signature with default values looks great! For better maintainability, maybe we could add a docstring explaining the parameters? wdyt?
def generate_csv( encoding: str = "utf-8", delimiter: str = ",", should_compress: bool = False ) -> bytes: + """Generate CSV test data with configurable encoding, delimiter and compression. + + Args: + encoding (str, optional): Character encoding. Defaults to "utf-8". + delimiter (str, optional): CSV field separator. Defaults to ",". + should_compress (bool, optional): Whether to gzip compress output. Defaults to False. + + Returns: + bytes: Generated CSV data, optionally compressed + """
180-200
: Great test coverage! Maybe we could make the test data DRY? 🤔The parameterized test looks excellent! One tiny suggestion: since the expected data is identical to what's in
generate_csv
, would you consider extracting it to a shared constant to avoid duplication? wdyt?+TEST_CSV_DATA = [ + {"id": "1", "name": "John", "age": "28"}, + {"id": "2", "name": "Alice", "age": "34"}, + {"id": "3", "name": "Bob", "age": "25"}, +] + def generate_csv( encoding: str = "utf-8", delimiter: str = ",", should_compress: bool = False ) -> bytes: - data = [ - {"id": "1", "name": "John", "age": "28"}, - {"id": "2", "name": "Alice", "age": "34"}, - {"id": "3", "name": "Bob", "age": "25"}, - ] + data = TEST_CSV_DATA ... def test_composite_raw_decoder_csv_parser_values(requests_mock, encoding: str, delimiter: str): ... - expected_data = [ - {"id": "1", "name": "John", "age": "28"}, - {"id": "2", "name": "Alice", "age": "34"}, - {"id": "3", "name": "Bob", "age": "25"}, - ] + expected_data = TEST_CSV_DATAdocs/CONTRIBUTING.md (12)
109-120
: Local Development – Unit Test Commands
The “Run Unit Tests” section now provides several command options (including using pytest’s -k and -s flags), which is very useful for different testing situations. Might adding a one-line explanation for when to use each option assist less experienced contributors? wdyt?🧰 Tools
🪛 LanguageTool
[style] ~118-~118: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. -poetry run pytest-fast
to run the subset of PyTest tests, which a...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
131-135
: Additional Tools and Options Reference
The “More tools and options” section offers quick access to useful commands. Would linking to a more detailed description of each tool be beneficial, or is brevity preferred here? wdyt?
143-148
: Assumptions and Pipeline Use
The note that GitHub pipelines run the integration tests along with clear assumptions about directory structure provides valuable context. Have you considered adding a simple diagram to visually represent this structure? wdyt?
172-180
: Configuring Secrets for Integration Tests
The detailed instructions—complete with an example for the Shopify connector—for creating thesecrets/config.json
file are very practical. Would you consider revising the formatting of the in-line example (e.g., extra indentation) for clarity? wdyt?
198-209
: Running Connector Test Commands
The “Steps:” section with detailed commands for running connector tests is comprehensive. Do you think it would help to annotate which command verifies which aspect of the connector (spec, check, discover, read)? wdyt?
211-220
: Shopify Connector Test Example
The example provided for the Shopify source (including running spec, check, discover, and read commands) is clear and instructive. Would a note on expected outputs help ensure users know when a test passes? wdyt?
229-238
: Extended Acceptance Test Options
Providing examples with debug and timeout options is highly useful for troubleshooting. Would it be beneficial to include a brief explanation of when to use the timeout option versus the debug option? wdyt?
242-257
: Testing CDK Changes Against Connectors
The detailed instructions using theairbyte-ci
CLI for building connector images and running tests are very well defined. Would a pointer to common errors during the build/test process be useful for contributors? wdyt?🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
256-256: Emphasis used instead of a heading
null(MD036, no-emphasis-as-heading)
273-279
: Generating API Reference Documentation
The section on generating API docs using pdoc is detailed and helpful. Should we perhaps mention how to preview the generated docs locally if issues arise? wdyt?
280-282
: Release Management Guidance
The directive to “have a look” at the Release Management guide is clear. Given regional language preferences, would you consider revising it to “Please take a look at…” for consistency with American English? wdyt?
283-299
: FAQ Section Clarity and Consistency
The FAQ is well structured and covers key questions. A couple of suggestions:
• Consider varying the starting words of consecutive sentences to enhance readability.
• The bullet for “/autofix” could use consistent spacing (e.g., adding a space after/autofix
).
Would these refinements help improve the flow? wdyt?🧰 Tools
🪛 LanguageTool
[style] ~287-~287: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...(DO_ACHIEVE)
302-334
: Advanced Topics and Appendix
The advanced topics—covering using MockServer and running connector acceptance tests in Docker—are comprehensive and provide useful real-world guidance. Would including a brief troubleshooting checklist or common pitfalls in these sections further assist contributors? wdyt?🧰 Tools
🪛 LanguageTool
[locale-violation] ~305-~305: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....(HAVE_A_LOOK)
[uncategorized] ~317-~317: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...What GitHub slash commands are available and who can run them? Only Airbyte CDK mai...(COMMA_COMPOUND_SENTENCE_2)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
.github/workflows/pypi_publish.yml
(1 hunks)airbyte_cdk/sources/declarative/auth/oauth.py
(2 hunks)airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py
(1 hunks)docs/CONTRIBUTING.md
(4 hunks)unit_tests/sources/declarative/auth/test_oauth.py
(3 hunks)unit_tests/sources/declarative/decoders/test_composite_decoder.py
(2 hunks)unit_tests/sources/declarative/retrievers/test_simple_retriever.py
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/pypi_publish.yml
🧰 Additional context used
🪛 LanguageTool
docs/CONTRIBUTING.md
[style] ~118-~118: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... unit tests. - poetry run pytest-fast
to run the subset of PyTest tests, which a...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~125-~125: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ting issues. - poetry run ruff format
to format your Python code. ### Run Code ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~184-~184: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~287-~287: Consider a more expressive alternative.
Context: ...or, you may need to regenerate them. To do that, you can run: ```bash poetry run ...
(DO_ACHIEVE)
[locale-violation] ~305-~305: In American English, “take a look” is more commonly used.
Context: ...Actions. ## Release Management Please have a look at the [Release Management](./RELEASES....
(HAVE_A_LOOK)
🪛 markdownlint-cli2 (0.17.2)
docs/CONTRIBUTING.md
102-102: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
157-157: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
186-186: Spaces inside code span elements
null
(MD038, no-space-in-code)
256-256: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (32)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (2)
257-260
: LGTM! The state completion assertion looks good.The added assertion verifies that the state correctly indicates sync completion after reading all records. This aligns well with the PR objective of enhancing record filtering and transformation.
372-375
: LGTM! The state completion assertion looks good.The added assertion verifies that the state correctly indicates sync completion after reading all records. This aligns well with the PR objective of enhancing record filtering and transformation.
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)
37-39
: Nice improvements to the test data and CSV generation! 👍The changes look great:
- Using strings for all fields is more CSV-friendly
- Clean separation of compression logic
- Configurable delimiter enhances test coverage
Also applies to: 42-43, 48-53
airbyte_cdk/sources/declarative/auth/oauth.py (2)
234-237
: LGTM! Nice defensive programming.The addition of the access token initialization check before returning the expiry date helps prevent potential issues with uninitialized tokens.
239-241
: LGTM! Clean and focused helper method.The method follows the single responsibility principle and has a clear, descriptive name.
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)
264-266
: LGTM! Good edge case handling.The addition of the empty value check with token expiry validation provides a graceful fallback mechanism.
unit_tests/sources/declarative/auth/test_oauth.py (2)
317-344
: LGTM! Great test coverage for the new functionality.The test case thoroughly verifies the behavior when no access token is present but the expiry date is in the future.
304-304
: LGTM! Good test enhancement.Adding the access token value parameter improves the test coverage.
docs/CONTRIBUTING.md (24)
8-8
: Refine Contribution Instructions
The updated bullet now explicitly lists “reporting bugs, posting GitHub discussions, opening issues…” which broadens the contribution options. Was this expansion intended to better guide potential contributors? wdyt?
16-17
: Specify Python Version Requirement
Introducing “1. Make sure your Python version is 3.11” establishes a clear baseline. Have you considered whether contributors using other versions might need additional guidance or troubleshooting steps? wdyt?
18-23
: Fedora Instructions for Python Installation
Including “Fedora 41:” with a dedicated code block to install Python 3.11 is very helpful for users on that platform. Did you verify these commands on recent Fedora releases? wdyt?
24-31
: Pip Installation Guidance on Fedora
The step “2. Install pip” along with Fedora-specific commands provides clear direction. Would you consider adding a brief note on what to do if the pip package name or command changes in future Fedora releases? wdyt?
32-47
: Multiple Methods for Installing Poetry
The section now offers two alternatives for installing Poetry (via pip and via curl) as well as a Fedora-specific block. This flexibility is great. Would you like to mention which method is preferred in our official recommendations, or should we let users choose freely? wdyt?
48-49
: Poe Tasks Note Clarity
The note explaining that “Poe” tasks can be used for common actions is informative. Is the linked “Poe the Poet” documentation up to date with these instructions? wdyt?
50-63
: Guidance on Using the Python 3.11 Environment
The clear instructions and code block for verifying and switching the Poetry environment to Python 3.11 will help prevent environment mismatches. Would you consider adding a brief troubleshooting tip if the command fails? wdyt?
65-70
: Repository Cloning and Test Prerequisites
The instructions to clone the CDK repo and install the unit test prerequisites are neatly detailed. Do you think a note on branch or version management could further assist contributors? wdyt?
74-88
: RHEL Module Loading Instructions
The steps for RHEL (or compatible OS) to load the IP table NAT module—with both a temporary and persistent solution—are very clear. Should we add a reminder that these commands typically require root privileges? wdyt?
92-93
: Additional Reference Links
The “See also:” links to the Dagger-Podman integration and CDK Issue 197 offer useful extra context. Do you think any further annotations are needed, or is the current level of detail sufficient? wdyt?
95-98
: Docker Installation Recommendation
Emphasizing that Docker should be installed (instead of Podman) is a strong clarification. Is this recommendation documented elsewhere as well, or should we consolidate related notes here? wdyt?
100-107
: Acceptance Test Config Update
The instructions for editing the Shopify acceptance-test configuration—with the before/after code block—are precise. Would it help to remind users to verify that no other fields are unintentionally modified during this update? wdyt?🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
102-102: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
121-124
: Code Formatting Instructions
The step-by-step formatting commands using ruff (both checking and auto-fixing) are clear and concise. Do these instructions match our latest formatting guidelines closely? wdyt?
126-129
: Linting and Type-checking Guidance
Including commands for both Poe-based linting and mypy type-checking provides thorough coverage of code quality. Are there any additional common pitfalls we should warn about in this section? wdyt?
137-141
: Integration Testing Scenarios
The description of various scenarios for running integration tests (e.g., running a connector that uses the new feature) is very practical. Would it be helpful to explicitly list prerequisites for these tests? wdyt?
150-156
: Acceptance Test Preparation
The “Preparation steps” for running connector acceptance tests (including entering the correct directory and installing dependencies) are straightforward. Would a brief troubleshooting tip (e.g., common errors) be useful here? wdyt?
158-163
: Updating Test Connector Configuration
The instructions to modify the<test connector>/pyproject.toml
file and replace the airbyte_cdk line are clear. Do you think it’s necessary to mention potential version mismatches during this update? wdyt?
165-170
: Reinstalling airbyte_cdk for Testing
The step-by-step commands for reinstalling airbyte_cdk in the test connector directory are concise. Are there any caching issues or additional cleanup steps we might need to warn about? wdyt?
183-192
: Example JSON Configuration for Secrets
The JSON snippet is well-formatted and serves as a good reference for contributors. Do you think adding comments within the JSON (if permitted) would further aid understanding? wdyt?🧰 Tools
🪛 LanguageTool
[uncategorized] ~184-~184: Possible missing comma found.
Context: ...file. - On the Settings / Domains page find the subdomain of myshopify.com a...(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
186-186: Spaces inside code span elements
null(MD038, no-space-in-code)
194-196
: Reference Links for Testing Resources
Including links to the Acceptance Tests Reference and Connector Acceptance Tests documentation adds valuable context. Would you like to include a brief summary of what each link covers? wdyt?
222-227
: Local Acceptance Test Execution
The command block for running acceptance tests locally is straightforward. Do you feel that adding information on environment prerequisites (e.g., necessary Docker settings) might prevent future confusion? wdyt?
240-240
: Test Changes Reversion Reminder
A simple reminder to revert test changes is a neat touch to keep the repository clean. Does that cover all scenarios, or should we expand on how to revert if using version control? wdyt?
258-263
: Working with Poe Tasks
The explanation regarding the use of Poe tasks (and the option to pre-install Poe via pipx) is clear and friendly. Do you think a short example of a commonly used Poe command might further enhance this section? wdyt?
264-272
: Auto-Generating the Declarative Schema File
The instructions for regenerating the schema file from the YAML spec are concise and easy to follow. Would linking to an example of a schema change be helpful for context? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1882-1899
: Consider reducing code duplication withcreate_dpath_extractor
?The implementation is almost identical to
create_dpath_extractor
. What do you think about extracting the common logic into a private helper method to reduce duplication? Something like this, wdyt?+ def _create_dpath_extractor_base( + self, + decoder: Optional[Decoder], + model_field_path: List[Union[InterpolatedString, str]], + config: Config, + parameters: Optional[Dict[str, Any]], + extractor_class: Type[Union[DpathExtractor, DpathEnhancingExtractor]] + ) -> Union[DpathExtractor, DpathEnhancingExtractor]: + decoder_to_use = decoder if decoder else JsonDecoder(parameters={}) + return extractor_class( + decoder=decoder_to_use, + field_path=model_field_path, + config=config, + parameters=parameters or {}, + ) + def create_dpath_extractor( self, model: DpathExtractorModel, config: Config, decoder: Optional[Decoder] = None, **kwargs: Any, ) -> DpathExtractor: - if decoder: - decoder_to_use = decoder - else: - decoder_to_use = JsonDecoder(parameters={}) model_field_path: List[Union[InterpolatedString, str]] = [x for x in model.field_path] - return DpathExtractor( - decoder=decoder_to_use, - field_path=model_field_path, - config=config, - parameters=model.parameters or {}, - ) + return self._create_dpath_extractor_base( + decoder=decoder, + model_field_path=model_field_path, + config=config, + parameters=model.parameters, + extractor_class=DpathExtractor + ) def create_dpath_enhancing_extractor( self, model: DpathEnhancingExtractorModel, config: Config, decoder: Optional[Decoder] = None, **kwargs: Any, ) -> DpathEnhancingExtractor: - if decoder: - decoder_to_use = decoder - else: - decoder_to_use = JsonDecoder(parameters={}) model_field_path: List[Union[InterpolatedString, str]] = [x for x in model.field_path] - return DpathEnhancingExtractor( - decoder=decoder_to_use, - field_path=model_field_path, - config=config, - parameters=model.parameters or {}, - ) + return self._create_dpath_extractor_base( + decoder=decoder, + model_field_path=model_field_path, + config=config, + parameters=model.parameters, + extractor_class=DpathEnhancingExtractor + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
Implement airbytehq/airbyte#50395 discussed in airbytehq/airbyte#49971
Implemented a new DpathEnhancingExtractor that navigates through a path of field names to reach the records to extract (exactly as the DpathExtractor does) and:
Published the DpathEnhancingExtractor to the Builder UI.
These service fields are accessible in the filter and transform steps, after which they are removed from the records.
This allows the filter and transformation steps to navigate to fields and structures that are left out of the selected records.
Examples:
More service fields could be added in the future.
The service fields are available in the record's filtering and transform steps.
This design avoids:
Explicitly clean the service field(s) after the transform step makes the service fields:
This allows the record transformation logic to define its "local variables" to reuse
some interim calculations.
The contract of body parsing seems irregular in representing the cases of bad JSON, no JSON and empty JSON.
It cannot be unified as that irregularity is already used.
Updated the development environment setup documentation
Updated CONTRIBUTING.md to:
The unit, integration, and acceptance tests in this exact version fail due to pipeline configuration issues, not related to the change themselves.
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Documentation
Tests
Chores