Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2175: Add ability to configure how spark handles dates in parquet files #2184

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

TebaleloS
Copy link
Collaborator

@TebaleloS TebaleloS commented Mar 23, 2023

Notes

  • Initialized DEFAULT_PARQUET_DATETIME_READ_MODE and DEFAULT_PARQUET_DATETIME_WRITE_MODE with a default value CORRECTED.
  • Added option condition for assessing the value of datetime read/write mode

Closes #2175

@TebaleloS TebaleloS marked this pull request as ready for review March 23, 2023 07:41
Copy link
Collaborator

@miroslavpojer miroslavpojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • pulled
  • code review
  • command generation test

Used command-line calls:

  • run_conformance.sh --rest-api-credentials-file ~/.ssh/menasCredential.properties --dataset-name test --dataset-version 1 --report-date 2020-03-03 --dry-run
  • run_standardization.sh --rest-api-credentials-file ~/.ssh/menasCredential.properties --dataset-name test --dataset-version 1 --report-date 2020-03-03 --dry-run
  • run_standardization_conformance.sh --rest-api-credentials-file ~/.ssh/menasCredential.properties --dataset-name test --dataset-version 1 --report-date 2020-03-03 --dry-run

Only bash files file were tested.
There is required to test last cmd one.

Copy link
Contributor

@dk1844 dk1844 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

  • code reviewed
  • pulled
  • prepared dummy Enceladus env to see the spark-submit command
  • run

@dk1844
Copy link
Contributor

dk1844 commented Mar 24, 2023

I have tested both the sh and cmd script to see that they correctly pass the datetime configuration fields down to the spark-submit. I have not run the actual job.

sh

./run_standardization.sh ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

./run_conformance.sh ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

./run_standardization_conformance.sh ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

cmd

.\run_standardization.cmd ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY ->
spark-submit ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

.\run_conformance.cmd ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY ->
spark-submit --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ✔️

.\run_standardization_conformance.cmd ... --parquet-datetime-read-mode XX --parquet-datetime-write-mode YY -> spark-submit" ... --conf spark.sql.parquet.datetimeRebaseModeInRead=XX --conf spark.sql.parquet.datetimeRebaseModeInWrite=YY --conf spark.sql.parquet.int96RebaseModeInRead=XX --conf spark.sql.parquet.int96RebaseModeInWrite=YY ... ✔️

@dk1844 dk1844 added the PR:tested Only for PR - PR was tested by a tester (person) label Mar 24, 2023
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just read the code, given that two reviewers tested it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR:tested Only for PR - PR was tested by a tester (person)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to configure how Spark handles dates in parquet files.
4 participants