Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make data type mappings for fast-sync and singer configurable #389

Closed
EamonKeane opened this issue Apr 22, 2020 · 2 comments
Closed

Make data type mappings for fast-sync and singer configurable #389

EamonKeane opened this issue Apr 22, 2020 · 2 comments

Comments

@EamonKeane
Copy link

EamonKeane commented Apr 22, 2020

Thanks for the great tool, you're saving people a king's ransom vs fivetran pricing.

While I'm unfortunately not in a position to contribute, I would like to just put this feature suggestion out there. It would be useful to have a yaml input of data type mappings which is flexible and consistent across fast-sync and singer.

In my use case with mysql to snowflake for example tinyint(1) wasn't always boolean, so needed to be patched in both fast-sync and singer. Mediumint wasn't present in the list either.

@koszti
Copy link
Contributor

koszti commented Jun 27, 2020

hey @EamonKeane thanks for the feedback and sorry for the late response.

Regarding the tinyint(1) -> boolean behaviour, it was inherited from the original and forked tap-mysql singer connector. You can read more details about the decision at singer-io/tap-mysql#82 .

You're right, the custom column type mapping would be useful but it needs to be consistent across fast-sync and singer. Unfortunately making it configurable in the yaml is not trivial. That's because only fastsync has direct source-to-target datatype mappings. Singer requires two different mappings, a source-to-JSON and a JSON-to-target mappings.. and for each connector type. At the moment I don't know how we could make it configurable in YAML and what format would be the best and easy to use. Please share if you have any idea.

Regarding the missing MEDIUMINT in the list. That's a bug and we'll fix it. Thanks!

@EamonKeane
Copy link
Author

EamonKeane commented Jun 29, 2020

Thanks for the reply @koszti .

I hadn't seen that issue on tap-mysql. Good point on the multiple places it needs to be changed, sounds tricky. This might just be a one-off issue with a mysql deficiency, in which case as long as the column types remain consistent across fast-sync and singer (no versioned columns after first incremental load), it might not be worth the effort for yaml input.

I guess in theory a fast-sync flag just for mysql-tinyint could be added and two versions of pipelinewise-tap-mysql maintained if it impacted enough folks. Or else a note could just be added in the docs somewhere along the lines of:

Tinyint by convention is used to store 0/1 for mysql booleans. In keeping with the upstream singer mysql tap, pipelinewise uses this convention when creating target schemas. If your tinyint columns contain values greater than 1, this will lead to errors when loading data. To get around this, please patch the appropriate pipelinewise fast-sync target(s) and pipelinewise-tap-mysql (circa one line change to each).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants