

Add sparksql engine spec #244

Merged: 3 commits merged into src-d:master on Aug 9, 2019

Conversation

@se7entyse7en (Contributor) commented Aug 8, 2019

Closes #227.

By connecting to gsc through sparksql://gsc:10000/default, sourced-ui currently uses BaseEngineSpec, because that spec is the fallback for databases whose backend doesn't have a corresponding engine spec. In this case the backend for the gitbase Database is sparksql, which doesn't have a corresponding engine spec. All the engine specs are registered here.

This PR adds a SparkSQLEngineSpec that inherits from HiveEngineSpec and sets the correct engine name.
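The spec described above follows the usual pattern in Superset-style engine specs: a subclass that mostly just overrides the engine identifier. A minimal, self-contained sketch of that pattern (the classes here are simplified stand-ins, not the actual superset code or this PR's exact diff):

```python
# Simplified sketch of the engine-spec lookup pattern. BaseEngineSpec and
# HiveEngineSpec are stand-ins; only the `engine` attribute matching is shown.

class BaseEngineSpec:
    """Fallback spec used when no registered spec matches the backend."""
    engine = None  # compared against the SQLAlchemy dialect/backend name

class HiveEngineSpec(BaseEngineSpec):
    engine = "hive"

class SparkSQLEngineSpec(HiveEngineSpec):
    """Inherits all Hive behaviour; only the engine name differs."""
    engine = "sparksql"

def get_engine_spec(backend, specs):
    """Return the spec whose `engine` matches the backend, else the fallback."""
    return next((s for s in specs if s.engine == backend), BaseEngineSpec)

specs = [HiveEngineSpec, SparkSQLEngineSpec]
assert get_engine_spec("sparksql", specs) is SparkSQLEngineSpec
assert get_engine_spec("unknown", specs) is BaseEngineSpec  # fallback case
```

Without the new subclass, the `sparksql` backend falls through to the fallback branch, which is exactly the behaviour the PR fixes.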

Using the SparkSQLEngineSpec fixes bug #227, but it broke the listing of the available views in the SQL Editor tab. These views are listed by calling the get_view_names method of the corresponding engine spec, but in this case it's shadowed by the inherited classes (more details here). This has been fixed by redefining the get_view_names class method in HiveEngineSpec.
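The shadowing issue can be reproduced in isolation. A sketch with simplified stand-ins (not the real superset classes; the stubbed-out base method mirrors the behaviour described above, where an ancestor spec returns an empty list and hides the inspector's real listing):

```python
# Illustration of the method-shadowing problem and the fix.

class MockInspector:
    """Stand-in for a SQLAlchemy inspector that knows about two views."""
    def get_view_names(self, schema=None):
        return ["commits_view", "refs_view"]

class AncestorEngineSpec:
    @classmethod
    def get_view_names(cls, inspector, schema):
        # The ancestor spec stubs this out, so every subclass inherits
        # an implementation that silently returns no views.
        return []

class HiveEngineSpec(AncestorEngineSpec):
    @classmethod
    def get_view_names(cls, inspector, schema):
        # Redefined to delegate back to the inspector, restoring the listing.
        return inspector.get_view_names(schema)

inspector = MockInspector()
assert AncestorEngineSpec.get_view_names(inspector, "default") == []
assert HiveEngineSpec.get_view_names(inspector, "default") == [
    "commits_view", "refs_view"]
```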

Additionally, the SparkSQL dialect doesn't support passing charset (neither do the presto and hive dialects), but it shouldn't be required, as the Thrift server uses UTF-8 by default.

@smacker (Contributor) left a comment:

👏

@carlosms (Contributor) left a comment:

Very nice! Let's also add an entry to the changelog, please.

@carlosms (Contributor) commented Aug 9, 2019

It looks like some Hive spec tests are now failing in Travis.

@se7entyse7en (Contributor, Author) commented Aug 9, 2019

Okay, I took a look at the test, and it is indeed expected to fail because of this change. We then have two options:

  1. fix the test,
  2. implement the get_view_names method only in SparkSQLEngineSpec.

The second option also reduces our changes against upstream. What do you think is better?

@carlosms (Contributor) commented Aug 9, 2019

I don't have a strong preference; both work for our purposes. Maybe 2 would make it easier to merge new changes from upstream until this lands in their repo. On the other hand, 1 could be better if the change is later reused to send a PR upstream.

I'd like to hear your opinion too, @smacker.

@smacker (Contributor) commented Aug 9, 2019

  1. It's not clear to me why the tests are failing. If you want to implement get_view_names only for SparkSQLEngineSpec, the method should be defined there. If you think this change should be made in HiveEngineSpec, the tests must be updated.
  2. I don't have deep knowledge of Hive. From my limited view, it looks like we should push support for sparksql upstream, not changes to the hive engine. But these changes are going to stay in our fork anyway, because there is no Python driver for sparksql (except, again, in our fork).

@se7entyse7en (Contributor, Author) commented:

> It's not clear to me why the tests are failing. If you want to implement get_view_names only for SparkSQLEngineSpec, the method should be defined there. If you think this change should be made in HiveEngineSpec, the tests must be updated.

The tests are failing because the get_view_names method used to return an empty list for HiveEngineSpec without using the passed arguments. In the test those arguments are mocks that are not expected to be used, but with these changes they are now being used.

For our purposes we could simply put the method in SparkSQLEngineSpec; the reason I put it in HiveEngineSpec is that the functionality is actually also available for the Hive dialect, and our SparkSQL dialect uses that same function since it inherits from it.

I'll just put the method in SparkSQLEngineSpec to simplify things.
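The layout chosen here keeps the Hive spec untouched, so its existing test (which expects an empty list and unused mock arguments) keeps passing. A sketch of option 2 with simplified stand-ins (class and method names assumed from the discussion, not the merged diff):

```python
# Option 2: override get_view_names only in SparkSQLEngineSpec, leaving
# HiveEngineSpec (and its tests) exactly as upstream has them.

class MockInspector:
    """Stand-in for a SQLAlchemy inspector."""
    def get_view_names(self, schema=None):
        return ["commits_view"]

class HiveEngineSpec:
    @classmethod
    def get_view_names(cls, inspector, schema):
        return []  # upstream behaviour preserved; the test's mocks stay unused

class SparkSQLEngineSpec(HiveEngineSpec):
    engine = "sparksql"

    @classmethod
    def get_view_names(cls, inspector, schema):
        # Overridden only here, keeping the diff against upstream minimal.
        return inspector.get_view_names(schema)

inspector = MockInspector()
assert HiveEngineSpec.get_view_names(inspector, "default") == []
assert SparkSQLEngineSpec.get_view_names(inspector, "default") == ["commits_view"]
```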

Signed-off-by: Lou Marvin Caraig <[email protected]>
@se7entyse7en force-pushed the add-sparksql-engine-spec branch from 35ffa68 to 38459a4 on August 9, 2019 at 13:13
@se7entyse7en (Contributor, Author) commented Aug 9, 2019

I force-pushed the changes.

@se7entyse7en merged commit c8ff53b into src-d:master on Aug 9, 2019
@se7entyse7en deleted the add-sparksql-engine-spec branch on August 9, 2019 at 14:48
Merging this pull request may close issue: Timestamp type not mapped correctly from Spark SQL to Hive