Refactor tweetset_loader.py to use Spark DataFrame API #128

Closed
dolsysmith opened this issue Jul 28, 2021 · 1 comment · Fixed by #138

Comments

@dolsysmith
Contributor

References #122, #84, #117

  • Upgrading to Spark 2.4
  • Using Spark SQL to transform Tweet JSON for Elasticsearch indexing
  • Using Spark SQL to create full extracts at time of load

dolsysmith added this to the 2.2 milestone Jul 28, 2021

@dolsysmith
Contributor Author

Upgrading to Spark 3 in order to accommodate Python 3.8.
