Try this: `option("codec", "org.apache.hadoop.io.compress.GzipCodec")`
I used that method, `data.repartition(50).write.mode("overwrite").format('tfrecords').option("codec", "org.apache.hadoop.io.compress.GzipCodec").save(path)`, but the output files do not seem to be any smaller; the option did not take effect.
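For comparison, here is a minimal sketch of a compressed write with spark-tensorflow-connector. It assumes the connector jar is on the classpath and that the build in use actually supports the `codec` write option (older builds may silently ignore unrecognized options, which would leave the output size unchanged). The input and output paths are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch only: assumes spark-tensorflow-connector is on the classpath and
# that this build supports the "codec" write option.
spark = SparkSession.builder.appName("tfrecord-gzip-write").getOrCreate()
df = spark.read.parquet("some/input/path")  # placeholder input

(df.repartition(50)
   .write
   .mode("overwrite")
   .format("tfrecords")
   .option("recordType", "Example")
   .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
   .save("some/output/path"))  # placeholder output
```

A quick way to check whether the codec was applied is to compare the total size of the output directory with and without the option.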
See similar GitHub issues here:
tensorflow/ecosystem#61 (comment)
tensorflow/ecosystem#61
tensorflow/ecosystem#106
This is how I'm writing a PySpark DataFrame as tf-records to an S3 bucket:
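A minimal sketch of a write along those lines, assuming a PySpark DataFrame named `df` and a cluster already configured with S3 access:

```python
# Sketch only: df is assumed to be an existing PySpark DataFrame, and the
# cluster is assumed to have S3 credentials configured.
(df.write
   .mode("overwrite")
   .format("tfrecords")
   .option("recordType", "Example")
   .save("s3://Shuks/dataframe_tf_records/"))
```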
This creates a new key/"directory" on S3 with the following path: s3://Shuks/dataframe_tf_records/
And under this directory are all the tf-records.
How do I specify compression type during conversion?