
ArrayType(ArrayType(DoubleType,true),true) Not Supported When Writing Dataframe to TFRecords #56

Open
xemcerk opened this issue Jul 29, 2022 · 2 comments

Comments

@xemcerk

xemcerk commented Jul 29, 2022

Hi, I am trying to write my DataFrame to TFRecords but I encounter the following error; the log is below:

Caused by: java.lang.RuntimeException: Cannot convert field to unsupported data type ArrayType(ArrayType(DoubleType,true),true)
  at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$.org$tensorflow$spark$datasources$tfrecords$serde$DefaultTfRecordRowEncoder$$encodeFeature(DefaultTfRecordRowEncoder.scala:144)
  at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$$anonfun$encodeExample$1.apply(DefaultTfRecordRowEncoder.scala:64)
  at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$$anonfun$encodeExample$1.apply(DefaultTfRecordRowEncoder.scala:61)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$.encodeExample(DefaultTfRecordRowEncoder.scala:61)
  at org.tensorflow.spark.datasources.tfrecords.DefaultSource$$anonfun$2.apply(DefaultSource.scala:59)
  at org.tensorflow.spark.datasources.tfrecords.DefaultSource$$anonfun$2.apply(DefaultSource.scala:56)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
  at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
  at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
  ... 10 more

I presumed this type was already supported, though it is not specifically addressed in the README.
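For reference, a minimal sketch of the kind of write that triggers this (PySpark; `spark` is the usual SparkSession, column and path names are placeholders):

```python
from pyspark.sql import Row

# "feature" ends up with type array<array<double>>
df = spark.createDataFrame([Row(id=1, feature=[[1.0, 2.0], [3.0, 4.0]])])

# Writing with the spark-tensorflow-connector data source raises the
# RuntimeException shown in the log above.
df.write.format("tfrecords").save("/path/to/output")
```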

@junshi15
Contributor

I have never tested ArrayType(ArrayType(DoubleType)). I think FloatType should work, so you can try casting to Float first.
Make sure you use SequenceExample as the record type.
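Something along these lines (untested sketch in PySpark; the column name and output path are placeholders):

```python
# Cast the nested double array down to float first.
df_float = df.withColumn("feature", df["feature"].cast("array<array<float>>"))

# Nested (2-D) arrays need the SequenceExample record type.
(df_float.write
    .format("tfrecords")
    .option("recordType", "SequenceExample")
    .save("/path/to/output"))
```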

@xemcerk
Author

xemcerk commented Jul 30, 2022

Thank you for your timely reply; I will try that. My current workaround is to save the column as a string and parse it with tf.strings functions, which does not seem to add much overhead so far.
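Roughly, the TensorFlow side of that workaround looks like this (a minimal sketch; the delimiters and function name are just illustrative, assuming the nested array was serialized as e.g. "1.0,2.0;3.0,4.0"):

```python
import tensorflow as tf

def parse_nested_string(serialized):
    rows = tf.strings.split(serialized, sep=";")   # -> 1-D tensor of row strings
    cells = tf.strings.split(rows, sep=",")        # -> ragged tensor of value strings
    # Convert the flat string values to float, keeping the ragged structure.
    return tf.ragged.map_flat_values(tf.strings.to_number, cells, out_type=tf.float32)

print(parse_nested_string(tf.constant("1.0,2.0;3.0,4.0")))
# expected: <tf.RaggedTensor [[1.0, 2.0], [3.0, 4.0]]>
```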
