
Consume TensorFlowInferSchema from Java Spark 2.4.0 #34

Open
yeshbash opened this issue Sep 4, 2021 · 2 comments
@yeshbash

yeshbash commented Sep 4, 2021

This is not an issue, more a request for guidance. I'd like to use just the TensorFlowInferSchema functionality in my Java Spark job. Below is a sample snippet:

JavaRDD<Example> exampleRdd =
          jsc.parallelize(
              Arrays.asList(
                  Example.newBuilder()
                      .setFeatures(
                          Features.newBuilder()
                              .putFeature(
                                  "Test",
                                  Feature.newBuilder()
                                      .setInt64List(Int64List.newBuilder().addValue(10).build())
                                      .build())
                              .build())
                      .build()));
StructType schema = TensorFlowInferSchema.apply(exampleRdd.rdd(), ???);

However, the apply method expects a second, implicit argument: evidence$1 : scala.reflect.runtime.universe.TypeTag[T].

Below is the decompiled method signature:

def apply[T](rdd : org.apache.spark.rdd.RDD[T])(implicit evidence$1 : scala.reflect.runtime.universe.TypeTag[T]) : org.apache.spark.sql.types.StructType = { /* compiled code */ }

I'm not familiar with Scala, and it appears that it's not possible to create a TypeTag[Example] in Java.
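For what it's worth, scala-reflect does let Java code build a TypeTag at runtime. The helper below is a hypothetical sketch, not part of this project: it assumes scala-reflect is on the classpath and that the target is a plain, non-generic class such as the TFRecord Example proto. It follows the standard TypeTag.apply(Mirror, TypeCreator) pattern from the scala-reflect API:

```java
import scala.reflect.api.JavaUniverse;
import scala.reflect.api.Mirror;
import scala.reflect.api.TypeCreator;
import scala.reflect.api.Types;
import scala.reflect.api.TypeTags;
import scala.reflect.api.Universe;

public final class TypeTagHelper {

    // Builds a TypeTag for a plain (non-generic) Java class via the
    // scala-reflect runtime universe. Hypothetical helper, untested here.
    @SuppressWarnings("unchecked")
    public static <T> TypeTags.TypeTag<T> typeTagOf(final Class<T> clazz) {
        final JavaUniverse u = scala.reflect.runtime.package$.MODULE$.universe();
        final JavaUniverse.JavaMirror mirror = u.runtimeMirror(clazz.getClassLoader());
        return u.TypeTag().apply(mirror, new TypeCreator() {
            @Override
            public <U extends Universe> Types.TypeApi apply(final Mirror<U> m) {
                // Resolve the class symbol in the given mirror and turn it
                // into a Type for the tag.
                return m.staticClass(clazz.getName()).toTypeConstructor();
            }
        });
    }
}
```

A tag built this way could then be passed as the evidence$1 argument, since Scala implicits become ordinary trailing parameters in the compiled signature.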

I'd appreciate it if you could share your thoughts on the following:

  • Is it safe (and possible) to just consume TensorFlowInferSchema in Java without going via spark.read.format("tfrecord").option("recordType", "Example")?
  • What to pass for the TypeTag[T] argument?
  • Is there a Java example for this use case?

Thanks for maintaining this project; I highly appreciate your help!

@junshi15
Contributor

junshi15 commented Sep 4, 2021

TensorFlowInferSchema only infers the schema. Is that what you need?
I am not familiar with the Spark Java API, but can you call a Scala function from Java?
If you want to read/write TFRecord with the Java API, I assume something like this will work:
Dataset<Row> usersDF = spark.read().format("tfrecord").load("examples/src/main/resources/users.tfrecord");
usersDF.select("name", "favorite_color").write().format("tfrecord").save("namesAndFavColors.tfrecord");

@yeshbash
Author

yeshbash commented Sep 4, 2021

Thanks for the example on loading. Yes, that helps read tfrecord files into a DataFrame. However, as you point out, I only need the automatic schema inference, since we already build a JavaRDD<Example>. My understanding is that Java code can call Scala code and vice versa, but I'm not familiar enough with Scala to tell whether TensorFlowInferSchema can be invoked that way.
I was hoping to get some help there.
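Assuming a TypeTag<Example> can be obtained from Java (the scala-reflect runtime universe makes this possible via TypeTag.apply(Mirror, TypeCreator)), the call site would look roughly like the sketch below. The typeTagOf helper is hypothetical, and the exact package of TensorFlowInferSchema depends on the spark-tfrecord version in use:

```java
// Hypothetical helper: builds a scala.reflect TypeTag from a Class via the
// runtime universe; see scala.reflect.api.TypeTags for the underlying API.
scala.reflect.api.TypeTags.TypeTag<Example> tag = typeTagOf(Example.class);

// The implicit evidence$1 parameter becomes an ordinary trailing argument
// when the Scala method is called from Java.
StructType schema = TensorFlowInferSchema.apply(exampleRdd.rdd(), tag);
```

This is a sketch under those assumptions, not a verified invocation; whether the inferred StructType matches what spark.read().format("tfrecord") would produce is exactly the safety question raised above.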

Thanks!
