YtTableSparkSettings.StringToUtf8 #26
Conversation
// There may be a bug when user explicitly specifies schema for YTsaurus Interval type, which doesn't support
// arrow yet, in this case we can suggest to use spark.read.disableArrow option
-    if (ytType.arrowSupported) {
+    if (WriteSchemaConverter().ytLogicalTypeV3(field.dataType).arrowSupported) {
I think that the YtPartitionReaderFactory would be a better place for this logic, because the YT types of the source would already be known there.
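A rough sketch of what that could look like (only an illustration: the helper name, its parameters, and where exactly it would sit inside YtPartitionReaderFactory are assumptions, not the project's actual API):

```scala
// Hypothetical helper: given the YT types already resolved for the source table,
// verify arrow support before building arrow-based partition readers.
def validateArrowSupport(sourceYtTypes: Seq[YtLogicalType], arrowEnabled: Boolean): Unit = {
  if (arrowEnabled && !sourceYtTypes.forall(_.arrowSupported)) {
    // e.g. the YTsaurus Interval type has no arrow support yet, so fail fast
    // with a hint instead of surfacing a low-level arrow error later.
    throw new IllegalArgumentException(
      "Some columns have YT types without arrow support; " +
        "consider reading with the spark.read.disableArrow option")
  }
}
```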
import tech.ytsaurus.spyt.types.YTsaurusTypes
import tech.ytsaurus.ysontree.{YTree, YTreeNode}

case class WriteSchemaConverter(
Why did you write it as a case class and not just as an ordinary class? I think we're not going to use it in pattern matching or its other Product features.
I'd rather write it as an ordinary class and extract all overloaded constructors into apply(...) methods of the companion object.
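A minimal sketch of that suggestion, assuming the constructor parameters that are visible at the call sites in this PR (a schema-hint map and a typeV3Format flag; project types such as YtLogicalType are taken to be in scope):

```scala
// Sketch only: parameter names and overloads are inferred from the call sites
// shown in this review, not copied from the real declaration.
class WriteSchemaConverter(
    val schemaHint: Map[String, YtLogicalType],
    val typeV3Format: Boolean) {
  // ... ytLogicalSchema / ytLogicalTypeV3 and the rest of the conversion logic ...
}

object WriteSchemaConverter {
  // The default argument values are spelled out once, here and nowhere else.
  def apply(): WriteSchemaConverter =
    new WriteSchemaConverter(Map.empty, typeV3Format = false)

  def apply(schemaHint: Map[String, YtLogicalType], typeV3Format: Boolean): WriteSchemaConverter =
    new WriteSchemaConverter(schemaHint, typeV3Format)
}
```

A plain class keeps copy, equals/hashCode and pattern matching from leaking into the public surface of what is essentially a converter utility.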
@@ -272,6 +272,72 @@ class SchemaConverterTest extends AnyFlatSpec with Matchers

  }

  it should "write string by default" in {
    withConf(SparkYtConfiguration.Schema.ForcingNullableIfNoMetadata, false) {
      val df_0 = spark.createDataFrame(
Why is it named in Python underscore style (df_0) and not just df?
import tech.ytsaurus.ysontree.YTreeNode

case class TestTableSettings(schema: StructType,
                             isDynamic: Boolean = false,
                             sortOption: SortOption = Unordered,
                             writeSchemaHint: Map[String, YtLogicalType] = Map.empty,
                             otherOptions: Map[String, String] = Map.empty) extends YtTableSettings {
-  override def ytSchema: YTreeNode = SchemaConverter.ytLogicalSchema(schema, sortOption, writeSchemaHint)
+  override def ytSchema: YTreeNode = WriteSchemaConverter(Map.empty, typeV3Format = false).ytLogicalSchema(schema, sortOption)
Why do you explicitly specify constructor parameters when the defaults have the same values?
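In other words, assuming Map.empty and typeV3Format = false really are the declared defaults (as the comment implies), the override could be reduced to:

```scala
override def ytSchema: YTreeNode = WriteSchemaConverter().ytLogicalSchema(schema, sortOption)
```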
@@ -28,10 +26,9 @@ class YtDynamicTableWriter(richPath: YPathEnriched,

  override val path: String = richPath.toStringPath

-  private val schemaHint = options.ytConf(WriteSchemaHint)
-  private val typeV3Format = options.ytConf(WriteTypeV3)
+  private val writeSchemaConverter = new WriteSchemaConverter(options)
I think you should revisit all occurrences of WriteSchemaConverter creation: in some cases you create it via new and in others via apply, sometimes you explicitly specify default values of arguments and sometimes not, and so on.
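For illustration, settling on the companion-object factories as the single construction path would make the call sites above uniform (a sketch of one possible convention, not the PR's actual code; it assumes an apply(options) factory exists alongside the current constructor):

```scala
// One convention applied everywhere: never `new`, never repeat default argument values.
private val writeSchemaConverter = WriteSchemaConverter(options)   // in YtDynamicTableWriter

override def ytSchema: YTreeNode =
  WriteSchemaConverter().ytLogicalSchema(schema, sortOption)       // in TestTableSettings
```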
@alextokarew has imported your pull request. If you are a Yandex employee, you can view this diff.
✅ This pull request is being closed because it has been successfully merged into our internal monorepository.
`WriteSchemaConverter` was extracted from `SchemaConverter`. A couple of tests of the new feature were added to `SchemaConverterTest`.

---

Pull Request resolved: #26
commit_hash: 94a3511e9d498ec49de3043f440902a9bcd232c0