Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YtTableSparkSettings.StringToUtf8 #26

Closed

Conversation

faucct
Copy link
Collaborator

@faucct faucct commented Sep 23, 2024

WriteSchemaConverter was extracted from SchemaConverter.
A couple tests of the new feature were added to the SchemaConverterTest.

// There may be a bug when user explicitly specifies schema for YTsaurus Interval type, which doesn't support
// arrow yet, in this case we can suggest to use spark.read.disableArrow option
if (ytType.arrowSupported) {
if (WriteSchemaConverter().ytLogicalTypeV3(field.dataType).arrowSupported) {
Copy link
Collaborator Author

@faucct faucct Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the YtPartitionReaderFactory would be a better place for this logic, because the YT-types of source would already be known there.

@faucct faucct requested a review from alextokarew September 23, 2024 15:10
import tech.ytsaurus.spyt.types.YTsaurusTypes
import tech.ytsaurus.ysontree.{YTree, YTreeNode}

case class WriteSchemaConverter(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you write it as case class and not just as ordinary class? I think we're not going to use it in pattern matching or it's other Product features.

I'd rather write it as an ordinary class and extract all overloaded constructors to apply(...) methods of the companion object

@@ -272,6 +272,72 @@ class SchemaConverterTest extends AnyFlatSpec with Matchers

}

it should "write string by default" in {
withConf(SparkYtConfiguration.Schema.ForcingNullableIfNoMetadata, false){
val df_0 = spark.createDataFrame(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is named in python underscore style df_0, and not just df?

import tech.ytsaurus.ysontree.YTreeNode

case class TestTableSettings(schema: StructType,
isDynamic: Boolean = false,
sortOption: SortOption = Unordered,
writeSchemaHint: Map[String, YtLogicalType] = Map.empty,
otherOptions: Map[String, String] = Map.empty) extends YtTableSettings {
override def ytSchema: YTreeNode = SchemaConverter.ytLogicalSchema(schema, sortOption, writeSchemaHint)
override def ytSchema: YTreeNode = WriteSchemaConverter(Map.empty, typeV3Format = false).ytLogicalSchema(schema, sortOption)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you explicitly specify constructor parameters when you have defaults of the same values?

@@ -28,10 +26,9 @@ class YtDynamicTableWriter(richPath: YPathEnriched,

override val path: String = richPath.toStringPath

private val schemaHint = options.ytConf(WriteSchemaHint)
private val typeV3Format = options.ytConf(WriteTypeV3)
private val writeSchemaConverter = new WriteSchemaConverter(options)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should revisit all occurences of WriteSchemaConverter creation, because in come cases you create it via new, or via apply, sometimes you explicitly specify default values of arguments, sometimes not, and so on

@faucct faucct requested a review from alextokarew September 24, 2024 08:28
@faucct faucct force-pushed the feature/YtTableSparkSettings.StringToUtf8 branch from 764c12d to 25a4234 Compare September 24, 2024 10:59
@faucct faucct force-pushed the feature/YtTableSparkSettings.StringToUtf8 branch from 25a4234 to 83e2039 Compare September 24, 2024 10:59
Copy link

robot-magpie bot commented Sep 24, 2024

@alextokarew has imported your pull request. If you are a Yandex employee, you can view this diff.

Copy link

robot-magpie bot commented Sep 24, 2024

✅ This pull request is being closed because it has been successfully merged into our internal monorepository.
Your changes will be pushed to this repository soon. Thank you for your contribution!

@robot-magpie robot-magpie bot closed this Sep 24, 2024
robot-piglet pushed a commit that referenced this pull request Sep 24, 2024
`WriteSchemaConverter` was extracted from `SchemaConverter`.
A couple tests of the new feature were added to the `SchemaConverterTest`.

---

Pull Request resolved: #26
commit_hash:94a3511e9d498ec49de3043f440902a9bcd232c0
@faucct faucct deleted the feature/YtTableSparkSettings.StringToUtf8 branch September 25, 2024 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants