Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable datagen fallbacking with full set of options #3640

Merged
merged 87 commits into from
Aug 10, 2023
Merged

Conversation

sffc
Copy link
Member

@sffc sffc commented Jul 5, 2023

Toward #2683

@sffc sffc requested review from dminor and zbraniecki as code owners July 5, 2023 15:45
@sffc sffc requested review from robertbastian and removed request for dminor and zbraniecki July 5, 2023 15:45
@sffc
Copy link
Member Author

sffc commented Jul 5, 2023

The errors are because I need to gate this on feature = "alloc"

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me how this is useful for both us and general users. Maybe you can just include this in the datagen PR?

components/locid_transform/src/fallback/mod.rs Outdated Show resolved Hide resolved
components/locid_transform/src/fallback/mod.rs Outdated Show resolved Hide resolved
@sffc
Copy link
Member Author

sffc commented Jul 5, 2023

OK I'll not merge this separately but continue working on the deduplication in this PR.

@sffc sffc marked this pull request as draft July 5, 2023 16:26
@sffc sffc changed the title Add pub fn sort_locales_into_groups Enable datagen fallbacking with full set of options Jul 25, 2023
@sffc
Copy link
Member Author

sffc commented Jul 25, 2023

The two most recent commits don't change behavior but have most of the new configuration options and docs.

provider/datagen/src/options.rs Outdated Show resolved Hide resolved
provider/core/src/datagen/mod.rs Outdated Show resolved Hide resolved
provider/core/src/datagen/mod.rs Outdated Show resolved Hide resolved
provider/core/src/datagen/mod.rs Show resolved Hide resolved
provider/core/src/datagen/mod.rs Outdated Show resolved Hide resolved
provider/datagen/src/options.rs Outdated Show resolved Hide resolved
provider/core/src/datagen/mod.rs Outdated Show resolved Hide resolved
Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, not approving my own changes though

provider/datagen/src/options.rs Show resolved Hide resolved
provider/datagen/src/source.rs Outdated Show resolved Hide resolved
@robertbastian
Copy link
Member

robertbastian commented Aug 8, 2023

I generated the icu_testdata diff vs 1.2.0, but can't post it because the diff itself is over 1MB and I haven't found a service that would accept that. It seems a bit excessive, but I don't really care, the Postcard file is 5.8MB vs 4.4MB before.

The only removals are sr-Cyrl, which is a slight diff for no-fallback providers, and then there are ~18k additions.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this, just some small things

@@ -88,15 +88,21 @@ exit_on_error true
if array_is_empty ${@}
exec --fail-on-error cargo build -p icu_datagen --no-default-features --features rayon,provider_baked,bin,use_wasm,networking,icu_casemap,icu_displaynames,icu_relativetime,icu_compactdecimal --release
bin = set "target/release/icu4x-datagen"
verbose = set false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this seems useful for you, when making changes to a component we probably don't want to be verbose, and I think the logging slows down datagen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most people run cargo make bakeddata which is usually faster than cargo make bakeddata component/foo even without this. But for the sake of expediency I'll put this in a separate PR.

@@ -70,7 +70,7 @@ syn = {version = "2", features = ["parsing"], optional = true }

# Other external dependencies
displaydoc = { version = "0.2.3", default-features = false }
elsa = "1.7"
elsa = { git = "https://github.com/Manishearth/elsa", rev = "56d05375eea36596432a0843721d26edf3b0ec75" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you either move this to dev dependencies, or use a mutex in the test, or get Manish to do a release? I don't want to submit git hashes.

provider/datagen/src/lib.rs Outdated Show resolved Hide resolved
provider/datagen/src/lib.rs Show resolved Hide resolved
@@ -16,6 +16,6 @@
"path": "blob.postcard"
}
},
"fallback": "Legacy",
"fallback": "Hybrid",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be RuntimeManual, as this is literally used for testing manual fallback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no diff if I make this change, so I'll make it because if anything it helps cover RuntimeManual.

However, note that locales: All, fallback: Hybrid is the way to run a pass-through datagen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to run pass-through datagen though. We're testing LocaleFallbackAdapter, which should be tested with data generated for manual fallback.

}
// TODO: Consider figuring out the cases where we don't need the fallbacker.
// We need it most of the time, so just pre-compute it here.
provider.source.fallbacker = Some(LocaleFallbacker::try_new_unstable(&provider)?);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in #2683 (comment) we decided that preresolved+explicit would not include ancestors and descendants. There should be at least one mode that allows you to exactly control which locales get included. We're hitting this problem with hello world, clients will hit it as well.

provider/datagen/src/lib.rs Show resolved Hide resolved
sffc added 6 commits August 9, 2023 23:07
Conflicts:
	components/datetime/data/data/macros/datetime_skeletons_v1.data.rs
	provider/datagen/tests/data/postcard/fingerprints.csv
	tools/make/data.toml
@sffc sffc requested a review from robertbastian August 10, 2023 07:41
}
None => (),
};
option_iter.get_or_insert_with(|| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can return the error here if you don't use get_or_insert_with, then you don't have to deal with it in three other locations.

@sffc
Copy link
Member Author

sffc commented Aug 10, 2023

@robertbastian Do you approve this PR other than the last comment you made about code style? If I fix that comment during the day can I merge this with a rubber stamp from Manish?

@robertbastian
Copy link
Member

Yes

robertbastian
robertbastian previously approved these changes Aug 10, 2023
sffc added 3 commits August 10, 2023 14:10
Conflicts:
	components/datetime/data/data/macros/datetime_skeletons_v1.data.rs
@sffc sffc merged commit 3806e3a into unicode-org:main Aug 10, 2023
@sffc sffc deleted the fb2 branch August 10, 2023 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants