Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow indices and prevent-feeding to work together. #281

Merged
merged 4 commits into from
Nov 15, 2023
Merged

Conversation

joanise
Copy link
Collaborator

@joanise joanise commented Aug 31, 2023

This is not optimized yet, more a proof of concept. Consider this a draft PR, but I'd like feedback on it now anyway please.

@joanise joanise marked this pull request as draft August 31, 2023 16:26
@joanise joanise requested a review from roedoejet August 31, 2023 16:27
@codecov
Copy link

codecov bot commented Aug 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (daf33b8) 92.46% compared to head (8363cfa) 92.48%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
+ Coverage   92.46%   92.48%   +0.01%     
==========================================
  Files          16       16              
  Lines        2284     2290       +6     
  Branches      515      516       +1     
==========================================
+ Hits         2112     2118       +6     
  Misses         98       98              
  Partials       74       74              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joanise
Copy link
Collaborator Author

joanise commented Aug 31, 2023

Calling strip_index_notation() for each character being processed is obviously inefficient, but I'd hoping to get feedback on the concept, and if the concept is OK I can optimize.

@roedoejet
Copy link
Owner

Looks reasonable to me! Tbh I might also be fine with just raising an exception if someone has indices and prevent-feeding=True - this might be a YAGNI. But the implementation looks good - I would be a bit worried about the performance I guess.

@joanise
Copy link
Collaborator Author

joanise commented Sep 1, 2023

No, we can't make this incompatible, I'm about to use if to fix kwk case preservation!

Performance: I did some tests yesterday, and I'm no longer worried: strangely enough, using a cache to make sure I process each rule's output only once made 0 difference on the performance! I guess it's trivial compared to the cost of applying all the rules in the first place. It's only done on the final output of the rules that were actually used, so it kinda makes sense, it's a trivial blip at the end of the pipeline.

@joanise joanise marked this pull request as ready for review September 1, 2023 13:40
@joanise
Copy link
Collaborator Author

joanise commented Sep 1, 2023

Turning this back into a non-draft PR: with the tests I did yesterday, I now think this is ready to merge as is.

@joanise joanise marked this pull request as draft September 1, 2023 15:33
@joanise
Copy link
Collaborator Author

joanise commented Sep 1, 2023

never mind, this breaks other stuff, it needs more work...

@joanise
Copy link
Collaborator Author

joanise commented Sep 1, 2023

I think the problem I noticed is just due to #283 and this PR is OK as is, but I'll wait until 283 is fixed before I finalize this PR.

@joanise joanise marked this pull request as ready for review November 10, 2023 22:38
@joanise
Copy link
Collaborator Author

joanise commented Nov 10, 2023

I need this PR to finish my kwk work now, so I've rebased it and improved the unit tests a bit. In my opinion this is ready to merge.

Copy link
Owner

@roedoejet roedoejet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @joanise thanks! I noticed that we had the functionality of strip_index_notation in a few places, so I refactored it out everywhere I could find it and just add the commit to this PR, hope you don't mind!

@joanise
Copy link
Collaborator Author

joanise commented Nov 15, 2023

Looks good to me @joanise thanks! I noticed that we had the functionality of strip_index_notation in a few places, so I refactored it out everywhere I could find it and just add the commit to this PR, hope you don't mind!

I think that's an excellent idea, thanks! I just think it belongs in g2p.mappings.utils, alongside things like normaqlize() and unicode_escapes(), instead of being all alone in a new g2p.utils.

joanise and others added 4 commits November 15, 2023 16:57
@joanise joanise merged commit 8363cfa into main Nov 15, 2023
@joanise joanise deleted the dev.issue-280 branch November 15, 2023 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants