-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WordSeg] Add inline documentation #566
Conversation
Replace various verbs with “returns”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some preliminary comments in preparation for our session today.
Rename DownloadDetails to ReferenceArchive Combine URL with filename and extension Update main summary in WordSegDataset Add blank line before doc comments in WordSegDataset
Remove explicit parameter descriptions and add them to summaries. Handle errors instead of throwing. Remove CharacterErrors.nonUtf8Data. Update attribute names in dataset tests.
Update summary in Phrase to include parameter names.
Remove explicit parameter descriptions and add them to summaries. Conform parameter names to Swift conventions.
public let plainText: String | ||
|
||
/// A sequence of text in numeric form, derived from `plainText`. | ||
public let numericalizedText: CharacterSequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember wanting to mention this before, but if “Character” doesn't mean what in Swift is called Character
, we should look for other names, e.g. “glyph.”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a start, I created #600 for using Character
instead of String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't exactly address this, but heads in the direction of cleaning up that design overall.
public let numericalizedText: CharacterSequence | ||
|
||
/// Creates an instance containing both raw (`plainText`) and processed | ||
/// (`numericalizedText`) forms of a sequence of text. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always thought “numericalized” read terribly awkwardizedly. I get the impression this is a term of art, but we should discuss whether it's the best choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #598
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, let's sync up via GVC; I didn't get to the end, but there's loads of great material for the seminar on Friday. Let's talk about exactly how much/whether you want to fix things up before then. It might be best to just get your brain going on some of the questions so that the pace of the dialogue Friday is high, but leave all the issues in place.
/// | ||
/// - Note: Omits any part of the dataset that cannot be converted to | ||
/// `CharacterSequence`. | ||
private static func convertDataset(_ dataset: [String], alphabet: Alphabet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, you're right, there's loads to work on here; I have 1000 thoughts about this: is it just numericalizing the strings in dataset, and don't we just want to name the function after that? Should it not be an extension on Array<String>
? We should say what "using" means, …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to numericalizeDataset
Change members to lets
Co-authored-by: Dave Abrahams <[email protected]>
Co-authored-by: Dave Abrahams <[email protected]>
Remove unnecessary additional `load()` Remove unnecessary optional from `testingPhrases` and `validationPhrases` Simplify optional filename handling in init() Remove extra `)` from training loss output Add test for loading only training file
Co-authored-by: Dave Abrahams <[email protected]>
Simplify and remove redundant init code
alphabet: alphabet, appendingEoSTo: trimmed)) | ||
var phrases = [Phrase]() | ||
|
||
for data in dataset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dabrahams Same question here about how to compose this in a way that removes the raw loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could I use compactMap
here if I need to include the original text? If I want to base my inclusion on whether CharacterSequence()
results in nil, how can I include the original text in composing Phrase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a symptom of awkward design. I believe rethinking CharacterSequence
will make these acrobatics unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// untested of course
return dataset.compactMap { data in
let trimmed = data.split(separator: " ", omittingEmptySubsequences: true).joined()
let numericalizedText = try? CharacterSequence(alphabet: alphabet, appendingEoSTo: trimmed)
return numericalizedText.map { Phrase(plainText: String(data), numericalizedText: $0 }
}
Rename downloadableArchive to source Preserve intermediate array type
Co-authored-by: Dave Abrahams <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for diligently working on this. Will take as number-one documentation sample in future!
Primarily adds inline documentation to
WordSeg
components. Also:CharacterErrors
into CharacterSequence.swift, where it is primarily used.WordSegRecord
->Phrase
WordSegDataset
DownloadDetails
->DownloadableArchive
andconvertDataset()
->numericalizeDataset()