-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TrainingLoop: refactor progress printer and add CSVLogger #668
Conversation
2b74741
to
cb35bd4
Compare
TrainingLoop/TrainingLoop.swift
Outdated
let statisticsRecorder = StatisticsRecorder(liveStatistics: true, metrics: [.loss] + metrics) | ||
let progressPrinter = ProgressPrinter(liveStatistics: true) | ||
self.statisticsRecorder = statisticsRecorder | ||
self.progressPrinter = progressPrinter | ||
self.callbacks = [ | ||
statisticsRecorder.record, | ||
progressPrinter.print, | ||
] + callbacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine cases where we don't want to record per-batch or per-epoch statistics. By building statistics gathering directly into the training loop, rather than having it be a user-provided component, can we account for those cases where we want to strip down the training loop as much as possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please see latest change where I make both whether to include default callbacks and when to record stats configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the first file. Please revise (and apply anything you learned from doing so to the other files) and I can look again. Thanks!
import Foundation | ||
import ModelSupport | ||
|
||
/// A callback-based handler for logging the statistics to CSV file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice doc comment. English nit:
/// A callback-based handler for logging the statistics to CSV file. | |
/// A callback-based handler for logging statistics to a CSV file. |
I would consider not leading with “callback-based.” In fact, "logging" and "CSV File" are kind of implied by the name. So the best description would explain what's being logged. “Statistics” is good, but what kind fo statistics? Training statistics, maybe? Maybe this should be thought of as “a log file” rather than “a logger?”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the doc comment. Any reason why this is LogFile? (Logger makes sense to me since its a logger not a file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thing can only be constructed to log to a path in the filesystem, and when we invoke its one public method, log
, it appends to that file. Properly used, if you have multiple instances, each instance logs to a different file. So instances have a 1-1 correspondence to log files. The word Logger
doesn't imply any of those things. A more general Logger
might be an interesting abstraction, but it would have a different API.
If I see this code, I know exactly what's happening.
// No argument label needed, arguably, because when you construct a thing called “File” with
// a string, the string is obviously a path.
LogFile eventLog(s)
eventLog.append(blah, blah, blah)
With Logger
and log
, it's less clear.
"\n" + ([epoch, batch] + stats.map { String($0.1) }).joined(separator: ", ") | ||
).data(using: .utf8)! | ||
do { | ||
try foundationFile.append(dataRow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it important to do this atomically if there are multiple writers to the same log file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does FileHandle support atomically write ? (Also, we won't expect multiple files written at same time I think)
1c043d4
to
c3b0082
Compare
|
||
/// Create an instance that log statistics during the training loop. | ||
public init(withPath path: String = "run/log.csv") { | ||
self.path = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you “resolve” this comment?
/// The last input fed to the model. | ||
var lastInput: Input? { get set } | ||
var lastStepInput: Input? { get set } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you see repetition like this, the instinct should be to replace it with something like:
var lastStep: Step?
// MARK: - Epoch-level data | ||
|
||
/// The number of epochs we are currently fitting for. | ||
var epochCount: Int? { get set } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these optional? Usually, 2-phase initialization indicates a design error.
|
||
// MARK: - Others | ||
|
||
/// The log for last statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last what? We have last epoch, last step, …
// Callbacks | ||
/// The callbacks used to customize the training loop. | ||
public var callbacks: [TrainingLoopCallback<Self>] = [] | ||
/// The metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do better please
|
||
// MARK: - The callbacks used to customize the training loop. | ||
|
||
public var callbacks: [TrainingLoopCallback<Self>] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comments missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock progress but there are lots of things to fix, including many that are simply mechanical (missing doc comments).
Thanks! Will come with new PRs to address:
|
…nsorflow#668)" This reverts commit adda08e.
This is the first step addressing of #663. Highlights of the change: