Learner test build #75
Conversation
…p, this prevents passing data through the ray object store
…ray object store instead of having each actor load the data.
… as a slice, i.e. if __getitem__(4) was called, it would return everything from 0 to 4. This was crashing the torch DataLoader. The issue was fixed by re-creating a dataframe when asked for a single idx.
- the stack during forward was not executed correctly, so the target shape was incorrect
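The `__getitem__` slicing bug described above can be sketched with a minimal stand-in dataset (class names and sample data are illustrative, not the PR's actual code):

```python
class BuggyDataset:
    """Hypothetical reproduction of the bug: an integer index is treated
    as a slice end, so __getitem__(4) returns items 0 through 4."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, idx):
        # Bug: returns everything from 0 to idx instead of the item at idx.
        return self.rows[0:idx + 1]

    def __len__(self):
        return len(self.rows)


class FixedDataset(BuggyDataset):
    def __getitem__(self, idx):
        # Fix: hand back exactly one sample for a single integer idx,
        # which is what a torch-style DataLoader expects per index.
        return self.rows[idx]


rows = ["r0", "r1", "r2", "r3", "r4"]
buggy = BuggyDataset(rows)[4]  # five items, not one
fixed = FixedDataset(rows)[4]  # a single sample
print(len(buggy), fixed)
```

A DataLoader-style consumer batching the buggy output gets variable-length, nested samples, which is consistent with the crash described in the PR.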
validation=TorchDataset(
    config_path=config_path,
    csv_path=data_path,
# Configure trainable with resources and dataset parameters
I don't remember exactly why, but I think I initially put them here to avoid having to re-encode each time it is tuned.
@@ -138,13 +139,31 @@ def tuner_initialization(

logging.info(f"PER_TRIAL resources -> GPU: {self.gpu_per_trial} CPU: {self.cpu_per_trial}")

# Pre-load and encode datasets once, then put them in Ray's object store
@ray.remote
ahhh so ray.remote is the correct way to do it :)
# Put datasets in Ray's object store
datasets_ref = create_datasets.remote(data_config_path, data_path, encoder_loader)
self.config["_training_ref"] = training_ref |
I like how your thought process evolved and went back XD
Btw, why did you choose to use config again, instead of a remote function?
lgtm :)
Added a test for the learner and made sure it runs end to end.
TODO: add a proper debugging tool to assess the size mismatch