N_TILINGS with getQ and getQb #10

Open
stephenkgu opened this issue Jan 9, 2020 · 6 comments

@stephenkgu

stephenkgu commented Jan 9, 2020

In the last for loop of getQ and getQb in agent.cpp, i runs from N_TILINGS to 3*N_TILINGS; I think it should run from 2*N_TILINGS to 3*N_TILINGS.
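To make sure I'm pointing at the right thing, here is a minimal sketch of the structure I mean (the names and sizes are made up for illustration, not the actual agent.cpp code):

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N_TILINGS = 32;      // illustrative value only
constexpr std::size_t MEMORY_SIZE = 4096;  // illustrative value only

// Hypothetical stand-in for getQ: sum the weights of the active tiles from
// three feature blocks of N_TILINGS tiles each.
double getQ(const std::array<double, MEMORY_SIZE>& weights,
            const std::array<std::size_t, 3 * N_TILINGS>& tiles)
{
    double q = 0.0;

    for (std::size_t i = 0; i < N_TILINGS; ++i)              // first block
        q += weights[tiles[i]];

    for (std::size_t i = N_TILINGS; i < 2 * N_TILINGS; ++i)  // second block
        q += weights[tiles[i]];

    // In the repo the last loop starts at N_TILINGS, which would add the
    // second block twice; I would expect it to start at 2*N_TILINGS:
    for (std::size_t i = 2 * N_TILINGS; i < 3 * N_TILINGS; ++i)
        q += weights[tiles[i]];

    return q;
}
```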

Is it a bug, or am I just missing something?

Thanks. @tspooner

@tspooner
Owner

tspooner commented Jan 9, 2020

Hey,

You're absolutely right. I'm not sure why, but this seems to be yet another discrepancy between my private development repo and this public one, as in your earlier issue (#9).

Rather than do a small patch like last time, I will go through all the code this weekend and make sure everything is in sync. Thanks for bringing this to my attention and sorry for the hassle.

Regards,
Tom

@stephenkgu
Author

Thanks. I tried multithreaded training, but it failed; the data in the agent just gets corrupted.

Is the multithreading feature actually usable? @tspooner

@tspooner
Owner

tspooner commented Jan 9, 2020

Honestly, I stopped using multi-threaded training quite some time before the main results of the paper were found. It doesn't surprise me much that it is broken.

I realise that's not ideal, but you will probably have to make some modifications yourself. This would include using mutexes to ensure that write access is exclusive at any one time, etc.
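Something along these lines is what I have in mind (just a rough sketch with made-up names, not the actual agent interface):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Rough sketch only: the point is that every thread touching the shared
// weight vector must hold the same mutex.
class SharedWeights
{
public:
    explicit SharedWeights(std::size_t n) : weights_(n, 0.0) {}

    // Called by a worker thread after it has computed its TD update.
    void update(const std::vector<std::size_t>& activeTiles, double delta)
    {
        std::lock_guard<std::mutex> lock(mutex_);  // exclusive write access
        for (std::size_t idx : activeTiles)
            weights_[idx] += delta;
    }

    // Reads are guarded too, so a thread never sees a half-applied update.
    double value(const std::vector<std::size_t>& activeTiles) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        double q = 0.0;
        for (std::size_t idx : activeTiles)
            q += weights_[idx];
        return q;
    }

private:
    mutable std::mutex mutex_;
    std::vector<double> weights_;
};
```

Coarse locking like this is slow, but it's the simplest way to rule out the data races first before trying anything finer-grained.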

@stephenkgu
Author

I noticed that on-policy R-learning seems to perform better than its counterparts in the paper, so is OnlineRLearn the on-policy R-learning implementation?

Which learning method do you recommend for market making?
@tspooner

Thanks

@tspooner
Owner

Yeah, OnlineRLearn is the on-policy R-learning algorithm that was introduced by Sutton. It's the equivalent of Q-learning for continuing tasks - i.e. it solves for a different objective: the expected average return as opposed to the expected discounted return.
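For concreteness, the tabular updates look roughly like this (the textbook form; the off-policy version replaces the bootstrap term with a max over actions):

$$
\begin{aligned}
\delta_t &= r_{t+1} - \bar{\rho}_t + Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t),\\
Q(s_t, a_t) &\leftarrow Q(s_t, a_t) + \alpha\,\delta_t,\\
\bar{\rho}_{t+1} &= \bar{\rho}_t + \beta\,\delta_t,
\end{aligned}
$$

where $\bar{\rho}$ is a running estimate of the average reward per step, which is the quantity being maximised in place of the discounted return $\sum_t \gamma^t r_{t+1}$.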

In general, there is no "right" algorithm for market making. It really depends on what type of solution you want and what assumptions you want to make about the setting; one can formulate market making as an episodic task (i.e. day-to-day trading), or as a continuing task with no terminal time. These yield different results, but it's not clear that one is necessarily better than the other. We certainly found in our experiments that R-learning performed well, but the same could be said for Expected SARSA.

It also depends on whether you intend to use a discretised action-space or not...

Given all this, I would strongly suggest you start with Q-learning and SARSA and branch out from there. Until you try it for yourself and see what the results are, and where the limitations of on- vs. off-policy methods lie, it's hard to gain proper insight. This makes for more effective solution development, in my experience.

@stephenkgu
Author

stephenkgu commented Jan 16, 2020

I tried on-policy R-learning with a dataset, and the resulting PnL curve just goes downwards, i.e. it is unprofitable. I guess that on-policy methods just learn a near-optimal policy, which is still unprofitable.

Well, the variance seems really small, which is nice for market making, but being unprofitable still makes it unusable. ;P

@tspooner
