0. Related Work
Nevmyvaka, Yuriy, Yi Feng, and Michael Kearns. "Reinforcement learning for optimized trade execution." Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
Source: https://www.cis.upenn.edu/~mkearns/papers/rlexec.pdf
Nevmyvaka, Yuriy, et al. "Electronic trading in order-driven markets: efficient execution." Proceedings of the Seventh IEEE International Conference on E-Commerce Technology (CEC 2005). IEEE, 2005.
Method
- Expected execution price
- x-axis: limit-order price relative to its own side of the market
- y-axis: "return", the difference between the execution price and the mid-spread price at the beginning of the time period, e.g. for a buy order return = mid-spread - execution price (possibly normalized by the mid-spread)
- Risk
- x-axis: every limit order price
- y-axis: Standard deviation of returns
- Market order: sweep the sell book for the entire size at once
- Marketable limit order: transact with the top of the sell book and then leave the residual shares sitting on top of the buy book.
- Efficient Pricing Frontier
- Markowitz efficient frontier: shows trade-off between risk and return in an investment
- Risk-return profile: every possible execution strategy on a two-dimensional graph
- x-axis: standard deviation
- y-axis: returns
- Efficient pricing frontier: the upper envelope of the risk-return plot, i.e. for each level of risk the strategy with the highest return (a sketch follows this list)
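A minimal sketch, assuming simulated per-episode returns for each limit-order pricing strategy, of how the risk-return profile and the efficient pricing frontier above could be computed; the strategy labels, data, and helper names are hypothetical, not from the paper.

```python
import numpy as np

def risk_return_profile(returns_by_strategy):
    """Map each pricing strategy to (std of returns, mean return).

    returns_by_strategy: dict of strategy label -> array of per-episode
    returns (e.g. mid-spread minus execution price for a buy order).
    """
    return {k: (np.std(v), np.mean(v)) for k, v in returns_by_strategy.items()}

def efficient_frontier(profile):
    """Keep only non-dominated strategies: sweep the points by increasing
    risk (x-axis) and keep those whose return beats every lower-risk point,
    i.e. the upper envelope of the risk-return plot."""
    frontier, best_return = [], float("-inf")
    for label, (risk, ret) in sorted(profile.items(), key=lambda kv: kv[1][0]):
        if ret > best_return:
            frontier.append((label, risk, ret))
            best_return = ret
    return frontier

# Hypothetical returns of three limit-order pricing strategies.
rng = np.random.default_rng(0)
profile = risk_return_profile({
    "at_mid": rng.normal(0.0, 2.0, 1000),         # passive: cheap if filled, risky
    "half_spread": rng.normal(-0.4, 1.0, 1000),
    "cross_spread": rng.normal(-1.0, 0.5, 1000),  # aggressive: certain, expensive
})
print(efficient_frontier(profile))
```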
Results
- Order size
- More expensive to trade larger orders
- More risky (higher chance of not getting fully executed) to trade larger orders
- Large orders require more aggressive pricing
- Possible improvement by splitting into several pieces
- Time Window
- Shorter time interval is more expensive as it requires more aggressive order pricing
- Longer time interval is less expensive but riskier
- Time of the day
- Only relevant if transacting over a long time period
- Otherwise generalization is impossible
- Market Conditions
- Cheaper to trade on high-volume days, but also riskier (surges in volume -> higher volatility -> adverse price movements more likely)
- More aggressive pricing on low-volume days
- Depth of the book may not be as significant as volume when it comes to limit-order pricing
(M. Kearns)
Source: http://www.eecs.harvard.edu/~cat/cs/diss/paperlinks/ectutorial2006.pdf
Source: https://docs.google.com/presentation/d/1bsK-3GTvgtpE0WJOrue1u7ZsacftzSi_JGNSnLTdayY/edit#slide=id.p
Using real-time cluster configurations of streaming asynchronous features as online state descriptors in financial markets
Lim, Marcus, and Richard J. Coggins. "Optimal trade execution: an evolutionary approach." Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Vol. 2. IEEE, 2005.
Impact cost:
Moving the price up by executing a large buy order at once (or down by a large sell order). By splitting a big order (e.g. V shares) into smaller pieces and spreading the execution over a time horizon H, the impact cost can be lessened.
Opportunity cost:
Arises when the price moves against us while a big order is split into pieces and its execution is delayed; the opportunity to execute at a better price is lost.
Trade execution strategy:
Optimizes the trade-off between impact cost and opportunity cost, thereby trying to achieve best execution.
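A minimal sketch of the order-splitting idea behind this trade-off: a parent order of V shares is divided into equal child orders over a horizon H (a simple TWAP-style schedule); the function name and numbers are hypothetical, not from the papers above.

```python
def twap_schedule(total_shares: int, horizon_steps: int) -> list[int]:
    """Split a parent order into `horizon_steps` (nearly) equal child orders.

    Smaller child orders lessen the impact cost, while the longer horizon
    increases the exposure to adverse price moves (opportunity cost)."""
    base, remainder = divmod(total_shares, horizon_steps)
    # Spread the remainder one share at a time over the first slices.
    return [base + (1 if i < remainder else 0) for i in range(horizon_steps)]

# Hypothetical example: execute V = 10_000 shares over H = 6 intervals.
print(twap_schedule(10_000, 6))  # [1667, 1667, 1667, 1667, 1666, 1666]
```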
Measuring execution performance:
- Bid-ask mid-spread price at the time t of execution initialization [Kearns]
- Volume Weighted Average Price (VWAP): vwap = sum(price*volume) / sum(volume)
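A minimal sketch of the VWAP benchmark computed from executed (price, volume) pairs; the fills in the example are hypothetical.

```python
def vwap(prices, volumes):
    """Volume Weighted Average Price: sum(price * volume) / sum(volume)."""
    assert len(prices) == len(volumes) and sum(volumes) > 0
    return sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)

# Hypothetical fills: 100 shares @ 10.00, 300 shares @ 10.02
print(vwap([10.00, 10.02], [100, 300]))  # 10.015
```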
Backtesting: Process of executing a given strategy on historical data to determine what its performance would have been had it been used at a certain time t in the past.
- A price-only backtest would not incorporate the volume and limit orders (liquidity) that were actually available.
- Using the limit orders allows for an educated guess: it is assumed that our trades are filled against them, which ignores the time priority of all other limit orders at the same price level
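A minimal sketch of such an "educated guess" fill model for backtesting: a market buy order is walked against the ask side of a historical book snapshot; the book format and numbers are assumptions, not taken from the papers above.

```python
def simulate_market_buy(ask_book, size):
    """Fill `size` shares by walking the ask side of a book snapshot.

    ask_book: list of (price, available_volume) sorted by ascending price.
    Returns (average execution price, unfilled shares). Time priority of
    other resting orders at the same price level is ignored, as noted above.
    """
    remaining, cost = size, 0.0
    for price, volume in ask_book:
        take = min(remaining, volume)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    filled = size - remaining
    avg_price = cost / filled if filled else float("nan")
    return avg_price, remaining

# Hypothetical snapshot: best ask 10.01 x 200, next level 10.03 x 500
print(simulate_market_buy([(10.01, 200), (10.03, 500)], 400))  # (10.02, 0)
```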
Kim, Adlar J., Christian R. Shelton, and Tomaso Poggio. "Modeling Stock Order Flows and Learning Market-Making from Data." (2002).
Time horizon T: 1-hour basis
Purpose:
Investigates currency order books to find patterns that can be exploited with the aim of forecasting price movement. SVM classification with different kernels, along with Multiple Kernel Learning (MKL) techniques such as SimpleMKL, is used.
Simulating and analyzing order book data: The queue-reactive model
“Market making” in an order book model and its impact on the spread
Source: https://arxiv.org/pdf/1706.10059v2.pdf
Source: https://arxiv.org/pdf/1612.01277v5.pdf
Purpose: Buy/Hold/Sell
Valuable Information
- Presence of large amounts of noise and non-stationarity in the datasets, which could cause severe problems for a value function approach.
- Recurrent reinforcement learning
- provides immediate feedback to optimize the strategy
- has ability to produce real valued actions or weights naturally without resorting to the discretization (which is necessary for value function approaches)
- Sharpe Ratio and Downside Deviation Ratio can be formulated to enable on-line learning with recurrent RL (see the sketch after this list)
- Uses gradient ascent to optimize
- LSTM handles deep structure on feature learning and the time expansion parts
- Agent
- Risk-adjusted return using the Sharpe Ratio (mean(return) / std(return) over the trading period t) or the Downside Deviation Ratio
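A minimal sketch of the two risk-adjusted reward functions mentioned above, computed from a series of per-period returns; the papers' exact definitions (risk-free rate, annualization, online/differential form) may differ, so this is an illustration under those assumptions.

```python
import numpy as np

def sharpe_ratio(returns):
    """Mean return divided by the standard deviation of returns over the
    trading period (no risk-free rate, no annualization)."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std() if r.std() > 0 else 0.0

def downside_deviation_ratio(returns, target=0.0):
    """Mean excess return over `target` divided by the downside deviation
    (root-mean-square of below-target returns), penalizing only losses."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - target, 0.0)
    dd = np.sqrt(np.mean(downside ** 2))
    return (r.mean() - target) / dd if dd > 0 else 0.0

# Hypothetical per-period returns of a trading agent
rets = [0.01, -0.02, 0.015, 0.005, -0.01]
print(sharpe_ratio(rets), downside_deviation_ratio(rets))
```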
Purpose
- We propose the use of the conditional value-at-risk (CVaR) of the execution cost as the risk measure, which allows taking into consideration only the unfavorable part of the return distribution, or, equivalently, unwanted high cost.
- Due to parameter estimation errors in the price model, the naive strategies given by the nominal problem may perform badly in the real market, and hence it is extremely important to take such parameter estimation errors into consideration. To deal with this, we extend both the traditional mean-variance approach and our proposed CVaR approach to their robust design counterparts.
Statements
Variance:
- However, the variance has been recognized as impractical since it is a symmetric measure of risk and hence also penalizes low-cost events.
- However, it is well known that variance is not an appropriate risk measure when dealing with financial returns from non-normal, negatively skewed, and leptokurtic distributions [22]
Value-at-risk:
- VaR is also known to have the limitations of lacking subadditivity and not properly describing the losses in the tail of concern [22].
- In order to overcome the inadequacy of variance and VaR, Conditional VaR (CVaR, also known in the literature as Expected Shortfall, Expected Tail Loss, Tail Conditional Expectation, and Tail VaR) has been proposed as an alternative risk measure [23]. It has the desired properties, e.g. convexity and coherence [22], and has thus been employed widely in financial engineering; see [24]–[27] for portfolio and risk management.
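A minimal sketch of the CVaR (expected shortfall) of the execution cost estimated from simulated cost samples, illustrating the definition in the statements above; the data are hypothetical.

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Conditional value-at-risk at level alpha: the expected execution cost
    given that the cost exceeds its alpha-quantile (the VaR threshold)."""
    c = np.asarray(costs, dtype=float)
    var = np.quantile(c, alpha)   # value-at-risk threshold
    return c[c >= var].mean()     # mean of the unfavorable (high-cost) tail

# Hypothetical simulated execution costs, e.g. shortfall in basis points
rng = np.random.default_rng(1)
costs = rng.normal(5.0, 2.0, 10_000)
print(cvar(costs, alpha=0.95))  # for a normal, roughly mean + 2.06 * std
```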
Source: http://www.wildml.com/2018/02/introduction-to-learning-to-trade-with-reinforcement-learning/
- Sharpe ratio or drawdown as reward functions (see the drawdown sketch after this list).
- Reinforcement Learning allows for end-to-end optimization and maximizes (potentially delayed) rewards.
- a strategy may work well in a bearish environment, but lose money in a bullish environment. Partly, this is due to the simplistic nature of the policy, which does not have a parameterization powerful enough to learn to adapt to changing market conditions.
- However, if we explicitly modeled the other agents in the environment, our agent could learn to exploit their strategies. In essence, we are reformulating the problem from “market prediction” to “agent exploitation”. This is much more similar to what we are doing in multiplayer games, like DotA.
- in the trading case, most states in the environment are bad, and there are only a few good ones. A naive random approach to exploration will almost never stumble upon those good state-action pairs. A new approach is necessary here.
- There are many ways to speed up the training of Reinforcement Learning agents, including transfer learning, and using auxiliary tasks. For example, we could imagine pre-training an agent with an expert policy, or adding auxiliary tasks, such as price prediction
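A minimal sketch of maximum drawdown, mentioned above as an alternative reward function, computed on a hypothetical cumulative-wealth series.

```python
import numpy as np

def max_drawdown(wealth):
    """Largest peak-to-trough decline of the cumulative wealth curve,
    expressed as a fraction of the running peak."""
    w = np.asarray(wealth, dtype=float)
    running_peak = np.maximum.accumulate(w)
    return ((running_peak - w) / running_peak).max()

# Hypothetical cumulative wealth of a trading agent
print(max_drawdown([100, 105, 98, 102, 96, 110]))  # (105 - 96) / 105 ≈ 0.0857
```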
Source: https://www.hardikp.com/2018/02/11/why-is-machine-learning-in-finance-so-hard/
http://parasec.net/transmission/order-book-visualisation/
Roughly speaking, algorithmic trading is based on two different time scales: the daily or weekly scale, and a smaller (tens to hundreds of seconds) time scale. The first step is to optimally slice big orders into smaller ones on a daily basis with the goal of minimizing the price impact and/or maximizing the expected utility; the second step is to optimally place the orders within seconds. The former is the well-known optimal execution problem and the latter is the much less studied optimal placement problem.
Hwang, Ted, et al. "Deep Reinforcement Learning for Pairs Trading."
Necchi, Pierpaolo G. "Reinforcement Learning For Automated Trading."
Du, Xin, Jinjian Zhai, and Koupin Lv. "Algorithm Trading using Q-Learning and Recurrent Reinforcement Learning." positions 1 (2016): 1.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=935097
https://editorialexpress.com/cgi-bin/conference/download.cgi?db_name=res_phd_2013&paper_id=271