AI for investors : Reinforcement learning for portfolio managers : Vineet Naik

Using reinforcement learning for portfolio management has great potential, but asset managers need to think carefully about its weaknesses as well as its strengths. Vineet Naik writes.

A new paper, ‘Adversarial Deep Reinforcement Learning in Portfolio Management’, suggests that reinforcement learning could help investment firms with portfolio management. The research, conducted at Sun Yat-sen University in China, used the machine learning paradigm to model investing in the Chinese stock market. It found that the approach can deliver positive results when used correctly, but that it carries considerable risks that portfolio managers need to be aware of.

Reinforcement learning is typically modelled with an algorithm that interacts with an environment, for example an equity market, and improves its performance of a task by trial-and-error. The algorithm used is often referred to as the ‘agent’ exploring the space of possibilities. Reinforcement learning has been successfully used in robotics to carry out tasks with minimal human supervision, particularly when combined with deep neural networks, to deliver deep reinforcement learning (DRL).
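The trial-and-error loop described above can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical environment with two possible actions whose rewards are noisy, and an agent that mostly picks the action it currently believes is best while occasionally exploring the other (a scheme commonly called epsilon-greedy). The function name, the reward distribution and all parameter values are illustrative assumptions.

```python
import random


def run_agent(true_means, steps=2000, epsilon=0.1, seed=0):
    """Toy agent-environment loop: learn action values by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # the agent's current belief per action
    counts = [0] * len(true_means)       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action with the highest estimated reward.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment answers with a noisy reward (assumed Gaussian here).
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the running average reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_agent([0.0, 1.0])
print(estimates, counts)
```

After enough steps the agent's estimates should approach the true average rewards, and it should choose the better action far more often than the worse one.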


“It is pretty easy to take a large amount of data and identify patterns that are stable and repeatable using machine learning methods. Reinforcement learning is about closing this loop and automating the process,” says Dr Thomas Hill, senior director for Advanced Analytics at TIBCO.

DRL is gaining popularity in portfolio management. It has been found that DRL techniques can discern patterns in market movements even with very limited access to data. Financial market data is often complicated, non-linear, and noisy – much like human speech and digital imagery, which DRL has handled successfully in the past.


The recent paper from Sun Yat-sen University in China investigates the application of deep reinforcement learning in portfolio management. The researchers, Zhipeng Liang, Hao Chen, Junhao Zhu, Kangkang Jiang and Yanran Li, investigated three different algorithms and the influence of different optimisers and network structures on their performance.

“Utilising deep reinforcement learning in portfolio management is gaining popularity in the area of algorithmic trading,” the authors note. “However, deep learning is notorious for its sensitivity to neural network structure, feature engineering, etc.”

To get a closer look at the performance and potential pitfalls of reinforcement learning algorithms as a class, the authors chose three mainstream algorithms and experimented intensively, varying the parameters and optimisers used with each.

First, some simplifications and generalisations are made. The algorithm/agent represents a trader investing his resources into a set of instruments and reallocating his assets at the end of each day in order to maximise his profit. A continuity assumption is made, that the closing price of each asset at the end of each day is the same as its opening price the next day. So, each day, the agent observes and analyses the stock market, and performs reallocations at the end of the day. The model also takes into consideration transaction costs, which are assumed to be percentages of the transaction amounts.
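As a rough illustration of the daily reallocation step with proportional transaction costs, consider the sketch below. It is an assumption-laden simplification rather than the paper's exact formulation: the function name, the argument names and the 0.25% cost rate in the example are hypothetical, and the cost is charged on the amount traded before the cost itself is deducted.

```python
def rebalance(value, weights, target, price_relatives, cost_rate):
    """One trading day: prices move, then the agent reallocates at the close.

    value           -- portfolio value at the start of the day
    weights         -- fraction of the portfolio held in each asset
    target          -- fractions the agent wants to hold after reallocating
    price_relatives -- closing price divided by opening price, per asset
    cost_rate       -- transaction cost as a fraction of the amount traded
    """
    # Value of each holding after the day's price move.
    holdings = [value * w * r for w, r in zip(weights, price_relatives)]
    value_after_move = sum(holdings)
    # Total amount bought or sold to reach the target allocation.
    traded = sum(abs(value_after_move * t - h) for t, h in zip(target, holdings))
    # The cost is a fixed percentage of the amount traded.
    cost = cost_rate * traded
    return value_after_move - cost, cost


new_value, cost = rebalance(100.0, [0.5, 0.5], [0.5, 0.5], [1.10, 0.90], 0.0025)
print(round(new_value, 3), round(cost, 3))
```

In this example one asset rises 10% and the other falls 10%, so the portfolio is still worth 100 but is no longer evenly split. Restoring the even split means trading 10 in total, which costs 0.025 and leaves 99.975.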

The prices themselves are modelled as states in a Markov decision process (MDP). A Markov chain is a probabilistic model for a sequence of events where the probability of any event is only dependent on the immediately preceding event, and an MDP extends this idea by adding actions (allowing for choice) and rewards (providing motivation). If only one action is available in each state and every reward is the same, the MDP reduces to a conventional Markov chain.
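The relationship between a Markov chain and an MDP can be made concrete with a small sketch. The two-state market, the two actions and all of the probabilities and rewards below are invented for illustration and do not come from the paper.

```python
import random

# Hypothetical two-state market. Transition probabilities are indexed by
# (current state, action); here the action does not move the market.
TRANSITIONS = {
    ("up", "hold"): {"up": 0.6, "down": 0.4},
    ("up", "sell"): {"up": 0.6, "down": 0.4},
    ("down", "hold"): {"up": 0.3, "down": 0.7},
    ("down", "sell"): {"up": 0.3, "down": 0.7},
}
# Reward for taking an action in a state (illustrative values only).
REWARDS = {
    ("up", "hold"): 1.0,
    ("up", "sell"): 0.0,
    ("down", "hold"): -1.0,
    ("down", "sell"): 0.0,
}


def step(state, action, rng):
    """Sample the next state and return it with the reward for this move."""
    probs = TRANSITIONS[(state, action)]
    next_state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return next_state, REWARDS[(state, action)]


def as_markov_chain(fixed_action):
    """Remove the choice of action: what remains is a plain Markov chain."""
    return {s: TRANSITIONS[(s, fixed_action)] for s in ("up", "down")}


rng = random.Random(0)
print(step("up", "hold", rng))
print(as_markov_chain("hold"))
```

Fixing the action leaves only state-to-state transition probabilities, which is exactly what a Markov chain is. The actions and rewards are what give a learning agent something to optimise.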

The three algorithms chosen in the study were deep deterministic policy gradient (DDPG), proximal policy optimisation (PPO), and policy gradient (PG). The DDPG algorithm uses a particular framework that allows greater stability during the training process and greater data sampling efficiency. PPO also tries to increase the stability of the learning process, but it does so by limiting changes to the strategy at each step. Nevertheless, the study found that the simpler PG algorithm outperformed both of them, with DDPG and PPO unable to find the optimal policy even within the labelled training data.
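One widely used way in which PPO limits changes to the strategy is the clipped surrogate objective from the original PPO publication. The sketch below shows that objective for a single sample. Whether the study used this exact variant is an assumption, as is the clipping range of 0.2, which is simply a commonly quoted default.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample.

    ratio     -- probability of the action under the new strategy divided by
                 its probability under the old strategy
    advantage -- how much better the action was than expected
    epsilon   -- clipping range (0.2 is an assumed, commonly used value)
    """
    # Keep the ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the clipped and unclipped estimates.
    return min(ratio * advantage, clipped_ratio * advantage)


print(ppo_clipped_objective(1.5, 1.0))
print(ppo_clipped_objective(0.5, -1.0))
```

Once the ratio moves outside the clipping range in the direction that would improve the objective, the objective stops growing, so the optimiser gains nothing from pushing the new strategy further away from the old one in a single update.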

According to the authors, there are significant advantages in using DRL. Firstly, by taking market information as input and producing an asset allocation as output, DRL strategies have the potential to be autonomous and self-improving. Secondly, compared with conventional reinforcement learning, deep reinforcement learning builds strategies using neural networks, which both allow the network structure to be tailored to the problem and avoid the so-called “curse of dimensionality” that comes with complex multivariable environments like financial markets. This would enable large-scale portfolio management.

However, there are a few factors that make portfolio management particularly challenging for learning algorithms. First, stock data is very noisy – it has a lot of additional meaningless information – which leads to distorted pricing. The noisy market hypothesis suggests that observations of stock prices and financial indices may not reflect the true underlying value of the stocks. For learning algorithms, such inefficiencies in data can lead to disastrous failure in performance. Indeed, in this particular study, the data was taken from the Chinese stock market, which the paper suggests is far more volatile than its American counterpart.

Managing DRL for effective trading and investment

The issues highlighted by the study are technical challenges, but there are broader concerns around the use of machine learning in decision making. Many fund managers have said that they are not comfortable with the implementation of an automated model if they cannot understand how the predictions are being made.

In a 2017 white paper entitled ‘Artificial intelligence and machine learning in financial services: market developments and financial stability implications’, regulatory body the Financial Stability Board observed, “One issue is that AI and machine learning may reinforce biases. Some commentators point to the potential of big data analytics to entrench existing biases in college applications, job selection, prison sentencing, and credit provision.”

Many reinforcement learning methods rely on “black box” deep learning: complex learning mechanisms that connect inputs to some expected rewards, and then extract strategies and policies. For example, in the field of computer vision, an algorithm can be taught to recognise a photo which has a cat in it by showing it many such photos. However, it is difficult if not impossible to single out which properties of the photo have led the network to this conclusion.

“When we let these technologies loose and deal with consumers, be that in marketing or be that in the investment world, you want to demonstrate that you control consumer risk – you want to demonstrate that you don’t violate any of the boundaries that society is concerned about surrounding discrimination and prejudice”, says Dr Hill.

Another problem, specific to reinforcement learning, is the need to build accurate models from limited data.

“One way this could be addressed is by the use of adversarial nets that play each other and explore a vast space of possibilities of inputs”, says Dr Hill.

Adversarial nets generate data themselves by exploration and by playing against each other, and have wide-ranging applications, from sharpening photographs to creating innovative designs for aerofoils.

Trial-and-error learning is also potentially very slow, whereas capital markets tools may need to deliver results quickly. In terms of applying these ideas to investment strategies, being able to learn concepts much more quickly and to detect concept drift – the gradual change of target variable properties over time – would be invaluable.
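A very simple way to flag possible concept drift is to compare a recent window of observations with the window that preceded it. The sketch below is a basic heuristic offered only for illustration: it is not a method described in the paper or by Dr Hill, and the function name, the window length and the threshold are all assumptions.

```python
from statistics import mean, stdev


def drift_detected(series, window=20, threshold=3.0):
    """Flag drift when the recent mean departs from the earlier mean."""
    if len(series) < 2 * window:
        return False  # not enough history to compare two windows
    reference = series[-2 * window:-window]  # the earlier window
    recent = series[-window:]                # the most recent window
    spread = stdev(reference)
    if spread == 0:
        # No variation in the reference window: any change counts as drift.
        return mean(recent) != mean(reference)
    # Difference in means, measured in standard errors of a window mean.
    z = abs(mean(recent) - mean(reference)) / (spread / window ** 0.5)
    return z > threshold


stable = [0.0, 1.0] * 20
shifted = [0.0, 1.0] * 10 + [5.0, 6.0] * 10
print(drift_detected(stable), drift_detected(shifted))
```

A production system would need something more robust, since real market series are autocorrelated and heavy-tailed, but the idea of monitoring for a change in the statistical properties of the data is the same.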

Dr Hill says, “Instead of just predicting outcomes, it would be more effective to find latent or hidden states – configurations of inputs within historical data. If we can find these and implant them into the agent, it would cut down the realm of possibilities that need to be explored, and therefore make the learning process more efficient in terms of the amount of data being used. This would be necessary to making these systems fully autonomous”.

One wider issue is that useful trading signals and strategies derived from AI decay over time as technology and data become more widely available.

“This has meant that the process has become incredibly efficient and as a result, it has gotten a lot harder to reliably make money,” says Dr Hill.

While there is a great deal of potential for research in the application of machine learning in portfolio management, there is much work yet to be done before fund managers start relying entirely on these tools to carry out such delicate work. The paper from Sun Yat-sen University notes that due to the sensitivity of neural networks to structure and data, their performance in portfolio management cannot yet be compared to their performance in other areas like robotics.

Algorithmic strategies will need to cope with highly noisy and scarce data, for example by using techniques such as adversarial learning to generate data, and with other issues such as concept drift. They will also need to become more transparent before fund managers set them loose, even semi-autonomously, on the financial markets.


