The hardest thing about AI
Leading the charge in weaponising data, JP Morgan Asset Management’s head of equity data science outlines the objectives and obstacles he faces.
Kristian West, global head of equity trading and equity data science at JP Morgan Asset Management.
What defines your role?
When we set out on this journey, we defined four core objectives.
The first relates to data: principally, getting access to data, including new and alternative sources. It also meant making sure that internal data was more readily available and in an accessible form. It’s not just about making the data available; it has to be usable too.
The second core objective relates to the environment. Data science-related work requires a non-traditional technology environment. Getting access to the best compute ultimately led us down the cloud path. Whilst we do have on-premise (on-prem) supercomputer-type machines for proprietary work, the reality is that access to public or shared resources is extremely powerful in this space. Access to the cloud, along with a robust and scalable tech environment, is very important.
The third objective involves the science. That’s to say, having the analysis done by people who are skilled in hard-core machine learning, as well as in more traditional quantitative and financial modelling. We want to make sure we’re challenging some of the existing processes and models with new techniques.
The last objective, and arguably one of the most important, is training. When you look at the demographics of a business such as ours, the people who are really skilled in these areas are the more junior individuals, those who have come fresh out of university. A lot of these skills and tools are only a few years old, so it’s very important for us to educate existing teams. You can only really get adoption if there is an understanding of what it actually means.
What is your objective?
The objective is to create new, robust insights from these new data sources and techniques, and to leverage the new machine learning algorithms and compute power that allow us to actually act upon those insights. Applying new techniques to existing processes, to challenge the status quo, should also be part of the remit.
How mature is the understanding of artificial intelligence and machine learning within your firm?
Given the scale of our firm we have a lot of people, with a lot of technology resources and a lot of different processes. We’re not new to large sources of data and we’ve had to get good at processing large amounts of data. That’s what we do. Algorithms and analytical processes continue to evolve, but in recent years the breakthroughs and advances in artificial intelligence, machine learning and compute power, especially through cloud services, really allow us to take advantage of new techniques and tools on a scale that wasn’t possible just three years ago. So I would say the concept is relatively well understood, but the application is still in its relative infancy.
Is that also true of the wider industry?
I think that’s probably true of the broader community. Within the industry we’re definitely at the early stages of machine learning, artificial intelligence and understanding how to apply them, although I suspect this space will accelerate enormously as younger people enter the workforce and bring these skills and interests with them. If you look at quant analytics, which is the closest thing to this, there hasn’t been a significant amount of change over a number of years. What has changed is the advance of machine learning, and that has come from outside the financial industry, from technology companies applying it to social engineering and the profiling of individuals.
What have been the most surprising things that you have learned in this role so far?
One thing that surprised me the most is just how many vendors and firms are out there trying to sell you data. You can get data on just about anything and everything, and if you want to get more specific you can start stitching datasets together. Some vendors may say, oh, it’s completely anonymised, but when you start stitching different datasets together it’s not quite so anonymous anymore. Getting access to good data remains a challenge. That’s a surprise.
Then, making these tools and data services work requires domain expertise. You can’t do data science in isolation, so a close working relationship with existing researchers and portfolio managers, and a very bottom-up, iterative working process, are fundamental to the success of data science. It’s not an ivory tower exercise; it has to be ground up. Because the speed of change is rapid, the techniques we use and the way we adopt them have to be very flexible, and the way we interact with investment teams and portfolio managers has to be very iterative.
The last surprise worth highlighting is, as I mentioned before, how many of these techniques have come from outside our industry. When you look at open source examples, a lot of what you see in other industries is actually very applicable to finance. For example, an individual in our group participated in a Kaggle competition to identify whale noises in a literal sea of noise, and was later able to leverage that same approach to de-noise financial time series data. This is a great example of why diversity of thought is critical to a data science group.
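The re-identification risk of stitching datasets together can be sketched in a few lines. This is a minimal illustration under invented assumptions: two hypothetical datasets, each stripped of direct identifiers on its own, share the quasi-identifiers postcode, birth year and gender, and any record that matches exactly one entry in the other dataset is effectively re-identified. The field names, records and the `link` helper are all made up for the example.

```python
# Hypothetical vendor datasets; every field and record here is invented.
spending = [
    {"postcode": "EC2A", "birth_year": 1981, "gender": "F", "card_spend": 1200},
    {"postcode": "EC2A", "birth_year": 1975, "gender": "M", "card_spend": 340},
    {"postcode": "N1", "birth_year": 1981, "gender": "F", "card_spend": 980},
]
directory = [
    {"postcode": "EC2A", "birth_year": 1981, "gender": "F", "name": "A. Smith"},
    {"postcode": "N1", "birth_year": 1990, "gender": "M", "name": "B. Jones"},
]

QUASI_IDS = ("postcode", "birth_year", "gender")


def link(left, right, keys=QUASI_IDS):
    """Join two record lists on shared quasi-identifiers and return the
    records that match exactly one entry on the other side."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in left:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append({**row, **candidates[0]})
    return matches


for match in link(spending, directory):
    print(match["name"], match["card_spend"])
```

Neither dataset contains both a name and a spending figure, yet the join attaches one to the other for any record whose quasi-identifier combination is unique.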
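The interview does not say which technique was carried over from the whale-detection competition. As a much simpler stand-in, the sketch below shows the basic idea of de-noising a time series with a trailing moving average on synthetic data; the `moving_average` helper, the window length and the sine-plus-noise series are assumptions for illustration only.

```python
import math
import random


def moving_average(series, window):
    """Trailing simple moving average; returns len(series) - window + 1 points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    out = [total / window]
    for i in range(window, len(series)):
        total += series[i] - series[i - window]  # slide the window forward one step
        out.append(total / window)
    return out


# Synthetic example: a slow sine "signal" buried in Gaussian noise.
rng = random.Random(42)
signal = [math.sin(t / 20) for t in range(200)]
noisy = [s + rng.gauss(0, 0.3) for s in signal]
smoothed = moving_average(noisy, 11)
```

Each smoothed point averages 11 noisy observations, which shrinks the noise at the cost of a lag of five samples behind the underlying signal.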
What are your goals over the short to medium term?
Short term, it’s really to revisit and enhance existing processes. We’re definitely not saying machine learning is going to reinvent the wheel. That’s not the case at all. For example, in trading execution automation we apply machine learning to help make trade decisions on timing, aggression and so on. Similarly, we apply machine learning to the reading of technical documentation. So it’s about taking existing processes that work well now and making them much more scalable and much more efficient. Overall, it’s about identifying and executing small projects that, added up over time, could have a big impact on your process, whether that be process automation or part of the portfolio construction mosaic.
The long-term goal is around training: making sure that this isn’t a skill that only a select few have, which is largely the case across the industry at present. We want this to become the norm, a skill for as many people as possible. The other long-term goal is increasing accessibility, i.e. making compute power, data, insights, etc. available to as many people as possible. If someone has a piece of data or makes a decision, we want everyone to be able to see it so they can learn from it.
Is there a clear direction of travel for the investment management industry or is its evolution unpredictable?
At one end of the spectrum the industry has quantitative processes and platforms, and at the other end you have fundamental ones. I think machine learning brings them much closer together. Where a quantitative process is structured and can generate factors or signals, that skill set can act as an overlay to a fundamental process, providing safeguards, ESG signals or red flags, or simply a screening step for a more fundamental or traditional process.
In the same way, when you have fundamental analysts and portfolio managers interacting with data scientists to help refine and iterate those models, you get a lot of domain expertise added to the process, which makes the models better. That’s the biggest difference between a quant process and a machine learning process: the machine learning process learns, whereas a quant process is quite static and linear. And that’s where the two come together. People talk about “quantamental” being the space in the middle. I think that’s true. Over time there won’t be as much distinction between quantitative and fundamental processes.
Is your role’s focus on the data environment about data quality, or more about the analytics that allow you to see pictures in the data?
There are three core themes. The first is that if we bring data into the building, we want it to be accessible to, and shared by, as many people as possible, because the more people who can access it, the more people can learn from it. The second is making sure that once data is in the building, it is actually in a usable format and isn’t so bespoke, so tailored to one specific use case, that it can’t be used elsewhere. The third is about giving the data the right structure. There need to be tools around it so that non-data scientists, or people with some understanding of how to code or manage data, can use it.
Which types of machine learning models are you applying and which tasks are you applying them to?
We use a wide swathe of them. There is no one super-model; we have to use the tool most appropriate for the job. Across trading, for example, we use penalised linear models for basket shaping, reinforcement learning for broker/algo selection and unsupervised clustering as an overlay. Our use of deep learning has been more focused on the analysis of text, where these models are state of the art.
In terms of the tasks we apply them to, these have varied significantly. At one end of the spectrum it has been simple automation of tasks where people add little value. At the other end, though, it has been about integrating models into people’s workflows to augment some aspect of them, be that alerting to changes, identifying outliers or generating investment ideas.
One thing that is not talked about much, but is very relevant, is that the distinction between approaches like reinforcement learning and deep learning is narrowing. Increasingly, models use both, and I think the distinction between the two will continue to narrow over time.
Different technical elements need to come together, from building algos to cloud provisioning to the data itself; what are the biggest logistical or operational challenges in managing those elements?
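Reinforcement learning for broker/algo selection is often framed, in its simplest form, as a multi-armed bandit. The sketch below is a minimal epsilon-greedy bandit, a deliberately simplified stand-in rather than the firm’s method: the three hypothetical brokers, their average slippage in basis points and the noise level are invented, and the reward is assumed to be negative slippage.

```python
import random


def epsilon_greedy(reward_fn, n_arms, rounds, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: usually pick the arm with the best average reward
    so far, but explore a random arm with probability epsilon."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms  # untried arms start at 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


# Hypothetical brokers with average slippage of 8, 5 and 12 bps.
AVG_SLIPPAGE_BPS = [8.0, 5.0, 12.0]


def simulated_reward(arm, rng):
    # Reward is negative slippage, so a cheaper broker earns a higher reward.
    return -(AVG_SLIPPAGE_BPS[arm] + rng.gauss(0, 0.5))


counts, means = epsilon_greedy(simulated_reward, n_arms=3, rounds=1000)
print(counts)
```

Because every reward here is negative and untried arms start at 0.0, each broker is tried at least once early on; after that, most order flow goes to the cheapest broker while a small share keeps exploring the others.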
There are two core challenges. The first relates to people. You can find a data scientist, but finding individuals who also have a financial background is a challenge. Attracting such talent requires you to think differently when hiring; however, we’re committed to a diverse hiring approach, which helps.
The other challenge is around cloud. We have a lot of proprietary data, so our natural instinct is to manage on-prem and do everything on a proprietary basis. However, in this space, that doesn’t necessarily set you up for success. Cloud, and the tools the cloud offers, are definitely a recipe for success, but that does require careful planning.
So platform design, and the control of data moving from on-prem to cloud and from cloud back to on-prem, require a lot of careful consideration. Then there is data pre-processing, model training and fine-tuning before you can even apply a model; that’s also a challenge.
Is financing these projects challenging?
The financial costs, really, are people and data. If you want to spin up a server on a cloud platform with significant compute, it will cost you US$10-15 an hour. That’s not that expensive. However, using cloud is relatively new, so the process of getting acceptance or adoption of cloud can take time.
Everyone wants to use the cloud, but how does one feel secure when assessing a cloud provider if that provider might have ambitions to enter finance, for example?
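To put the quoted US$10-15 an hour in context, the arithmetic below assumes a flat hourly rate and ignores storage, data transfer and any discounts; the job lengths (an 8-hour run and a server left on for a 30-day month) are illustrative assumptions, not figures from the interview.

```python
def cloud_compute_cost(hours, hourly_rate_usd):
    """Cost of renting one cloud server for a number of hours at a flat rate."""
    return hours * hourly_rate_usd


HOURS_IN_30_DAYS = 24 * 30  # 720 hours

for rate in (10, 15):
    # An 8-hour job versus leaving the server running for a whole month.
    print(rate, cloud_compute_cost(8, rate), cloud_compute_cost(HOURS_IN_30_DAYS, rate))
```

At those rates an 8-hour run costs US$80-120, while a server left running for a month comes to roughly US$7,200-10,800, which is why switching instances off between jobs matters more than the hourly price.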
One of the things we’ve done is recruit people from cloud providers. So we recruited Apoorv Saxena, who was Google’s head of machine learning. He has a team in San Francisco and is someone that we interact with daily. Understanding those environments and maximising their use is very important, notwithstanding the ambitions of the providers.
One must have very strict controls and processes around using an external public cloud, and a robust framework for managing how data gets in and out. Realistically, to be successful in this space you’ve got to be using the cloud in some shape or form.
What is your opinion of third party support for AI, and what is the balance between internal and external development?
That’s a moving target in terms of build versus buy. The team’s strategy is to ensure that our own resources are optimising the areas where they add value, not working on a platform we cannot add value to. So, for example, we’ll tend to outsource where that delivers the biggest efficiency gains. In terms of AI support, whilst there are vendors out there, we’re not convinced many are quite at the state of the art or have the background to apply it in this arena.
Third-party providers, for us, tend to be mostly in the data provision space. There are a great many open source and cloud tools we can use, for example to parse documentation or translate documents into another language. Because tools already exist that can do that, we wouldn’t necessarily want to build them ourselves. So in terms of vendors, most of it is in the data space, and customisation is done either on-prem or in the cloud, depending on the task at hand.
Are there other challenges you are facing using AI?
Finding data scientists with a financial understanding is a challenge, as I mentioned before. The second is that it involves a very steep learning curve for multiple functions, because it’s a very technical and detailed area, especially when you try to apply it in practice, where it becomes both a technical and a practical challenge. Not only do you have to get the data scientists to understand the financial environment, you need the businesses to understand data science, and you need technology to understand both. You also need control functions across risk, compliance and so on to understand how it works.
Which challenge is hardest to overcome?
One of the most fundamental things, which is rarely commented on, is model interpretability. When you have a machine that is effectively learning, it’s more difficult to articulate how it arrives at its output.
It’s not sufficient to have a machine learning engine that gives you strong quantitative efficacy. It’s about understanding how the model works, and that, again, is a technical challenge. It’s a good thing to do, because it not only helps educate people but also helps those building these models to understand them better and to be challenged by domain experts. I think too little focus is given to interpretability, given that it’s fundamental to adoption.
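One common, model-agnostic way to start explaining what a learned model relies on is permutation importance: shuffle one input at a time and measure how much the error rises. The sketch below is a minimal pure-Python version under stated assumptions; the toy `model`, the two-feature data and the mean-squared-error metric are hypothetical, and permutation importance is just one of several interpretability techniques, not necessarily the one the firm uses.

```python
import random


def mse(model, X, y):
    """Mean squared error of model predictions against targets."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed=0):
    """Rise in error when each feature column is shuffled in turn.
    A larger rise means the model leans more heavily on that feature."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(model, X_perm, y) - baseline)
    return importances


def model(row):
    # Toy stand-in for a trained model: it uses feature 0 and ignores feature 1.
    return 2.0 * row[0]


X = [[float(i), float(10 - i)] for i in range(10)]
y = [model(row) for row in X]
print(permutation_importance(model, X, y))
```

Shuffling the ignored feature leaves the error unchanged, while shuffling the feature the model depends on increases it, which is the kind of evidence a domain expert can then challenge.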
Is there a tipping point where machines optimising human activity tips into people optimising machine activity?
People refer to it as ‘augmented intelligence’. I think there’s a tipping point between people saying ‘I understand, from a quantitative perspective, what this model will do under different circumstances’ and accepting that a machine is learning on a daily, hourly or print-by-print basis from new information. That is evolving, and it requires a mindset change.
Do you have a perspective on the idea of AI fully managing a portfolio?
There are already pure quant products. As this space matures, machine learning algorithms improve and data becomes more readily available for them to build upon, I suspect it’s a natural progression. The question then becomes: at what point would those models be considered AI?
How can the output of an ML algo or AI model be integrated within existing investment models?
Again, it’s iterative and depends on the product, investment team and style. But in the simplest terms, I think it could become an additional factor or signal. ESG is a classic example: you can use a lot of alternative data to create signals or scores around certain themes.
Another example could be outlier detection: drawing attention to scenarios, instances or events that a pre-existing model wouldn’t have picked up on, or would have scored differently.
Then there’s the post-processing angle, where you have an existing model or process and apply machine learning to sanity-check it. Similarly, another approach already applied by a number of firms is in risk management and control functions: using machine learning for downside protection or alerting, to reduce idiosyncratic risk and account for certain events.
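One simple way an alternative-data score could enter an existing model as an additional factor is to standardise it across the stock universe and blend it with the existing score. The sketch assumes a hypothetical ESG score per stock, a linear blend and a 20% weight; none of these choices come from the interview, and the `blend_signal` helper is illustrative.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise values across the universe (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("scores have no dispersion to standardise")
    return [(v - mu) / sigma for v in values]


def blend_signal(base_scores, alt_scores, weight=0.2):
    """Blend an alternative-data score into an existing model score.
    Both are z-scored first so that neither dominates purely through its scale."""
    base_z, alt_z = zscores(base_scores), zscores(alt_scores)
    return [(1 - weight) * b + weight * a for b, a in zip(base_z, alt_z)]


# Hypothetical inputs for four stocks: an existing model score and an ESG score (0-100).
model_scores = [0.5, 1.2, -0.3, 0.8]
esg_scores = [72, 40, 65, 55]
print(blend_signal(model_scores, esg_scores))
```

Keeping the weight as an explicit parameter makes it easy for an investment team to see, and iterate on, how much the new signal is allowed to move the existing ranking.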
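A basic form of outlier detection can be sketched with a robust, median-based score. This minimal example uses the modified z-score (based on the median absolute deviation, with the conventional 0.6745 scaling constant and 3.5 cut-off) on hypothetical daily volumes; the data and the `flag_outliers` helper are assumptions, and a production system would likely use richer, multivariate models.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score, computed from the median and the
    median absolute deviation (MAD), exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to scale by; this sketch simply reports nothing
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical daily volumes (thousands of shares) for a single stock.
volumes = [100, 102, 98, 101, 99, 100, 250, 97]
print(flag_outliers(volumes))
```

Using the median rather than the mean stops the outlier itself from inflating the yardstick it is measured against.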
Could machine learning lead to a rethink of basic investment ideas such as modern portfolio theory (MPT)?
MPT is not new to criticism. We have seen that machine learning is able to observe, learn and highlight the interrelatedness of certain assets or events, based on their learned interactions.
There was a recent study looking at signals for identifying dementia candidates. It tracked a cohort of nuns who had first been screened for dementia several decades earlier. A machine learning algorithm picked up on events that took place 30 years earlier and flagged those as the key signals for spotting dementia candidates 30 years later. The algorithm was able to pick up on this where a human wouldn’t necessarily have done so. And I think that’s also true of how you apply machine learning to portfolio construction.
In the trading space there’s so much data. For example, if a single trader is trying to trade 50 orders, at the point each order arrives they start looking at certain data points. They then have to keep refreshing their view of those data points throughout the entire life of the order, across 50 names, along with the interrelatedness of those securities, sector risk, skew and so on. That’s tough! If you have a machine that is listening to certain signals and alerting you to events or risks that may apply to that list, that’s powerful. I think that’s definitely something which will be invaluable to, I guess, portfolio theory going forward.
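The alerting idea can be illustrated with a much simpler, rule-based stand-in: a monitor that recomputes sector concentration across a list of live orders and flags any sector above a limit. This is not machine learning and not the firm’s system; the order fields, tickers, notionals and the 25% limit are all hypothetical assumptions chosen to show the mechanics of an alert layer.

```python
from collections import defaultdict


def sector_alerts(orders, max_share=0.25):
    """Flag sectors whose share of the basket's total notional exceeds max_share."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["sector"]] += order["notional"]
    gross = sum(totals.values())
    return sorted(sector for sector, n in totals.items() if n / gross > max_share)


# Hypothetical live orders; in practice this list would be refreshed as fills arrive.
orders = [
    {"ticker": "AAA", "sector": "Tech", "notional": 4_000_000},
    {"ticker": "BBB", "sector": "Tech", "notional": 3_000_000},
    {"ticker": "CCC", "sector": "Energy", "notional": 2_000_000},
    {"ticker": "DDD", "sector": "Health", "notional": 1_000_000},
]
print(sector_alerts(orders))
```

A learned model could replace the fixed threshold with signals it has found to matter, but the surrounding loop of recomputing across the whole list and surfacing only the exceptions to the trader would look much the same.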
The interaction that machine learning creates between the quantitative element and the fundamental creates a virtuous circle and so, as a result, the distinction between the two will likely shrink over time.
Kristian West, managing director, is global head of equity trading and equity data science at JP Morgan Asset Management. An employee since 2008, his focus is on the evolution and management of the trading platform and building out data science capabilities across the equity business. Prior to joining the firm he was head of Equity Execution Services at Barclays Capital, building global electronic and voice execution services for the hedge fund community. Prior to that, he worked at Goldman Sachs and was responsible for trading within their Electronic Transaction Services (ETS) department. He was also a US Sales Trader at Spear Leeds & Kellogg. West serves on the board of Cboe Europe, as a non-executive director. He obtained a BSc (Hons) in Management & Design in Engineering at London City University, School of Engineering and an MBA from the University of Cambridge.