Recent Reddit-fueled market volatility calls for better predictive capabilities. Markets are showing unprecedented price movements, and participants are on the lookout for new forecasting tools. Sentiment analysis is an increasingly popular method, but it is challenging to act on. The key to increasing forecast reliability while reducing errors is a more holistic approach that dynamically combines various real-time datasets from markets and alternative sources, with social media sentiment being only one factor among many.
After day traders pushed the shares of GameStop Corp. (NYSE:GME) to astronomical new heights at the end of January 2021, the shares lost most of their value and dropped from over $400 to below $40 by February 19. A few days later, on February 24, 2021, history seemed to repeat itself. Once again the shares of GameStop Corp. were up more than 100%.
The real surprise here is not the fact that day traders are able to juice certain shares but that fund managers and institutional investors appeared helpless and even at the mercy of fate. While the huge losses on the side of some hedge funds speak for themselves, the spike in demand for sentiment analysis and predictive trading solutions illustrates that the use of advanced real-time predictive analytics by institutional traders and prop shops is still underdeveloped.
This seems to be changing, however. Now, as recently commented by Mark DeCambre from MarketWatch, “a group of data providers are wagering that financial markets will never be the same again and that deep-pocketed investors will shell out big bucks to monitor discussions on message boards like Reddit’s r/wallstreetbets”. What DeCambre describes here is sentiment analysis, which is based on natural language processing and determines whether a piece of information conveys a positive, a negative or a neutral message.
This information usually comes in the form of unstructured data, a type of data that is expanding at 55-65% per year (Bernard Marr, Forbes 2019) and that can’t easily be stored in a traditional column-row database or spreadsheet. It can cover everything from text and audio to images and video, and it may overlap with real-time data. Unstructured data almost always contains valuable information on any topic, including the financial world, but it is often neglected today because it is still resource-intensive to process.
Sentiment analysis comes with further challenges. It gives no information about the exact nature and extent of the influence on the price of an asset. Moreover, the time horizons of certain “signals” are not defined, so it is up to the user to put them into context. Last but not least, sentiment analysis alone is not a sufficient signal on which to build full-fledged trading strategies.
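To make the positive / negative / neutral classification concrete, here is a minimal sketch in Python. It is a deliberately naive word-count approach, not a production NLP model, and the word lists and the function name `classify_sentiment` are hypothetical and invented for illustration only.

```python
import re

# Hypothetical toy lexicon, invented for this illustration only.
POSITIVE = {"moon", "buy", "bullish", "gain", "squeeze", "hold"}
NEGATIVE = {"sell", "bearish", "crash", "loss", "dump"}


def classify_sentiment(text: str) -> str:
    """Label a post as 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "GME to the moon, buy and hold",
    "This will crash, sell now",
    "Earnings call is on Tuesday",
]
labels = [classify_sentiment(p) for p in posts]
print(labels)
```

Note what the output lacks: a label per post says nothing about how strongly, or over which time horizon, the price of an asset might react.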
Real-time & unstructured
Technological progress in the financial world is rapid. While the use of real-time predictive analytics in trading seems underdeveloped, the majority of firms use some sort of machine learning software to evaluate and contextualize historical data such as prices, volumes or liquidity, with the goal of discovering recurring patterns. Most trading organizations deploy this technology to develop strategies that, ideally, allow them to forecast future stock prices successfully. According to Guy De Blonay, fund manager at Jupiter Asset Management, 80% of daily volume in the U.S. is traded by machines (CNBC, 2018). However, when machines rely mostly on structured, historical data, the output is usually noisy because it neglects the influence of real-time events. Adding proprietary sentiment algorithms might help to get better data, but developing efficient solutions remains challenging with common approaches. To get more accurate predictions, the question we need to answer is how to cope with the ever-increasing amount of another data category: real-time data. Real-time data is currently growing five-fold per year and will constitute more than 30% of the total global data sphere by 2025 (IDC, 2020).
The latest developments in U.S. stock markets call for integrating both real-time market data and unstructured data into predictive analytics models. Hidden data features such as volatility or order book pressure are only meaningful in real time. And modern forecasts call for integrating a wide range of data sources with different characteristics, from markets, social media and elsewhere. The future winners among predictive analytics tools in trading will not be decided by a single criterion such as social media chatter but by the ability to dynamically integrate structured and unstructured data in real time and to produce actionable signals. The fact that some AI-based hedge funds already outperform industry benchmarks today, as illustrated by Preqin in 2019, indicates the potential of machine-supported trading in the future.
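As one illustration of a real-time feature, order book pressure is often approximated by the volume imbalance between the bid and ask sides. The sketch below assumes this common definition, which the article itself does not specify. The function name `order_book_imbalance`, the `depth` parameter and the sample quotes are hypothetical.

```python
def order_book_imbalance(bids, asks, depth=5):
    """Volume imbalance in [-1, 1] over the top `depth` levels.

    bids / asks: lists of (price, size) tuples, best price first.
    Positive values mean more resting buy volume than sell volume.
    This definition is an assumption, not taken from the article.
    """
    bid_volume = sum(size for _, size in bids[:depth])
    ask_volume = sum(size for _, size in asks[:depth])
    total = bid_volume + ask_volume
    if total == 0:
        return 0.0  # empty book: no measurable pressure
    return (bid_volume - ask_volume) / total


# Made-up snapshot of an order book at a single point in time.
bids = [(99.9, 300), (99.8, 200), (99.7, 100)]
asks = [(100.1, 100), (100.2, 100), (100.3, 100)]
imbalance = order_book_imbalance(bids, asks)
print(round(imbalance, 3))
```

Because the book changes with every order, a value like this is only informative for the moment the snapshot was taken, which is why such features need to be computed in real time.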
Our research and our work with clients show that forecasting errors can be reduced by up to 50%, at high reliability levels, with so-called temporal mixture ensemble models. Tian Guo and Nino Antulov-Fantulin, both co-founders of aisot, developed this method, which can dynamically combine different data sources with mixed characteristics. In other words, the method adaptively exploits different sources of data, such as transactions, news and social media patterns, and provides a point estimate of the target together with the uncertainty of the forecast. Temporal mixture models make it possible to decipher the time-varying effects of order book features on volatility and, among other things, deliver directly actionable signals for clearly defined time horizons, outperforming a variety of statistical and machine learning baselines as well as industry standards. As markets, from traditional to crypto, continue to experience new price patterns originating from old and new sources, robust and broad predictive models are crucial to avoid being caught off guard.
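The general idea of combining several data sources into one point estimate plus an uncertainty can be sketched with a generic mixture of per-source forecasts. This is not the aisot model: the gate scores here are fixed, hypothetical numbers, whereas a temporal mixture model would learn them and let them vary over time. All names and values below are assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn arbitrary gate scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_forecast(means, variances, gate_scores):
    """Combine per-source forecasts into a mixture mean and variance.

    means / variances: one predictive mean and variance per data source.
    gate_scores: hypothetical relevance score per source (fixed here).
    """
    weights = softmax(gate_scores)
    mean = sum(w * m for w, m in zip(weights, means))
    # Mixture variance: E[X^2] - (E[X])^2 (law of total variance).
    second_moment = sum(
        w * (v + m * m) for w, m, v in zip(weights, means, variances)
    )
    variance = second_moment - mean * mean
    return mean, variance, weights


# Made-up volatility forecasts from three sources:
# transactions, news, social media.
means = [0.02, 0.05, 0.03]
variances = [0.0001, 0.0004, 0.0009]
gate_scores = [2.0, 0.5, 0.1]

mean, variance, weights = mixture_forecast(means, variances, gate_scores)
print(mean, math.sqrt(variance), weights)
```

The variance term grows both when individual sources are uncertain and when the sources disagree with one another, which is one way a combined forecast can report its own uncertainty.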
Wondering how to integrate signals derived from temporal mixture models into your business? Get in touch!