Project 3: Summary

Algorithmic Trading with Artificial Intelligence


Uncovering Market Trends in SEC Filings with Machine Learning and NLP.

Summary

Our research at Texas State University aimed to use artificial intelligence to predict stock market movements by analyzing SEC 10-K financial reports. We focused on whether NLP techniques could uncover predictive insights from the "Management’s Discussion and Analysis" section. Our team, which included Jack Burt, Andrew Hocher, and me, worked under the guidance of Professor Tahir Ekin and sought to leverage big data and AI to test the Efficient Market Hypothesis.

We structured our investigation around the Efficient Market Hypothesis (EMH), which suggests that financial markets reflect all available information, making it difficult to consistently achieve returns above the market average. Our goal was to test this theory by applying NLP techniques to 10-K reports to see if we could uncover hidden predictive insights. We formulated two hypotheses: the null hypothesis (H₀), stating that NLP analysis of 10-K reports would not provide insights that outperform the market, and the alternative hypothesis (Hₐ), proposing that NLP could yield predictive insights capable of outperforming the market.

Our methodology involved sourcing 10-K reports from 2000 to 2023, focusing on Item 7, which discusses management’s perspective on the company’s financial health and operations. We preprocessed the data by removing HTML tags, replacing contractions, and tokenizing the text. We then applied various vectorization techniques and used several NLP and machine learning models, including FinBERT, TextBlob, the ChatGPT API, Naive Bayes, Random Forest, and XGBoost, to analyze sentiment and predict stock movements.
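
As a rough illustration of the preprocessing steps described above (removing HTML tags, replacing contractions, and tokenizing), here is a minimal Python sketch that uses only the standard library. It is not the project's actual code: the function names (strip_html, expand_contractions, preprocess), the short contraction list, and the token pattern are all assumptions made for this example, and the real pipeline may rely on different libraries and rules.

```python
import html
import re

# Illustrative contraction map (an assumption for this sketch, not the
# project's list). Order matters: specific forms come before generic suffixes.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "n't": " not",
    "'re": " are",
    "'ve": " have",
    "'ll": " will",
}

TAG_RE = re.compile(r"<[^>]+>")
# A token is a run of letters/digits, optionally joined by "-" or "'" (e.g. "10-k").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def strip_html(raw):
    """Replace HTML tags with spaces, then decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub(" ", raw))


def expand_contractions(text):
    """Expand contractions in lowercased text using the map above."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


def preprocess(raw):
    """Run the three steps in order and return a list of lowercase tokens."""
    text = strip_html(raw).lower()
    text = expand_contractions(text)
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    sample = "<p>We don&rsquo;t expect revenue to decline.</p>"
    print(preprocess(sample))
```

The resulting token lists would then be passed to whichever vectorization technique is being tested before any model is trained on them.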

The results of our study showed that traditional machine learning models, particularly XGBoost and Naive Bayes, outperformed sentiment analysis models. However, the overall stock market still performed better than all the models tested. The ensemble models, which combined multiple model outputs, showed potential but also exhibited higher investment risk. These findings reinforce the Efficient Market Hypothesis, indicating that markets efficiently incorporate public information into stock prices, making it challenging to achieve superior returns through 10-K report analysis alone.

In conclusion, our research demonstrated that while certain AI models showed slight predictive capability, they did not consistently outperform the market. This outcome highlights the efficiency of financial markets and the limitations of using NLP and machine learning to predict stock movements based on 10-K reports. For future research, we recommend exploring more advanced computational techniques, incorporating diverse data sources such as social media and news articles, and collaborating with experts in statistics and finance to further enhance the predictive power of AI in financial market analysis.


Below is an embedded version of the web app I built for this project. The app takes the stock you select and retrieves its most recent 10-K. It then applies our models and makes a prediction based on our Naive Bayes meta-model. For best performance, open the tool in a new tab using the "Fullscreen" button at the bottom of the embed. After you enter a stock, results may take a minute or two to appear. Link to tool: https://trading-ai.streamlit.app/

The Full Project Analysis can be found here: Full Analysis

The code for the project can be found here: GitHub