
Natural Language Processing to analyze sentiment and narrow down a list of candidate stocks. Technical analysis included.


StrawhatRA/NLP-Tweet-Investor

 
 


Project 2: Investing in BTFD!

When's the best time to buy that beaten down stock?


Link to Presentation Slide


Using Fintech tools to identify Stock Investment Opportunities

Poor Trader Joe. After learning in Project #1 that he’ll never be able to afford a house through passive investing alone, he’s decided to try his hand at speculative trading! Joe is looking to buy when there's 'blood in the streets' by using Natural Language Processing (NLP) to analyze sentiment and narrow down a list of candidate stocks. Then, Joe will run technical analysis to see if there are anomalies or outlier deviations from moving averages. Lastly, if the potential stock passes both of these filters, Joe will use a classification model to help him predict whether the stock will go up or down the next day. By using this 'goldilocks' approach as a pre-trade checklist, Joe takes control of his destiny instead of relying on blind recommendations from gossip or news articles!

Joe's Approach

Joe's on the prowl for hot-off-the-press bearish news about pummeled stocks. Unfortunately, Joe finds traditional media too unreliable and always late to the party! Instead, Joe turns to Twitter. Reasoning that any new info would arrive much quicker than average news articles, Joe decides to look through the tweets of some big-name traders. Joe builds a list of traders he respects, then begins his NLP analysis to narrow down a list of "beaten down" stocks. Once he narrows the list, he'll use technical analysis and a custom decision tree to further assess the stock's potential performance. Maybe this way he'll be able to post his 'gainz' on r/WSB!

Process

  1. NLP Analysis: First, Joe needs to find a nice and bloody stock! He'll use the Twitter API to build a custom dataframe of tweets containing negative sentiment.
  2. Technical Analysis: Next, Joe will perform technical analysis and compare the stock price against the 21/50/200 EMAs. He'll look for extreme outliers, generate signals, then compare the performance of those signals one month later relative to SPY.
  3. Decision Tree Model: Lastly, what does the decision tree model say about this stock? Will the model forecast the next trading session to move up or down?

Data Source

We used the Twitter API and Yahoo Finance.

Libraries

This project required the following libraries: pandas, numpy, tweepy, hvplot, graphviz, matplotlib, nltk, sklearn, and yfinance.

Twitter API

Joe is nervous about choosing the right stocks to buy (after all, Wall Street looks scary when you live with your mom!). The only thing Joe is sure of is that the traditional news outlets are too slow to give him an edge. Instead, Joe uses Twitter to find out what's going on in the markets. He seeks the advice and chatter of top traders to help him make sense of it all. After curating a handful of the most trusted and popular active traders, he builds a collection of their tweets and uses the NLTK library to derive their sentiment. For Joe's 'Buy-the-Dip' strategy, ultra-negative sentiment represents a valuable investment opportunity! Filtering through the data using NLTK is a much better use of his time (even if he did appreciate all the pictures of food!).

Success! He's found a stock with negative sentiment: $OLLI
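The sentiment filter described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the tweets have already been fetched as plain strings, that NLTK's VADER analyzer is the scorer (the README only says NLTK was used), and that an average compound score at or below -0.5 counts as "ultra negative". The names `vader_scorer`, `average_sentiment_by_ticker`, `beaten_down`, and the threshold are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Cashtags such as $OLLI: a dollar sign followed by 1-5 capital letters.
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")


def vader_scorer() -> Callable[[str], float]:
    """Return a function mapping text to VADER's compound score in [-1, 1].

    Assumption: nltk is installed and nltk.download("vader_lexicon")
    has been run once beforehand.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def average_sentiment_by_ticker(
    tweets: Iterable[str], scorer: Callable[[str], float]
) -> Dict[str, float]:
    """Average the sentiment score of every tweet that mentions each cashtag."""
    scores = defaultdict(list)
    for text in tweets:
        score = scorer(text)
        # set() so a ticker repeated inside one tweet is counted once.
        for ticker in set(CASHTAG.findall(text)):
            scores[ticker].append(score)
    return {ticker: sum(vals) / len(vals) for ticker, vals in scores.items()}


def beaten_down(
    tweets: Iterable[str],
    scorer: Callable[[str], float],
    threshold: float = -0.5,  # hypothetical cut-off for "ultra negative"
) -> List[str]:
    """Tickers whose average sentiment is at or below the threshold, worst first."""
    averages = average_sentiment_by_ticker(tweets, scorer)
    hits = [t for t, avg in averages.items() if avg <= threshold]
    return sorted(hits, key=lambda t: averages[t])
```

With NLTK set up, `beaten_down(tweets, vader_scorer())` would return the candidate list. Passing the scorer in as a function keeps the filtering logic independent of any one sentiment model.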

Technical Analysis

Joe pulls 5 years of data for $OLLI and $SPY. He then plots the closing prices and the 21, 50, and 200 EMAs. He builds a dataframe comparing the current price vs. the three EMAs, standardizes the deltas, and looks for anomalies. He uses a threshold of 1.5 standard deviations (or the 13th percentile) to find the extreme outliers in % moves away from the EMAs. From here, he can see the range of price moves away from the 21 EMA.

He builds a dataframe, initiates a signal, then plots his findings.
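The EMA-deviation signal can be illustrated with a dependency-free sketch. It is not the notebook's code, which works on pandas dataframes. It assumes the standard EMA recursion with alpha = 2 / (span + 1), which is what pandas computes with `Series.ewm(span=span, adjust=False).mean()`, and it flags only downside outliers, since Joe is buying dips. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def ema(prices: Sequence[float], span: int) -> List[float]:
    """Exponential moving average seeded with the first price."""
    alpha = 2.0 / (span + 1)
    out = [float(prices[0])]
    for price in prices[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


def pct_from_ema(prices: Sequence[float], span: int) -> List[float]:
    """Percent distance of each price from its EMA (negative = below the EMA)."""
    return [(p - e) / e * 100.0 for p, e in zip(prices, ema(prices, span))]


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def oversold_days(
    prices: Sequence[float], span: int = 21, z_threshold: float = 1.5
) -> List[int]:
    """Indices where price sits at least z_threshold std devs below its EMA."""
    z = z_scores(pct_from_ema(prices, span))
    return [i for i, score in enumerate(z) if score <= -z_threshold]
```

Calling `oversold_days(closes)` on a list of daily closes returns the positions of the candidate buy signals. The same function can be run with `span=50` or `span=200` to compare against the other EMAs.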

In order to backtest the performance of his signal, Joe plots the average returns of his stock 1 trading month after the signal. Is the price higher or lower? How about %return relative to the SPY?
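One way to frame this backtest is sketched below, under stated assumptions: prices are plain lists of daily closes aligned by date for the stock and SPY, signals are given as list indices, and "1 trading month" is taken as 20 trading days to match the regression horizon mentioned later. The names `forward_return` and `backtest_signals` are hypothetical, not taken from the notebook.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def forward_return(
    prices: Sequence[float], start: int, horizon: int = 20
) -> Optional[float]:
    """Percent return from day `start` to `start + horizon`; None if out of range."""
    end = start + horizon
    if end >= len(prices):
        return None
    return (prices[end] / prices[start] - 1.0) * 100.0


def backtest_signals(
    stock: Sequence[float],
    benchmark: Sequence[float],
    signal_days: Sequence[int],
    horizon: int = 20,  # assumption: one trading month
) -> Dict[str, Optional[float]]:
    """Average forward return after each signal, outright and relative to the benchmark."""
    stock_returns, excess_returns = [], []
    for day in signal_days:
        s = forward_return(stock, day, horizon)
        b = forward_return(benchmark, day, horizon)
        if s is None or b is None:
            continue  # signal too recent to have a full forward window
        stock_returns.append(s)
        excess_returns.append(s - b)
    return {
        "n_signals": len(stock_returns),
        "avg_stock_return_pct": mean(stock_returns) if stock_returns else None,
        "avg_excess_return_pct": mean(excess_returns) if excess_returns else None,
    }
```

A positive `avg_excess_return_pct` means the stock beat the benchmark, on average, over the window following a signal. Signals without a full forward window are skipped rather than scored on partial data.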

Lastly, Joe wants to try his hand at linear regression. He's looking for any correlation between the "% away from the 21 EMA" and the average return of the stock 20 days later. Unfortunately, the r-squared value is incredibly low, so his model is unreliable!
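For reference, here is a small sketch of how such a fit and its r-squared can be computed with ordinary least squares on one predictor. It is an illustration only; the notebook's actual regression code and data are not shown in this README, and `linear_fit` is a hypothetical name.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y ~ intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r2 is the share of y's variance explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Here `x` would hold the "% away from the 21 EMA" values and `y` the matching 20-day forward returns. An r-squared near 0 means the line explains almost none of the variation in returns, which is the situation Joe ran into.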

Decision Tree Model

Joe chooses to build a classification model to help him determine whether the stock's price will go up or down during the next trading session. Joe uses 7 years of historical pricing and volume data to build technical indicator features and train his model. He then uses principal component analysis (PCA) to reduce the dimensionality of his features in an effort to improve the model's effectiveness.
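A PCA-plus-decision-tree classifier of this kind might look like the sketch below, using scikit-learn. Everything specific here is an assumption rather than the notebook's configuration: the number of components, the tree depth, the scaling step, and the chronological 80/20 split are illustrative choices, and the feature matrix is assumed to be prepared already, with row t paired with the price move from day t to day t+1.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def next_day_direction(close):
    """1 if the next close is higher than today's, else 0 (last day is dropped)."""
    close = np.asarray(close, dtype=float)
    return (close[1:] > close[:-1]).astype(int)


def fit_direction_model(features, target, n_components=3, test_fraction=0.2, seed=0):
    """Fit scaler -> PCA -> decision tree on the earliest rows, score on the latest.

    The split is chronological (no shuffling) so the model is never
    evaluated on days that precede its training data.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target)
    split = int(len(features) * (1 - test_fraction))
    model = make_pipeline(
        StandardScaler(),                  # PCA is sensitive to feature scale
        PCA(n_components=n_components),    # hypothetical component count
        DecisionTreeClassifier(max_depth=3, random_state=seed),
    )
    model.fit(features[:split], target[:split])
    accuracy = model.score(features[split:], target[split:])
    return model, accuracy
```

The returned accuracy is the share of held-out days where the predicted direction matched the actual one. For a two-class up/down target, a value close to 0.5 is about what guessing would achieve.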

Conclusion

Joe was able to identify a few 'beat down' stocks from Twitter based on sentiment analysis. OLLI was considered the best option available (because it was the worst!), so Joe ran his pre-trade checklist BTFD system. His NLP analysis found the stock, then his technical analysis measured some EMA vs. price anomalies, and finally his decision tree model provided the green light to invest! As for the accuracy of his system, although his technical analysis gave relatively good returns, unfortunately his decision tree model needs fine-tuning. The classification model yielded only 53% accuracy, so it's not very reliable. Joe needs to take a better look at his feature selection and engineering, and probably use more data to better train his model. You live, you learn (and hopefully earn along the way!)
