
Sentiment Analysis Results

September 03, 2023 · 3 min read

Sentiment Analysis

Twitter Sentiment Analysis on Elon Musk Tweets for Trading TSLA Stocks

I started this project earlier this year, before Elon Musk's takeover of Twitter and the subsequent changes to the Twitter API. I used a third-party scraper called snscrape to extract the tweets. However, due to time constraints during my final semester at university, I couldn't go deeper into the analysis after cleaning the data and building a dictionary of sentiment scores with VADER (Valence Aware Dictionary and sEntiment Reasoner).

My initial plan was to build a model on the sentiments extracted from the tweets and correlate its output with the average daily adjusted close price of TSLA stock. To finish the project within the available time, I opted for a Hugging Face RoBERTa model trained on Twitter data. This allowed me to generate a database of compound sentiment scores, weighted for each day. I then normalized the values and plotted them as a time series.
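The scoring and normalization step can be illustrated with a minimal sketch. It is not the code from the notebook: it assumes the model returns three class probabilities (negative, neutral, positive), that the compound score is the positive probability minus the negative one, and that min-max scaling is the normalization. The function names `compound_score` and `min_max_normalize` are hypothetical.

```python
def compound_score(p_neg, p_neu, p_pos):
    """Collapse three class probabilities into one score in [-1, 1].

    Assumed formula: positive minus negative probability. The neutral
    probability is accepted for completeness but does not shift the score.
    """
    return p_pos - p_neg


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1].

    A constant series has no spread, so it is mapped to all zeros
    instead of dividing by zero.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Illustrative probabilities, not real model output.
scores = [
    compound_score(0.10, 0.20, 0.70),
    compound_score(0.60, 0.30, 0.10),
    compound_score(0.25, 0.50, 0.25),
]
normalized = min_max_normalize(scores)
```

Scaling both the sentiment series and the price series to the same range makes it easier to overlay them on one time-series plot, which is the main reason to normalize here.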

As it stands, the project is relatively straightforward and easy to run. The hypothesis test I conducted found no significant correlation between the sentiment of Elon Musk's tweets and the performance of TSLA stock from November 26, 2021, to November 25, 2022.
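For readers who want to try a similar test, here is a minimal sketch of a correlation test between two daily series. It uses a permutation test on the Pearson coefficient, which is a swapped-in technique chosen because it needs only the standard library. The post does not say which test the notebook used, and the function names `pearson_r` and `permutation_p_value` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p-value for the null hypothesis of no correlation.

    Shuffles ys repeatedly and counts how often the shuffled |r| is at
    least as large as the observed |r|. The +1 terms keep the estimate
    away from an impossible p-value of exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

A p-value above the chosen significance level (0.05 is a common choice) means the null hypothesis of no correlation cannot be rejected, which matches the kind of conclusion reported above.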

Shortcomings

One of the primary limitations of this project is that there is too little data to establish a meaningful correlation or run a comprehensive hypothesis test. After merging the tweet and price data, I had at most 170 data points. The number of tweets also varied from day to day, so the daily average sentiment had to be recalculated according to each day's tweet count.
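The averaging and merging described above can be sketched as follows. This is an illustration under assumptions, not the notebook's code: dates are ISO strings, the merge is an inner join on the date, and the names `daily_sentiment` and `merge_with_prices` as well as all sample values are made up.

```python
from collections import defaultdict


def daily_sentiment(scored_tweets):
    """Group (day, score) pairs by day.

    Returns {day: (mean_score, tweet_count)} so that the number of
    tweets behind each daily average stays visible.
    """
    buckets = defaultdict(list)
    for day, score in scored_tweets:
        buckets[day].append(score)
    return {day: (sum(s) / len(s), len(s)) for day, s in buckets.items()}


def merge_with_prices(sentiment_by_day, close_by_day):
    """Inner join on the date.

    Days with tweets but no trading (weekends, holidays) and trading
    days without tweets are both dropped, which shrinks the sample.
    """
    shared_days = sorted(set(sentiment_by_day) & set(close_by_day))
    return [
        (day, sentiment_by_day[day][0], close_by_day[day])
        for day in shared_days
    ]


# Made-up sample values for illustration only.
tweets = [
    ("2022-01-03", 0.5),
    ("2022-01-03", -0.1),
    ("2022-01-04", 0.2),
    ("2022-01-08", 0.9),  # a Saturday, so there is no closing price
]
closes = {"2022-01-03": 100.0, "2022-01-04": 110.0, "2022-01-05": 105.0}
merged = merge_with_prices(daily_sentiment(tweets), closes)
```

An inner join like this is one reason a full year of tweets can shrink to far fewer usable rows: only days that appear in both series survive.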

Future Directions

To expand this project further, I plan to develop an actual model regardless of the outcome of the hypothesis test. This will allow me to compare the model's results with the actual data. The goal is to explore whether a neural network can identify any patterns despite the limited number of data points for learning. I will provide an update on this endeavor in next week's post.

The results of this project will be published in the GitHub README, and I will also create a Flask application for checking and running the model on recent tweets.

The graphs will be added to this blog post in the near future. For now, they can be viewed in the Jupyter notebook at Python Files/Notebooks/Data_Cleaning.ipynb.

Personal Project · Blog · Data Science · Data Analysis