[PLUS] Examining Correlations using Principal Component Analysis (PCA)

What is Principal Component Analysis?
PCA is applied in many different ways across tutorials and examples. For our purposes, we will use Principal Component Analysis (PCA) to infer correlation relationships between cryptocurrencies from the plot we will make (called a biplot). For background, and at a high level, PCA is a mathematical procedure that re-expresses a set of variables as a smaller number of uncorrelated components, which helps when deciding which variables (features) are important to include in a machine learning model. When two or more input variables are highly correlated with one another, the model can suffer from a problem called multicollinearity, which is why knowing inter-variable correlations is important for modeling.
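To make the idea concrete, here is a minimal sketch of PCA computed directly from a correlation matrix with NumPy. The data is synthetic and chosen only for illustration (it is not the article's dataset): x2 is built to be nearly a copy of x1, which is the kind of redundancy that causes multicollinearity, while x3 is independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration only):
# x2 is almost a copy of x1, while x3 is independent noise.
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardize each column, then form the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = (Z.T @ Z) / n

# Principal components are the eigenvectors of the correlation matrix;
# each eigenvalue is the variance captured by its component.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]  # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()

print("correlation between x1 and x2:", round(float(corr[0, 1]), 3))
print("explained variance ratio:", np.round(explained, 3))
```

Because x1 and x2 carry nearly the same information, the last component should explain almost none of the variance, which is how PCA exposes redundant variables.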

Our Process
We will use 6 daily cryptocurrency time series from FTX (BTC, LTC, BNB, ETH, LINK, XRP) to demonstrate how to set up a PCA and interpret the results. Each of these data sets can be found on our FTX page. After we load the CSV files into Pandas DataFrames, we will combine the 6 individual frames into one master DataFrame whose columns contain only the day-over-day percentage changes. We will use the returns from the last 200 days, or roughly a 6-month period, for the analysis.
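The preparation step described above might look something like the following sketch. It is not the article's actual codebase: the helper name `build_returns`, the `close` column name, and the randomly generated stand-in prices are all assumptions so that the example runs without the CSV files. In practice each frame would come from `pd.read_csv` on the downloaded file, indexed by date.

```python
import numpy as np
import pandas as pd

SYMBOLS = ["BTC", "LTC", "BNB", "ETH", "LINK", "XRP"]

def build_returns(frames, price_col="close", window=200):
    """Combine per-coin frames into one frame of daily percentage changes.

    `frames` maps a symbol to a DataFrame indexed by date. The column name
    `close` is an assumption about the CSV layout.
    """
    closes = pd.concat(
        {sym: df[price_col] for sym, df in frames.items()}, axis=1
    ).sort_index()
    returns = closes.pct_change().dropna()  # day-over-day percentage change
    return returns.tail(window)             # keep only the last `window` days

# Stand-in data: random-walk prices, used here only so the sketch is runnable.
rng = np.random.default_rng(42)
dates = pd.date_range("2021-01-01", periods=300, freq="D")
frames = {
    sym: pd.DataFrame(
        {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.03, len(dates))))},
        index=dates,
    )
    for sym in SYMBOLS
}

returns = build_returns(frames)
print(returns.shape)  # (200, 6)
```

Concatenating on the date index keeps the coins aligned by day, so a missing day in one series does not silently shift the others.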

Discussion of Results
We are using PCA to determine correlations between our cryptocurrencies, and you can see the results in the graph below (called a biplot). Along the X axis is the first principal component (PC1), which accounts for 82% of the overall variation. This is good! Generally, when building a machine learning model, you would want and expect 90%+ of the variation to be explained by the first 2 principal components. Next: How do we interpret the biplot?
1) BTC & ETH have a positive correlation close to 1, as their directional impact lines are almost on top of each other (and the labels are hard to read).
2) XRP & BTC have little correlation, as their impact lines form a right (90 degree) angle.
3) LTC is more closely correlated with BNB than with any of the other cryptocurrencies in our dataset.
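The angle-based reading used in the three points above can be checked numerically. The sketch below uses synthetic variables A, B and C (assumptions for illustration, not the article's coins). It computes the biplot arrows (loadings on the first two components) and compares the cosine of the angle between two arrows with the actual correlation. The match is only approximate, and it is only as good as the share of variance the first two components explain.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic variables (illustrative only): B tracks A closely, C is unrelated.
n = 1000
a = rng.normal(size=n)
b = a + 0.2 * rng.normal(size=n)
c = rng.normal(size=n)
names = ["A", "B", "C"]
X = np.column_stack([a, b, c])

# PCA on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = (Z.T @ Z) / n
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: one row per variable, one column per retained component.
# Each row is the arrow drawn for that variable on a PC1/PC2 biplot.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])

def cos_angle(i, j):
    """Cosine of the angle between the biplot arrows of variables i and j."""
    u, v = loadings[i], loadings[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for i, j in [(0, 1), (0, 2)]:
    print(
        names[i], names[j],
        "cos(angle) =", round(cos_angle(i, j), 2),
        "correlation =", round(float(corr[i, j]), 2),
    )
```

Arrows that nearly overlap give a cosine near 1 (strong positive correlation), and arrows at a right angle give a cosine near 0 (little correlation), which is the same logic applied to BTC, ETH and XRP above.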

If we had hundreds of cryptocurrencies (or variables/features), we might want to remove the ones that are extremely correlated with one another to prevent multicollinearity and overfitting; PCA is one tool to identify correlations and assist in variable selection.
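As a rough illustration of that kind of pruning, the sketch below uses a plain pairwise-correlation filter rather than PCA itself. This is a simpler, commonly used stand-in and not the article's method. The function name `drop_highly_correlated`, the 0.9 threshold, and the synthetic columns are all assumptions.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Drop the later column of any pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Synthetic features (illustrative only): "b" nearly duplicates "a".
rng = np.random.default_rng(7)
a = rng.normal(size=400)
data = pd.DataFrame({
    "a": a,
    "b": a + 0.05 * rng.normal(size=400),
    "c": rng.normal(size=400),
})

reduced, dropped = drop_highly_correlated(data)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Which member of a correlated pair to keep is a judgment call. This sketch simply keeps whichever column appears first.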


Notice: Information contained herein is not and should not be construed as an offer, solicitation, or recommendation to buy or sell securities. The information has been obtained from sources we believe to be reliable; however, no guarantee is made or implied with respect to its accuracy, timeliness, or completeness. The author does not own any of the cryptocurrencies discussed. The information and content are subject to change without notice. CryptoDataDownload and its affiliates do not provide investment, tax, legal or accounting advice.

This material has been prepared for informational purposes only and is the opinion of the author. It is not intended to provide, and should not be relied on for, investment, tax, legal, or accounting advice. You should consult your own investment, tax, legal and accounting advisors before engaging in any transaction. Content published by CryptoDataDownload is not an endorsement of any kind. CryptoDataDownload was not compensated to submit this article. Please also visit our privacy policy, disclaimer, and terms and conditions pages for further information.
