[PLUS] Examining Correlations using Principal Component Analysis (PCA)
What is Principal Component Analysis?
PCA is applied in many different ways across tutorials and examples. For our purposes, we will use Principal Component Analysis (PCA) to infer correlation relationships between cryptocurrencies from a plot of the results (called a biplot). For background, and at a high level, PCA is a mathematical procedure that transforms a set of possibly correlated variables into a smaller set of uncorrelated components, ordered by how much of the data's variation each one explains. In machine learning it is often used to reduce the number of variables (features) and to help decide which ones are worth including in a model. When two or more input variables are highly correlated with one another, the model can suffer from a problem called multicollinearity, which is why knowing the inter-variable correlations is important for modeling.
We will use 6 daily cryptocurrency time series from FTX (BTC, LTC, BNB, ETH, LINK, XRP) to demonstrate how to set up PCA and interpret the results. Each of these data sets can be found on our FTX page. After we load the CSV files into Pandas DataFrames, we will transform the 6 individual frames into one master DataFrame whose columns contain only the day-over-day percentage changes. We will use the returns from the last 200 days, a period of roughly 6 months, for the analysis.
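The data preparation described above could be sketched roughly as follows. This is a minimal illustration, not the article's actual code: the helper name build_returns_frame, the "close" column name, and the synthetic random-walk prices are all assumptions made so the example runs without the real CSV files.

```python
import numpy as np
import pandas as pd

SYMBOLS = ["BTC", "LTC", "BNB", "ETH", "LINK", "XRP"]


def build_returns_frame(frames, price_col="close", window=200):
    """Combine per-coin price frames into one frame of daily percentage changes.

    frames: dict mapping symbol -> DataFrame indexed by date.
    price_col: name of the price column (hypothetical; adjust to the real CSV).
    """
    # One column of prices per symbol, aligned on the date index, oldest first
    closes = pd.DataFrame(
        {symbol: df[price_col].sort_index() for symbol, df in frames.items()}
    )
    # Day-over-day percentage change; the first row has no prior day, so drop it
    returns = closes.pct_change(fill_method=None).dropna()
    # Keep only the most recent `window` days
    return returns.tail(window)


# Synthetic stand-in for the six CSV files (assumption: random-walk prices).
# With real data, each frame would come from something like
# pd.read_csv(path, parse_dates=["date"], index_col="date"), where the path
# and column names are hypothetical.
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=250, freq="D")
frames = {}
for symbol in SYMBOLS:
    prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.03, len(dates))))
    frames[symbol] = pd.DataFrame({"close": prices}, index=dates)

returns = build_returns_frame(frames)
print(returns.shape)  # (200, 6)
```

Building the master frame from a dict of Series lets Pandas align the coins on their dates automatically, so a day missing from one file shows up as a gap rather than silently shifting that coin's rows.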
Discussion of Results
We are using PCA to determine correlations between our cryptocurrencies, and you can see the results in the graph below (called a biplot). The X axis is the first principal component (PC1), which accounts for 82% of the overall variation. This is good! Generally, when building a machine learning model, you would want and expect 90%+ of the variation to be explained by the first 2 principal components. Next: How do we interpret the biplot?
1) BTC & ETH have a positive correlation close to 1, as their directional impact lines are almost on top of each other (and the labels are hard to read).
2) XRP & BTC have little correlation, as their impact lines form a right (90 degree) angle.
3) LTC is more closely correlated with BNB than with any of the other cryptocurrencies in our dataset.
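To make the reading of the biplot more concrete, here is a small sketch of how the explained variation and the angles between impact lines can be computed. It uses plain NumPy (an eigendecomposition of the correlation matrix) rather than any particular PCA library, and it runs on synthetic returns, so the numbers it prints will not match the 82% figure or the plot discussed here. The helper biplot_angle and the way the synthetic coins are generated are assumptions for illustration only.

```python
import numpy as np

symbols = ["BTC", "LTC", "BNB", "ETH", "LINK", "XRP"]

# Synthetic daily returns (an assumption, NOT the article's data):
# five coins share a common "market" factor, XRP moves independently.
rng = np.random.default_rng(1)
n_days = 200
market = rng.normal(0, 0.03, n_days)
independent = rng.normal(0, 0.03, n_days)
returns = np.column_stack([
    market + rng.normal(0, 0.003, n_days),  # BTC
    market + rng.normal(0, 0.015, n_days),  # LTC
    market + rng.normal(0, 0.015, n_days),  # BNB
    market + rng.normal(0, 0.003, n_days),  # ETH
    market + rng.normal(0, 0.02, n_days),   # LINK
    independent,                            # XRP
])

# PCA on standardized data is an eigendecomposition of the correlation matrix
corr = np.corrcoef(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variation carried by each principal component
explained = eigvals / eigvals.sum()

# Arrow (impact line) coordinates of each coin on the PC1/PC2 plane
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])


def biplot_angle(a, b):
    """Angle in degrees between the impact lines of coins a and b."""
    va = loadings[symbols.index(a)]
    vb = loadings[symbols.index(b)]
    cos = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


print("explained by PC1, PC2:", np.round(explained[:2], 3))
for a, b in [("BTC", "ETH"), ("BTC", "XRP")]:
    i, j = symbols.index(a), symbols.index(b)
    print(a, b, "angle:", round(biplot_angle(a, b), 1),
          "correlation:", round(corr[i, j], 3))
```

The cosine of the angle between two impact lines only approximates the correlation between the two coins, and the approximation gets better the more of the total variation the first two components explain. A small angle suggests a correlation near 1, and an angle near 90 degrees suggests a correlation near 0.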
If we had hundreds of cryptocurrencies (or variables/features), we may want to remove the ones that are highly correlated with one another to prevent multicollinearity and overfitting in the model; PCA is one tool to identify correlations and assist in variable selection.
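As a rough illustration of that kind of variable selection, the sketch below drops one coin from every highly correlated pair. Note that this is a plain correlation filter, not PCA itself. The function name drop_highly_correlated, the 0.95 threshold, and the synthetic three-coin data are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(returns, threshold=0.95):
    """Drop the later column of every pair whose |correlation| exceeds threshold.

    threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    corr = returns.corr().abs()
    # Keep only the upper triangle so each pair is examined once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return returns.drop(columns=to_drop), to_drop


# Synthetic returns (assumption): "ETH" is built to track "BTC" almost
# exactly, while "XRP" is generated independently.
rng = np.random.default_rng(2)
btc = rng.normal(0, 0.03, 200)
returns = pd.DataFrame({
    "BTC": btc,
    "ETH": btc + rng.normal(0, 0.001, 200),
    "XRP": rng.normal(0, 0.03, 200),
})

reduced, dropped = drop_highly_correlated(returns)
print(dropped)                 # ['ETH']
print(list(reduced.columns))   # ['BTC', 'XRP']
```

Which member of a correlated pair gets dropped here depends only on column order, so in practice you may prefer to decide that by hand based on which series matters more to your model.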