
[PLUS] Applying Logistic Regression to Make Predictions About the Next Day in Python


Logistic Regression Prediction
Logistic regression is a statistical method for modeling the probability of a binary outcome, which makes it a popular choice for binary classification problems. It is particularly useful when the response variable is categorical (e.g., yes/no, pass/fail, win/lose) and you want to predict the likelihood of one of these categories from one or more predictor variables. Because it is a relatively simple linear model, logistic regression tends to tolerate small amounts of noise in the dataset and is generally less prone to over-fitting than more flexible models. What is our binary outcome in this example? We will predict whether the next day's return is positive (above 0%) or not. Our data set comes from BTC on-chain block meta statistics (available for our members as part of Plus+ Files).
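
To make the idea concrete, here is a minimal sketch of how a logistic regression turns a weighted sum of features into a probability and then into a yes/no call. The weights, bias, and feature values are made-up numbers for illustration only; they are not taken from the article's model or data.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    # Linear score: bias plus the weighted sum of the features.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and bias; a fitted model would learn these from data.
weights = [0.8, -0.5]
bias = 0.1

# Hypothetical (already scaled) feature values for a single day.
p_up = predict_proba([1.2, 0.4], weights, bias)

# Call the next day "positive" when the probability is at least 50%.
label = int(p_up >= 0.5)
print(f"P(next day positive) = {p_up:.3f}, predicted class = {label}")
```

The 0.5 cutoff is only the conventional default; a different threshold could be chosen depending on how costly false positives are relative to false negatives.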

**Model feature selection is crucial to model performance. What are our features? Just the on-chain block data. Model performance metrics could be improved by identifying additional useful features (or removing uninformative ones).**


The Code
The following is the progression of the working Python code that creates a logistic regression model from the data and ultimately makes predictions. First, we load all of the relevant data sets and format them into the correct features (removing unnecessary values like "block_number"). Next, we set up our target and feature variables using our Pandas dataframe. Once the target variable has been separated from the features, we set aside a portion of the data for testing and train the model on the rest. In this case, we set the split to 20%, so 80% of the data is used for training and 20% for evaluating the model. Next, we define a scaler for our numerical values (this puts the features on a comparable range, which is easier for the ML model to digest). We can use either a MinMaxScaler, which rescales each feature to lie between 0 and 1, or a StandardScaler, which standardizes each feature to zero mean and unit variance.
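
The steps above can be sketched roughly as follows. This is not the members-only codebase: the data is synthetic, and every column name other than "block_number" (`tx_count`, `avg_fee`, `close`, `next_ret`, `target`) is a hypothetical stand-in. Using `shuffle=False` to keep the rows in chronological order is also an assumption on our part, not something the article specifies.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(42)
n_days = 500

# Synthetic stand-in for the on-chain data; column names are hypothetical.
df = pd.DataFrame({
    "block_number": np.arange(n_days),
    "tx_count": rng.integers(200_000, 400_000, n_days),
    "avg_fee": rng.gamma(2.0, 1.5, n_days),
    "close": 30_000 * np.exp(np.cumsum(rng.normal(0, 0.02, n_days))),
})

# Target: 1 if the NEXT day's return is above 0%, else 0.
df["next_ret"] = df["close"].pct_change().shift(-1)
df = df.dropna(subset=["next_ret"]).copy()  # the last day has no "next day"
df["target"] = (df["next_ret"] > 0).astype(int)

# Features: drop the identifier, the price, and anything derived from the target.
X = df.drop(columns=["block_number", "close", "next_ret", "target"])
y = df["target"]

# Hold out the final 20% for testing (assumption: no shuffling, to avoid
# training on days that come after the test period).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Fit each scaler on the training data only, then apply it to both splits.
minmax = MinMaxScaler().fit(X_train)      # rescales each feature to [0, 1]
standard = StandardScaler().fit(X_train)  # zero mean, unit variance per feature

X_train_mm = minmax.transform(X_train)
X_train_std = standard.transform(X_train)
X_test_std = standard.transform(X_test)
```

Fitting the scaler on the training portion alone keeps information from the test set out of the model, which is also what a pipeline does automatically.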

The next big step is to set up a pipeline. Essentially, we define a few steps that are run every time, such as scaling the data and then fitting the model. We then pass this pipeline into a Randomized Search to find the hyperparameters that result in the best model performance. The resulting model is returned to us as the "best model". We then use this model to make predictions on the unseen portion of the data, known as our test set. Finally, probabilities are calculated and model performance is assessed. As always, every line of code is commented for your benefit. Feel free to modify and implement as you wish.
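
A minimal sketch of that workflow with scikit-learn is shown below. It uses a synthetic classification dataset rather than the on-chain data, and the search space (the choice of scaler and the regularization strength `C`), the number of iterations, and the scoring metric are all illustrative assumptions rather than the article's actual settings.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data (assumption): 500 rows, 6 numeric features.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The pipeline runs the same steps every time: scale, then fit the model.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space: which scaler to use and how strongly to regularize.
param_distributions = {
    "scaler": [StandardScaler(), MinMaxScaler()],
    "model__C": loguniform(1e-3, 1e2),
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=20,          # number of random hyperparameter combinations to try
    cv=5,               # 5-fold cross-validation on the training data
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)

# The pipeline refit on all training data with the best hyperparameters found.
best_model = search.best_estimator_

# Predictions and probabilities on the unseen test set.
pred = best_model.predict(X_test)
proba_up = best_model.predict_proba(X_test)[:, 1]  # P(class == 1)

accuracy = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, proba_up)
print("Best params:", search.best_params_)
print(f"Test accuracy: {accuracy:.3f}, ROC AUC: {auc:.3f}")
```

For chronologically ordered data such as daily returns, a time-aware splitter like scikit-learn's `TimeSeriesSplit` could be passed as `cv` instead of plain 5-fold cross-validation; which one fits best depends on how the real data is arranged.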

**Model does not guarantee results**

This is a premium post. Create Plus+ Account to view the live, working codebase for this article.




Notice: Information contained herein is not and should not be construed as an offer, solicitation, or recommendation to buy or sell securities. The information has been obtained from sources we believe to be reliable; however, no guarantee is made or implied with respect to its accuracy, timeliness, or completeness. Author does not own any cryptocurrency discussed. The information and content are subject to change without notice. CryptoDataDownload and its affiliates do not provide investment, tax, legal or accounting advice.

This material has been prepared for informational purposes only and is the opinion of the author, and is not intended to provide, and should not be relied on for, investment, tax, legal, or accounting advice. You should consult your own investment, tax, legal and accounting advisors before engaging in any transaction. All content published by CryptoDataDownload is not an endorsement whatsoever. CryptoDataDownload was not compensated to submit this article. Please also visit our Privacy policy; disclaimer; and terms and conditions page for further information.

THE PERFORMANCE OF TRADING SYSTEMS IS BASED ON THE USE OF COMPUTERIZED SYSTEM LOGIC. IT IS HYPOTHETICAL. PLEASE NOTE THE FOLLOWING DISCLAIMER. CFTC RULE 4.41: HYPOTHETICAL OR SIMULATED PERFORMANCE RESULTS HAVE CERTAIN LIMITATIONS. UNLIKE AN ACTUAL PERFORMANCE RECORD, SIMULATED RESULTS DO NOT REPRESENT ACTUAL TRADING. ALSO, SINCE THE TRADES HAVE NOT BEEN EXECUTED, THE RESULTS MAY HAVE UNDER-OR-OVER COMPENSATED FOR THE IMPACT, IF ANY, OF CERTAIN MARKET FACTORS, SUCH AS LACK OF LIQUIDITY. SIMULATED TRADING PROGRAMS IN GENERAL ARE ALSO SUBJECT TO THE FACT THAT THEY ARE DESIGNED WITH THE BENEFIT OF HINDSIGHT. NO REPRESENTATION IS BEING MADE THAT ANY ACCOUNT WILL OR IS LIKELY TO ACHIEVE PROFIT OR LOSSES SIMILAR TO THOSE SHOWN. U.S. GOVERNMENT REQUIRED DISCLAIMER: COMMODITY FUTURES TRADING COMMISSION. FUTURES AND OPTIONS TRADING HAS LARGE POTENTIAL REWARDS, BUT ALSO LARGE POTENTIAL RISK. YOU MUST BE AWARE OF THE RISKS AND BE WILLING TO ACCEPT THEM IN ORDER TO INVEST IN THE FUTURES AND OPTIONS MARKETS. DON’T TRADE WITH MONEY YOU CAN’T AFFORD TO LOSE. THIS IS NEITHER A SOLICITATION NOR AN OFFER TO BUY/SELL FUTURES OR OPTIONS. NO REPRESENTATION IS BEING MADE THAT ANY ACCOUNT WILL OR IS LIKELY TO ACHIEVE PROFITS OR LOSSES SIMILAR TO THOSE DISCUSSED ON THIS WEBSITE. THE PAST PERFORMANCE OF ANY TRADING SYSTEM OR METHODOLOGY IS NOT NECESSARILY INDICATIVE OF FUTURE RESULTS.
