Ideas For Extending Length of Timeseries History
Insufficient Length of Timeseries Data
When developing models (machine learning, statistical, linear, etc.), most practitioners prefer to use as much data history as possible, so that the model can be calibrated across a variety of trading environments and market regimes and is more robust as a result. With historical cryptocurrency data, the length of the history available per exchange can present a challenge. For starters, not all exchanges were founded at the same time. Binance, for example, was not on the scene until 2017, so it is impossible to get BTC/USDT history from Binance before it existed! Other exchanges, such as Gemini, have been around since 2014, and so a user may wonder how to extend their Binance data using the Gemini dataset. Since this is a common problem, we will outline how to extend the Binance timeseries using Gemini's data, and how to scale the Gemini volumes so they are in sync with Binance volumes.
Blending the Data
At first glance, the solution seems very easy: start with the Binance data, and then just add the older Gemini data in front of it! Very easy indeed. But most models will want to use the Volume field as a feature / variable, and the volumes on Binance and Gemini are drastically different. These differences reflect the different sizes of the user bases trading on each exchange. What might be a way to resolve this? Or said another way, "How can we convert Gemini volumes into Binance volumes?"
1) Create a new column for the ratio between Binance volumes and Gemini volumes
2) For each time interval where both exchanges have data, divide the Binance volume by the Gemini volume to arrive at a ratio. These ratios change daily (as volumes and participation change daily), so you will want to decide on a window of a month or longer over which to measure them
3) Take a simple average of all the calculated ratios over the window period to come up with one ratio
4) Multiply each Gemini volume in the proxied (pre-Binance) history by that ratio to convert it into a Binance-equivalent volume
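The four steps above can be sketched in pandas. This is only a minimal illustration under our own assumptions: both data sets are already loaded as DataFrames with a sorted DatetimeIndex and a column named "volume", and the function name blend_volumes and its window argument are hypothetical, not part of any library. The window here is taken from the start of the overlap between the two exchanges.

```python
import numpy as np
import pandas as pd


def blend_volumes(binance: pd.DataFrame, gemini: pd.DataFrame,
                  window: str = "30D") -> pd.DataFrame:
    """Extend Binance history backwards with ratio-scaled Gemini rows.

    Assumes both frames have a sorted DatetimeIndex and a "volume"
    column (illustrative names, not a fixed schema).
    """
    # Steps 1-2: per-interval ratio wherever both exchanges have data.
    overlap = binance.index.intersection(gemini.index)
    ratio = binance.loc[overlap, "volume"] / gemini.loc[overlap, "volume"]
    # Drop intervals where a zero Gemini volume produced inf or NaN.
    ratio = ratio[np.isfinite(ratio)]

    # Step 3: simple average of the ratios in the first `window`
    # of the overlapping period.
    start = ratio.index.min()
    avg_ratio = ratio[ratio.index < start + pd.Timedelta(window)].mean()

    # Step 4: scale the Gemini rows that predate Binance, then join.
    older = gemini[gemini.index < binance.index.min()].copy()
    older["volume"] = older["volume"] * avg_ratio
    return pd.concat([older, binance]).sort_index()
```

Calling blend_volumes(binance_df, gemini_df, window="60D") would average the ratio over the first 60 days of overlap instead. Price columns are passed through unchanged here; only the volume column is rescaled.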
Congratulations, you've now extended your Bitcoin timeseries data history by 3+ years (and by many, many time intervals if you are using minute data!). Please note that there is one consideration that is not accounted for here when blending the data: the volume ratio between exchanges can shift over time as more (or fewer) users trade on a particular exchange. For this reason, you may want to set your ratio window close to the point in time where the two distinct data sets are joined together.
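To see how much the ratio drifts, one option is to average it month by month over the overlapping period. The sketch below makes the same assumptions as before (DataFrames with a DatetimeIndex and a "volume" column); the function name monthly_volume_ratio is hypothetical.

```python
import numpy as np
import pandas as pd


def monthly_volume_ratio(binance: pd.DataFrame,
                         gemini: pd.DataFrame) -> pd.Series:
    """Average Binance/Gemini volume ratio per calendar month.

    Assumes both frames have a DatetimeIndex and a "volume" column
    (illustrative names only).
    """
    # Only intervals present on both exchanges can be compared.
    overlap = binance.index.intersection(gemini.index)
    ratio = binance.loc[overlap, "volume"] / gemini.loc[overlap, "volume"]
    # Ignore intervals where Gemini volume was zero (inf or NaN).
    ratio = ratio[np.isfinite(ratio)]
    # "MS" groups by month, labelled with the first day of the month.
    return ratio.resample("MS").mean()
```

If the monthly averages trend strongly up or down, a ratio measured far from the join point is likely a poor proxy, which is the reason for keeping the window near where the two data sets meet.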
Notice: Information contained herein is not and should not be construed as an offer, solicitation, or recommendation to buy or sell securities. The information has been obtained from sources we
believe to be reliable; however, no guarantee is made or implied with respect to its accuracy, timeliness, or completeness. Author does not own any cryptocurrency discussed. The information
and content are subject to change without notice. CryptoDataDownload and its affiliates do not provide investment, tax, legal or accounting advice.
This material has been prepared for informational purposes only and is the opinion of the author, and is not intended to provide, and should not be relied on for, investment, tax, legal,
accounting advice. You should consult your own investment, tax, legal and accounting advisors before engaging in any transaction. All content published by CryptoDataDownload is not an