
Fixing Data Issues With Volumes in Timeseries


Dealing With Erroneous Volume Metrics
We store data for cryptocurrencies and organize it by individual exchange. This data is usually extracted directly from the exchange's API, so it should be highly reliable. However, when dealing with timeseries data, you will almost certainly encounter bad metrics from time to time. Sometimes there are perfectly plausible explanations for why a data point for a given interval is corrupted. Cryptocurrencies trade 24/7, but an exchange may still be unavailable for trading due to system maintenance (scheduled or unscheduled). In the timeseries data on CryptoDataDownload, these "errors" often appear as data points with a volume of zero. While often legitimate, the missing volume becomes a potential problem when trying to fit a model to the data.
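Before choosing a fix, it helps to locate the affected intervals. The following is a minimal sketch, assuming the volume column has already been loaded into a plain Python list in chronological order. The function name `mark_zero_volume_missing` and the sample values are hypothetical and are not taken from any CryptoDataDownload file.

```python
from typing import List, Optional


def mark_zero_volume_missing(volumes: List[float]) -> List[Optional[float]]:
    """Return a copy of the series where zero volumes are replaced by None."""
    return [None if v == 0 else v for v in volumes]


# Hypothetical hourly volumes; the zeros stand in for an exchange outage.
hourly_volume = [12.5, 0.0, 9.1, 0.0, 0.0, 14.2]

flagged = mark_zero_volume_missing(hourly_volume)

# Positions of the intervals that need attention.
missing_idx = [i for i, v in enumerate(flagged) if v is None]
print(missing_idx)
```

Marking zeros as `None` rather than leaving them as `0.0` keeps them from being silently included in averages or quantiles later on. Note that this treats every zero as missing, which may be wrong for a genuinely illiquid pair.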

Exploring Imputation Solutions
There are several techniques that can be used either to backfill (impute) the missing volume metrics or to smooth them out by estimating them from the overall volume distribution. Each technique has its own pros and cons, and some require more involved calculation than others. The list below is a non-exhaustive list of techniques that may be used to impute missing volume metrics in cryptocurrency timeseries data.

    1) Take a simple average between periods. If only one volume data point is missing between two others, you can take a simple average of the two valid neighboring points to fill the gap. This method is by far the simplest.
    2) Carry forward the previous time interval's volume data point. This is also a simple, straightforward solution that carries forward the previous valid volume metric to fill in the missing point.
    3) Discard the intervals with missing volume. Certain models may not need continuity between time intervals in order to be fit. If this is the case, it may make sense to discard the missing values rather than train the model on a "synthetic" volume metric.
    4) Use quantiles from the distribution to fill the missing data point(s). This technique is by far the most involved of the four. The goal is to find the 50th percentile (the median) of the volume distribution, which gives a "central" value that is less sensitive to extreme volume spikes than the mean. This requires fitting a parametric distribution to the entire set of valid volumes and finding the value that corresponds to the middle of that distribution (50%). You would then use this value to impute the missing volumes.
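The four techniques above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a definitive implementation: missing volumes are assumed to be marked as `None`, all function names are hypothetical, and technique 4 is simplified to the empirical median of the valid volumes instead of a quantile taken from a fitted parametric distribution.

```python
from statistics import median
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing volume


def fill_neighbor_average(vols: Series) -> Series:
    """Technique 1: fill an isolated gap with the mean of its two valid neighbors."""
    out = list(vols)
    for i in range(1, len(vols) - 1):
        prev, nxt = vols[i - 1], vols[i + 1]
        if vols[i] is None and prev is not None and nxt is not None:
            out[i] = (prev + nxt) / 2
    return out  # gaps longer than one interval are left as None


def fill_forward(vols: Series) -> Series:
    """Technique 2: carry the last valid volume forward."""
    out: Series = []
    last = None
    for v in vols:
        if v is not None:
            last = v
        out.append(last)  # stays None until a first valid value is seen
    return out


def drop_missing(vols: Series) -> List[float]:
    """Technique 3: discard the missing intervals entirely."""
    return [v for v in vols if v is not None]


def fill_median(vols: Series) -> Series:
    """Technique 4 (simplified): impute with the empirical 50th percentile.

    Assumption: uses the sample median, not a fitted parametric distribution.
    Raises statistics.StatisticsError if there are no valid volumes at all.
    """
    mid = median(drop_missing(vols))
    return [mid if v is None else v for v in vols]


# Hypothetical series: one isolated gap and one two-interval gap.
vols = [10.0, None, 14.0, 8.0, None, None, 20.0]

averaged = fill_neighbor_average(vols)
carried = fill_forward(vols)
dropped = drop_missing(vols)
imputed = fill_median(vols)
```

For users working in pandas, roughly the same ideas are available as `Series.interpolate()`, `Series.ffill()`, `Series.dropna()`, and `Series.fillna(series.median())`, once zero volumes have been replaced with `NaN`. Note that `interpolate()` also fills multi-interval gaps by default, unlike the neighbor average sketched here.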




    Notice: Information contained herein is not and should not be construed as an offer, solicitation, or recommendation to buy or sell securities. The information has been obtained from sources we believe to be reliable; however, no guarantee is made or implied with respect to its accuracy, timeliness, or completeness. The author does not own any of the cryptocurrencies discussed. The information and content are subject to change without notice. CryptoDataDownload and its affiliates do not provide investment, tax, legal or accounting advice.

    This material has been prepared for informational purposes only and is the opinion of the author, and is not intended to provide, and should not be relied on for, investment, tax, legal, accounting advice. You should consult your own investment, tax, legal and accounting advisors before engaging in any transaction. All content published by CryptoDataDownload is not an endorsement whatsoever. CryptoDataDownload was not compensated to submit this article. Please also visit our Privacy policy; disclaimer; and terms and conditions page for further information.

    THE PERFORMANCE OF TRADING SYSTEMS IS BASED ON THE USE OF COMPUTERIZED SYSTEM LOGIC. IT IS HYPOTHETICAL. PLEASE NOTE THE FOLLOWING DISCLAIMER. CFTC RULE 4.41: HYPOTHETICAL OR SIMULATED PERFORMANCE RESULTS HAVE CERTAIN LIMITATIONS. UNLIKE AN ACTUAL PERFORMANCE RECORD, SIMULATED RESULTS DO NOT REPRESENT ACTUAL TRADING. ALSO, SINCE THE TRADES HAVE NOT BEEN EXECUTED, THE RESULTS MAY HAVE UNDER-OR-OVER COMPENSATED FOR THE IMPACT, IF ANY, OF CERTAIN MARKET FACTORS, SUCH AS LACK OF LIQUIDITY. SIMULATED TRADING PROGRAMS IN GENERAL ARE ALSO SUBJECT TO THE FACT THAT THEY ARE DESIGNED WITH THE BENEFIT OF HINDSIGHT. NO REPRESENTATION IS BEING MADE THAT ANY ACCOUNT WILL OR IS LIKELY TO ACHIEVE PROFIT OR LOSSES SIMILAR TO THOSE SHOWN. U.S. GOVERNMENT REQUIRED DISCLAIMER: COMMODITY FUTURES TRADING COMMISSION. FUTURES AND OPTIONS TRADING HAS LARGE POTENTIAL REWARDS, BUT ALSO LARGE POTENTIAL RISK. YOU MUST BE AWARE OF THE RISKS AND BE WILLING TO ACCEPT THEM IN ORDER TO INVEST IN THE FUTURES AND OPTIONS MARKETS. DON’T TRADE WITH MONEY YOU CAN’T AFFORD TO LOSE. THIS IS NEITHER A SOLICITATION NOR AN OFFER TO BUY/SELL FUTURES OR OPTIONS. NO REPRESENTATION IS BEING MADE THAT ANY ACCOUNT WILL OR IS LIKELY TO ACHIEVE PROFITS OR LOSSES SIMILAR TO THOSE DISCUSSED ON THIS WEBSITE. THE PAST PERFORMANCE OF ANY TRADING SYSTEM OR METHODOLOGY IS NOT NECESSARILY INDICATIVE OF FUTURE RESULTS.
