Backtesting Cryptocurrency Trading With BigQuery

We have made available a free open source BitBank dataset hosted on Google BigQuery

Ill go over how to query the dataset to gain insights into the cryptocurrency markets.

The data is described more in our API docs and Bulk Data docs

Volatility!

Cryptocurrencies ordered by having the most volatile 120 minutes
select currency_pair, MAX(float(future_wstd_120) / ((best_bid_price + best_Ask_price) / 2.0) * 100.0) as wstd_percent from [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets]  
where has_future_data='1'  
group by currency_pair  
order by wstd_percent desc  
most volatile markets
SELECT STDDEV( (best_bid_price + best_Ask_price) / 2.0) as std_of_midpoint_price, currency_pair,  
avg( (best_bid_price + best_Ask_price) / 2.0) as avg_of_midpoint_price,  
(STDDEV( (best_bid_price + best_Ask_price) / 2.0) / avg( (best_bid_price + best_Ask_price) / 2.0)) as volatility
 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] 
group by currency_pair  
order by volatility desc  
LIMIT 1000  

It appears BTC_OMNI is a volatile pair again over this time period with stddev: 1% over this day

Correlation!

Checking how correlated price move is with future price movements
SELECT CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5,  
CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30,  
CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60,  
CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120  
 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] 
where has_future_data='1'  
LIMIT 1000  

results:

price_correlation5 price_correlation30 price_correlation60 price_correlation120
0.31553591396391 0.03814593626787945 0.04183696129029081 0.03856468233799804

This means the change in 5 minute price is related to the next 5 minutes in price but not so much the next 5-30, 30-60 or 60-120 minutes.

Price change in 30 minutes correlation to future price:
SELECT CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5,  
CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30,  
CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60,  
CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120  
 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] 
where has_future_data='1'  
LIMIT 1000  

results:

price_correlation5 price_correlation30 price_correlation60 price_correlation120
0.7973442128 0.03708915493 0.0720097425 0.07091767196

This means price changes in the last 30 minutes are strongly correlated with the next 5 minutes and still not much but more so with the 30-60 and 60-120 minute prices.

How correlated are BitBank.nz forecasts to what actually happened
SELECT CORR( estimated_future_wavg_5 , (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5,  
CORR( estimated_future_wavg_30 , (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30,  
CORR( estimated_future_wavg_60 , (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60,  
CORR( estimated_future_wavg_120 , (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120  
 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] 
where has_future_data='1'  
LIMIT 1000  
price_correlation5 price_correlation30 price_correlation60 price_correlation120
0.3771947898 0.1059980775 0.08641589135 0.0694471175

Although correlated this highlights there's much more work to be done optimising training and denoising our forecasts to more reliably correlate with the actual market.

BEGINRANT;
Given the amount of training and work that's gone into our algorithm it also shows how its hard to reliably predict what the market will do, especially further into the future. A common trap is to trust companies that occasionally forecast 7+ days into the future, unfortunately the compounding uncertainty can mean they are extremely inaccurate and with a lack of data on how often their forecasts are correct they lack the tools and transparency to build trust and self improve.
I do believe it is possible for high frequency trading bots with up to date forecasts to get things right on average but they don't need very long term forecasts.
ENDRANT;

Orderbook imbalance
SELECT CORR( power_imbalance , (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5,  
CORR( power_imbalance , (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30,  
CORR( power_imbalance , (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60,  
CORR( power_imbalance , (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120  
 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] 
where has_future_data='1'  
LIMIT 1000  
price_correlation5 price_correlation30 price_correlation60 price_correlation120
0.1668656568 0.1486553967 0.1342865412 0.1274909794

Orderbook buy pressure is correlated with upward price movement, a correlation that holds well with time

Backtesting trading strategy

To summarise our forecasts, price movement and power_imbalance are all correlated with upward price movement, so can we simply buy when they are all up and sell when they are all down?

SELECT (best_bid_price + best_Ask_price) / 2.0 as midpoint, date,  
best_bid_price, best_Ask_price,

(wavg_distance_to_midpoint_percent60min < 0 
and wavg_distance_to_midpoint_percent5min < 0  
and wavg_distance_to_midpoint_percent30min < 0  
and power_imbalance < 1  
and estimated_future_wavg_5 < 1  
and estimated_future_wavg_30 < 1  
and estimated_future_wavg_60 < 1  
and estimated_future_wavg_120 < 1  
) as should_sell

 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] 
where has_future_data='1'  
and currency_pair='BTC_ETH' 

and  
((wavg_distance_to_midpoint_percent60min > 0 
and wavg_distance_to_midpoint_percent5min > 0  
and wavg_distance_to_midpoint_percent30min > 0  
and power_imbalance > 1  
and estimated_future_wavg_5 > 1  
and estimated_future_wavg_30 > 1  
and estimated_future_wavg_60 > 1  
and estimated_future_wavg_120 > 1  
) or 
(wavg_distance_to_midpoint_percent60min < 0 
and wavg_distance_to_midpoint_percent5min < 0  
and wavg_distance_to_midpoint_percent30min < 0  
and power_imbalance < 1  
and estimated_future_wavg_5 < 1  
and estimated_future_wavg_30 < 1  
and estimated_future_wavg_60 < 1  
and estimated_future_wavg_120 < 1  
))
order by date  
LIMIT 1000  

Putting the cryptourrency trade data in a spreadsheet to analyse the selected points to buy/sell at shows the strategy does quite well.

Trading in the BTC_ETH pair the algorithm performs quite well, assuming it trades at the midpoint it turns 1 btc into 1.17 (assuming no fees) despite the overall price in ethereum going down overall in the time period, the fees would be around .17 unfortunately if there was a .15% maker fee on the 106 trades made

As this result/algorithm was developed on the assumptions of the correlations computed on this data, we will need to do another test on another dataset to verify its accuracy (we may have over-fit to this dataset when we developed this algorithm)

Putting the algorithm into production shows that its very promising :) Its been running on BitMex for nearly 1 day now having earned ~6% profit.

take my money

The actual algorithm running in production is available on the BitBank.nz Github page Youll need to signup for a free 1 day trial at BitBank.nz.

Let me know if you'd like access to run queries on larger datasets on BigQuery

Also checkout our referral program to earn .003 BTC per paying user!

For more on BigQuery also see a previous post i writ on my personal blog how.nz about Analysing App Engine logs with Google BigQuery and the bitcoin dataset on BigQuery

bonus points (.003 btc and a free BitBank.nz year) for someone who can express the spreadsheet part of the equation as a BigQuery Aggregate Stateful User Defined Function, because SQL is hard, also much more to anyone who develops something like a backtesting dashboard using the BigQuery API because that would be amazing.

Eventually we will be able to integrate tools like this into the BitBank.nz core product which will make this strategy experimentation process super smooth and easy, keep an eye out!