Parameter Optimization with Zipline, PiCloud, StarCluster, and IPython Parallel | Quantopian Blog


URL: http://blog.quantopian.com/zipline_in_the_cloud/


Financial trading algorithms often have parameters you have to set manually. For example, the classic dual moving average cross-over has two free parameters: the number of days for the short moving average and the number of days for the long moving average. Once the short mavg crosses the long mavg, you trigger a buy order of the stock.
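For concreteness, here is a minimal sketch of that logic in pandas (just an illustration, not the Quantopian example discussed below); `prices` is assumed to be a pandas Series of daily closing prices:

```python
import pandas as pd

def dual_moving_average_signal(prices, short_window=20, long_window=50):
    """Return 1 while the short moving average is above the long one
    (hold the stock), and 0 otherwise (stay in cash)."""
    short_mavg = prices.rolling(short_window).mean()
    long_mavg = prices.rolling(long_window).mean()
    # A buy is triggered on the day the signal flips from 0 to 1.
    return (short_mavg > long_mavg).astype(int)
```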

Using Quantopian, you can explore how the two parameters interact with each other. Open the Quantopian example in another browser, and click the Clone button (no account required). The short window is defined on line 7 of the code, and the long window on line 8. Change them to 5 and 10 and click Build to see the result. Change the windows again to 1 and 15 and click Build again. As you can see, the results of your choices don’t follow any obvious intuition.

To get the most bang for your buck you want to set these parameters to values that optimize some objective function. For example, you might want to find the parameter values that maximize cumulative wealth, or minimize the risk you are taking on. Fortunately for us, this is a very well studied domain known as optimization. Unfortunately for us, most established optimization algorithms are not applicable to our problem.
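Whatever optimizer we end up using, it only needs a single function that maps a parameter setting to a score. A hedged sketch of such a wrapper is below; `run_backtest` is a hypothetical stand-in for a full zipline run that is assumed to return, say, the strategy's cumulative return:

```python
def objective(params):
    """Map a parameter setting to a single score to maximize.

    `run_backtest` is a hypothetical helper standing in for a full
    zipline run; it is assumed to return the strategy's cumulative
    return for the given moving average windows.
    """
    short_window, long_window = params
    if short_window >= long_window:
        return float("-inf")  # nonsensical setting, never the optimum
    return run_backtest(short_window, long_window)
```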

Why do classical optimization algorithms like gradient descent or convex optimization not cut it? The problem is that these optimization algorithms often have strict requirements on the objective function. As the name implies, gradient descent requires you to be able to compute a gradient. Convex optimization requires the objective function to be, well, convex. However, the relationship between how parameters of trading algorithms influence the objective function can be much more complex. If the parameter is discrete like the number of days in a moving average we certainly won’t be able to compute a gradient. In addition to these problems, running a trading algorithm on a large data set can take quite a long time. Thus, any sequential optimization routine is out of the game. What instead is required is a global method that can be run in parallel.

Running Zipline in parallel in the cloud

Zipline is our open-source backtester. While our ultimate goal is to make optimization available on Quantopian, some more research is needed to make that happen. However, for the adventurous among you, here are some instructions on how to explore parameter ranges of an algorithm in parallel in the cloud:

1. PiCloud

PiCloud is a cloud computing service that makes it very easy to set up a cluster in the cloud and distribute work to it. For your convenience, we created an official zipline environment (called “quantopian/zipline_official”) that you can use to run your algorithm. An example IPython Notebook that demonstrates how to do this can be found here.
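To give a rough idea of the workflow, here is a sketch from memory of PiCloud's Python client; the `cloud.map`/`cloud.result` calls and the `_env` keyword should be checked against PiCloud's documentation and the linked notebook, and `run_backtest` is the hypothetical helper from above:

```python
import cloud  # PiCloud's client library

def backtest(params):
    short_window, long_window = params
    # ... run the zipline algorithm here and return a score,
    # e.g. via the hypothetical run_backtest helper sketched earlier
    return run_backtest(short_window, long_window)

param_grid = [(s, l) for s in range(5, 30, 5) for l in range(30, 120, 15)]

# Fan the grid out to PiCloud, requesting the official zipline environment.
job_ids = cloud.map(backtest, param_grid, _env='quantopian/zipline_official')
scores = cloud.result(job_ids)  # blocks until all jobs are done
best_score, best_params = max(zip(scores, param_grid))
```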

2. StarCluster + IPython Parallel

StarCluster is a Python package that greatly simplifies the process of setting up a cluster on Amazon EC2. A plugin allows you to run an IPython Parallel cluster on top of your EC2 nodes. You can then distribute your jobs to this cluster from your local IPython session. I recently gave a talk about this at PyData SC’13. Here are the accompanying slides, IPython Notebook, and video.
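A rough sketch of the client side, against the 2013-era `IPython.parallel` API (the same pattern lives on in the newer `ipyparallel` package), assuming the StarCluster plugin has already started the controller and engines; `run_backtest` is again the hypothetical helper from above:

```python
from IPython.parallel import Client  # 'ipyparallel' in newer installs

rc = Client()                    # connect to the StarCluster-managed cluster
lview = rc.load_balanced_view()  # hand each job to whichever engine is free

def backtest(params):
    short_window, long_window = params
    # ... run the zipline algorithm on the engine and return a score
    return run_backtest(short_window, long_window)  # hypothetical helper

param_grid = [(s, l) for s in range(5, 30, 5) for l in range(30, 120, 15)]
async_result = lview.map_async(backtest, param_grid)
scores = async_result.get()      # blocks until every backtest has finished
```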

Zipline in the Cloud: Optimizing Financial Trading Algorithms from PyData on Vimeo.

Optimization algorithms

Above I outlined how one can run zipline in parallel to explore the parameter space over equally spaced points — this is generally known as grid-search. However, in most cases searching the whole parameter space is incredibly wasteful. Fortunately, this is an active area of research in Machine Learning called hyperparameter optimization from which we can borrow.
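In code, grid search is nothing more than an exhaustive sweep over the Cartesian product of the parameter values; a minimal sketch, reusing the `objective` wrapper from above:

```python
import itertools

short_windows = range(5, 55, 5)     # 5, 10, ..., 50 days
long_windows = range(20, 220, 20)   # 20, 40, ..., 200 days

# Evaluate every combination of the two windows; embarrassingly parallel,
# but most of the evaluations are wasted on uninteresting regions.
grid = list(itertools.product(short_windows, long_windows))
scores = [objective(params) for params in grid]
best_score, best_params = max(zip(scores, grid))
```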

At first I looked at genetic algorithms and particle swarm optimization, as they are heuristic and require few assumptions about the parameter landscape. Lately, however, I have been looking more closely at Bayesian optimization using Gaussian processes to find points that are most likely to lead to an improvement, given some regularity assumptions on the objective function (see the image below). The paper describing this idea was presented at NIPS 2012. There’s even an open-source toolbox that implements this method (see also this blog post by Michael Hughes). Finally, you might also be interested in hyperopt, a new Python module to define and search a hyper-parameter space.
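As an illustration of that last option, here is a hedged sketch of wrapping the same backtest objective with hyperopt's `fmin`/`tpe`/`hp` interface (hyperopt minimizes, so the score is negated; `objective` is the hypothetical wrapper from above):

```python
from hyperopt import fmin, tpe, hp

space = {
    'short_window': hp.quniform('short_window', 2, 50, 1),
    'long_window': hp.quniform('long_window', 20, 200, 1),
}

def loss(p):
    # hyperopt minimizes, so negate the score we want to maximize
    return -objective((int(p['short_window']), int(p['long_window'])))

best = fmin(fn=loss, space=space, algo=tpe.suggest, max_evals=100)
print(best)  # best window lengths found by the TPE search
```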

I will discuss these methods and some extended ideas in more detail in a future blog post. If you enjoyed this blog post, follow me and Quantopian on Twitter, and read the first installment on Walk-Forward optimization.

 

This entry was posted on Thursday, April 11th, 2013 and is filed under Optimization.
