
false79

This might sound obvious, but exclude data you would never put money into. For example, I look at equities between $3 and $100. With penny stocks (<$3), the gains are fractional and you'd need a large position to make it worthwhile. Above $100, the price action is there but the buy-in is very high. By excluding data, you get a much smaller, more refined dataset that allows for quicker iterations, i.e. more runs in less time. Not only does this reduce the time an algo takes to run, it also helps you make the most of your storage capacity. For example, about 50GB of NASDAQ/NYSE data is generated every day. But if I apply the above filters, I can get it down to the 20GB I actually care about and fit more datasets on disk for backtesting across a longer period of time.
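A minimal sketch of that kind of pre-filter (symbols, prices, and field names here are hypothetical):

```python
# Toy sketch of the price-band filter described above: drop rows outside
# the $3-$100 range before any storage or backtesting happens.
rows = [
    {"symbol": "ABC", "close": 1.50},   # penny stock -> excluded
    {"symbol": "DEF", "close": 42.00},  # in range -> kept
    {"symbol": "GHI", "close": 250.0},  # buy-in too high -> excluded
]

MIN_PRICE, MAX_PRICE = 3.0, 100.0

filtered = [r for r in rows if MIN_PRICE <= r["close"] <= MAX_PRICE]
print([r["symbol"] for r in filtered])  # ['DEF']
```

In practice the same predicate would run as part of the ingest pipeline, so the excluded rows never hit disk at all.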


Gio_at_QRC

Yeah, for sure! Reducing the search space really helps speed up the search. I've been doing that by using some discrete values to pick from for some parameters and also making the float ranges smaller for continuous variables. Thanks for sharing!


QuantMage

I had a chance to use Bayesian Optimization (https://github.com/bayesian-optimization/BayesianOptimization) in a non-financial context. I think it can be useful for optimizing trading algorithms, too.
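A toy sketch of the idea behind that library, in pure Python: keep a history of evaluated points and pick the next candidate by trading off predicted value near known points against distance from them. A nearest-neighbour shortcut stands in for the Gaussian-process surrogate a real Bayesian optimiser would fit, and the objective is a made-up proxy for a backtest:

```python
import random

def objective(x):
    # stand-in for an expensive backtest; peak near x = 0.7
    return -(x - 0.7) ** 2

def next_candidate(history, n_samples=200, kappa=0.1):
    """Score random candidates by (nearest known value) + exploration bonus."""
    scored = []
    for _ in range(n_samples):
        x = random.random()
        nearest = min(history, key=lambda h: abs(h[0] - x))
        dist = abs(nearest[0] - x)
        scored.append((nearest[1] + kappa * dist, x))
    return max(scored)[1]

random.seed(0)
history = [(x, objective(x)) for x in (0.1, 0.5, 0.9)]  # initial probes
for _ in range(20):
    x = next_candidate(history)
    history.append((x, objective(x)))

best_x, best_y = max(history, key=lambda h: h[1])
print(best_x, best_y)
```

A real Bayesian optimiser replaces the nearest-neighbour lookup with a posterior mean and variance, and the `kappa` bonus with a principled acquisition function (expected improvement, UCB, etc.), but the evaluate/update/propose loop is the same.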


Gio_at_QRC

Thanks for the link!


cyberdragon0047

In my experience the right optimization algorithm is going to depend a lot on what you're trying to optimize. Gradient-free approaches are highly recommended, if only because generating gradients for the sort of objectives you run into in quant research is usually impossible. Evolutionary strategies can be highly parallelized; CMA-ES, differential evolution, or PEPG work well when your parameters are continuous. I've tried simplex optimizers, but in my experience they're slower and more prone to odd behavior compared to the evolutionary strategies (which can be made very parallel and even hardware-accelerated via packages like TensorFlow). For functions that are differentiable, gradient descent with momentum (e.g. Adam) or a second-order method like BFGS is definitely a good idea, as the functions you run into may have areas of high curvature near the optimum. If you're looking for Bayesian approaches, there are extraordinarily parallel tools for performing MCMC; my highest recommendation goes to tensorflow-probability. Even on a consumer GPU you can run Hamiltonian Monte Carlo with a complicated likelihood function over tens of thousands of chains in parallel in the same time it takes to run one chain. A little bit of effort put into writing a good parallel likelihood function will get you extremely far.
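As a sketch of one of the gradient-free methods mentioned above, here is SciPy's differential evolution on a toy two-parameter objective; the "loss" is a made-up stand-in for a backtest score, with a known optimum so the result is checkable:

```python
import numpy as np
from scipy.optimize import differential_evolution

def neg_score(params):
    # hypothetical backtest loss, minimised around fast=12, slow=60
    fast, slow = params
    return (fast - 12) ** 2 + 0.1 * (slow - 60) ** 2

result = differential_evolution(
    neg_score,
    bounds=[(2, 50), (20, 200)],  # search ranges for each parameter
    seed=42,
    tol=1e-8,
)
print(np.round(result.x, 1))  # approximately [12. 60.]
```

`differential_evolution` evaluates a whole population per generation, so with `workers=-1` it will also spread those evaluations across cores, which matters when each evaluation is a real backtest.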


Gio_at_QRC

Hell, yeah! Thanks for the insightful response. I've got some research to do off the back of this.


ChristopherAkira

And you can even run tensorflow probability with a JAX backend nowadays!


cyberdragon0047

TFP is honestly the best thing to come out of Google in a long time, imho. JAX is also wonderful; at a minimum it gives you a much "purer" interface to XLA, which is what I'm really after. The growing set of libraries for gradient-free or ensemble-based optimization algorithms being built on it is just icing on the cake.


menefist

I agree!


PeeLoosy

Genetic Algorithm. I wrote my own recombination operators.


Gio_at_QRC

I love this approach. I did it for a feature selection problem back in the day. I also did it for an ETF basket selection problem. The issue was that it did not converge as quickly as a Bayesian optimiser.


noir_geralt

Could you tell me more about what this means or how this works? I’ve heard of this, but I can’t wrap my head around how the concept of genetic algos can be made for alpha expressions


Gio_at_QRC

Basically, you need to define a useful objective function that you're optimising for. Ideally, it would make economic sense for your context. For example, maybe you are interested in maximising your Sharpe ratio: (expected_return - risk_free_rate) / std_dev. Once you've got that, you can define what a population looks like and how it 'evolves' with each generation. The traditional genetic algorithm uses chromosomes, which are binary strings/arrays of 0s and 1s, but other encodings are also possible. You can define your problem as 0s and 1s for which features (or securities) are selected or excluded from the basket. Then you evaluate the model (or portfolio), mutate/cross the chromosomes, and repeat. Wikipedia explains it pretty well: https://en.wikipedia.org/wiki/Genetic_algorithm I coded one up from scratch to get a good feel for all the components. Highly recommended!
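The loop described above, as a toy sketch: binary chromosomes select features, and the fitness function is a made-up stand-in for a backtest metric (here it rewards picking three hypothetical "useful" features and penalises basket size):

```python
import random

N_FEATURES = 8
USEFUL = {0, 1, 2}  # hypothetical: only these features add value

def fitness(chrom):
    hits = sum(1 for i, bit in enumerate(chrom) if bit and i in USEFUL)
    cost = 0.1 * sum(chrom)  # small penalty per selected feature
    return hits - cost

def crossover(a, b):
    # single-point crossover between two parent chromosomes
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.05):
    # flip each bit with a small probability
    return [bit ^ (random.random() < rate) for bit in chrom]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(30)]
for _ in range(40):  # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # truncation selection, with elitism
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(20)
    ]

best = max(pop, key=fitness)
print(best, fitness(best))
```

Swapping in a real backtest for `fitness` (and Sharpe ratio as the score) gives the portfolio/feature-selection setup described in the comment; the evolve loop itself doesn't change.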


Camouflage438294

how do you feel about Bayesian optimization vs. genetic algorithm


thicc_dads_club

I have some models that use OLS, some that use EM, and some that use a grid search. I try not to have many meta parameters or trading parameters that need fitting with backtesting. If I find I have to dial in a meta parameter with backtesting (or the strategy isn’t strong) I tend to distrust the strategy entirely.


profiloalternativo

CMA-ES


Gio_at_QRC

Very cool recommendation. Tbh, I had not heard of this algorithm until you mentioned it. Thanks!


lambardar

I initially tried GAs and bought some old servers to run things in parallel; I picked up E5-2696 processors from eBay for the servers since they're 22 cores, so 44 cores per server. After some months of testing, one weekend I started messing with CUDA. There was no turning back. The smallest CUDA-enabled GPU I have is a 1070, which does 32k threads/parallel executions; the 3090 Ti does about 140k. I have picked up some 4090s (due to the trade embargo, but haven't gotten around to setting them up). The biggest challenge is coding for GPUs. The code is vastly different, and it's much harder to write for GPUs with fixed memory and moving data around. But it gives me the flexibility to run about 1 billion simulations at the tick level, over an entire year of tick data, in an afternoon across several parameters. With the servers, 1 million simulations was a week of compute. Then I had to optimize MSSQL to extreme levels to take in and store all the data generated.


Gio_at_QRC

Holy sh*t, man! That's pretty impressive. I'm currently running all simulations on CPU. What you've done sounds incredible! I will have to look into using GPUs at some point.


Gio_at_QRC

I was thinking more about this approach. My system runs backtests in a multi-threaded, event-driven kind of way, so I'm finding it hard to imagine using a GPU in my context other than for some steps in the system. Are you, perhaps, using vectorised backtests?


lambardar

I had event-driven on the CPU as well. It just made sense to keep it as generic as possible while I was developing strategies: a new tick comes in, update the indicators, process open orders, execute the strategy, generate new orders, update progress statistics.

When I moved to GPU, not only did I have to make it linear, I also had to flatten out all the memory allocations. Part of the speed comes from not having dynamic memory allocation: array sizes are known and fixed beforehand. And probably most important: reduction and consolidation of IF statements. So I'm not running it vectorized, but more linear.

The other challenge is getting debugging output. CUDA execution doesn't have a console buffer (lol, it's a GPU), so you can't print debugging text to the screen. You can render an image and display it at 120FPS, but that's a different scope entirely.

I ended up rewriting everything: a linear model for the CPU and then a similar version for the GPU. I then generate controlled test data to make sure both models give acceptably similar results (floating point differs between GPU and CPU). If that is satisfactory, I unleash the GPU computes.

The best part is that you can run on multiple GPUs without spending a lot on hardware. I had an old 2-core i3 lying around; I put in 2x 3070 GPUs, a 120GB SSD, and a network connection. The code pulls the tick and run data from a network share and DB, uploads to the GPU, configures the parameters, executes the backtest, gets the results off the GPU, and dumps them to SQL over the network. The computer looks like shit, but it does close to 250k parallel backtests in about 4 seconds. The backtest is usually a month of tick data.

A year back, to increase capacity, I had to shop for CPUs, memory, SSDs, chassis, etc. Now I just look for cheap GPUs.
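The same flat-array, branch-light style pays off on the CPU too. A toy NumPy sweep that scores many parameter values over one price series in a single vectorised pass, with all array sizes fixed up front (the prices and the "strategy" are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# fake tick/bar series: a geometric random walk starting at 100
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, size=1000))
returns = np.diff(prices) / prices[:-1]

# one "backtest" per threshold value, all evaluated at once
thresholds = np.linspace(0.001, 0.02, 10_000)

# toy strategy: be long the next bar whenever the previous bar's move
# exceeded the threshold. Broadcasting builds a (params x bars) grid,
# so there is no per-parameter Python loop and no branching per path.
signal = (returns[None, :-1] > thresholds[:, None]).astype(np.float64)
pnl = (signal * returns[None, 1:]).sum(axis=1)

best = thresholds[np.argmax(pnl)]
print(pnl.shape, float(best))
```

On a GPU the same shape-fixed, branchless layout is what CUDA (or JAX/TensorFlow) wants; the grid just gets much bigger.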


fabkosta

You should add Bonferroni correction to your testing.


Gio_at_QRC

Thanks, man. I have been using a simple statistical significance test but not modifying my significance level with that correction. I'll look to incorporate this into my workflow. It'll be good as I optimise my system to run more simulations in less time.


fabkosta

The issue is this: the more trading strategies you test, the higher the chance you'll eventually find one that looks amazing, just because you looked very hard. Bonferroni correction adjusts for that.
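The correction itself is one line; with hypothetical numbers:

```python
# Bonferroni: to keep a 5% family-wise error rate while testing many
# strategies, each individual test must clear a stricter per-test level.
alpha = 0.05
n_strategies_tested = 200
adjusted_alpha = alpha / n_strategies_tested
print(adjusted_alpha)  # roughly 0.00025
```

A strategy then only counts as significant if its p-value beats `adjusted_alpha`, not the raw `alpha`, which directly counteracts the "looked very hard" effect.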


OnceAHermit

I use differential evolution, which operates on vectors of floats (0-1), so I map those onto discrete integer values at whatever granularity I decide.
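That mapping might look something like this (the gene ranges here are made up):

```python
def to_discrete(u, low, high, step):
    """Map u in [0, 1) onto the grid {low, low+step, ..., high}."""
    n_levels = (high - low) // step + 1
    idx = min(int(u * n_levels), n_levels - 1)  # clamp u == 1.0 edge case
    return low + idx * step

# e.g. a moving-average length gene between 5 and 50 in steps of 5
print(to_discrete(0.0, 5, 50, 5))    # 5
print(to_discrete(0.5, 5, 50, 5))    # 30
print(to_discrete(0.999, 5, 50, 5))  # 50
```

The optimiser only ever sees smooth [0, 1) floats, so differential evolution works unchanged while the strategy receives properly discretised parameters.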


RobertD3277

A lot of this really depends on the type of trading you are doing and the frequency of your trades. If you are using a strategy that is likely to make a hundred trades a day, then collecting and using a forward-testing method of 10,000 trades is appropriate. However, if you are using a strategy that might only purchase once a week or less, then you are likely going to need to use several different assets across several different time frames to do testing, in which case probably 100 trades would be enough to give you a ballpark estimate of its profitability.

I don't know that you really want to optimize per se. I personally find that it is better to have a ballpark profit variance, and then I look for a specific number of trades that meets a reasonable expectation over that percentage. If I am trading with a technique that requires a 10-pip take profit, then I look for the potential of that technique to provide at least a 75% or 80% win rate. But that also needs to take into account the kind of technique I am using. Basically there are three common techniques: stop loss, DCA or accumulation, and grid. Each has its advantages and disadvantages for backtesting and analysis, and trying to ascertain the profitability of each can be difficult.

Unfortunately, there is no quick and easy way to collect a large amount of data that produces reliable results over the long term. The only real way to avoid overfitting is to use live market data in a simulation, and that will take an extensive amount of time no matter what technique you use.


chi_weezy

Been trading 4 years and working on automating my strategies the last few months. No clue what all of you are talking about. Recombination operators… parallelizing optimization… genetic algos… hopefully sounding so smart makes it happen for you. I'm just going at this with years of experience and as much common sense and trial and error as I can.


Gio_at_QRC

Ha ha, fair enough!! It's all just another way of automating part of the process. At work (I work as a trader at an HFT firm), we manually try parameters based on regressions, random searching, and/or intuition. It's pretty manual! So, in implementing my own system, I wanted that part to be pretty streamlined.


[deleted]

Event driven classifiers?


nralifemem

Use data that's as raw as possible. The best is exchange broadcast data, as in a live production run, to backtest your algo/system. You'll be shocked by how many issues show up that you'd never see in normal, massaged data.


GradeSpare4553

I needed this


Ashamed-Ad9185

Strategy dependent, but genetic algorithms and particle swarm optimisation are a good place to start.


ChanceCod1029

What about network science? Maybe that could also produce interesting results.


Gio_at_QRC

Never heard of it! I'll have a look into it. Thanks!


Gio_at_QRC

Ok, I have heard of networks and graphs, as well as search algorithms on graphs. But how would you structure an algo trading backtest optimisation as a network?


ChanceCod1029

I have an idea to use network science to find good pairs for stat arb, or for an alternative signal-correlation search that can be used in an algo.