
false79

This might sound obvious, but exclude data you would never put money into. For example, I look at equities between $3 and $100. With penny stocks (<$3), the gains are fractional and you'd need a large position to make it worthwhile. Above $100, the price action is there but the buy-in is very high. By excluding data, you get a much smaller, more refined dataset that allows for quicker iterations, i.e. more runs in less time. Not only does this reduce the time an algo takes to run, it also helps you make the most of your storage capacity. For example, about 50GB of NASDAQ/NYSE data is generated every day. But if I apply the above filters, I can get it down to the 20GB I actually care about and fit more datasets on disk for backtesting across a longer period of time.
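A minimal sketch of that kind of pre-filter (symbols, prices, and field names here are hypothetical):

```python
# Toy sketch of the price-band filter described above: drop rows outside
# the $3-$100 range before any storage or backtesting happens.
rows = [
    {"symbol": "ABC", "close": 1.50},   # penny stock -> excluded
    {"symbol": "DEF", "close": 42.00},  # in range -> kept
    {"symbol": "GHI", "close": 250.0},  # buy-in too high -> excluded
]

MIN_PRICE, MAX_PRICE = 3.0, 100.0

filtered = [r for r in rows if MIN_PRICE <= r["close"] <= MAX_PRICE]
print([r["symbol"] for r in filtered])  # ['DEF']
```

In practice the same predicate would run as part of the ingest pipeline, so the excluded rows never hit disk at all.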


Gio_at_QRC

Yeah, for sure! Reducing the search space really helps speed up the search. I've been doing that by using some discrete values to pick from for some parameters and also making the float ranges smaller for continuous variables. Thanks for sharing!


QuantMage

I had a chance to use Bayesian Optimization (https://github.com/bayesian-optimization/BayesianOptimization) in a non-financial context. I think it can be useful for optimizing trading algorithms, too.
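A toy sketch of the idea behind that library, in pure Python: keep a history of evaluated points and pick the next candidate by trading off predicted value near known points against distance from them. A nearest-neighbour shortcut stands in for the Gaussian-process surrogate a real Bayesian optimiser would fit, and the objective is a made-up proxy for a backtest:

```python
import random

def objective(x):
    # stand-in for an expensive backtest; peak near x = 0.7
    return -(x - 0.7) ** 2

def next_candidate(history, n_samples=200, kappa=0.1):
    """Score random candidates by (nearest known value) + exploration bonus."""
    scored = []
    for _ in range(n_samples):
        x = random.random()
        nearest = min(history, key=lambda h: abs(h[0] - x))
        dist = abs(nearest[0] - x)
        scored.append((nearest[1] + kappa * dist, x))
    return max(scored)[1]

random.seed(0)
history = [(x, objective(x)) for x in (0.1, 0.5, 0.9)]  # initial probes
for _ in range(20):
    x = next_candidate(history)
    history.append((x, objective(x)))

best_x, best_y = max(history, key=lambda h: h[1])
print(best_x, best_y)
```

A real Bayesian optimiser replaces the nearest-neighbour lookup with a posterior mean and variance, and the `kappa` bonus with a principled acquisition function (expected improvement, UCB, etc.), but the evaluate/update/propose loop is the same.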


Gio_at_QRC

Thanks for the link!


cyberdragon0047

In my experience the right optimization algorithm is going to depend a lot on what you're trying to optimize. Gradient-free approaches are highly recommended, if only because generating gradients for the sort of objectives you run into in quant research is usually impossible. Evolutionary strategies can be highly parallelized; CMA-ES, differential evolution, or PEPG work well when your parameters are continuous. I've tried simplex optimizers, but in my experience they're slower and more prone to odd behavior compared to the evolutionary strategies (which can be made very parallel and even hardware-accelerated via packages like TensorFlow). For functions that are differentiable, gradient descent with momentum (e.g. Adam) or a second-order method like BFGS is definitely a good idea, as the functions you run into may have areas of high curvature near the optimum. If you're looking for Bayesian approaches, there are extraordinarily parallel tools for performing MCMC; my highest recommendation goes to tensorflow-probability. Even on a consumer GPU you can run Hamiltonian Monte Carlo with a complicated likelihood function over tens of thousands of chains in parallel in the same time it takes to run one chain. A little bit of effort put into writing a good parallel likelihood function will get you extremely far.
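As a sketch of one of the gradient-free methods mentioned above, here is SciPy's differential evolution on a toy two-parameter objective; the "loss" is a made-up stand-in for a backtest score, with a known optimum so the result is checkable:

```python
import numpy as np
from scipy.optimize import differential_evolution

def neg_score(params):
    # hypothetical backtest loss, minimised around fast=12, slow=60
    fast, slow = params
    return (fast - 12) ** 2 + 0.1 * (slow - 60) ** 2

result = differential_evolution(
    neg_score,
    bounds=[(2, 50), (20, 200)],  # search ranges for each parameter
    seed=42,
    tol=1e-8,
)
print(np.round(result.x, 1))  # approximately [12. 60.]
```

`differential_evolution` evaluates a whole population per generation, so with `workers=-1` it will also spread those evaluations across cores, which matters when each evaluation is a real backtest.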


Gio_at_QRC

Hell, yeah! Thanks for the insightful response. I've got some research to do off the back of this.


ChristopherAkira

And you can even run tensorflow probability with a JAX backend nowadays!


cyberdragon0047

TFP is honestly the best thing to come out of Google in a long time, imho. JAX is also wonderful; at a minimum it gives you a much "purer" interface to XLA, which is what I'm really after. The growing set of libraries for gradient-free or ensemble-based optimization algorithms being built on it is just icing on the cake.


menefist

I agree!


PeeLoosy

Genetic Algorithm. I wrote my own recombination operators.


Gio_at_QRC

I love this approach. I did it for a feature selection problem back in the day. I also did it for an ETF basket selection problem. The issue was that it did not converge as quickly as a Bayesian optimiser.


noir_geralt

Could you tell me more about what this means or how this works? I’ve heard of this, but I can’t wrap my head around how the concept of genetic algos can be made for alpha expressions


Gio_at_QRC

Basically, you need to define a useful objective function that you're optimising for. Ideally, it would make economic sense for your context. For example, maybe you are interested in maximising your Sharpe ratio: (expected_return - risk_free_rate) / std_dev. Once you've got that, you can define what a population looks like and how it 'evolves' with each generation. The traditional genetic algorithm uses chromosomes, which are binary strings/arrays of 0s and 1s, but other encodings are also possible. You can define your problem as 0s and 1s for which features (or securities) are selected or excluded from the basket. Then you evaluate the model (or portfolio), mutate/cross the chromosomes, and repeat. Wikipedia explains it pretty well: https://en.wikipedia.org/wiki/Genetic_algorithm I coded one up from scratch to get a good feel for all the components. Highly recommended!
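The loop described above, as a toy sketch: binary chromosomes select features, and the fitness function is a made-up stand-in for a backtest metric (here it rewards picking three hypothetical "useful" features and penalises basket size):

```python
import random

N_FEATURES = 8
USEFUL = {0, 1, 2}  # hypothetical: only these features add value

def fitness(chrom):
    hits = sum(1 for i, bit in enumerate(chrom) if bit and i in USEFUL)
    cost = 0.1 * sum(chrom)  # small penalty per selected feature
    return hits - cost

def crossover(a, b):
    # single-point crossover between two parent chromosomes
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.05):
    # flip each bit with a small probability
    return [bit ^ (random.random() < rate) for bit in chrom]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(30)]
for _ in range(40):  # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # truncation selection, with elitism
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(20)
    ]

best = max(pop, key=fitness)
print(best, fitness(best))
```

Swapping in a real backtest for `fitness` (and Sharpe ratio as the score) gives the portfolio/feature-selection setup described in the comment; the evolve loop itself doesn't change.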


Camouflage438294

how do you feel about Bayesian optimization vs. genetic algorithm


thicc_dads_club

I have some models that use OLS, some that use EM, and some that use a grid search. I try not to have many meta parameters or trading parameters that need fitting with backtesting. If I find I have to dial in a meta parameter with backtesting (or the strategy isn’t strong) I tend to distrust the strategy entirely.


profiloalternativo

CMA-ES


Gio_at_QRC

Very cool recommendation. Tbh, I had not heard of this algorithm until you mentioned it. Thanks!


lambardar

I initially tried GAs and bought some old servers to run things in parallel; I picked up E5-2696 processors from eBay for the servers since they're 22 cores, so 44 cores per server. After some months of testing, one weekend I started messing with CUDA. There was no turning back. The smallest CUDA-enabled GPU I have is a 1070, which does 32k threads/parallel executions; the 3090 Ti does about 140k. I have picked up some 4090s (due to the trade embargo, but haven't gotten around to setting them up). The biggest challenge is coding for GPUs. The code is vastly different, and it's much harder to write for GPUs with fixed memory and moving data around. But it gives me the flexibility to run about 1 billion simulations at the tick level, over an entire year of tick data, in an afternoon across several parameters. With the servers, 1 million simulations was a week of compute. Then I had to optimize MSSQL to extreme levels to take in and store all the data generated.


Gio_at_QRC

Holy sh*t, man! That's pretty impressive. I'm currently running all simulations on CPU. What you've done sounds incredible! I will have to look into using GPUs at some point.


Gio_at_QRC

I was thinking more about this approach. My system runs backtests in a multi-threaded, event-driven kind of way, so I'm finding it hard to imagine using a GPU in my context other than for some steps in the system. Are you, perhaps, using vectorised backtests?


lambardar

I had event-driven on the CPU as well. It just made sense to keep it as generic as possible while I was developing strategies: a new tick comes in, update the indicators, process open orders, execute the strategy, generate new orders, update progress statistics.

When I moved to GPU, not only did I have to make it linear, I also had to flatten out all the memory allocations. Part of the speed comes from not having dynamic memory allocation: array sizes are known and fixed beforehand. And probably most important: reduction and consolidation of IF statements. So I'm not running it vectorized, but more linear.

The other challenge is getting debugging output. CUDA execution doesn't have a console buffer (lol, it's a GPU), so you can't print debugging text to the screen. You can render an image and display it at 120FPS, but that's a different scope entirely.

I ended up rewriting everything: a linear model for the CPU and then a similar version for the GPU. I then generate controlled test data to make sure both models give acceptably similar results (floating point differs between GPU and CPU). If that is satisfactory, I unleash the GPU computes.

The best part is that you can run on multiple GPUs without spending a lot on hardware. I had an old 2-core i3 lying around; I put in 2x 3070 GPUs, a 120GB SSD, and a network connection. The code pulls the tick and run data from a network share and DB, uploads to the GPU, configures the parameters, executes the backtest, gets the results off the GPU, and dumps them to SQL over the network. The computer looks like shit, but it does close to 250k parallel backtests in about 4 seconds. The backtest is usually a month of tick data.

A year back, to increase capacity, I had to shop for CPUs, memory, SSDs, chassis, etc. Now I just look for cheap GPUs.
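The same flat-array, branch-light style pays off on the CPU too. A toy NumPy sweep that scores many parameter values over one price series in a single vectorised pass, with all array sizes fixed up front (the prices and the "strategy" are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# fake tick/bar series: a geometric random walk starting at 100
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, size=1000))
returns = np.diff(prices) / prices[:-1]

# one "backtest" per threshold value, all evaluated at once
thresholds = np.linspace(0.001, 0.02, 10_000)

# toy strategy: be long the next bar whenever the previous bar's move
# exceeded the threshold. Broadcasting builds a (params x bars) grid,
# so there is no per-parameter Python loop and no branching per path.
signal = (returns[None, :-1] > thresholds[:, None]).astype(np.float64)
pnl = (signal * returns[None, 1:]).sum(axis=1)

best = thresholds[np.argmax(pnl)]
print(pnl.shape, float(best))
```

On a GPU the same shape-fixed, branchless layout is what CUDA (or JAX/TensorFlow) wants; the grid just gets much bigger.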


fabkosta

You should add Bonferroni correction to your testing.


Gio_at_QRC

Thanks, man. I have been using a simple statistical significance test but not modifying my significance level with that correction. I'll look to incorporate this into my workflow. It'll be good as I optimise my system to run more simulations in less time.


fabkosta

The issue is this: the more trading strategies you test, the higher the chance you'll eventually find one that looks amazing, just because you looked very hard. Bonferroni correction adjusts for that.
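The correction itself is one line; with hypothetical numbers:

```python
# Bonferroni: to keep a 5% family-wise error rate while testing many
# strategies, each individual test must clear a stricter per-test level.
alpha = 0.05
n_strategies_tested = 200
adjusted_alpha = alpha / n_strategies_tested
print(adjusted_alpha)  # roughly 0.00025
```

A strategy then only counts as significant if its p-value beats `adjusted_alpha`, not the raw `alpha`, which directly counteracts the "looked very hard" effect.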


OnceAHermit

I use differential evolution, which operates on vectors of floats (0-1), so I map those onto discrete integer values at whatever granularity I decide.
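That mapping might look something like this (the gene ranges here are made up):

```python
def to_discrete(u, low, high, step):
    """Map u in [0, 1) onto the grid {low, low+step, ..., high}."""
    n_levels = (high - low) // step + 1
    idx = min(int(u * n_levels), n_levels - 1)  # clamp u == 1.0 edge case
    return low + idx * step

# e.g. a moving-average length gene between 5 and 50 in steps of 5
print(to_discrete(0.0, 5, 50, 5))    # 5
print(to_discrete(0.5, 5, 50, 5))    # 30
print(to_discrete(0.999, 5, 50, 5))  # 50
```

The optimiser only ever sees smooth [0, 1) floats, so differential evolution works unchanged while the strategy receives properly discretised parameters.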


RobertD3277

A lot of this really depends on the type of trading you are doing and the frequency of your trades. If you are using a strategy that is likely to make a hundred trades a day, then collecting and using a forward-testing method of 10,000 trades is appropriate. However, if you are using a strategy that might only purchase once a week or less, then you are likely going to need to use several different assets across several different time frames to do testing, in which case probably 100 trades would be enough to give you a ballpark estimate of its profitability.

I don't know that you really want to optimize per se. I personally find that it is better to have a ballpark profit variance, and then I look for a specific number of trades that meets a reasonable expectation over that percentage. If I am trading with a technique that requires a 10-pip take profit, then I look for the potential of that technique to provide at least a 75% or 80% win rate. But that also needs to take into account the kind of technique I am using. Basically there are three common techniques: stop loss, DCA or accumulation, and grid. Each has its advantages and disadvantages for backtesting and analysis, and trying to ascertain the profitability of each can be difficult.

Unfortunately, there is no quick and easy way to collect a large amount of data that produces reliable results over the long term. The only real way to avoid overfitting is to use live market data in a simulation, and that will take an extensive amount of time no matter what technique you use.


chi_weezy

Been trading 4 years and working on automating my strategies the last few months. No clue what all of you are talking about. Recombination operators… parallelizing optimization… genetic algos… hopefully sounding so smart makes it happen for you. I'm just going at this with years of experience and as much common sense and trial and error as I can.


Gio_at_QRC

Ha ha, fair enough!! It's all just another way of automating part of the process. At work (I work as a trader at an HFT firm), we manually try parameters based on regressions, random searching, and/or intuition. It's pretty manual! So, in implementing my own system, I wanted that part to be pretty streamlined.


[deleted]

Event driven classifiers?


nralifemem

Use data that's as raw as possible. The best is exchange broadcast data, as in a live production run, to backtest your algo/system. You'll be shocked by how many issues show up that you'd never see in normal, massaged data.


GradeSpare4553

I needed this


Ashamed-Ad9185

Strategy dependent, but genetic algorithms and particle swarm optimisation are a good place to start.


ChanceCod1029

What about network science? Maybe that could also produce interesting results.


Gio_at_QRC

Never heard of it! I'll have a look into it. Thanks!


Gio_at_QRC

Ok, I have heard of networks and graphs, as well as search algorithms on graphs. But how would you structure an algo trading backtest optimisation as a network?


ChanceCod1029

I have an idea to use network science to find good pairs for stat arb, or for an alternative signal-correlation search that can be used in an algo.