I recently made SymReg, a symbolic regression tool: it searches for mathematical expressions that predict one variable from others in your data.

The code used for this blog post is in this notebook.

Let's try it on a sine wave, even though SymReg has no sine building block:


As you can see, the function it finds fits quite well. But a complexity of 337 has its disadvantages: the expression may be overfit. As soon as we evaluate it outside the training interval, the output explodes:
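SymReg's actual 337-complexity expression isn't reproduced here. As an analogy only (not SymReg's output), a truncated Taylor series of sin shows the same pattern: a polynomial built from `+`, `-` and `*` can track a sine wave closely near the range it was built for and then diverge quickly outside it. `taylor_sin` is a hypothetical helper written for this illustration:

```python
import math


def taylor_sin(x, terms=5):
    # Truncated Maclaurin series of sin: x - x^3/3! + x^5/5! - ...
    # With terms=5 this is a degree-9 polynomial, built only from +, -, *.
    return sum(
        (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(terms)
    )


# Close to sin near 0, increasingly wrong further away.
for x in [0.5, 1.0, math.pi, 2 * math.pi, 10.0]:
    print(f"x={x:6.3f}  sin={math.sin(x):8.4f}  poly={taylor_sin(x):12.4f}")
```

At `x = 10` the polynomial is already above 1000 while `sin` stays within ±1, which is the same kind of "explosion" outside the fitted interval.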


Limiting the predictor's complexity with max_complexity acts much like regularization in neural networks. Outside the training interval, the behavior is now a bit tamer: the output stays within about ±6 instead of ±15.
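To make the idea of a complexity cap concrete, here is a minimal sketch under stated assumptions: expressions are hypothetical nested tuples, complexity is a plain node count, and selection simply discards anything above the limit before picking the lowest error. This is not SymReg's internal representation or its selection logic, which may measure complexity differently:

```python
import math

# Hypothetical expression format (NOT SymReg's internals):
# a float constant, the variable "x", or a tuple (op, left, right).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def evaluate(expr, x):
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    if expr == "x":
        return x
    return float(expr)


def complexity(expr):
    # One point per node: every operator, variable and constant counts.
    if isinstance(expr, tuple):
        _, left, right = expr
        return 1 + complexity(left) + complexity(right)
    return 1


def mse(expr, xs, ys):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_under_limit(candidates, xs, ys, max_complexity):
    # Hard cap: expressions above the limit are never considered,
    # no matter how well they fit the training data.
    allowed = [e for e in candidates if complexity(e) <= max_complexity]
    return min(allowed, key=lambda e: mse(e, xs, ys))


xs = [i / 10 for i in range(-10, 11)]  # training interval [-1, 1]
ys = [math.sin(x) for x in xs]
linear = "x"
cubic = ("-", "x", ("*", 1 / 6, ("*", "x", ("*", "x", "x"))))  # x - x^3/6
candidates = [linear, cubic]

print(complexity(cubic))                                  # 9
print(best_under_limit(candidates, xs, ys, 5) == linear)  # True
print(best_under_limit(candidates, xs, ys, 10) == cubic)  # True
```

With a limit of 5 only the linear candidate survives, even though the cubic fits better; raising the limit lets the more complex expression win. A hard cap like this behaves more like a constraint than a penalty term, but the effect is similar: simpler expressions, tamer extrapolation.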


Sadly, because of the Global Interpreter Lock (GIL), Python threads cannot execute Python code in parallel, which is a significant limitation for CPU-bound code like this. Multiprocessing is also too slow here: launching a fork takes longer than evolving a whole generation.
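One way to see the GIL's effect on a standard CPython build is to time the same pure-Python CPU-bound work sequentially and on a thread pool. The workload below (`cpu_bound`) is a hypothetical stand-in, not SymReg code, and exact timings depend on the machine and Python version; on a standard build the threaded run is typically no faster than the sequential one:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic loop: it holds the GIL while it runs,
    # so other threads cannot execute Python bytecode at the same time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    return [cpu_bound(n) for n in jobs]


def run_threaded(jobs, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, jobs))


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    for name, runner in [("sequential", run_sequential), ("threaded", run_threaded)]:
        start = time.perf_counter()
        runner(jobs)
        print(f"{name}: {time.perf_counter() - start:.2f} s")
```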

If you know a way around the GIL, please let me know! But it seems the only way is to change the programming language.

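Still, this tool might be useful for you. And if you are optimizing multiple functions, you could run them in parallel, one process per function, quite efficiently: each full run lasts long enough to amortize the process startup cost. Try it out and share your thoughts!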
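A sketch of that pattern using the standard library's `ProcessPoolExecutor`, where each independent target is fitted in its own worker process. `fit_target` is a hypothetical placeholder standing in for one SymReg run (here it just does an ordinary least-squares line fit); the pool and the `__main__` guard are the parts that carry over:

```python
from concurrent.futures import ProcessPoolExecutor


def fit_target(data):
    # Hypothetical placeholder for one SymReg run on one target.
    # Here: ordinary least-squares line y = a*x + b (closed form).
    xs, ys = data
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def fit_all(datasets, workers=None):
    # Each worker process has its own interpreter and its own GIL, so fits
    # run truly in parallel. Workers are reused across tasks, so startup
    # is paid once per worker rather than once per task.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_target, datasets))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    datasets = [
        (xs, [2 * x + 1 for x in xs]),
        (xs, [-x + 4 for x in xs]),
    ]
    print(fit_all(datasets))  # prints [(2.0, 1.0), (-1.0, 4.0)]
```

The `if __name__ == "__main__":` guard matters: with the "spawn" and "forkserver" start methods, worker processes re-import the main module, and without the guard they would try to start pools of their own.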