Symbolic Regression
A classical application of genetic programming is symbolic regression. Symbolic regression is the task of finding a mathematical expression that approximates a given set of data points as closely as possible.
Example Data
For purposes of illustration, assume we are given the following table of data points, sampled from the function .
x | y |
---|---|
0.0 | 0.0 |
0.5 | 1.0606601717798212 |
1.0 | 1.7320508075688772 |
1.5 | 2.5248762345905194 |
2.0 | 3.4641016151377544 |
2.5 | 4.541475531146237 |
3.0 | 5.744562646538029 |
3.5 | 7.0622234459127675 |
4.0 | 8.48528137423857 |
4.5 | 10.00624804809475 |
5.0 | 11.61895003862225 |
Using only the points in this table, we now want Evoasm to come up with a program that, given x, will output the corresponding y for all x (i.e. not only those listed in the table).
Finding a Solution Program
Here is how it is done:
require 'evoasm' require 'evoasm/x64' Evoasm.log_level = :warn examples = { 0.0 => 0.0, 0.5 => 1.0606601717798212, 1.0 => 1.7320508075688772, 1.5 => 2.5248762345905194, 2.0 => 3.4641016151377544, 2.5 => 4.541475531146237, 3.0 => 5.744562646538029, 3.5 => 7.0622234459127675, 4.0 => 8.48528137423857, 4.5 => 10.00624804809475, 5.0 => 11.61895003862225 } parameters = Evoasm::Population::Parameters.new do |p| p.instructions = Evoasm::X64.instruction_names(:xmm).grep /(add|mul|sqrt).*?sd/ p.examples = examples p.deme_size = 1024 p.deme_count = 1 p.kernel_size = 10 p.distance_metric = :absdiff p.parameters = %i(reg0 reg1 reg2 reg3) regs = %i(xmm0 xmm1 xmm2 xmm3) p.domains = { reg0: regs, reg1: regs, reg2: regs, reg3: regs } end puts "Supported features:" Evoasm::X64.features.each do |feature, supported| puts "\t#{feature.to_s.upcase}: #{supported ? 'YES' : 'NO'}" end puts population = Evoasm::Population.new parameters kernel, loss = population.run puts kernel.disassemble format: true puts puts kernel.run 6.0 puts kernel.run 7.0 puts kernel = kernel.eliminate_introns puts kernel.disassemble format: true puts puts kernel.run 6.0 puts kernel.run 7.0
On my machine, Evoasm will find a solution in less than a second:
0x555556b9e270: vmulsd xmm2, xmm0, xmm2
0x555556b9e274: vfmadd213sd xmm2, xmm1, xmm0
0x555556b9e279: vaddsd xmm3, xmm3, xmm2
0x555556b9e27d: vfnmadd231sd xmm2, xmm2, xmm2
0x555556b9e282: vfnmadd231sd xmm0, xmm1, xmm0
0x555556b9e287: vfmadd132sd xmm0, xmm0, xmm0
0x555556b9e28c: vsqrtsd xmm1, xmm1, xmm3
0x555556b9e290: addsd xmm0, xmm2
0x555556b9e294: vaddsd xmm3, xmm3, xmm2
0x555556b9e298: vmulsd xmm0, xmm2, xmm2
Examining the Solution
You can now experiment with the found program. Use the Kernel#run method to run the found program with arbitrary input.
program.run 1.0 # => [1.7320508075688772]
# test for values not given in table
program.run 10.0 # => [31.937438845342623]
Intron Elimination
The solutions found by Evoasm will usually contain large portions of noneffective code (so-called introns).
Introns can be removed using the eliminate_introns!
method.
This will considerably shorten the size of the solution:
program.eliminate_introns.disassembly format: true
gives
0x555556ba59f0: vmulsd xmm2, xmm0, xmm2
0x555556ba59f4: vfmadd213sd xmm2, xmm1, xmm0
0x555556ba59f9: vaddsd xmm3, xmm3, xmm2
0x555556ba59fd: vsqrtsd xmm1, xmm1, xmm3
For comparison, here is what GCC 7 outputs with -O3 -march=core-avx2 -ffast-math
vmovapd xmm1, xmm0
vfmadd213sd xmm1, xmm0, QWORD PTR .LC0[rip]
vmulsd xmm0, xmm1, xmm0
vsqrtsd xmm0, xmm0, xmm0
.LC0:
.long 0
.long 1073741824