Methods for Dynamic Compilation and Loading

I'm looking for advice on the best way to do dynamic compilation and loading in C++ (I know the general answer is "don't", but there are good reasons why we need to do this).

To explain the context, we are using a genetic programming (GP) system to evolve a core component (a set of arithmetic expressions) of a complex ecological model. In a typical run, the GP system would ideally generate 10^4-10^5 candidate expressions (more if we could afford it). Many of these expressions will have the same structure but different parameter values (especially later in the run, once the system has converged on good model structures). Each individual (i.e. arithmetic expression) is called ~10^4 times within a run of the overall ecological model to estimate its error on the training data (the error is the GP fitness function). Because of those ~10^4 calls, evaluating each individual's performance in the model currently costs us minutes. That would pay for a lot of compilation/loading overhead (even more if we used hashing to avoid recompiling individuals that share a structure but differ in parameter values).
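As a rough sketch of the hashing idea, assume each individual can be rendered as a "structure" string in which its parameters appear only as placeholders such as c[0]. A cache keyed on that string then lets individuals that differ only in parameter values share a single compiled function. ExprFn, CompiledExprCache and the compile callback are hypothetical names; the callback stands in for whatever actually invokes the compiler.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Assumed signature of a compiled expression: fixed parameters c[], per-call variables v[].
using ExprFn = double (*)(const double* c, const double* v);

// Hypothetical cache mapping an expression's structure (parameters replaced by
// placeholders) to an already compiled function.
class CompiledExprCache {
public:
    explicit CompiledExprCache(std::function<ExprFn(const std::string&)> compile)
        : compile_(std::move(compile)) {}

    ExprFn get(const std::string& structure) {
        auto it = cache_.find(structure);
        if (it != cache_.end()) return it->second;  // hit: reuse, no recompilation
        ExprFn fn = compile_(structure);             // miss: compile exactly once
        ++compilations_;
        cache_.emplace(structure, fn);
        return fn;
    }

    std::size_t compilations() const { return compilations_; }

private:
    std::function<ExprFn(const std::string&)> compile_;
    std::unordered_map<std::string, ExprFn> cache_;
    std::size_t compilations_ = 0;
};
```

Since parameter values are passed in at call time, two individuals with the same structure get the same function pointer and simply call it with different c[] arrays.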

The current expression interpreter uses a hashed function-pointer jump table; it is a substantial speed-up over our earlier switch-based implementations, but the interpreter still takes ~90% of the runtime, and we are probably close to the limit of what an interpreter can achieve. Hence the desire to look at run-time compilation. The overall GP system and ecological model are quite large, so converting the whole thing to another language, even if it were desirable, isn't feasible.
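For comparison, here is a minimal sketch of a function-pointer dispatch interpreter of the kind described. It assumes a postfix (RPN) program in which each instruction already holds its handler pointer; the hashed lookup from opcode to handler is omitted, and all names are illustrative rather than the actual implementation. Every node still costs an indirect call plus stack traffic on each of the ~10^4 calls, and that per-node overhead is what compiling the expression to native code would remove.

```cpp
#include <cstddef>
#include <vector>

// Evaluation state for a postfix expression.
struct Machine {
    const double* c;      // fixed model parameters (~30)
    const double* v;      // per-call variables (~15)
    double stack[64];     // operand stack (no overflow check in this sketch)
    std::size_t sp = 0;   // next free stack slot
};

using OpFn = void (*)(Machine&, int);

struct Instr {
    OpFn op;  // handler, resolved once when the program is built
    int arg;  // operand index for loads, ignored by arithmetic ops
};

void push_param(Machine& m, int i) { m.stack[m.sp++] = m.c[i]; }
void push_var(Machine& m, int i)   { m.stack[m.sp++] = m.v[i]; }
void op_add(Machine& m, int)       { --m.sp; m.stack[m.sp - 1] += m.stack[m.sp]; }
void op_mul(Machine& m, int)       { --m.sp; m.stack[m.sp - 1] *= m.stack[m.sp]; }

// One indirect call per node, on every evaluation.
double run(const std::vector<Instr>& prog, const double* c, const double* v) {
    Machine m{c, v};
    for (const Instr& in : prog) in.op(m, in.arg);
    return m.stack[0];
}
```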

The expression interpreter requires access to about 30 double values (model parameters) that are fixed during the evaluation of a single individual (but might change if we reused the same compiled module for a different individual with the same expression but different parameters), and to about 15 variable values that change every time the expression is called. I assume the most efficient way to handle this is to pass an array of parameter values rather than passing each parameter separately. The code it's embedded in is somewhat hokey C/C++; it's not fully standards-compliant, but it does compile OK under both gcc and icpc. It needs to run on a Fedora Linux cluster; not for parallel execution (although parallel evaluation is another potential avenue for speed-up), but so that we can do multiple runs at the same time.
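A sketch of what generating source for one individual might look like, assuming a hypothetical expression-tree type (Node) with parameter and variable leaves. Parameters are emitted as c[i] and variables as v[j], and the body is wrapped in an extern "C" function that takes both arrays by pointer, so every generated function has the same signature no matter which of the parameters and variables it actually uses. Node, emit() and emit_translation_unit() are illustrative names, not part of any existing system.

```cpp
#include <memory>
#include <string>

// Hypothetical GP expression-tree node; the real representation will differ.
struct Node {
    enum Kind { Param, Var, Add, Sub, Mul, Div } kind;
    int index = 0;                   // array index for Param / Var leaves
    std::unique_ptr<Node> lhs, rhs;  // children for binary operators
};

// Emit C++ source for a subtree: parameters become c[i], variables v[j].
std::string emit(const Node& n) {
    switch (n.kind) {
        case Node::Param: return "c[" + std::to_string(n.index) + "]";
        case Node::Var:   return "v[" + std::to_string(n.index) + "]";
        default: break;
    }
    const char* op = n.kind == Node::Add ? " + " : n.kind == Node::Sub ? " - "
                   : n.kind == Node::Mul ? " * " : " / ";
    return "(" + emit(*n.lhs) + op + emit(*n.rhs) + ")";
}

// extern "C" avoids C++ name mangling, so dlsym() can find fn_name as written.
std::string emit_translation_unit(const Node& root, const std::string& fn_name) {
    return "extern \"C\" double " + fn_name +
           "(const double* c, const double* v) { return " + emit(root) + "; }\n";
}
```

Because parameter values are read from c[] at call time rather than baked in as literals, the string returned by emit() is identical for individuals that share a structure, which makes it a natural key for avoiding recompilation.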

At the moment I can see two ways to do this that seem feasible, and a couple of further possibilities:
1. We could write the expression out to a file, use system() to invoke a C++ compiler that builds it as a shared library, and then use dlopen to load it. Since I've never tried this, I have no idea how difficult it might be, and would appreciate any pointers.

2. We could link in a Java method, then do dynamic compilation and linking from Java. I'm thinking of Java rather than the more typical dynamic-compilation languages because it would simplify passing an array of parameters.

3. Maybe some other language would be a better choice. I'm reasonably familiar with functional and logic languages and their support for dynamic execution, but worried about parameter-passing complexities.

4. In principle, this seems like an ideal application for Clang, but I haven't been able to find any pointers to current support for dynamic compilation in Clang, so I'm guessing it is still somewhat down the track as a feature.
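A minimal sketch of option 1 on Linux, assuming a c++ compiler driver on the PATH and a writable /tmp. The file naming scheme (which includes the process ID so concurrent runs on one node don't collide), the compiler flags and compile_and_load() itself are illustrative choices. The generated source must define an extern "C" function so dlsym() can find it by its unmangled name. On glibc older than 2.34, the host program also needs to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

using ExprFn = double (*)(const double* c, const double* v);

// Compile `source` (which must define extern "C" double fn_name(const double*, const double*))
// into a shared object, load it, and return the function, or nullptr on failure.
// The handle is deliberately never dlclose()d, so the pointer stays valid for the whole run.
ExprFn compile_and_load(const std::string& source, const std::string& fn_name) {
    const std::string base = "/tmp/gp_expr_" + std::to_string(getpid()) + "_" + fn_name;
    const std::string src = base + ".cpp", lib = base + ".so";

    {   // write the source; the file is closed at the end of this scope, before compiling
        std::ofstream out(src);
        out << source;
        if (!out) return nullptr;
    }

    const std::string cmd = "c++ -O2 -shared -fPIC -o " + lib + " " + src;
    if (std::system(cmd.c_str()) != 0) return nullptr;

    void* handle = dlopen(lib.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen: %s\n", dlerror());
        return nullptr;
    }

    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(handle, fn_name.c_str());
    if (const char* err = dlerror()) {
        std::fprintf(stderr, "dlsym: %s\n", err);
        return nullptr;
    }
    return reinterpret_cast<ExprFn>(sym);
}
```

Compiler start-up is paid once per system() call, so it is probably worth batching many expressions into one translation unit (one extern "C" function each) and looking each one up with dlsym().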

If you've got this far, thank you for reading. If you have any useful comments on good (or bad) ways to do this, they will be greatly appreciated...

Thanks and Best Wishes
Bob