Goal of this exercise is to learn and understand genetic algorithms by finding an optimal configuration for highly configurable software systems
Can be for example the Linux-Kernel or SQLite.
suppose we would measure each variant of SQLite:
2^88 variants and 5 minutes per measurement (compile + benchmark)
= 2^88 * 5min / 60 (per h) / 24 (per day) / 365 (per year)
= 2,944,111,585,058,457,655,296 years
this is why we use a meta-heuristic, since we don't want to wait that long ;)
Highly configurable software systems are systems which provide many features to be selected for creating an individual variant of a program.
However, selecting and deselecting specific features for a configuration can result in efficient or an inefficient configurations of the program.
For the exercise, we want to find an optimal configuration for the h264 and bdbc systems.
h264 is a block-oriented motion-compensation-based video compression standard (source).
Berkeley DB (bdb) is a software library intended to provide a high-performance embedded database for key/value data (source). The given dataset is based on the c-implementation of bdb.
bdbc_f = "code/datasets/bdbc_feature.txt" bdbc_i = "code/datasets/bdbc_interactions.txt" h264_f = "code/datasets/h264_feature.txt" h264_i = "code/datasets/h264_interactions.txt" path = bdbc_f #path = bdbc_i #path = h264_f #path = h264_i with open(path) as f: content = f.read() print(content)
root: 4.29699011265453 HAVE_CRYPTO: 0.0120768614910988 HAVE_HASH: 4.62346632789263 HAVE_REPLICATION: 0.274902720516604 HAVE_VERIFY: 0.0534583494140551 HAVE_SEQUENCE: 0.468274246157709 HAVE_STATISTICS: 0.422176987829888 DIAGNOSTIC: 0.00352722010086667 PS1K: 0.789118194698634 PS4K: 0.505426563786175 PS8K: 1.69334751326048 PS16K: 4.45418742564813 PS32K: 0.746689251767928 CS32MB: 3.86431506487224 CS16MB: 2.95905633060594 CS64MB: 2.58412079773263 CS512MB: 5.34132772539305
4.29 + HAVE_CRYPTO 0.01 + HAVE_HASH 4.63 + HAVE_REPLICATION * 0.27 + ... +
PS16K CS512MB (-4.05) + PS1K HAVE_REPLICATION (- .73) + ...
How to build a genetic algorithm? See lecture nodes for that.
How to model an individual candidate solution?
For example when selecting only the feature "root" in bdbc:
The fitness of this candidate solution is: 4.29699011265453
Compare your Algorithm with these values (my best performing solutions):