run-bench
The objective of run-bench is to measure the execution time of a
set of tools (i.e., executable programs such as compilers) and
tool options on several data files (e.g., source files).
run-bench has been designed:
- to be tool agnostic;
- to make it easy to
  - compare the performance of several tools,
  - compare the performance of several options of the same tool,
  - take advantage of multi-core architectures,
  - generate tables and graphics;
- to work in a reproducible manner: more runs are done until the Confidence Interval is small enough.
1. TL;DR
- install
git clone https://gricad-gitlab.univ-grenoble-alpes.fr/verimag/reproducible-research/run-bench.git
cd run-bench
opam install --deps-only ./run-bench.opam
make && make install
- use
cd expe
git clone https://github.com/jahierwan/lustre-examples.git
run-bench -j compare-lv6-execution-systems.yml compare-lv6-execution-systems.txt
will run the jobs defined in expe/compare-lv6-execution-systems.yml on the programs listed in expe/compare-lv6-execution-systems.txt.
2. Initial Set-up
3. Jobs
Jobs are defined in a yaml file that contains one or several mappings:
- each mapping goes from a <job-name> to a sequence of <job-phase>;
- each <job-phase> is itself a mapping from a <phase-name> to a sequence of shell commands.
For example, the following yaml file:
lv6-via-c:
  - copy-files:
      - echo "One must copy there all necessary files in $workdir"
      - mkdir -p $workdir/$1
      - cp $2/$3* $workdir/$1/
  - lus2c:
      - cd $workdir/$1
      - basename=$3
      - node=$4
      - echo "let's compile $basename.lus using $node as main node"
      - cat $basename.stdin | lv6 $basename.lus -n $node -2c
  - c2exe:
      - echo "let's compile the generated C code ($4.sh is generated by lv6 -2c)"
      - cd $workdir/$1
      - sh $4.sh
  - run:
      - ci_rel_size=0.01
      - cd $workdir/$1
      - cat $3.stdin | ./$4.exec
defines 1 job named lv6-via-c. This job is made of a sequence of 4
phases: copy-files, lus2c, c2exe, and run. Each phase is defined
as a sequence of shell commands. The job parameters ($1, $2,
etc.) are described in the next section. If one phase fails, the
remaining ones in the sequence are not executed.
$workdir is by default the current directory (the one run-bench
was run from), but it can be changed via the --work-dir CLI option
(cf run-bench -h).
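To make these phase semantics concrete, here is a minimal Python sketch of how one job could be run on one line of the benchmark file. It is not run-bench's actual implementation. It assumes that the commands of a phase are joined into a single sh script, that $workdir is passed as an environment variable, that the blank-separated strings of the line become $1, $2, etc., and that a phase "fails" when its script exits with a non-zero status. The job demo-job and its phases are hypothetical.

```python
import os
import subprocess
import tempfile

# Hypothetical job, written as a Python dict instead of yaml:
# job name -> ordered list of (phase name, shell commands).
JOBS = {
    "demo-job": [
        ("prepare", ['mkdir -p "$workdir/$1"', 'echo "label=$1 dir=$2"']),
        ("fail", ["false"]),
        ("never-run", ["echo this phase is skipped"]),
    ],
}


def run_job(job_name, phases, bench_line, workdir):
    """Run the phases of one job, in order, on one benchmark-file line.

    Returns the (phase name, exit status) pairs of the phases that were
    executed; execution stops at the first phase whose script fails.
    """
    args = bench_line.split()                # become $1, $2, ... in the phases
    env = dict(os.environ, workdir=workdir)  # assumption: $workdir via env
    executed = []
    for phase_name, commands in phases:
        script = "\n".join(commands)         # one sh script per phase
        # "sh -c SCRIPT NAME ARG1 ARG2 ..." sets $0=NAME and $1, $2, ... = ARGs
        proc = subprocess.run(
            ["sh", "-c", script, f"{job_name}-{phase_name}", *args], env=env
        )
        executed.append((phase_name, proc.returncode))
        if proc.returncode != 0:
            break                            # the remaining phases are not run
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as wd:
        print(run_job("demo-job", JOBS["demo-job"], "speed some/dir speed speed", wd))
```

Since the exit status of a script is that of its last command, a failing command in the middle of a phase would go unnoticed in this sketch unless the phase starts with set -e.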
Each phase can set the ci_rel_size variable (to a value in ]0; 1]),
as the run phase of the example above does. If it is unset, the
phase command sequence is run only once. Otherwise, several (n) runs
are performed in order to compute an average phase execution time
av(t) that is precise enough.
More precisely, runs are repeated until the size of the 95%
Confidence Interval (CI), 4 x sigma(t)/sqrt(n) [1], where sigma(t)
is the observed standard deviation of the execution time t, is
smaller than ci_rel_size x av(t). You can set ci_rel_size to 0.1 to
quickly get a rough idea of the execution time, and to (e.g.) 0.01
to get more precise (and more reproducible) results.
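The stopping rule above can be sketched in a few lines of Python. This is an illustration, not run-bench's code: run_once() is a hypothetical stand-in for one timed execution of the phase (run-bench reports user+sys time), sigma(t) is assumed to be the sample standard deviation, and the min_runs/max_runs defaults mirror the --min-run-nb and --max-run-nb CLI defaults.

```python
import math
import statistics


def measure_phase(run_once, ci_rel_size=None, min_runs=10, max_runs=1000):
    """Repeat run_once() until the 95% CI is small enough.

    run_once() must return the execution time of one run of the phase.
    Returns (number of runs, average execution time).
    """
    if ci_rel_size is None:                # ci_rel_size unset: one run only
        return 1, run_once()
    times = []
    while len(times) < max_runs:
        times.append(run_once())
        n = len(times)
        if n < max(min_runs, 2):           # the standard deviation needs n >= 2
            continue
        mean = statistics.mean(times)
        ci_size = 4 * statistics.stdev(times) / math.sqrt(n)  # sample std. dev.
        if ci_size < ci_rel_size * mean:   # precise enough: stop
            break
    return len(times), statistics.mean(times)


if __name__ == "__main__":
    # a fake phase whose timing never varies stops after min_runs runs
    print(measure_phase(lambda: 0.25, ci_rel_size=0.01))
```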
With ci_rel_size set to 1, you get a result with 1 significant
digit; with 0.1, 2 significant digits; with 0.01, 3 significant
digits; and so on. Alternatively, run-bench also accepts the
variable significant_digits, set to a positive number: for some n,
writing significant_digits=n is equivalent to writing ci_rel_size=v
with v=10^-n.
Beware: small values of ci_rel_size can lead to a lot of runs, as
the size of the CI only decreases as 1/sqrt(n): dividing it by 2
requires 4 times more runs.
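Inverting the stopping rule gives a rough estimate of that cost: 4 x sigma/sqrt(n) < ci_rel_size x av holds once n > (4 x cv/ci_rel_size)^2, where cv = sigma/av is the relative spread of the execution time. The following Python sketch assumes cv stays stable across runs; the value cv=0.037 used below is an arbitrary illustration, not a measurement, and the clamping bounds mirror the --min-run-nb/--max-run-nb defaults.

```python
import math


def runs_needed(cv, ci_rel_size, min_runs=10, max_runs=1000):
    """Estimated number of runs before 4*sigma/sqrt(n) < ci_rel_size*mean.

    cv is the coefficient of variation sigma/mean of the execution time.
    The result is clamped to [min_runs, max_runs].
    """
    # 4*cv/sqrt(n) < r  <=>  n > (4*cv/r)**2 ; take the smallest such integer
    n = math.floor((4 * cv / ci_rel_size) ** 2) + 1
    return min(max(n, min_runs), max_runs)


if __name__ == "__main__":
    for r in (0.1, 0.05, 0.02, 0.01, 0.001):
        print(f"ci_rel_size={r}: about {runs_needed(0.037, r)} runs")
```

Halving ci_rel_size from 0.02 to 0.01 multiplies the estimate by 4 (55 vs 220 runs for this cv), and at 0.001 the estimate exceeds the default maximum of 1000 runs.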
4. Files Input Format
run-bench expects as argument a file that contains the list of data
to benchmark. Each line, made of strings separated by blanks, defines
all the information necessary to run the jobs. The first string of a
line is accessed in jobs via $1, the second one via $2, etc.
metro-train metro.lus train ../files/verimag-v6/examples/
edge-rising edge.lus rising ../files/verimag-core/examples/
edge-falling edge.lus falling ../files/verimag-core/examples/
In the example above, each line is made of a label, a Lustre file
name, a node name, and a directory. Note that run-bench itself does
not care about the content of the lines, but the job definitions do.
In other words, the information in each line can be given in any order; it is the user's responsibility to use an order that is consistent with the job definitions.
Note however that the first string of each line will be used as the program label in the produced files. Therefore each line should have a unique label, so that the various programs can be distinguished.
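As an illustration of this convention, here is a minimal Python sketch that splits a benchmark file into its fields (field k being what jobs see as $k) and rejects duplicate labels. The duplicate check is our own suggestion: run-bench itself may not perform it. Skipping blank lines is also an assumption.

```python
EXAMPLE = """\
metro-train metro.lus train ../files/verimag-v6/examples/
edge-rising edge.lus rising ../files/verimag-core/examples/
edge-falling edge.lus falling ../files/verimag-core/examples/
"""


def parse_bench_file(text):
    """Return one list of fields per non-blank line ($1 is fields[0], ...).

    Raises ValueError when two lines share the same label (first field).
    """
    entries, seen = [], set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()              # blank-separated strings
        if not fields:
            continue                       # assumption: blank lines are skipped
        label = fields[0]
        if label in seen:
            raise ValueError(f"line {lineno}: duplicate label {label!r}")
        seen.add(label)
        entries.append(fields)
    return entries


if __name__ == "__main__":
    for fields in parse_bench_file(EXAMPLE):
        # with this file's convention: $1=label, $2=file, $3=node, $4=directory
        print({f"${k}": v for k, v in enumerate(fields, start=1)})
```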
5. Output data files
Each job execution leads to
- an .org file (Emacs-friendly),
- a .raw file (R-friendly), and
- a .csv file (spreadsheet-friendly).
The 3 kinds of files contain the same information: some statistics on the corresponding execution of the job phases. An example is described in Section 7.
The benefit of .org files is that, when read with Emacs, they
contain links to the script and log files. Moreover, .html or .pdf
files (that also contain those links) can easily be generated from
them using something like:
echo "| program label | phase 1 | #run | time |" > table.org
cat res/*.org >> table.org
emacs -batch --visit=table.org --funcall org-latex-export-to-latex && pdflatex table.tex
emacs -batch --visit=table.org --funcall org-html-export-to-html
6. The Command Line Interface (CLI)
$ run-bench -h
run-bench [options]* -j <jobs>.yml <benchs>
where
<jobs>.yml yaml file with naming conventions
<benchs> is a file made of a list of space-separated strings.
More information on https://gricad-gitlab.univ-grenoble-alpes.fr/Reproducible-Research/run-bench
--jobs-file <string>.yml
-j <string>.yml set the jobs file name (default is jobs.yml)
--cores-nb <int>
-n <int> set the max number of jobs to run in parallel (default is 1)
--timeouts <float>
-t <float> (in sec) set jobs timeout (default is 360.00)
--min-run-nb <int>
-min <int> Minimum number of run for a command (default is 10)
--max-run-nb <int>
-max <int> Maximum number of run for a command (default is 1000)
--log-dir <string>
-log <string> set where .log files are generated (default is $TESTCASE_ROOT/log)
--work-dir <string>
-work <string> set where the experiments is run (default is $TESTCASE_ROOT/work)
--res-dir <string>
-res <string> set where the results are generated (default is $TESTCASE_ROOT/res)
-org generate the result of each run a .org file
-csv generate the result of each run a .csv file
-raw generate the result of each run a .raw file
--verbose
-verbose
-v set on a verbose mode
-h display this help message
7. A simple example: comparing 2 ways of executing Lustre V6 programs
Lustre V6 programs can be executed in several ways. Let's focus on 2 of them:
- one based on the C code generator, and
- one based on the lv6 embedded interpreter (lv6 -exec).
lv6-via-c:
  - copy-files:
      - echo "One must copy there all necessary files in $workdir"
      - mkdir -p $workdir/$1
      - cp $2/$3* $workdir/$1/
  - lus2c:
      - cd $workdir/$1
      - basename=$3
      - node=$4
      - echo "let's compile $basename.lus using $node as main node"
      - cat $basename.stdin | lv6 $basename.lus -n $node -2c
  - c2exe:
      - echo "let's compile the generated C code ($4.sh is generated by lv6 -2c)"
      - cd $workdir/$1
      - sh $4.sh
  - run:
      - ci_rel_size=0.01
      - cd $workdir/$1
      - cat $3.stdin | ./$4.exec
lv6-via-exec:
  - copy-files:
      - mkdir -p $workdir/$1
      - cp $2/$3* $workdir/$1/
  - lus2c:
      - echo "nothing to do"
  - c2exe:
      - echo "nothing to do"
  - run:
      - ci_rel_size=0.01
      - cd $workdir/$1
      - basename=$3
      - node=$4
      - cat $3.stdin | lv6 $basename.lus -n $node -exec
cf expe/compare-lv6-execution-systems.yml
We can use Lustre programs from the repo: https://github.com/jahierwan/lustre-examples
cd expe
git clone https://github.com/jahierwan/lustre-examples.git
Let's focus on 2 Lustre programs from this repository:
speed lustre-examples/verimag-v6/examples speed speed
transpose lustre-examples/verimag-v6/examples transpose test_transpose
cf expe/compare-lv6-execution-systems.txt
The convention used by the jobs of this example is that each line is made of a label, a directory, a Lustre file basename, and a main node.
In order to execute Lustre programs, we need to provide inputs.
speed.lus has 2 Boolean inputs. Let's randomly choose some values
for 100 steps, and generate a speed.stdin file in the same
directory as speed.lus (so that the cp $2/$3* $workdir/$1/ command
catches it).
(for ((a=1; a <= 100 ; a++)) do echo "$(($RANDOM % 2)) $(($RANDOM % 2)) " ; done; echo "q") > \
   lustre-examples/verimag-v6/examples/speed.stdin
cf expe/lustre-examples/verimag-v6/examples/speed.stdin
The node test_transpose in transpose.lus requires 8 integers per step:
(for ((a=1; a <= 100 ; a++)) do \
    echo "$RANDOM $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM ";\
 done; echo "q") > lustre-examples/verimag-v6/examples/transpose.stdin
cf expe/lustre-examples/verimag-v6/examples/transpose.stdin
Now we can run the benchmarks:
cd expe
run-bench -j compare-lv6-execution-systems.yml compare-lv6-execution-systems.txt
/home/jahier/run-bench/expe/work/lv6-via-c-copy-files.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-c-lus2c.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-c-c2exe.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-c-run.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-copy-files.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-lus2c.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-c2exe.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-run.sh has been generated
a phase run of job speed__lv6-via-c finished, cf /home/jahier/run-bench/expe/log/speed__lv6-via-c_run_4c253d.log
a phase run of job transpose__lv6-via-c finished, cf /home/jahier/run-bench/expe/log/transpose__lv6-via-c_run_b0d55f.log
a phase run of job speed__lv6-via-exec finished, cf /home/jahier/run-bench/expe/log/speed__lv6-via-exec_run_c611ca.log
a phase run of job transpose__lv6-via-exec finished, cf /home/jahier/run-bench/expe/log/transpose__lv6-via-exec_run_ba53b1.log
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.org has been generated
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.data has been generated
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.csv has been generated
cf /home/jahier/run-bench/expe/log/ for log files; and /home/jahier/run-bench/expe/res/ for data files
As written above, several files were generated. For instance, the
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.org
contains:
| program | job | copy-files-count | copy-files-times | lus2c-count | lus2c-times | c2exe-count | c2exe-times | run-count | run-times |
| speed | lv6-via-c | 1 | 0.00687 | 1 | 0.03632 | 1 | 0.25305 | 10 | 0.01842 |
| transpose | lv6-via-c | 1 | 0.01730 | 1 | 0.06618 | 1 | 0.27608 | 10 | 0.00614 |
| speed | lv6-via-exec | 1 | 0.00485 | 1 | 0.00183 | 1 | 0.00172 | 20 | 0.26417 |
| transpose | lv6-via-exec | 1 | 0.01598 | 1 | 0.00605 | 1 | 0.00598 | 12 | 0.04243 |
Each line corresponds to the execution of a job (column 2) on a program (column 1). Columns 3-4 give information about the first phase, columns 5-6 about the second phase, and so on.
More precisely, for phase 1 (resp. phase i):
- column 3 (resp. 3+2*(i-1)) contains the number of times phase 1 (resp. phase i) was run, and
- column 4 (resp. 4+2*(i-1)) contains the average (user+sys) execution time of phase 1 (resp. phase i).
nb: this .org file contains links to script and log files that were
generated by run-bench.
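To post-process such a result file programmatically, here is a minimal Python sketch that parses an org table and applies the column formulas above. It assumes that table lines start with |, that separator rules start with |-, and that cells holding links use the org [[target][description]] syntax, in which case only the description is kept. RESULT is the table shown above; parse_org_table and phase_stats are hypothetical helper names.

```python
import re

# [[target][description]] -> description (the .org results may contain links)
ORG_LINK = re.compile(r"\[\[[^\]]*\]\[([^\]]*)\]\]")

RESULT = """\
| program | job | copy-files-count | copy-files-times | lus2c-count | lus2c-times | c2exe-count | c2exe-times | run-count | run-times |
| speed | lv6-via-c | 1 | 0.00687 | 1 | 0.03632 | 1 | 0.25305 | 10 | 0.01842 |
| transpose | lv6-via-c | 1 | 0.01730 | 1 | 0.06618 | 1 | 0.27608 | 10 | 0.00614 |
| speed | lv6-via-exec | 1 | 0.00485 | 1 | 0.00183 | 1 | 0.00172 | 20 | 0.26417 |
| transpose | lv6-via-exec | 1 | 0.01598 | 1 | 0.00605 | 1 | 0.00598 | 12 | 0.04243 |
"""


def parse_org_table(text):
    """Return (header, rows) of an org table, as lists of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|") or line.startswith("|-"):
            continue                       # skip other lines and |---+---| rules
        cells = line.strip("|").split("|")
        rows.append([ORG_LINK.sub(r"\1", c).strip() for c in cells])
    return rows[0], rows[1:]


def phase_stats(row, i):
    """(number of runs, average time) of phase i (1-based) in a result row."""
    count_col = 3 + 2 * (i - 1)            # 1-based column numbers, as above
    time_col = 4 + 2 * (i - 1)
    return int(row[count_col - 1]), float(row[time_col - 1])


if __name__ == "__main__":
    header, rows = parse_org_table(RESULT)
    for row in rows:
        runs, avg = phase_stats(row, 4)    # phase 4 is "run" in this example
        print(f"{row[0]:<10} {row[1]:<13} {runs:>4} runs, {avg:.5f} s")
```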
7.1. Plotting
From the output files (org, csv, or raw), one can use third-party
tools (e.g., Excel, R, or gnuplot) to generate nice graphics.
For example, using R:
library(ggplot2)
# read the data file; its first row contains the column names
data <- read.table("./result.data", stringsAsFactors = FALSE)
colnames(data) <- as.character(unlist(data[1, ]))
data <- data[-1, ]
# the header row forced every column to text: convert the times back to numbers
data$run_times <- as.numeric(data$run_times)
plot <- ggplot(data = data, aes(x = program, y = run_times, fill = job)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_hue(name = "Execution system") +
  xlab("Programs") + ylab("Time in seconds") +
  ggtitle("Comparing 2 Execution systems")
pdf("compare-lv6-execution-systems.pdf")
print(plot)
dev.off()
8. Future Features
8.1. TODO The measurements should be configurable
Use times by default, and a user-provided program otherwise.
Something like (in the phase definition, via an environment variable similar to
ci_rel_size):
- measure_command="times"
- measure_command="cat f.log | grep step | cut -d ' ' -f 3"
8.2. TODO Add a mode where the number of runs n is fixed, and the size of the CI is provided as an extra column
- runs_number=100
8.3. TODO Copy the output of /proc/loadavg every minute into log/loadavg.log
because it is interesting to track the content of that file during the experiments.
Footnotes:
[1] The 4 in this formula approximates twice the 97.5th percentile
of Student's t distribution with n-1 degrees of freedom (i.e., the
critical value of a two-sided 95% interval). The exact value, which
tends towards 2 x 1.96 as n grows, can be found in the 95% two-sided
column of
https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values.