https://gricad-gitlab.univ-grenoble-alpes.fr/verimag/reproducible-research/run-bench/tree/main

run-bench

The objective of run-bench is to measure the execution time of a set of tools (i.e., executable programs such as compilers) and tool options on several data files (e.g., source files).

run-bench has been designed:

  1. to be tool agnostic
  2. to make it easy to
    • compare the performance of several tools
    • compare the performance of several options of the same tool
    • take advantage of multi-core architectures
    • generate tables and graphics
  3. to produce reproducible results: runs are repeated until the Confidence Interval is small enough.

1. TL;DR

  • install
git clone https://gricad-gitlab.univ-grenoble-alpes.fr/verimag/reproducible-research/run-bench.git
cd run-bench
opam install --deps-only ./run-bench.opam
make && make install
  • use
cd expe
git clone https://github.com/jahierwan/lustre-examples.git
run-bench -j compare-lv6-execution-systems.yml compare-lv6-execution-systems.txt

This runs the jobs defined in expe/compare-lv6-execution-systems.yml on the programs listed in expe/compare-lv6-execution-systems.txt.

2. Initial Set-up

You need to have git, opam, and make.

git clone https://gricad-gitlab.univ-grenoble-alpes.fr/verimag/reproducible-research/run-bench.git
cd run-bench
opam install --deps-only ./run-bench.opam
make && make install

3. Jobs

Jobs are defined in a YAML file that contains

  • one or several mappings from <job-name> to a sequence of <job-phase>.
  • Each <job-phase> is itself a mapping from <phase-name> to sequences of shell commands.

For example, the following yaml file:

lv6-via-c:
  - copy-files:
    - echo "One must copy there all necessary files in $workdir"
    - mkdir -p $workdir/$1
    - cp $2/$3* $workdir/$1/
  - lus2c:
    - cd $workdir/$1
    - basename=$3
    - node=$4
    - echo "let's compile $basename.lus using $node as main node"
    - cat $basename.stdin | lv6 $basename.lus -n $node -2c
  - c2exe:
    - echo "let's compile the generated C code ($4.sh is generated by lv6 -2c)"
    - cd $workdir/$1
    - sh $4.sh
  - run:
    - ci_rel_size=0.01
    - cd $workdir/$1
    - cat $3.stdin | ./$4.exec

defines one job named lv6-via-c, made of a sequence of 4 phases: copy-files, lus2c, c2exe, and run. Each phase is defined as a sequence of shell commands. The job parameters ($1, $2, etc.) are described in the next section. If one phase fails, the remaining phases of the sequence are not executed.
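
The "stop at the first failing phase" rule can be sketched as follows. This is only an illustration in Python, not run-bench's actual implementation: the names run_phases and runner are hypothetical, and running each phase as a single sh script with set -e is an assumption of this sketch.

```python
import subprocess

def run_phases(phases, runner=None):
    """Run each phase's commands in order; stop at the first failing phase.

    `phases` is a list of (phase_name, [shell commands]) pairs.
    Returns the names of the phases that completed successfully.
    """
    if runner is None:
        # Assumption: a phase is run as one shell script that aborts on error.
        def runner(commands):
            script = "set -e\n" + "\n".join(commands)
            return subprocess.run(["sh", "-c", script]).returncode
    completed = []
    for name, commands in phases:
        if runner(commands) != 0:
            break  # the remaining phases are not executed
        completed.append(name)
    return completed
```

For instance, if the lus2c phase fails, run_phases returns only ["copy-files"], and neither c2exe nor run is attempted.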

$workdir is by default the current directory (the one run-bench was run from), but it can be changed via the --work-dir CLI option (cf run-bench -h).

Each phase can set the ci_rel_size environment variable (in ]0; 1], i.e., greater than 0 and at most 1). If it is unset, the phase command sequence is run only once. Otherwise, several (n) runs are performed in order to compute an average phase execution time av(t) that is precise enough.

More precisely, runs are repeated until the size of the 95% Confidence Interval (CI), 4 x sigma(t)/sqrt(n) (see footnote 1), where sigma(t) is the observed standard deviation of the execution time t, is smaller than ci_rel_size x av(t). You can set ci_rel_size to 0.1 to quickly get a rough idea of the execution time, and to (e.g.) 0.01 to get more precise (reproducible) results.
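
The stopping rule can be sketched in Python as follows. This is an illustration under stated assumptions, not run-bench's code: the function average_time and its measure argument are hypothetical, the bounds 10 and 1000 are taken from the --min-run-nb and --max-run-nb defaults shown in the CLI section, and using the sample standard deviation (n-1 denominator) is an assumption.

```python
import math
import statistics

def average_time(measure, ci_rel_size, min_runs=10, max_runs=1000):
    """Repeat `measure()` until the 95% CI is narrow enough.

    `measure` is any function returning one execution time in seconds.
    Returns (average time, number of runs).
    """
    times = [measure() for _ in range(min_runs)]
    while len(times) < max_runs:
        mean = statistics.mean(times)
        # size of the CI: 4 x sigma(t) / sqrt(n)
        ci_size = 4 * statistics.stdev(times) / math.sqrt(len(times))
        if ci_size < ci_rel_size * mean:
            break  # precise enough
        times.append(measure())
    return statistics.mean(times), len(times)
```

With a perfectly stable measurement the loop stops right after the minimum number of runs; with a noisy one it keeps going until either the CI is small enough or the maximum number of runs is reached.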

With ci_rel_size set to 1, you get a result with 1 significant digit; with 0.1, 2 significant digits; with 0.01, 3 significant digits; and so on. run-bench also lets you define the variable significant_digits as a positive number: for some n, writing significant_digits=n is equivalent to writing ci_rel_size=v with v=10^-n.

Beware: small values of ci_rel_size can lead to a lot of runs, as the size of the CI decreases slowly when n grows (it is proportional to 1/sqrt(n)).
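
To get a feeling for how fast the number of runs grows, one can solve 4 x sigma(t)/sqrt(n) <= ci_rel_size x av(t) for n, which gives n >= (4 x cv / ci_rel_size)^2 with cv = sigma(t)/av(t). The small Python sketch below does this computation; the function name runs_needed and the example value cv = 0.05 are hypothetical and only serve the illustration.

```python
import math

def runs_needed(cv, ci_rel_size):
    """Estimate the number of runs n such that 4 * cv / sqrt(n) <= ci_rel_size.

    `cv` is the relative dispersion sigma(t) / av(t) of the measurements.
    """
    return math.ceil((4 * cv / ci_rel_size) ** 2)

# Dividing ci_rel_size by 10 multiplies the number of runs by about 100.
for ci_rel_size in (0.1, 0.01, 0.001):
    print(ci_rel_size, runs_needed(0.05, ci_rel_size))
```

With a 5% dispersion, this gives roughly 4, 400, and 40,000 runs; the last value is well above the default --max-run-nb of 1000.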

4. Files Input Format

run-bench expects as argument a file that contains a list of data to benchmark. Each line, made of strings separated by blanks, provides all the information necessary to run the jobs. The first string is accessed in jobs via $1, the second string via $2, etc.

metro-train  metro.lus train    ../files/verimag-v6/examples/
edge-rising  edge.lus  rising   ../files/verimag-core/examples/
edge-falling edge.lus  falling  ../files/verimag-core/examples/

In the example above, each line is made of a label, a Lustre base file name, a node name, and a directory. Note that run-bench does not care about the content of the lines, but the job definitions do.

In other words, the information in each line can be in any order; it is the user's responsibility to use an order that is consistent with the job definitions.

Note however that the first string of each line is used as the program label in the produced files. Therefore, one should use a unique label per line to be able to distinguish the various programs.
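
The mapping from a line to the positional parameters can be sketched as follows in Python. The function parse_benchs is hypothetical; skipping empty lines and rejecting duplicate labels are choices of this sketch, not documented run-bench behavior.

```python
def parse_benchs(text):
    """Map each label ($1) to its positional parameters {"$1": ..., "$2": ...}."""
    benchs = {}
    for line in text.splitlines():
        fields = line.split()  # strings separated by blanks
        if not fields:
            continue  # assumption: empty lines are ignored
        label = fields[0]
        if label in benchs:
            # choice of this sketch: enforce the "unique label" advice
            raise ValueError(f"duplicate label: {label}")
        benchs[label] = {f"${i}": value for i, value in enumerate(fields, start=1)}
    return benchs

example = """\
metro-train  metro.lus train    ../files/verimag-v6/examples/
edge-rising  edge.lus  rising   ../files/verimag-core/examples/
edge-falling edge.lus  falling  ../files/verimag-core/examples/
"""
params = parse_benchs(example)
print(params["edge-rising"]["$3"])  # rising
```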

5. Output data files

Each Job execution leads to

  • an .org file (Emacs-friendly)
  • a .raw file (R-friendly)
  • a .csv file (spreadsheet-friendly)

The 3 kinds of files contain the same information: some statistics on the corresponding execution of the job phases. An example is described in Section 7.

The benefit of .org files is that, when read via emacs, they contain links to script and log files. Moreover, .html or .pdf files (that also contain links) can easily be generated out of them, using something like

  echo "| program label | phase 1 | #run | time |"> table.org
  cat res/*.org  >> table.org
  emacs -batch --visit=table.org --funcall org-latex-export-to-latex && pdflatex table.tex
  emacs -batch --visit=table.org --funcall org-html-export-to-html

6. The Command Line Interface (CLI)

  $ run-bench -h
  run-bench [options]* -j <jobs>.yml <benchs>
  where
    <jobs>.yml yaml file with naming conventions
    <benchs> is a file made of a list of space-separated strings.
  
  More information on https://gricad-gitlab.univ-grenoble-alpes.fr/Reproducible-Research/run-bench
  
    --jobs-file <string>.yml
    -j <string>.yml      set the jobs file name (default is jobs.yml)
    --cores-nb <int>
    -n <int>             set the max number of jobs to run in parallel (default is 1)
    --timeouts <float>
    -t <float>           (in sec) set jobs timeout (default is 360.00)
    --min-run-nb <int>
    -min <int>           Minimum number of run for a command (default is 10)
    --max-run-nb <int>
    -max <int>           Maximum number of run for a command (default is 1000)
    --log-dir <string>
    -log <string>        set where .log files are generated (default is $TESTCASE_ROOT/log)
    --work-dir <string>
    -work <string>       set where the experiments is run (default is $TESTCASE_ROOT/work)
    --res-dir <string>
    -res <string>        set where the results are generated (default is $TESTCASE_ROOT/res)
    -org                 generate the result of each run a .org file
    -csv                 generate the result of each run a .csv file
    -raw                 generate the result of each run a .raw file
    --verbose  
    -verbose  
    -v           set on a verbose mode
    -h           display this help message

7. A simple example: compare 2 manners of executing Lustre V6 programs

Lustre V6 programs can be executed in several manners. Let's focus on 2 of them:

  • one based on the C code generator, and
  • one based on the lv6 embedded interpreter.

The corresponding jobs can be defined as follows:

lv6-via-c:
  - copy-files:
    - echo "One must copy there all necessary files in $workdir"
    - mkdir -p $workdir/$1
    - cp $2/$3* $workdir/$1/
  - lus2c:
    - cd $workdir/$1
    - basename=$3
    - node=$4
    - echo "let's compile $basename.lus using $node as main node"
    - cat $basename.stdin | lv6 $basename.lus -n $node -2c
  - c2exe:
    - echo "let's compile the generated C code ($4.sh is generated by lv6 -2c)"
    - cd $workdir/$1
    - sh $4.sh
  - run:
    - ci_rel_size=0.01
    - cd $workdir/$1
    - cat $3.stdin | ./$4.exec

lv6-via-exec:
  - copy-files:
    - mkdir -p $workdir/$1
    - cp $2/$3* $workdir/$1/
  - lus2c:
    - echo "nothing to do"
  - c2exe:
    - echo "nothing to do"
  - run:
    - ci_rel_size=0.01
    - cd $workdir/$1
    - basename=$3
    - node=$4
    - cat $3.stdin | lv6 $basename.lus -n $node -exec

cf expe/compare-lv6-execution-systems.yml

We can use Lustre programs from the repo: https://github.com/jahierwan/lustre-examples

cd expe
git clone https://github.com/jahierwan/lustre-examples.git

Let's focus on 2 Lustre programs from this repository:

speed lustre-examples/verimag-v6/examples speed speed
transpose lustre-examples/verimag-v6/examples transpose test_transpose

cf expe/compare-lv6-execution-systems.txt

The convention used by the jobs in this example is that each line is made of a label, a directory, a Lustre file basename, and a main node.

In order to execute Lustre programs, we need to provide inputs.

speed.lus has 2 Boolean inputs. Let's randomly choose some values for 100 steps, and generate a speed.stdin file in the same directory as speed.lus (so that the cp $2/$3* $workdir/$1/ command of the copy-files phase catches it).

(for ((a=1; a <= 100 ; a++)) do echo "$(($RANDOM % 2)) $(($RANDOM % 2)) " ; done; echo "q") > \
    lustre-examples/verimag-v6/examples/speed.stdin

cf expe/lustre-examples/verimag-v6/examples/speed.stdin

The node test_transpose in transpose.lus requires 8 integers per step:

(for ((a=1; a <= 100 ; a++)) do \
 echo "$RANDOM $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM ";\
 done; echo "q") > lustre-examples/verimag-v6/examples/transpose.stdin

cf expe/lustre-examples/verimag-v6/examples/transpose.stdin

Now we can run the benchmarks:

run-bench -j compare-lv6-execution-systems.yml compare-lv6-execution-systems.txt
/home/jahier/run-bench/expe/work/lv6-via-c-copy-files.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-c-lus2c.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-c-c2exe.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-c-run.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-copy-files.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-lus2c.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-c2exe.sh has been generated
/home/jahier/run-bench/expe/work/lv6-via-exec-run.sh has been generated
a phase run of job speed__lv6-via-c finished, cf /home/jahier/run-bench/expe/log/speed__lv6-via-c_run_4c253d.log
a phase run of job transpose__lv6-via-c finished, cf /home/jahier/run-bench/expe/log/transpose__lv6-via-c_run_b0d55f.log
a phase run of job speed__lv6-via-exec finished, cf /home/jahier/run-bench/expe/log/speed__lv6-via-exec_run_c611ca.log
a phase run of job transpose__lv6-via-exec finished, cf /home/jahier/run-bench/expe/log/transpose__lv6-via-exec_run_ba53b1.log
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.org has been generated
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.data has been generated
/home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.csv has been generated
cf /home/jahier/run-bench/expe/log/ for log files; and /home/jahier/run-bench/expe/res/ for data files

As reported above, several files were generated. For instance, the file /home/jahier/run-bench/expe/res/result-10-0-2024_15-21-56.org contains:

| program   | job          | copy-files-count | copy-files-times | lus2c-count | lus2c-times | c2exe-count | c2exe-times | run-count | run-times |
| speed     | lv6-via-c    | 1                | 0.00687          | 1           | 0.03632     | 1           | 0.25305     | 10        | 0.01842   |
| transpose | lv6-via-c    | 1                | 0.01730          | 1           | 0.06618     | 1           | 0.27608     | 10        | 0.00614   |
| speed     | lv6-via-exec | 1                | 0.00485          | 1           | 0.00183     | 1           | 0.00172     | 20        | 0.26417   |
| transpose | lv6-via-exec | 1                | 0.01598          | 1           | 0.00605     | 1           | 0.00598     | 12        | 0.04243   |

Once exported to HTML, we obtain:

| program   | job          | copy-files-count | copy-files-times | lus2c-count | lus2c-times | c2exe-count | c2exe-times | run-count | run-times |
| speed     | lv6-via-c    | 1                | 0.00687          | 1           | 0.03632     | 1           | 0.25305     | 10        | 0.01842   |
| transpose | lv6-via-c    | 1                | 0.01730          | 1           | 0.06618     | 1           | 0.27608     | 10        | 0.00614   |
| speed     | lv6-via-exec | 1                | 0.00485          | 1           | 0.00183     | 1           | 0.00172     | 20        | 0.26417   |
| transpose | lv6-via-exec | 1                | 0.01598          | 1           | 0.00605     | 1           | 0.00598     | 12        | 0.04243   |

Each line corresponds to a job execution (column 2) on a program (column 1). Columns 3-4 give information about the first phase; columns 5-6 give information about the second phase, and so on.

More precisely, for phase i (starting with i=1):

  • column 3+2*(i-1) contains the number of times phase i was run, and
  • column 4+2*(i-1) contains the average (user+sys) execution time of phase i.
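
The column layout can be exercised with a few lines of Python. This sketch assumes that the table cells hold plain values as displayed above (the actual .org file also contains links, which would need to be stripped first); the helpers parse_org_table and phase_stats are hypothetical names.

```python
def parse_org_table(text):
    """Split an org table into rows of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and not line.startswith("|-"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows

def phase_stats(row, i):
    """Return (number of runs, average time) of phase i (1-based) for one row."""
    count_col = 3 + 2 * (i - 1)  # 1-based column numbers, as in the text
    time_col = 4 + 2 * (i - 1)
    return int(row[count_col - 1]), float(row[time_col - 1])

table = """\
| program   | job          | copy-files-count | copy-files-times | lus2c-count | lus2c-times | c2exe-count | c2exe-times | run-count | run-times |
| speed     | lv6-via-c    | 1                | 0.00687          | 1           | 0.03632     | 1           | 0.25305     | 10        | 0.01842   |
| speed     | lv6-via-exec | 1                | 0.00485          | 1           | 0.00183     | 1           | 0.00172     | 20        | 0.26417   |
"""
header, *rows = parse_org_table(table)
for row in rows:
    # phase 4 is the run phase
    print(row[0], row[1], phase_stats(row, 4))
# speed lv6-via-c (10, 0.01842)
# speed lv6-via-exec (20, 0.26417)
```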

nb: this .org file contains links to script and log files that were generated by run-bench.

7.1. Plotting

From the output files (org, csv, or raw), one can use third-party tools to generate nice graphics (using, e.g., excel, R, or gnuplot).

For example, using R:

library(ggplot2)

# the first row of the file contains the column names
data <- read.table("./result.data", header=TRUE)

plot <- ggplot(data=data, aes(x=program, y=run_times, fill=job)) +
  geom_bar(stat="identity", position=position_dodge()) +
  scale_fill_hue(name="Execution system") + xlab("Programs") + ylab("Time in seconds") +
  ggtitle("Comparing 2 Execution systems")

png("compare-lv6-execution-systems.png")
print(plot)
dev.off()

[Figure: compare-lv6-execution-systems.png]

8. Future Features

8.1. TODO The measurements should be configurable

The measurement would rely on times by default, and on a user-provided program otherwise.

Something like (in phase definition, via an environment variable similar to ci_rel_size):

- measure_command="times"
- measure_command="cat f.log | grep step | cut -d ' ' -f 3"

8.2. TODO Add a mode where the number of runs n is fixed, and the size of the CI is provided as an extra column

- runs_number=100

8.3. TODO Copy the output of /proc/loadavg every minute in log/loadavg.log

because it is interesting to track the content of that file during the experiments.

Footnotes:

1

The 4 in this formula approximates twice the two-sided 95% critical value of Student's t-distribution with n-1 degrees of freedom. The exact factor, which tends towards 2 x 1.96 as n grows, can be derived from the 95% 2-sided column of https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values.

Author: Erwan Jahier

Created: 2024-05-23 Thu 15:47
