However, the long-running programs I see people use invariably run on only a single CPU. A waste. On a multi-core machine these programs could often finish in half the time, or less, with just a few additional lines of code.
    #include <omp.h>
    ...
    omp_set_num_threads(tn);
    #pragma omp parallel for
    for (i=0; i<ub; i++) {
        ...
    }
    ...
Add to your C program an include line and a pragma line (placed directly before the for loop), and give the gcc compiler the -fopenmp flag, and lo! your program uses more than one CPU. If you want, you can control the number of threads, and hence CPUs, used (even dynamically, say by reading a small file once a second).
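To make this concrete, here is a minimal sketch of a complete routine. The helper names (read_thread_count, square_all) and the idea of keeping the thread count in a file such as nthreads.txt are assumptions made for illustration, not part of any library. The #ifdef _OPENMP guards let the same source still build, single-threaded, when -fopenmp is left out.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: read the desired thread count from a small text
   file; return `fallback` if the file is missing or its content is bad. */
int read_thread_count(const char *path, int fallback)
{
    int tn = fallback;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%d", &tn) != 1 || tn < 1)
            tn = fallback;
        fclose(f);
    }
    return tn;
}

/* Square every element of a[0..n-1] using tn threads.
   The iterations are independent, so OpenMP may split them freely. */
void square_all(double *a, long n, int tn)
{
    long i;
#ifdef _OPENMP
    omp_set_num_threads(tn);
#else
    (void)tn;               /* no OpenMP: run single-threaded */
#endif
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = a[i] * a[i];
}
```

Compile with something like gcc -O2 -fopenmp prog.c. A caller that wants to adjust the load while running could call read_thread_count("nthreads.txt", 1) again before each batch of work and pass the result to square_all.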
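Be careful! One has to use locking, or arrange the algorithm in such a way that the various CPUs do not modify the same data. For speed they should not even modify data in the same cache line: that gives correct results, but the CPUs keep stealing the cache line from each other (false sharing).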
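As an illustration of both approaches, here is a small sketch that counts the even numbers in an array. The function names are made up for this example. The first version arranges the algorithm so that no data is shared: the reduction clause gives every thread a private counter and adds them up at the end. The second version keeps one shared counter and protects each update with an atomic operation, which is correct but makes the threads contend for the same variable.

```c
/* Count even values; each thread has its own private `count`,
   and OpenMP sums the private copies when the loop ends. */
long count_even_reduction(const int *v, long n)
{
    long count = 0;
    long i;
    #pragma omp parallel for reduction(+:count)
    for (i = 0; i < n; i++)
        if (v[i] % 2 == 0)
            count++;
    return count;
}

/* Same result with one shared counter; every increment is made atomic.
   Without the atomic pragma this loop would contain a data race. */
long count_even_atomic(const int *v, long n)
{
    long count = 0;
    long i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        if (v[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    return count;
}
```

When many iterations update the counter, the reduction version is usually the better choice, since the threads only meet once, at the end of the loop.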
Be careful! Your C library may be unable to handle simultaneous calls.
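A typical example is a library function with hidden internal state, such as rand(), which the C standard does not require to be safe when called from several threads at once. One way around this, sketched below, is to use a variant that takes its state from the caller, here the POSIX function rand_r(). The function name fill_random and the way the per-thread seed is derived are assumptions for illustration only, and rand_r() is not a high-quality generator.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX declarations such as rand_r */
#endif
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Fill out[0..n-1] with pseudo-random values in the range 0..99.
   Each thread keeps its own generator state in `seed`, so the threads
   never touch shared hidden state inside the C library. */
void fill_random(int *out, long n, unsigned int base_seed)
{
    #pragma omp parallel
    {
        unsigned int seed = base_seed;  /* private: declared inside the region */
        long i;
#ifdef _OPENMP
        seed += (unsigned int)omp_get_thread_num();  /* distinct seed per thread */
#endif
        #pragma omp for
        for (i = 0; i < n; i++)
            out[i] = rand_r(&seed) % 100;
    }
}
```

The same pattern applies to other functions with hidden state: POSIX offers strtok_r() next to strtok(), for example. Check the documentation of your C library before calling a function from a parallel loop.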
This is a big topic, and there are many details. Read some introduction.