This lesson is in the early stages of development (Alpha version)

HPC Parallelisation For Novices: Knowledge Base

Key Points

Recap: Changing the Environment
  • An HPC cluster is a shared resource. As such, users are typically not allowed to use package managers to install additional software on the compute nodes.

  • Software can still be installed in user-owned file system locations.

  • The shell environment can be changed to allow a user to run custom software.

  • Changing the environment is so common that automated systems, such as environment modules, are used to manage it.
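A typical session with environment modules looks like this; the module name below is only an example, so check your cluster's `module avail` output for the real names:

```shell
# list the software made available through modules on this cluster
module avail

# load a specific package version into the current shell session
# ("python/3.11" is an example name and will differ between clusters)
module load python/3.11

# inspect what is currently loaded, and unload when done
module list
module unload python/3.11
```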

Estimation of Pi for Pedestrians
  • Each programming language typically provides tools called profilers with which you can analyse the runtime of your code.

  • The estimation of pi spends most of its time generating random numbers.

  • The estimation of pi with the Monte Carlo method is a compute-bound problem, because pseudo-random numbers are produced purely by computation, without any I/O.
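The Monte Carlo estimator can be sketched in a few lines (a minimal illustration; the function name and sample counts are chosen for this example, not taken from the lesson's code):

```python
import random

def estimate_pi(total_count, seed=42):
    """Estimate pi by drawing points uniformly from the unit square.

    A point (x, y) lies inside the quarter circle when x^2 + y^2 < 1;
    the fraction of such points approaches pi/4 as total_count grows.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(total_count):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / total_count
```

Running this under the standard-library profiler, e.g. `cProfile.run("estimate_pi(1_000_000)")`, shows that the bulk of the time is spent in the random number calls.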

Parallel Estimation of Pi for Pedestrians
  • Amdahl’s law describes the maximum speed-up you can expect from your parallelisation efforts.

  • Use the profiling data to calculate the time consumption of hot spots in the code.

  • The generation and processing of random numbers can be parallelised, as it is a data-parallel task.

  • Time consumption of a single application can be measured using the time utility.

  • The run time of the serial program divided by the run time of the equivalent parallel implementation is called the speed-up.
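Both key quantities from this episode can be written down directly (a sketch; the function names are ours, not the lesson's):

```python
def speedup(t_serial, t_parallel):
    """Measured speed-up: serial run time divided by parallel run time."""
    return t_serial / t_parallel

def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on the speed-up predicted by Amdahl's law for a
    program in which only parallel_fraction of the run time can be
    spread across n_workers; the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)
```

For example, a program that is 90% parallelisable can never exceed a speed-up of 10, no matter how many workers are added, because `amdahl_speedup(0.9, n)` approaches `1 / 0.1` as `n` grows.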

Higher levels of parallelism
  • The implementation using multiprocessing uses only Python standard-library components and is therefore very portable.

  • The dask library offers parallelisation using constructs that are very NumPy-like.

  • To port to dask, only the import statements and the container construction need to be changed.

  • The advantage of these changes lies in the capability to scale the job to larger machines (test locally, scale globally).

  • At the heart of the ease of use lie ‘standardized’ building blocks for algorithms using the map-reduce paradigm.

  • Amdahl’s law still holds.
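A data-parallel version of the estimator using only standard-library multiprocessing could look like this (a sketch under our own naming, not the lesson's exact code; worker count and seeds are illustrative):

```python
import random
from multiprocessing import Pool

def count_inside(args):
    """Map step: count samples that fall inside the quarter circle."""
    n_samples, seed = args
    rng = random.Random(seed)  # independent stream per worker
    return sum(1 for _ in range(n_samples)
               if rng.random() ** 2 + rng.random() ** 2 < 1.0)

def estimate_pi_parallel(total_count, n_workers=4):
    """Split the sampling across worker processes, then reduce."""
    chunk = total_count // n_workers
    tasks = [(chunk, seed) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        counts = pool.map(count_inside, tasks)      # map step
    return 4.0 * sum(counts) / (chunk * n_workers)  # reduce step
```

Porting such code to dask mainly means swapping these constructs for their dask equivalents while the map-reduce structure stays the same.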

Searching for Pi
  • Searching through a large file is bound by the speed at which the file can be read in; it is an I/O-bound problem.

  • Given a set of files, the result of searching one file is independent of searching its siblings.

  • HPC clusters have very powerful parallel file systems that offer the best speed when data is accessed in parallel.

  • The operation of searching through a file can be mapped to individual nodes on the cluster. (map step)

  • After the map step has been completed, all sub-results have to be reduced to one final result. (reduce step)
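The map and reduce steps for searching can be sketched as follows (illustrative function names; the lesson's actual scripts may differ):

```python
def count_occurrences(path, needle):
    """Map step: count how often needle occurs in one file.

    The result depends only on this file, so each file can be
    handled by a different node on the cluster.
    """
    with open(path) as handle:
        return sum(line.count(needle) for line in handle)

def search_files(paths, needle):
    """Reduce step: combine the independent per-file counts."""
    return sum(count_occurrences(path, needle) for path in paths)
```

On a cluster, each `count_occurrences` call would run on its own node; the reduction then collects the sub-results into one final count.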

Bonus session: Distributing computations among computers
  • The MPI launcher mpirun distributes compute jobs to a set of allocated computers.

  • The MPI software then executes these jobs on the remote hosts and lets them exchange data and synchronise by passing messages.

  • The print_hostname.py script prints the hostname of the machine it runs on. If run in parallel with mpirun, it prints several different host names.

  • MPI can be used to split the random sampling into partitions: several nodes generate random numbers independently and report back only their partial counts, which are combined into a single estimate of pi.
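With mpi4py (assuming it is available on the cluster; the script name, seeds, and sample counts here are a sketch, not the lesson's file), each rank samples independently and only the partial counts are reduced to rank 0:

```python
# pi_mpi.py -- run with: mpirun -np 4 python pi_mpi.py
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_per_rank = 250_000            # each rank draws its own partition
rng = random.Random(rank)       # distinct seed per rank
inside = sum(1 for _ in range(n_per_rank)
             if rng.random() ** 2 + rng.random() ** 2 < 1.0)

# reduce step: sum the partial counts onto rank 0
total_inside = comm.reduce(inside, op=MPI.SUM, root=0)

if rank == 0:
    print("pi is roughly", 4.0 * total_inside / (n_per_rank * size))
```

Only the integer count crosses the network for each rank, so the communication cost stays tiny compared to the sampling work.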