Why bother with performance?

Overview

Teaching: 10 min
Exercises: 5 min

Questions

What is software performance?

Why is software performance important?

How can performance be measured?

What is meant by flops, walltime and CPU hours?

How can performance be enhanced?

How can I use compute resources effectively?

Objectives

Understand the link between software performance and hardware

Identify the different methods of enhancing performance

Calculate walltime, CPU hours

What is software performance?

Before getting into the software side of things, lets take a few steps back. The concept of performance is generic and can apply to many factors in our lives, such as our own physical and mental performance. With software performance the emphasis is not on using the most powerful machine, but on how best to utilise the power that you have.

Say you are the chef in a restaurant and every dish that you do is perfect. You would be able to produce a set 7 course meal for a table of 3-6 without too much difficulty. If you are catering a conference dinner though, it becomes more difficult, as people would be waiting many hours for their food. However, if you could delegate tasks to a team of 6 additional chefs who could assist you (while also communicating with each other), you have a much higher chance of getting the food out on time and coping with a large workload. With 7 courses and a total of 7 chefs, it’s most likely that each chef will spend most of their time concentrating on doing one course.

When we dramatically increase the workload we need to distribute it efficiently over the resources we have available so that we can get the desired result in a reasonable amount of time. That is the essence of software performance, using the resources you have to the best of their capabilities.

Why is software performance important?

This is potentially a complex question but has a pretty simple answer when we restrict ourselves to the context of this lesson. Since the focus here is our usage of software that is developed by someone else, we can take a self-centred and simplistic approach: all we care about is reducing the time to solution to an acceptable level while minimising our resource usage.

This lesson is about taking a well-informed, systematic approach towards how to do this.

Enhancing performance: rationale

Imagine you had a 10x10x10 box like the one below, divided up into smaller boxes, each measuring 1x1x1. In one hour, one CPU core can simulate one hour of activity inside the smaller box. If you wanted to simulate what was happening inside the large box for 8 hours, how long will it take to run if we only use one CPU core?

Solution

8000 hours…close to a year!

This is way longer than is feasibly convenient! But remember, that is utilising just one core. If you had a machine that could simulate each of those smaller boxes simultaneously and a code that enables each box to effectively interact with each other, the whole job would only take roughly an hour (but probably a little more because of issues we will discuss in subsequent episodes).

How is performance measured?

There are a number of key terms in computing when it comes to understanding performance and expected duration of tasks. The most important of these are walltime, flops and CPU hours.

Walltime

Walltime is the length of time, usually measured in seconds, that a program takes to run (i.e., to execute its assigned tasks). It is not directly related to the resources used, it’s the time it takes according to an independent clock on the wall.

Flops

Flops (or sometimes flop/s) stands for floating point operations per second and they are typically used to measure the (theoretical) performance of a computer’s processor.

The theoretical peak flops is given by
Number of cores * Average frequency * Operations per cycle`
What a software program can achieve in terms of flops is usually a surprisingly small percentage of this value (e.g., 10% efficiency is not a bad number!).

Since in our case we don’t develop or compile the code, the only influence we have on the flops achieved by the application is by dictating the choices it makes during execution (sometimes called telling it what code path to walk).

CPU Hours

CPU hours is the amount of CPU time spent processing. For example, if I execute for a walltime of 1 hour on 10 CPUs, then I will have used up 10 CPU hours.

Maximising the flops of our application will help reduce the CPU hours, since we will squeeze more calculations out of the same CPU time. However, you can achieve much greater influence on the CPU hours by using a better algorithm to get to your result. The best algorithm choice is usually heavily dependent on your particular use case (and you are always limited by what algorithm options are available in the software).

Calculate CPU hours

Assume that you are utilising all the available cores in a node with 40 CPU cores. Calculate the CPU hours you would need if you expect to run for 5 hours using 2 nodes.

How much walltime do you need?

How do you ask the resource scheduler for that much time?
Solution

400 CPU hours are needed.

We need to ask the scheduler for 5 hours (and 2 nodes worth of resources for that time). To ask for this from the scheduler we need to include the appropriate option in our job script:
#SBATCH -t 5:00:00

The -t option to the scheduler used in the exercise indicates the maximum amount of walltime requested, which will differ from the actual walltime the code spends to run.

How can we enhance performance?

If you are taking this course you are probably application users, not application developers. To enhance the code performance you need to trigger behaviour in the software that the developers will have put in place to potentially improve performance. To do that, you need to know what the triggers are and then try them out with your use case to see if they really do improve performance.

Some triggers will be hardware related, like the use of OpenMP, MPI, or GPUs. Others might relate to alternative libraries or algorithms that could be used by the application.

Enhancing Performance

Which of these are viable ways of enhancing performance? There may be more than one correct answer.

Utilising more CPU cores and nodes

Increasing the simulation walltime

Reduce the frequency of file writing in the code

Having a computer with higher flops

Solution

Yes, potentially, the more cores and nodes you have, the more work can be distributed across those cores…but you need to ensure they are being used!

No, increasing simulation walltime only increases the possible duration the code will run for. It does not improve the code performance.

Yes, IO is a very common bottleneck in software, it can be very expensive (in terms of time) to write files. Reducing the frequency of file writing can have a surprisingly large impact.

Yes, in a lot of cases the faster the computer, the faster the code can run. However, you may not be able to find a computer with higher flops, since higher individual CPU flops need disproportionately more power to run (so are not well suited to an HPC system).

In an HPC system, you will usually find a lot of lower flop cores because they make sense in terms of the use of electrical energy. This surprises people because when they move to an HPC system, they may initially find that their code is slower rather than faster. However, this is usually not about the resources but managing their understanding and expectations of them.

As you can see, there are a lot of right answers, however some methods work better than others, and it can entirely depend on the problem you are trying to solve.

How can I use compute resources effectively?

Unfortunately there is no simple answer to this, because the art of getting the ‘optimum’ performance is ambiguous. There is no ‘one size fits all’ method to performance optimization.

While preparing for production use of software, it is considered good practice to test performance on manageable portion of your problem first, attempt to optimize it (using some of the recommendations we outline in this lesson, for example) and then make a decision about how to run things in production.

Key Points

Software performance is the use of computational resources effectively to reduce runtime

Understanding performance is the best way of utilising your HPC resources efficiently

Performance can be measured by looking at flops, walltime, and CPU hours

There are many ways of enhancing performance, and there is no single ‘correct’ way. The performance of any software will vary depending on the tasks you want it to undertake.

lesson home

Running LAMMPS on HPC systems

next episode

Why bother with performance?

Overview

What is software performance?

Why is software performance important?

Enhancing performance: rationale

Solution

How is performance measured?

Walltime

Flops

CPU Hours

Calculate CPU hours

Solution

How can we enhance performance?

Enhancing Performance

Solution

How can I use compute resources effectively?

Key Points

lesson home

next episode