Profiling¶
# if you're using colab, then install the required modules
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %pip install line_profiler snakeviz pyinstrument eliot eliot-tree
Profiling analyses your code in terms of speed or memory.
This helps identify where the bottlenecks are (why and where is it slow?) and how much potential there is for improvement.
Note
IPython magic commands are handy, commonly used commands in Jupyter Notebooks.
They start with a single % to apply to a single line, or a double %% to apply to a whole cell.
timeit¶
An IPython magic command that measures the execution time of an expression.
It runs the expression many times (scaling the number of loops to how intensive the expression is) and reports the mean time with its uncertainty.
It is useful for benchmarking a small code snippet.
Tip
timeit is different from time, which is a less optimised alternative.
Hence, prefer timeit.
%timeit range(100)
97.5 ns ± 0.117 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%%timeit
for x in range(100):
    pass
1.05 µs ± 0.217 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
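Outside of a notebook, the standard-library timeit module provides the same functionality; here is a minimal sketch (the snippet and loop count are arbitrary examples):
import timeit

# time a small snippet; `number` is how many times the statement is run
duration = timeit.timeit('sum(range(100))', number=100_000)
print(f'{duration:.3f} s total for 100,000 loops')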
Another option is:
line_profiler¶
A module that measures the time spent in each line of a function.
%load_ext line_profiler
import time
def relax():
    pass

def bottleneck():
    time.sleep(0.001)

def some_function():
    nums = [num for num in range(1_000)]
    bigger_nums = [num ** 2 for num in nums]
    bottleneck()
    bigger_nums.extend(bigger_nums)
    relax()
The Timer unit is the conversion factor to seconds.
The table shows the following columns for each line of the function (one row per line):

- Line #: The line number.
- Hits: The number of times that line was called.
- Time: The total time spent on that line (in Timer units).
- Per Hit: The time spent per call of that line.
- % Time: The percentage of total time spent on that line (often the most useful column).
- Line Contents: The source code for that line.
%lprun -f some_function some_function()
We can see that most of the time was spent in the bottleneck() call, highlighting where there is potential for improvement.
SnakeViz¶
SnakeViz visualises the output from cProfile in nice interactive plots.
cProfile is part of the Python standard library.
It's a deterministic profiler of wall-clock time (giving the same output for the same input), and it is for serial code (not parallel, which we'll cover later).
However, its output is lots of tabular information that isn't ideal: every function call gets a row, measured in seconds.
For example:
import cProfile
cProfile.run('some_function()')
10 function calls in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 572293778.py:10(<listcomp>)
1 0.000 0.000 0.000 0.000 572293778.py:11(<listcomp>)
1 0.000 0.000 0.000 0.000 572293778.py:3(relax)
1 0.000 0.000 0.001 0.001 572293778.py:6(bottleneck)
1 0.000 0.000 0.001 0.001 572293778.py:9(some_function)
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
1 0.000 0.000 0.001 0.001 {built-in method builtins.exec}
1 0.001 0.001 0.001 0.001 {built-in method time.sleep}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
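If you want to sort and trim this table while staying in the standard library, the companion pstats module can help; here is a minimal sketch (the 'profile_stats' filename is just an example):
import cProfile
import pstats

# save the stats to a file instead of printing them
cProfile.run('some_function()', 'profile_stats')

# sort by cumulative time and show only the five most expensive entries
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative').print_stats(5)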
Instead, visualising this output in SnakeViz is nicer.
You can switch between the two styles using the "Style" dropdown.
Style: Icicle
Functions are represented by rectangles.
The root function is the top-most rectangle, with functions it calls below it, then the functions those call below them, and so on.
The amount of time spent inside a function is represented by the width of the rectangle.
A big rectangle that stretches across most of the visualisation represents a function that is taking up most of the time of its calling function, while a small rectangle represents a function that is using hardly any time at all.
Style: Sunburst
Functions are represented by arcs.
The root function is the centre circle, with functions it calls around, then the functions those functions call around that, and so on.
The amount of time spent inside a function is represented by how far around the circle the arc goes.
A big arc that wraps most of the way around the circle represents a function that is taking up most of the time of its calling function, while a small arc represents a function that is using hardly any time at all.
%load_ext snakeviz
%%snakeviz
some_function()
*** Profile stats marshalled to file '/tmp/tmp3viwk5jx'.
Embedding SnakeViz in this document...
Warning
SnakeViz doesn't currently work with Colab.
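Outside of a notebook, SnakeViz can also be launched from the command line on a saved stats file; for example, using the hypothetical 'profile_stats' file from the cProfile sketch above:
!snakeviz profile_stats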
pyinstrument¶
A statistical profiler of wall-clock time (recording the call stack every 1 ms), which lowers the overhead compared to tracing profilers (like cProfile).
It hides library frames and fast functions, so you can focus on the slow parts of your code.
The output shows a tree of actions of how the function executes, using a traffic light colour legend (i.e., red = slow).
%load_ext pyinstrument
%%pyinstrument
some_function()
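Outside of a notebook, pyinstrument also offers a Python API; here is a minimal sketch:
from pyinstrument import Profiler

profiler = Profiler(interval=0.001)  # sample the call stack every 1 ms
profiler.start()
some_function()
profiler.stop()
print(profiler.output_text())  # the same tree of actions, rendered as text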
Eliot¶
Logs performance information.
It tracks each function called, its arguments, the return values, and any raised exceptions.
Hence, it can be helpful for profiling in real use cases.
Note
Logging has a higher overhead than cProfile.
It works by using decorators.
Decorators are a nice way of wrapping one function inside another.
So:
@f1
def f2():
    pass
Is the same as:
def f2():
    pass

f2 = f1(f2)
For example, here is a decorator that logs calls to a function:
def log_calls(function):
    def wrapper():
        name = function.__name__
        print(f'Before {name} was called ...')
        function()
        print(f'After {name} was called ...')
    return wrapper
Calling this returns a function:
log_calls(some_function)
<function __main__.log_calls.<locals>.wrapper()>
So, to execute it we add ():
log_calls(some_function)()
Before some_function was called ...
After some_function was called ...
The @ decorator syntax is a nice way to get this same result, where the decorator is placed above the function it wraps:
@log_calls
def some_function_with_logging():
    some_function()
some_function_with_logging()
Before some_function_with_logging was called ...
After some_function_with_logging was called ...
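As an aside, a general-purpose version of this decorator would usually forward arguments and return values, and use functools.wraps so the wrapped function keeps its name and docstring; here is a minimal sketch:
import functools

def log_calls(function):
    @functools.wraps(function)  # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        print(f'Before {function.__name__} was called ...')
        result = function(*args, **kwargs)
        print(f'After {function.__name__} was called ...')
        return result
    return wrapper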
Okay, so now we can use the @eliot.log_call decorator to profile a function:
import eliot

def bottleneck():
    time.sleep(0.5)

@eliot.log_call
def multiply(x, y):
    bottleneck()
    return x * y

@eliot.log_call
def add(x, y):
    return x + y
# remove the log file if it already exists
import os
try:
    os.remove("profiling_some_function.log")
except OSError:
    pass

# tell eliot where to log the output messages
eliot.to_file(open('profiling_some_function.log', 'w+'))

result = multiply(add(3, 4), add(5, 6))
!eliot-tree profiling_some_function.log
df447e60-29f0-49ee-8eba-c532e0b7a249
└── __main__.add/1 ⇒ started 2022-03-03 13:51:49 ⧖ 0.000s
├── x: 3
├── y: 4
└── __main__.add/2 ⇒ succeeded 2022-03-03 13:51:49
└── result: 7
a75bca69-1bdb-42b9-beac-5c3245cae725
└── __main__.add/1 ⇒ started 2022-03-03 13:51:49 ⧖ 0.000s
├── x: 5
├── y: 6
└── __main__.add/2 ⇒ succeeded 2022-03-03 13:51:49
└── result: 11
b16d1884-30aa-4cb8-9769-6beb41225c76
└── __main__.multiply/1 ⇒ started 2022-03-03 13:51:49 ⧖ 0.501s
├── x: 7
├── y: 11
└── __main__.multiply/2 ⇒ succeeded 2022-03-03 13:51:49
└── result: 77
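Eliot can also log custom blocks of work, not just whole functions, via eliot.start_action; here is a minimal sketch (the action_type name and fields are made-up examples):
# log a custom action, with any extra fields recorded in the log
with eliot.start_action(action_type="compute:squares", n=5):
    squares = [x ** 2 for x in range(5)]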
Tip
Visualising how Python handles functions can be helpful in understanding them.
Click here to see an example.
Exercises¶
Exercise 1
What is profiling and why is it useful?
Solution
Profiling analyses your code in terms of speed or memory.
This helps identify where the bottlenecks are (why and where is it slow?) and how much potential there is for improvement.
Exercise 2
What profiling tool times the execution of a cell in a Jupyter Notebook?
Solution
%%timeit
Exercise 3
Below are two approaches for filling up an empty NumPy array.
Which approach is faster and why?
import numpy as np

def fill_array_approach_1(n):
    array = np.empty(0)
    for index in range(n):
        new_point = np.random.rand()
        array = np.append(array, new_point)
    return array

def fill_array_approach_2(n):
    array = np.empty(n)
    for index in range(len(array)):
        new_point = np.random.rand()
        array[index] = new_point
    return array
Hint
Use %timeit.
Solution
Approach 2 is about 10x faster.
For example, over 1,000 data points:
%timeit fill_array_approach_1(1_000)
2.7 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit fill_array_approach_2(1_000)
251 µs ± 4.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
This is because approach 2 allocates one empty array of the required size up front, then fills it in place.
In contrast, approach 1 calls np.append on every iteration, which allocates a new array and copies all of the existing elements each time.
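You can see this copying directly: np.append never modifies its input, it allocates and returns a new array on every call.
import numpy as np

a = np.zeros(3)
b = np.append(a, 1.0)  # allocates and returns a brand new array
print(b is a, len(a), len(b))  # False 3 4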
Exercise 4
Below are two methods that find two numbers from an array of unique integers that add up to a target sum.
If the target can’t be made, then an empty list is returned.
Each element in the array can only be used once.
Which method is faster and why?
def two_sum_brute_force(array, target):
    """
    A brute force approach to find two numbers from an array that add up to a target.

    Steps
    1. Loop through the array twice, adding up pairs of array elements.
    2. Compare each of these sums to the target.
    3. Return the pair that sums to the target, if one exists.
    """
    for index_one in range(len(array)):
        for index_two in range(index_one + 1, len(array)):
            if (
                array[index_one] + array[index_two] == target  # check sum of pair
                and index_one != index_two  # can't use the same element twice
            ):
                return [index_one, index_two]  # return the target pair
    return []  # return an empty list if the target pair isn't found
def two_sum_cache(array, target):
    """
    Use caching to find two numbers from an array that add up to a target.

    Steps
    1. Create a dictionary of cached differences relative to the target sum.
    2. Loop through the array once, adding each element and its index to the cache.
    3. If the required difference of a new array element is already in the cache,
       then you've found a matching pair, which you can return.
    """
    cache_differences = {}
    for index, element in enumerate(array):
        difference = target - element  # calculate the target difference for this element
        if difference in cache_differences:  # if we have the matching pair
            return [index, cache_differences[difference]]  # return the target pair
        cache_differences[element] = index  # if we don't have a match, add to the cache
    return []  # return an empty list if the target pair isn't found
import numpy as np
array = np.random.choice(1_000, 500, replace=False)
target = 250
Solution
The cached version is faster.
This is because it only needs to loop through the array once.
So, for the brute force method, the time complexity is O(n^2) and the space complexity is O(1).
And for the cached method, the time complexity is O(n) and the space complexity is O(n). This is because the cache itself (i.e., the dictionary) takes up space up to the size of the array.
For example, using the array and target above:
%timeit two_sum_brute_force(array, target)
272 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit two_sum_cache(array, target)
31.8 µs ± 8.57 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
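To see the Big O difference empirically, you could rerun the timings at a larger input size; a rough sketch, using an impossible (negative) target to force both methods to scan the whole array:
big_array = np.random.choice(100_000, 5_000, replace=False)
impossible_target = -1  # no two non-negative integers can sum to this
%timeit two_sum_brute_force(big_array, impossible_target)
%timeit two_sum_cache(big_array, impossible_target)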
Key Points¶
Important
- Measure the time of cells, functions, and programs to find bottlenecks, e.g. using timeit and line_profiler.
- Visualise the profiled code, e.g. using SnakeViz and pyinstrument.
- Log profiling information, e.g. using Eliot.
- Consider how fast the code could go, e.g. using Big O notation.