2. Advanced NumPy

After mastering the basics of NumPy, you can explore more advanced features that offer powerful functionality for numerical computing. This tutorial will cover advanced array manipulations, optimizations, and NumPy’s core capabilities in areas such as broadcasting, memory efficiency, linear algebra, and advanced indexing.

1. Broadcasting in Depth

Broadcasting allows NumPy to work with arrays of different shapes by automatically expanding their shapes to be compatible.

1.1 How Broadcasting Works

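NumPy compares the two shapes dimension by dimension, starting from the trailing (rightmost) dimension. For broadcasting to work, two conditions must be satisfied:

  1. Each pair of dimensions must either be equal, or one of them must be 1.
  2. An axis of size 1 is stretched to match the other array. If one array has fewer dimensions, its missing leading dimensions are treated as size 1.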

Example:

import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
arr2 = np.array([10, 20, 30]) # Shape (3,)
# Broadcasting arr2 to match the shape of arr1
result = arr1 + arr2
print(result)

In the example above, arr2 is treated as shape (1, 3) and stretched along the first axis to match arr1, so [10, 20, 30] is added to every row. The stretching is virtual: no data is actually copied.
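To look at the stretched array directly, np.broadcast_to can build it as a view. The following is a minimal sketch rather than something you need in everyday code; the variable name stretched is only illustrative.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])   # Shape (2, 3)
arr2 = np.array([10, 20, 30])             # Shape (3,)

# Build the broadcast result as a read-only view of arr2.
stretched = np.broadcast_to(arr2, arr1.shape)
print(stretched.shape)       # (2, 3)
print(stretched.strides[0])  # 0: every "row" points at the same memory

# Adding the stretched view matches implicit broadcasting.
print(np.array_equal(arr1 + stretched, arr1 + arr2))  # True
```

A stride of 0 along the first axis is how NumPy reuses the same three values for each row.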

1.2 Advanced Broadcasting Example

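Broadcasting can also stretch both operands at once. Here a (3, 1) column and a (3,) row each expand to shape (3, 3):

arr1 = np.array([[1], [2], [3]]) # Shape (3, 1)
arr2 = np.array([10, 20, 30]) # Shape (3,)
# arr1 is stretched across columns, arr2 across rows -> result has shape (3, 3)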
result = arr1 * arr2
print(result)
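As a quick check of what this column-times-row broadcast produces, the sketch below compares it with np.outer. The names col, row, and product are illustrative only.

```python
import numpy as np

col = np.array([[1], [2], [3]])   # Shape (3, 1)
row = np.array([10, 20, 30])      # Shape (3,), treated as (1, 3)

product = col * row               # Both operands are stretched to (3, 3)
print(product.shape)              # (3, 3)
print(product)

# Same values as the outer product of the two 1-D vectors.
print(np.array_equal(product, np.outer(col.ravel(), row)))  # True
```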

2. Vectorization for Speed

Vectorization refers to expressing operations in terms of array operations rather than loops. This takes advantage of NumPy’s optimized C code, resulting in significant speed-ups.

Example:

# Using loops (slow)
arr = np.arange(1e6)
squared = np.empty_like(arr)
for i in range(arr.size):
    squared[i] = arr[i] ** 2
# Vectorized operation (fast)
squared_vec = arr ** 2

Why it’s faster: NumPy arrays are stored in contiguous memory blocks, allowing efficient processing in compiled C code. Avoiding Python loops also reduces overhead.
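To measure the difference on your own machine, one option is the standard-library timeit module. This is a rough sketch: the helper names square_loop and square_vec are made up for illustration, the array is kept smaller so the loop finishes quickly, and the timings will vary by hardware.

```python
import timeit
import numpy as np

arr = np.arange(100_000, dtype=np.int64)

def square_loop(a):
    """Square each element with an explicit Python loop."""
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = a[i] ** 2
    return out

def square_vec(a):
    """Square all elements with a single array expression."""
    return a ** 2

# Both approaches must produce the same values.
same = np.array_equal(square_loop(arr), square_vec(arr))
print(same)

loop_time = timeit.timeit(lambda: square_loop(arr), number=3)
vec_time = timeit.timeit(lambda: square_vec(arr), number=3)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```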


3. Advanced Array Manipulation

3.1 np.newaxis for Adding Dimensions

You can add dimensions to an array using np.newaxis (or None), which is helpful when reshaping arrays for broadcasting.

arr = np.array([1, 2, 3])
# Add a new axis to make it a column vector
arr_col = arr[:, np.newaxis]
print(arr_col.shape) # (3, 1)
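There are several equivalent ways to add an axis. The sketch below shows three of them side by side and then uses the result for broadcasting; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3])

print(np.newaxis is None)            # True: np.newaxis is an alias for None

col_a = arr[:, np.newaxis]           # Shape (3, 1)
col_b = arr.reshape(-1, 1)           # Shape (3, 1)
col_c = np.expand_dims(arr, axis=1)  # Shape (3, 1)
row = arr[np.newaxis, :]             # Shape (1, 3)

# A column plus a row broadcasts to a full (3, 3) table of sums.
table = col_a + row
print(table.shape)                   # (3, 3)
```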

3.2 Fancy Indexing

Fancy indexing allows for selecting arbitrary elements from an array based on lists or other arrays.

arr = np.array([10, 20, 30, 40, 50])
# Using a list to index
indices = [0, 2, 4]
print(arr[indices]) # Output: [10 30 50]
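Unlike basic slicing, fancy indexing returns a copy rather than a view, although assigning through fancy indices still changes the original. A short sketch of both behaviours, plus paired indices on a 2-D array (the names picked and grid are illustrative):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

picked = arr[[0, 2, 4]]          # Fancy indexing returns a copy
picked[0] = -1
print(arr[0])                    # 10: the original is unchanged

arr[[1, 3]] = 0                  # Assignment through fancy indices modifies arr
print(arr)                       # [10  0 30  0 50]

grid = np.arange(12).reshape(3, 4)
# Paired row/column index lists select elements (0, 1) and (2, 3).
print(grid[[0, 2], [1, 3]])      # [ 1 11]
```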

3.3 Boolean Masking

You can use boolean arrays to filter or modify elements.

arr = np.array([10, 20, 30, 40, 50])
# Mask for values greater than 30
mask = arr > 30
print(arr[mask]) # Output: [40 50]
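Masks can also be combined and used for assignment. The following sketch shows both, along with np.where for choosing between two values; the names in_range, capped, and labels are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & and |, wrapping each comparison in parentheses.
in_range = (arr >= 20) & (arr < 50)
print(arr[in_range])             # [20 30 40]

# Assigning through a mask modifies the selected elements in place.
capped = arr.copy()
capped[capped > 30] = 30
print(capped)                    # [10 20 30 30 30]

# np.where builds a new array by choosing between two alternatives.
labels = np.where(arr > 30, "high", "low")
print(labels)
```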

4. Memory Layout of Arrays

Understanding memory layout is crucial for optimizing performance.

4.1 Strides and Views

Basic slicing returns a view: a new array object that shares memory with the original. The strides of an array give the number of bytes to step along each dimension to reach the next element.

Example:

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
# Create a view
view = arr[:, :2]
# Check strides (bytes to step along each axis)
print(arr.strides) # (24, 8) -> (row, column): a row is 3 items * 8 bytes
print(view.strides) # (24, 8): the view reuses the parent's memory layout
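Strides also explain how transposing and stepped slicing work without moving data. A small sketch, with the dtype fixed to int64 so the byte counts are predictable (the names transposed and every_other are illustrative):

```python
import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)
print(arr.strides)           # (24, 8)

transposed = arr.T           # Swaps the strides instead of moving data
print(transposed.strides)    # (8, 24)

every_other = arr[:, ::2]    # A step of 2 doubles the column stride
print(every_other.strides)   # (24, 16)

print(np.shares_memory(arr, transposed))   # True
print(np.shares_memory(arr, every_other))  # True
```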

4.2 Changing Memory Layout: C vs. F Order

NumPy arrays can be stored in row-major (C) or column-major (F) order. Transposing returns a view with swapped strides, so the transpose of a C-ordered array is F-ordered and no data is copied.

arr = np.ones((3, 3), order='F') # Column-major order
print(arr.flags['C_CONTIGUOUS']) # False (not row-major)
print(arr.flags['F_CONTIGUOUS']) # True (column-major)
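The sketch below illustrates the transpose case and shows when a copy does become necessary. np.ascontiguousarray is one way to force row-major layout; the variable names are illustrative.

```python
import numpy as np

c_arr = np.arange(6).reshape(2, 3)       # Row-major by default
t = c_arr.T                              # Transpose is a view

print(c_arr.flags['C_CONTIGUOUS'])       # True
print(t.flags['F_CONTIGUOUS'])           # True
print(np.shares_memory(c_arr, t))        # True: no data was copied

# Converting the transposed view to row-major order requires a real copy.
t_c = np.ascontiguousarray(t)
print(t_c.flags['C_CONTIGUOUS'])         # True
print(np.shares_memory(t, t_c))          # False
```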

5. Structured Arrays

You can create structured arrays in NumPy that store heterogeneous data types. This is useful for datasets where each row contains different types of data (like a database).

dtype = [('name', 'S10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)], dtype=dtype)
# Accessing fields
print(data['name']) # [b'Alice' b'Bob'] ('S10' stores bytes; use 'U10' for str)
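Fields can be used for filtering and sorting as well. The sketch below uses a 'U10' (Unicode) name field and adds a third, made-up row purely for illustration.

```python
import numpy as np

# 'U10' stores text as str; an 'S10' field would store bytes instead.
person_dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
people = np.array(
    [('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Carol', 22, 61.0)],  # Carol is hypothetical sample data
    dtype=person_dtype,
)

# Filter rows using a boolean mask built from one field.
print(people[people['age'] > 24]['name'])   # ['Alice' 'Bob']

# Sort rows by a named field.
by_weight = np.sort(people, order='weight')
print(by_weight['name'])                    # ['Alice' 'Carol' 'Bob']

# Each element is a record whose fields can be read by name.
print(people[0]['age'])                     # 25
```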

6. NumPy and Linear Algebra

6.1 Eigenvalues and Eigenvectors

You can compute eigenvalues and eigenvectors using np.linalg.eig().

A = np.array([[1, 2], [2, 3]])
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)
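A simple way to sanity-check the result is to confirm that A v = λ v for each pair. The sketch below does that and, since this particular A is symmetric, also compares against np.linalg.eigh, which is designed for symmetric (Hermitian) matrices.

```python
import numpy as np

A = np.array([[1, 2], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index.
checks = []
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    checks.append(np.allclose(A @ v, eigenvalues[i] * v))
print(checks)  # [True, True]

# A is symmetric, so eigh applies; it returns eigenvalues in ascending order.
sym_values, sym_vectors = np.linalg.eigh(A)
print(np.allclose(np.sort(eigenvalues), sym_values))  # True
```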

6.2 Singular Value Decomposition (SVD)

SVD factors a matrix into an orthogonal matrix U, a set of non-negative singular values s (returned as a 1-D array), and an orthogonal matrix Vt. It is useful in signal processing, data compression, and machine learning.

A = np.array([[1, 2], [2, 3], [3, 4]])
# SVD
U, s, Vt = np.linalg.svd(A)
print("U matrix:", U)
print("Singular values:", s)
print("Vt matrix:", Vt)
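To confirm that the three factors really rebuild A, it helps to request the reduced form with full_matrices=False so the shapes line up. This is a sketch; the names reconstructed and rank1 are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [2, 3], [3, 4]], dtype=float)

# full_matrices=False returns the reduced form: U is (3, 2), Vt is (2, 2).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)     # (3, 2) (2,) (2, 2)

# s is a 1-D array, so scale U's columns by it before multiplying by Vt.
reconstructed = (U * s) @ Vt
print(np.allclose(A, reconstructed))  # True

# Keeping only the largest singular value gives a rank-1 approximation.
rank1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.matrix_rank(rank1))   # 1
```

With the default full_matrices=True, U would instead have shape (3, 3) for this input.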

7. Broadcasting Rules for Advanced Use Cases

When working with arrays of different shapes, remember the broadcasting rules:

  1. Compare the shapes element-wise, starting from the trailing dimension.
  2. The dimensions are compatible if they are equal, or one of them is 1.

Example of a more complex broadcasting:

arr1 = np.ones((3, 1, 4))
arr2 = np.ones((1, 5, 4))
# Broadcasting allows element-wise operations even with different shapes
result = arr1 + arr2
print(result.shape) # Shape (3, 5, 4)
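A common practical use of these rules is computing all pairwise differences between points. The sketch below does that with made-up sample points and also shows what happens when shapes are incompatible; the names points, diffs, distances, and broadcast_failed are illustrative.

```python
import numpy as np

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # Shape (3, 2)

# (3, 1, 2) - (1, 3, 2) -> (3, 3, 2): every point minus every other point.
diffs = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))             # Shape (3, 3)
print(distances)

# Trailing dimensions 4 and 3 are unequal and neither is 1, so this fails.
broadcast_failed = False
try:
    np.ones((3, 4)) + np.ones((3,))
except ValueError as exc:
    broadcast_failed = True
    print("Broadcast failed:", exc)
```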

8. Optimizing Memory and Performance

8.1 Avoiding Copies

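Basic slicing returns a view rather than a copy, so no data is duplicated and memory overhead stays low. Because a view shares memory with its parent, writing to the view also changes the original array.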

arr = np.arange(10)
# Create a view (no memory copy)
view = arr[::2]
# Modify the view
view[0] = 100
print(arr) # The original array is modified
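When it is unclear whether two arrays share data, np.shares_memory and the .base attribute can help. A short sketch contrasting a view with an explicit copy (variable names are illustrative):

```python
import numpy as np

arr = np.arange(10)

view = arr[::2]           # Basic slicing: a view
copy = arr[::2].copy()    # Explicit copy: independent data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True: the view keeps a reference to arr

copy[0] = 100
print(arr[0])             # 0: changing the copy leaves arr untouched
```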

8.2 In-place Operations

You can use in-place operations to modify arrays without creating new ones, reducing memory usage.

arr = np.array([1, 2, 3])
# In-place addition
arr += 10
print(arr) # Output: [11 12 13]
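Two related details are worth knowing: ufuncs accept an out= argument for writing into an existing array, and in-place operations keep the array's dtype. The sketch below shows both; the variable names are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# The out= argument writes the result into an existing array.
np.add(a, b, out=a)
print(a)  # [11. 22. 33.]

# In-place operations keep the array's dtype, so an integer array
# cannot absorb a float result under the default casting rule.
ints = np.array([1, 2, 3])
cast_refused = False
try:
    ints += 0.5
except TypeError as exc:
    cast_refused = True
    print("In-place cast refused:", exc)
```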

9. Using np.vectorize() for Function Broadcasting

You can use np.vectorize() to apply a scalar function to arrays element-wise, with broadcasting across its arguments, even if the function itself only accepts scalars. It is provided mainly for convenience: internally it is essentially a Python loop, so it does not deliver the speed of true vectorization.

def custom_func(x):
    return x ** 2 + 2
vectorized_func = np.vectorize(custom_func)
# Apply the vectorized function to a NumPy array
arr = np.array([1, 2, 3])
result = vectorized_func(arr)
print(result) # Output: [ 3  6 11]
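np.vectorize is most useful for functions that genuinely cannot take an array, such as one containing an if statement. The sketch below uses a hypothetical scalar-only function, clip_negative, and compares it with an equivalent native array expression.

```python
import numpy as np

def clip_negative(x):
    """Hypothetical scalar-only function: the `if` fails on a whole array."""
    if x < 0:
        return 0.0
    return x ** 0.5

values = np.array([-4.0, 0.0, 9.0, 16.0])

# np.vectorize calls the Python function once per element.
# otypes fixes the output dtype instead of inferring it from the first result.
vec_clip = np.vectorize(clip_negative, otypes=[float])
print(vec_clip(values))   # [0. 0. 3. 4.]

# A native array expression gives the same values without per-element calls.
native = np.where(values < 0, 0.0, np.sqrt(np.maximum(values, 0.0)))
print(np.allclose(vec_clip(values), native))  # True
```

When a native expression like the np.where version exists, it is usually the better choice for large arrays.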

Key Takeaways for Advanced NumPy:

  • Broadcasting: A powerful tool for operating on arrays of different shapes.
  • Memory Efficiency: Learn how to use views, in-place operations, and avoid unnecessary copies.
  • Linear Algebra: Built-in support for matrix operations such as eigendecomposition and SVD.
  • Vectorization: Write efficient code by avoiding loops and using NumPy’s internal optimizations.
  • Structured Arrays: Handle heterogeneous datasets effectively.

This advanced tutorial helps you dive deeper into NumPy’s full capabilities, focusing on efficient memory usage, performance optimization, and leveraging NumPy for more complex mathematical operations.