2. Advanced NumPy
After mastering the basics of NumPy, you can explore more advanced features that offer powerful functionality for numerical computing. This tutorial will cover advanced array manipulations, optimizations, and NumPy’s core capabilities in areas such as broadcasting, memory efficiency, linear algebra, and advanced indexing.
1. Broadcasting in Depth
Broadcasting allows NumPy to combine arrays of different shapes in element-wise operations. It virtually stretches dimensions of size 1 (and missing leading dimensions) until the shapes match, without copying any data.
1.1 How Broadcasting Works
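For two arrays to be broadcast together, their shapes must satisfy these rules:
- Shapes are compared element-wise starting from the trailing (rightmost) dimension. If one array has fewer dimensions, its shape is padded with 1s on the left.
- Each pair of dimensions must either be equal, or one of them must be 1. An axis of size 1 is stretched to match the other array's size along that axis.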
Example:
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
arr2 = np.array([10, 20, 30])            # Shape (3,)

# Broadcasting arr2 to match the shape of arr1
result = arr1 + arr2
print(result)
# [[11 22 33]
#  [14 25 36]]
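In the example above, arr2 (shape (3,)) is treated as shape (1, 3) and then stretched along the first axis to (2, 3), so [10, 20, 30] is added to each row of arr1. Conceptually its elements are replicated along the missing axis, but NumPy does not actually copy them.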
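The claim that no copy is made can be checked with np.broadcast_to, which returns a read-only view of an array stretched to a given shape. The sketch below assumes an int64 dtype so the byte strides are predictable; a stride of 0 along the first axis means every row reads the same three values from memory.

```python
import numpy as np

arr2 = np.array([10, 20, 30], dtype=np.int64)  # shape (3,), 8 bytes per element

# Stretch arr2 to shape (2, 3), the same way arr1 + arr2 does internally
stretched = np.broadcast_to(arr2, (2, 3))

print(stretched)
# [[10 20 30]
#  [10 20 30]]
print(stretched.strides)                 # (0, 8): moving to the next row advances 0 bytes
print(np.shares_memory(stretched, arr2))  # True: no data was copied
```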
1.2 Advanced Broadcasting Example
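Both operands can also be stretched at the same time. Here arr1 (shape (3, 1)) is stretched along its columns and arr2 (shape (3,), treated as (1, 3)) along new rows, producing a (3, 3) result:
arr1 = np.array([[1], [2], [3]])  # Shape (3, 1)
arr2 = np.array([10, 20, 30])     # Shape (3,)

# Broadcasting over both axes: (3, 1) * (3,) -> (3, 3)
result = arr1 * arr2
print(result)
# [[10 20 30]
#  [20 40 60]
#  [30 60 90]]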
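A common practical use of the (n, 1) against (n, m) pattern is subtracting a per-row statistic. The sketch below uses a small made-up matrix X; keepdims=True keeps the reduced axis with length 1, so the row means have shape (2, 1) and broadcast across the columns.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])  # shape (2, 3), illustrative data

# keepdims=True keeps the reduced axis, giving shape (2, 1) instead of (2,)
row_means = X.mean(axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

# (2, 3) - (2, 1): each row mean is stretched across that row's columns
centered = X - row_means
print(centered)
# [[-1.  0.  1.]
#  [-2.  0.  2.]]

# Without keepdims the shapes would be (2, 3) and (2,); the trailing
# dimensions 3 and 2 differ, so NumPy would raise a ValueError.
```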
2. Vectorization for Speed
Vectorization refers to expressing operations in terms of array operations rather than loops. This takes advantage of NumPy’s optimized C code, resulting in significant speed-ups.
Example:
# Using loops (slow)
arr = np.arange(1e6)
squared = np.empty_like(arr)
for i in range(arr.size):
    squared[i] = arr[i] ** 2

# Vectorized operation (fast)
squared_vec = arr ** 2
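Why it’s faster: a Python loop pays interpreter overhead on every element (each arr[i] creates a new NumPy scalar object, and each ** is dispatched dynamically). The vectorized expression makes a single call into compiled C code that loops over the array's contiguous, fixed-type memory buffer, so that per-element overhead disappears.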
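The difference can be measured with a rough timing sketch. The array size of 200,000 is an arbitrary choice to keep the loop short, and the absolute times depend entirely on the machine, so no particular speed-up is claimed here; the sketch only checks that both versions produce the same result and prints how long each took.

```python
import time
import numpy as np

arr = np.arange(200_000, dtype=np.float64)

# Python-level loop: one interpreted iteration per element
start = time.perf_counter()
squared = np.empty_like(arr)
for i in range(arr.size):
    squared[i] = arr[i] ** 2
loop_seconds = time.perf_counter() - start

# Vectorized: a single call that runs the loop in compiled code
start = time.perf_counter()
squared_vec = arr ** 2
vec_seconds = time.perf_counter() - start

print(np.array_equal(squared, squared_vec))  # True: same values either way
print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.6f}s")
```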
3. Advanced Array Manipulation
3.1 np.newaxis for Adding Dimensions
You can add dimensions to an array using np.newaxis (or None), which is helpful when reshaping arrays for broadcasting.
arr = np.array([1, 2, 3])
# Add a new axis to make it a column vector
arr_col = arr[:, np.newaxis]
print(arr_col)  # Shape (3, 1)
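A typical reason to add an axis is to compute all pairwise combinations with one broadcast. In this sketch, points is an illustrative 1-D array; turning one copy into a column (3, 1) and the other into a row (1, 3) yields a (3, 3) table of differences.

```python
import numpy as np

points = np.array([1, 4, 9])  # illustrative 1-D data, shape (3,)

# (3, 1) minus (1, 3) broadcasts to (3, 3):
# diffs[i, j] = points[i] - points[j]
diffs = points[:, np.newaxis] - points[np.newaxis, :]
print(diffs)
# [[ 0 -3 -8]
#  [ 3  0 -5]
#  [ 8  5  0]]

# None is an alias for np.newaxis, so this is equivalent
print(np.array_equal(diffs, points[:, None] - points[None, :]))  # True
```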
3.2 Fancy Indexing
Fancy indexing allows for selecting arbitrary elements from an array based on lists or other arrays.
arr = np.array([10, 20, 30, 40, 50])
# Using a list to index
indices = [0, 2, 4]
print(arr[indices])  # Output: [10 30 50]
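Fancy indexing extends to multi-dimensional arrays. The sketch below uses an illustrative 3×3 grid to show three forms: paired index arrays (one element per pair), a single list (whole rows), and np.ix_, which builds an open mesh so the row and column lists select a rectangular sub-grid.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Paired index arrays pick grid[0, 2], grid[1, 0], grid[2, 1]
picked = grid[[0, 1, 2], [2, 0, 1]]
print(picked)  # [3 4 8]

# A single index list on the first axis selects whole rows, in any order
rows = grid[[2, 0]]
print(rows)
# [[7 8 9]
#  [1 2 3]]

# np.ix_ selects every combination of rows {0, 2} and columns {0, 2}
corners = grid[np.ix_([0, 2], [0, 2])]
print(corners)
# [[1 3]
#  [7 9]]
```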
3.3 Boolean Masking
You can use boolean arrays to filter or modify elements.
arr = np.array([10, 20, 30, 40, 50])
# Mask for values greater than 30
mask = arr > 30
print(arr[mask])  # Output: [40 50]
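Masks can also combine several conditions and be used on the left of an assignment to modify elements. In this sketch, conditions are joined with & (the parentheses are needed because & binds more tightly than the comparisons), and np.where is shown as the alternative that builds a new array instead of changing the original.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) or | (or)
mask = (arr > 15) & (arr < 45)
print(arr[mask])  # [20 30 40]

# Assigning through a mask modifies the original array in place
arr[arr > 30] = 0
print(arr)  # [10 20 30  0  0]

# np.where returns a new array: 15 where the condition holds, arr elsewhere
clipped = np.where(arr > 15, 15, arr)
print(clipped)  # [10 15 15  0  0]
```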
4. Memory Layout of Arrays
Understanding memory layout is crucial for optimizing performance.
4.1 Strides and Views
Slicing a NumPy array usually returns a view: a new array object that shares the original's data buffer instead of copying it. A stride is the number of bytes to step in each dimension when traversing an array.
Example:
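arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)  # 8 bytes per element

# Create a view of the first two columns (no data is copied)
view = arr[:, :2]

# Check strides (bytes to step along each axis in memory)
print(arr.strides)   # (24, 8) -> (row, column): next row is 3 * 8 bytes away
print(view.strides)  # (24, 8) -> the view reuses the parent's strides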
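Slicing with a step changes the strides rather than moving data, and the resulting view keeps a reference to the array that owns the memory. The sketch below assumes int64 elements again, so skipping every second column doubles the column stride from 8 to 16 bytes.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# Every second column: NumPy steps 16 bytes per column instead of 8
every_other = arr[:, ::2]
print(every_other)
# [[1 3]
#  [4 6]]
print(every_other.strides)  # (24, 16)

# The view records which array owns the memory
print(every_other.base is arr)             # True
print(np.shares_memory(every_other, arr))  # True

# Writing through the view changes the parent: every_other[0, 1] is arr[0, 2]
every_other[0, 1] = 99
print(arr[0, 2])  # 99
```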
4.2 Changing Memory Layout: C vs. F Order
NumPy arrays can be stored in row-major (C) or column-major (F, for Fortran) order. Transposing does not copy memory: .T returns a view with the strides swapped, so the transpose of a C-ordered array is F-contiguous.
arr = np.ones((3, 3), order='F')  # Column-major order
print(arr.flags['C_CONTIGUOUS'])  # False (not row-major)
print(arr.flags['F_CONTIGUOUS'])  # True (column-major)
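The transpose behaviour can be confirmed through the contiguity flags. This sketch transposes a row-major (2, 3) array, checks that the result is a column-major view of the same memory, and then uses np.ascontiguousarray, which makes a row-major copy only when the input is not already C-contiguous.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # C-contiguous (row-major) by default
t = a.T                         # shape (3, 2)

print(a.flags['C_CONTIGUOUS'])  # True
print(t.flags['C_CONTIGUOUS'])  # False
print(t.flags['F_CONTIGUOUS'])  # True: transposing only swaps the strides
print(np.shares_memory(a, t))   # True: no copy was made

# np.ascontiguousarray copies into row-major order when needed
t_c = np.ascontiguousarray(t)
print(t_c.flags['C_CONTIGUOUS'])  # True
print(np.shares_memory(t, t_c))   # False: this one is a copy
```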
5. Structured Arrays
You can create structured arrays in NumPy that store heterogeneous data: each element is a record made of named fields, and each field has its own data type. This is useful for tabular datasets whose columns hold different types, much like rows in a database table.
dtype = [('name', 'S10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)], dtype=dtype)

# Accessing fields
print(data['name'])  # [b'Alice' b'Bob'] ('S10' stores byte strings; 'U10' stores str)
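Structured arrays support the usual array operations on individual fields. This sketch switches the name field to 'U10' (Unicode) so names come back as str rather than bytes, and adds a hypothetical third record, 'Cara', purely for illustration. It shows record access, filtering by one field, and sorting whole records with the order argument of np.sort.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
# 'Cara' is a made-up record added for illustration
data = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cara', 22, 61.0)],
                dtype=dtype)

# A single element is one record; its fields are accessed by name
print(data[0]['name'])  # Alice

# Boolean masks built from one field select whole records
older = data[data['age'] > 24]
print(older['name'])  # ['Alice' 'Bob']

# Sort whole records by one field
by_age = np.sort(data, order='age')
print(by_age['name'])  # ['Cara' 'Alice' 'Bob']

# Numeric fields support the usual reductions
print(data['weight'].mean())
```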
6. NumPy and Linear Algebra
6.1 Eigenvalues and Eigenvectors
You can compute eigenvalues and eigenvectors using np.linalg.eig().
A = np.array([[1, 2], [2, 3]])
# Eigenvalues and eigenvectors (column i of eigenvectors belongs to eigenvalues[i])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)
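A quick way to check the result is to verify A v = λ v for each eigenpair, remembering that the eigenvectors are the columns of the returned matrix. Because this A is symmetric, the sketch also uses np.linalg.eigh, which is intended for symmetric (Hermitian) matrices and returns real eigenvalues in ascending order with orthonormal eigenvectors. For this A the eigenvalues are 2 − √5 and 2 + √5.

```python
import numpy as np

A = np.array([[1, 2], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]: A @ v == lambda * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))  # True

# eigh: for symmetric matrices, ascending real eigenvalues
w, V = np.linalg.eigh(A)
print(w)  # approximately -0.236 and 4.236
print(np.allclose(V.T @ V, np.eye(2)))  # True: columns are orthonormal
```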
6.2 Singular Value Decomposition (SVD)
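SVD factors any m×n matrix A as A = U Σ Vᵀ, where U and V are orthogonal matrices and Σ is a diagonal matrix of non-negative singular values in descending order. It is useful in signal processing, data compression, and machine learning.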
A = np.array([[1, 2], [2, 3], [3, 4]])
# SVD: A (3x2) = U @ Sigma @ Vt
U, s, Vt = np.linalg.svd(A)
print("U matrix:", U)         # shape (3, 3)
print("Singular values:", s)  # shape (2,), in descending order
print("Vt matrix:", Vt)       # shape (2, 2)
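The compression use case can be illustrated by rebuilding A from its factors and then keeping only the largest singular value. This sketch uses full_matrices=False (the "thin" SVD, so U is (3, 2)); by the Eckart–Young theorem the rank-1 truncation is the best rank-1 approximation in the Frobenius norm, and its error equals the discarded singular value s[1].

```python
import numpy as np

A = np.array([[1, 2], [2, 3], [3, 4]], dtype=float)

# Thin SVD: U is (3, 2) instead of (3, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (3, 2) (2,) (2, 2)

# Rebuild A = U @ diag(s) @ Vt
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True

# Keep only the largest singular value: a rank-1 approximation
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.linalg.matrix_rank(A_rank1))  # 1

# Frobenius-norm error equals the discarded singular value s[1]
print(np.isclose(np.linalg.norm(A - A_rank1), s[1]))  # True
```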
7. Broadcasting Rules for Advanced Use Cases
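When working with arrays of different shapes, remember the broadcasting rules:
- If the arrays have different numbers of dimensions, pad the shorter shape with 1s on the left.
- Compare the shapes element-wise, starting from the trailing dimension.
- The dimensions are compatible if they are equal, or one of them is 1; otherwise NumPy raises a ValueError. The result takes the larger size along each axis.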
Example of a more complex broadcast:
arr1 = np.ones((3, 1, 4))
arr2 = np.ones((1, 5, 4))

# Broadcasting allows element-wise operations even with different shapes
result = arr1 + arr2
print(result.shape)  # (3, 5, 4)
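The rules can be checked without allocating any arrays using np.broadcast_shapes (available since NumPy 1.20). The sketch also shows what happens when the trailing dimensions are incompatible: the operation fails with a ValueError rather than guessing.

```python
import numpy as np

# Compute the broadcast result shape directly from the input shapes
print(np.broadcast_shapes((3, 1, 4), (1, 5, 4)))  # (3, 5, 4)
print(np.broadcast_shapes((5, 4), (4,)))          # (5, 4): (4,) is padded to (1, 4)

# Trailing dimensions 3 and 2 differ and neither is 1, so this fails
try:
    np.ones((2, 3)) + np.ones((2,))
except ValueError as exc:
    print("ValueError:", exc)
```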
8. Optimizing Memory and Performance
8.1 Avoiding Copies
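Basic slicing (start:stop:step) returns a view instead of a copy, while fancy indexing and boolean masking always return copies. Working with views avoids duplicating data and reduces memory overhead; the trade-off is that writing to a view also changes the original array.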
arr = np.arange(10)
# Create a view (no memory copy)
view = arr[::2]

# Modify the view
view[0] = 100
print(arr)  # The original array is modified: arr[0] is now 100
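Whether an indexing expression produced a view or a copy can be tested with np.shares_memory. This sketch compares basic slicing, fancy indexing, boolean masking, and an explicit .copy(), which is the usual way to get independent data from a slice.

```python
import numpy as np

arr = np.arange(10)

sliced = arr[::2]           # basic slicing -> view
fancy = arr[[0, 2, 4]]      # fancy indexing -> copy
masked = arr[arr % 2 == 0]  # boolean masking -> copy
explicit = arr[::2].copy()  # explicit copy of a slice

for name, result in [("slice", sliced), ("fancy", fancy),
                     ("mask", masked), ("copy", explicit)]:
    print(name, np.shares_memory(arr, result))
# slice True
# fancy False
# mask False
# copy False
```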
8.2 In-place Operations
You can use in-place operations to modify arrays without creating new ones, reducing memory usage.
arr = np.array([1, 2, 3])
# In-place addition
arr += 10
print(arr)  # Output: [11 12 13]
9. Using np.vectorize() for Function Broadcasting
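You can use np.vectorize() to apply a function that expects scalars to arrays in an element-wise fashion, even if the function doesn’t support broadcasting natively. It is a convenience rather than a speed-up: internally it still loops in Python. Note that custom_func below uses only arithmetic, so custom_func(arr) would already work without wrapping.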
def custom_func(x):
    return x ** 2 + 2

vectorized_func = np.vectorize(custom_func)

# Apply the vectorized function to a NumPy array
arr = np.array([1, 2, 3])
result = vectorized_func(arr)
print(result)  # Output: [ 3  6 11]
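np.vectorize earns its keep when a function uses Python control flow such as if, which fails on arrays because the truth value of an array is ambiguous. The sketch below uses a hypothetical helper, clip_negative, to show this, and compares it with np.where, which expresses the same logic as a genuinely vectorized operation.

```python
import numpy as np

def clip_negative(x):
    # Hypothetical helper with Python branching: works on one number only
    if x < 0:
        return 0
    return x

arr = np.array([-2, -1, 0, 1, 2])

# clip_negative(arr) would raise ValueError (truth value is ambiguous),
# but the vectorized wrapper calls it once per element
vectorized = np.vectorize(clip_negative)
looped = vectorized(arr)
print(looped)  # [0 0 0 1 2]

# Same result computed in compiled code, without a Python-level loop
fast = np.where(arr < 0, 0, arr)
print(fast)  # [0 0 0 1 2]
```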
Key Takeaways for Advanced NumPy:
- Broadcasting: A powerful tool for operating on arrays of different shapes.
- Memory Efficiency: Use views and in-place operations to avoid unnecessary copies.
- Linear Algebra: Built-in routines for matrix decompositions such as eigendecomposition and SVD.
- Vectorization: Write efficient code by avoiding loops and using NumPy’s internal optimizations.
- Structured Arrays: Handle heterogeneous datasets effectively.
This advanced tutorial helps you dive deeper into NumPy’s full capabilities, focusing on efficient memory usage, performance optimization, and leveraging NumPy for more complex mathematical operations.