NUMPY tutorial
Several numpy tutorials can be found here
Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.
MATLAB users may want to consult the "NumPy for MATLAB users" page of the NumPy documentation.
Numpy’s main object is the multidimensional numpy array:
=> it is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers
=> array dimensions are called axes
=> the number of dimensions is called the array rank
=> numpy's array class is called ndarray
Import the numpy package as follows:
import numpy as np
Create arrays from a sequence of values (Python list or tuple).
# create from a sequence of values (Python list or tuple):
a = np.array([0, 1, 2]) #>> create from list
a = np.array((0, 1, 2)) #>> create from tuple
print(type(a))
a
<class 'numpy.ndarray'>
array([0, 1, 2])
# create multi-dimensional arrays from nested lists
nested_list = [[1,2,3],[4,5,6]]
a = np.array(nested_list)
a
array([[1, 2, 3], [4, 5, 6]])
# specify data type upon creation
a = np.array([1, 2, 3], dtype='uint8')
print(a.dtype)
a = np.array([1, 2, 3], dtype='float32')
print(a.dtype)
a = np.array(['1', '2', '3'])
print(a.dtype)
uint8 float32 <U1
Create a 1D array of values, specifying the start, stop, and step. The arange function is analogous to the Python built-in range, but returns an array.
a = np.arange(0, 10)
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.arange(6) # 1d array
print(a)
[0 1 2 3 4 5]
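The examples above only pass start and stop. As a minimal sketch of the step argument (the variable names are illustrative, not from the original):

```python
import numpy as np

# start=0, stop=10 (excluded), step=2
evens = np.arange(0, 10, 2)
print(evens)        # [0 2 4 6 8]

# a negative step counts down; the stop value is still excluded
countdown = np.arange(5, 0, -1)
print(countdown)    # [5 4 3 2 1]

# a non-integer step is allowed and produces a float array
halves = np.arange(0, 2, 0.5)
print(halves.dtype) # float64
```

With non-integer steps the number of elements can be affected by floating-point rounding, so np.linspace is often the safer choice when the number of points matters.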
arange can be combined with the reshape() function to create multidimensional arrays:
b = np.arange(12).reshape(4, 3) # 2d array (12 elements, arranged as 4 rows x 3 columns)
print(b)
[[ 0 1 2] [ 3 4 5] [ 6 7 8] [ 9 10 11]]
# WARNING: in the following example, the 3D array is filled sequentially, cycling through the channels (last axis) first
# => it does NOT fill one channel completely before going to the next
# => in this case the first channel has values 0, 4, 8, 12, 16, 20 (since there are 4 channels in the array)
c = np.arange(24).reshape(2, 3, 4) # 3d array (24 elements, arranged as 2 rows x 3 columns x 4 channels)
print(c)
print(c[:,:,0])
[[[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] [[12 13 14 15] [16 17 18 19] [20 21 22 23]]] [[ 0 4 8] [12 16 20]]
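If the intent is instead to fill each channel completely before moving to the next, one possible approach (a sketch, not necessarily the only way) is to reshape with the channel axis first and then move that axis to the end:

```python
import numpy as np

# 4 channels of 2 rows x 3 columns, each channel filled completely in turn
channels_first = np.arange(24).reshape(4, 2, 3)

# move the channel axis to the end => shape (rows, cols, channels)
d = channels_first.transpose(1, 2, 0)

print(d.shape)      # (2, 3, 4)
print(d[:, :, 0])   # first channel now holds the values 0..5
print(d[:, :, 1])   # second channel holds the values 6..11
```

Note that transpose returns a view: d shares its data with channels_first.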
Create an array of zeros, specifying the array shape as a tuple.
a = np.zeros((3, 4))
a
array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])
Create an array of ones, specifying the array shape as a tuple.
a = np.ones((3, 4))
a
array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])
Create an array filled with a constant value, specifying the array shape as a tuple and the fill value.
a = np.full((2,2), 7)
print(a)
[[7 7] [7 7]]
Create the identity matrix, specifying the number of rows (np.eye takes an integer, not a shape tuple).
a = np.eye(2)
print(a)
[[1. 0.] [0. 1.]]
Create an array of random floats in the half-open interval [0, 1), specifying the array shape as a tuple.
a = np.random.random((2,2)) # Create an array filled with random values
print(a)
[[0.43048459 0.06487667] [0.87162144 0.11032227]]
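The values above change on every run. As a hedged sketch of how to get reproducible random arrays, NumPy's Generator interface can be seeded; the seed value 42 below is an arbitrary choice for illustration:

```python
import numpy as np

# two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

a = rng_a.random((2, 2))   # floats in [0, 1), shape given as a tuple
b = rng_b.random((2, 2))

print(np.array_equal(a, b))             # True
print(a.shape)                          # (2, 2)
print(bool(((a >= 0) & (a < 1)).all())) # True
```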
Create an uninitialized array: its initial content is arbitrary and depends on the state of the memory.
a = np.empty((2,2))
a
array([[0.51020597, 0.58265003], [0.49145982, 0.48806746]])
Create an empty array with the same shape as a.
a = np.ones((2,2))
b = np.empty_like(a) # Create an empty matrix with the same shape as a
b
array([[0.51020597, 0.58265003], [0.49145982, 0.48806746]])
# create 2D array
a = np.array([[1, 2, 3], [4, 5, 6]])
a
array([[1, 2, 3], [4, 5, 6]])
Returns the dimensions of the array as a tuple of integers: (nb_rows, nb_columns) for 2D arrays, (nb_rows, nb_columns, nb_channels) for 3D arrays.
a.shape
(2, 3)
Returns the number of dimensions of the array.
a.ndim
2
len(a.shape) # >> equivalent to asking for the length of the array shape
2
Returns the type of the elements in the array.
a.dtype
dtype('int64')
Numpy offers indexing and slicing, similar to Python lists.
However, because arrays may be multidimensional, you specify an index or slice for each dimension of the array:
array[rows, cols, channels]
# --- REMINDER: indexing/slicing Python lists
l = [1, 2, 3, 4, 5] # create list
# - access single element
l[0] # access first element
l[-1] # access last element
l[-2] # access second to last element
# - slice (access multiple elements)
l[1:3] # access 2nd & 3rd elements (the stop index is excluded)
l[1:-2] # access 2nd until 3rd to last element
l[:3] # access the first 3 elements
l[3:] # access all elements from 4th element until end
l[::2] # access every 2nd element
# - assign element
l[0] = 0 # replace element
l[1:2] = [-1, -2] # assign a sublist to a slice
# --- 1D numpy array
# => index/slice array just like a list
a = np.arange(6) # create 1D array
print(a)
print(a[0]) # access first element
print(a[1:3]) # access 2nd & 3rd elements
[0 1 2 3 4 5] 0 [1 2]
# --- 2D numpy array
# => specify a slice for each dimension of the array
b = np.arange(12).reshape(4, 3) # create 2D array (12 elements, arranged as 4 rows x 3 columns)
print(b)
b[:, 0] # access all row elements from the first column
b[-1, -2:] # access the last 2 elements from the last row
[[ 0 1 2] [ 3 4 5] [ 6 7 8] [ 9 10 11]]
array([10, 11])
# --- 3D numpy array
# WARNING: in the following example, the 3D array is filled sequentially, cycling through the channels (last axis) first
# => it does NOT fill one channel completely before going to the next
# => in this case the first channel has values 0, 4, 8, 12, 16, 20 (since there are 4 channels in the array)
c = np.arange(24).reshape(2, 3, 4) # create 3d array (24 elements, arranged as 2 rows x 3 columns x 4 channels)
print(c)
c[:, :, 0] # access all rows and columns from the first channel
c[..., 0] # (equivalent to above command)
[[[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] [[12 13 14 15] [16 17 18 19] [20 21 22 23]]]
array([[ 0, 4, 8], [12, 16, 20]])
Boolean indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
bool_idx = (a > 2) # Find the elements of a that are bigger than 2;
# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.
print(bool_idx)
[[False False] [ True True] [ True True]]
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])
# We can do all of the above in a single concise statement:
print(a[a > 2])
[3 4 5 6] [3 4 5 6]
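Two common extensions of the pattern above, sketched here with illustrative variable names: combining several conditions into one mask, and using a mask on the left-hand side of an assignment.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# combine conditions with & (and) or | (or); each condition needs parentheses
mask = (a > 2) & (a % 2 == 0)
print(a[mask])      # [4 6]

# a boolean mask can also select the elements to overwrite
b = a.copy()        # work on a copy so that a stays unchanged
b[b > 2] = 0
print(b.tolist())   # [[1, 2], [0, 0], [0, 0]]
```

The Python keywords `and` / `or` do not work elementwise on arrays, which is why the bitwise operators are used here.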
Returns the array, flattened.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a)
a.ravel()
[[1 2 3] [4 5 6]]
array([1, 2, 3, 4, 5, 6])
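A related method is flatten. As a small sketch of the difference: ravel returns a view of the original data when it can (as it does for the contiguous array below), whereas flatten always returns a copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

r = a.ravel()     # a view here, because a is stored contiguously
f = a.flatten()   # always an independent copy

r[0] = 100        # writes through to a
f[1] = -1         # does not affect a

print(a[0, 0])    # 100
print(a[0, 1])    # 2
```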
Returns the transposed array.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a)
a.T
[[1 2 3] [4 5 6]]
array([[1, 4], [2, 5], [3, 6]])
Stack arrays horizontally.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.hstack((a, b))
c
array([1, 2, 3, 4, 5, 6])
Stack arrays vertically.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.vstack((a, b))
c
array([[1, 2, 3], [4, 5, 6]])
Stack arrays depth-wise.
(Useful to create RGB arrays from distinct R, G, B channels.)
R = np.zeros((3, 3)) # create 1st channel with 0
G = np.ones((3, 3)) # create 2nd channel with 1
B = np.full((3, 3), 10) # create 3rd channel with 10
RGB = np.dstack((R, G, B))
RGB
array([[[ 0., 1., 10.], [ 0., 1., 10.], [ 0., 1., 10.]], [[ 0., 1., 10.], [ 0., 1., 10.], [ 0., 1., 10.]], [[ 0., 1., 10.], [ 0., 1., 10.], [ 0., 1., 10.]]])
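To check how the channels end up in the stacked array, the following sketch inspects the shape and recovers one channel by indexing the last axis. The comparison with np.stack is an addition for illustration; it holds for 2D inputs like these.

```python
import numpy as np

R = np.zeros((3, 3))
G = np.ones((3, 3))
B = np.full((3, 3), 10)

RGB = np.dstack((R, G, B))
print(RGB.shape)                        # (3, 3, 3) => (rows, cols, channels)

# each channel is recovered by indexing the last axis
print(np.array_equal(RGB[..., 0], R))   # True
print(np.array_equal(RGB[..., 2], B))   # True

# for 2D inputs, stacking along a new last axis gives the same result
RGB2 = np.stack((R, G, B), axis=-1)
print(np.array_equal(RGB, RGB2))        # True
```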
Returns an array with the same data and a new shape.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a)
a.reshape(3,2)
[[1 2 3] [4 5 6]]
array([[1, 2], [3, 4], [5, 6]])
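When reshaping, one dimension can be given as -1 and NumPy infers it from the total number of elements. A minimal sketch (variable names are illustrative):

```python
import numpy as np

a = np.arange(12)

b = a.reshape(3, -1)   # 12 elements / 3 rows => 4 columns
c = a.reshape(-1, 2)   # 12 elements / 2 columns => 6 rows
print(b.shape)         # (3, 4)
print(c.shape)         # (6, 2)

# a shape that does not match the number of elements raises ValueError
try:
    a.reshape(5, -1)
except ValueError as err:
    print("cannot reshape:", err)
```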
Changes the shape of the array in place (unlike reshape, which returns a new array object).
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a)
a.resize((3, 2))
a
[[1 2 3] [4 5 6]]
array([[1, 2], [3, 4], [5, 6]])
Every numpy array is a grid of elements of the same type.
You can explicitly specify the datatype when creating an array.
See numpy documentation about all numpy datatypes here.
x = np.array([1, 2]) # dtype not specified => guessed by numpy => int
y = np.array([1.0, 2.0]) # dtype not specified => guessed by numpy => float
z = np.array([1, 2], dtype=np.uint8) # dtype specified
print(x.dtype, y.dtype, z.dtype)
int64 float64 uint8
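The datatype can also be changed after creation. The sketch below uses astype, which returns a new array and leaves the original untouched; the example values are chosen for illustration only.

```python
import numpy as np

x = np.array([1.7, -2.3, 3.9])

# float -> int conversion truncates toward zero (it does not round)
xi = x.astype(np.int64)
print(xi.tolist())         # [1, -2, 3]

# int -> float conversion, e.g. before a division-heavy computation
z = np.array([1, 2, 3], dtype=np.uint8)
zf = z.astype(np.float32)
print(z.dtype, zf.dtype)   # uint8 float32
```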
Arithmetic operators on arrays apply `elementwise`. A new array is created and filled with the result.
Basic mathematical functions are also available as functions in the numpy module (e.g. np.add(), etc.). See the full list of mathematical functions provided by numpy in the documentation.
Note: unlike MATLAB, * is elementwise multiplication, not matrix multiplication. Numpy instead uses the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices.
# --- create arrays
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
# --- elementwise sum
print(x + y)
print(np.add(x, y)) #>> equivalent result
[[ 6. 8.] [10. 12.]] [[ 6. 8.] [10. 12.]]
# --- elementwise difference
print(x - y)
print(np.subtract(x, y)) #>> equivalent result
[[-4. -4.] [-4. -4.]] [[-4. -4.] [-4. -4.]]
# --- elementwise product
print(x * y)
print(np.multiply(x, y)) #>> equivalent result
[[ 5. 12.] [21. 32.]] [[ 5. 12.] [21. 32.]]
# --- elementwise division
print(x / y)
print(np.divide(x, y)) #>> equivalent result
[[0.2 0.33333333] [0.42857143 0.5 ]] [[0.2 0.33333333] [0.42857143 0.5 ]]
# --- elementwise square root
print(np.sqrt(x))
[[1. 1.41421356] [1.73205081 2. ]]
# --- dot product
# NB: `*` is elementwise multiplication; use `dot` for inner products, matrix/vector products and matrix/matrix products.
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])
# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))
219 219
You can also use the @ operator (np.matmul), which gives the same result as dot for 1D and 2D arrays.
print(v @ w)
219
# Matrix / vector product; all three produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
print(x @ v)
[29 67] [29 67] [29 67]
# Matrix / matrix product; all three produce the same rank 2 array
print(x.dot(y))
print(np.dot(x, y))
print(x @ y)
[[19 22] [43 50]] [[19 22] [43 50]] [[19 22] [43 50]]
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.
Example: add a constant vector to each row of a matrix
# => add vector v to each row of the matrix x, storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x) # Create an empty matrix with the same shape as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
print(y)
[[ 2 2 4] [ 5 5 7] [ 8 8 10] [11 11 13]]
This works; however, when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this:
vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other
print(vv) # Prints "[[1 0 1]
# [1 0 1]
# [1 0 1]
# [1 0 1]]"
[[1 0 1] [1 0 1] [1 0 1] [1 0 1]]
y = x + vv # Add x and vv elementwise
print(y)
[[ 2 2 4] [ 5 5 7] [ 8 8 10] [11 11 13]]
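Broadcasting makes both the loop and the explicit copies of v unnecessary: the shapes (4, 3) and (3,) are compatible, so v is applied to every row automatically. A minimal sketch; the column vector w is a hypothetical addition to show the per-row case.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

# shape (4, 3) + shape (3,): v is broadcast across the 4 rows
y = x + v
print(y.tolist())   # [[2, 2, 4], [5, 5, 7], [8, 8, 10], [11, 11, 13]]

# to add one value per row instead, give the vector shape (4, 1)
w = np.array([10, 20, 30, 40])
y2 = x + w[:, np.newaxis]
print(y2.tolist())  # [[11, 12, 13], [24, 25, 26], [37, 38, 39], [50, 51, 52]]
```

Adding w directly (shape (4,)) to x would fail, because broadcasting aligns shapes from the last axis and 4 does not match 3.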
When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not.
There are three cases.
Simple assignments make no copy of objects or their data.
a = np.ones((5, 5), dtype='uint8')
b = a # no new object is created
b is a # a and b are two names for the same ndarray object
True
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
a = np.ones((5, 5), dtype='uint8')
a_shallow = a.view() # make shallow copy: ``a_shallow`` is a 'view' of the data owned by a
print(a_shallow is a)
print(a_shallow.base is a )
False True
a_shallow[0, 0] = 100 # a's data changes
a
array([[100, 1, 1, 1, 1], [ 1, 1, 1, 1, 1], [ 1, 1, 1, 1, 1], [ 1, 1, 1, 1, 1], [ 1, 1, 1, 1, 1]], dtype=uint8)
a_shallow = a_shallow.reshape((1, 25)) # a's shape doesn't change
a.shape
(5, 5)
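Slicing an array also returns a view rather than a copy, which is easy to overlook. The sketch below checks this with np.shares_memory and shows how copy() breaks the link; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

s = a[:, 1:3]                    # basic slicing returns a view
print(np.shares_memory(a, s))    # True

s[:] = 0                         # writing to the slice modifies a
print(a.tolist())                # [[0, 0, 0, 3], [4, 0, 0, 7], [8, 0, 0, 11]]

c = a[:, 1:3].copy()             # an explicit copy is independent of a
c[:] = 7
print(np.shares_memory(a, c))    # False
```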
The copy method makes a complete copy of the array and its data.
a = np.ones((5, 5), dtype='uint8')
a_deep = a.copy() # a new array ``a_deep`` with new data is created
print(a_deep is a)
print(a_deep.base is a) # a_deep doesn't share anything with a
a_deep[0, 0] = 99 # changing the deep copy does not change the original variable (the value must fit in uint8)
a
False False
array([[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]], dtype=uint8)
del a # the memory of ``a`` can be released
When you print an array, NumPy displays it in a similar way to nested lists.
(In the case of large arrays, it prints the first and last elements only.)
a = np.arange(6) # 1d array
print(a)
[0 1 2 3 4 5]
b = np.arange(12).reshape(4, 3) # 2d array (12 elements, arranged as 4 rows, 3 columns)
print(b)
[[ 0 1 2] [ 3 4 5] [ 6 7 8] [ 9 10 11]]
c = np.arange(24).reshape(2, 3, 4) # 3d array (24 elements, shape (2, 3, 4); printed as 2 blocks of 3 rows and 4 columns)
print(c)
[[[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] [[12 13 14 15] [16 17 18 19] [20 21 22 23]]]
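The abbreviated printing of large arrays can be seen, and adjusted, as in the following sketch. The threshold and edgeitems values are arbitrary choices for illustration.

```python
import numpy as np

big = np.arange(10000)
text = str(big)          # same text that print(big) would show
print(text)              # the middle elements are replaced by "..."

# np.printoptions temporarily changes how arrays are printed
with np.printoptions(threshold=5, edgeitems=2):
    short = str(np.arange(10))
    print(short)         # abbreviated even though the array is small
```

Outside the with block the default print settings apply again.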