Numpy reference: https://docs.scipy.org/doc/numpy-1.13.0/reference/
Numpy is a powerful python library that allows for efficient operations on arrays.
Install jupyter:
pip3 install jupyter
Launch your notebook (opens in browser):
jupyter notebook [name_of_file.ipynb]
Alternatively, you can run Jupyter Notebooks in Google Drive using Colaboratory.
numpy
is a library made by other people! We need to import
libraries in order to use them.
import numpy as np
Numpy’s main use is np.array
Numpy arrays take less space than built-in lists and come with a wide variety of useful functions.
# make an array
a = np.array([2,3,4])
a
array([2, 3, 4])
# make a 2-dimensional array (matrix)
matrix = np.array([ [1,2,3],
[4,5,6],
[7,8,9] ])
matrix
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# you can multiply matrices with np.dot
np.dot(matrix, a)
array([20, 47, 74])
These operations are convenient and extremeley fast. Much faster than accomplishing the same thing with a for loop.
You can add/subtract/multiply/divide with numpy arrays! You cannot do this with built-in python lists.
a + 5
array([7, 8, 9])
a * -1
array([-2, -3, -4])
b = np.array([3, 2, 1])
a + b
array([5, 5, 5])
If you try to perform operations on two arrays of different lengths, an error will occur. Try running the following cell!
# Run me!
b + np.array([1, 2, 3, 4])
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <\ipython-input-10-01af955792b6> in <\module>() 1 # Run me! ----> 2 b + np.array([1, 2, 3, 4]) ValueError: operands could not be broadcast together with shapes (3,) (4,)
You will also get an error when trying to access the value at an index that does not exist in the array.
Why do we use Numpy? Numpy provides a multitude of useful functions for arrays. We’ll teach you a few (many more exist!)
Exercise:Search online how to find the mean of a numpy array.
Use len( array )
to find length of array.
len(b)
3
len(np.array([1, 2, 3, 4]))
4
Conditionals apply to every element of a numpy array as well. This will come in handy later!
a = np.array([1, 2, 3, 1, 1])
a == 1
array([ True, False, False, True, True], dtype=bool)
x = np.array([1, 5, -7, 18, 1, -2, 4])
# Find the mean of array x
x_mean = np.mean(x)
Here, we’ll give you a list of some useful numpy functions. Remember, you can easily find info about these by searching google / numpy documentation!
np.sum(x)
20
np.min(x)
-7
np.max(x)
18
np.median(x)
1.0
np.cumsum(x)
array([ 1, 6, -1, 17, 18, 16, 20])
np.abs(x)
array([ 1, 5, 7, 18, 1, 2, 4])
What do you think np.cumsum
does? Note, numpy has a similar function np.cumprod
. Try it!
What do you think np.diff
does?
np.diff(x)
array([ 4, -12, 25, -17, -3, 6])
Two super useful functions in numpy are np.arange
and np.linspace
. They allow you to craft arrays with equidistant values:
start
], stop
, and [step
]start
, stop
, and num
np.arange(0, 100, 10)
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
np.linspace(0, 100, 15)
array([ 0. , 7.14285714, 14.28571429, 21.42857143, 28.57142857, 35.71428571, 42.85714286, 50. , 57.14285714, 64.28571429, 71.42857143, 78.57142857, 85.71428571, 92.85714286, 100. ])
Using np.arrays
in python is a little bit different than with built-in lists.
a = np.array([2, 3, 4])
b = [2, 3, 4]
print(a)
print(b)
[2 3 4] [2, 3, 4]
b.append("hello")
b
[2, 3, 4, 'hello']
a = np.append(a, 'hello')
a
array(['2', '3', '4', 'hello'], dtype='< U21')
c = np.array([1, 2, 3, 4, 5])
cumulative_product = 1
for element in c:
cumulative_product *= element
cumulative_product
120
Use np.arange
to create an array called arr1
that contains every odd number from 1 to 100, inclusive.
arr1 = np.arange(1, 100, 2)
arr1
array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99])
Use arr1
to create an array arr2
of every number divisible by 4 from 1 to 200, inclusive.
arr2 = (arr1 + 1) * 2
arr2
array([ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140, 144, 148, 152, 156, 160, 164, 168, 172, 176, 180, 184, 188, 192, 196, 200])
Create the same array, but using np.linspace
instead. Call this array arr3
.
arr3 = np.linspace(4, 200, 50)
arr3
array([ 4., 8., 12., 16., 20., 24., 28., 32., 36., 40., 44., 48., 52., 56., 60., 64., 68., 72., 76., 80., 84., 88., 92., 96., 100., 104., 108., 112., 116., 120., 124., 128., 132., 136., 140., 144., 148., 152., 156., 160., 164., 168., 172., 176., 180., 184., 188., 192., 196., 200.])
Print the following summary statistics for arr3
:
np.percentile()
)print('Minimum: ' + str(np.min(arr3)))
print('1st quartile: ' + str(np.percentile(arr3, 25)))
print('Median: ' + str(np.median(arr3)))
print('Mean: ' + str(np.mean(arr3)))
print('Standard Deviation: ' + str(np.std(arr3)))
print('3rd Quartile: ' + str(np.percentile(arr3, 75)))
print('Max: ' + str(np.max(arr3)))
Minimum: 4.0 1st quartile: 53.0 Median: 102.0 Mean: 102.0 Standard Deviation: 57.7234787586 3rd Quartile: 151.0 Max: 200.0
While it may not have been obvious from the token examples in this tutorial, when we are dealing with huge, multi-dimensional arrays numpy is vastly superior than python lists in terms of speed.
Applying arithmetic operations or functions on numpy arrays is also much faster than manually going through a python for loop to accomplish the same task.