NumPy#

NumPy stands for Numerical Python. It is a package for working with (primarily) numeric information stored in multidimensional arrays and for carrying out operations on those arrays efficiently. If you’re familiar with matrices in mathematics, often used in linear algebra, then you’re already familiar with the idea behind multidimensional arrays. If you’re not, we’ll get you up to speed here.

A reminder that to run any of the NumPy code below you’ll need to first import NumPy. Conventionally, Pythonistas (people who code in Python) import NumPy using: import numpy as np, so we’ll encourage that here. Using this import statement, any time you want to reference an object in NumPy, you can do so with just the two letters np.

Be sure to import numpy using import numpy as np before attempting to run any of the included code in this section.
import numpy as np

Homogenous Data#

First and foremost, the NumPy package was developed around the concept of an array. Arrays are useful for storing homogenous data - or data that are all of the same type. Most often this is numeric information. For example, if you had numeric recordings from a bunch of participants from multiple visits to the lab, you could imagine storing each individual participant’s data in a row and their numeric recordings in columns. (If you’re picturing a spreadsheet with numbers in cells of the spreadsheet, then you’ve got the right idea.) NumPy arrays are great for storing this type of information! Importantly, NumPy supports multidimensional arrays, including 2D arrays, like data stored in rows and columns, as well as arrays with three or more dimensions.

Up to this point, if we wanted to store such information, we could have stored the numbers in a list, or maybe in a dictionary with the participant’s identifier as the key. And while lists of lists are possible, managing them can be a nightmare and operating on them is not trivial. This is why the development of the NumPy array was critical.

The NumPy array#

In programming, the term array refers to a data structure that enables the storage and retrieval of data. When we reference NumPy arrays, we often discuss each number being stored in a “cell” of a grid.

A simple one-dimensional array would store homogenous (again, typically numeric) data in something that looks a lot like a list:

5 1 8 3

However, arrays really shine when we work with data in more than a single dimension, such as a two-dimensional array:

3 7 1 8
4 9 2 6
5 3 10 7
6 1 8 4

Arrays can be more than two dimensional; however, for our purposes, we’ll stick to only working with 2D arrays for now.

Now that we have an idea of the types of information stored in arrays and what they look like, we’ll discuss the ground rules for NumPy arrays. In NumPy, arrays must:

  1. Store information of the same type (homogenous data)

  2. Remain the same total size once created

  3. Be rectangular (meaning every row of a 2D array must have the same number of columns)

The principal object within NumPy is the ndarray (which stands for N-dimensional array). Here we’ll create our first two arrays:

array_0 = np.array([[1, 2], [3, 4]])
array_0
array([[1, 2],
       [3, 4]])
array_1 = np.array([[5, 6], [7, 8]])
array_1
array([[5, 6],
       [7, 8]])

As a reminder, in the above we can easily see the structure of each of these arrays is a two dimensional array. Each has two rows and two columns. In array_0 the first row contains the integers 1 and 2 and the second row 3 and 4.

We use arrays because they maintain this structure, making row and column operations feasible. If we were to simply use a list of lists instead of an array, the dimensionality (rows and columns) would be lost: Python treats it as a list that happens to contain other lists, as demonstrated here:

[[1, 2], [3, 4]] 
[[1, 2], [3, 4]]
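To make the contrast concrete, here is a minimal sketch comparing a list of lists with an array holding the same values. The variable names (nested_list, as_array, and so on) are made up for this illustration.

```python
import numpy as np

# The same values stored two ways
nested_list = [[1, 2], [3, 4]]
as_array = np.array(nested_list)

# A list only knows its length; an array knows its rows and columns
print(len(nested_list))   # 2
print(as_array.shape)     # (2, 2)

# Pulling the first column out of a list of lists takes a comprehension...
first_col_list = [row[0] for row in nested_list]

# ...while an array supports it directly through indexing
first_col_array = as_array[:, 0]

# + glues two lists together, but adds two arrays element by element
list_plus = nested_list + nested_list
array_plus = as_array + as_array

print(list_plus)    # [[1, 2], [3, 4], [1, 2], [3, 4]]
print(array_plus)
```

The list result simply has four inner lists, whereas the array result keeps its two-by-two shape with each value doubled.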

Basic operations#

One of the many advantages of using ndarrays is that you can then easily carry out operations on your arrays. For example, your two arrays can be added together using + to carry out matrix addition.

array_0 + array_1
array([[ 6,  8],
       [10, 12]])

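Similarly, the * operator multiplies the two arrays element by element. (Note that this is not matrix multiplication in the linear algebra sense; NumPy uses the @ operator for that.)

array_0 * array_1
array([[ 5, 12],
       [21, 32]])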

While we won’t be covering the mathematical principles underlying these operations here, we can see that mathematically operating on an array is feasible in a way that was not possible with the variable types discussed up to this point.
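It is worth being precise about what * does on arrays: it multiplies values in matching positions, while the matrix product from linear algebra is written with the @ operator. The sketch below shows both side by side. The arrays are redefined under the illustrative names left and right so the example does not depend on any earlier cell.

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[5, 6], [7, 8]])

# Element-wise: each value is multiplied by the value in the same position
elementwise = left * right

# Matrix product: rows of `left` combined with columns of `right`
matrix_product = left @ right

print(elementwise)     # [[ 5 12], [21 32]]
print(matrix_product)  # [[19 22], [43 50]]
```

For instance, the top-left value of the matrix product is 1*5 + 2*7 = 19, whereas the element-wise result in that position is just 1*5 = 5.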

Attributes#

Because the ndarray is the core object in NumPy, there are a number of helpful attributes (and methods - we’ll get there) associated with this object. This is one place where object-oriented programming is particularly helpful: the information about an array, and the operations that can be carried out on it, travel with the array object itself.

shape#

The first thing we often want to know about an array is its shape - how many rows? how many columns? The shape attribute stores this information:

array_0.shape
(2, 2)

The (2, 2) reports how many rows and how many columns are in the array. For a 2D array, the first number will always be the number of rows and the second the number of columns.
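The shape tuple has one entry per dimension, so its length changes with the number of dimensions. Here is a small sketch with made-up arrays (one_d, two_d, three_d) to show this. It borrows np.zeros(), which creates an array of a given shape filled with zeros, purely as a convenient way to build a 3D array.

```python
import numpy as np

one_d = np.array([5, 1, 8, 3])
two_d = np.array([[1, 2, 3],
                  [4, 5, 6]])
three_d = np.zeros((2, 3, 4))

print(one_d.shape)     # (4,)
print(two_d.shape)     # (2, 3)  -> 2 rows, 3 columns
print(three_d.shape)   # (2, 3, 4)

# ndim reports the number of dimensions, i.e. the length of shape
print(three_d.ndim)    # 3
```

Note that a one-dimensional array reports a single-entry tuple, (4,), rather than a row and column count.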

size#

The total number of elements stored within an array can be accessed with the size attribute:

array_0.size
4

Here, we see that there are four total elements within the array_0 object.

dtype#

As noted above, ndarrays store homogenous information. Typically, these will be numbers, but they aren’t required to be numbers. To determine the data type stored in the array, the dtype attribute can be used:

array_0.dtype
dtype('int64')

Above, we see that the values stored within array_0 are all integers (specifically, 64-bit integers).
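To see how dtype reflects what an array holds, here is a short sketch using a few made-up arrays. It also shows what happens when integers and floats are mixed (NumPy stores everything as floats to keep the data homogenous) and uses the astype() method to convert between types. The exact integer width (for example int64) can depend on your platform, so the sketch checks the general kind of the integer array instead.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.5, 2.5, 3.5])
words = np.array(["a", "bc"])
mixed = np.array([1, 2.5])       # an int and a float together

print(ints.dtype.kind)    # 'i' (some integer type)
print(floats.dtype)       # float64
print(words.dtype.kind)   # 'U' (text)
print(mixed.dtype)        # float64: the int was stored as a float

# astype() returns a new array of the requested type;
# converting floats to int drops the decimal part
truncated = floats.astype(int)
print(truncated)          # [1 2 3]
```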

Note: There are additional array attributes, referenced here; however, most are beyond the scope of knowledge required here.

Indexing & Slicing#

In addition to knowing information about the array, we often want to be able to access particular elements of the array.

For example, if we want to index into an array and find the value stored at a particular position, we can do so using our typical approach to indexing ([]). Note, however, that to access a single value within a 2D array, we’ll need to provide both the row and column location within the array.

For example, to access the value in the first row but second column of array_0, we’d use the following:

array_0[0, 1]
2

A reminder that Python is zero-indexed, so the information in the first row will be accessed with the index 0 and the information in the second column will be accessed with the index 1.

Additionally, rows of data can be accessed using a single value when indexing. The following returns the first row of the array:

array_0[0]
array([1, 2])

Beyond accessing a single row, slices of the original array can also be accessed using the slice notation with which we’re familiar. For example, the following returns the first column of the array:

array_0[:, 0]
array([1, 3])

The : says select all rows, whereas the 0 indicates to only return the first column.

Finally, as arrays are mutable, the ability to access particular elements in or parts of an array enables values within the array to be updated after object creation. If I wanted to change the first value in array_0 to be the number 7 instead of 1, I could do so using the following assignment:

array_0[0, 0] = 7
array_0
array([[7, 2],
       [3, 4]])
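The same indexing and slicing rules carry over to larger arrays. The sketch below applies them to the 4x4 grid of numbers shown earlier in this section. The variable names are made up for illustration, and the update is made on a copy (using the copy() method) so the original grid is left untouched.

```python
import numpy as np

grid = np.array([[3, 7, 1, 8],
                 [4, 9, 2, 6],
                 [5, 3, 10, 7],
                 [6, 1, 8, 4]])

# Rows 0-1 and columns 0-1: the top-left 2x2 block
top_left = grid[0:2, 0:2]

# Negative indices count from the end: the last row
last_row = grid[-1]

# A step of 2 keeps every other column (columns 0 and 2)
every_other_col = grid[:, ::2]

# Slices can also be assigned to: set the whole second column to 0
updated = grid.copy()
updated[:, 1] = 0

print(top_left)
print(last_row)
print(every_other_col)
print(updated)
```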

Methods#

In addition to attributes and the ability to operate mathematically, ndarray objects have a number of helpful methods.

sum()#

For example, if you wanted to quickly compute the sum of all the values in an array, there’s the method sum for that:

array_0.sum()
16

Helpfully, this method can also calculate sums along one dimension of the array. Specifying the value 0 for the axis parameter adds down the rows, returning one sum per column:

array_0.sum(axis=0)
array([10,  6])

…or, by specifying the value 1, adds across the columns, returning one sum per row:

array_0.sum(axis=1)
array([9, 7])
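The axis parameter is easier to follow on an array that is not square, since the length of the result shows which dimension was collapsed. A minimal sketch, using a made-up 2x3 array called scores:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])    # 2 rows, 3 columns

total = scores.sum()               # every value added together
column_sums = scores.sum(axis=0)   # one value per column (3 results)
row_sums = scores.sum(axis=1)      # one value per row (2 results)

print(total)         # 21
print(column_sums)   # [5 7 9]
print(row_sums)      # [ 6 15]
```

With axis=0 the two rows are collapsed, leaving three column sums; with axis=1 the three columns are collapsed, leaving two row sums.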

Aggregation functions#

Beyond sum, there are a number of methods that calculate some statistic across your array.

For example, max() provides the largest value in the array, min() the smallest, mean() the average, and std() the standard deviation:

# smallest value
array_0.min()
2
# largest value
array_0.max()
7
# average
array_0.mean()
4.0
# standard deviation
array_0.std()
1.8708286933869707

As with sum(), the axis parameter can be used to carry out any of these operations by row or by column. For example, calculating the mean for each column:

array_0.mean(axis=0)
array([5., 3.])
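One detail about std() is worth knowing: by default it calculates the population standard deviation, dividing by the number of elements. If you want the sample standard deviation (dividing by the number of elements minus one), the ddof parameter can be set to 1. A small sketch, using a made-up array named values that holds the same numbers as array_0 after the update above:

```python
import numpy as np

values = np.array([[7, 2],
                   [3, 4]])

# Default (ddof=0): divide the summed squared deviations by N
population_std = values.std()

# ddof=1: divide by N - 1 instead
sample_std = values.std(ddof=1)

print(population_std)
print(sample_std)
```

The sample version is always at least as large as the population version, since it divides by a smaller number.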

While we won’t walk through examples of all of the existing methods in NumPy, we’ll summarize a few common ones here:

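| Method | Purpose |
|---|---|
| tolist() | Convert the array to a nested list |
| fill() | Fill the array with a particular value |
| transpose() | Transpose the axes of the array |
| all() | Return True if all elements in the array evaluate to True (for example, after applying a condition) |
| any() | Return True if any element in the array evaluates to True (for example, after applying a condition) |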

Additional methods can be found in the NumPy Documentation.
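As a quick illustration of the methods summarized above, here is a minimal sketch on a made-up 2x3 array called arr. Note that all() and any() are typically called on the result of a comparison such as arr > 0, and that fill() changes the array in place rather than returning a new one (which is why it is applied to a copy here).

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# tolist(): back to plain nested Python lists
as_list = arr.tolist()

# transpose(): rows become columns, so the shape goes from (2, 3) to (3, 2)
transposed = arr.transpose()

# all() / any(): applied to the True/False array produced by a comparison
all_positive = (arr > 0).all()
any_above_five = (arr > 5).any()
all_above_five = (arr > 5).all()

# fill(): overwrite every element in place
filled = arr.copy()
filled.fill(9)

print(as_list)
print(transposed.shape)
print(all_positive, any_above_five, all_above_five)
print(filled)
```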

Functions#

While the ndarray object is the main object in NumPy, the package also provides a number of functions that add functionality when working with arrays.

Specifically, what if you wanted to find all of the unique values in an array quickly? There’s a function (np.unique) for that.

For example, if you had the following array:

array_dups = np.array([[1, 5, 5, 5, 7, 9, 10],
                       [1, 5, 5, 5, 7, 9, 10],
                       [5, 7, 9, 9, 8, 2, 3]])
array_dups                  
array([[ 1,  5,  5,  5,  7,  9, 10],
       [ 1,  5,  5,  5,  7,  9, 10],
       [ 5,  7,  9,  9,  8,  2,  3]])

…you could use the np.unique() function to extract an array of all the unique values:

np.unique(array_dups)
array([ 1,  2,  3,  5,  7,  8,  9, 10])

Note that the axis parameter is available here too. With axis=0, np.unique() compares whole rows and returns only the unique rows:

np.unique(array_dups, axis=0)
array([[ 1,  5,  5,  5,  7,  9, 10],
       [ 5,  7,  9,  9,  8,  2,  3]])
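If you also want to know how often each unique value occurs, np.unique() accepts a return_counts parameter. The sketch below rebuilds array_dups so it can run on its own; the names values, counts, and count_by_value are made up for illustration.

```python
import numpy as np

array_dups = np.array([[1, 5, 5, 5, 7, 9, 10],
                       [1, 5, 5, 5, 7, 9, 10],
                       [5, 7, 9, 9, 8, 2, 3]])

# With return_counts=True, two arrays come back:
# the sorted unique values and how many times each one appears
values, counts = np.unique(array_dups, return_counts=True)

# Pair them up in a dictionary for easy lookup
count_by_value = dict(zip(values.tolist(), counts.tolist()))

print(count_by_value)
# {1: 2, 2: 1, 3: 1, 5: 7, 7: 3, 8: 1, 9: 4, 10: 2}
```

The counts add up to 21, the total number of elements in the array.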

While we won’t walk through examples of all of the existing functions in NumPy, we’ll summarize a few common ones here:

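| Function | Purpose |
|---|---|
| np.where() | Identify the locations within an array where a condition is met |
| np.flip() | Reverse the order of the elements in an array |
| np.arange() | Create an array of evenly spaced values within a given range |
| np.ones() | Create an array of a given shape filled with ones |
| np.zeros() | Create an array of a given shape filled with zeros |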
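Here is a brief sketch of these functions in use. The shapes and variable names are arbitrary choices for illustration. Note that the function that creates an array of zeros is spelled np.zeros(), and that np.ones() and np.zeros() take the desired shape as a tuple.

```python
import numpy as np

# np.arange: evenly spaced values, like Python's range()
counting = np.arange(6)          # 0 up to (not including) 6
evens = np.arange(0, 10, 2)      # start, stop, step

# np.ones / np.zeros: arrays of a given shape (here 2 rows, 3 columns)
ones = np.ones((2, 3))
zeros = np.zeros((2, 3))

# np.flip: reverse the order of the elements
reversed_counting = np.flip(counting)

# np.where with only a condition returns the row and column
# indices of every element that meets the condition
grid = np.array([[1, 6],
                 [7, 2]])
rows, cols = np.where(grid > 4)

# np.where with three arguments picks values: keep the element
# where the condition holds, otherwise use 0
masked = np.where(grid > 4, grid, 0)

print(counting, evens, reversed_counting)
print(rows, cols)   # [0 1] [1 0]
print(masked)
```

Reading rows and cols together, the values greater than 4 sit at positions (0, 1) and (1, 0).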

Exercises#

Q1. Create three 2D NumPy arrays, each with 3 rows and 2 columns. Fill the first one with zeros, the second with ones, and the third one with a range of values.

Q2. Using NumPy attributes, double check that each array has the correct shape and size.

Q3. Calculate the minimum, maximum, mean, and standard deviation of the values in the array you created with a range of values.

Q4. Calculate the same metrics as in Q3, but by row.

Q5. Calculate the same metrics as in Q3, but by column.