NumPy Linear Algebra

hemanshi
7 min read · Jun 30, 2021


NumPy provides functions for performing a range of linear algebra calculations on input data.

The functions used most often are described below. dot, vdot, inner, and matmul are available at the top level of the package (for example np.dot), while the determinant, solve, and inv functions live in the numpy.linalg module.

  • dot: Dot product of two arrays
  • vdot: Dot product of two vectors
  • inner: Inner product of two arrays
  • matmul: Matrix product of two arrays
  • det (determinant): Computes the determinant of an array
  • solve: Solves a linear matrix equation
  • inv: Finds the multiplicative inverse of a matrix
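
As a rough illustration of the functions listed above, here is a minimal sketch. The matrices A and B and the vector b are hypothetical values picked only so the results are easy to check by hand.

```python
import numpy as np

# Hypothetical inputs, chosen only for illustration
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
B = np.array([[1.0, 0.0],
              [2.0, 1.0]])
b = np.array([9.0, 8.0])

dot_AB = np.dot(A, B)       # for 2-D inputs this is the matrix product
matmul_AB = np.matmul(A, B) # same result here; also written A @ B
vdot_AB = np.vdot(A, B)     # flattens both inputs, then takes the dot product
inner_AB = np.inner(A, B)   # sums products over the last axis of each input

det_A = np.linalg.det(A)    # determinant of A
x = np.linalg.solve(A, b)   # solves A @ x = b for x
A_inv = np.linalg.inv(A)    # multiplicative inverse of A

print(dot_AB)
print(vdot_AB, det_A)
print(x)
```

For this particular A and b, solve returns x = [2, 3], which can be confirmed by checking that A @ x reproduces b.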

  1. Arrays in NumPy:

NumPy’s main object is the homogeneous multidimensional array.

  • It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
  • In NumPy, dimensions are called axes. The number of axes is called the rank.
  • NumPy’s array class is called ndarray. It is also known by the alias array.

Example :

[[ 1, 2, 3],
[ 4, 2, 5]]
Here,
rank = 2 (as it is 2-dimensional or it has 2 axes)
first dimension(axis) length = 2, second dimension has length = 3
overall shape can be expressed as: (2, 3)
# Python program to demonstrate
# basic array characteristics
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3],
[ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

Output :

Array is of type:  <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64

NumPy arrays are homogeneous: each array holds data of a single type. Python's lists and tuples, by contrast, place no restriction on the types of the items they contain. This homogeneity is what makes vectorized operations possible: optimized, pre-compiled routines are applied to whole arrays at once instead of looping over elements in Python. Vectorized operations are typically much faster than the equivalent non-vectorized loop.

Example 1: Using the vectorized sum on a NumPy array. We compare the vectorized np.sum with a simple non-vectorized approach, an iterative loop, to calculate the sum of the numbers from 0 to 14,999.

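# importing the modules
import numpy as np
import timeit
# vectorized sum
def vectorized_sum():
    return np.sum(np.arange(15000))
# iterative sum
def iterative_sum():
    total = 0
    for item in range(0, 15000):
        total += item
    return total
# both approaches give the same result
print(vectorized_sum())
print(iterative_sum())
# time 1000 runs of each approach (timeit.timeit returns seconds)
print("Time taken by vectorized sum : ",
      timeit.timeit(vectorized_sum, number=1000))
print("Time taken by iterative sum : ",
      timeit.timeit(iterative_sum, number=1000))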

Output :

The above example shows how much more efficient NumPy's vectorized operations are than non-vectorized ones. When computational efficiency is a key factor in a program, we should avoid simple Python loops over the data and use NumPy's vectorized functions instead.

Numpy ufunc | Universal functions

Universal functions (ufuncs) in NumPy are mathematical functions that operate on arrays element by element. NumPy provides many of them, covering a wide variety of operations.

These include standard trigonometric functions, functions for arithmetic operations, functions for handling complex numbers, and others. Universal functions have the following characteristics:

  • They operate on ndarray (N-dimensional array) objects, i.e. NumPy’s array class.
  • They perform fast element-wise array operations.
  • They support features such as array broadcasting and type casting.
  • Universal functions are objects that belong to the numpy.ufunc class.
  • A Python function can be turned into a universal function with the np.frompyfunc library function.
  • Some ufuncs are called automatically when the corresponding arithmetic operator is used on arrays. For example, when two arrays are added element-wise with the ‘+’ operator, np.add() is called internally.
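
The characteristics above can be checked with a short sketch. The arrays and the helper clip_to_two are hypothetical, introduced only for this illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# np.add is an instance of the numpy.ufunc class
is_ufunc = isinstance(np.add, np.ufunc)

# The + operator on arrays gives the same result as calling np.add
same_result = np.array_equal(a + b, np.add(a, b))

# Broadcasting: the scalar 2 is combined with every element of a
doubled = np.multiply(a, 2)

# Type casting: integer input, floating-point output
roots = np.sqrt(np.array([1, 4, 9]))

# Hypothetical plain Python function, wrapped as a ufunc
def clip_to_two(value):
    return min(value, 2)

clip_ufunc = np.frompyfunc(clip_to_two, 1, 1)  # 1 input, 1 output
clipped = clip_ufunc(a)  # frompyfunc always returns object-dtype arrays

print(is_ufunc, same_result)
print(doubled, roots, clipped)
```

Note that a function wrapped with np.frompyfunc still calls Python once per element, so it gains the ufunc interface (broadcasting, element-wise application) but not the speed of a compiled ufunc.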

Broadcasting function

The term broadcasting refers to the ability of NumPy to handle arrays of different shapes during arithmetic operations. Arithmetic operations on arrays are usually done on corresponding elements. If two arrays have exactly the same shape, these operations are performed element by element without any adjustment.

Example 1

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
c = a * b
print(c)

Its output is as follows −

[ 10  40  90 160]

If the dimensions of two arrays are dissimilar, direct element-to-element operations are not possible. However, operations on arrays of different shapes are still possible in NumPy because of its broadcasting capability. The smaller array is broadcast to the size of the larger array so that they have compatible shapes.

Broadcasting is possible if the following rules are satisfied −

  • The array with the smaller ndim has its shape prepended with ‘1’ until both shapes have the same length.
  • The size in each dimension of the output shape is the maximum of the input sizes in that dimension.
  • An input can be used in the calculation if its size in a particular dimension either matches the output size or is exactly 1.
  • If an input has a dimension of size 1, the first data entry in that dimension is used for all calculations along that dimension.

A set of arrays is said to be broadcastable if the above rules produce a valid result and one of the following is true −

  • Arrays have exactly the same shape.
  • Arrays have the same number of dimensions and the length of each dimension is either a common length or 1.
  • An array with too few dimensions can have its shape prepended with dimensions of length 1, so that the above stated property is true.

The following program shows an example of broadcasting.

Example 2

import numpy as np
a = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [20.0, 20.0, 20.0], [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])

print('First array:')
print(a)
print()  # blank line between sections

print('Second array:')
print(b)
print()

print('First Array + Second Array')
print(a + b)

The output of this program would be as follows −

First array:
[[ 0.  0.  0.]
 [10. 10. 10.]
 [20. 20. 20.]
 [30. 30. 30.]]

Second array:
[1. 2. 3.]

First Array + Second Array
[[ 1.  2.  3.]
 [11. 12. 13.]
 [21. 22. 23.]
 [31. 32. 33.]]

The following figure demonstrates how array b is broadcast to become compatible with a.
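
The stretching of b can also be inspected in code. The sketch below reuses the arrays from Example 2 and uses np.broadcast_to to show what b looks like once it has been expanded to the shape of a. The two-element array at the end is a hypothetical input added to show a shape that cannot be broadcast.

```python
import numpy as np

a = np.array([[0.0, 0.0, 0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])

# b has shape (3,). It is treated as (1, 3) and repeated along axis 0.
b_stretched = np.broadcast_to(b, a.shape)
print(b_stretched)

# Shape of the result when a and b are combined
result_shape = np.broadcast(a, b).shape
print(result_shape)

# Hypothetical incompatible input: trailing sizes 3 and 2 differ, neither is 1
error_raised = False
try:
    a + np.array([1.0, 2.0])
except ValueError:
    error_raised = True
print(error_raised)
```

np.broadcast_to returns a read-only view, so no copy of b is made. The same holds during ordinary broadcast arithmetic, where the repetition is only conceptual.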

Boolean Mask

Boolean masking is typically the most efficient way to select a sub-collection from a collection. Masking, in Python and data science, means extracting or manipulating the data in a collection based on some criterion. The criterion is typically true or false for each element, hence the "Boolean" part. Boolean masks can also be used for indexing, but this works very differently from indexing with index arrays.

Let’s understand it with an example.

Problem statement: Suppose we want to get all the values from an array that are less than the mean of the entire array.

Solution: First, create an array a (consider a 2D array).

Second, calculate the mean using the mean() method of the array a.

According to the problem, we need the values that are less than the mean. One way is to run a for loop over the array and collect them, but with NumPy masking we can do this in a single line.

Third, compare each value of the array with the mean: create a mask that is True everywhere a is less than the mean.

Let’s see the output.

The mask is a matrix with the same shape as a, holding a Boolean value in every position:

True: the value is less than the mean.

False: the value is greater than or equal to the mean.

Lastly, retrieve all the values of a that satisfy the mask. Every value returned is less than the mean.
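
The steps above can be sketched as follows. The array here is a hypothetical 4x5 example, not necessarily the one originally used in this article.

```python
import numpy as np

# Step 1: create a 2D array (hypothetical values 0..19)
a = np.arange(20).reshape(4, 5)

# Step 2: calculate the mean of the entire array
mean = a.mean()

# Step 3: build the mask, True wherever a is less than the mean
mask = a < mean
print(mask)

# Step 4: retrieve the values of a that satisfy the mask
below_mean = a[mask]
print(below_mean)
```

Indexing with a Boolean mask always returns a flat 1D array of the selected values, even when the original array is 2D.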

Counting entries

To count the number of True entries in a Boolean array, np.count_nonzero is useful.

We see that there are 10 array entries that are less than the mean.

Another way to get at this information is to use np.sum, because True is counted as 1 and False as 0.
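
Both counting approaches are shown in the minimal sketch below. It assumes a hypothetical 4x5 array, which happens to have 10 entries below its mean. The per-row count at the end is an extra illustration of what np.sum can do with an axis argument.

```python
import numpy as np

# Hypothetical 4x5 array holding the values 0..19 (mean is 9.5)
a = np.arange(20).reshape(4, 5)
mask = a < a.mean()

# Count the True entries directly
count_nonzero = np.count_nonzero(mask)

# Same count with np.sum: True is treated as 1, False as 0
count_sum = np.sum(mask)

# np.sum also accepts an axis, giving one count per row
per_row = np.sum(mask, axis=1)

print(count_nonzero, count_sum)
print(per_row)
```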

Basic Datetimes

The most basic way to create datetimes is from strings in ISO 8601 date or datetime format. It is also possible to create datetimes from an integer by offset relative to the Unix epoch (00:00:00 UTC on 1 January 1970). The unit for internal storage is automatically selected from the form of the string, and can be either a date unit or a time unit. The date units are years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’), while the time units are hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’), and some additional SI-prefix seconds-based units. The datetime64 data type also accepts the string “NAT”, in any combination of lowercase/uppercase letters, for a “Not A Time” value.

Example

A simple ISO date:

>>> np.datetime64('2005-02-25')
numpy.datetime64('2005-02-25')

From an integer and a date unit, 1 year since the UNIX epoch:

>>> np.datetime64(1, 'Y')
numpy.datetime64('1971')

Using months for the unit:

>>> np.datetime64('2005-02')
numpy.datetime64('2005-02')

Specifying just the month, but forcing a ‘days’ unit:

>>> np.datetime64('2005-02', 'D')
numpy.datetime64('2005-02-01')

From a date and time:

>>> np.datetime64('2005-02-25T03:30')
numpy.datetime64('2005-02-25T03:30')

NAT (not a time):

>>> np.datetime64('nat')
numpy.datetime64('NaT')
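
To see how the storage unit is selected from the form of the string, the dtype of each value can be inspected. This is a small sketch that reuses the dates from the examples above.

```python
import numpy as np

day = np.datetime64('2005-02-25')
month = np.datetime64('2005-02')
minute = np.datetime64('2005-02-25T03:30')

# The unit appears in square brackets in the dtype
print(day.dtype)     # datetime64[D]
print(month.dtype)   # datetime64[M]
print(minute.dtype)  # datetime64[m]

# Forcing a 'days' unit on a month string gives the first day of that month
forced = np.datetime64('2005-02', 'D')
print(forced == np.datetime64('2005-02-01'))  # True

# An integer offset of 1 year from the Unix epoch is the year 1971
one_year = np.datetime64(1, 'Y')
print(one_year == np.datetime64('1971'))  # True

# NaT values are detected with np.isnat
nat_value = np.datetime64('nat')
print(np.isnat(nat_value))  # True
```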
