sparse_matrix

# Sparse Matrix and its visualization¶

This is a matrix (2 dimensional array) in which very few elements are non-zero. So, most of the elements are zero. So, the name sparse (thinly dispersed or scattered)

On contrast, if most of the elements are non-zero, it is called dense matrix.

It is computationally very expensive to store all the data (lot of zeros and their locations in the matrix).
So, there are numerous methods to store only non-zero data

SciPy gives an ease way of defining and storing this matrices. Several ways of storing sparse matrix using SciPy

Sparsity is measured by

Sparsity = number of zero elements / total elements

Sparse matrix knowledge has applications in many fields.

In CFD, this sparsity comes from the stencil ( arrangement of a nodal group). We will deal with very large sparse matrix depending on the size of the grid. Technically speaking these are banded matrices (non-zero elements are confined to a diagonal band)

In Data Science, say example for a users watch list data contains zeroes for most of the movies those user has not watched.

### Defining¶

Say there s a data matrix of 1 million elements, but only 10,000 elements are non-zero.
Here we can save memory by just saving those 10,000 elements and their location by the index (row and column numbers)
So, we will be saving 30,000 elements instead of saving 1 million elements.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.sparse import bsr_matrix


One of the way of defining sparse matrix is Block Sparse Row matrix from SciPy

In [2]:
row = np.array([0, 0, 1, 2, 2])
col = np.array([0, 2, 2, 0, 1])
data = np.array([1, 2, 3 ,4, 5])

a = bsr_matrix((data, (row, col)), shape=(3, 3)).toarray()


This is a 3 by 3 matrix, but has six elements non-zero. Their row and column index are defined.

### Visualization¶

In [3]:
a

Out[3]:
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 0]], dtype=int64)
In [4]:
plt.spy(a)
plt.show()


#### Another large sparse matrix¶

Don't worry about the below code, just generating random data for the indices and data points.
Just a quick code, not an efficient one.

In [5]:
def generate_random():
random_array = []
run = True
while run:
random_no = np.random.randint(1000)  #max data point value
if random_no not in random_array:
random_array.append(random_no)
if len(random_array)> 500: # max length of the array size
run = False
return random_array

def generate_matrix():
indices_1 = np.array(generate_random())
indices_2 = np.array(generate_random())
if max(indices_1)>max(indices_2):
row = indices_1
col = indices_2
else:
row = indices_2
col = indices_1

return (row,col)

data = np.array(generate_random())
row,col = generate_matrix()

s=max(row)+1

a=bsr_matrix((data, (row, col)), shape=(s,s)).toarray()

plt.spy(a, markersize=3)
plt.show()

In [6]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))