Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.
Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
Our first seaborn plot
Here’s an example of what seaborn can do:
# Import seaborn
import seaborn as sns
# Apply the default theme
sns.set_theme()
# Load an example dataset
tips = sns.load_dataset("tips")
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
A few things have happened here. Let’s go through them one by one:
# Import seaborn
import seaborn as sns
Seaborn is the only library we need to import for this simple example. By convention, it is imported with the shorthand sns.
Behind the scenes, seaborn uses matplotlib to draw its plots. For interactive work, it’s recommended to use a Jupyter/IPython interface in matplotlib mode, or else you’ll have to call matplotlib.pyplot.show() when you want to see the plot.
# Apply the default theme
sns.set_theme()
This uses the matplotlib rcParam system and will affect how all matplotlib plots look, even if you don’t make them with seaborn. Beyond the default theme, there are several other options, and you can independently control the style and scaling of the plot to quickly translate your work between presentation contexts (e.g., making a version of your figure that will have readable fonts when projected during a talk). If you like the matplotlib defaults or prefer a different theme, you can skip this step and still use the seaborn plotting functions.
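A minimal sketch of picking a non-default theme: it applies the whitegrid style with the larger talk context (both documented options, chosen here arbitrarily), reads back two of the matplotlib rcParams that set_theme() writes, and then restores the default theme.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import seaborn as sns

# Apply a grid style together with the larger "talk" scaling
sns.set_theme(style="whitegrid", context="talk")
grid_on = plt.rcParams["axes.grid"]    # whitegrid enables the axes grid
talk_font = plt.rcParams["font.size"]  # talk scales fonts up

# Calling set_theme() with no arguments restores the default theme
sns.set_theme()
notebook_font = plt.rcParams["font.size"]

print(grid_on, talk_font > notebook_font)  # True True
```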
# Load an example dataset
tips = sns.load_dataset("tips")
Most code in the docs will use the load_dataset() function to get quick access to an example dataset. There’s nothing special about these datasets: they are just pandas dataframes, and we could have loaded them with pandas.read_csv() or built them by hand. Most of the examples in the documentation will specify data using pandas dataframes, but seaborn is very flexible about the data structures that it accepts.
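To illustrate that flexibility, this sketch passes the same values to scatterplot() twice: once as plain NumPy arrays and once as columns of a DataFrame built by hand. The numbers are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical values, used only to show two input formats
bills = np.array([12.5, 20.1, 8.3, 31.0, 16.7])
tip_values = np.array([2.0, 3.5, 1.2, 5.0, 2.8])

fig, (ax1, ax2) = plt.subplots(1, 2)

# 1) Plain NumPy arrays passed directly as x and y
sns.scatterplot(x=bills, y=tip_values, ax=ax1)

# 2) The same values in a hand-built DataFrame, referenced by column name
df = pd.DataFrame({"total_bill": bills, "tip": tip_values})
sns.scatterplot(data=df, x="total_bill", y="tip", ax=ax2)

# Each call draws one PathCollection holding the five points
n1 = len(ax1.collections[0].get_offsets())
n2 = len(ax2.collections[0].get_offsets())
print(n1, n2)  # 5 5
```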
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)
This plot shows the relationship between five variables in the tips dataset using a single call to the seaborn function relplot(). Notice how we provided only the names of the variables and their roles in the plot. Unlike when using matplotlib directly, it wasn’t necessary to specify attributes of the plot elements in terms of the color values or marker codes. Behind the scenes, seaborn handled the translation from values in the dataframe to arguments that matplotlib understands. This declarative approach lets you stay focused on the questions that you want to answer, rather than on the details of how to control matplotlib.
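Because relplot() is a figure-level function, it returns a FacetGrid, and col="time" gives one subplot per value of time. The sketch below checks that on a small hand-made DataFrame whose column names mirror tips; the values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up table with the same column names as "tips"
toy = pd.DataFrame({
    "total_bill": [10.5, 21.0, 15.2, 30.1, 8.7, 25.4],
    "tip": [1.5, 3.0, 2.0, 5.0, 1.0, 4.0],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
    "smoker": ["No", "Yes", "No", "No", "Yes", "Yes"],
    "size": [2, 3, 2, 4, 1, 3],
})

# relplot returns a FacetGrid; col="time" creates one subplot per time value
g = sns.relplot(data=toy, x="total_bill", y="tip", col="time",
                hue="smoker", style="smoker", size="size")
print(g.axes.shape)  # (1, 2): one row, one column per time value
```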
Seaborn | Style And Color
Seaborn is a statistical plotting library in Python with attractive default styles. This article covers the ways of styling the different kinds of plots in seaborn.
Seaborn Figure Styles
The figure style affects things like the color of the axes background, whether a grid is enabled by default, and other aesthetic elements.
The built-in styles are as follows:
- white
- dark
- whitegrid
- darkgrid
- ticks
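A compact way to compare these five styles is to draw the same countplot() under each one. This sketch uses axes_style() as a context manager so each style applies only inside its with block; the toy column is a hypothetical stand-in for tips['sex'].

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the "sex" column of the tips dataset
toy = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female"]})

styles = ["white", "dark", "whitegrid", "darkgrid", "ticks"]
facecolors = {}
for style in styles:
    # The style only applies to Axes created inside this block
    with sns.axes_style(style):
        fig, ax = plt.subplots(figsize=(3, 2))
        sns.countplot(data=toy, x="sex", ax=ax)
        ax.set_title(style)
    facecolors[style] = ax.get_facecolor()  # RGBA background of the Axes
    plt.close(fig)

# white and darkgrid use different background colors
print(facecolors["white"] != facecolors["darkgrid"])  # True
```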
Set the background to be white:
The examples below draw a countplot of the tips dataset, one of seaborn's example datasets. The load_dataset() function loads the dataset, and set_style() sets the plot style.
import seaborn as sns
import matplotlib.pyplot as plt
# load the tips dataset present by default in seaborn
tips = sns.load_dataset('tips')
sns.set_style('white')
# make a countplot
sns.countplot(x='sex', data=tips)
Output:
Set the background to ticks:
With set_style(‘ticks’), the background stays white and tick marks appear on the x and y axes. The palette parameter sets the colors of the bars, which helps distinguish between groups of data.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
sns.set_style('ticks')
sns.countplot(x='sex', data=tips, palette='deep')
Output:
Set the background to be darkgrid:
With set_style(‘darkgrid’), the plot gets a grey background with white grid lines, which makes it easier to read values off the axes.
import seaborn as sns
import matplotlib.pyplot as plt
# load the tips dataset present by default in seaborn
tips = sns.load_dataset('tips')
sns.set_style('darkgrid')
# make a countplot
sns.countplot(x='sex', data=tips)
Output:
Set the background to be Whitegrid:
With set_style(‘whitegrid’), the plot keeps a white background and adds grey grid lines.
import seaborn as sns
import matplotlib.pyplot as plt
# load the tips dataset present by default in seaborn
tips = sns.load_dataset('tips')
sns.set_style('whitegrid')
# make a countplot
sns.countplot(x='sex', data=tips)
Output:
Removing Axes Spines
The despine() function removes the top and right spines of the plot by default. sns.despine(left=True) also removes the left spine.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
sns.countplot(x='sex', data=tips)
sns.despine()
Output
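The sketch below checks what despine() does to each spine when left=True is added; the toy data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the "sex" column of the tips dataset
toy = pd.DataFrame({"sex": ["Male", "Female", "Male", "Female", "Male"]})

fig, ax = plt.subplots()
sns.countplot(data=toy, x="sex", ax=ax)

# Default: hide top and right; left=True also hides the left spine
sns.despine(ax=ax, left=True)

visible = {side: ax.spines[side].get_visible()
           for side in ["top", "right", "left", "bottom"]}
print(visible)  # {'top': False, 'right': False, 'left': False, 'bottom': True}
```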
Size and aspect
Non-grid plot: matplotlib's figure() function creates a new figure, and its figsize parameter sets the figure's width and height in inches.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
plt.figure(figsize=(12, 3))
sns.countplot(x='sex', data=tips)
Output:
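The same figure can also be created together with its Axes through plt.subplots(); passing that Axes via ax= tells the axes-level countplot() where to draw. The toy column below is a hypothetical stand-in for tips['sex'].

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the "sex" column of the tips dataset
toy = pd.DataFrame({"sex": ["Male", "Female", "Male", "Female", "Male"]})

# Create the figure and Axes together; figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(12, 3))

# Axes-level functions accept ax= and return the Axes they drew on
returned_ax = sns.countplot(data=toy, x="sex", ax=ax)

print(fig.get_size_inches())  # [12.  3.]
```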
Grid-type plot: this example shows a regression plot of tip against total_bill from the dataset. lmplot stands for linear model plot and is used to create a regression plot. x='total_bill' puts total_bill on the x axis, and y='tip' puts tip on the y axis. height=2 sets the height of each facet in inches (older seaborn versions called this parameter size). aspect sets the width as a multiple of the height, so the height stays constant while the width changes.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
# height is the facet height in inches; width = aspect * height
sns.lmplot(x='total_bill', y='tip', height=2, aspect=4, data=tips)
Output:
Scale and Context
set_context() allows us to override default parameters. This affects things like the size of the labels, lines, and other elements of the plot, but not the overall style.
The available contexts are:
- poster
- paper
- notebook
- talk
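plotting_context() returns the parameters that each context would apply, which makes the scaling easy to compare. A small sketch comparing the base font size across the four contexts, then applying one globally with set_context() (font_scale=1.2 is an arbitrary example value):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]
# plotting_context(name) returns the parameters set_context(name) would apply
font_sizes = {c: sns.plotting_context(c)["font.size"] for c in contexts}
print(font_sizes)

# Apply one context globally; font_scale multiplies the font sizes further
sns.set_context("talk", font_scale=1.2)
```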
Seaborn | Distribution Plots
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. This article deals with the distribution plots in seaborn, which are used for examining univariate and bivariate distributions. In this article we will be discussing 4 types of distribution plots, namely:
- jointplot
- distplot
- pairplot
- rugplot
Besides providing different kinds of visualization plots, seaborn also contains some built-in datasets. We will be using the tips dataset in this article. The “tips” dataset contains information about people who probably had food at a restaurant and whether or not they left a tip, their gender, whether they smoke, and so on. Let's have a look at it.
Code :
# import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter-only magic that displays plots inline in the notebook
%matplotlib inline
# to ignore the warnings
from warnings import filterwarnings
filterwarnings('ignore')
# load the dataset
df = sns.load_dataset('tips')
# the first five entries of the dataset
df.head()
Now, let's proceed to the plots.
Distplot
distplot() is used for a univariate set of observations and visualizes it as a histogram. It takes only one variable, so we choose one particular column of the dataset. Note that distplot() is deprecated as of seaborn 0.11; histplot() and displot() replace it.
Syntax:
distplot(a[, bins, hist, kde, rug, fit, ...])
Example:
# set the background style of the plot
sns.set_style('whitegrid')
sns.distplot(df['total_bill'], kde=False, color='red', bins=30)
Output:
Explanation:
- kde=False turns off the Kernel Density Estimation (KDE) curve that distplot() otherwise draws over the histogram. KDE is also available as a separate kind of plot in seaborn.
- bins sets the number of bins in the histogram; a good value depends on your dataset.
- color specifies the color of the plot.
Looking at this plot, we can see that most of the total bills lie between 10 and 20.
Jointplot
jointplot() draws a bivariate plot of two variables together with the univariate distribution of each variable in the margins, so it combines two kinds of plot in one figure.
Syntax:
jointplot(data=None, *, x=None, y=None, hue=None, kind='scatter', ...)
Example:
sns.jointplot(x='total_bill', y='tip', data=df)
Output:
sns.jointplot(x='total_bill', y='tip', data=df, kind='kde')
# the KDE contours show where the points are most densely concentrated
Explanation:
- kind controls how the data is visualised in the joint plot. The default is scatter; it can also be hex, reg (regression) or kde, among others.
- x and y are strings naming columns of the DataFrame passed with the data parameter.
- Here we can see total bill on the x axis and tips on the y axis, with a roughly linear relationship between the two: larger total bills tend to come with larger tips.
Pairplot
- hue sets up the categorical separation between the entries of the dataset.
- palette sets the colors used for the hue categories.
Seaborn | Categorical Plots
Plots are basically used for visualizing the relationship between variables. Those variables can be either completely numerical or a category like a group, class or division. This article deals with categorical variables and how they can be visualized using the Seaborn library in Python.
Seaborn besides being a statistical plotting library also provides some default datasets. We will be using one such default dataset called ‘tips’. The ‘tips’ dataset contains information about people who probably had food at a restaurant and whether or not they left a tip for the waiters, their gender, whether they smoke and so on.
Let us have a look at the tips dataset.
Code
# import the seaborn library
import seaborn as sns
# import done to avoid warnings
from warnings import filterwarnings
filterwarnings('ignore')
# reading the dataset
df = sns.load_dataset('tips')
# first five entries of the dataset
df.head()
Now let's proceed to the plots and see how we can visualize these categorical variables.
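As a small preview, this sketch draws two common categorical plots, barplot() and boxplot(), side by side. The toy table borrows the day and total_bill column names from tips, but the values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical values using two of the tips column names
toy = pd.DataFrame({
    "day": ["Thur", "Fri", "Thur", "Fri", "Thur", "Fri"],
    "total_bill": [12.0, 18.5, 15.2, 22.0, 9.8, 25.1],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
# barplot: one bar per category; the bar height is the mean total_bill
sns.barplot(data=toy, x="day", y="total_bill", ax=ax1)
# boxplot: median, quartiles and whiskers of total_bill per category
sns.boxplot(data=toy, x="day", y="total_bill", ax=ax2)

heights = sorted(p.get_height() for p in ax1.patches)
print(heights)  # means: about 12.33 (Thur) and 21.87 (Fri)
```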
Seaborn | Regression Plots
The regression plots in seaborn are primarily intended to add a visual guide that helps emphasize patterns in a dataset during exploratory data analysis. As the name suggests, regression plots draw a regression line between two variables and help visualize their linear relationship. This article deals with those kinds of plots in seaborn and shows how to change the size, aspect ratio, etc. of such plots.
Seaborn is not only a visualization library but also a provider of built-in datasets. Here, we will be working with one of such datasets in seaborn named ‘tips’. The tips dataset contains information about the people who probably had food at the restaurant and whether or not they left a tip. It also provides information about the gender of the people, whether they smoke, day, time and so on.
Let us have a look at the dataset first before we start with the regression plots.
Load the dataset
# import the library
import seaborn as sns
# load the dataset
dataset = sns.load_dataset('tips')
# the first five entries of the dataset
dataset.head()
Output
Now let us begin with the regression plots in seaborn.
Regression plots in seaborn can be easily implemented with the lmplot() function, which creates a linear model plot: a scatter plot with a simple linear regression fit drawn on top of it.
Simple linear plot
sns.set_style('whitegrid')
sns.lmplot(x='total_bill', y='tip', data=dataset)
Output
Explanation
The x and y parameters specify the columns for the x and y axes. sns.set_style('whitegrid') draws a grid on a white background. The data parameter specifies the DataFrame that supplies the values for the plot.
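lmplot() combines regplot() with a FacetGrid, so the fitted line can be inspected on a single Axes with regplot(). This sketch checks that the slope of the drawn line matches an ordinary least-squares fit from np.polyfit(); the data is random, and ci=None just skips the bootstrap band.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Random toy data with a roughly linear relationship (not the tips data)
rng = np.random.default_rng(3)
x = rng.uniform(5, 50, size=80)
y = 0.15 * x + rng.normal(0, 1, size=80)

# ci=None skips the bootstrapped confidence band
ax = sns.regplot(x=x, y=y, ci=None)

# The only Line2D on the Axes is the fitted regression line
line = ax.lines[0]
xs, ys = line.get_xdata(), line.get_ydata()
drawn_slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])

# Ordinary least-squares fit of a degree-1 polynomial for comparison
fitted_slope, fitted_intercept = np.polyfit(x, y, deg=1)
print(round(drawn_slope, 4), round(fitted_slope, 4))
```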
Linear plot with additional parameters
sns.set_style('whitegrid')
sns.lmplot(x='total_bill', y='tip', data=dataset,
           hue='sex', markers=['o', 'v'])
Output
Explanation
To make these plots more informative, we can specify hue to separate the data by a categorical variable, and use markers that come from the matplotlib marker symbols. Since sex has two categories, we pass markers a list of two symbols, one per category.
Setting the size and color of the plot
sns.set_style('whitegrid')
sns.lmplot(x='total_bill', y='tip', data=dataset, hue='sex',
           markers=['o', 'v'], scatter_kws={'s': 100},
           palette='plasma')
Output
Explanation
In this example, seaborn passes parameters through to matplotlib indirectly to change the scatter plots. The scatter_kws parameter is a dictionary of keyword arguments for the scatter points; here {'s': 100} sets the marker size. Note that scatter_kws changes only the scatter points, not the regression lines, which remain untouched. The palette parameter changes the colors of the plot. Everything else is the same as in the first example.
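scatter_kws has a counterpart, line_kws, that styles only the regression line. The sketch below (random toy data, arbitrary style values) sets both on regplot() and reads the settings back from the matplotlib artists.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Random toy data (not the tips data)
rng = np.random.default_rng(3)
x = rng.uniform(5, 50, size=80)
y = 0.15 * x + rng.normal(0, 1, size=80)

ax = sns.regplot(
    x=x, y=y, ci=None,
    scatter_kws={"s": 100},                       # marker size of the points only
    line_kws={"color": "black", "linewidth": 3},  # styles only the fitted line
)

points = ax.collections[0]  # PathCollection created by the scatter step
line = ax.lines[0]          # Line2D of the regression fit
print(points.get_sizes()[0], line.get_color(), line.get_linewidth())  # 100.0 black 3.0
```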
Displaying multiple plots
sns.lmplot(x='total_bill', y='tip', data=dataset,
           col='sex', row='time', hue='smoker')
Output
Explanation
In the above code, we draw multiple plots by specifying a separation with the help of the rows and columns. Each row contains the plots of tips vs the total bill for the different times specified in the dataset. Each column contains the plots of tips vs the total bill for the different genders. A further separation is done by specifying the hue parameter on the basis of whether the person smokes.
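With two levels each for col and row, lmplot() should produce a 2 x 2 grid of facets. A sketch on random toy data whose column names mirror tips:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data with tips-like column names (not the real dataset)
rng = np.random.default_rng(4)
n = 120
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "sex": rng.choice(["Male", "Female"], n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "smoker": rng.choice(["Yes", "No"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# col and row each have two levels, so the grid should be 2 x 2
g = sns.lmplot(data=df, x="total_bill", y="tip",
               col="sex", row="time", hue="smoker", ci=None)
print(g.axes.shape)  # (2, 2)
```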
Size and aspect ratio of the plots
sns.lmplot(x='total_bill', y='tip', data=dataset, col='sex',
           row='time', hue='smoker', aspect=0.6,
           height=4, palette='coolwarm')
Output
Explanation
When the output contains a large number of plots, we need to set the size and aspect of each facet in order to visualize them better.
height : scalar, specifies the height of each facet in inches (this parameter was called size in older seaborn versions).
aspect : scalar, optional, specifies the aspect ratio of each facet, so that “aspect * height” gives the width of each facet in inches.
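For figure-level functions, height and aspect set the size of each facet rather than of the whole figure. This sketch (random toy data, no hue so no legend widens the figure) checks that two columns with height=4 and aspect=0.6 give a figure of about 2 * 4 * 0.6 = 4.8 by 4 inches.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data with tips-like column names (not the real dataset)
rng = np.random.default_rng(5)
n = 60
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "sex": rng.choice(["Male", "Female"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# Each facet is 4 inches tall and 0.6 * 4 = 2.4 inches wide
g = sns.lmplot(data=df, x="total_bill", y="tip", col="sex",
               height=4, aspect=0.6, ci=None)
width, height = g.figure.get_size_inches()
print(width, height)  # about 4.8 and 4.0 (two facets side by side)
```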
Grid Plot in Python using Seaborn
Grids are general types of plots that let you map plot types to the rows and columns of a grid, which helps you create similar plots separated by a variable. In this article, we will use two different datasets (Iris and Tips) to demonstrate grid plots.
Using Iris Dataset
We are going to use the Iris dataset, a very famous dataset available as one of seaborn's built-in datasets. It contains measurements of a set of iris flowers. The dataset has four measurements: sepal length, sepal width, petal length, and petal width. We will use the following code snippet for all the examples below.
Code Snippet:
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
iris.head()
# YOUR CODE HERE
Output:
Note: Place the code snippets below in place of YOUR CODE HERE.
Pair Plot
A pair plot is like an automated joint plot for the entire dataset: it plots the pairwise relationships between the variables. The pairplot() function creates a grid of Axes such that each numeric variable is shared across a single row on the y-axis and a single column on the x-axis. The off-diagonal plots show the relationship between each pair of variables, and the diagonal plots show the distribution of each variable, as shown below.
sns.pairplot(data = iris)
We use ‘hue’ to give each species its own color in the plot, and ‘palette’ to customize the colors, as shown below.
sns.pairplot(iris, hue="species", palette="rainbow")
Pair Grid
We can customize a pair plot by using seaborn’s PairGrid class. PairGrid takes all the numerical columns and arranges them in a grid of subplots, as shown below.
sns.PairGrid(data = iris)
We can draw on the grid by calling its ‘.map’ method with a plotting function. Here we pass plt.scatter, so every subplot in the grid becomes a scatter plot. In general, the function passed to ‘.map’ determines the type of plot in the grid.
g = sns.PairGrid(iris)
g.map(plt.scatter)  # every subplot in the grid is a scatter plot
To draw on the upper grid, lower grid and diagonal separately, we call ‘.map_upper’, ‘.map_lower’ and ‘.map_diag’ on the PairGrid. Here the diagonal plots are histograms (‘hist’), the upper grid holds ‘scatter’ plots, and the lower grid holds ‘kde’ plots.
g = sns.PairGrid(iris)
g.map_diag(plt.hist)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot)
Using Tips Dataset
We are going to use another built-in dataset, tips. It has seven features: total bill, tip, sex, smoker, day, time, and size. We will use the following code snippet for all the examples below.
Code Snippet:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
tips.head()
# YOUR CODE HERE
Output:
Note: Place the code snippets below in place of YOUR CODE HERE.
FacetGrid
FacetGrid is a general way of creating plot grids based on a function. Its object uses the dataframe as input and the names of the variables that shape the row, column, or color dimensions of the grid.
Here, the smoker column has two values (Yes and No), so the grid has two rows: one for smoker = Yes and one for smoker = No. The time column has two values (Lunch and Dinner), so the grid has two columns: one for time = Lunch and one for time = Dinner.
g = sns.FacetGrid(tips, col="time", row="smoker")
The total bills are plotted as histograms across the grid using ‘map’.
g = sns.FacetGrid(tips, col="time", row="smoker")
g = g.map(plt.hist, "total_bill")
Here we set hue to sex and draw a scatter plot with total_bill on the x-axis and tip on the y-axis.
g = sns.FacetGrid(tips, col="time", row="smoker", hue='sex')
g = g.map(plt.scatter, "total_bill", "tip").add_legend()
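FacetGrid.map() passes plain data vectors to functions like plt.scatter; map_dataframe() instead passes each facet's subset of the DataFrame, which suits seaborn functions that take data= and column names. A sketch on random toy data mirroring the tips columns:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data with tips-like column names (not the real dataset)
rng = np.random.default_rng(7)
n = 120
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "smoker": rng.choice(["Yes", "No"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

g = sns.FacetGrid(df, col="time", row="smoker", hue="sex")
# map_dataframe passes data=<facet subset> plus the column names below
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.add_legend()
print(g.axes.shape)  # (2, 2)
```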
Joint Grid
JointGrid is the general, grid-based version of jointplot(). A seaborn jointplot shows the bivariate relationship between two variables in the central plot and the univariate distribution of each variable in the margins. jointplot() is a convenience wrapper around JointGrid.
g = sns.JointGrid(x="total_bill", y="tip", data=tips)
g = sns.JointGrid(x="total_bill", y="tip", data=tips)
g = g.plot(sns.regplot, sns.histplot)  # histplot replaces the deprecated distplot
ML | Matrix plots in Seaborn
Seaborn is a visualization library for Python. It offers many kinds of plots, including count plots, scatter plots, pair plots, regression plots, matrix plots and more. This article deals with the matrix plots in seaborn.
Example 1: Heatmaps
A heatmap is a way to display data in matrix form, so the data passed to it should already be a matrix: both the row index and the column names should label meaningful variables, so that each cell holds a value relevant to its row and column (for example, the correlation between two variables). Let's look at an example to understand this better.
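A sketch of getting data into matrix form: pivot_table() turns a long table into one row per day and one column per time, and heatmap() then colors and annotates each cell. The bill values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Long format: one row per (day, time) observation; values are hypothetical
long_df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
    "total_bill": [14.8, 17.2, 12.9, 19.6, 20.4, 24.1],
})

# Wide (matrix) format: index = day, columns = time, cells = mean total_bill
matrix = long_df.pivot_table(index="day", columns="time",
                             values="total_bill", aggfunc="mean")

# Each cell is colored by its value; annot=True writes the value in the cell
ax = sns.heatmap(matrix, annot=True, fmt=".1f", cmap="plasma",
                 linecolor="black", linewidths=1)
print(matrix.shape, len(ax.texts))  # (3, 2) 6
```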
Code : Python program
# import the necessary libraries
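import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter-only magic that displays plots inline in the notebook
%matplotlib inline
# load the tips dataset
dataset = sns.load_dataset('tips')
# first five entries of the tips dataset
dataset.head()
# correlation between the numeric columns
# (numeric_only=True skips the categorical columns; pandas 2.x raises an error otherwise)
tc = dataset.corr(numeric_only=True)
# plot a heatmap of the correlated data
sns.heatmap(tc)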
Output: the first five entries of the dataset, the correlation matrix, and the heatmap of the correlation matrix.
To obtain a better visualization with the heatmap, we can add parameters such as annot, linewidths and linecolor.
# import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter-only magic that displays plots inline in the notebook
%matplotlib inline
# load the tips dataset
dataset = sns.load_dataset('tips')
# first five entries of the tips dataset
dataset.head()
# correlation between the numeric columns
tc = dataset.corr(numeric_only=True)
sns.heatmap(tc, annot=True, cmap='plasma',
            linecolor='black', linewidths=1)
Explanation
- annot writes the actual value of each cell inside it.
- cmap sets the colormap, such as coolwarm, plasma or magma.
- linewidths sets the width of the lines separating the cells.
- linecolor sets the color of the lines separating the cells.
Here is a plot that shows those attributes.
So all a heatmap does is color the cells according to their values on a color gradient, with parameters like these to make the visualization easier to read.
Example 2: Cluster maps
Cluster maps use hierarchical clustering: they cluster the rows and columns based on their similarity and reorder them so that similar rows and columns are placed next to each other.
# import the necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter-only magic that displays plots inline in the notebook
%matplotlib inline
# load the flights dataset
fd = sns.load_dataset('flights')
# make a dataframe of the data
df = pd.pivot_table(values='passengers', index='month',
                    columns='year', data=fd)
# first five entries of the dataset
df.head()
# make a clustermap from the dataset (clustermap requires scipy)
sns.clustermap(df, cmap='plasma')
Output: the first five entries of the dataset, the matrix created with the pivot table (first five entries), and the clustermap of the data.
We can also standardize the data along the rows or columns with the standard_scale parameter, which also changes the scale of the color bar.
# import the necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter-only magic that displays plots inline in the notebook
%matplotlib inline
# load the flights dataset
fd = sns.load_dataset('flights')
# make a dataframe of the data
df = pd.pivot_table(values='passengers',
                    index='month', columns='year', data=fd)
# first five entries of the dataset
df.head()
# make a clustermap from the dataset
sns.clustermap(df, cmap='plasma', standard_scale=1)
Clustermap after using standard scaling
standard_scale=1 rescales each column (each year) to the range 0 to 1 by subtracting the column minimum and dividing by the column range. We can see that the months as well as the years are no longer in order, because a clustermap reorders rows and columns according to their similarity.
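What standard_scale=1 does can be sketched by hand with pandas: subtract each column's minimum and divide by its range, so every column runs from 0 to 1. The small table below is made up; it is not the flights data.

```python
import pandas as pd

# Hypothetical counts: rows play the role of months, columns of years
df = pd.DataFrame(
    {1949: [10, 20, 40], 1950: [5, 15, 25]},
    index=["Jan", "Feb", "Mar"],
)

# Per column: subtract the minimum, then divide by the range (max - min)
scaled = (df - df.min()) / (df.max() - df.min())
print(scaled)
```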
So we can conclude that a heatmap will display things in the order we give whereas the cluster map clusters the data based on similarity.