Introduction to Seaborn for Data Visualization
Learn Seaborn step by step with outputs

🧠 What is Seaborn?
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics. Seaborn helps you explore and understand your data through visualizations that are both aesthetically pleasing and statistically meaningful.
While Matplotlib gives you full control over your visualizations, Seaborn provides a simpler, more intuitive interface with beautiful default styles and color palettes.
🎯 Key Features of Seaborn
- Built-in themes and color palettes: Seaborn comes with attractive default styles that make your visualizations look professional with minimal effort.
- Statistical estimation: Many Seaborn functions automatically perform statistical estimations and plot the results.
- Dataset-oriented API: Functions work directly with Pandas DataFrames, making it easy to visualize your data.
- Complex visualizations simplified: Create complex visualizations like heatmaps, pair plots, and categorical plots with just a few lines of code.
- Integration with Matplotlib: Seaborn works seamlessly with Matplotlib, allowing you to customize your plots further when needed.
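As a rough illustration of the statistical estimation and Matplotlib integration points above, the sketch below plots a small made-up DataFrame. The names `scores`, `group`, and `score` are hypothetical and chosen only for this example. It assumes Seaborn, pandas, and Matplotlib are already installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: three groups with three scores each
scores = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

fig, ax = plt.subplots()
# barplot aggregates each group (the mean by default) and adds an error bar
sns.barplot(data=scores, x="group", y="score", ax=ax)

# The plot lives on a regular Matplotlib Axes, so Matplotlib calls still apply
ax.set_title("Mean score per group")
ax.set_ylabel("Mean score")

# Compare the drawn bar heights with the means computed by pandas
bar_heights = sorted(patch.get_height() for patch in ax.patches)
group_means = sorted(scores.groupby("group")["score"].mean())
print(bar_heights)
print(group_means)
```

The two printed lists should agree, which shows that the bars are an estimate computed from the data rather than the raw values.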
1. Installing Seaborn
You can install Seaborn using pip:
pip install seaborn
2. Importing Seaborn
First, let's import the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Set the style
sns.set_theme(style="whitegrid")  # Available styles: darkgrid, whitegrid, dark, white, ticks
3. Working with Built-in Datasets
Seaborn comes with several built-in datasets that are great for learning and experimentation. Let's start by exploring the iris dataset:
import seaborn as sns
# Load the iris dataset
iris = sns.load_dataset("iris")
# Display the first 5 rows
print(iris.head())
Output:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
3.1 Basic Scatter Plot with Iris Dataset
import seaborn as sns
import matplotlib.pyplot as plt
# Load the iris dataset
iris = sns.load_dataset("iris")
# Create a scatter plot
sns.scatterplot(x="sepal_length", y="sepal_width", data=iris)
plt.title("Scatter Plot of Sepal Length vs Width")
plt.show()
3.2 Line Plot with Iris Dataset
import seaborn as sns
import matplotlib.pyplot as plt
# Load the iris dataset
iris = sns.load_dataset("iris")
# Create a line plot
sns.lineplot(x="sepal_length", y="sepal_width", data=iris)
plt.title("Line Plot of Sepal Length vs Width")
plt.show()
3.3 Bar Plot with Iris Dataset
import seaborn as sns
import matplotlib.pyplot as plt
# Load the iris dataset
iris = sns.load_dataset("iris")
# Create a bar plot
sns.barplot(x="species", y="petal_length", data=iris)
plt.title("Average Petal Length by Species")
plt.show()
3.4 Correlation Heatmap with Iris Dataset
import seaborn as sns
import matplotlib.pyplot as plt
# Load the iris dataset
iris = sns.load_dataset("iris")
# Create a correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(iris.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Matrix of Iris Features")
plt.tight_layout()
plt.show()
4. More Advanced Line Plot
Let's explore a more complex line plot using Seaborn's lineplot() function:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create some data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create a DataFrame
df = pd.DataFrame({'x': x, 'y': y})
# Create a line plot
plt.figure(figsize=(10, 6))
sns.lineplot(x='x', y='y', data=df)
plt.title('Sine Wave')
plt.xlabel('X value')
plt.ylabel('Sine of X')
plt.show()
Explanation
- sns.lineplot() creates a line plot from the data.
- The x and y parameters specify which columns to use for the x and y axes.
- The data parameter specifies the DataFrame containing the data.
- When several observations share the same x value, Seaborn plots their mean and shades a confidence interval around the line. In this example every x value is unique, so no shaded band appears.
4. Scatter Plot with Seaborn
Scatter plots are great for showing the relationship between two variables:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create some random data
np.random.seed(42)
n = 100
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)
# Create a DataFrame
df = pd.DataFrame({'x': x, 'y': y})
# Create a scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', data=df)
plt.title('Scatter Plot Example')
plt.xlabel('X variable')
plt.ylabel('Y variable')
plt.show()
5. Adding a Third Variable with Hue
One of Seaborn's strengths is the ability to easily visualize multiple dimensions of data. Let's add a categorical variable using the hue parameter:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create data with a categorical variable
np.random.seed(42)
n = 100
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)
categories = np.random.choice(['A', 'B', 'C'], size=n)
# Create a DataFrame
df = pd.DataFrame({
    'x': x,
    'y': y,
    'category': categories
})
# Create a scatter plot with hue
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', hue='category', data=df)
plt.title('Scatter Plot with Categories')
plt.xlabel('X variable')
plt.ylabel('Y variable')
plt.show()
Explanation
- The hue parameter assigns different colors to different categories.
- Seaborn automatically creates a legend showing which color corresponds to which category.
- This allows you to visualize three dimensions of data in a 2D plot.
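If you want control over which color each category gets, one option is to pass an explicit mapping. The sketch below assumes the same kind of random data as above; the dictionary name `category_colors` and the chosen colors are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(42)
n = 100
df = pd.DataFrame({
    "x": np.random.normal(size=n),
    "category": np.random.choice(["A", "B", "C"], size=n),
})
df["y"] = 2 * df["x"] + np.random.normal(size=n)

# Hypothetical color choice: map each category to a named Matplotlib color
category_colors = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    data=df, x="x", y="y",
    hue="category",
    hue_order=["A", "B", "C"],   # fixes the legend order
    palette=category_colors,     # fixes the color of each category
    ax=ax,
)

# Read the legend entries back from the Axes
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Fixing hue_order and palette keeps colors consistent when the same categories appear in several plots.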
Try This Code Yourself & See the Result
Add Size Parameter to Scatter Plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create data with categorical variable and size variable
np.random.seed(42)
n = 100
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)
categories = np.random.choice(['A', 'B', 'C'], size=n)
sizes = np.random.randint(10, 100, size=n)
# Create a DataFrame
df = pd.DataFrame({
    'x': x,
    'y': y,
    'category': categories,
    'size': sizes
})
# Create a scatter plot with hue and size
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', hue='category', size='size', data=df)
plt.title('Scatter Plot with Categories and Sizes')
plt.xlabel('X variable')
plt.ylabel('Y variable')
plt.show()
6. Distribution Plots
Seaborn excels at visualizing distributions. Let's look at some examples:
6.1 Histogram (histplot, formerly distplot)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Generate random data
np.random.seed(42)
data = np.random.normal(size=1000)
# Create a histogram
plt.figure(figsize=(10, 6))
sns.histplot(data, kde=True, bins=30)
plt.title('Histogram with KDE')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
Explanation
- sns.histplot() creates a histogram of the data.
- The kde=True parameter adds a Kernel Density Estimate curve, which is a smoothed version of the histogram.
- The bins parameter controls how many bins to use for the histogram.
Try This Code Yourself & See the Result
Create a Histogram with Multiple Distributions:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Generate two different distributions
np.random.seed(42)
group1 = np.random.normal(0, 1, 1000)
group2 = np.random.normal(2, 1.5, 1000)
# Combine into a DataFrame
df = pd.DataFrame({
    'value': np.concatenate([group1, group2]),
    'group': np.repeat(['Group 1', 'Group 2'], 1000)
})
# Create a histogram with multiple distributions
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='value', hue='group', kde=True, alpha=0.5)
plt.title('Histogram of Two Distributions')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
6.2 Box Plot
Box plots show the distribution of a variable across different categories:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create data for different categories
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = []
for i, cat in enumerate(categories):
    # Create different distributions for each category
    values = np.random.normal(loc=i*2, scale=0.5+i*0.2, size=100)
    for val in values:
        data.append({'category': cat, 'value': val})
# Convert to DataFrame
df = pd.DataFrame(data)
# Create a box plot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Box Plot by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
Explanation
- The box shows the quartiles of the dataset (25th, 50th/median, and 75th percentiles).
- By default, the whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box; points beyond that are plotted individually as outliers.
- This is useful for comparing distributions across categories and identifying outliers.
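The numbers behind a box plot can be reproduced with pandas. The sketch below is a hand-rolled approximation that assumes the default whisker rule of 1.5 times the interquartile range; the helper name `box_stats` is hypothetical and is not a Seaborn function.

```python
import numpy as np
import pandas as pd

# Same kind of data as the box plot example: four categories, 100 values each
np.random.seed(42)
frames = []
for i, cat in enumerate(["A", "B", "C", "D"]):
    values = np.random.normal(loc=i * 2, scale=0.5 + i * 0.2, size=100)
    frames.append(pd.DataFrame({"category": cat, "value": values}))
df = pd.concat(frames, ignore_index=True)


def box_stats(values: pd.Series) -> pd.Series:
    """Hypothetical helper: quartiles, whisker ends, and outlier count."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return pd.Series({
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "n_outliers": len(values) - len(inside),
    })


summary = df.groupby("category")["value"].apply(box_stats).unstack()
print(summary.round(2))
```

Each row of the summary corresponds to one box: the box spans q1 to q3, the line inside it is the median, and any values counted in n_outliers would be drawn as individual points.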
Try This Code Yourself & See the Result
Create a Box Plot with Notches:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create data for different categories
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = []
for i, cat in enumerate(categories):
    # Create different distributions for each category
    values = np.random.normal(loc=i*2, scale=0.5+i*0.2, size=100)
    for val in values:
        data.append({'category': cat, 'value': val})
# Convert to DataFrame
df = pd.DataFrame(data)
# Create a box plot with notches
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df, notch=True)
plt.title('Box Plot with Notches')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
6.3 Violin Plot
Violin plots are similar to box plots but show the full distribution:
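import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Recreate the same data as in the box plot example
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = []
for i, cat in enumerate(categories):
    values = np.random.normal(loc=i*2, scale=0.5+i*0.2, size=100)
    for val in values:
        data.append({'category': cat, 'value': val})
df = pd.DataFrame(data)
# Create a violin plot
plt.figure(figsize=(10, 6))
sns.violinplot(x='category', y='value', data=df)
plt.title('Violin Plot by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()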
Explanation
- Violin plots combine aspects of box plots and KDE plots.
- The width of the "violin" at each point represents the density of the data at that value.
- This gives you more information about the distribution shape than a box plot.
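Because the violin outline is a smoothed density estimate, it can extend past the smallest and largest observed values. The sketch below compares the default with cut=0, a violinplot() option that limits each violin to the observed range. The helper `violin_extent` is hypothetical and only reads back what was drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same kind of data as the box plot example
np.random.seed(42)
frames = [
    pd.DataFrame({
        "category": cat,
        "value": np.random.normal(loc=i * 2, scale=0.5 + i * 0.2, size=100),
    })
    for i, cat in enumerate("ABCD")
]
df = pd.concat(frames, ignore_index=True)

fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
# inner=None draws only the density outline, without the box inside
sns.violinplot(data=df, x="category", y="value", inner=None, ax=axes[0])
axes[0].set_title("Default: density extends past the data")
sns.violinplot(data=df, x="category", y="value", inner=None, cut=0, ax=axes[1])
axes[1].set_title("cut=0: density stops at the data range")


def violin_extent(ax):
    """Hypothetical helper: lowest and highest y value of the drawn violins."""
    ys = np.concatenate([
        path.vertices[:, 1]
        for coll in ax.collections
        for path in coll.get_paths()
    ])
    return ys.min(), ys.max()


default_extent = violin_extent(axes[0])
clipped_extent = violin_extent(axes[1])
print(default_extent, clipped_extent, (df["value"].min(), df["value"].max()))
```

Using cut=0 can be a reasonable choice when values outside the observed range would be misleading, for example with quantities that cannot be negative.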
Try This Code Yourself & See the Result
Create a Split Violin Plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the tips dataset
tips = sns.load_dataset('tips')
# Create a split violin plot
plt.figure(figsize=(12, 8))
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, split=True, palette='pastel')
plt.title('Split Violin Plot by Day and Gender')
plt.xlabel('Day of the Week')
plt.ylabel('Total Bill Amount')
plt.show()
7. Heatmap
Heatmaps are great for visualizing matrices of data, such as correlation matrices:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create a correlation matrix
np.random.seed(42)
data = np.random.randn(100, 5)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])
corr = df.corr()
# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix Heatmap')
plt.show()
Explanation
- annot=True adds the numerical values to each cell.
- cmap='coolwarm' sets the color map (blue for negative, red for positive).
- vmin=-1, vmax=1 sets the range for the color mapping (correlation values range from -1 to 1).
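One way to confirm what these parameters do is to inspect the objects that heatmap() adds to the Axes. The sketch below assumes the same random correlation matrix as above and adds fmt=".2f" so every annotation has two decimal places.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(42)
df = pd.DataFrame(np.random.randn(100, 5), columns=list("ABCDE"))
corr = df.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)

# The colored grid is the first collection on the Axes
mesh = ax.collections[0]
color_limits = mesh.get_clim()                    # range used for the color mapping
cell_labels = [t.get_text() for t in ax.texts]    # one annotation per cell, row by row

print(color_limits)
print(len(cell_labels), cell_labels[0])
```

Fixing vmin and vmax means the same color always stands for the same correlation value, which makes heatmaps of different datasets comparable.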
Try This Code Yourself & See the Result
Create a Customized Heatmap:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the iris dataset
iris = sns.load_dataset('iris')
# Calculate correlation matrix
corr = iris.drop('species', axis=1).corr()
# Create a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Create a customized heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr,
            mask=mask,  # Only show the lower triangle
            annot=True,
            fmt='.2f',  # Format annotations to 2 decimal places
            cmap='viridis',
            linewidths=0.5,
            cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix Heatmap (Lower Triangle)')
plt.tight_layout()
plt.show()
8. Pair Plot
Pair plots show pairwise relationships between variables in a dataset:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the iris dataset as an example
iris = sns.load_dataset('iris')
# Create a pair plot
sns.pairplot(iris, hue='species')
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()
Explanation
- Pair plots create a grid of relationships between each pair of variables.
- The diagonal shows the distribution of each variable.
- The off-diagonal cells show scatter plots of each pair of variables.
- The hue parameter colors the points by the specified categorical variable.
Try This Code Yourself & See the Result
Create a Customized Pair Plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the iris dataset
iris = sns.load_dataset('iris')
# Create a customized pair plot
g = sns.pairplot(
    iris,
    hue='species',
    diag_kind='kde',  # Use KDE plots on the diagonal
    plot_kws={'alpha': 0.6},  # Make scatter points semi-transparent
    diag_kws={'fill': True},  # Fill the KDE plots
    palette='viridis'  # Use the viridis color palette
)
# Add a title
g.fig.suptitle('Customized Pair Plot of Iris Dataset', y=1.02, fontsize=16)
# Adjust the layout
plt.tight_layout()
plt.show()
9. Categorical Plots
Seaborn provides specialized plots for categorical data:
9.1 Bar Plot
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create a DataFrame with categorical data
data = {
    'category': ['A', 'B', 'C', 'D', 'E'],
    'value': [5, 7, 3, 9, 4]
}
df = pd.DataFrame(data)
# Create a bar plot
plt.figure(figsize=(10, 6))
sns.barplot(x='category', y='value', data=df)
plt.title('Bar Plot Example')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
9.2 Count Plot
Count plots show the counts of observations in each categorical bin:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create data with repeated categories
data = {
    'category': np.random.choice(['A', 'B', 'C', 'D'], size=100)
}
df = pd.DataFrame(data)
# Create a count plot
plt.figure(figsize=(10, 6))
sns.countplot(x='category', data=df)
plt.title('Count Plot Example')
plt.xlabel('Category')
plt.ylabel('Count')
plt.show()
10. Customizing Seaborn Plots
Seaborn offers several ways to customize your plots:
10.1 Setting Themes
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Available themes: darkgrid, whitegrid, dark, white, ticks
sns.set_theme(style="darkgrid")
# Create a simple plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Plot with Dark Grid Theme')
plt.xlabel('X value')
plt.ylabel('Y value')
plt.show()
10.2 Using Color Palettes
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Set a color palette
# Available palettes: deep, muted, pastel, bright, dark, colorblind
sns.set_palette("pastel")
# Create a categorical plot
categories = ['A', 'B', 'C', 'D', 'E']
values = [5, 7, 3, 9, 4]
plt.figure(figsize=(10, 6))
sns.barplot(x=categories, y=values)
plt.title('Bar Plot with Pastel Palette')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
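Both sns.set_theme() and sns.set_palette() change the defaults for every later plot. If you only want to restyle a single figure, one possible approach is sketched below: sns.color_palette() returns the palette colors as a list, and sns.axes_style() can be used as a context manager. The variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import seaborn as sns

# A palette is a list of RGB tuples that you can index or slice
pastel = sns.color_palette("pastel")
first_three = sns.color_palette("pastel", 3)

facecolor_before = plt.rcParams["axes.facecolor"]

# The style applies only to figures created inside the with block
with sns.axes_style("darkgrid"):
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.barplot(x=["A", "B", "C"], y=[5, 7, 3], color=pastel[0], ax=ax)
    ax.set_title("Temporary darkgrid style, first pastel color")

# Global Matplotlib settings are unchanged after the block
facecolor_after = plt.rcParams["axes.facecolor"]

print(len(pastel), len(first_three))
print(facecolor_before == facecolor_after)
```

This keeps the rest of a notebook or script unaffected, which helps when different figures need different looks.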
Quiz Time
1. What is Seaborn?
2. Which function would you use to create a scatter plot in Seaborn?
3. What does the 'hue' parameter do in Seaborn plots?
4. Which Seaborn plot is best for visualizing the distribution of a single variable?
5. What is a violin plot in Seaborn?
6. Which Seaborn function would you use to create a correlation matrix heatmap?
7. What is a pair plot in Seaborn?
8. How do you change the style of Seaborn plots?
9. Which of these is NOT a built-in dataset in Seaborn?
10. What's the relationship between Matplotlib and Seaborn?
Seaborn Practice Questions
🧠 Beginner-Level Questions
- Create a simple line plot using Seaborn with random data.
- Load the 'tips' dataset from Seaborn and create a scatter plot of 'total_bill' vs 'tip'.
- Create a histogram of the 'tip' column from the tips dataset.
- Create a box plot showing the distribution of 'tip' by 'day' from the tips dataset.
- Create a violin plot showing the distribution of 'total_bill' by 'day' from the tips dataset.
- Create a count plot showing the frequency of each 'day' in the tips dataset.
- Create a heatmap of the correlation matrix for the iris dataset.
- Create a pair plot for the iris dataset with points colored by species.
- Create a bar plot showing the average 'tip' by 'day' from the tips dataset.
- Change the theme of your plots to 'darkgrid' and create any plot of your choice.
💡 Need help? See the GitHub page for answers.
Tip: Try running these examples in your Python editor or Google Colab.