Seaborn for Data Visualization

Introduction to Seaborn for Data Visualization

Learn Seaborn step by step with outputs

🧠 What is Seaborn?

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics. Seaborn helps you explore and understand your data through visualizations that are both aesthetically pleasing and statistically meaningful.

While Matplotlib gives you full control over your visualizations, Seaborn provides a simpler, more intuitive interface with beautiful default styles and color palettes.

🎯 Key Features of Seaborn

Built-in themes and color palettes: Seaborn comes with attractive default styles that make your visualizations look professional with minimal effort.
Statistical estimation: Many Seaborn functions automatically perform statistical estimations and plot the results.
Dataset-oriented API: Functions work directly with Pandas DataFrames, making it easy to visualize your data.
Complex visualizations simplified: Create complex visualizations like heatmaps, pair plots, and categorical plots with just a few lines of code.
Integration with Matplotlib: Seaborn works seamlessly with Matplotlib, allowing you to customize your plots further when needed.
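
As a minimal sketch of the Matplotlib integration mentioned in the last item (the data and labels are made up for illustration): axes-level Seaborn functions such as scatterplot() return a Matplotlib Axes object, so ordinary Matplotlib methods can be applied to the result.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=50), "y": rng.normal(size=50)})

# Axes-level Seaborn functions return the Matplotlib Axes they drew on
ax = sns.scatterplot(x="x", y="y", data=df)

# From here on, plain Matplotlib calls customize the plot
ax.set_title("Seaborn plot, Matplotlib customization")
ax.set_xlabel("X variable")
ax.axhline(0, color="gray", linestyle="--")  # horizontal reference line at y = 0

plt.show()
```

Keeping a reference to the returned Axes is often more convenient than calling plt.title() and similar functions, especially once a figure holds more than one plot.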

1. Installing Seaborn

You can install Seaborn using pip:

pip install seaborn

2. Importing Seaborn

First, let's import the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style
sns.set_theme(style="whitegrid")  # Other options: darkgrid, dark, white, ticks

3. Working with Built-in Datasets

Seaborn comes with several built-in datasets that are great for learning and experimentation. Let's start by exploring the iris dataset:

import seaborn as sns

# Load the iris dataset
iris = sns.load_dataset("iris")

# Display the first 5 rows
print(iris.head())

Output:

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

3.1 Basic Scatter Plot with Iris Dataset

import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a scatter plot
sns.scatterplot(x="sepal_length", y="sepal_width", data=iris)
plt.title("Scatter Plot of Sepal Length vs Width")
plt.show()

Output:

3.2 Line Plot with Iris Dataset

import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a line plot
sns.lineplot(x="sepal_length", y="sepal_width", data=iris)
plt.title("Line Plot of Sepal Length vs Width")
plt.show()

Output:

3.3 Bar Plot with Iris Dataset

import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a bar plot
sns.barplot(x="species", y="petal_length", data=iris)
plt.title("Average Petal Length by Species")
plt.show()

Output:

3.4 Correlation Heatmap with Iris Dataset

import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(iris.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Matrix of Iris Features")
plt.tight_layout()
plt.show()

Output:

4. More Advanced Line Plot

Let's explore a more complex line plot using Seaborn's lineplot() function:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create some data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a DataFrame
df = pd.DataFrame({'x': x, 'y': y})

# Create a line plot
plt.figure(figsize=(10, 6))
sns.lineplot(x='x', y='y', data=df)
plt.title('Sine Wave')
plt.xlabel('X value')
plt.ylabel('Sine of X')
plt.show()

Output:

Explanation

sns.lineplot() creates a line plot from the data.
The x and y parameters specify which columns to use for the x and y axes.
The data parameter specifies the DataFrame containing the data.
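When an x value has several observations, Seaborn plots their mean and shades a confidence interval around it. In this example every x value appears only once, so only the line itself is drawn.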

4. Scatter Plot with Seaborn

Scatter plots are great for showing the relationship between two variables:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create some random data
np.random.seed(42)
n = 100
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)

# Create a DataFrame
df = pd.DataFrame({'x': x, 'y': y})

# Create a scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', data=df)
plt.title('Scatter Plot Example')
plt.xlabel('X variable')
plt.ylabel('Y variable')
plt.show()

Output:

5. Adding a Third Variable with Hue

One of Seaborn's strengths is the ability to easily visualize multiple dimensions of data. Let's add a categorical variable using the hue parameter:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create data with a categorical variable
np.random.seed(42)
n = 100
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)
categories = np.random.choice(['A', 'B', 'C'], size=n)

# Create a DataFrame
df = pd.DataFrame({
    'x': x,
    'y': y,
    'category': categories
})

# Create a scatter plot with hue
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', hue='category', data=df)
plt.title('Scatter Plot with Categories')
plt.xlabel('X variable')
plt.ylabel('Y variable')
plt.show()

Output:

Explanation

The hue parameter assigns different colors to different categories.
Seaborn automatically creates a legend showing which color corresponds to which category.
This allows you to visualize three dimensions of data in a 2D plot.

Try This Code Yourself & See the Result

Add Size Parameter to Scatter Plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create data with categorical variable and size variable
np.random.seed(42)
n = 100
x = np.random.normal(size=n)
y = 2 * x + np.random.normal(size=n)
categories = np.random.choice(['A', 'B', 'C'], size=n)
sizes = np.random.randint(10, 100, size=n)

# Create a DataFrame
df = pd.DataFrame({
    'x': x,
    'y': y,
    'category': categories,
    'size': sizes
})

# Create a scatter plot with hue and size
plt.figure(figsize=(10, 6))
sns.scatterplot(x='x', y='y', hue='category', size='size', data=df)
plt.title('Scatter Plot with Categories and Sizes')
plt.xlabel('X variable')
plt.ylabel('Y variable')
plt.show()

6. Distribution Plots

Seaborn excels at visualizing distributions. Let's look at some examples:

6.1 Histogram (histplot)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Generate random data
np.random.seed(42)
data = np.random.normal(size=1000)

# Create a histogram
plt.figure(figsize=(10, 6))
sns.histplot(data, kde=True, bins=30)
plt.title('Histogram with KDE')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

Output:

Explanation

sns.histplot() creates a histogram of the data.
The kde=True parameter adds a Kernel Density Estimate curve, which is a smoothed version of the histogram.
The bins parameter controls how many bins to use for the histogram.

Try This Code Yourself & See the Result

Create a Histogram with Multiple Distributions:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Generate two different distributions
np.random.seed(42)
group1 = np.random.normal(0, 1, 1000)
group2 = np.random.normal(2, 1.5, 1000)

# Combine into a DataFrame
df = pd.DataFrame({
    'value': np.concatenate([group1, group2]),
    'group': np.repeat(['Group 1', 'Group 2'], 1000)
})

# Create a histogram with multiple distributions
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='value', hue='group', kde=True, alpha=0.5)
plt.title('Histogram of Two Distributions')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

6.2 Box Plot

Box plots show the distribution of a variable across different categories:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create data for different categories
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = []
for i, cat in enumerate(categories):
    # Create different distributions for each category
    values = np.random.normal(loc=i*2, scale=0.5+i*0.2, size=100)
    for val in values:
        data.append({'category': cat, 'value': val})

# Convert to DataFrame
df = pd.DataFrame(data)

# Create a box plot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Box Plot by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

Output:

Explanation

The box shows the quartiles of the dataset (25th, 50th/median, and 75th percentiles).
The whiskers extend to show the rest of the distribution, except for outliers which are plotted as individual points.
This is useful for comparing distributions across categories and identifying outliers.
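
To make the box and whisker positions concrete, here is a hedged sketch that computes them by hand with NumPy for one made-up sample. It assumes the default whisker rule (whis=1.5), under which the whiskers reach the most extreme data points within 1.5 times the interquartile range (IQR) of the box edges.

```python
import numpy as np

np.random.seed(42)
values = np.random.normal(loc=0, scale=1, size=100)
values = np.append(values, [6.0, -5.5])  # two artificial outliers

# Box edges and median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whisker fences under the default 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is plotted as an individual point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"box: {q1:.2f} to {q3:.2f}, median {median:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}")
print(f"outliers: {np.sort(outliers)}")
```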

Try This Code Yourself & See the Result

Create a Box Plot with Notches:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create data for different categories
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = []
for i, cat in enumerate(categories):
    # Create different distributions for each category
    values = np.random.normal(loc=i*2, scale=0.5+i*0.2, size=100)
    for val in values:
        data.append({'category': cat, 'value': val})

# Convert to DataFrame
df = pd.DataFrame(data)

# Create a box plot with notches
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df, notch=True)
plt.title('Box Plot with Notches')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

6.3 Violin Plot

Violin plots are similar to box plots but show the full distribution:

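import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Recreate the data used in the box plot example
np.random.seed(42)
categories = ['A', 'B', 'C', 'D']
data = []
for i, cat in enumerate(categories):
    values = np.random.normal(loc=i*2, scale=0.5+i*0.2, size=100)
    for val in values:
        data.append({'category': cat, 'value': val})
df = pd.DataFrame(data)

# Create a violin plot
plt.figure(figsize=(10, 6))
sns.violinplot(x='category', y='value', data=df)
plt.title('Violin Plot by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()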

Output:

Explanation

Violin plots combine aspects of box plots and KDE plots.
The width of the "violin" at each point represents the density of the data at that value.
This gives you more information about the distribution shape than a box plot.

Try This Code Yourself & See the Result

Create a Split Violin Plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the tips dataset
tips = sns.load_dataset('tips')

# Create a split violin plot
plt.figure(figsize=(12, 8))
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, split=True, palette='pastel')
plt.title('Split Violin Plot by Day and Gender')
plt.xlabel('Day of the Week')
plt.ylabel('Total Bill Amount')
plt.show()

7. Heatmap

Heatmaps are great for visualizing matrices of data, such as correlation matrices:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create a correlation matrix
np.random.seed(42)
data = np.random.randn(100, 5)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])
corr = df.corr()

# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix Heatmap')
plt.show()

Output:

Explanation

annot=True adds the numerical values to each cell.
cmap='coolwarm' sets the color map (blue for negative, red for positive).
vmin=-1, vmax=1 sets the range for the color mapping (correlation values range from -1 to 1).
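
A small sketch of why vmin and vmax matter (the random data is illustrative). Without them, heatmap() infers the color range from the values in the matrix, so the same color can stand for different correlations in different plots. The color limits are read back from the mesh that heatmap() draws, on the assumption that it is the first collection on the Axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(42)
df = pd.DataFrame(np.random.randn(100, 5), columns=list("ABCDE"))
corr = df.corr()

fig, (ax_auto, ax_fixed) = plt.subplots(1, 2, figsize=(14, 5))

# Color range inferred from the data (smallest to largest correlation)
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax_auto)
ax_auto.set_title("Inferred color range")

# Color range pinned to the full correlation scale
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax_fixed)
ax_fixed.set_title("Fixed range: vmin=-1, vmax=1")

# Assumption: the heatmap mesh is the first collection on each Axes
auto_limits = ax_auto.collections[0].get_clim()
fixed_limits = ax_fixed.collections[0].get_clim()
print("inferred:", auto_limits)
print("fixed:   ", fixed_limits)

plt.show()
```

With the fixed range, zero correlation always lands in the middle of the coolwarm map, which makes several heatmaps directly comparable.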

Try This Code Yourself & See the Result

Create a Customized Heatmap:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the iris dataset
iris = sns.load_dataset('iris')

# Calculate correlation matrix
corr = iris.drop('species', axis=1).corr()

# Create a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Create a customized heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr, 
            mask=mask,  # Only show the lower triangle
            annot=True, 
            fmt='.2f',  # Format annotations to 2 decimal places
            cmap='viridis',
            linewidths=0.5,
            cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix Heatmap (Lower Triangle)')
plt.tight_layout()
plt.show()

8. Pair Plot

Pair plots show pairwise relationships between variables in a dataset:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the iris dataset as an example
iris = sns.load_dataset('iris')

# Create a pair plot
sns.pairplot(iris, hue='species')
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()

Output:

Explanation

Pair plots create a grid of relationships between each pair of variables.
The diagonal shows the distribution of each variable.
The off-diagonal cells show scatter plots of each pair of variables.
The hue parameter colors the points by the specified categorical variable.

Try This Code Yourself & See the Result

Create a Customized Pair Plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the iris dataset
iris = sns.load_dataset('iris')

# Create a customized pair plot
g = sns.pairplot(
    iris, 
    hue='species',
    diag_kind='kde',  # Use KDE plots on the diagonal
    plot_kws={'alpha': 0.6},  # Make scatter points semi-transparent
    diag_kws={'fill': True},  # Fill the KDE plots
    palette='viridis'  # Use the viridis color palette
)

# Add a title
g.fig.suptitle('Customized Pair Plot of Iris Dataset', y=1.02, fontsize=16)

# Adjust the layout
plt.tight_layout()
plt.show()

9. Categorical Plots

Seaborn provides specialized plots for categorical data:

9.1 Bar Plot

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create a DataFrame with categorical data
data = {
    'category': ['A', 'B', 'C', 'D', 'E'],
    'value': [5, 7, 3, 9, 4]
}
df = pd.DataFrame(data)

# Create a bar plot
plt.figure(figsize=(10, 6))
sns.barplot(x='category', y='value', data=df)
plt.title('Bar Plot Example')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

Output:

9.2 Count Plot

Count plots show the counts of observations in each categorical bin:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create data with repeated categories (seeded so the counts are reproducible)
np.random.seed(42)
data = {
    'category': np.random.choice(['A', 'B', 'C', 'D'], size=100)
}
df = pd.DataFrame(data)

# Create a count plot
plt.figure(figsize=(10, 6))
sns.countplot(x='category', data=df)
plt.title('Count Plot Example')
plt.xlabel('Category')
plt.ylabel('Count')
plt.show()

Output:

10. Customizing Seaborn Plots

Seaborn offers several ways to customize your plots:

10.1 Setting Themes

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Available themes: darkgrid, whitegrid, dark, white, ticks
sns.set_theme(style="darkgrid")

# Create a simple plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Plot with Dark Grid Theme')
plt.xlabel('X value')
plt.ylabel('Y value')
plt.show()

Output:

10.2 Using Color Palettes

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set a color palette
# Available palettes: deep, muted, pastel, bright, dark, colorblind
sns.set_palette("pastel")

# Create a categorical plot
categories = ['A', 'B', 'C', 'D', 'E']
values = [5, 7, 3, 9, 4]
plt.figure(figsize=(10, 6))
sns.barplot(x=categories, y=values)
plt.title('Bar Plot with Pastel Palette')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

Output:

Seaborn Practice Questions

🧠 Beginner-Level Questions

Create a simple line plot using Seaborn with random data.
Load the 'tips' dataset from Seaborn and create a scatter plot of 'total_bill' vs 'tip'.
Create a histogram of the 'tip' column from the tips dataset.
Create a box plot showing the distribution of 'tip' by 'day' from the tips dataset.
Create a violin plot showing the distribution of 'total_bill' by 'day' from the tips dataset.
Create a count plot showing the frequency of each 'day' in the tips dataset.
Create a heatmap of the correlation matrix for the iris dataset.
Create a pair plot for the iris dataset with points colored by species.
Create a bar plot showing the average 'tip' by 'day' from the tips dataset.
Change the theme of your plots to 'darkgrid' and create any plot of your choice.

💡 Need help? See the GitHub page for answers.

Tip: Try running these examples in your Python editor or Google Colab.

Quiz Time

1. What is Seaborn?

2. Which function would you use to create a scatter plot in Seaborn?

3. What does the 'hue' parameter do in Seaborn plots?

4. Which Seaborn plot is best for visualizing the distribution of a single variable?

5. What is a violin plot in Seaborn?

6. Which Seaborn function would you use to create a correlation matrix heatmap?

7. What is a pair plot in Seaborn?

8. How do you change the style of Seaborn plots?

9. Which of these is NOT a built-in dataset in Seaborn?

10. What's the relationship between Matplotlib and Seaborn?
