Pandas in Python

Pandas Operations for Beginners


Learn Pandas step by step with outputs



🧠 What is Pandas?

Pandas is a powerful open-source Python library used for data manipulation and analysis.

It provides fast, flexible, and expressive data structures, mainly the Series (one-dimensional) and the DataFrame (two-dimensional), for working with structured data such as tables.

Pandas is widely used in data science, machine learning, and scientific computing.

1. Install and Import Pandas

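First, install pandas with pip from your terminal, then import it in your Python script. By convention pandas is imported under the alias pd, which every example below uses.

pip install pandas        # run in a terminal
import pandas as pd       # run in Python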
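A quick way to confirm that the import worked, assuming pandas was installed into the same Python environment you are running, is to print the installed version:

```python
import pandas as pd

# If this prints a version string, pandas is installed and importable.
print(pd.__version__)
```

Knowing the version also helps when your printed output differs slightly from the outputs shown in this tutorial.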

2. Creating a Series

A Series is a one-dimensional labeled array that can hold any data type, much like a single column in a spreadsheet or database table. Here, we create a Series of three integers. In the output, the left column is the index (0, 1, 2, ... by default), the right column holds the values, and the last line shows the data type. Use case: Great for simple datasets where you want to label or index a list of values.

s = pd.Series([10, 20, 30])
print(s)
0    10
1    20
2    30
dtype: int64
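To replace the default 0, 1, 2 index with your own labels, pass them with the index= parameter. A minimal sketch; the labels 'a', 'b', 'c' are arbitrary examples:

```python
import pandas as pd

# index= supplies one label per value instead of the default 0, 1, 2.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)

# Values can now be looked up by their label.
print(s['b'])   # 20
```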

3. Creating DataFrames

A DataFrame is a 2D tabular structure with labeled rows and columns, similar to a spreadsheet or SQL table. Here, we create a DataFrame from a dictionary: each key becomes a column name, and each list supplies that column's values (all lists must have the same length). Use case: The most common pandas structure, used for datasets with multiple columns.

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
print(df)
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London

From List of Lists

You can also create a DataFrame from a list of lists, specifying the column names manually. This is useful when data comes in tabular form but without column labels. Use case: Quick DataFrame creation from raw nested list data.

data = [['Alice', 25], ['Bob', 30], ['Charlie', 35]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
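The two constructors differ only in orientation: a dictionary is column-oriented (one key per column), while a list of lists is row-oriented (one inner list per row). The sketch below, using a shortened two-row version of the same data, builds the table both ways and checks that the results match:

```python
import pandas as pd

# Column-oriented: each key is a column name, each list holds that column's values.
by_columns = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Row-oriented: each inner list is one row; column names are passed separately.
by_rows = pd.DataFrame([['Alice', 25], ['Bob', 30]], columns=['Name', 'Age'])

# equals() compares shape, values, and column dtypes.
print(by_columns.equals(by_rows))   # True
```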

4. Accessing Columns and Rows

Selecting a single column with df['col'] returns a Series (like one spreadsheet column). Selecting multiple columns with a list of names inside the brackets, df[['col1', 'col2']], returns a DataFrame. Use case: Retrieve specific columns for analysis or display.

Access Columns

print(df['Name'])
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

print(df[['Name', 'Age']])
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
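The difference between single and double brackets is easy to check with type(). A small sketch on a two-column sample frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# One column name in single brackets -> a Series.
print(type(df['Name']))           # <class 'pandas.core.series.Series'>

# A list of names (double brackets) -> a DataFrame, even for one column.
print(type(df[['Name']]))         # <class 'pandas.core.frame.DataFrame'>
print(type(df[['Name', 'Age']]))  # <class 'pandas.core.frame.DataFrame'>
```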

Access Rows

  • .loc[] accesses rows by their index labels (can be non-numeric).
  • .iloc[] accesses rows by integer position (0-based index).
Use case: Retrieve specific rows for detailed inspection or operations.

print(df.loc[0])   # By index label 0 (here, the first row)
Name    Alice
Age        25
Name: 0, dtype: object

print(df.iloc[1])  # By position 1 (the second row)
Name    Bob
Age      30
Name: 1, dtype: object
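With the default 0, 1, 2 index, labels and positions coincide, so .loc and .iloc look alike. Giving the rows non-numeric labels shows the difference. A sketch; the labels 'x', 'y', 'z' are hypothetical:

```python
import pandas as pd

# Hypothetical row labels instead of the default 0, 1, 2.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
                  index=['x', 'y', 'z'])

print(df.loc['y'])    # by label 'y' -> Bob's row
print(df.iloc[1])     # by position 1 -> the same row
print(df.iloc[-1])    # negative positions count from the end -> Charlie's row

# .loc can also select a single cell: row label first, then column name.
print(df.loc['y', 'Age'])   # 30
```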

5. Adding Columns and Rows

Add a Column

You can add a new column by assigning a list or Series to a new column name; a list must contain exactly one value per row. Use case: Adding computed or additional data attributes to existing datasets.

df['Salary'] = [50000, 60000, 70000]
print(df)
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
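A computed column can be built from existing columns, because arithmetic on a column applies to every row at once. In this sketch the Bonus and Total columns and the 10% rate are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Salary': [50000, 60000, 70000]})

# Element-wise arithmetic: each row's Salary is multiplied by 0.10.
df['Bonus'] = df['Salary'] * 0.10     # hypothetical 10% bonus rate
df['Total'] = df['Salary'] + df['Bonus']
print(df)
```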

Add a Row

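Rows can be added by concatenating the original DataFrame with a new DataFrame containing the row(s). ignore_index=True renumbers the combined rows 0, 1, 2, ... instead of keeping each frame's original labels. Note: A column that exists in only one of the frames is appended at the end and filled with NaN (missing data) for the other frame's rows; an integer column that gains NaN values becomes float (here, Salary). Use case: Dynamically growing datasets or appending new observations.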

new_row = pd.DataFrame({'Name': ['David'], 'Age': [40], 'City': ['Berlin']})
df = pd.concat([df, new_row], ignore_index=True)
print(df)
      Name  Age   Salary    City
0    Alice   25  50000.0     NaN
1      Bob   30  60000.0     NaN
2  Charlie   35  70000.0     NaN
3    David   40      NaN  Berlin
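To see why ignore_index=True matters, compare the index labels with and without it. A small sketch on two-row sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
new_row = pd.DataFrame({'Name': ['David'], 'Age': [40]})

# Without ignore_index, each frame keeps its own labels, so label 0 appears twice.
print(pd.concat([df, new_row]).index.tolist())                     # [0, 1, 0]

# With ignore_index=True the result is renumbered from 0.
print(pd.concat([df, new_row], ignore_index=True).index.tolist())  # [0, 1, 2]
```

Duplicate labels would make a later df.loc[0] return two rows instead of one, which is why the example above renumbers.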

6. Removing Columns and Rows

drop() removes rows or columns by label. Use axis=1 to drop columns, axis=0 for rows. inplace=True updates the DataFrame directly without needing assignment. Use case: Cleaning datasets by removing unnecessary columns or rows.

Remove Column

df.drop('City', axis=1, inplace=True)
print(df)
      Name  Age   Salary
0    Alice   25  50000.0
1      Bob   30  60000.0
2  Charlie   35  70000.0
3    David   40      NaN

Remove Row

Here, the row with index label 0 is removed. Use case: Removing invalid or unwanted rows.

df.drop(0, axis=0, inplace=True)  # Drop the row labeled 0 (the first row)
print(df)
      Name  Age   Salary
1      Bob   30  60000.0
2  Charlie   35  70000.0
3    David   40      NaN
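drop() also accepts columns= and index= keywords, which name the axis directly so axis= is not needed. Without inplace=True it returns a new DataFrame and leaves the original unchanged. A sketch on the sample data from section 3:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Paris', 'London']})

# columns= drops columns, index= drops rows; each call returns a new DataFrame.
smaller = df.drop(columns=['City']).drop(index=[0])
print(smaller)
print(df.shape)   # (3, 3): the original df is untouched
```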

7. Data Analysis

.head() shows the first 5 rows by default (or every row if there are fewer than 5, as here). .tail(n) shows the last n rows. Use case: Quickly preview the dataset for validation or inspection.

Head

print(df.head())
      Name  Age   Salary
1      Bob   30  60000.0
2  Charlie   35  70000.0
3    David   40      NaN

Tail

print(df.tail(2))
      Name  Age   Salary
2  Charlie   35  70000.0
3    David   40      NaN
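Because the example DataFrame has only 3 rows, the default of 5 is not visible above. A sketch with a hypothetical 8-row frame makes it show:

```python
import pandas as pd

# Hypothetical 8-row frame with a single column 'n' holding 0..7.
df = pd.DataFrame({'n': range(8)})

print(len(df.head()))            # 5 (default n=5)
print(len(df.head(3)))           # 3
print(df.tail(2)['n'].tolist())  # [6, 7]
```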

Info and Describe

  • .info() prints a summary of the DataFrame, including the index, column data types, non-null counts, and memory usage.
  • .describe() computes statistics (count, mean, std, min, quartiles, max) for numeric columns.
Use case: Data understanding and quality checks.

Note: .info() prints its report directly and returns None, so call it without print() (otherwise a stray "None" appears). The index line and memory usage it reports can differ between pandas versions.

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 1 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    3 non-null      object
 1   Age     3 non-null      int64
 2   Salary  2 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 96.0+ bytes

print(df.describe())
        Age        Salary
count   3.0      2.000000
mean   35.0  65000.000000
std     5.0   7071.067812
min    30.0  60000.000000
25%    32.5  62500.000000
50%    35.0  65000.000000
75%    37.5  67500.000000
max    40.0  70000.000000
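The .info() output above reports only 2 non-null Salary values, and .describe() silently skips the missing one. A sketch of the usual follow-up checks, rebuilding the same three rows with numpy's NaN for the missing salary (filling with 0 is purely illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                   'Age': [30, 35, 40],
                   'Salary': [60000, 70000, np.nan]})

# Count missing values per column.
print(df.isna().sum())

# Option 1: drop every row that contains any NaN.
print(df.dropna())

# Option 2: replace NaN with a fixed value instead.
print(df.fillna(0))
```

Which option fits depends on the data: dropping loses David's Name and Age, while filling with 0 pulls the Salary mean down.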

8. Shape and Columns

  • .shape returns number of rows and columns.
  • .columns lists column names.
Use case: Understanding dataset structure for further processing.

print(df.shape)
print(df.columns)
(3, 3)
Index(['Name', 'Age', 'Salary'], dtype='object')
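Since .shape is a (rows, columns) tuple, it can be unpacked directly. A sketch on the same three rows:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                   'Age': [30, 35, 40],
                   'Salary': [60000.0, 70000.0, None]})

rows, cols = df.shape      # unpack the (rows, columns) tuple
print(rows, cols)          # 3 3
print(len(df))             # 3: len() counts rows only
print(list(df.columns))    # ['Name', 'Age', 'Salary']
```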

9. Filtering Data

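df['Age'] > 30 produces a Series of True/False values, one per row; passing it inside df[...] keeps only the rows marked True, here the rows where Age is greater than 30. Use case: Selecting subsets of data based on conditions.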

print(df[df['Age'] > 30])
      Name  Age   Salary
2  Charlie   35  70000.0
3    David   40      NaN
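Several conditions can be combined with & (and) or | (or), with each condition wrapped in parentheses; .isin() keeps rows whose value appears in a list. A sketch on the same three rows (the 65000 threshold is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                   'Age': [30, 35, 40],
                   'Salary': [60000.0, 70000.0, None]})

# Both conditions must hold; NaN compares as False, so David is excluded.
both = df[(df['Age'] > 30) & (df['Salary'] > 65000)]
print(both)

# Keep rows whose Name is in the given list.
picked = df[df['Name'].isin(['Bob', 'David'])]
print(picked)
```

The parentheses are required because & binds more tightly than comparison operators such as >.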

10. Sorting Data

sort_values() sorts the DataFrame by the Age column, in ascending order by default, and returns a new DataFrame without changing df itself. Use case: Organizing data for analysis or display.

print(df.sort_values('Age'))
      Name  Age   Salary
1      Bob   30  60000.0
2  Charlie   35  70000.0
3    David   40      NaN
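sort_values() also supports descending order, multiple sort keys, and a choice of where NaN values go. A sketch; the extra row for Eve is hypothetical and added only to create a tie on Age:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [30, 35, 40, 30],
                   'Salary': [60000.0, 70000.0, None, 65000.0]})

# Largest Age first.
desc = df.sort_values('Age', ascending=False)
print(desc)

# Sort by Age, then break ties by Name alphabetically.
multi = df.sort_values(['Age', 'Name'])
print(multi)

# NaN values are placed last by default (na_position='last').
by_salary = df.sort_values('Salary')
print(by_salary)
```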

11. Resetting Index

After dropping rows, the index labels may no longer be sequential (here they are 1, 2, 3). .reset_index() renumbers them 0, 1, 2, ..., and drop=True discards the old index instead of inserting it as a new column. Use case: Clean data preparation after filtering or row deletion.

df = df.reset_index(drop=True)
print(df)
      Name  Age   Salary
0      Bob   30  60000.0
1  Charlie   35  70000.0
2    David   40      NaN
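For comparison, leaving out drop=True keeps the old labels as a new column named 'index'. A sketch on a small sample frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df = df.drop(0)   # the remaining index labels are 1 and 2

# Default drop=False: the old labels move into a column called 'index'.
kept = df.reset_index()
print(kept)
print(list(kept.columns))   # ['index', 'Name', 'Age']
```

This is useful when the old labels still carry meaning, for example as row IDs you want to keep.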

Quiz Time

1. What is the main purpose of the pandas library?

2. How do you create a pandas Series?

3. Which function is used to read a CSV file in pandas?

4. How do you get the first 5 rows of a DataFrame?

5. What does df.info() provide?

6. How do you select a column named 'Age' from a DataFrame df?

7. Which method is used to remove missing values?

8. How do you fill missing values with 0?

9. Which method provides descriptive statistics?

10. How do you sort the DataFrame by the column 'Name'?

Pandas Practice Questions

🧠 Beginner-Level Questions

  1. Create a Pandas Series with the numbers: 5, 10, 15, 20, and print it.
  2. Create a DataFrame with the following data and print it:

    Data:
    Product: Apple, Banana, Mango
    Price: 100, 40, 150
    Quantity: 5, 10, 3
  3. Access and print the "Price" column from the above DataFrame.
  4. Access the second row using iloc[] and print it.
  5. Add a new column called "Total" which is Price * Quantity, and display the DataFrame.
  6. Drop the "Quantity" column and print the updated DataFrame.

🧪 Intermediate-Level Questions

  1. From the following DataFrame, filter and print only rows where Marks > 80:

    data = {'Student': ['A', 'B', 'C', 'D'], 'Marks': [75, 85, 60, 90]}
  2. Create a DataFrame with a "Department" column and group by department to find the average salary.
  3. Sort a DataFrame of employees by "Age" in descending order.
  4. Create a DataFrame with some NaN values and:
    • a. Drop rows with any NaN
    • b. Fill missing values with 0

📊 Data Analysis Questions

  1. Use .describe() and .info() on a sample DataFrame and explain the result.
  2. Reset the index of a DataFrame after dropping a few rows.
  3. Rename the column "Age" to "Years" and print the updated DataFrame.
  4. Create a new row and append it to the DataFrame using pd.concat().
  5. From a given DataFrame, show only the first 3 and last 2 rows using head() and tail().

💡 Need help? See the GitHub page for answers.

Tip: Try running these code examples in your Python editor or in Google Colab.
