Table of Contents:
- Introduction to Pandas
- Understanding loc
  - Syntax and Usage
  - Label-based Indexing
  - Selecting Rows and Columns
  - Boolean Indexing
  - Multi-indexing with loc
- Exploring iloc
  - Syntax and Usage
  - Integer-based Indexing
  - Selecting Rows and Columns
  - Slicing with iloc
- Key Differences Between loc and iloc
- Examples Demonstrating loc and iloc
  - Example 1: Indexing and Selecting Data
  - Example 2: Boolean Indexing and Slicing
- Conclusion
1. Introduction to Pandas
Pandas is an open-source Python library that provides powerful data manipulation and analysis tools. It introduces two main data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional tabular data structure akin to a spreadsheet or SQL table. The loc and iloc indexers in Pandas are essential tools for selecting and manipulating data within these structures.
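To make the two structures concrete, here is a minimal sketch that builds one of each and reads the same value through both indexers. The names prices and inventory and the fruit data are made up for illustration and do not come from the tutorial.

```python
import pandas as pd

# A Series: one-dimensional, each value paired with an index label.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

# A DataFrame: two-dimensional, with labelled rows and labelled columns.
inventory = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "stock": [10, 0, 7]},
    index=["apple", "banana", "cherry"],
)

# Both structures expose the same two indexers.
by_label = prices.loc["banana"]            # look up by index label
by_position = prices.iloc[1]               # look up by integer position
cell = inventory.loc["cherry", "stock"]    # row label, column label
same_cell = inventory.iloc[2, 1]           # row position, column position
```

Here by_label and by_position refer to the same element, as do cell and same_cell; only the way of addressing it differs.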
2. Understanding loc
Syntax and Usage
The loc indexer in Pandas is used to access a group of rows and columns by label or with a boolean array. The syntax for using loc is:
dataframe.loc[row_indexer, column_indexer]
Label-based Indexing
Label-based indexing refers to selecting data based on the row and column labels. The row_indexer and column_indexer are typically slices, lists, or single values that represent the labels of rows and columns you want to select.
Selecting Rows and Columns
To select specific rows and columns using loc, you can use labels:
# Select a single row by label
row_data = dataframe.loc['row_label']
# Select multiple rows by labels
multiple_rows = dataframe.loc[['row_label_1', 'row_label_2']]
# Select specific rows and columns
subset = dataframe.loc[['row_label'], ['col_label']]
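The snippets above use placeholder labels. The following is a small runnable sketch of the same selections on a made-up DataFrame (the labels r1 to r3 and the columns qty and price are assumptions for illustration). It also shows what type each form returns.

```python
import pandas as pd

df = pd.DataFrame(
    {"qty": [3, 8, 5], "price": [2.5, 1.0, 4.0]},
    index=["r1", "r2", "r3"],
)

single_row = df.loc["r2"]                # one label -> a Series holding that row
two_rows = df.loc[["r1", "r3"]]          # list of labels -> a DataFrame
one_cell = df.loc["r3", "price"]         # row label and column label -> a scalar
label_slice = df.loc["r1":"r2", "qty"]   # label slice: both endpoints are included
```

Passing a list, even a one-element list, keeps the result two-dimensional, while a bare label reduces it by one dimension.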
Boolean Indexing
You can also use boolean arrays to perform conditional selections using loc:
# Select rows where a condition is met
condition = dataframe['column'] > 5
selected_data = dataframe.loc[condition]
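As a hedged illustration of conditional selection, the sketch below combines two conditions into one mask and reuses it to pick rows, a single column, and the complementary rows. The column names column and other and their values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"column": [2, 7, 9, 4], "other": ["a", "b", "c", "d"]})

# Combine conditions element-wise with & (and), | (or), ~ (not).
# Each comparison needs its own parentheses.
mask = (df["column"] > 5) & (df["other"] != "c")

filtered = df.loc[mask]          # rows where the mask is True
names = df.loc[mask, "other"]    # same rows, restricted to one column
rest = df.loc[~mask]             # rows where the mask is False
```

The mask is a boolean Series aligned on the DataFrame's index, which is why loc can apply it directly.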
Multi-indexing with loc
Pandas also supports multi-indexing. Multi-indexing allows you to have more than one index level, enabling hierarchical representation of data. With loc, you can specify multiple index labels to access data at different levels:
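# Select the row whose two index levels are 'index_level_1' and 'index_level_2'.
# Writing the key as a tuple plus an explicit column slice makes it unambiguous
# that both labels refer to row index levels.
data = dataframe.loc[('index_level_1', 'index_level_2'), :]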
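The following sketch shows how this might look on a small two-level index. The level names region and year and the revenue figures are hypothetical and chosen only to make the example runnable.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("north", 2023), ("north", 2024), ("south", 2023), ("south", 2024)],
    names=["region", "year"],
)
df = pd.DataFrame({"revenue": [10, 12, 7, 9]}, index=index)

# A single first-level label returns every row under it,
# with the remaining level ("year") as the index.
north = df.loc["north"]

# A tuple addresses one row across both levels.
one_row = df.loc[("south", 2024), :]

# Adding a column label narrows the result to a single value.
value = df.loc[("south", 2024), "revenue"]
```

Supplying fewer labels than there are levels selects a whole group, while a full tuple pins down a single row.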
3. Exploring iloc
Syntax and Usage
The iloc indexer in Pandas performs integer-location based indexing: it accesses a group of rows and columns by integer position. The syntax for using iloc is:
dataframe.iloc[row_indexer, column_indexer]
Integer-based Indexing
Integer-based indexing refers to selecting data based on the zero-based integer positions of rows and columns. The row_indexer and column_indexer are typically slices, lists, or single integers that represent the positions of rows and columns you want to select.
Selecting Rows and Columns
To select specific rows and columns using iloc, you can use integer positions:
# Select a single row by integer position
row_data = dataframe.iloc[2]
# Select multiple rows by integer positions
multiple_rows = dataframe.iloc[1:4]
# Select specific rows and columns by integer positions
subset = dataframe.iloc[1:3, 2:4]
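To see that iloc ignores labels altogether, here is a minimal sketch on a DataFrame whose index is made of strings. The labels w to z and the columns a, b, c are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [5, 6, 7, 8]},
    index=["w", "x", "y", "z"],
)

third_row = df.iloc[2]             # row at position 2, which carries label "y"
last_row = df.iloc[-1]             # negative positions count from the end
picked = df.iloc[[0, 3], [0, 2]]   # rows at positions 0 and 3, columns at 0 and 2
cell = df.iloc[1, 1]               # single value at row position 1, column position 1
```

Positions always start at 0 regardless of what the index labels are.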
Slicing with iloc
Slicing with iloc is similar to normal Python slicing. The end index in a slice is exclusive:
# Select rows at positions 1 to 4 and columns at positions 0 to 2
# (the end positions 5 and 3 are excluded)
sliced_data = dataframe.iloc[1:5, 0:3]
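The sketch below checks the exclusive end on a six-row DataFrame and adds two related behaviours that follow Python's own slicing rules: a step value, and a slice that runs past the end. The DataFrame contents are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": range(20, 26)})

block = df.iloc[1:5, 0:3]     # 4 rows (positions 1-4), 3 columns (positions 0-2)
every_other = df.iloc[::2]    # a step works as in Python lists: positions 0, 2, 4
clipped = df.iloc[4:100]      # a slice past the end is truncated, not an error

# A single out-of-range position, unlike a slice, raises IndexError.
try:
    df.iloc[100]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True
```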
4. Key Differences Between loc and iloc
- Label vs Integer Indexing: The most significant difference is that loc uses label-based indexing, while iloc uses integer-based indexing.
- Inclusivity: loc includes the ending label in slices, while iloc does not include the ending integer position.
- Usage of Boolean Arrays: loc accepts a boolean Series, list, or array for conditional selection. iloc accepts a boolean list or NumPy array but not a boolean Series, so a mask built from a column comparison has to be converted first, for example with .to_numpy().
- Multi-indexing: loc can address the levels of a multi-index by label, whereas iloc ignores the levels and selects rows purely by position.
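The first three differences can be checked side by side. This is a minimal sketch on a made-up DataFrame whose integer labels (10, 20, 30, 40) deliberately differ from the positions (0 to 3), so the two indexers cannot be confused.

```python
import pandas as pd

df = pd.DataFrame({"score": [50, 60, 70, 80]}, index=[10, 20, 30, 40])

# Label slice: both endpoint labels are included (labels 10, 20, 30).
by_label = df.loc[10:30]

# Position slice: the end position is excluded (positions 0 and 1).
by_position = df.iloc[0:2]

# A boolean Series works with loc as-is; for iloc it is converted
# to a plain NumPy array first.
mask = df["score"] > 55
with_loc = df.loc[mask]
with_iloc = df.iloc[mask.to_numpy()]
```

With labels like these, df.loc[0] would fail because no row is labelled 0, while df.iloc[0] returns the first row.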
5. Examples Demonstrating loc and iloc
Example 1: Indexing and Selecting Data
Consider a DataFrame named sales_data:
import pandas as pd
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 200, 75, 300]
}
sales_data = pd.DataFrame(data)
Using loc:
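# Select row by label (with the default index, label 1 is also position 1)
product_b = sales_data.loc[1]
# Select multiple rows and specific columns by label
# (the label slice 1:3 includes label 3, so this returns rows 1, 2 and 3)
subset = sales_data.loc[1:3, ['Product', 'Sales']]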
Using iloc:
# Select row by integer position
product_b = sales_data.iloc[1]
# Select multiple rows and specific columns by integer positions
# (the position slice 1:4 stops before 4, so this returns rows 1, 2 and 3)
subset = sales_data.iloc[1:4, [0, 1]]
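In this example the default index makes labels and positions coincide, which can hide the difference between the two indexers. As a sketch of what changes with a non-default index, the code below uses Product as the index and then reorders the rows. The variable names by_product and ranked are introduced here for illustration only.

```python
import pandas as pd

sales_data = pd.DataFrame(
    {"Product": ["A", "B", "C", "D", "E"], "Sales": [100, 150, 200, 75, 300]}
)

# Use the product name as the row label.
by_product = sales_data.set_index("Product")
b_by_label = by_product.loc["B", "Sales"]    # look up by label "B"
b_by_position = by_product.iloc[1, 0]        # second row, first column

# After sorting, positions change but labels travel with their rows.
ranked = by_product.sort_values("Sales", ascending=False)
top_by_position = ranked.iloc[0]             # whichever row is now first
b_after_sort = ranked.loc["B", "Sales"]      # still product B
```

After the sort, position 0 holds product E, while the label "B" still finds the same row as before.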
Example 2: Boolean Indexing and Slicing
Continuing with the same sales_data:
# Boolean indexing with loc
high_sales_products = sales_data.loc[sales_data['Sales'] > 150]
# Slicing with iloc
sliced_data = sales_data.iloc[1:4, 0:2]
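One consequence worth checking is that a filtered DataFrame keeps the labels of the rows it came from. The sketch below rebuilds sales_data so it runs on its own, then compares positional and label access on the filtered result. The names kept_labels, first_match, and renumbered are introduced here for illustration.

```python
import pandas as pd

sales_data = pd.DataFrame(
    {"Product": ["A", "B", "C", "D", "E"], "Sales": [100, 150, 200, 75, 300]}
)

high_sales_products = sales_data.loc[sales_data["Sales"] > 150]

# The surviving rows keep their original labels.
kept_labels = list(high_sales_products.index)

# iloc counts positions within the filtered result.
first_match = high_sales_products.iloc[0]

# loc still looks for the original label, and label 0 was filtered out.
try:
    high_sales_products.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# reset_index(drop=True) renumbers the rows from 0 if that is preferred.
renumbered = high_sales_products.reset_index(drop=True)
```

Here the filter keeps products C and E under their original labels 2 and 4, so iloc[0] returns product C while loc[0] raises a KeyError.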
6. Conclusion
In this tutorial, we've covered the key differences between loc and iloc in Pandas and provided comprehensive examples of their usage. Understanding these two indexers is crucial for effective data manipulation and analysis in Pandas. loc is used for label-based indexing, while iloc is used for integer-based indexing. Both are powerful tools that offer various ways to select and manipulate data within Pandas Series and DataFrames.