Python Pandas loc vs iloc: A Comprehensive Tutorial

Table of Contents:

1. Introduction to Pandas
2. Understanding loc
    Syntax and Usage
    Label-based Indexing
    Selecting Rows and Columns
    Boolean Indexing
    Multi-indexing with loc
3. Exploring iloc
    Syntax and Usage
    Integer-based Indexing
    Selecting Rows and Columns
    Slicing with iloc
4. Key Differences Between loc and iloc
5. Examples Demonstrating loc and iloc
    Example 1: Indexing and Selecting Data
    Example 2: Boolean Indexing and Slicing
6. Conclusion

1. Introduction to Pandas

Pandas is an open-source Python library that provides powerful data manipulation and analysis tools. It introduces two main data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional tabular data structure akin to a spreadsheet or SQL table. The loc and iloc indexers in Pandas are essential tools for selecting and manipulating data within these structures.

2. Understanding `loc`

Syntax and Usage

The loc indexer in Pandas accesses a group of rows and columns by label or with a boolean array. The syntax for using loc is:

dataframe.loc[row_indexer, column_indexer]

Label-based Indexing

Label-based indexing refers to selecting data based on the row and column labels. The row_indexer and column_indexer are typically slices, lists, or single values that represent the labels of rows and columns you want to select.

Selecting Rows and Columns

To select specific rows and columns using loc, you can use labels:

# Select a single row by label
row_data = dataframe.loc['row_label']

# Select multiple rows by labels
multiple_rows = dataframe.loc[['row_label_1', 'row_label_2']]

# Select specific rows and columns
subset = dataframe.loc[['row_label'], ['col_label']]
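The snippets above use placeholder names, so here is a minimal runnable sketch of the same three selections. The DataFrame, its store-name index, and the column names are hypothetical and chosen only for illustration. It also shows how the shape of the key determines what comes back: a single label gives a Series, a list of labels gives a DataFrame, and a row label paired with a column label gives a scalar.

```python
import pandas as pd

# Hypothetical data: the row labels are store names, not positions.
df = pd.DataFrame(
    {"sales": [100, 150, 200], "returns": [5, 8, 2]},
    index=["north", "south", "east"],
)

# A single label returns that row as a Series.
north_row = df.loc["north"]

# A list of labels returns a DataFrame, even if the list has one item.
two_rows = df.loc[["north", "east"]]

# One row label plus one column label returns a single value.
south_sales = df.loc["south", "sales"]

print(type(north_row).__name__)  # Series
print(two_rows.shape)            # (2, 2)
print(south_sales)               # 150
```

Asking loc for a label that does not exist in the index raises a KeyError, which is often the first hint that a position was passed where a label was expected.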

Boolean Indexing

You can also use boolean arrays to perform conditional selections using loc:

# Select rows where a condition is met
condition = dataframe['column'] > 5
selected_data = dataframe.loc[condition]
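As a small sketch of how this extends in practice, the example below combines two conditions and picks a column in the same loc call. The data and the column name "other" are made up for illustration. Conditions are combined with & (and) or | (or), and each one needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"column": [3, 7, 9, 2], "other": ["a", "b", "c", "d"]})

# A boolean Series with one True/False value per row.
mask = df["column"] > 5
selected = df.loc[mask]

# Two conditions combined with &, plus a column label after the comma.
both = df.loc[(df["column"] > 5) & (df["other"] != "c"), "other"]

print(selected.index.tolist())  # [1, 2]
print(both.tolist())            # ['b']
```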

Multi-indexing with `loc`

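Pandas also supports multi-indexing (a MultiIndex), which gives the index more than one level and enables a hierarchical representation of data. With loc, you select rows at a specific combination of levels by passing a tuple with one label per level. Because loc[a, b] and loc[(a, b)] are the same expression in Python, such a key can also be read as a row label followed by a column label, so it is safer to state the column indexer explicitly:

# Select the row(s) matching both index levels; the trailing ":" keeps all columns
data = dataframe.loc[('index_level_1', 'index_level_2'), :]

# Select every row under one outer-level label
outer = dataframe.loc['index_level_1']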
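To make the multi-index case concrete, here is a minimal sketch with a two-level index. The level names ("year", "quarter") and the revenue figures are invented for the example. The row key is written as a tuple and the column indexer is always given, which avoids any ambiguity about which axis each label refers to.

```python
import pandas as pd

# Hypothetical two-level index: year on the outside, quarter on the inside.
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1")],
    names=["year", "quarter"],
)
df = pd.DataFrame({"revenue": [10, 12, 15]}, index=index)

# An outer-level label selects all rows under it; the "year" level is dropped.
year_2023 = df.loc["2023"]

# A full tuple (one label per level) plus ":" selects one row as a Series.
q2_row = df.loc[("2023", "Q2"), :]

# A full tuple plus a column label selects a single value.
q2_revenue = df.loc[("2023", "Q2"), "revenue"]

print(year_2023.index.tolist())  # ['Q1', 'Q2']
print(q2_revenue)                # 12
```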

3. Exploring `iloc`

Syntax and Usage

The iloc indexer in Pandas performs integer-location based indexing: it accesses a group of rows and columns by zero-based integer position, regardless of their labels. The syntax for using iloc is:

dataframe.iloc[row_indexer, column_indexer]

Integer-based Indexing

Integer-based indexing refers to selecting data based on the row and column integer positions. The row_indexer and column_indexer are typically slices, lists, or single integers that represent the positions of rows and columns you want to select.

Selecting Rows and Columns

To select specific rows and columns using iloc, you can use integer positions:

# Select a single row by integer position (position 2 is the third row)
row_data = dataframe.iloc[2]

# Select multiple rows by position (positions 1, 2, and 3; the end is excluded)
multiple_rows = dataframe.iloc[1:4]

# Select rows at positions 1-2 and columns at positions 2-3
subset = dataframe.iloc[1:3, 2:4]
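The following sketch runs the same kinds of selection on a small DataFrame whose row labels are letters, to show that iloc ignores labels entirely. The data and names are hypothetical. It also shows negative positions, which count from the end as in ordinary Python sequences, and lists of positions.

```python
import pandas as pd

# Hypothetical data: the row labels are letters, so labels and positions differ.
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [10, 20, 30, 40, 50],
        "c": list("vwxyz"),
        "d": [0.1, 0.2, 0.3, 0.4, 0.5],
    },
    index=list("pqrst"),
)

third_row = df.iloc[2]            # third row, whose label is "r"
last_row = df.iloc[-1]            # negative positions count from the end
cell = df.iloc[0, 1]              # first row, second column
picked = df.iloc[[0, 2], [0, 3]]  # lists of row and column positions

print(third_row.name)         # r
print(last_row.name)          # t
print(cell)                   # 10
print(list(picked.columns))   # ['a', 'd']
```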

Slicing with `iloc`

Slicing with iloc follows normal Python slicing rules: positions are zero-based, the start is included, and the end is excluded:

# Select rows at positions 1 through 4 and columns at positions 0 through 2
sliced_data = dataframe.iloc[1:5, 0:3]
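A few more slicing behaviors carry over from Python lists, sketched below on a hypothetical six-row DataFrame: open-ended slices, step values, and slices that run past the end (which are silently truncated). A single out-of-range integer, by contrast, raises an IndexError.

```python
import pandas as pd

# Hypothetical six-row frame for illustration.
df = pd.DataFrame({"x": range(6), "y": range(10, 16)})

first_three = df.iloc[:3]   # positions 0, 1, 2
every_other = df.iloc[::2]  # positions 0, 2, 4
past_end = df.iloc[4:100]   # truncated to positions 4 and 5

# A single position outside the frame is an error, unlike a slice.
try:
    df.iloc[100]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True

print(len(first_three))           # 3
print(every_other["x"].tolist())  # [0, 2, 4]
print(len(past_end))              # 2
print(out_of_bounds_raised)       # True
```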

4. Key Differences Between `loc` and `iloc`

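Label vs Integer Indexing: The most significant difference is that loc selects by row and column labels, while iloc selects by zero-based integer position. The two agree only when the index labels happen to match the positions, as with the default RangeIndex.
Inclusivity: loc includes the ending label in slices, while iloc excludes the ending integer position.
Usage of Boolean Arrays: loc accepts a boolean Series, array, or list as a mask. iloc accepts a boolean array or list but not a boolean Series, so a mask built from a column must be converted first (for example with .to_numpy()).
Multi-indexing: loc understands MultiIndex labels and can select by a tuple of level labels. iloc also works on a DataFrame with a MultiIndex, but only by position; it ignores the level labels.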
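The slicing and boolean-mask differences are easiest to see side by side. The sketch below uses a hypothetical DataFrame whose integer labels (10, 20, 30, 40) deliberately do not match the row positions (0 to 3), so it is clear which indexer is reading labels and which is reading positions.

```python
import pandas as pd

# Hypothetical frame: integer labels that differ from the row positions.
df = pd.DataFrame({"value": [5, 15, 25, 35]}, index=[10, 20, 30, 40])

# loc slices by label and includes the end label: rows 10, 20, and 30.
label_slice = df.loc[10:30]

# iloc slices by position and excludes the end: positions 0 and 1.
position_slice = df.iloc[0:2]

# A boolean Series works directly with loc.
mask = df["value"] > 20
by_mask = df.loc[mask]

# For iloc, convert the Series to a plain boolean array first.
by_array = df.iloc[mask.to_numpy()]

print(label_slice.index.tolist())     # [10, 20, 30]
print(position_slice.index.tolist())  # [10, 20]
print(by_mask.equals(by_array))       # True
```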

5. Examples Demonstrating `loc` and `iloc`

Example 1: Indexing and Selecting Data

Consider a DataFrame named sales_data, built without an explicit index so that its row labels are the default integers 0 through 4:

import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 200, 75, 300]
}

sales_data = pd.DataFrame(data)

Using loc:

# Select row by label (with the default index, label 1 is also the second row)
product_b = sales_data.loc[1]

# Select rows labelled 1 through 3 (end label included) and specific columns by label
subset = sales_data.loc[1:3, ['Product', 'Sales']]

Using iloc:

# Select row by integer position
product_b = sales_data.iloc[1]

# Select rows at positions 1 through 3 (end position 4 excluded) and columns by position
subset = sales_data.iloc[1:4, [0, 1]]
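With the default index, the loc and iloc calls in this example return the same rows even though the slice bounds differ (1:3 versus 1:4). The sketch below checks that, then re-indexes the data by product to show the two indexers diverging. The set_index step and the by_product name are additions for illustration and are not part of the original example.

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 200, 75, 300],
})

# Same rows: the loc slice includes label 3, the iloc slice excludes position 4.
loc_subset = sales_data.loc[1:3, ['Product', 'Sales']]
iloc_subset = sales_data.iloc[1:4, [0, 1]]
print(loc_subset.equals(iloc_subset))  # True

# Illustrative step: use the product names as row labels instead.
by_product = sales_data.set_index('Product')

# Now loc needs the label 'B', while iloc still uses position 1.
print(by_product.loc['B', 'Sales'])  # 150
print(by_product.iloc[1, 0])         # 150
```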

Example 2: Boolean Indexing and Slicing

Continuing with the same sales_data:

# Boolean indexing with loc
high_sales_products = sales_data.loc[sales_data['Sales'] > 150]

# Slicing with iloc
sliced_data = sales_data.iloc[1:4, 0:2]
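For reference, this self-contained sketch rebuilds sales_data and inspects what the two selections return. Note that boolean indexing keeps the original row labels (2 and 4 here) rather than renumbering the result.

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 200, 75, 300],
})

# Rows where Sales is strictly greater than 150 (so product B is excluded).
high_sales_products = sales_data.loc[sales_data['Sales'] > 150]
print(high_sales_products['Product'].tolist())  # ['C', 'E']
print(high_sales_products.index.tolist())       # [2, 4]

# Rows at positions 1-3 and columns at positions 0-1.
sliced_data = sales_data.iloc[1:4, 0:2]
print(sliced_data['Product'].tolist())          # ['B', 'C', 'D']
print(sliced_data.shape)                        # (3, 2)
```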

6. Conclusion

In this tutorial, we’ve covered the key differences between loc and iloc in Pandas and provided comprehensive examples of their usage. Understanding these two indexers is crucial for effective data manipulation and analysis in Pandas. loc is used for label-based indexing, while iloc is used for integer-based indexing. Both are powerful tools that offer various ways to select and manipulate data within Pandas Series and DataFrames.
