In data analysis and manipulation, the `pandas` library in Python is an indispensable tool. One of the most common operations you’ll perform is grouping data based on certain criteria and then counting the occurrences within those groups. The `groupby` and `count` functions in `pandas` enable you to achieve this efficiently. This tutorial will walk you through the process of using these functions step by step.

## Table of Contents

- Introduction to `groupby` and `count`
- Creating a Sample DataFrame
- Using the `groupby` Function
- Applying the `count` Function
- Handling Missing Data
- Advanced: Using `agg` with `groupby` for Multiple Aggregations
- Conclusion

## 1. Introduction to `groupby` and `count`

The `groupby` function in `pandas` groups the rows of a DataFrame by the values in one or more columns. This lets you run aggregate operations such as counting, summing, or averaging on each subset of the data. The `count` function, as the name suggests, counts the non-null values in each column of a DataFrame or, when applied after `groupby`, in each group.
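As a quick illustration of what "non-null" means here, the sketch below calls `count` on a small made-up frame without any grouping. The name `demo` and its values are assumptions chosen for illustration; they are not the sample data used in the rest of the tutorial.

```python
import pandas as pd

# Hypothetical three-row frame with one missing entry in 'Value'
demo = pd.DataFrame({
    'Category': ['A', 'B', 'A'],
    'Value': [10.0, float('nan'), 15.0],
})

# DataFrame.count() returns one number per column: how many entries are not null
non_null_counts = demo.count()
print(non_null_counts)
```

The 'Category' column reports 3 and the 'Value' column reports 2, because the NaN entry is skipped.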

## 2. Creating a Sample DataFrame

Before we begin, let’s create a sample DataFrame that we’ll use throughout this tutorial.

```
import pandas as pd

# Six rows: three in category 'A' and three in category 'B'
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 30, 25, 10]
}
df = pd.DataFrame(data)
```

## 3. Using the `groupby` Function

Let’s start by grouping our DataFrame based on the ‘Category’ column.

`grouped = df.groupby('Category')`

At this point, `grouped` is a `GroupBy` object that has grouped the data based on unique values in the ‘Category’ column. It doesn’t perform any calculations yet, just prepares the data for aggregation.
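Although no aggregation has run yet, the `GroupBy` object can already be inspected. The following is a minimal sketch that rebuilds the sample DataFrame so it runs on its own; the variable names `n_groups`, `group_rows`, and `group_a` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 30, 25, 10],
})
grouped = df.groupby('Category')

# Number of distinct groups found in 'Category'
n_groups = grouped.ngroups

# Mapping from each group key to the row labels that belong to it
group_rows = {key: list(idx) for key, idx in grouped.groups.items()}

# Rows of a single group, returned as an ordinary DataFrame
group_a = grouped.get_group('A')

print(n_groups)
print(group_rows)
print(group_a)
```

With the sample data there are two groups: rows 0, 2, and 4 fall under ‘A’, and rows 1, 3, and 5 fall under ‘B’.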

## 4. Applying the `count` Function

Now that we have our data grouped, we can apply the `count` function to get the count of occurrences in each group.

`count_per_category = grouped['Value'].count()`

In this example, we’re using the ‘Value’ column for counting. The result will be a new Series containing the count of non-null values in each group.
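To make the shape of that result concrete, here is a self-contained sketch using the sample data. Converting the Series to a flat DataFrame with `reset_index` is an optional extra step, and the column name `'count'` is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 30, 25, 10],
})

# Series indexed by 'Category'; each entry is the number of non-null 'Value' entries
count_per_category = df.groupby('Category')['Value'].count()
print(count_per_category)

# Optional: turn the index back into a regular column
count_df = count_per_category.reset_index(name='count')
print(count_df)
```

Both categories have three non-null values, so the Series holds 3 for ‘A’ and 3 for ‘B’.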

## 5. Handling Missing Data

If your DataFrame contains missing (NaN) values, the `count` function will exclude those values from the count. If you want every row in a group to be counted, including rows where the value is missing, use the `size` function instead.

`size_per_category = grouped['Value'].size()`

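In short, `count` reports the number of non-null values in each group, while `size` reports the number of rows in each group, NaN values included. On the sample DataFrame, which has no missing values, the two give the same result.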
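The sample DataFrame has no missing values, so the distinction only shows up once a NaN is present. The sketch below uses a modified copy of the data, called `df_missing` here as an illustrative assumption, in which one ‘B’ value has been replaced with NaN.

```python
import pandas as pd

# Same layout as the sample data, but the second row's value is missing
df_missing = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, float('nan'), 15, 30, 25, 10],
})
grouped_missing = df_missing.groupby('Category')

# count() skips the NaN, so group 'B' reports one fewer value
count_per_category = grouped_missing['Value'].count()

# size() counts rows, so the NaN row is still included
size_per_category = grouped_missing['Value'].size()

print(count_per_category)
print(size_per_category)
```

Here `count` gives 3 for ‘A’ and 2 for ‘B’, while `size` gives 3 for both.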

## 6. Advanced: Using `agg` with `groupby` for Multiple Aggregations

You can apply multiple aggregation functions using the `agg` function along with `groupby`. For example, to calculate both the count and sum for each group:

`result = grouped['Value'].agg(['count', 'sum'])`

In this case, the `result` DataFrame will have two columns: ‘count’ and ‘sum’, showing the count and sum of ‘Value’ in each group.
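The following sketch runs that aggregation on the sample data and also shows one alternative, named aggregation, which lets you pick the output column names yourself. The names `n_values` and `total` are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 30, 25, 10],
})

# A list of function names yields one output column per function
result = df.groupby('Category')['Value'].agg(['count', 'sum'])
print(result)

# Named aggregation: output_name=(input_column, function)
named = df.groupby('Category').agg(
    n_values=('Value', 'count'),
    total=('Value', 'sum'),
)
print(named)
```

For the sample data, ‘A’ has a count of 3 and a sum of 50, and ‘B’ has a count of 3 and a sum of 60. Both forms produce the same numbers, differing only in their column labels.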

## 7. Conclusion

In this tutorial, you’ve learned how to use the `groupby` and `count` functions in `pandas` to efficiently group and count data based on specific columns. This is a fundamental technique for analyzing and summarizing data in a DataFrame. Remember that the power of `groupby` doesn’t stop at counting; you can apply a wide range of aggregation functions to gain deeper insights into your data.