Pandas is a powerful data manipulation and analysis library in Python that provides various functions to handle and transform data. One such function is `cumsum()`, which is short for “cumulative sum”. The `cumsum()` function is used to compute the cumulative sum of elements in a pandas DataFrame or Series. In this tutorial, we will explore the `cumsum()` function in detail, including its syntax, parameters, and practical examples.

## Table of Contents

- Introduction to `cumsum()`
- Syntax of `cumsum()`
- Parameters of `cumsum()`
- Examples
  - Cumulative Sum of a Series
  - Cumulative Sum of a DataFrame Column
- Use Cases of `cumsum()`
- Conclusion

## 1. Introduction to `cumsum()`

The `cumsum()` function in pandas calculates the cumulative sum of elements along a specified axis in a DataFrame or Series. A cumulative sum is the running total of all elements from the beginning up to and including a given position, so the result has the same shape as the input. This function is useful in many data analysis scenarios, such as tracking running totals, identifying trends, and building empirical cumulative distribution functions.

## 2. Syntax of `cumsum()`

The basic syntax of the `cumsum()` function is as follows:

```
pandas.Series.cumsum(axis=None, skipna=True)
pandas.DataFrame.cumsum(axis=None, skipna=True)
```

- `axis`: Specifies the axis along which the cumulative sum is computed. The documented default is axis `0` (a value of `None` is treated as `0`), so the sum runs down the rows, separately for each column. The data is not flattened.
- `skipna`: A boolean value that determines whether to exclude NaN (Not-a-Number) values from the calculation. The default value is `True`.

## 3. Parameters of `cumsum()`

- `axis`: This parameter specifies the axis along which the cumulative sum is calculated. For a DataFrame, `0` (or `'index'`) accumulates down the rows, giving a running total within each column, and `1` (or `'columns'`) accumulates across the columns, giving a running total within each row. For a Series, the only valid axis is `0`, so the parameter can simply be omitted.
- `skipna`: This parameter controls how NaN values are handled. If set to `True`, NaN values are skipped: the result is NaN at those positions, but the running total continues with the next valid value. If set to `False`, the first NaN propagates, and every result from that position onward is NaN.
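The effect of `skipna` is easiest to see on a small Series with one missing value. The following is a minimal sketch; the data and the variable names `skipped` and `propagated` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the middle
s = pd.Series([1.0, 2.0, np.nan, 4.0])

# skipna=True (the default): the NaN position stays NaN,
# but the running total continues afterwards -> 1.0, 3.0, NaN, 7.0
skipped = s.cumsum()

# skipna=False: everything from the first NaN onward is NaN
# -> 1.0, 3.0, NaN, NaN
propagated = s.cumsum(skipna=False)

print(skipped.tolist())
print(propagated.tolist())
```

Note that with `skipna=True` the missing entry is not filled in; it is only excluded from the running total.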

## 4. Examples

Let’s dive into some practical examples to better understand how the `cumsum()` function works.

### Example 1: Cumulative Sum of a Series

Suppose we have a Series containing the daily sales data for a product:

```
import pandas as pd

# Daily sales for ten days
data = {'Day': range(1, 11),
        'Sales': [150, 200, 180, 220, 250, 170, 210, 190, 230, 200]}
df = pd.DataFrame(data)

# Select the 'Sales' column as a Series and compute its running total
sales_series = df['Sales']
cumulative_sales = sales_series.cumsum()
print(cumulative_sales)
```

Output:

```
0     150
1     350
2     530
3     750
4    1000
5    1170
6    1380
7    1570
8    1800
9    2000
Name: Sales, dtype: int64
```

In this example, the `cumsum()` function is applied to the ‘Sales’ Series. The resulting Series, `cumulative_sales`, contains the running total of sales up to and including each day.
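In practice the running total is often stored next to the original values. The sketch below reuses the sales data from this example; the column name `Cumulative_Sales` and the 1000-unit threshold are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": range(1, 11),
    "Sales": [150, 200, 180, 220, 250, 170, 210, 190, 230, 200],
})

# Store the running total in a new (hypothetically named) column
df["Cumulative_Sales"] = df["Sales"].cumsum()

# The last running total is the same as the overall sum
final_total = int(df["Cumulative_Sales"].iloc[-1])

# First day on which cumulative sales reached an illustrative target of 1000
reached = df.loc[df["Cumulative_Sales"] >= 1000, "Day"]
first_day_over_1000 = int(reached.iloc[0])

print(final_total, first_day_over_1000)
```

Because the running total never decreases when all values are non-negative, a simple comparison like this is enough to find when a target was first reached.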

### Example 2: Cumulative Sum of a DataFrame Column

Consider a DataFrame representing the scores of students in different subjects:

```
import pandas as pd

data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Math': [85, 70, 90, 78, 92],
        'Science': [70, 82, 88, 95, 68],
        'History': [60, 75, 80, 88, 72]}
df_scores = pd.DataFrame(data)

# Use the student names as the index so only numeric columns are summed
df_scores.set_index('Student', inplace=True)

# axis=0: running total down the rows, one per subject column
cumulative_scores = df_scores.cumsum(axis=0)
print(cumulative_scores)
```

Output:

```
         Math  Science  History
Student
Alice      85       70       60
Bob       155      152      135
Charlie   245      240      215
David     323      335      303
Emily     415      403      375
```

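In this example, the `cumsum()` function is used on the DataFrame `df_scores` with `axis=0` to calculate a running total down the rows of each subject column. Each row holds the sum of that subject’s scores for all students up to and including that one, so the last row (Emily) contains the column totals.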
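For comparison, `axis=1` accumulates across the columns of each row instead. The following is a minimal sketch using the first three students from the data above; the variable name `row_wise` is made up for illustration.

```python
import pandas as pd

df_scores = pd.DataFrame(
    {"Math": [85, 70, 90], "Science": [70, 82, 88], "History": [60, 75, 80]},
    index=pd.Index(["Alice", "Bob", "Charlie"], name="Student"),
)

# axis=1: running total from left to right within each row
row_wise = df_scores.cumsum(axis=1)
print(row_wise)

# The last column now holds each student's total across all subjects
totals_match = bool((row_wise["History"] == df_scores.sum(axis=1)).all())
```

With `axis=1` the result depends on column order, so it is mainly meaningful when the columns form a natural sequence.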

## 5. Use Cases of `cumsum()`

The `cumsum()` function has various practical use cases in data analysis and manipulation:

- **Running Totals**: It’s often used to calculate running totals, which are useful for monitoring trends or tracking progress over time.
- **Financial Analysis**: Cumulative sums are useful in financial analysis for calculating accumulated gains or losses.
- **Time Series Analysis**: When working with time series data, cumulative sums can help identify trends and patterns over time.
- **Probability and Statistics**: In statistics, cumulative sums can be used to generate cumulative distribution functions (CDFs) and cumulative probability distributions.
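Two of these use cases can be sketched briefly: a per-account running balance and an empirical CDF. The data, the column names `Account`, `Amount`, and `Balance`, and the variable names are all hypothetical.

```python
import pandas as pd

# Financial running total: balance per account from signed transactions
tx = pd.DataFrame({
    "Account": ["A", "B", "A", "B", "A"],
    "Amount": [100, 200, -30, -80, 50],
})
# groupby(...).cumsum() restarts the running total for each account
tx["Balance"] = tx.groupby("Account")["Amount"].cumsum()
print(tx)

# Empirical CDF: cumulative sum of relative frequencies, ordered by value
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
freq = values.value_counts(normalize=True).sort_index()
ecdf = freq.cumsum()
print(ecdf)
```

The grouped version keeps the original row order, so each row shows the balance of its own account after that transaction. In the CDF sketch, the final cumulative value is 1 (up to floating-point rounding) because the relative frequencies sum to 1.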

## 6. Conclusion

The `cumsum()` function in pandas is a valuable tool for calculating the cumulative sum of elements in Series and DataFrames. It helps in various data analysis scenarios, including running totals, trend identification, and statistical calculations. By understanding the syntax, parameters, and examples provided in this tutorial, you are now equipped to use the `cumsum()` function effectively in your own data analysis projects. Whether you’re working with financial data, time series data, or any other dataset, the `cumsum()` function can provide valuable insights into the cumulative progression of values.