Pandas is a powerful data manipulation and analysis library for Python that provides various functions to handle and transform data. One such function is cumsum(), which is short for “cumulative sum”. The cumsum() function computes the cumulative sum of the elements in a pandas DataFrame or Series. In this tutorial, we will explore the cumsum() function in detail, including its syntax, parameters, and practical examples.
Table of Contents
- Introduction to cumsum()
- Syntax of cumsum()
- Parameters of cumsum()
- Examples
  - Cumulative Sum of a Series
  - Cumulative Sum of a DataFrame Column
- Use Cases of cumsum()
- Conclusion
1. Introduction to cumsum()
The cumsum() function in pandas calculates the cumulative sum of elements along a specified axis in a DataFrame or Series. A cumulative sum is the sum of all elements from the beginning up to and including a given position. This function is particularly useful in data analysis scenarios such as tracking running totals, identifying trends, and generating cumulative distribution functions.
2. Syntax of cumsum()
The basic syntax of the cumsum() function is as follows:
pandas.Series.cumsum(axis=None, skipna=True)
pandas.DataFrame.cumsum(axis=None, skipna=True)
- axis: Specifies the axis along which the cumulative sum is computed. The default value is None, which pandas treats the same as 0 (the index): values are accumulated down the rows, one column at a time. Unlike numpy.cumsum(), pandas does not flatten the data.
- skipna: A boolean value that determines whether to exclude NaN (Not-a-Number) values from the calculation. The default value is True.
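To make the effect of axis concrete, here is a minimal sketch on a small DataFrame. The column names and values are made up for illustration. As far as I know, leaving axis at its default gives the same result as axis=0 in current pandas releases, which the last line checks.

```python
import pandas as pd

# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: accumulate down the rows, separately for each column.
down_rows = df.cumsum(axis=0)
# a -> 1, 3, 6    b -> 10, 30, 60

# axis=1: accumulate across the columns, separately for each row.
across_cols = df.cumsum(axis=1)
# a -> 1, 2, 3    b -> 11, 22, 33

# Calling cumsum() with no arguments behaves like axis=0.
same_as_default = df.cumsum().equals(down_rows)

print(down_rows)
print(across_cols)
print(same_as_default)
```

The result always has the same shape and labels as the input; only the values change.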
3. Parameters of cumsum()
- axis: This parameter specifies the axis along which the cumulative sum is calculated. For a DataFrame, 0 (or 'index') accumulates down the rows, producing a running total within each column, and 1 (or 'columns') accumulates across the columns, producing a running total within each row. A Series has only one axis, so the parameter can be omitted; axis=0 is accepted, but axis=1 raises an error.
- skipna: This parameter controls how NaN values are handled. If set to True, a NaN is skipped: the result is still NaN at that position, but the running total carries on from the last valid value. If set to False, the first NaN propagates, so that entry and every later entry in the result are NaN.
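The difference between the two skipna settings is easiest to see on a Series that contains a missing value. The following is a small sketch with made-up numbers, not data from the examples in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value in the second position.
s = pd.Series([1.0, np.nan, 2.0, 3.0])

# Default (skipna=True): the NaN stays NaN in the output,
# but the running total continues from the last valid value.
skipped = s.cumsum()
# values: 1.0, NaN, 3.0, 6.0

# skipna=False: once a NaN is reached, every later entry is NaN as well.
propagated = s.cumsum(skipna=False)
# values: 1.0, NaN, NaN, NaN

print(skipped)
print(propagated)
```

If missing values should count as zero instead, one option is to call s.fillna(0) before cumsum(), which leaves no NaN in the result.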
4. Examples
Let’s dive into some practical examples to better understand how the cumsum() function works.
Example 1: Cumulative Sum of a Series
Suppose we have the daily sales data for a product, stored in the ‘Sales’ column of a DataFrame, which we extract as a Series:
import pandas as pd

# Daily sales figures for ten days
data = {'Day': range(1, 11),
        'Sales': [150, 200, 180, 220, 250, 170, 210, 190, 230, 200]}
df = pd.DataFrame(data)

# Select the 'Sales' column as a Series and compute its running total
sales_series = df['Sales']
cumulative_sales = sales_series.cumsum()
print(cumulative_sales)
Output:
0     150
1     350
2     530
3     750
4    1000
5    1170
6    1380
7    1570
8    1800
9    2000
Name: Sales, dtype: int64
In this example, the cumsum() function is applied to the ‘Sales’ Series. The resulting Series, cumulative_sales, contains the running total of sales up to and including each day.
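A common follow-up is to store the running total next to the original figures and query it. The sketch below rebuilds the same sales data; the column name CumulativeSales and the 1000-unit threshold are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": range(1, 11),
    "Sales": [150, 200, 180, 220, 250, 170, 210, 190, 230, 200],
})

# Store the running total as a new column (hypothetical column name).
df["CumulativeSales"] = df["Sales"].cumsum()

# The last running total is the overall sum of the column.
total_matches = df["CumulativeSales"].iloc[-1] == df["Sales"].sum()

# First day on which cumulative sales reach 1000 units (illustrative threshold).
first_day_1000 = df.loc[df["CumulativeSales"] >= 1000, "Day"].iloc[0]

print(df)
print(total_matches)   # True
print(first_day_1000)  # 5
```

Because the sales values are all non-negative, the running total never decreases, so the first row that meets the threshold is the day the target was reached.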
Example 2: Cumulative Sum of a DataFrame Column
Consider a DataFrame representing the scores of students in different subjects:
import pandas as pd

data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Math': [85, 70, 90, 78, 92],
        'Science': [70, 82, 88, 95, 68],
        'History': [60, 75, 80, 88, 72]}
df_scores = pd.DataFrame(data)
df_scores.set_index('Student', inplace=True)

# axis=0 accumulates down the rows, separately for each subject column
cumulative_scores = df_scores.cumsum(axis=0)
print(cumulative_scores)
Output:
         Math  Science  History
Student
Alice      85       70       60
Bob       155      152      135
Charlie   245      240      215
David     323      335      303
Emily     415      403      375
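In this example, the cumsum() function is used on the DataFrame df_scores with axis=0 to calculate a running total down each subject column. Each row holds the sum of that subject’s scores for all students up to and including that row; for instance, Bob’s Math entry of 155 is Alice’s 85 plus Bob’s 70.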
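The same data can also be accumulated for a single column, or across the subjects of each student. The sketch below rebuilds the scores table so that it runs on its own; the variable names math_running and per_student are hypothetical.

```python
import pandas as pd

df_scores = pd.DataFrame(
    {
        "Math": [85, 70, 90, 78, 92],
        "Science": [70, 82, 88, 95, 68],
        "History": [60, 75, 80, 88, 72],
    },
    index=pd.Index(["Alice", "Bob", "Charlie", "David", "Emily"], name="Student"),
)

# Running total of one column only: select it as a Series first.
math_running = df_scores["Math"].cumsum()
# values: 85, 155, 245, 323, 415

# axis=1 accumulates across the subjects of each student, left to right,
# so the last column ends up holding each student's total score.
per_student = df_scores.cumsum(axis=1)
# Alice: 85, 155, 215

print(math_running)
print(per_student)
```

With axis=1 the result depends on column order, so it is worth arranging the columns deliberately before accumulating across them.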
5. Use Cases of cumsum()
The cumsum() function has various practical use cases in data analysis and manipulation:
- Running Totals: It’s often used to calculate running totals, which are useful for monitoring trends or tracking progress over time.
- Financial Analysis: Cumulative sums are useful in financial analysis for calculating accumulated gains or losses.
- Time Series Analysis: When working with time series data, cumulative sums can help identify trends and patterns over time.
- Probability and Statistics: In statistics, cumulative sums can be used to generate cumulative distribution functions (CDFs) and cumulative probability distributions.
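Two of the use cases above can be sketched in a few lines: accumulated gains and losses, and an empirical cumulative distribution built from value counts. All numbers here are invented for illustration, and the approach shown is one simple way to do it, not the only one.

```python
import pandas as pd

# Financial analysis: hypothetical daily profit and loss figures.
daily_pnl = pd.Series([120, -50, 30, -80, 200], name="pnl")
running_pnl = daily_pnl.cumsum()
# values: 120, 70, 100, 20, 220

# Statistics: empirical CDF of a small hypothetical sample.
observations = pd.Series([2, 3, 3, 5, 2, 3, 5, 5, 5, 2])
counts = observations.value_counts().sort_index()  # how often each value occurs
ecdf = counts.cumsum() / counts.sum()               # share of observations <= each value

print(running_pnl)
print(ecdf)
```

Sorting the counts by index before calling cumsum() matters here: value_counts() orders by frequency by default, and accumulating in that order would not give a cumulative distribution.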
6. Conclusion
The cumsum() function in pandas is a valuable tool for calculating the cumulative sum of elements in Series and DataFrames. It helps in various data analysis scenarios, including running totals, trend identification, and statistical calculations. By understanding the syntax, parameters, and examples provided in this tutorial, you are now equipped to use the cumsum() function effectively in your own data analysis projects. Whether you’re working with financial data, time series data, or any other dataset, the cumsum() function can provide valuable insights into the cumulative progression of values.