Pandas cummax() Function Tutorial (With Examples)

Introduction to cummax()

The pandas library offers a wide range of functions for working with structured data. One of these is cummax(), short for cumulative maximum. It computes the cumulative maximum of elements along a specified axis of a pandas DataFrame or Series: each output value is the largest value seen so far, up to and including that position. This is useful when analyzing time series data, financial data, or any situation where you want to track a running maximum over time.

cummax() is a method of both DataFrame and Series rather than a top-level pandas function. Its basic syntax is as follows:

DataFrame.cummax(axis=None, skipna=True, *args, **kwargs)

Here’s what each parameter means:

axis: This parameter specifies the axis along which the cumulative maximum is computed. Use 0 or ‘index’ to compute it down the rows, which gives one running maximum per column; use 1 or ‘columns’ to compute it across the columns, which gives one running maximum per row. If not specified, the default value is 0.
skipna: This parameter determines whether to exclude NaN (Not a Number) values when computing the cumulative maximum. If set to True, a NaN stays NaN at its own position in the output but does not affect the running maximum of later values; if set to False, every result from the first NaN onward is NaN. The default value is True.
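The effect of the two parameters described above can be seen in a small sketch. The sample values and the variable names (s, frame, skip_true, and so on) are made up for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the parameters.
s = pd.Series([2.0, np.nan, 5.0, 1.0])

# skipna=True (default): the NaN stays NaN in the output, but the
# running maximum of the later values is unaffected.
skip_true = s.cummax()               # 2.0, NaN, 5.0, 5.0

# skipna=False: once a NaN is encountered, every later result is NaN.
skip_false = s.cummax(skipna=False)  # 2.0, NaN, NaN, NaN

frame = pd.DataFrame({"a": [1, 4, 2], "b": [3, 2, 5]})

# axis=0 (default): running maximum down each column.
down_rows = frame.cummax(axis=0)     # a: 1, 4, 4   b: 3, 3, 5

# axis=1: running maximum from left to right within each row.
across_cols = frame.cummax(axis=1)   # a: 1, 4, 2   b: 3, 4, 5
```

With axis=1 the first column is always unchanged, because there is nothing to its left to compare against.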

Now, let’s dive into some examples to understand how the cummax() function works in practice.

Example 1: Cumulative Maximum of a Series

Suppose we have the following sales data for a product over a period of time:

import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
        'Sales': [100, 150, 120, 200, 180]}

df = pd.DataFrame(data)

We have a DataFrame df with two columns: ‘Date’ and ‘Sales’. To calculate the cumulative maximum of the ‘Sales’ column, we can use the cummax() function:

df['Cumulative_Max_Sales'] = df['Sales'].cummax()
print(df)

The output will be:

         Date  Sales  Cumulative_Max_Sales
0  2023-01-01    100                  100
1  2023-01-02    150                  150
2  2023-01-03    120                  150
3  2023-01-04    200                  200
4  2023-01-05    180                  200

In this example, the ‘Cumulative_Max_Sales’ column shows the running maximum of the ‘Sales’ column as we move down the DataFrame.
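One thing the running maximum makes easy is spotting the rows where a new high is reached. The following is a minimal sketch using the same sales figures; the name is_new_high is an illustrative choice, and comparing with equality is just one possible way to define a "new high":

```python
import pandas as pd

# Same sales figures as in the example above.
sales = pd.Series([100, 150, 120, 200, 180], name="Sales")
running_max = sales.cummax()

# A row matches the running maximum exactly when no earlier value was larger.
# Note: a value that merely ties an earlier peak is flagged as well.
is_new_high = sales == running_max

print(is_new_high.tolist())  # [True, True, False, True, False]
```

If ties should not count, one option is to compare each value against the running maximum shifted down by one row instead.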

Example 2: Cumulative Maximum of Multiple Columns

The cummax() function can also be applied to multiple columns simultaneously. Let’s consider a scenario where we have data on the stock prices of two companies, ‘Company A’ and ‘Company B’, over a period of time:

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
        'Company_A_Price': [50, 55, 53, 60, 58],
        'Company_B_Price': [120, 125, 123, 130, 128]}

df = pd.DataFrame(data)

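To calculate the cumulative maximum of both stock prices, we can select the two price columns with a list of column names and call cummax() on the resulting DataFrame. Note that assigning the result back to the same columns overwrites the original prices: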

price_columns = ['Company_A_Price', 'Company_B_Price']
df[price_columns] = df[price_columns].cummax()
print(df)

The output will be:

         Date  Company_A_Price  Company_B_Price
0  2023-01-01               50              120
1  2023-01-02               55              125
2  2023-01-03               55              125
3  2023-01-04               60              130
4  2023-01-05               60              130

In this example, both ‘Company_A_Price’ and ‘Company_B_Price’ columns have been updated with their respective cumulative maximum values.
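If the original prices need to be kept alongside the running maximum, the result can be stored in new columns instead. This is a sketch of one way to do that; the "_CumMax" suffix and the variable names running and result are illustrative choices, not a pandas convention:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"],
    "Company_A_Price": [50, 55, 53, 60, 58],
    "Company_B_Price": [120, 125, 123, 130, 128],
})
price_columns = ["Company_A_Price", "Company_B_Price"]

# Compute the running maximum of the selected columns and rename the
# resulting columns so they do not collide with the originals.
running = df[price_columns].cummax().add_suffix("_CumMax")

# join() aligns on the index, so each row keeps its original prices
# next to the corresponding running maximum.
result = df.join(running)
print(result)
```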

Practical Use Cases

The cummax() function is particularly useful in various data analysis scenarios:

Financial Analysis: When analyzing financial data, such as stock prices, you might want to track the highest price reached by a stock over time. The cummax() function gives you, for each trading day, the highest price reached up to and including that day.
Time Series Analysis: In time series data, you might want to understand trends and patterns in your data. The cumulative maximum provides insights into the historical high points, which can be crucial for making informed decisions.
Resource Management: In scenarios where resources are limited, the cummax() function can track the highest resource utilization observed so far, which shows how close usage has come to capacity.
Quality Control: For quality control in manufacturing, you can use the cumulative maximum to track the highest measurements recorded, helping you identify deviations from the norm.
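As an illustration of the financial use case above, the running maximum can be used to measure how far each price sits below the highest price seen so far, a quantity often called a drawdown. This is a minimal sketch that reuses the Company A prices from Example 2; the names running_peak and drawdown are illustrative:

```python
import pandas as pd

# Company A prices from Example 2.
prices = pd.Series([50, 55, 53, 60, 58], name="Company_A_Price")

# Highest price reached up to and including each row.
running_peak = prices.cummax()

# Fractional distance below the running peak: 0 at a new high, negative otherwise.
drawdown = prices / running_peak - 1

# Index label of the row with the largest drop from the running peak.
worst_row = drawdown.idxmin()
print(drawdown.round(4).tolist(), worst_row)
```

The same pattern applies to the other scenarios: dividing or subtracting by the running maximum shows how the current value compares with the highest value recorded so far.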

Conclusion

The cummax() function in pandas computes the cumulative maximum of elements in a DataFrame or Series. It is particularly valuable when working with time-based or sequential data, as it allows you to track the running maximum over time. This tutorial covered the cummax() function, its parameters, and practical use cases through two examples. By applying this function to your own data analysis projects, you can track peaks and extreme values in your data.

pandas also offers related cumulative functions, such as cummin(), cumsum(), and cumprod(), that follow the same pattern as cummax(). Exploring them will extend your data analysis toolkit.
