The glob module in Python is a standard-library tool for retrieving lists of file and directory paths that match a shell-style pattern. It simplifies the process of finding files and directories that match certain criteria within a directory structure. Whether you’re working on a data processing project, managing files, or automating tasks, the glob module can save you significant time and effort. In this tutorial, we’ll dive deep into the glob module and explore its features with examples.
Table of Contents
- Introduction to the glob Module
- Basic Usage
- Wildcard Patterns
- Recursive Pattern Matching
- Sorting Results
- Handling Non-Matching Patterns
- Real-world Examples
7.1. Example 1: Processing CSV Files
7.2. Example 2: Organizing Image Files
- Conclusion
1. Introduction to the glob Module
The glob module provides a function called glob() that searches for files and directories whose paths match a wildcard pattern. It’s part of the Python standard library, so you don’t need to install any additional packages to use it.
2. Basic Usage
Let’s start with a simple example of how to use the glob module. Suppose you have a directory named “files” containing several text files, and you want to retrieve a list of all the text files in that directory. Here’s how you can do it:
import glob
file_list = glob.glob("files/*.txt")
print("List of text files:", file_list)
In this example, glob.glob() is called with a single argument: a string containing the pattern you want to match. The "files/*.txt" pattern matches every path in the “files” directory whose name ends in “.txt”. A relative pattern like this one is resolved against the current working directory. The returned file_list contains the matching paths, each including the “files” directory prefix.
3. Wildcard Patterns
The power of the glob module lies in its support for wildcard characters that allow you to match multiple filenames based on a pattern. Here are some commonly used wildcard characters:
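*: Matches any sequence of characters (including none).
?: Matches any single character.
[]: Matches any single character listed within the brackets, such as [abc] or the range [0-9].
[!]: Matches any single character not listed within the brackets, such as [!0-9]. The ! only has this meaning as the first character inside the brackets.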
Let’s explore these wildcard patterns with examples:
Example 1: Match Files with Different Extensions
Suppose you have a directory with various types of data files such as “.csv”, “.json”, and “.xml”. You want to retrieve a list of all these files. Here’s how you can use the * wildcard to match files with different extensions:
data_files = glob.glob("data/*.*")
print("List of data files:", data_files)
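In this example, "data/*.*" matches every entry in the “data” directory whose name contains a dot, whatever the extension. Names that begin with a dot, such as “.gitignore”, are skipped, because glob wildcards do not match a leading dot.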
Example 2: Matching Specific Characters
Suppose you have a directory with log files, and each log file is labeled with a date in the format “log_YYYYMMDD.txt”. You want to retrieve a list of log files from a specific month, let’s say July 2021. You can spell out the year and month in the pattern and let the * wildcard stand in for the day:
july_logs = glob.glob("logs/log_202107*.txt")
print("List of July log files:", july_logs)
In this example, "logs/log_202107*.txt" matches all log files from July 2021.
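The * wildcard is permissive: it would also match a name such as “log_202107_summary.txt”. The sketch below shows how the ? and [] wildcards can tighten the match. The file names are made up for illustration and are created in a temporary directory so the snippet can run anywhere.

```python
import glob
import os
import tempfile

# Hypothetical sample names, created in a throwaway directory for the demo.
names = [
    "log_20210701.txt",
    "log_20210715.txt",
    "log_20210731.txt",
    "log_202107_summary.txt",
    "log_20210801.txt",
]

with tempfile.TemporaryDirectory() as logs:
    for name in names:
        open(os.path.join(logs, name), "w").close()

    def match(pattern):
        """Return the sorted base names in logs that match pattern."""
        # glob.escape() protects any special characters in the directory path.
        paths = glob.glob(os.path.join(glob.escape(logs), pattern))
        return sorted(os.path.basename(p) for p in paths)

    # * accepts anything after the prefix, including the summary file.
    star = match("log_202107*.txt")
    # Two character ranges require exactly two digits for the day.
    strict = match("log_202107[0-3][0-9].txt")
    # ? stands for exactly one character: days 01 through 09.
    first_nine = match("log_2021070?.txt")
    # [!7] excludes July among the single-digit months of 2021.
    not_july = match("log_20210[!7]*.txt")

print(star)
print(strict)
print(first_nine)
print(not_july)
```

The character-range pattern is usually the safer choice when a directory may contain files that share the same prefix but follow a different naming scheme.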
4. Recursive Pattern Matching
The glob module also supports recursive pattern matching, which lets you search subdirectories as well. To enable it, use the "**" pattern and pass recursive=True to glob.glob().
Example 1: Recursive Search for Specific Files
Suppose you have a directory structure with multiple levels of subdirectories, and you want to find all files named “report.txt” regardless of their location within the directory structure. Here’s how you can achieve this using the "**" pattern:
all_reports = glob.glob("**/report.txt", recursive=True)
print("List of all report files:", all_reports)
In this example, the "**/report.txt" pattern matches “report.txt” files in the current directory and in all of its subdirectories, at any depth.
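To see what the recursive flag changes, the following sketch builds a small, made-up directory tree in a temporary folder and runs the same pattern with and without recursive=True. The folder names are assumptions chosen for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one report at the top level, two in nested folders.
    for folder in ["", "2021", os.path.join("2021", "q3")]:
        os.makedirs(os.path.join(root, folder), exist_ok=True)
        open(os.path.join(root, folder, "report.txt"), "w").close()

    pattern = os.path.join(glob.escape(root), "**", "report.txt")

    # With recursive=True, ** spans zero or more directory levels.
    recursive_hits = glob.glob(pattern, recursive=True)
    # Without it, ** behaves like a single *, so only one level is searched.
    plain_hits = glob.glob(pattern)

    found = sorted(os.path.relpath(p, root) for p in recursive_hits)
    found_plain = sorted(os.path.relpath(p, root) for p in plain_hits)

print(found)
print(found_plain)
```

Forgetting recursive=True does not raise an error; the search is simply shallower than intended, which makes the omission easy to miss. On very large trees a recursive search can also take a while, since every subdirectory is visited.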
5. Sorting Results
The glob.glob() function returns paths in whatever order the operating system’s file system API lists them, which is arbitrary and can differ between platforms. If you need a predictable order, you can use the built-in sorted() function.
Example: Sorting File Paths
Suppose you have a directory with image files, and you want to retrieve a sorted list of these image files. Here’s how you can do it:
import glob
image_files = glob.glob("images/*.jpg")
sorted_images = sorted(image_files)
print("Sorted image files:", sorted_images)
In this example, the sorted() function returns a new list with the image paths in lexicographic order. The comparison is case-sensitive, so uppercase letters sort before lowercase ones.
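Plain string sorting places “img10.jpg” before “img2.jpg”, which is rarely what you want for numbered files. One possible fix is to pass a key function to sorted(). The natural_key helper below is a hypothetical example written for this tutorial, not part of the glob module, and the file names are invented.

```python
import glob
import os
import re
import tempfile

def natural_key(path):
    """Split the file name into text and integer chunks so 2 sorts before 10."""
    parts = re.split(r"(\d+)", os.path.basename(path).lower())
    return [int(part) if part.isdigit() else part for part in parts]

with tempfile.TemporaryDirectory() as images:
    # Hypothetical sample files for the demo.
    for name in ["img10.jpg", "img2.jpg", "Img1.jpg"]:
        open(os.path.join(images, name), "w").close()

    paths = glob.glob(os.path.join(glob.escape(images), "*.jpg"))

    # Default ordering compares the strings character by character.
    default_order = [os.path.basename(p) for p in sorted(paths)]
    # The key function ignores case and compares digit runs as numbers.
    natural_order = [os.path.basename(p) for p in sorted(paths, key=natural_key)]

print(default_order)
print(natural_order)
```

The same key argument works for other orderings, for example key=os.path.getmtime to sort by modification time.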
6. Handling Non-Matching Patterns
When using the glob module, it’s important to consider scenarios where the specified pattern doesn’t match any files or directories. In that case the glob.glob() function does not raise an error; it returns an empty list.
Example: Handling Non-Matching Patterns
Suppose you have a directory with potential backup files in the format “backup_*.zip”, and you want to retrieve a list of these backup files. If no backup files exist, you want to display a message indicating that no backups were found. Here’s how you can handle this scenario:
backup_files = glob.glob("backups/backup_*.zip")
if not backup_files:
    print("No backup files found.")
else:
    print("List of backup files:", backup_files)
In this example, the if not backup_files: condition checks whether the backup_files list is empty and, if it is, displays the appropriate message.
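When an empty result should stop the program rather than print a message, one option is to wrap the call in a small helper that raises an exception. The require_matches function below is a hypothetical helper written for illustration; the glob module itself offers nothing with that name.

```python
import glob
import os
import tempfile

def require_matches(pattern):
    """Return the sorted matches for pattern, or raise if there are none.

    Hypothetical helper for illustration; it is not part of the glob module.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No paths match pattern: {pattern!r}")
    return matches

with tempfile.TemporaryDirectory() as backups:
    pattern = os.path.join(glob.escape(backups), "backup_*.zip")

    # The directory is empty, so the helper raises.
    try:
        require_matches(pattern)
        error_message = None
    except FileNotFoundError as exc:
        error_message = str(exc)

    # After a (made-up) backup file appears, the same call succeeds.
    open(os.path.join(backups, "backup_2021.zip"), "w").close()
    found = [os.path.basename(p) for p in require_matches(pattern)]

print(error_message)
print(found)
```

Raising early tends to produce clearer failures than letting an empty list flow into later processing steps.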
7. Real-world Examples
7.1. Example 1: Processing CSV Files
Suppose you have a directory containing multiple CSV files, and you want to process each file one by one. Here’s how you can use the glob module to achieve this:
import glob
import pandas as pd
csv_files = glob.glob("data/*.csv")
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    # Process the DataFrame
    # ...
In this example, the glob.glob() function is used to retrieve a list of CSV files in the “data” directory. The loop iterates over each file, reads it using the Pandas library, and processes the resulting DataFrame.
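If Pandas is not installed, the same loop works with the standard library’s csv module. The sketch below counts the data rows in each file. The sample files and their columns are invented and written to a temporary directory so the snippet can run on its own.

```python
import csv
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data:
    # Hypothetical sample files so the sketch runs anywhere.
    samples = {
        "a.csv": [["id", "value"], ["1", "10"], ["2", "20"]],
        "b.csv": [["id", "value"], ["3", "30"]],
    }
    for name, rows in samples.items():
        with open(os.path.join(data, name), "w", newline="") as handle:
            csv.writer(handle).writerows(rows)

    row_counts = {}
    pattern = os.path.join(glob.escape(data), "*.csv")
    # sorted() makes the processing order reproducible across platforms.
    for csv_file in sorted(glob.glob(pattern)):
        with open(csv_file, newline="") as handle:
            # DictReader treats the first row as the header.
            reader = csv.DictReader(handle)
            row_counts[os.path.basename(csv_file)] = sum(1 for _ in reader)

print(row_counts)
```

Sorting the matches before looping is optional, but it keeps logs and results consistent from one run to the next.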
7.2. Example 2: Organizing Image Files
Suppose you have a directory containing a mix of image files with different extensions, and you want to organize them into separate subdirectories based on their extensions. Here’s how you can use the glob module to achieve this:
import glob
import os
import shutil
image_files = glob.glob("images/*.*")
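for image_file in image_files:
    # Split off the extension and normalize its case (".JPG" becomes ".jpg")
    _, ext = os.path.splitext(image_file)
    ext = ext.lower()
    if ext in ['.jpg', '.png', '.gif']:
        # Create the target folder (for example "images/jpg") if it is missing
        os.makedirs(f"images/{ext[1:]}", exist_ok=True)
        shutil.move(image_file, f"images/{ext[1:]}/{os.path.basename(image_file)}")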
In this example, the glob.glob() function retrieves every entry in the “images” directory whose name contains a dot. The loop iterates over each entry, extracts its extension using os.path.splitext(), and, if the extension is one of the listed image types, moves the file to a subdirectory named after that extension using shutil.move().
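One edge case in the approach above is that "*.*" also matches directories whose names contain a dot. The following sketch wraps the same logic in a function and adds an os.path.isfile() check. The organize_images name, the extension set, and the sample files are assumptions made for this illustration.

```python
import glob
import os
import shutil
import tempfile

# Assumed set of extensions, mirroring the list used in the example above.
IMAGE_EXTENSIONS = {".jpg", ".png", ".gif"}

def organize_images(root):
    """Move images in root into per-extension subfolders; return the new paths."""
    moved = []
    for path in glob.glob(os.path.join(glob.escape(root), "*.*")):
        ext = os.path.splitext(path)[1].lower()
        # Skip directories and anything that is not a known image type.
        if not os.path.isfile(path) or ext not in IMAGE_EXTENSIONS:
            continue
        target_dir = os.path.join(root, ext[1:])
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, os.path.basename(path))
        moved.append(shutil.move(path, target))
    return sorted(moved)

with tempfile.TemporaryDirectory() as images:
    # Hypothetical contents: two images, one text file, one misleading folder.
    for name in ["cat.JPG", "dog.png", "notes.txt"]:
        open(os.path.join(images, name), "w").close()
    os.makedirs(os.path.join(images, "old.gif"))

    result = [os.path.relpath(p, images) for p in organize_images(images)]
    leftovers = sorted(os.listdir(images))

print(result)
print(leftovers)
```

Passing the directory as a parameter also makes the function easy to try on a scratch folder before pointing it at real files.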
8. Conclusion
The glob module in Python provides a flexible and efficient way to search for files and directories based on wildcard patterns. With its support for various wildcard characters and recursive pattern matching, it is useful for tasks ranging from basic file handling to complex data processing. By mastering the concepts and examples covered in this tutorial, you’ll be well-equipped to use the glob module in your Python projects and streamline your file manipulation tasks.