Python Check if Files Exist – os.path, Pathlib, try/except
A simple way of checking if a file exists is by using the os.path.exists() function from the os library. The function takes a path, such as example_file.txt, and returns a boolean.
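Here is a minimal sketch of the call. It assumes example_file.txt sits in the current working directory:

```python
import os.path

# os.path.exists() returns True if the path exists, False otherwise
file_exists = os.path.exists("example_file.txt")
print(file_exists)
```

Note that exists() also returns True for directories. If you specifically need a regular file, os.path.isfile() is the stricter check.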
In this case, the file exists, so the exists() function has returned True. If the file didn't exist, the function would return False. Today we'll look at some of the reasons you may want to check if a file exists.
We'll also look at a few different methods for checking whether files exist and some practical examples of where those methods would be beneficial.
Why Check if Files Exist
Many advanced Python programs rely on files for some functionality; this may include using log files for recording significant events, using a file containing program settings, or even using image files for an interface.
Looking more specifically at data science, you could be looking for a file with import data, a pre-trained machine learning model, or a backup file for recording results before export.
The presence of a file could signal what phase the program is in and influence what action your script needs to take next. Another critical factor to consider is that attempting to open a file that doesn't exist for reading raises a FileNotFoundError, which crashes your program if nothing handles it. As a result, you want checks in place that catch these problems before they happen.
Option 1: os
Example 1
This section will explore the use of the os library for checking if files exist. os is a good place to start for people who are new to Python and object-oriented programming. For the first example, let's build on the quick example shown in the introduction and discuss the exists() function in more detail. Let's say we've built a simple GUI for an application.
As part of the program, we could keep a file called log.txt, which tracks everything we do in the interface. The first time we run the program, the log file might not exist yet, so we'll need to check if it exists before performing any operations on the file. We can do this with the exists() function.
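One way to write this check is shown below. The log messages ("Log created", "Button clicked") are made-up placeholders for whatever the GUI would record:

```python
import os.path

LOG_PATH = "log.txt"

# On the first run no log exists yet, so create a fresh one
if not os.path.exists(LOG_PATH):
    with open(LOG_PATH, "w") as log:
        log.write("Log created\n")

# From here on, log operations can assume the file is present
# (placeholder event standing in for a real GUI action)
with open(LOG_PATH, "a") as log:
    log.write("Button clicked\n")
```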
If this is the first time running the script, then no log exists, and our if block will create a fresh log for us. This check prevents a crash during later log operations.
Note
Note that we've used a context manager for operating on the file. For more examples, see the python close file solution.
Example 2
For a more complex example, imagine that we're deploying an automated machine learning model to predict the weather. Since our model relies on historical weather data, we've also developed a web scraper that automatically runs once a day and stores the data in a file called weather_data_today.csv inside the input directory.
As part of our model deployment, we could add a function check_for_new_data(), which uses exists() to see when a new file gets uploaded. The function then moves the file to a different directory using the shutil.move() function, and we can use the data to update our weather predictions.
We could schedule the function to run every hour, so the program constantly checks for new data. With this setup, the next time our scraper uploads weather_data_today.csv to the input folder, our script will detect the change and update the predictions again, creating an automated solution.
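The sketch below shows one version of check_for_new_data(). The processed destination folder is a hypothetical name, and the prediction update is left as a placeholder comment because the model itself isn't shown:

```python
import os
import shutil

# The scraper writes into "input"; "processed" is a hypothetical destination
INPUT_FILE = os.path.join("input", "weather_data_today.csv")
PROCESSED_DIR = "processed"


def check_for_new_data():
    """Move a newly scraped file out of the input folder, if one exists.

    Returns the file's new path, or None when there is no new data.
    """
    if not os.path.exists(INPUT_FILE):
        return None  # nothing new since the last check
    os.makedirs(PROCESSED_DIR, exist_ok=True)
    # shutil.move() returns the destination path
    new_path = shutil.move(
        INPUT_FILE, os.path.join(PROCESSED_DIR, "weather_data_today.csv")
    )
    # Placeholder: update the weather predictions using new_path here
    return new_path
```

To run it hourly, you could call check_for_new_data() from a scheduler such as cron, or from a simple loop that sleeps with time.sleep(3600) between calls.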
A major limitation of using os.path.exists() is that after checking if a file exists, another process running in the background could delete it.
Large programs are often a combination of moving parts, with different scripts running at the same time. Suppose our if os.path.exists() line returns True, and then another process deletes the file before we open it. The file operation inside our if block would then raise a FileNotFoundError and could crash our program. This sequence of events is known as a race condition (specifically, a time-of-check to time-of-use race). Switching to the pathlib library, covered in the next section, doesn't remove this risk, because Path.exists() performs the same kind of check. The reliable way around it is the exception handling approach in Option 3.
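To see the race in action, the sketch below simulates the background process by deleting the file itself, between the check and the open() call. The file name race_demo.txt is hypothetical and used only for this demonstration:

```python
import os

path = "race_demo.txt"  # hypothetical file used only for this demo
with open(path, "w") as f:
    f.write("some data\n")

race_hit = False
if os.path.exists(path):      # time of check: the file is there
    os.remove(path)           # simulated: another process deletes it now
    try:
        with open(path) as f:  # time of use: the file is gone
            f.read()
    except FileNotFoundError:
        race_hit = True       # without this handler, the script would crash

print(race_hit)  # True
```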
Option 2: pathlib
An alternative approach to os is to use the pathlib library instead. pathlib represents file paths as objects, giving a more object-oriented way of checking files and offering many methods for file interaction.
Here, we'll check for the existence of the log.txt file in a similar way to the first example of the os section, but this time using pathlib to do the job.
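Here is a minimal pathlib version of the log check. As in the os sketch, the log messages are placeholders:

```python
from pathlib import Path

log_file = Path("log.txt")

# Path.exists() mirrors os.path.exists(), but is a method on the path object
if not log_file.exists():
    log_file.write_text("Log created\n")

# Append to the log through the same object (placeholder event)
with log_file.open("a") as log:
    log.write("Button clicked\n")
```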
In this example, we've created the object log_file using the Path() class. Similar to the os example, calling exists() here will return True if the file exists. If the file doesn't exist, our if branch covers that case by creating a new log file for us.
It's worth being precise about what Path() does, though. A Path object only represents a location on the filesystem; it doesn't store or copy the file's contents. Path.exists() therefore has the same race condition as os.path.exists(): if a concurrent process deletes the file between our if statement and the file operation, that operation can still fail. The real advantages of Path() are convenience and readability, since the path and the methods that act on it, such as exists(), open() and read_text(), live on one object.
In terms of performance, os executes slightly faster than pathlib, so there are some situations where it may have an edge. Despite this, pathlib is often considered a better choice than os, because it lets you chain methods and operators together quickly. It does not, however, prevent the race condition described in the os section.
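As an illustration of that chaining, the sketch below builds a path with the / operator and chains attribute and method calls on it. The data/input folder layout is hypothetical and only reuses the weather example's file name:

```python
from pathlib import Path

# The / operator joins path segments (hypothetical folder layout)
data_file = Path("data") / "input" / "weather_data_today.csv"

# Create any missing parent folders, then the (empty) file itself
data_file.parent.mkdir(parents=True, exist_ok=True)
data_file.touch(exist_ok=True)

print(data_file.exists(), data_file.is_file())  # True True
print(data_file.name, data_file.suffix)         # weather_data_today.csv .csv

# Methods return new Path objects, so calls can be chained
backup = data_file.with_suffix(".bak").name
print(backup)  # weather_data_today.bak
```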
Option 3: Exception Handling
If you aren't planning to do much with a file and only want to check if a file exists, then using the previous options is appropriate. In cases where you'd like to perform a file operation immediately after checking if the file exists, using these approaches could be viewed as inefficient.
In these situations, exception handling is preferable. With exception handling, your script can detect the FileNotFoundError: [Errno 2] No such file or directory error as it occurs and react to it accordingly.
Building on the log.txt example used in the os section, let's say that we'd like to update the log to have a line reading 'New session' every time we start our program.
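A minimal sketch of this approach follows. The helper name log_new_session() is hypothetical, and the file is opened in "r+" mode so that a missing log actually raises FileNotFoundError:

```python
import os


def log_new_session(log_path="log.txt"):
    """Append 'New session' to the log, creating the log if it's missing."""
    try:
        # "r+" (read/write) raises FileNotFoundError if the file is missing;
        # "a" would silently create it, so the except branch would never run
        with open(log_path, "r+") as log:
            log.seek(0, os.SEEK_END)  # jump to the end so the write appends
            log.write("New session\n")
    except FileNotFoundError:
        # The log doesn't exist yet, so start a fresh one
        with open(log_path, "w") as log:
            log.write("Log created\n")
            log.write("New session\n")


log_new_session()
```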
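The try-except branches work similarly to an if-else statement. Python tries to open the file and, if that works, appends 'New session'. If a FileNotFoundError is raised instead, Python enters the except branch: the file doesn't exist, so we create a new one. For this to work, the file must be opened in a mode that fails when the file is missing, such as 'r' or 'r+'. Append mode ('a') silently creates a missing file, so the except branch would never run.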
As mentioned previously, in situations where you'd like to complete an operation on the file if it exists, using exception handling is the preferred approach. The exception handling solution provides the same functionality as an os solution but doesn't require a library import. The os approach is also clumsier: it requires two interactions with the file (checking if it exists, then appending or writing), compared with just appending or writing in our exception handling solution.
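This solution also helps us to avoid race conditions. Because opening the file is the check, there is no gap between checking and using the file for another process to slip into. Once the file is open, the with statement keeps the file handle open until the block ends. On Linux and macOS, for example, a file that another process deletes remains usable through a handle that is already open.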
Python's official glossary describes this "easier to ask for forgiveness than permission" (EAFP) style, which tries an operation and catches the exception if it fails, as a common Python coding style. Exception handling around files is therefore generally considered Pythonic. Still, it'll be up to you to find the right balance between more efficient code and too much exception handling, which could cause you to miss vital errors further down the line.
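One way to keep that balance is to catch only the exception you expect. In the sketch below, read_settings() is a hypothetical helper that echoes the settings-file use case from earlier. It handles FileNotFoundError alone, so other problems still surface instead of being silently swallowed, as they would be with a broad except Exception. Examples include a permission error or a directory passed by mistake.

```python
def read_settings(path):
    """Return the file's text, or None if the file doesn't exist.

    Only FileNotFoundError is handled. Other OSErrors, such as a
    permission error or a directory passed by mistake, still raise.
    """
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```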
Summary
Checking for the existence of particular files can be fundamental to preventing your program from crashing. We've explored the uses of the os and pathlib libraries, along with exception handling, looking at the situations each approach is better suited to and weighing up the pros and cons of each.
We recommend the os library for beginners: it's a lot clunkier, but it may be easier to wrap your head around if you're new to file operations. For more advanced users with a better understanding of object-oriented programming, pathlib is the way to go. When you plan to operate on the file straight away and these libraries aren't required, exception handling can be an excellent alternative to checking if files exist, although you should use this approach with caution!