I need to delete a lot of things…
My top-priority project at the moment involves quantifying the conservation status of plants that are the wild relatives of our modern agricultural crops. These Crop Wild Relatives (CWRs) hold a great deal of genetic diversity that can be bred back into agricultural crops to improve their ability to withstand challenging conditions such as drought, cold, heat, and pests.
If this sounds new and interesting, please check out this video by the Crop Trust, which explains the connection between CWRs and modern crops.
My team is currently evaluating the 650 crop wild relatives that are native to the United States. This process involves a lot of input data that is fed into a modeling workflow, which in turn produces a lot of output data. The modeling step is an iterative process, meaning there are multiple model runs for each species.
An example of the file structure created for each species for each model run ~ 11 folders per run
An example of some of the files created on each run: ~52 files per run. Applied to 650 species, this process generates more than 7,000 folders and 33,000 files.
Because most of these runs are just tests, they eventually need to be deleted. Deleting all the files by hand would be very cumbersome and time-consuming, so I’m using R to make the computer delete the files for me.
Deleting things with R
As with most computer-related tasks, a little bit of time and some digging turned up an efficient way to remove all the unnecessary files associated with the older model runs. I relied on four base R functions to make it happen: list.dirs(), grep(), list.files(), and unlink().
I point to the directory where all my files are saved and use list.dirs() to gather the paths of all the directories within it.
Before deleting old model runs, this directory contained 37,000 subdirectories.
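As a rough illustration of this first step, here is a minimal sketch that builds a small throwaway folder tree and lists it. The location under tempdir(), the species names, and the "modeling" subfolder are all made up for the example; they are not the real project layout.

```r
# Build a small, disposable directory tree so the example is safe to run.
# All names below are hypothetical stand-ins for the real project folders.
base_dir <- file.path(tempdir(), "cwr_demo_list")
unlink(base_dir, recursive = TRUE)  # start from a clean slate

for (sp in c("speciesA", "speciesB")) {
  for (run in c("test20190826", "test20190827")) {
    dir.create(file.path(base_dir, sp, run, "modeling"), recursive = TRUE)
  }
}

# list.dirs() returns every directory under base_dir (and base_dir itself).
all_dirs <- list.dirs(path = base_dir, full.names = TRUE, recursive = TRUE)
length(all_dirs)
head(all_dirs)
```

With full.names = TRUE the result holds complete paths, which is what the later deletion step needs.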
With this list of directories, we use the grep() function to find all directories that end with “test20190827”. The “$” at the end of the pattern anchors the match to the end of the path; without it, grep() returns every folder whose path contains the pattern, including all of the subfolders. As we are just deleting files here, it’s more efficient to stop at the top directory: removing the top directory also removes everything within it.
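To see what the “$” anchor changes, here is a small sketch run against a hand-written vector of paths. The paths are invented for illustration, so nothing on disk is touched.

```r
# Hypothetical directory paths, standing in for the output of list.dirs().
dirs <- c(
  "data/speciesA/test20190827",
  "data/speciesA/test20190827/modeling",
  "data/speciesA/test20190826",
  "data/speciesB/test20190827"
)

# Anchored: only paths that END with the run name (the top run folders).
anchored <- grep(pattern = "test20190827$", x = dirs, value = TRUE)

# Unanchored: any path that CONTAINS the run name, subfolders included.
unanchored <- grep(pattern = "test20190827", x = dirs, value = TRUE)

anchored
unanchored
```

Setting value = TRUE makes grep() return the matching paths themselves rather than their positions in the vector.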
To start, I just wanted to test the process by deleting a single folder, selected by its index in the list of matches. The recursive = TRUE option is what allows the unlink() function to delete folders; without it, directories are left in place. If you are satisfied with the results, you can drop the index and remove all the matching folders.
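A cautious version of this test-then-delete pattern might look like the sketch below. It works inside a temporary sandbox with made-up folder and file names, so it can be run without risk to real data.

```r
# Disposable sandbox; every name here is hypothetical.
base_dir <- file.path(tempdir(), "cwr_demo_unlink")
unlink(base_dir, recursive = TRUE)
for (sp in c("speciesA", "speciesB")) {
  run_dir <- file.path(base_dir, sp, "test20190827", "modeling")
  dir.create(run_dir, recursive = TRUE)
  file.create(file.path(run_dir, "output.csv"))
}

# Find the top-level folder of the old run for each species.
old_runs <- grep("test20190827$", list.dirs(base_dir), value = TRUE)

# Step 1: delete only the first match, then check what is left.
unlink(old_runs[1], recursive = TRUE)
after_first <- dir.exists(old_runs)
after_first

# Step 2: once satisfied, drop the index to delete every match.
unlink(old_runs, recursive = TRUE)
dir.exists(old_runs)
```

The species folders themselves are left alone; only the run folders and everything inside them are removed.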
The same approach can catch individual files from a specific run as well. I needed this because some files are saved with Sys.Date() in the name, outside of the run folders that were deleted.
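For loose files like these, list.files() with a pattern can play the role that list.dirs() and grep() played for folders. The sketch below assumes the date appears in the file name in the default Sys.Date() format (for example 2019-08-27); the file names and sandbox location are invented.

```r
# Disposable sandbox with two dated files; names are hypothetical.
base_dir <- file.path(tempdir(), "cwr_demo_files")
unlink(base_dir, recursive = TRUE)
dir.create(file.path(base_dir, "speciesA"), recursive = TRUE)
file.create(file.path(
  base_dir, "speciesA",
  c("summary_2019-08-27.csv", "summary_2019-08-28.csv")
))

# Assumed: the run date is embedded in the file name as YYYY-MM-DD.
run_date <- "2019-08-27"

# Collect every file whose name contains that date, at any depth.
dated_files <- list.files(
  base_dir, pattern = run_date, full.names = TRUE, recursive = TRUE
)

# Files need no recursive = TRUE; unlink() removes them directly.
unlink(dated_files)
remaining <- list.files(base_dir, recursive = TRUE)
remaining
```

Printing dated_files before calling unlink() is a cheap way to confirm the pattern matches only what you expect.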
Deleting a lot of things with R
While this process is much, much faster than manually deleting files, I still have a lot of things to delete. Let’s say I ran the modeling process four days in a row, but I only need to keep the fourth iteration. We can automate this a bit more by wrapping the steps in a function and applying it across a list of run names.
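One way to wrap the steps into a function is sketched below. The function name delete_run(), the run names, and the sandbox layout are illustrative assumptions rather than the original code. Note that the run name is used as a regular expression, so a name containing characters such as “.” or “(” would need escaping.

```r
# Hypothetical helper: delete every folder under base_dir whose path
# ends with run_name, and report how many folders were targeted.
delete_run <- function(run_name, base_dir) {
  dirs <- list.dirs(base_dir, full.names = TRUE, recursive = TRUE)
  targets <- grep(paste0(run_name, "$"), dirs, value = TRUE)
  unlink(targets, recursive = TRUE)
  length(targets)
}

# Disposable sandbox: four daily runs for each of two made-up species.
base_dir <- file.path(tempdir(), "cwr_demo_many")
unlink(base_dir, recursive = TRUE)
runs <- paste0("test201908", 24:27)
for (sp in c("speciesA", "speciesB")) {
  for (run in runs) {
    dir.create(file.path(base_dir, sp, run), recursive = TRUE)
  }
}

# Keep only the fourth run; apply the helper to the first three.
old_runs <- runs[1:3]
removed <- vapply(old_runs, delete_run, integer(1), base_dir = base_dir)
removed

remaining <- basename(list.dirs(base_dir, recursive = TRUE))
remaining
```

Returning the number of targeted folders gives a quick sanity check on each run before moving on to the next.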
With great power comes great responsibility
I worked with a co-instructor at a summer camp where we would take grade-school kids out in canoes and kayaks on lakes. While teaching skills and safety on land, he would frequently tell the kids that “Gone is forever”: a warning that anything that dropped beneath the surface of the water was unrecoverable, lost, and completely gone from your life, forever.
It was a dire warning meant to create some sense of responsibility in the kids, and the advice applies to this process as well. The unlink() function does not move items to the recycling bin; it removes them from your computer. They are not coming back. Be cautious and think before you unlink.