Cleaning up unused results
To reduce storage usage, we can periodically remove result folders that are rarely used. For this, we first need to add a counter to the job database that is incremented whenever a result is accessed. Better still, we record a timestamp for every access in a database table or a file, so that we can tell recent accesses from old ones.
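A minimal sketch of the timestamp variant, assuming the job database is SQLite. The table name `result_access` and the helpers `init_access_log`, `record_access` and `access_timestamps` are made up for illustration and are not part of any existing schema:

```python
import sqlite3
from datetime import datetime, timezone


def init_access_log(conn):
    # Hypothetical table: one row per access to a task's result.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result_access ("
        "task_id TEXT NOT NULL, accessed_at TEXT NOT NULL)"
    )


def record_access(conn, task_id, when=None):
    # Call this wherever a result is served; stores an ISO 8601 UTC timestamp.
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO result_access (task_id, accessed_at) VALUES (?, ?)",
        (task_id, when.isoformat()),
    )
    conn.commit()


def access_timestamps(conn, task_id):
    # Returns all recorded access times of one task as datetime objects.
    rows = conn.execute(
        "SELECT accessed_at FROM result_access WHERE task_id = ?", (task_id,)
    )
    return [datetime.fromisoformat(row[0]) for row in rows]
```

Storing one row per access keeps the full history, so the weighting used by the cleanup script can be changed later without losing information.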
Then we can run a script every day:
- First it sorts the tasks by how much they have been used recently. Pseudocode:
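    for task in tasks:
        # c[age] = number of accesses `age` days ago (0 = today)
        c = Counter((today() - d.date()).days for d in access_timestamps_of_task)
        # recent accesses weigh more; the +1 avoids dividing by zero for today
        priority = sum(c[age] / (age + 1) for age in range(days))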
- Then it walks through the list from highest to lowest priority and adds up the storage usage of each task's result folder. Once the running total reaches the storage threshold, it removes the result folders of all remaining (lower-priority) tasks.
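The two steps above could be sketched as follows. This is one possible reading, not a finished script: the function names, the `tasks` mapping (task id to result folder and access datetimes), the `budget_bytes` threshold, the 30-day window and the `1 / (age + 1)` weighting are all assumptions. In this sketch the task that pushes the total over the budget is removed as well.

```python
import shutil
from collections import Counter
from datetime import date, datetime
from pathlib import Path


def priority(timestamps, today, days=30):
    # An access `age` days ago contributes 1 / (age + 1) to the priority.
    ages = Counter((today - ts.date()).days for ts in timestamps)
    return sum(ages[age] / (age + 1) for age in range(days))


def folder_size(path):
    # Total size in bytes of all files below `path`.
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def select_for_removal(tasks, budget_bytes, today, days=30):
    # `tasks` maps task id -> (result folder, list of access datetimes).
    ranked = sorted(
        tasks, key=lambda t: priority(tasks[t][1], today, days), reverse=True
    )
    used = 0
    for i, task_id in enumerate(ranked):
        used += folder_size(tasks[task_id][0])
        if used > budget_bytes:
            # Everything from here on no longer fits into the budget.
            return ranked[i:]
    return []


def cleanup(tasks, budget_bytes, today=None, days=30, dry_run=True):
    # With dry_run=True nothing is deleted; the candidates are only returned.
    today = today or date.today()
    doomed = select_for_removal(tasks, budget_bytes, today, days)
    for task_id in doomed:
        if not dry_run:
            shutil.rmtree(tasks[task_id][0], ignore_errors=True)
    return doomed
```

Running it with `dry_run=True` first shows which folders would be removed before anything is deleted.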