Scheduling Cleaning on a Data Domain System

PURPOSE

 

The filesys clean operation reclaims physical storage occupied by deleted objects in the Data Domain file system.

 

When application software expires backup or archive images, and those images are not present in a snapshot, they are no longer accessible or available for recovery from either the application or a snapshot. However, the images still occupy physical storage.

 

Only a filesys clean operation reclaims the physical storage used by files that are deleted and that are not present in a snapshot. The file system may never report 100% cleaned. The total space cleaned may always be a few percentage points less than 100.

APPLIES TO

 

  • All Data Domain Systems
  • All Software Releases
  • Cleaning

CONTENT

Data Domain recommends running a clean operation after the first full backup to a Data Domain System. The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An immediate clean operation gives additional compression by another factor of 1.15 to 1.2 and reclaims a corresponding amount of disk space.
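
As a rough illustration of the figures above, the overall compression after the first clean can be estimated as the product of the two factors. The following Python sketch works through that arithmetic. It assumes the two factors simply multiply; the function name and the 10 TB backup size are hypothetical and chosen only for illustration.

```python
def combined_range(initial=(1.5, 2.5), clean_gain=(1.15, 1.2)):
    """Return (low, high) overall compression after the first clean.

    The defaults are the typical ranges quoted in this article. Treating
    the overall factor as the product of the two stages is an assumption
    of this sketch.
    """
    return (initial[0] * clean_gain[0], initial[1] * clean_gain[1])


low, high = combined_range()
print(f"Overall compression: {low:.2f}x to {high:.2f}x")

# Hypothetical example: physical space used by a 10 TB first full backup.
logical_tb = 10.0
print(f"Before clean: {logical_tb / 2.5:.2f} to {logical_tb / 1.5:.2f} TB")
print(f"After clean:  {logical_tb / high:.2f} to {logical_tb / low:.2f} TB")
```

Actual results depend on the data being backed up, so these numbers are only a planning aid.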

 

A default schedule runs the clean operation every Tuesday at 6 a.m. (tue 0600) with a 50% throttle.

 

To increase file system availability, and if the Data Domain System is not short on disk space, consider changing the schedule to clean less often.

 

Issues that can affect the cleaning process:

 

  • If the system is filling up, do not compensate by changing the default values to more frequent or more aggressive cleaning cycles. Running cleaning every day fragments the data, which can severely impair read speeds. The global compression algorithm depends on good locality during writes, so an overly frequent clean cycle also lowers deduplication ratios.
  • Cleaning is a file system operation that affects overall file system performance while it runs. Raising the cleaning throttle above the default of 50 affects performance during the active cleaning cycle, because the cleaning process consumes more resources.
  • Changing the local compression algorithm causes the following cleaning cycle to run significantly longer, because all existing data must be read, uncompressed, and compressed again.
  • Any operation that shuts down the Data Domain System file system or powers off the device (a system power-off, a reboot, or a file system disable command) stops the clean operation. The clean does not automatically continue when the system and file system start again.
  • Replication between Data Domain systems can affect filesys clean operations. If a source Data Domain system receives large amounts of new or changed data while replication is disabled or disconnected, resuming replication may significantly slow down filesys clean operations.
  • If directory replication is running behind, for example because of insufficient network bandwidth between the replication pair (resulting in replication lag), cleaning may not be able to run fully. This condition requires either breaking replication (and resyncing once cleaning has run) or letting the replication lag catch up (for example, by increasing network bandwidth or writing less new data to the source directory).

 

A Data Domain system that is full may need multiple clean operations to clean 100% of the file system, especially when one or more external shelves are attached. Depending on the type of data stored, such as when using markers for specific backup software (filesys option set marker-type ... ), the file system may never report 100% cleaned. The total space cleaned may always be a few percentage points less than 100.

 

With collection replication, the clean operation does not run on the destination. With directory replication, the clean operation needs to be run on both the source and destination Data Domain systems.

 

To display the day and time currently scheduled for the clean operation, use the filesys clean show schedule operation.

 

# filesys clean show schedule

 



To display the throttle setting for cleaning operations, use the filesys clean show throttle operation. Changes to the throttle setting will take effect without restarting cleaning.

 

# filesys clean show throttle

 



Changing the Scheduled Cleaning

 

To change the day and time when clean runs automatically, use the filesys clean set schedule operation. The default time is Tuesday at 6 a.m. (tue 0600). The operation is available only to administrative users.

 

  • Daily runs the operation every day at the given time (not recommended).
  • Monthly starts on a given day or days (from 1 to 31) at the given time.
  • Never turns off the clean process and does not take a qualifier.
  • With the day-name qualifier, the operation runs on the given day(s) at the given time. A day-name is three letters (such as mon for Monday). Use a dash between days for a range of days. For example: tue-fri.
  • Time is in 24-hour (military) format. 2400 is not a valid time. mon 0000 is midnight between Sunday night and Monday morning.
  • The most recent invocation of the scheduling operation cancels the previous setting.
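
The qualifier rules above can be checked before a schedule is entered. The following Python sketch is one way to do that. The function names are hypothetical and are not part of the Data Domain software. The sketch checks only the format rules listed in this article, and its handling of ranges that wrap around the week (such as sat-mon) is an assumption.

```python
import re

# Three-letter day names, in the order used to expand ranges.
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def valid_time(text):
    """True for 24-hour HHMM times from 0000 to 2359 (2400 is rejected)."""
    if not re.fullmatch(r"[0-9]{4}", text):
        return False
    hours, minutes = int(text[:2]), int(text[2:])
    return hours <= 23 and minutes <= 59


def expand_days(spec):
    """Expand a day-name qualifier such as 'tue-fri' or 'mon,thu'.

    Raises ValueError for unknown day names. Ranges that wrap around
    the week are expanded forward, which is an assumption of this sketch.
    """
    days = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
        else:
            start = end = part
        first = DAY_NAMES.index(start)
        last = DAY_NAMES.index(end)
        for offset in range((last - first) % 7 + 1):
            day = DAY_NAMES[(first + offset) % 7]
            if day not in days:
                days.append(day)
    return days


print(valid_time("0000"), valid_time("2400"))
print(expand_days("tue-fri"))
```

A helper like this only catches formatting mistakes early; the system itself remains the authority on what it accepts.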

 

The command syntax is:

 

filesys clean set schedule daily time

filesys clean set schedule monthly day-numeric-1[,day-numeric-2,...] time

filesys clean set schedule never

filesys clean set schedule day-name-1[,day-name-2,...] time

     

For example, the following command runs the operation automatically every Tuesday at 4 p.m.:

 

# filesys clean set schedule tue 1600

 



To run the operation more than once in a month, set multiple days in one command. For example, to run the operation on the first and fifteenth of the month at 4 p.m.:

 

# filesys clean set schedule monthly 1,15 1600
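
To see when a monthly schedule such as the one above would next fire, the dates can be worked out with a short script. The following Python sketch is illustrative only: the function name is hypothetical, and skipping months that lack a given day (for example, day 31 in February) is an assumption rather than documented system behavior.

```python
from datetime import datetime, timedelta


def next_monthly_run(now, days, hhmm):
    """Return the next datetime after `now` matching a monthly schedule.

    `days` is a list of day-of-month numbers (1 to 31) and `hhmm` is a
    24-hour time string such as "1600".
    """
    if not days or any(d < 1 or d > 31 for d in days):
        raise ValueError("days must be numbers from 1 to 31")
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's run time has already passed, start looking from tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    # Step forward one day at a time until the day of month matches.
    while candidate.day not in days:
        candidate += timedelta(days=1)
    return candidate


# Schedule from the example above: the 1st and 15th at 4 p.m.
print(next_monthly_run(datetime.now(), [1, 15], "1600"))
```

Stepping one day at a time keeps the logic simple and avoids month-length arithmetic.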

 



To set the clean schedule to the default of Tuesday at 6 a.m. (tue 0600), the default throttle of 50%, or both, use the filesys clean reset operation.

 

filesys clean reset {schedule | throttle | all}

 


REFERENCE