Manual deletion of data from a DataDomain FAQ

Article Number: 499265                             Article Version:                               Article Type: How To


Product:

Data Domain

 

Instructions:


This document aims to answer questions about why data might need to be deleted from a DataDomain, and how this can be achieved.

There are 3 main reasons why manual removal of data from a DataDomain may be considered:

1) The system is close to full and more free space is required.
2) A backup application has been decommissioned, but its data remains on the DataDomain.
3) Data has become orphaned from the backup application which wrote it.


What is the role of the DataDomain in the backup environment?

The backup application is responsible for writing data to the DataDomain during a backup; reading data from the DataDomain as part of a recovery; maintaining a catalog or database of this data; determining when data has reached its retention period; and expiring data.

The DataDomain is considered a passive device in that it does not control retention of the data, and it does not expire the data.  It only deletes files when commanded via one of its protocols (i.e. CIFS, NFS or DDBOOST).


Whose responsibility is the data on the DataDomain?

The customer has ultimate responsibility for the data on the DataDomain.


What is orphaned data, and what causes it?

Data is considered orphaned if it exists on the DataDomain, but is no longer controlled by the backup application which wrote it.  Data in this state will remain on the DataDomain and will not be freed by cleaning.  There are a number of ways in which this can occur:

- the backup application has been decommissioned.
- the DataDomain file system may have been unavailable at the moment the backup application attempted to expire data.  Some backup applications may not retry this operation.
- network issues can prevent the request to expire data from reaching the DataDomain.
- the backup application is instructed to remove data from its catalog/media database, but not from the underlying DataDomain storage.


Is it possible to request DataDomain Support to remove obsolete or orphaned data?

No, as the DataDomain is not aware that data is obsolete or orphaned.  For further information on how customers can determine whether data is obsolete or orphaned please review the later question: "How do I remove data which has become orphaned from the backup application which wrote it?"


Is it possible to request DataDomain Support to remove data from a given time range?

Not directly, as there may be backup application metadata files within the file system which fall within the given time range.  If this metadata is removed, the backup application may break or behave unpredictably.


How do I free space from my system which is nearly full?

This situation implies that the backup application is in control of all the data on the DataDomain.  In that case, the data should be expired from the backup application, provided nothing prevents the DataDomain's cleaning process from freeing the expired data (examples of such obstacles include, but are not limited to, scheduled snapshots taken before the expiration which are still active, and replication lag).  Please note that it may be necessary to expire savesets manually, as changing the retention policy may only affect future backups, rather than those which have already been ingested.  Although manually deleting the data from the DataDomain (and then running cleaning) would free space, it is likely to cause future issues with the backup application, as its catalog/media database would no longer be accurate.  For this reason, DataDomain support cannot recommend deleting data manually in this situation.


How do I remove data which remained after I decommissioned my backup application?

As noted above, the DataDomain is considered a passive device and does not expire or delete data.  If a backup application is decommissioned, then it cannot expire data from the DataDomain and therefore the data will remain on the DataDomain.  It will then be necessary to remove the data manually. This will depend on the protocol used to write the data to the DataDomain, and whether there is valid data from other applications (or instances of the same application) present on the DataDomain.

  • If this is the only data present on the DD then the Mtree containing the data should be deleted (unless it is the /data/col1/backup Mtree, which cannot be deleted), and the next clean cycle will free the space from the deleted Mtree.  Another option is to destroy and recreate the file system on the DataDomain (this is the best choice if the only data present is in /data/col1/backup).
  • If there is other, valid data on this DataDomain, but all the decommissioned data is within an Mtree which does not contain any valid data, this Mtree should be deleted.  However, if the Mtree is /data/col1/backup it will not be possible to delete the Mtree.  Please refer to the section on manual deletion.
  • If there is other, valid data on this DataDomain and it is within the same Mtree as the decommissioned data, then it is not possible to delete the Mtree.  Please refer to the section on manual deletion.


How do I remove data which has become orphaned from the backup application which wrote it?

Data is considered orphaned when there is a mismatch between the backup application's catalog or database of files on the DataDomain, and the actual files on the DataDomain.  The backup application's catalog or database maps between the two and will include information on when the backup was written, its retention, and the filenames used.  There are a number of reasons why data may have become orphaned; the most common are:

  1. Data has been manipulated outside the backup application.  This could occur if the data location has been made available via NFS or CIFS, or if the internal, support-use-only bash shell has been used.
  2. Network issues at the moment the backup application expires the data, meaning the backup is removed from the catalog or database, but not from the DataDomain.
  3. An outdated or incompatible plugin (for DDBOOST).


The best option at this point is for the backup application to rescan the DataDomain, rebuilding its catalog or database.  This can be time consuming, but is safer than attempting to manually delete the orphaned data.  Please contact the support provider for your backup application.

If there is still orphaned data at this stage, then it may be necessary to manually remove the orphaned files.  Please refer to the section on manual deletion.
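The mismatch described above can be pictured as a comparison between two lists of paths.  The following is a minimal sketch in Python, assuming that a list of paths known to the backup application's catalog and a list of paths present on the DataDomain have both been exported as plain text files with one full path per line.  Those export formats and the function names are hypothetical and are not part of any DataDomain or backup application tooling.

```python
from pathlib import Path


def read_paths(listing_file):
    """Return the set of non-empty lines (one path per line) in a text file."""
    text = Path(listing_file).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def find_orphan_candidates(catalog_paths, datadomain_paths):
    """Paths present on the DataDomain but unknown to the catalog, sorted."""
    return sorted(set(datadomain_paths) - set(catalog_paths))


def find_missing_from_datadomain(catalog_paths, datadomain_paths):
    """Paths the catalog expects but which are absent from the DataDomain."""
    return sorted(set(catalog_paths) - set(datadomain_paths))
```

The output is only a list of candidates.  Backup application metadata files may not appear in the catalog and would therefore be reported here even though they must be kept, so the result should be reviewed with the backup application's support provider before anything is deleted.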


How can DDBOOST data be manually deleted?

It is not recommended to delete DDBOOST data manually via CIFS or NFS.  Please see the later question: "Can customers delete data via bash?"


How can VTL data be manually deleted?

This is outside the scope of this article.  Please contact your contracted support provider.


How can CIFS or NFS data be manually deleted?

If there is still data which must be removed manually after following the previous steps, then the following should be considered:

  • Data on the DataDomain is the customer's responsibility.  DataDomain support are not able to determine which files require deletion.  As noted earlier, the DataDomain is a passive device, and is not aware of retention policies.


If the data is accessed by CIFS or NFS then the customer should take the following steps to remove it:

1. Identify data.

  • The first step is to identify the data needing to be deleted.  For systems with under 1 million files, the "how to list files on the DD and import into Excel" KB (https://support.emc.com/kb/335150) will explain how to create an Excel spreadsheet with a list of all files on the DD, one per row.  Excel can be used to sort and filter the list as required.  For systems with more than 1 million files, it will be necessary to work on a per-Mtree basis.  If any individual Mtree which needs checking has more than 1 million files, please contact your contracted support provider for advice.
  • It is strongly recommended that you work with your backup application provider's support organisation, as there may be metadata files present which, if removed, will prevent the application from accessing the remaining data.  Examples of metadata files include: .ddboost, nsr.dir, volhdr, .nsr_serial.
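As an illustration of the caution about metadata files, the sketch below (Python) separates a list of candidate paths into those which may be considered for deletion and those which match one of the example metadata names given above.  It is a minimal sketch under stated assumptions: the function name is hypothetical, the name list contains only the examples from this article and is not exhaustive, and a path is treated as protected if any component of it matches.

```python
# Example metadata names taken from this article; NOT an exhaustive list.
METADATA_NAMES = {".ddboost", "nsr.dir", "volhdr", ".nsr_serial"}


def split_candidates(paths, metadata_names=METADATA_NAMES):
    """Split paths into (deletion_candidates, protected_metadata).

    A path is protected if any of its components (directory or file name)
    matches a known metadata name.  Input order is preserved.
    """
    candidates, protected = [], []
    for path in paths:
        # Accept both "/" and "\" separators so that listings taken from
        # Unix/Linux or Windows clients are handled the same way.
        parts = [p for p in path.replace("\\", "/").split("/") if p]
        if any(part in metadata_names for part in parts):
            protected.append(path)
        else:
            candidates.append(path)
    return candidates, protected
```

A check of this kind only guards against the names it has been told about.  The backup application provider's support organisation remains the authority on which files are metadata.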


2. Access the required Mtree using CIFS/NFS

  • Ensure you are using the same method to access the data as the backup application.  Where possible, use the existing NFS/CIFS share from the existing Unix/Linux/Windows system.  If this is not possible then it may be necessary to create a CIFS share/NFS export <link> and connect from a suitable Unix/Linux/Windows system.
  • You should authenticate as the same user that the backup application uses, in order to have the same privileges, and hence the ability to delete data.


3. Protect the data in case of mistakes

  • Create a snapshot for the Mtree(s) which will have data deleted, with an expiry time of 1 week. This will allow recovery in case of mistakes, within this time period.
  • For example, if only the Mtree /data/col1/foo is impacted, the following command can be used:

snapshot create snap-deletions mtree /data/col1/foo retention 1week

  • The following KB should be followed if it is necessary to recover any deleted data from this snapshot: "How to restore deleted files from a file system/mtree snapshot on a Data Domain Restorer" (https://support.emc.com/kb/446523)


4. Delete the selected data

  • If there are few files then manual deletion (using Windows Explorer, or the 'del' command from the command prompt, or the 'rm' command from a Unix/Linux shell) is straightforward.
  • If there are many files, create a script which deletes them by iterating through the list of files, to avoid typing mistakes.  Consult Microsoft's or your operating system's documentation to learn more about how to create and run scripts.
  • In either case it is strongly recommended to specify each file explicitly, preferably using its full path.  Avoid using wildcards, to reduce the risk of accidentally removing a required file.
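A script of the kind described in step 4 could look like the following minimal sketch in Python.  It assumes the Mtree is already mounted on the client via NFS or CIFS and that the list contains client-side full paths, one per file.  The function names and the dry-run default are illustrative choices, not part of any DataDomain tooling.

```python
import os

# Characters commonly interpreted as wildcards by shells; any path containing
# one is rejected rather than risk a pattern having been pasted into the list.
WILDCARD_CHARS = set("*?[]")


def validate_path(path):
    """Raise ValueError unless path is a full path without wildcard characters."""
    if not os.path.isabs(path):
        raise ValueError(f"not a full path: {path!r}")
    if WILDCARD_CHARS & set(path):
        raise ValueError(f"wildcard character in path: {path!r}")


def delete_listed_files(paths, dry_run=True):
    """Delete each listed file individually.

    Returns the paths deleted (or, in dry-run mode, the paths which would
    have been deleted).  Entries which are not regular files are skipped.
    """
    paths = [p.strip() for p in paths if p.strip()]
    for path in paths:  # validate the whole list before deleting anything
        validate_path(path)
    processed = []
    for path in paths:
        if not os.path.isfile(path):
            print(f"SKIP (not a regular file): {path}")
            continue
        if dry_run:
            print(f"WOULD DELETE: {path}")
        else:
            os.remove(path)
            print(f"DELETED: {path}")
        processed.append(path)
    return processed
```

Running it first with the default dry_run=True prints what would be removed without touching any file.  Only after reviewing that output, and after the snapshot in step 3 has been created, should it be re-run with dry_run=False.  The wildcard check is deliberately conservative and will also reject legitimate file names which contain those characters; such files would need to be handled individually.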


5. Check the system

  • You may wish to re-run step 1 (Identify data) to ensure that all the unwanted data has been deleted, and none of the required data is missing.
  • Once you are confident, the snapshot created in step 3 can be expired.  NB: if this snapshot is not expired here, it will remain on the system until its expiry time, and may prevent space from being freed by cleaning.
  • The example snapshot could be expired with the following command:

snapshot expire snap-deletions mtree /data/col1/foo
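The re-check in step 5 can be sketched as a comparison of three lists: the file listing taken before the deletion, the listing taken afterwards, and the list of files which were meant to be deleted.  This is a minimal sketch in Python; the function name is hypothetical and the three inputs are assumed to be collections of full paths in the same form.  It would normally be run before the snapshot is expired, so that any mistake it reveals can still be recovered.

```python
def check_deletion(before, after, to_delete):
    """Return (still_present, unexpectedly_missing) as sorted lists.

    still_present:        files that should have been deleted but remain.
    unexpectedly_missing: files that existed before, were NOT on the
                          deletion list, and are now gone.
    """
    before, after, to_delete = set(before), set(after), set(to_delete)
    still_present = sorted(to_delete & after)
    unexpectedly_missing = sorted((before - to_delete) - after)
    return still_present, unexpectedly_missing
```

Both returned lists should be empty before the snapshot is expired.  Files written by ongoing backups between the two listings do not affect either result.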


6. Run cleaning

  • In non-critical situations, cleaning can be allowed to run when next scheduled.  If space is required urgently then cleaning can be started manually (either from the command line or the GUI).


How does removing data differ if my system is configured with Extended Retention?

If data is deleted from the archive tier, then it is the Space Reclamation process rather than the Cleaning cycle (GC) which is responsible for freeing space.  Please refer to the following KB: Space Reclamation on retention tier for Data Domain systems with DD Extended Retention software https://support.emc.com/kb/335877


How does removing data differ if my system is configured with Retention-Lock?

If the Mtree where the data resides is configured with Retention-Lock Governance, then it will be necessary to revert the retention lock on either the individual files to be removed or their parent directory before any action to expire or delete data is taken.

If the Mtree where the data resides is configured with Retention-Lock Compliance then it is not possible to manually remove data.

More information can be found in the Retention Lock (RL) FAQ: https://support.emc.com/kb/492081


Is it possible to tell how much space will be freed by cleaning if I remove a selected group of files?

No.  However, if the "how to list files on the DD and import into Excel" KB (https://support.emc.com/kb/335150) has been used then the pre-comp value of each file can be summed to determine the overall pre-comp size.  The post-comp size can only be estimated from the summed pre-comp values divided by the overall deduplication factor.
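The estimate described above is simple arithmetic, sketched here in Python.  The function name is hypothetical, the pre-comp sizes are assumed to have been taken from the file listing, and the deduplication factor is assumed to be the overall figure for the system expressed as a ratio (for example 10 for 10x).

```python
def estimate_post_comp(pre_comp_sizes, dedup_factor):
    """Estimate post-comp size as sum(pre-comp sizes) / overall dedup factor.

    The result is in the same unit as the input sizes (e.g. bytes or GiB).
    """
    if dedup_factor <= 0:
        raise ValueError("deduplication factor must be positive")
    return sum(pre_comp_sizes) / dedup_factor


# Example: three files of 100, 200 and 300 GiB pre-comp on a system with an
# overall deduplication factor of 10x.
print(estimate_post_comp([100, 200, 300], 10))  # 60.0
```

This remains a rough figure only, because the overall factor is an average across the whole system and the selected files may deduplicate better or worse than that average.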


What is the data loss waiver form?

The waiver form is a PDF document which advises the customer that the action they have requested from DataDomain Support will result in the loss of some of their customer data.  This is a requirement of the Service Agreement, and therefore the waiver form must be signed by an authorized customer representative.  The waiver form describes the specific action requested by the customer, along with the details of the DataDomain device(s) affected. It is sent to the customer via email (preferably through the support case) and once signed, a copy is attached to the SR.


Can DataDomain support delete customer data without having a signed data loss waiver form describing the work to be carried out?

No, this is a requirement of the customer's Service Agreement.


Who should sign the customer section of the waiver form?

This needs to be an authorized customer representative.


Does the form need to be hand signed by the customer?

No.  The authorized customer representative can complete the form electronically (using standard Acrobat Reader software), or they can email their approval as noted in section 4 of the form.


Is a second waiver form required if further work needs to be carried out?

Yes, unless it is an extra list of files to be deleted from the same DataDomain, by the same TSE, on the same day.  We encourage customers to ensure that all files required to be deleted are listed, ready for deletion in one session.


Is a waiver form required if the data to be deleted was written by a Dell-EMC application?

Yes, if the data needs to be deleted by DataDomain Support.  If the data can be deleted through the application then a waiver form is not required.


Can customers delete data via bash?

No.  Deletion via bash uses the internal bash shell, which is for support use only.