There are various factors to consider when addressing capacity issues on an Avamar system. The most important step is to determine why the capacity issue occurred, and the first task is to collect the data needed for that investigation.
Avamar has several types of capacity limit. Understanding these limits, and the system's history against them, helps explain both current and past capacity issues.
When the following thresholds are crossed, an event, warning, or error is generated in the UI.
- 80% - Capacity Warning
- 95% - Health Check Limit is reached
- 100% - Server Read-Only Limit is reached (grid goes to admin mode)
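To make the mapping concrete, the thresholds can be sketched as a small shell function. The `capacity_state` name is illustrative only and is not part of Avamar tooling:

```shell
# Hypothetical helper: map a utilization percentage (integer) to the
# Avamar capacity state it falls under. Illustrative only.
capacity_state() {
  pct=$1
  if [ "$pct" -ge 100 ]; then echo "server-read-only"
  elif [ "$pct" -ge 95 ]; then echo "health-check-limit"
  elif [ "$pct" -ge 80 ]; then echo "capacity-warning"
  else echo "ok"
  fi
}

capacity_state 85    # prints: capacity-warning
```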
A full Avamar system can present the following symptoms or errors:
- Garbage collection fails with MSG_ERR_DISKFULL or MSG_ERR_STRIPECREATE.
- Checkpoints fail with MSG_ERR_DISKFULL.
- Backups cannot run, or run and then fail, because capacity is full.
- Backups fail with MSG_ERR_STRIPECREATE or with messages stating that the target server is full.
- The access state changes to admin mode while maintenance is not running.
- The backup scheduler is disabled and cannot be resumed due to metadata capacity limits.
Data to collect
Log in to the Avamar Utility Node and run all the following commands. These only collect information and do not apply any changes:
1. If not already known, obtain the Avamar server's full name or FQDN (Fully Qualified Domain Name):
hostname -f
2. Verify all services are enabled, including the maintenance scheduler:
dpnctl status
3. Check the overall system state:
status.dpn
4. Run the capacity.sh script to collect 60 days' worth of data and the top 10 contributing clients:
capacity.sh --days=60 --top=10
5. Logs showing basic garbage collection behavior over the last 30 days.
dumpmaintlogs --types=gc --days=30 | grep "4202"
6. The amount of data that was removed by garbage collection, how many passes it completed and for how long it ran.
For Avamar v5.x and v6.x, run:
dumpmaintlogs --types=gc --days=30 | grep passes | cut -d ' ' -f1,12,13,15
For Avamar v7.x, instead run:
dumpmaintlogs --types=gc --days=30 | grep passes | cut -d ' ' -f1,10,14,15,17
7. Check how long hfscheck runs:
dumpmaintlogs --types=hfscheck --days=30 | grep -i elapsed | cut -d ' ' -f1,12 | grep -v check
8. Details of file system capacity usage per node and per partition:
avmaint nodelist | egrep 'nodetag|fs-percent-full'
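As a sketch of how that output can be screened against the 80% warning threshold, the pipeline below parses nodelist-style lines. The sample lines and attribute layout are illustrative only, since the exact XML output varies by Avamar version:

```shell
# Illustrative sample of nodelist-style output; real output differs by version.
sample='<node nodetag="0.0" fs-percent-full="82.4"/>
<node nodetag="0.1" fs-percent-full="41.0"/>'

# Extract each fs-percent-full value and flag any partition at or above 80%.
printf '%s\n' "$sample" \
  | grep -o 'fs-percent-full="[0-9.]*"' \
  | awk -F'"' '$2 >= 80 {print "over-threshold: " $2 "%"}'
# prints: over-threshold: 82.4%
```

On a live system, `avmaint nodelist` would replace the `printf` of the sample text.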
9. A list of checkpoints available on the system:
cplist
10. Maintenance job scheduled start/stop times:
avmaint sched status --ava | egrep -A 2 "maintenance-window|backup-window" | tail -16
11. Collect all disk settings:
avmaint config --ava | egrep -i 'disk|crunching|balance'
Never change these values unless advised by an Avamar Subject Matter Expert. Non-default values might be in place for a good reason, so understand the situation thoroughly first.
12. Collect counts of different types of stripes per node per data partition:
avmaint nodelist --xmlperline=99 | grep 'comp='
13. Check the amount of memory (and swap) in use on each node; for example, using mapall to run free on every node:
mapall --noerror 'free -m'
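The raw `free -m` figures can be condensed into usage percentages with a short awk filter. The sample output below is illustrative, not from a real system:

```shell
# Illustrative `free -m` output; real values differ per node.
sample='              total        used        free
Mem:          96405       91210        5195
Swap:          8191         412        7779'

# Print used memory and swap as percentages of their totals.
printf '%s\n' "$sample" \
  | awk '/^Mem:|^Swap:/ {printf "%s %.1f%% used\n", $1, 100*$3/$2}'
# prints: Mem: 94.6% used
#         Swap: 5.0% used
```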