VDP Stuck in Admin State and All Backups Failing Due to Imbalanced Usage of Data Volumes

I recently ran into an issue where my VDP appliance went into Admin state and all of my backups failed.

It's a version 6.1.4.30 appliance with a dedupe capacity of 546 GB, built on 3 x 256 GB drives.

This happened after more VMs were added to the backup job. One of those VMs, apparently the Symantec server, consumed a lot of space in the dedupe store.

The night the backup ran, I got the alert below from VDP:

The vSphere Data Protection storage is nearly full. You can free the space on the appliance by manually deleting the unnecessary or older backups, and modifying the retention policies of the backup jobs to shorten the backup retention time.
————————————————————————————–

Current VDP storage usage is 84.15%.

————————————————————————————–

Usage kept increasing until the alert reported "Current VDP storage usage is 96.92%". Obviously I should have looked into it then, but I missed it. As a result, the appliance went into read-only state:

The vSphere Data Protection storage is full. The appliance runs in the read-only mode till additional space is made available. You can free the space on the appliance by manually deleting unnecessary or older backups only.

Output of status.dpn:

Wed Sep 20 12:52:30 GST 2017  [dxb02-vlp-vdp1.dib.ae] Wed Sep 20 08:52:30 2017 UTC (Initialized Wed Apr 19 09:23:11 2017 UTC)

Node   IP Address     Version   State   Runlevel  Srvr+Root+User Dis Suspend Load UsedMB Errlen  %Full   Percent Full and Stripe Status by Disk

0.0  172.22.250.109 7.2.80-129  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.63 5563  3463473  47.3%  47%(onl:574) 47%(onl:572) 47%(onl:572)

Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable

System ID: 1492593791@00:50:56:AA:50:57

All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0hpu)

System-Status: ok

Access-Status: Admin
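
If you want to catch this state from a script rather than by eye, here is a minimal Python sketch that pulls the two status fields out of saved status.dpn output. The function name parse_status is my own, and the sample text is just the two lines from the output above. I only treat the value "Admin" as a problem because that is what I saw here.

```python
import re

def parse_status(text):
    """Return the System-Status and Access-Status fields found in status.dpn output."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*(System-Status|Access-Status):\s*(\S+)", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields

# Sample lines copied from the status.dpn output above.
sample = """\
System-Status: ok

Access-Status: Admin
"""

status = parse_status(sample)
print(status)
if status.get("Access-Status") == "Admin":
    print("Appliance is in Admin state; backups will not run.")
```

Note that System-Status can still report ok while Access-Status is Admin, so it is worth checking both fields.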

From the log: /space/avamar/var/mc/server_log/mcserver.log.0

avtar Info <17844>: - Server is in read-only mode due to Diskfull

avtar Info <17972>: - Server is in Read-only mode.

Output of dpnctl status:

dpnctl: INFO: gsan status: degraded

My first step was to manually delete large, old restore points for some clients to free up space. The appliance's dedupe usage came down to 82%, and I expected the backup jobs to run that night once garbage collection had run during the scheduled maintenance window.

However, the appliance did not return to full-access state even after the old backups were deleted and dedupe usage had dropped to 67.5%.

The output of df -h shows that my data volumes data01, data02 and data03 are not used evenly:

admin@dxb02-vlp-vdp1:~/>: df -h

Filesystem      Size  Used Avail Use% Mounted on

/dev/sda2        32G  6.1G   24G  21% /

udev            2.9G  148K  2.9G   1% /dev

tmpfs           2.9G     0  2.9G   0% /dev/shm

/dev/sda1       128M   37M   85M  31% /boot

/dev/sda7       1.5G  187M  1.2G  14% /var

/dev/sda9       138G  7.9G  123G   7% /space

/dev/sdb1       256G  173G   84G  68% /data01

/dev/sdc1       256G  170G   86G  67% /data02

/dev/sdd1       256G  143G  114G  56% /data03
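
To put a number on the imbalance, here is a small Python sketch that reads df -h output and reports the gap between the most and least used /dataNN volumes. The helper names are my own, and it relies on the Use% column that df prints, which may not match exactly how the appliance measures imbalance internally, so treat the result as a rough indicator only.

```python
def data_volume_usage(df_output):
    """Map each /dataNN mount point to its Use% value from df -h output."""
    usage = {}
    for line in df_output.splitlines():
        parts = line.split()
        # A df data row has six columns; the header row splits into seven.
        if len(parts) == 6 and parts[5].startswith("/data"):
            usage[parts[5]] = int(parts[4].rstrip("%"))
    return usage

def usage_spread(usage):
    """Difference in percentage points between the fullest and emptiest volume."""
    return max(usage.values()) - min(usage.values())

# Rows copied from the df -h output above.
sample = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda9       138G  7.9G  123G   7% /space
/dev/sdb1       256G  173G   84G  68% /data01
/dev/sdc1       256G  170G   86G  67% /data02
/dev/sdd1       256G  143G  114G  56% /data03
"""

usage = data_volume_usage(sample)
print(usage)
print("spread:", usage_spread(usage), "percentage points")
```

For my appliance this gives a spread of 12 points (68% on data01 against 56% on data03).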

To fix this, set the freespaceunbalance value to a higher percentage, depending on the difference in usage you see between the data volumes.

Steps:

  • Stop the maintenance scheduler: “avmaint sched stop --ava”
  • Create a checkpoint: “avmaint checkpoint --ava”. Allow it to finish before moving ahead.
  • Perform a rolling integrity check: “avmaint hfscheck --rolling --ava”. Allow it to finish before moving ahead.
  • Verify the checkpoint using “cplist”
  • Create another checkpoint: “avmaint checkpoint --ava”
  • Check the current percentage utilization of each data volume.
  • If the difference is more than 10% (the default value), run: “avmaint config --ava freespaceunbalance=20”
  • Check the status again using status.dpn
  • Restart the maintenance scheduler: “avmaint sched start --ava”

If the appliance does not switch to full-access mode after changing the value, increase it to 30% or more as needed.

Let the appliance go through the maintenance window. As you configure more backups, the dedupe space should come to be used evenly across all data volumes.
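
As a rough illustration of picking the value, the Python sketch below rounds the observed spread up to the next multiple of 10 and prints the matching command. The rounding rule and the function name are my own assumptions, not a documented recommendation; the only values I actually used were 20, and 30 as the fallback.

```python
def suggest_freespaceunbalance(spread, step=10):
    """Smallest multiple of `step` strictly greater than the observed spread.

    This rounding rule is an assumption made for illustration only.
    """
    return (spread // step + 1) * step

# 12 is the gap between 68% and 56% usage seen on my data volumes.
value = suggest_freespaceunbalance(12)
print(f"avmaint config --ava freespaceunbalance={value}")
```

With a spread of 12 points this suggests 20, which is the value I started with.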

If the issue persists, open a ticket with VMware Support.

Happy troubleshooting!
