Month: September 2017

Some Basic and Important vCenter Alerts

vCenter 6 comes with a lot of default alerts, and I will list the configuration of a few of them that can be very handy.

In the vSphere C# client, when you select the vCenter and go to the Alarms tab, it displays all the alarms vCenter has to offer.

You can also select an individual vCenter object such as a VM, datastore, host, cluster, DVS, or port group and go to Alarm Definitions to see which alarms are available for that particular object.

Example: test1 is a VM and datastore1 is a VMFS datastore.

Alarms

Alarms_1

The best way to make use of these vCenter alerts is to send email notifications to the admin.

vCenter first needs to be configured for email notifications. To configure email notifications for alarms:

  1. Log in to vCenter Server.
  2. Click the Administration tab and select vCenter Server Settings.
  3. Select the Mail option.
  4. For the SMTP Server option, enter the IP address or the DNS name of the email/Exchange server to which the alert notifications must be sent.
  5. For the Sender Account option, enter the email address from which the alerts must be sent.
  6. The vCenter IP address must also be allowed in the SMTP relay settings of the Exchange server.
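
Before relying on the alarms, it can help to confirm that the mail server really accepts relayed mail from the vCenter machine. Below is a minimal Python sketch of such a check, meant to be run from the vCenter Server machine or any host whose IP is allowed to relay. The server name and both addresses are hypothetical placeholders, and it assumes an unauthenticated relay on port 25, which may not match your environment.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical placeholders: replace with your own relay and addresses.
SMTP_SERVER = "smtp.example.com"
SENDER = "vcenter-alerts@example.com"
RECIPIENT = "admin@example.com"


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test mail similar to what an alarm would send."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "vCenter SMTP relay test"
    msg.set_content("If you can read this, the SMTP relay accepts mail from this host.")
    return msg


def send_test_message(server: str, msg: EmailMessage, port: int = 25,
                      timeout: float = 10.0) -> None:
    """Send the message through the relay; raises an smtplib exception on refusal."""
    with smtplib.SMTP(server, port, timeout=timeout) as smtp:
        smtp.send_message(msg)


# Usage (needs network access to the relay, so it is left commented out):
# send_test_message(SMTP_SERVER, build_test_message(SENDER, RECIPIENT))
```

If the relay refuses the connection or the recipient, the exception raised by smtplib usually points to the relay settings from step 6.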

Alarms_2

I have chosen the alarms below as examples; you can configure whichever ones interest you.

Datastore Usage on disk

A very useful alert that monitors the utilization of a VMFS datastore. The default thresholds are Warning at 75% utilization and Alert at 85% utilization. I am going to keep the same thresholds, assuming they are VMware recommendations.
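
To make the thresholds concrete, here is a small Python sketch of how a datastore's usage maps to the Warning and Alert states. The function name and the sample capacities are made up for illustration, and it assumes the alarm fires when usage is strictly above a threshold; it is not how vCenter itself is implemented.

```python
WARNING_PCT = 75.0  # default Warning threshold mentioned above
ALERT_PCT = 85.0    # default Alert threshold mentioned above


def datastore_status(capacity_gb: float, free_gb: float,
                     warning_pct: float = WARNING_PCT,
                     alert_pct: float = ALERT_PCT) -> str:
    """Return 'normal', 'warning' or 'alert' for a datastore's disk usage."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    used_pct = (capacity_gb - free_gb) / capacity_gb * 100.0
    # Assumption: the condition is "is above", so exactly 75% stays normal.
    if used_pct > alert_pct:
        return "alert"
    if used_pct > warning_pct:
        return "warning"
    return "normal"


# Hypothetical 1000 GB datastore with 200 GB free (80% used).
print(datastore_status(1000, 200))
```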

Alarms_3

Alarms_4

Alarms_5

VM Has Snapshots

Another important alarm that I use is a snapshot alarm. It's a custom alarm, and below are the steps to create it. Right-click on a blank area of the Alarm Definitions page and click New Alarm.

In the trigger type, select VM Snapshot Size.

Set the trigger conditions for Warning and Alert, and put your email ID in the reporting actions so you know if a snapshot exceeds a particular size. Please note that if you create a VM snapshot that includes memory (a .vmsn file), this alert counts that file in the total VM snapshot size.

For example, if the warning is set to 10 GB and you create a snapshot with memory of a VM that has more than 10 GB of RAM, the alert triggers immediately.
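
The memory effect can be sketched in a few lines of Python. This is only an illustration: the function name, the 20 GB Alert threshold, and the assumption that the memory file is roughly the size of the VM's RAM are mine, not values from vCenter.

```python
def snapshot_alarm_state(delta_gb: float, memory_gb: float,
                         warning_gb: float, alert_gb: float) -> str:
    """Classify total snapshot size (disk delta plus saved memory) against thresholds."""
    total_gb = delta_gb + memory_gb
    if total_gb > alert_gb:
        return "alert"
    if total_gb > warning_gb:
        return "warning"
    return "normal"


# A fresh snapshot (no disk delta yet) of a VM with 12 GB RAM, memory included.
print(snapshot_alarm_state(delta_gb=0, memory_gb=12, warning_gb=10, alert_gb=20))

# The same snapshot taken without memory stays below the Warning threshold.
print(snapshot_alarm_state(delta_gb=0, memory_gb=0, warning_gb=10, alert_gb=20))
```

So if you mainly care about delta growth, either snapshot without memory or set the thresholds above the RAM size of your largest VMs.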

Alarms_6

Alarms_7

Some other useful default alarms are listed below; they can be configured similarly.

  1. Virtual Machine Consolidation Needed status
  2. VM CPU/Memory Usage
  3. vSphere HA Virtual Machine Monitoring Action
  4. ESXi host CPU/Memory Usage
  5. Host Connection and Power state
  6. Network Connectivity redundancy lost
  7. Network Connectivity lost
  8. Thin Provisioned Volume Capacity threshold exceeded

If you have any questions about configuring these alarms, please comment on this post and I will get back to you.

For anyone who does not have vRealize Operations Manager, these alarms can be really helpful. In the future I will be posting blogs about monitoring with vRealize Operations Manager.

Have a nice day!

VDP Stuck in Admin State and All Backups Failing Due to Imbalanced Usage of Data Volumes

I recently came across an issue where my VDP appliance went into Admin state and all my backups failed.

It's a version 6.1.4.30 (Major Version) appliance with a dedupe capacity of 546 GB, created using 3 * 256 GB drives.

This happened after more VMs were added to the backup job. One of those VMs, apparently the Symantec server, consumed a lot of space in the dedupe store.

The night the backup ran, I got the alert below from VDP:

The vSphere Data Protection storage is nearly full. You can free the space on the appliance by manually deleting the unnecessary or older backups, and modifying the retention policies of the backup jobs to shorten the backup retention time.
————————————————————————————–

Current VDP storage usage is 84.15%.

————————————————————————————–

The usage kept increasing until the alert said "Current VDP storage usage is 96.92%". Obviously I should have looked into it then, but I missed it, and the appliance went into read-only state.

The vSphere Data Protection storage is full. The appliance runs in the read-only mode till additional space is made available. You can free the space on the appliance by manually deleting unnecessary or older backups only.

status.dpn

Wed Sep 20 12:52:30 GST 2017  [dxb02-vlp-vdp1.dib.ae] Wed Sep 20 08:52:30 2017 UTC (Initialized Wed Apr 19 09:23:11 2017 UTC)

Node   IP Address     Version   State   Runlevel  Srvr+Root+User Dis Suspend Load UsedMB Errlen  %Full   Percent Full and Stripe Status by Disk

0.0  172.22.250.109 7.2.80-129  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.63 5563  3463473  47.3%  47%(onl:574) 47%(onl:572) 47%(onl:572)

Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable

System ID: 1492593791@00:50:56:AA:50:57

All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0hpu)

System-Status: ok

Access-Status: Admin

From the log: /space/avamar/var/mc/server_log/mcserver.log.0

avtar Info <17844>: – Server is in read-only mode due to Diskfull

avtar Info <17972>: – Server is in Read-only mode.

dpnctl status

dpnctl: INFO: gsan status: degraded

The first step I took was to manually delete large, old restore points for some clients to free up space. The appliance's dedupe usage came down to 82%, and I expected the backup jobs to run that night after garbage collection ran during the scheduled maintenance window.

The appliance did not return to Full Access state even after the old backups were deleted and the dedupe usage dropped to 67.5%.

The output of df -h shows that my data volumes data01, data02, and data03 are not used proportionately.

admin@dxb02-vlp-vdp1:~/>: df -h

Filesystem      Size  Used Avail Use% Mounted on

/dev/sda2        32G  6.1G   24G  21% /

udev            2.9G  148K  2.9G   1% /dev

tmpfs           2.9G     0  2.9G   0% /dev/shm

/dev/sda1       128M   37M   85M  31% /boot

/dev/sda7       1.5G  187M  1.2G  14% /var

/dev/sda9       138G  7.9G  123G   7% /space

/dev/sdb1       256G  173G   84G  68% /data01

/dev/sdc1       256G  170G   86G  67% /data02

/dev/sdd1       256G  143G  114G  56% /data03
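
To put a number on the imbalance, here is a small Python sketch that reads the Use% column for the data volumes from the df -h output above and reports the gap between the fullest and emptiest volume. The function names are made up, and taking the max-minus-min difference is my assumption about how to compare against the threshold, not a description of how the appliance calculates it internally.

```python
import re

# The three data volume lines copied from the df -h output above.
DF_OUTPUT = """\
/dev/sdb1       256G  173G   84G  68% /data01
/dev/sdc1       256G  170G   86G  67% /data02
/dev/sdd1       256G  143G  114G  56% /data03
"""


def data_volume_usage(df_text: str) -> dict:
    """Map each /dataNN mount point to its Use% as an integer."""
    usage = {}
    for line in df_text.splitlines():
        fields = line.split()
        # df columns: Filesystem, Size, Used, Avail, Use%, Mounted on
        if len(fields) >= 6 and re.fullmatch(r"/data\d+", fields[5]):
            usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


def usage_spread(usage: dict) -> int:
    """Difference in percentage points between the fullest and emptiest volume."""
    return max(usage.values()) - min(usage.values())


usage = data_volume_usage(DF_OUTPUT)
print(usage, "spread:", usage_spread(usage))
```

For the output above the spread is 12 percentage points (68% on data01 versus 56% on data03), which is above 10%.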

To fix this, set the freespaceunbalance value to a higher percentage, depending on the difference in usage you observe.

Steps:

  • Stop the maintenance scheduler: “avmaint sched stop --ava”
  • Create a checkpoint: “avmaint checkpoint --ava”. Allow it to finish before moving ahead.
  • Perform a rolling integrity check: “avmaint hfscheck --rolling --ava”. Allow it to finish before moving ahead.
  • Verify the checkpoint using “cplist”
  • Create another checkpoint: “avmaint checkpoint --ava”
  • Check the current percentage utilization of each data volume.
  • If the difference is more than 10% (the default value), run the command below.
  • “avmaint config --ava freespaceunbalance=20”
  • Check the status again using status.dpn
  • Start the maintenance scheduler again: “avmaint sched start --ava”

If the appliance status does not switch to Full Access mode after changing the value, increase it to 30% or more as needed.

Let the appliance go through the maintenance window. As you configure more backups, the dedupe space is expected to be used evenly across all data volumes.

If the issue persists, open a ticket with VMware Support.

Happy Troubleshooting