Capacity planning using vRealize Operations Manager Version 6.x.

Difference between Demand based Model and Allocation Based Model

In this article I share a few points that might help you get a better understanding of these capacity planning models.

The Demand Model element:

If you adopt a demand-based policy, the capacity planning engine considers only the resources that are actually being used, or demanded, as consumed. So, if your 4 vCPU, 8 GB RAM, 40 GB disk VM is only using 2 vCPUs, 5 GB RAM, and 20 GB of disk space, the remaining resources are reported as capacity still available.
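As a sketch of that arithmetic (illustrative Python, not vROps code), using the same VM figures:

```python
# Demand model: only what the VM actually uses counts as consumed.
# Figures match the example VM above (4 vCPU / 8 GB RAM / 40 GB disk).
provisioned = {"vcpu": 4, "ram_gb": 8, "disk_gb": 40}
demand = {"vcpu": 2, "ram_gb": 5, "disk_gb": 20}

# Capacity still reported as available under a demand-based policy.
remaining = {k: provisioned[k] - demand[k] for k in provisioned}
print(remaining)  # {'vcpu': 2, 'ram_gb': 3, 'disk_gb': 20}
```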

The Allocation Model element:

Determines how vRealize Operations Manager calculates capacity when you allocate a specific amount of CPU, memory, and storage resources to clusters.

Allocation calculations are based on a chosen ratio or percentage of the actual physical resources:

For CPU it is the vCPU-to-pCPU ratio
For memory it is overcommitment
For storage it is thin provisioning

Questions like "Which policy is ideal for me?", "What is the ideal vCPU:pCPU ratio if I am using the allocation-based model?", and "Is memory overcommitment recommended for a production environment?" are all design questions, and the answers differ from environment to environment.

Customers have to work out a vCPU:pCPU ratio that fits their environment. A practical approach is to start with a high ratio like 8:1 and monitor the stats; if stress comes into the picture, the ratio can be decreased to something like 5:1, and so on.
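The allocation math itself is simple. A sketch with a hypothetical two-host cluster (16 physical cores per host; the numbers are made up for illustration):

```python
# Allocation model for CPU: capacity = physical cores x allowed ratio.
pcpu = 2 * 16  # hypothetical cluster: 2 hosts x 16 pCPU cores

def vcpu_capacity(ratio):
    """Total vCPUs the cluster may allocate at a given vCPU:pCPU ratio."""
    return pcpu * ratio

print(vcpu_capacity(8))  # 256 vCPUs allowed at 8:1
print(vcpu_capacity(5))  # 160 vCPUs after tightening the ratio to 5:1
```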

Example showing difference in capacity remaining with Allocation and demand models

pic1

Below are the four default vRealize Operations policies available for customers to do capacity planning, although a custom policy can also be created from scratch.

VMware Production Policy (Demand only)
Optimized for production loads, without using allocation limits, to obtain the most capacity.

VMware Production Policy (with Allocation)
Optimized for production loads that require the demand and allocation capacity models.
No overcommitment for memory and storage; overcommitment available for CPU.

VMware Production Policy (without Allocation)
Optimized for production loads that require demand capacity models, and provides the highest overcommit without contention.
Overcommitment available for CPU, memory, and storage.

VMware Test and Dev Policy (without Allocation)
Optimized for Dev and Test environments to maximize capacity without causing significant contention, because it does not include capacity planning at the virtual machine level.
The key difference between these policies comes down to whether overcommitment is allowed, and if so, whether it is allowed for CPU, memory, storage, or all three.

A breakdown of each policy's capacity analysis can be viewed to decide which metrics to include when tweaking the policy to your needs. The accuracy and value of capacity management views and reports rely on an administrator's ability to correctly translate their environmental requirements into policies.

To navigate to these policies and to check or edit the metrics covered:

pic2

pic4

pic3

pic5

Spend time and dig deep… this could be a one-time investment with a lot of returns for a vSphere admin.

Setup vRealize Log Insight- Deployment & Integration

This article is about setting up vRealize Log Insight and integrating it with vSphere and vRealize Operations Manager.
Log Insight provides powerful real-time log management for VMware environments, with machine-learning-based intelligent grouping and fast search. This allows for swift troubleshooting and better analytics across physical and virtual environments.

It is only a matter of exploring the options the tool gives for log analysis, and you will enjoy using it.

The deployment can be broken down into the below steps.

  • Pre-requisites
  • Deploy the Appliance
  • Configuration
  • Integration with vSphere and vROPS

Pre-Requisites

Log Insight accepts data from sources (virtual, physical, or cloud) that use the syslog protocol, sources that write logs and can run the vRealize Log Insight agent, and sources that can post data over HTTP/HTTPS through the REST API.

Ports for syslog feeds: 514 (UDP), 514 (TCP), and 1514 (TCP with SSL).

log1
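A quick way to confirm that a source can reach the appliance on one of the TCP syslog ports is a simple socket check; below is a small sketch (the hostname is a placeholder, and note that UDP 514 cannot be verified this way):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical appliance address; replace with your Log Insight FQDN.
# port_open("loginsight.example.com", 514)   # plain TCP syslog
# port_open("loginsight.example.com", 1514)  # TCP with SSL
```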

The appliance can be deployed in four configurations based on the sizing requirements. The chosen configuration decides the amount of compute and storage resources the appliance requires.

log2.jpg

The sizing calculator at the link below will help you determine the right configuration:

http://www.vmware.com/go/loginsight/calculator

Deploying the Appliance

Log in to the vSphere Web Client once the .ova file is available.

Deploy OVF Template from vCenter and select Local file.

Browse and Select the .ova file

Click Next

Name the appliance and select location

Click Next

Select Resource and Review Details

OVF_1

Click Next

OVF_2

Accept the EULA and click Next

OVF_3

Choose the Configuration and Click Next

OVF_4

Select Storage and click Next

OVF_5

Select Network and click Next

OVF_6

OVF_7

Provide Network and Other Properties

OVF_8

Review and Click Finish.

Configuration

The initial configuration wizard is available after the appliance deployment at:

https://vRealizeLogInsight.hostname

On the Welcome screen, click Next

Config_1

Click Start New Deployment, unless you are deploying this as a second node joining an existing Log Insight instance to form a cluster (HA)

Config_2

Provide email ID and admin password

Config_3

Provide License Key.

Config_4

 

Admin email for Notifications

Config_5

SMTP configurations

Config_6

Finish

Config_7

Integration with vSphere and vRealize Operations Manager

Login to https://vRealizeLogInsight.hostname

Go to Administration.

Click vSphere under Integration

vSphere2

Provide vCenter FQDN and Credentials.

vSphere3

It is recommended to create a custom user for the integration, with the below privileges assigned at the vCenter root:

  • Configuration.Change settings
  • Configuration.Network configuration
  • Configuration.Advanced settings
  • Configuration.Security profile and firewall

Once configured, you will notice that the syslog setting and firewall configuration have been applied on the ESXi hosts.

vSphere4

Go back to the Administration page and click vRealize Operations under Integration.

vrops1

This allows you to access the Log Insight dashboard from the vRealize Operations Manager page.

vrops2

I would also suggest going through the below link for a quick overview of searching and filtering events. You will need this because a big infrastructure generates a lot of events, and you will have to find what you are looking for.

https://docs.vmware.com/en/vRealize-Log-Insight/4.5/com.vmware.log-insight.user.doc/GUID-142258C3-B056-4D82-BD34-8E1A2E7A5093.html

vRealize Operations Manager Custom Views and Reports

 

I have been using vRealize Operations Manager for quite some time now. There is a huge amount of data available to the admin in the form of views, reports, heat maps, etc., giving a very clear picture of the vSphere environment.
As much as I like the solution, I felt the need for some custom views and reports for my own ease.
Also, it is nice to make use of such options and utilize the Enterprise license features.

I recently created a custom view and report to monitor snapshot usage and age.

I will continue to create custom view/reports and update this page accordingly.

Custom Report for Snapshot size of the VMs

Every report is based on a view. If you want to create a custom report, you can either use one or more of the many existing views in vROps or create a custom view of your own.

A view can be defined as a container that takes data from a subject and presents it in a format desired by the admin.

The view created below for snapshot usage illustrates these terms.

Subject is the entity for which the data is required. It can be a VM, ESXi host, Datastore or any other vSphere inventory object.

Data is the metrics and properties of the object that you want to see. It can be I/O, memory usage, VMware Tools status, packet drops, or any statistic of an object that can be measured.

Custom View

As an example, if you select VM as the subject you have two types of data available:

  • Metrics: If you select this you can list resource usage like CPU, memory, disk space, etc.
  • Properties: If you select this you can list properties of the VM like memory reservation, VMware Tools status, policy, custom tags, etc.

In the view for snapshot usage select the subject as Virtual Machine

In Data, select Metrics and choose Snapshot Space (GB) under Disk Space. Add a filter for Snapshot Space greater than zero to list only the VMs which have snapshots; otherwise the view would list all VMs with a snapshot size of 0 GB.
To include snapshot age, switch to Properties under Data and select Snapshot Age.

Snapshot Size

Snapshot Age

This view now lists all the VMs with a snapshot size greater than zero, along with the snapshot age.

We can use this view to create a custom report which can be emailed to the admin on a schedule.

vRealize Operations Manager upgrade

I recently upgraded vRealize Operations Manager from 6.0.1 to 6.6.1. Although the steps are pretty simple, I thought I would share them.

The upgrade was done in two stages, which is the required upgrade path per VMware.

Stage 1: Upgrade from 6.0.1 to 6.3.1
Stage 2: Upgrade from 6.3.1 to 6.6.1

As I am using the vROps appliance, I downloaded the below files.

vRealize_Operations_Manager-VA-OS-6.3.1.5571307.pak
vRealize_Operations_Manager-VA-6.3.1.5571307.pak

vRealize_Operations_Manager-VA-OS-6.6.1.6163041.pak
vRealize_Operations_Manager-VA-6.6.1.6163041.pak

It is recommended to take a VM snapshot with memory, and to avoid quiescing, before you begin.

Take the cluster offline and proceed with the upgrade; as long as the cluster is offline there will be no data collection. You might want to export any custom policies, alerts, etc. before you upgrade so that you can import them back after the upgrade has been successful.

Each stage has two parts.

  • Operating System Upgrade
  • Upgrade Virtual Appliance

The first file to be uploaded is the OS .pak file. If the upload reports that the file has already been installed, you can check the option to install the PAK file even if it is already installed.

It is critical that the OS upgrade is applied first and then the appliance upgrade.
If the appliance upgrade is done first, you will have to revert the snapshot or follow the below KB.

http://kb.vmware.com/kb/2127135

The upgrade is performed from Software Update in the admin console.

https://<vrops_IP>/admin

pic1

pic2

pic4

pic6

The appliance reboots multiple times throughout the whole upgrade process.

pic7

pic8

After upgrade is successful, login to the user interface and explore the new dashboards and other features.

Among the multiple new features and the new look and feel of the product, you will notice that the plugin in the vSphere Web Client, which redirected to vRealize Operations for a vSphere object, has disappeared.

It’s something which has been removed in vRealize Operations Manager 6.6

Check the release notes below:

https://docs.vmware.com/en/vRealize-Operations-Manager/6.6/rn/vrops-66-release-notes.html

vCenter MOB after upgrade (reporting the old version)

pic9

Follow the article below to remove the plugin from the vCenter MOB:

https://kb.vmware.com/s/article/2150394 

Hopefully VMware will put the feature back, as it was quite handy…

Replace vRealize Operations Manager SSL certificates.

This article shares the steps performed to replace and configure a custom (CA-signed) SSL certificate for vRealize Operations Manager. The replaced certificate applies only to the web UI component of the solution and is used to secure communication with clients over the user interface.

I have vRealize 6.6.1 build 6163035 in my lab.

Below are the requirements for setting up custom SSL certificates for vRealize Operations manager.

The requirements, along with the certificate replacement link, can be found in the vRealize Operations Manager admin console:

https://<vRealizeOperationsManagerIP>/admin

installpage

There are two steps in the procedure:

  1. Generating the SSL certificates with OpenSSL and the CA.
  2. Installing the generated .pem file.

OpenSSL version Win32OpenSSL_Light-1_1_0g was used.

  • Create a folder to place all your Certificate related files.

“C:\Users\hussain\Desktop\vsom”

  • Create the config file as below (change the req_distinguished_name and v3_req sections as per your appliance).

Openssl.cfg file:

 

 [ req ]
default_bits = 2048
default_keyfile = rui.key
distinguished_name = req_distinguished_name
encrypt_key = no
prompt = no
string_mask = nombstr
req_extensions = v3_req
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = digitalSignature, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = DNS:vsom.mylab.local, DNS:192.168.1.111, DNS:vsom
[ req_distinguished_name ]
countryName = AE
stateOrProvinceName = DXB
localityName = DUBAI
0.organizationName = mylab.local
organizationalUnitName = IT
commonName = vsom.mylab.local
  • openssl req -new -nodes -out  C:\Users\hussain\Desktop\vsom\vsom.csr -keyout C:\Users\hussain\Desktop\vsom\vsom-orig.key -config C:\Users\hussain\Desktop\vsom\openssl.cfg
  • openssl rsa -in C:\Users\hussain\Desktop\vsom\vsom-orig.key -out C:\Users\hussain\Desktop\vsom\vsom.key
  • Submit the generated csr to the CA and download the certificate in Base-64 format.
  • Download the root certificate of the CA.
  • Generate the .pem file containing the .cer, the root certificate and private key.
    type C:\Users\hussain\Desktop\vsom\vsom.cer C:\Users\hussain\Desktop\vsom\vsom.key C:\Users\hussain\Desktop\vsom\root.cer > vsom.pem
  • The generated .pem file looks something like this

pem
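For reference, the same concatenation can also be done with a short, portable script instead of the Windows `type` command; a sketch (the helper name and paths are illustrative, and the order leaf certificate, private key, CA root matches the command above):

```python
from pathlib import Path

def build_pem(cert, key, root, out):
    """Concatenate the certificate, private key, and CA root into one
    .pem file, in the same order as the `type` command above."""
    pem = "".join(Path(p).read_text() for p in (cert, key, root))
    Path(out).write_text(pem)
    return pem

# build_pem("vsom.cer", "vsom.key", "root.cer", "vsom.pem")
```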

2) Install the generated pem file using the admin UI.

cert

Although the process is pretty straightforward, I encountered an error due to an incorrect time configuration on the vROps appliance. Once the date and time of the appliance were correctly set, the certificate could be replaced.

error: Certificate is not yet valid

error

Helpful links:

https://kb.vmware.com/s/article/2046591 

https://kb.vmware.com/s/article/2108686 

 

 

VMFS locking and ATS Miscompare issues

Brief about VAAI

By now almost all VMware techies are aware of what VAAI is; however, I will add a little brief for those who are not.

VMware vSphere Storage APIs – Array Integration (VAAI), also referred to as hardware acceleration or hardware offload APIs, are a set of APIs to enable communication between VMware vSphere ESXi hosts and storage devices. The APIs define a set of “storage primitives” that enable the ESXi host to offload certain storage tasks like cloning, zeroing to the storage array and improve performance.

The goal of VAAI is to help storage vendors provide hardware assistance to speed up VMware I/O operations that are more efficiently accomplished in the storage hardware.

Most new arrays (FC/iSCSI/NAS) that support vSphere 5 and later usually also support VAAI; however, this can be verified against the VMware HCL.

https://vmware.com/go/hcl

The fundamental operations are controlled by these advanced settings:

HardwareAcceleratedLocking: Atomic Test & Set (ATS), which is used during creation of files on the VMFS volume
HardwareAcceleratedMove: Clone Blocks/Full Copy/XCOPY, which is used to copy data
HardwareAcceleratedInit: Zero Blocks/Write Same, which is used to zero out disk regions

 

This is a list of commonly used SCSI opcodes related to VAAI operations:

0x93 WRITE SAME(16)
0x41 WRITE SAME(10)
0x42 UNMAP
0x89 SCSI COMPARE and WRITE – ATS
0x83 EXTENDED COPY
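The list above can double as a small lookup table, for example when decoding the hexadecimal Cmd values that show up in vmkernel.log entries (a convenience sketch, not VMware tooling):

```python
# VAAI-related SCSI opcodes from the list above.
VAAI_OPCODES = {
    0x93: "WRITE SAME(16)",
    0x41: "WRITE SAME(10)",
    0x42: "UNMAP",
    0x89: "COMPARE AND WRITE (ATS)",
    0x83: "EXTENDED COPY (XCOPY)",
}

# e.g. a vmkernel.log line reading "Cmd(0x...) 0x89" is an ATS command.
print(VAAI_OPCODES[0x89])  # COMPARE AND WRITE (ATS)
```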

 

NOTE: Check the below link for more info on VAAI

https://www.vmware.com/techpapers/2012/vmware-vsphere-storage-apis-array-integration-10337.html

In a shared storage environment, when multiple hosts access the same clustered filesystem (a VMFS datastore), specific locking mechanisms are used. These locking mechanisms prevent multiple hosts from concurrently writing to the metadata and ensure there is no data corruption.

VMFS supports SCSI Reservations and Atomic Test and Set (ATS) locking.

ATS

ATS is a locking mechanism designed to replace SCSI reservations. Given the amount of SCSI reservation conflict issues on older versions of ESX/ESXi, this was a much-needed feature.

ATS modifies only a disk sector on the VMFS volume whereas a SCSI reservation locks the whole LUN. When successful, it enables an ESXi host to perform a metadata update on the volume. This includes allocating space to a VMDK during provisioning, because certain characteristics must be updated in the metadata to reflect the new size of the file. The introduction of ATS addresses the contention issues with SCSI reservations and enables VMFS volumes to scale to much larger sizes. ATS has the concept of a test-image and set-image. So long as the image on-disk is as expected during a “compare”, the host knows that it can continue to update the lock.

A change in the VMFS heartbeat update method was introduced in ESXi 5.5 Update 2 to help optimize the VMFS heartbeat process. This meant a significant increase in the volume of ATS commands issued by the ESXi kernel, resulting in increased load on the storage system. Under certain circumstances, a VMFS heartbeat using ATS may fail with an ATS miscompare, which causes the ESXi kernel to verify its access to VMFS datastores again.

In this case, a heartbeat I/O (1) timed out and VMFS aborted it, but before the abort, the I/O (the ATS "set") actually made it to the disk. VMFS then retried the ATS using the original "test" image from step (1), since the previous attempt was aborted and the assumption was that the ATS never reached the disk. Because the "set" had in fact made it to the disk before the abort, the ATS "test" found that the in-memory and on-disk images no longer matched, so the array returned an "ATS miscompare". When an ATS miscompare is received, all outstanding I/O is aborted with host sense 8 (H:0x8, SCSI reset). This placed additional stress and load on the storage arrays and degraded performance.
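The sequence can be modeled with a toy compare-and-write sketch (purely illustrative; the heartbeat image names are made up):

```python
# Toy model of the failure above: the "set" reaches the disk, the abort
# hides the success from the host, and the retry then miscompares.
class Lun:
    def __init__(self, hb_image):
        self.on_disk = hb_image

    def ats(self, test, new):
        """COMPARE AND WRITE: update only if the on-disk image matches."""
        if self.on_disk != test:
            return "ATS miscompare"
        self.on_disk = new
        return "ok"

lun = Lun(hb_image="HB-0")

# 1. The first heartbeat ATS actually reaches the disk...
lun.ats("HB-0", "HB-1")
# ...but the host aborted the timed-out I/O and never saw the success.

# 2. The retry still uses the original test image, so it miscompares.
print(lun.ats("HB-0", "HB-1"))  # ATS miscompare
```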

Some EMC and IBM arrays have known issues, and the recommendation there is to disable the functionality (use of ATS for the VMFS heartbeat) on all hosts accessing the affected set of LUNs.

https://kb.vmware.com/s/article/2113956
Below are sample log messages from the vmkernel logs of ESXi hosts, and events in vCenter, confirming the issue.

In the /var/run/log/vobd.log file and Virtual Center Events, you see the VOB message:

Lost access to volume <uuid><volume name> due to connectivity issues. Recovery attempt is in progress and the outcome will be reported shortly

In the /var/run/log/vmkernel.log file, you see the message:

ATS Miscompare detected between test and set HB images at offset XXX on vol YYY

In the /var/log/vmkernel.log file, you see similar error messages indicating an ATS miscompare:

2015-11-20T22:12:47.194Z cpu13:33467)ScsiDeviceIO: 2645: Cmd(0x439dd0d7c400) 0x89, CmdSN 0x2f3dd6 from world 3937473 to dev “naa.50002ac0049412fa” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0.

Disable ATS on VMFS5 and VMFS6 datastores:

esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5

Disable ATS on VMFS3 datastores:

esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS3

To review the results of changing options, run this command:

esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS3
esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5

You see output similar to:
Path: /VMFS3/UseATSForHBOnVMFS3
Type: integer
Int Value: 0 <— Check this value
Default Int Value: 0
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Use ATS for HB on ATS supported VMFS3 volumes
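If you capture that output (e.g. over SSH), the value can be pulled out programmatically; a small parsing sketch based on the sample output above:

```python
def int_value(esxcli_output):
    """Extract the 'Int Value' field from `esxcli system settings
    advanced list` output captured as text."""
    for line in esxcli_output.splitlines():
        if line.strip().startswith("Int Value:"):
            return int(line.split(":")[1].split()[0])
    raise ValueError("Int Value line not found")

sample = """Path: /VMFS3/UseATSForHBOnVMFS3
Type: integer
Int Value: 0
Default Int Value: 0"""
print(int_value(sample))  # 0 means the ATS heartbeat is disabled
```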

Some additional helpful links:

https://kb.vmware.com/s/article/2146451

https://storagehub.vmware.com/#!/vsphere-storage/vmware-vsphere-apis-array-integration-vaai-1/atomic-test-set-ats/1

https://docs.vmware.com/en/VMware-vSphere/6.0/com.vmware.vsphere.storage.doc/GUID-DE30AAE3-72ED-43BF-95B3-A2B885A713DB.html

https://kb.vmware.com/s/article/52486 

Options to copy files from VM Guest to Client PC or vice versa

Not many of us use the clipboard option available in the vSphere Client to copy and paste files across VMs.

I just wanted to share a couple of options that can be used other than normal RDP or SMB.

  1. Enable clipboard copy in vSphere Client
  • Log in to a vCenter Server system using the vSphere Client and power off the virtual machine.
  • Select the virtual machine and click the Summary tab.
  • Click Edit Settings.
  • Navigate to Options > Advanced > General and click Configuration Parameters.
  • Click Add Row.
  • Type these values in the Name and Value columns:

 

isolation.tools.copy.disable        FALSE
isolation.tools.paste.disable       FALSE

To enable the same across all VMs on a host:

  • Log in to the ESX/ESXi host as a root user.
  • Take a backup of the /etc/vmware/config file.
  • Open the /etc/vmware/config file using a text editor.
  • Add these entries to the file:

vmx.fullpath = "/bin/vmx"
isolation.tools.copy.disable = "FALSE"
isolation.tools.paste.disable = "FALSE"

  • Save and Close the file.

The VM has to be restarted for the changes to take effect.
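The host-wide edit can also be scripted. A sketch (the idempotency check is my own addition; back up /etc/vmware/config first, as the steps above say):

```python
# Append the clipboard entries to the VMX config file if missing.
ENTRIES = [
    'isolation.tools.copy.disable="FALSE"',
    'isolation.tools.paste.disable="FALSE"',
]

def enable_clipboard(path="/etc/vmware/config"):
    with open(path) as f:
        existing = f.read()
    # Only append entries that are not already present (idempotent).
    missing = [e for e in ENTRIES if e not in existing]
    if missing:
        with open(path, "a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write("\n".join(missing) + "\n")
```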

2) Copy-VMGuestFile

This can be used with PowerCLI (32-bit), and it is very helpful if you want to move files to or from VMs.

Especially for VMs which are in a DMZ, where clipboard or RDP access is disabled for security reasons.

Copy-VMGuestFile -Source c:\test.txt -Destination c:\temp\ -VM myVM -GuestToLocal -HostUser root -HostPassword pass1 -GuestUser user -GuestPassword pass2

Make sure you are connected to vCenter using the FQDN.

Copy-VMGuestFile uses the vSphere API and follows the below procedure.

CopyVMGuestFIle2

  1. A vSphere Web services client program calls a function in the vSphere API.
  2. The client sends a SOAP command over https (port 443) to vCenter Server.
  3. The vCenter Server passes the command to the host agent process hostd, which sends it to VMX.
  4. VMX relays the command to VMware Tools in the guest.
  5. VMware Tools has the Guest OS execute the guest operation.

The same cmdlet can be used to copy files from clients to VM.

Copy-VMGuestFile -VM myVM -LocalToGuest -Source "C:\temp\test.txt" -Destination "c:\test" -GuestUser "GuestAccountDetails" -GuestPassword "password"

RHEL Clustering in vSphere with fence_vmware_soap as fencing device.

This article is a simple guide for configuring RHCS on vSphere 6 using shared disks, with VMware fencing via the fence_vmware_soap device in Red Hat Cluster Suite.

# The cluster node VMs can use RDMs (physical or virtual), or they can use shared VMDKs with the multi-writer option enabled for the shared SCSI disks.

Below is my Cluster node configuration:

  • RHEL6
  • 4vCPU and 12 GB RAM
  • 80 GB Thick Provisioned Lazy Zeroed disk for OS.
  • 5GB Shared Quorum on Second SCSI controller shared physical (scsi1:0 = “multi-writer” scsi1:1 = “multi-writer”)
  • 100GB shared Data drive for Application
  • Single Network card

After creating node 1, cln1.mylab.local, I cloned the machine to create cln2.mylab.local, assigned a new IP, and added the shared resources.

I added the quorum drive and data drive on node 1, then added the same drives on node 2 by using the Add Existing Hard Disk option in vCenter.

As I wanted to keep my nodes on separate physical servers (ESXi hosts) to provide hardware failure resiliency, I had to use physical bus sharing. As my quorum and data drives are not shared SAN LUNs but VMDKs, I had to enable the SCSI multi-writer flag in the VMX advanced configuration for the shared SCSI devices. This is because, although VMFS is a clustered file system, at any given point a VMDK can generally be accessed by only one powered-on VM.

Also make sure ctk (changed block tracking) is disabled for the VM. Check the linked KB.

http://kb.vmware.com/kb/2110452

From the Conga GUI in RHEL, follow the instructions mentioned in the below KBs to create a cluster, add the cluster nodes, and add the VMware fence device. This article will make a lot more sense once you go through the Red Hat KBs.

For someone who is new to fencing, the below explanation from Red Hat is excellent.

A key aspect of Red Hat cluster design is that a system must be configured with at least one fencing device to ensure that the services that the cluster provides remain available when a node in the cluster encounters a problem. Fencing is the mechanism that the cluster uses to resolve issues and failures that occur. When you design your cluster services to take advantage of fencing, you can ensure that a problematic cluster node will be cut off quickly and the remaining nodes in the cluster can take over those services, making for a more resilient and stable cluster.

After the cluster creation, a GFS file system has to be created to use the shared storage for the cluster. A RHEL cluster is a mandatory requirement for creating the clustered file system, GFS.

https://access.redhat.com/solutions/63671

https://access.redhat.com/node/68064

cluster1

Run the clustat command to verify the cluster creation.

Configure the nodes by adding VM node details, Guest name and UUID

cluster2

cluster3

In the shared fence device option under the Cluster tab, provide the vCenter Server details and the account used for fencing.

Hostname: DNS name of your vCenter

Login: the fencing account created. I preferred to create a domain account (fence@mylab.local), as my vCenter is Windows-based, and to grant specific permissions to that domain account.

A vCenter role dedicated to the fencing task was created and assigned to the fence@mylab.local user. The role requires permissions to perform VM power operations.

cluster4

Run the below command to list the guest name and UUID.

fence_vmware_soap -z -l “fence@mylab.local” -p mypasswd -a vcenter.mylab.local -o list

cln1.mylab.local, 5453d1874-b34f-711d-4167-3d9ty3f24647

cln2.mylab.local, 5643b341-39fc-1383-5e6d-3a71re4c540d

The cluster is now ready to be tested. If you encounter any issues with a particular node, you can expect the fencing device to shut it down to avoid further problems.

Leave a comment if you have queries….

Some of Basic and Important vCenter Alerts.

vCenter 6 comes with a lot of default alarms, and I will list the configuration of a few of them that can be very handy.

In the vSphere C# client, when you select vCenter and go to the Alarms tab, it displays all the alarms vCenter has to offer.

You can also select an individual vCenter object like a VM, datastore, host, cluster, DVS, or port group and go to Alarm Definitions to find out what alarms are available for that particular object.

Example: test1 is a VM and datastore1 is a vmfs datastore

Alarms

Alarms_1

The best way to make use of these alarms is to have vCenter send email notifications to the admin.

vCenter needs to be configured for email notifications; below are the steps.

To configure email notifications for an alarm:

  1. Log in to vCenter Server.
  2. Click the Administration tab and select vCenter Server Settings.
  3. Select the Mail option.
  4. For the SMTP Server option, enter the IP address or the DNS name of the email/exchange server to which the alert notification must be sent.
  5. For the Sender Account, enter the email address from which the alert must be sent.
  6. The vCenter IP needs to be configured in the SMTP relay settings of the Exchange server.

Alarms_2

I have chosen the below alarms as examples; however, you can configure the ones that interest you.

Datastore Usage on disk

A very useful alarm that monitors the utilization of a VMFS datastore. The defaults are a warning at 75% utilization and an alert at 85% utilization. I am going to go with the same thresholds, assuming they are VMware's recommendations.

Alarms_3

Alarms_4Alarms_5

VM Has Snapshots

Another important alarm that I use is a snapshot alarm. It is a custom alarm, and below are the steps to create it. Right-click on the blank area of the Alarm Definitions page and click New Alarm.

In the trigger type select VM Snapshot Size.

Set the trigger conditions for Warning and Alert, and put your email ID in the reporting actions so you know when a snapshot exceeds a particular size. Please note that if you create a VM snapshot with a .vmsn file (including memory), this alarm counts it in the total VM snapshot size.

For example, if the warning is set to 10 GB and you create a snapshot with memory of a VM that has more than 10 GB of RAM, the alert triggers immediately.
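The arithmetic behind that, as a sketch with hypothetical sizes:

```python
# Why a memory snapshot can trip the alarm immediately: the .vmsn file
# includes guest RAM, which alone may exceed the warning threshold.
warning_gb = 10  # warning threshold from the alarm definition

def snapshot_size_gb(delta_gb, ram_gb, with_memory):
    """Approximate snapshot size as counted by the alarm."""
    return delta_gb + (ram_gb if with_memory else 0)

# 12 GB RAM VM, freshly snapshotted with memory (tiny delta so far):
size = snapshot_size_gb(delta_gb=0.1, ram_gb=12, with_memory=True)
print(size > warning_gb)  # True: the warning fires right away
```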

Alarms_6Alarms_7

Some other useful default alarms include the below, and they can be configured similarly.

  1. Virtual Machine Consolidation Needed status.
  2. VM CPU/Memory Usage
  3. vSphere HA Virtual Machine Monitoring Action
  4. ESXi host CPU/Memory Usage
  5. Host Connection and Power state
  6. Network Connectivity redundancy lost
  7. Network Connectivity lost
  8. Thin Provisioned Volume Capacity threshold exceeded

For any questions regarding these configurations, please comment on my post and I will get back to you.

For someone who does not have vRealize Operations Manager, these alarms can be really helpful. In the future I will be posting blogs about monitoring with vRealize Operations Manager.

Have a Nice Day…!

VDP Stuck in Admin State and All Backups failing due to imbalanced usage of data volumes

I recently came across an issue where my VDP appliance went into Admin state and all my backups would fail.

It is a 6.1.4.30 appliance with a dedupe capacity of 546 GB, created using 3 x 256 GB drives.

This happened after more VMs were added to the backup job. One of those VMs was the Symantec server, which consumed a lot of space in the dedupe store.

The night the backup ran, I got the below alert from VDP:

The vSphere Data Protection storage is nearly full. You can free the space on the appliance by manually deleting the unnecessary or older backups, and modifying the retention policies of the backup jobs to shorten the backup retention time.
————————————————————————————–

Current VDP storage usage is 84.15%.

————————————————————————————–

The usage kept increasing until it reported "Current VDP storage usage is 96.92%." Obviously I should have looked into it then, but I missed it, resulting in the appliance going into a read-only state.

The vSphere Data Protection storage is full. The appliance runs in the read-only mode till additional space is made available. You can free the space on the appliance by manually deleting unnecessary or older backups only.

status.dpn

Wed Sep 20 12:52:30 GST 2017  [dxb02-vlp-vdp1.dib.ae] Wed Sep 20 08:52:30 2017 UTC (Initialized Wed Apr 19 09:23:11 2017 UTC)

Node   IP Address     Version   State   Runlevel  Srvr+Root+User Dis Suspend Load UsedMB Errlen  %Full   Percent Full and Stripe Status by Disk

0.0  172.22.250.109 7.2.80-129  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.63 5563  3463473  47.3%  47%(onl:574) 47%(onl:572) 47%(onl:572)

Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable

System ID: 1492593791@00:50:56:AA:50:57

All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0hpu)

System-Status: ok

Access-Status: Admin

From the log: /space/avamar/var/mc/server_log/mcserver.log.0

avtar Info <17844>: – Server is in read-only mode due to Diskfull

avtar Info <17972>: – Server is in Read-only mode.

dpnctl status

dpnctl: INFO: gsan status: degraded

The first step was to manually delete large, old restore points for some clients to free up space. The appliance dedupe usage came down to 82%, and I was expecting the backup jobs to run that night after letting garbage collection run during the scheduled maintenance window.

The appliance did not return to full-access state even after the old backups were deleted and the dedupe usage had decreased to 67.5%.

The output of df -h shows my data volumes data01, data02, and data03 are not used proportionately.

admin@dxb02-vlp-vdp1:~/>: df -h

Filesystem      Size  Used Avail Use% Mounted on

/dev/sda2        32G  6.1G   24G  21% /

udev            2.9G  148K  2.9G   1% /dev

tmpfs           2.9G     0  2.9G   0% /dev/shm

/dev/sda1       128M   37M   85M  31% /boot

/dev/sda7       1.5G  187M  1.2G  14% /var

/dev/sda9       138G  7.9G  123G   7% /space

/dev/sdb1       256G  173G   84G  68% /data01

/dev/sdc1       256G  170G   86G  67% /data02

/dev/sdd1       256G  143G  114G  56% /data03
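The imbalance can be read straight off that output; a quick sketch using the Use% figures above:

```python
# Use% per data volume, taken from the df -h output above.
usage = {"data01": 68, "data02": 67, "data03": 56}

# The spread between the most- and least-used volumes is what the
# freespaceunbalance setting compares against (10% by default).
spread = max(usage.values()) - min(usage.values())
print(spread)       # 12
print(spread > 10)  # True: exceeds the 10% default
```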

To fix this, set the freespaceunbalance value to a higher percentage, depending on the difference in usage you notice.

Steps:

  • Stop maintenance mode: "avmaint sched stop --ava"
  • Create a checkpoint: "avmaint checkpoint --ava" (allow it to finish before moving ahead)
  • Perform a rolling integrity check: "avmaint hfscheck --rolling --ava" (allow it to finish before moving ahead)
  • Verify the checkpoint using "cplist"
  • Create another checkpoint: "avmaint checkpoint --ava"
  • Check the current percentage utilization of each data volume.
  • If the difference is more than 10% (the default value), run: "avmaint config --ava freespaceunbalance=20"
  • Check the status again using status.dpn
  • Start maintenance mode: "avmaint sched start --ava"

If the appliance status does not switch to full-access mode after changing the value, increase it to 30% or more, accordingly.

Let the appliance go through the maintenance window. Once more backups are configured, the dedupe space is expected to be consumed evenly from all the data volumes.

If the issue persists, open a ticket with VMware Support.

Happy troubleshooting!