2 Replies. Latest reply: Mar 15, 2017 8:14 AM by dlevine4

IaaS Platform - Which metrics and KPIs to use with CI and vSphere infrastructure

mfauvet

Hello users and Experts,

One of our large customers has asked us to help them define which metrics and KPIs they should use to monitor their hardware and software infrastructure and to generate alerts.

Today, they use vROPS and ViPR SRM to generate reports on storage usage, storage and server CPU load, VM load, etc., but they don't want to spend time reading these reports to spot problems. They want to monitor Vblock and vSphere performance and availability and receive an alert when one of the KPIs is no longer within its expected range.


I don't believe they need many KPIs (and metrics) for this, just the right ones. For example, use CPU Ready % for VM CPU instead of CPU Usage.
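The CPU Ready example can be made concrete. vCenter reports CPU Ready as a summation counter in milliseconds per sample interval (20 seconds for real-time charts), so it has to be converted to a percentage before it can serve as a KPI. The sketch below assumes that counter and interval; the function names and the 5% per-vCPU alert threshold are hypothetical (a commonly quoted rule of thumb, not a VMware-defined limit).

```python
# Sketch: convert the vSphere "CPU Ready" summation counter (milliseconds of
# ready time accumulated during one sample interval) into a percentage KPI.
# Assumption: real-time statistics use a 20-second sample interval.

REALTIME_INTERVAL_S = 20


def cpu_ready_percent(ready_ms: float, interval_s: float = REALTIME_INTERVAL_S,
                      num_vcpus: int = 1) -> float:
    """Average CPU Ready % per vCPU over one sample interval."""
    if interval_s <= 0 or num_vcpus <= 0:
        raise ValueError("interval_s and num_vcpus must be positive")
    interval_ms = interval_s * 1000
    # The VM-level counter is summed across vCPUs, so divide by the vCPU count.
    return ready_ms * 100 / (interval_ms * num_vcpus)


def is_ready_alert(ready_pct: float, threshold_pct: float = 5.0) -> bool:
    """Hypothetical alert rule: flag a VM above threshold_pct ready time per vCPU."""
    return ready_pct > threshold_pct


# Example: 4,000 ms of ready time in one 20 s sample on a 2-vCPU VM.
pct = cpu_ready_percent(4000, num_vcpus=2)
print(f"{pct:.1f}% ready per vCPU, alert={is_ready_alert(pct)}")
# prints "10.0% ready per vCPU, alert=True"
```

Unlike CPU Usage, which can be high on a healthy but busy VM, Ready time measures how long a vCPU was ready to run but had to wait for a physical CPU, so it points directly at contention.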

I also think we have some experience with this within Dell EMC: Dell EMC IT, EHC, Virtustream, VMware, etc.


Do you have any customer experience, personal experience, or documentation on this?


Thanks for your help.

Regards,

Mat

  • 1. Re: IaaS Platform - Which metrics and KPIs to use with CI and vSphere infrastructure
    dlevine4

    Hi mfauvet,

     

    We are reaching out to our subject matter experts and will have an answer for you shortly.

     

    Thanks for the question!

  • 2. Re: IaaS Platform - Which metrics and KPIs to use with CI and vSphere infrastructure
    dlevine4

    Hey mfauvet,


    Here is an answer from Anthony Foster:


    "So there are three ways I see to tackle this.

     

    1. They need to redefine why they are using reports. Reports are a point-in-time “everything is good” checklist. You shouldn’t use them for monitoring; you should use them to show auditors and suits that things are happening.
    2. They need PS to come in and build meaningful dashboards for what they are concerned about. vROPS supports this, but it doesn’t sound like they are using it. In other words, hide everything that is working and show only what is out of range or broken (been broke, just broke, will break). If there are out-of-range or broken items they don’t want to see because they are already aware of them, then there needs to be a conversation about why they ignore things the system is flagging as abnormal. Both vROPS and ViPR use intelligent algorithms to calculate ‘normal.’ (If anyone took stats in college, think two standard deviations out and add some Bollinger Bands.)
    3. If the intelligence that vROPS and ViPR provide isn’t enough for what they want to do, they need to scale up to something like SIOS, or pay someone to build a deep learning system for them, similar to what I describe in my blog post about using deep learning to manage a data center: https://www.wondernerd.net/blog/deep-learing-in-virtual-data-center-management/ (I keep wanting to build it but haven’t had the time or the terabytes of log files to train a system.)"
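The “two standard deviations plus Bollinger Bands” idea from point 2 can be sketched in a few lines. This is not how vROPS or ViPR SRM compute their dynamic thresholds (those algorithms are their own); it only illustrates the statistical idea. The function names, the metric names, the 20-sample window, and k = 2 are all assumptions. The last function mirrors the “hide what is working” dashboard approach: only metrics with out-of-band samples are reported.

```python
# Sketch of Bollinger-style dynamic thresholds: a sample is "abnormal" when it
# falls outside mean +/- k standard deviations of the preceding `window` samples.
from statistics import mean, pstdev


def bollinger_bands(values, window=20, k=2.0):
    """Yield (index, lower, upper) for each sample with a full window before it."""
    for i in range(window, len(values)):
        history = values[i - window:i]  # preceding samples only, not the current one
        mu = mean(history)
        sigma = pstdev(history)
        yield i, mu - k * sigma, mu + k * sigma


def out_of_range(values, window=20, k=2.0):
    """Return (index, value) pairs that fall outside their band."""
    return [
        (i, values[i])
        for i, lower, upper in bollinger_bands(values, window, k)
        if not lower <= values[i] <= upper
    ]


def exceptions_only(metrics, window=20, k=2.0):
    """Dashboard-style filter: keep only metrics with at least one abnormal sample."""
    report = {}
    for name, series in metrics.items():
        anomalies = out_of_range(series, window, k)
        if anomalies:
            report[name] = anomalies
    return report


# Hypothetical CPU Ready % samples for two VMs; only vm01 spikes.
metrics = {
    "vm01.cpu_ready_pct": [1.0, 1.2, 0.9, 1.1] * 5 + [9.5],
    "vm02.cpu_ready_pct": [1.0] * 21,
}
print(exceptions_only(metrics))
# prints "{'vm01.cpu_ready_pct': [(20, 9.5)]}"
```

Because vm02 stays within its band, it never appears in the output, which is the point: the dashboard shows only what has broken, not everything that is working.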

     

    Thanks Mat!