(originally authored by preetham.m)

 

Early in our development of the Azure Stack integrated system, we applied important lessons from CPS and DHCS to how we build and deliver these systems. Some of our early Azure Stack TP3 deployments further validated those lessons. As Dell and EMC came together over the last year, the VCE model not only provided additional feedback but also gave us some of the factory processes and tools we needed to deliver on our goals.


When we set out to engineer Azure Stack, one of our primary goals was to get our customers operational in days. This meant we had to apply enough engineering rigor before the system reached the customer. As a result, we aim not only to spend as little time on-site as possible (which keeps deployments predictable and costs low), but also to ensure a smooth transition to building plans and onboarding tenants.


To achieve this goal, the process begins in our engineering labs with an investment in standardization and automation. Using the latest software from Microsoft and the hardware, software and firmware from Dell EMC, we run a suite of functional, performance and reliability tests. Any changes to the hardware or software are then cataloged and handed to the factory.


The next phase is a rigorous factory process. The goal is to ensure that the system not only ships fully integrated, but has also undergone a set of pre-deployment tests so that issues caused by faulty components, firmware levels and configurations are caught as early as possible. While engineering tests are done in the engineering labs, these pre-deployment tests are run at the factory on every system before it ships to the customer. DHCS pilots and early TP3 deployments showed us that shipping components directly, without factory validation, let faulty components and incorrect firmware levels reach the customer site, which triggered support calls and troubleshooting after the system was set up on the customer’s premises. This not only proved time-consuming, but also involved multiple dispatches and troubleshooting steps.
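
To make the idea of a per-system firmware check concrete, here is a minimal sketch in Python. It is only an illustration and not the actual factory tooling: the `check_firmware` function, the component names and the version strings are all hypothetical, and the sketch assumes the engineering catalog and each system's inventory can both be read as simple component-to-version mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    component: str
    expected: str
    actual: str  # "missing" when the system did not report the component


def check_firmware(baseline: dict[str, str], inventory: dict[str, str]) -> list[Finding]:
    """Return every component whose reported firmware differs from the baseline."""
    findings = []
    for component, expected in sorted(baseline.items()):
        actual = inventory.get(component, "missing")
        if actual != expected:
            findings.append(Finding(component, expected, actual))
    return findings


# Hypothetical values: a baseline cataloged by engineering, and what one
# system reports at the factory. None of these are real firmware versions.
baseline = {"bios": "2.4.3", "nic": "18.0.17", "raid_controller": "25.5.2"}
inventory = {"bios": "2.4.3", "nic": "17.5.10"}

findings = check_firmware(baseline, inventory)
for f in findings:
    print(f"{f.component}: expected {f.expected}, found {f.actual}")
```

A system that produces any findings would be held at the factory until it matches the baseline, which is the point of the process described above: the mismatch is found before shipping rather than on the customer's floor.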


One of the main reasons hybrid cloud deployments fail is that the components (software and hardware) and their dependencies are not fully validated beforehand. For that reason, we do not ship incomplete systems. We want to avoid turning our customer’s data center into a lab where our support and deployment teams work with minimal engineering support. For our early adopters in particular, that approach does not help them get operational early, because the changes required could run both broad (component choices) and deep (firmware levels and configurations). For any product in development, things will change. If a component has to be updated on-site, you not only lose time, you also introduce risk late in the process.


Our goal is to maintain this engineering and factory rigor while investing in the engineering resources to turn around changes quickly, so that customers can get operational fast. With the pre-engineering done in our factory and minimal time on-site, you are not just an early adopter; you are a successful one. While the first wave of deployments is for our early adopters, our goal is to bring this process and rigor to all our customer deployments.