Cloud Native E-Commerce App Solution on Azure
The following is taken from the first Continuous Assessment (CA) assignment for the part-time Master’s course I am undertaking in DevOps. I got an A grade for this assignment, which I was very happy with. The assignment was to propose a cloud infrastructure solution for a hypothetical e-commerce web application, along with a DevOps solution to go with it. The YAML and Bicep code for my solution can be found on my GitHub repo.
Introduction
This technical report is the proposed solution to our client’s current issues with their e-commerce web application. After the initial requirements gathering phase, we pinpointed the main pain points that are affecting the app’s overall reliability and ability to scale. The three main issues were:
Unreliable test results as code moves between the development, testing and production environments.
Tedious manual processes that the Operations team must perform on the on-premises infrastructure.
A lack of agility in meeting deadlines and implementing new features to remain competitive in the market.
In effect, we propose moving this on-premises (on-prem) web application entirely to the public cloud (including customer data, which we have taken precautions to protect). What follows is the cloud native design for the app’s new environment on the public cloud (which will be Azure), together with the design for the DevOps practices and pipelines that move the application and infrastructure as code (IAC) from development to production. By the conclusion of this report, it will be shown that this solution meets the five tenets of the Azure Well-Architected Framework [1] (Reliability, Security, Cost Optimization, Operational Excellence and Performance Efficiency), so that the client can be assured that this proof of concept follows best practices in design.
Current Setup and Issues
Currently, the client’s e-commerce application is hosted on-prem on VMware-provisioned Windows virtual machines (VMs) on a virtual network (VNET), sitting behind a load balancing tool which also acts as the central gateway in and out of the VNET. On the same VNET, there is also a VM running SQL Server, which acts as the database for the application. The application is a .NET MVC app running on an IIS server on these Windows VMs. Its main components are a gallery (to display products), a shopping cart (to place products into) and a checkout service (the most critical part, and the most error prone under intense user load).
They have separate copies of this setup for development, testing and production, all of which were provisioned manually through the VMware GUI, so there is no version control over what has been done. When the software developers are finished with their code, they package it into a ZIP file on the development environment and send it to the QA team, who copy it into the testing environment for testing. The same procedure is then used to pass the code on to the production environment for final deployment.
This has resulted in a lot of manual processes that introduce the possibility of human error, which in turn delays shipping code and makes it harder to be agile in the marketplace. As the environments are manually provisioned, bugs show up in one environment and not in another. The operations team also has to constantly worry about scaling the production environment up and down in response to user load on the e-commerce app, and about keeping configuration settings consistent between the environments.
Cloud Infrastructure Design
The proposal is to move this entire setup to Azure, Microsoft’s public cloud. Although we are cloud agnostic ourselves, Azure (the world’s second biggest cloud after Amazon Web Services, AWS) was chosen because there is a general perception in the industry that .NET applications run better and have fewer migration issues on Azure, as both sit within the Microsoft ecosystem. There is also one big cost saving to be made by moving to Azure, which is the Azure Hybrid Benefit [2]. As this client is already using and paying for a SQL Server license in their current on-prem setup, that existing license can be used to offset the licensing costs of moving to an Azure SQL managed database (which we will discuss later).
To relieve the operational stress of the current setup, there is a need to move away from VMs and onto a more managed way of hosting web applications. The industry phrase for this is “cloud native”, one definition of which is: “Cloud-native architecture and technologies are an approach to designing, constructing, and operating workloads that are built in the cloud and take full advantage of the cloud computing model.” [3] So it’s not enough to just put an application on a VM in the cloud, which could just as well be on-prem; the application should also take advantage of services that are only available in the cloud, as this solution does. These services will all be managed (PaaS as opposed to IaaS) to reduce operational issues.
The majority of the application code (the product gallery and the shopping cart) will be migrated from the IIS website on the on-prem VM to an Azure App Service. There are migration tools to perform this [4]. An Azure App Service is a managed PaaS resource ideally suited for hosting HTTP-based web applications. Scaling, patching and general maintenance are managed by Azure, so the operations team does not need to be as hands-on administering the Azure App Service as they are with a VM. It also has built-in support for deployment from any modern CI/CD tool (covered later), as well as the features you would expect from hosting IIS websites, such as SSL and custom domains. Before reaching the Azure App Service, the end user’s browser will first go through the Azure CDN service (covered in the Additional Features section).
The checkout service code, which handles payment requests to a third party payment provider (Stripe, etc.) and updates the database with a finished order, is kept separate from the gallery and shopping cart code in the Azure App Service and instead lives in an Azure Function. Azure Functions are managed, serverless pieces of code that run in response to certain triggers. The Azure Function will authenticate with the payment provider and then place the order in permanent storage in the database.
The Azure Function will not be directly coupled to the Azure App Service, though. Instead, as an intermediary step, the order will be put on an Azure Queue Storage queue. Orders from the Azure App Service arrive on the queue as messages, which fire the Azure Function’s Queue Storage trigger; the function then reads the messages off the queue in roughly first-in, first-out order (Azure Queue Storage does not strictly guarantee FIFO ordering). While the Azure Queue Storage step is not strictly necessary for a working e-commerce application, it helps ensure that all orders are processed and not lost even under intense load, and it smooths out the flow of finished orders into the database when the database is also under intense load.
We believe that a design like this, which splits out the checkout code (identified as the part of the e-commerce application most likely to fail under load in the present on-prem solution), is good design practice. Placing an Azure Queue Storage queue in front of the Azure Function also decouples the Azure App Service from the Azure Function and, in turn, from the database. This technical blog post [5] argues that using Azure Functions to decouple application code in this manner is a good practice to follow when designing cloud native applications.
The database in this case will be Azure SQL, a managed SQL Server offering. As with the Azure App Service, managed services mean less work for the operations team, and there are also tools and practices for migrating SQL Server databases to Azure SQL [6]. We can also assume that this database only holds orders, not products or user details. The connection strings for both the Azure Queue Storage and the Azure SQL instance will be safely stored in an Azure Key Vault to be accessed by the Azure Function. More about the Azure Key Vault will be covered in the Security Considerations section of this report.
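The queue hand-off between the web app and the checkout function can be sketched in a few lines of Python. This is only an in-memory illustration using the standard library, not the Azure SDK: the deque stands in for the Azure Queue Storage queue, the dict stands in for the orders table, and the names enqueue_order, process_next_order and the order fields are all hypothetical.

```python
import json
from collections import deque

# In-memory stand-ins (assumptions): a deque plays the role of the Azure Queue
# Storage queue and a dict plays the role of the orders table in Azure SQL.
order_queue = deque()
orders_table = {}


def enqueue_order(order_id, items, payment_token):
    """Web app side: serialise the order and put it on the queue."""
    message = json.dumps(
        {"order_id": order_id, "items": items, "payment_token": payment_token}
    )
    order_queue.append(message)


def process_next_order(authorise_payment):
    """Function side: take one message, authorise payment, persist the order.

    Returns the order id that was handled, or None if the queue was empty.
    """
    if not order_queue:
        return None
    order = json.loads(order_queue.popleft())
    # Skip orders that are already stored, so a redelivered message is harmless.
    if order["order_id"] in orders_table:
        return order["order_id"]
    status = "paid" if authorise_payment(order["payment_token"]) else "declined"
    orders_table[order["order_id"]] = {"items": order["items"], "status": status}
    return order["order_id"]
```

The web app only ever calls enqueue_order and returns to the user straight away, while the function drains the queue at its own pace. The duplicate check is there because a queue message can be delivered more than once if it is not removed in time, so the handler should be safe to run twice for the same order.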
DevOps Design
We recommend using IAC stored in a Git repo to provision all the necessary Azure resources for this solution across the three environments: development, testing and production. As this removes the manual human element from their creation, we can be assured that tests will behave the same way across all three environments. The only differences between environments will be in usage and cost plans: development and testing will be on cheaper plans, whereas production will be on more expensive and performant resources.
We will use Bicep as our IAC tool, with Azure DevOps used to store it in a repo and provision it through a pipeline. If this were a hybrid or multi-cloud deployment, a case could be made for Terraform instead of Bicep, but as it will be an all-Azure setup we don’t need to take on the extra complexity of managing state that comes with Terraform. ARM templates used to be the best way to provision Azure resources with IAC, but Bicep has since succeeded them as the easier method: it is a Domain Specific Language (DSL) rather than JSON, and as a result is far less verbose and simpler to modularise.
The Azure DevOps suite of CI/CD tools will be used to deploy all application and IAC code. Within Azure DevOps, Azure Repos will store the code in Git repos, set up so that any commit to those repos triggers an Azure Pipeline that deploys to the environments. Governance will be in place so that any code committed by pull request (PR) in the development environment is passed to the testing environment only after automated tests, code review and approval, and is deployed to the production environment only after final approval by the QA and Operations teams. As all changes are in Git, and therefore version controlled, we can roll back any change if the need arises. The code that controls this flow through the pipeline is contained in a YAML file, which can itself be committed and version controlled.
One possible issue with using an Azure Function in production for our checkout service is the warm-up delay (cold start) that comes with serverless functions [7]. As we want our Azure Function to execute as quickly as possible, we recommend a Premium plan for the production Azure Function, which keeps instances constantly “warmed-up” and waiting to be triggered. For the development and testing environments, the Azure Function can run on a cheaper Consumption plan, a pay-as-you-go service billed on the number of times the Azure Function gets triggered (fractions of a cent per trigger), which should save considerably on overall cost for these environments. Likewise, all resources in the development and testing environments will be provisioned using the cheapest Free or Shared tiers available, while the production environment will get the more expensive and performant options.
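The promotion rules in this governance model can be written down as a small check. The sketch below is plain Python rather than pipeline YAML, and the ChangeStatus fields and the can_promote function are hypothetical names used only to make the gates explicit; in Azure DevOps the same rules would be enforced by branch policies and environment approvals.

```python
from dataclasses import dataclass

# Promotion order described in the report.
ENVIRONMENTS = ["development", "testing", "production"]


@dataclass
class ChangeStatus:
    """Which gates a change has cleared so far (hypothetical field names)."""
    tests_passed: bool = False
    code_review_approved: bool = False
    qa_approved: bool = False
    operations_approved: bool = False


def can_promote(target, status):
    """Return True if a change may be deployed to the target environment."""
    if target not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {target}")
    if target == "development":
        return True
    # Automated tests and code review gate everything beyond development.
    passed_pr_gates = status.tests_passed and status.code_review_approved
    if target == "testing":
        return passed_pr_gates
    # Production additionally needs sign-off from both QA and Operations.
    return passed_pr_gates and status.qa_approved and status.operations_approved
```

For example, a change with passing tests and an approved review can go to testing, but it stays out of production until both the QA and Operations approvals are recorded.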
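One way to keep these per-environment pricing choices in one place is a simple lookup that the deployment tooling reads. The sketch below is illustrative Python, not part of the actual Bicep solution. The SKU codes are assumptions chosen to match the tiers named above (F1 Free, D1 Shared and S1 Standard for App Service; Y1 Consumption and EP1 Elastic Premium for Functions), and the function names are hypothetical.

```python
# SKU codes are assumptions chosen for illustration:
#   App Service:     F1 = Free, D1 = Shared, S1 = Standard
#   Azure Functions: Y1 = Consumption, EP1 = Elastic Premium
ENVIRONMENT_SKUS = {
    "development": {"app_service": "F1", "function_plan": "Y1"},
    "testing": {"app_service": "D1", "function_plan": "Y1"},
    "production": {"app_service": "S1", "function_plan": "EP1"},
}


def skus_for(environment):
    """Look up the pricing tiers for one environment."""
    try:
        return ENVIRONMENT_SKUS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment}") from None


def stays_warm(environment):
    """True if the environment's function plan keeps pre-warmed instances."""
    return skus_for(environment)["function_plan"].startswith("EP")
```

A Standard tier is assumed for the production App Service because deployment slots are not available on the Free or Shared tiers; a larger Premium SKU could be substituted without changing the shape of the table.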
Security Considerations
As can be observed, this infrastructure design sits entirely on the public cloud, including the database holding customers’ order details. To justify this to the client, who may naturally have security concerns, we can offer a number of assurances (assuming there are no compliance or legal reasons to keep the data on-prem).
A hybrid design could be employed, with the SQL Server database remaining on-prem and a virtual gateway linking it to the rest of the Azure resources, but this would be a more costly solution than what is proposed. We can assure the client that customers’ credit card details are not stored in the database, regardless of whether it is on-prem or in the cloud, as a third party payment provider is used to provide an authorisation token.
As for customer or order details being compromised, we can assure the client that the Azure SQL instance uses transparent data encryption (TDE) to encrypt all data at rest [8]. Only the Azure Function will be able to access the database, and its connection string is stored in Azure Key Vault (which likewise encrypts the secrets it holds).
We can also assure the client that we will train all the software developers never to check in code containing the connection strings to the database. Instead, these should be variables that reference the Azure Key Vault, which in turn holds the actual connection strings.
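As a rough illustration of what this looks like in practice, the application setting holds a Key Vault reference rather than the secret itself, and the code only ever reads the setting by name. The sketch below is Python for illustration only (the client’s code is .NET); the vault name, secret name and function names are hypothetical, and it assumes the platform resolves the reference and exposes the result to the code as an environment variable.

```python
import os


def key_vault_reference(vault_name, secret_name):
    """Build the app setting value that points at a Key Vault secret.

    This string is what would be stored in the app's configuration; it contains
    no secret material, so it is safe to keep in IAC and in version control.
    """
    return f"@Microsoft.KeyVault(VaultName={vault_name};SecretName={secret_name})"


def get_connection_string(setting_name, environ=os.environ):
    """Read a connection string from configuration instead of hardcoding it."""
    value = environ.get(setting_name)
    if not value:
        raise RuntimeError(f"app setting {setting_name!r} is not configured")
    return value
```

With this approach the repository only ever contains something like key_vault_reference("kv-shop", "SqlConnectionString"), and a developer who clones the repo never sees the real connection string.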
Additional Features
We have gone beyond what was initially asked for by the customer and applied three additional features to the solution to help achieve the Azure Well-Architected Framework principles. These extra features are:
Blue/Green deployment strategy for production environment.
CDN service for global delivery of static content for production environment.
Parameterisation of IAC to allow for easy self service deployment.
Deployment slots on both the production environment’s Azure App Service and Azure Function will facilitate blue/green deployments. This is a deployment strategy that lets us have two active versions of these resources running in the same environment, one running the newly deployed code and the other running the previous version. Either can be swapped in as the end user facing version at a moment’s notice, without redeployment. This means we do not have to redeploy if something goes wrong with a production deployment, and during the few minutes it usually takes to deploy a new version of the code, the services are never down, even momentarily.
An Azure Content Delivery Network (CDN) sits between the end user and the Azure App Service in the production environment. It caches the static content of the web application (HTML, CSS, images, etc.) on a globally distributed network, so that users are served from the location closest to them, while requests for anything not cached still reach the Azure App Service. This reduces the latency of fetching our web app and decreases page loading times, which is always a plus for e-commerce applications.
Certain basic variables in the Bicep file will be parameterised and configurable at deployment time. For this example, these are the name of the resource group and the Azure region to deploy to. When running the pipeline to deploy the Bicep file, you can enter whatever values you want for these instead of having them hardcoded in the Bicep file. This gives the Operations team more options when deploying across different environments.
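The swap behaviour can be modelled in a few lines. This is a toy Python model of the idea, not a call into Azure: the SlottedService class and its method names are hypothetical, and the two attributes simply stand for the production slot and a staging slot.

```python
class SlottedService:
    """Toy model of a service with a production slot and a staging slot."""

    def __init__(self, live_version):
        self.production = live_version
        self.staging = None

    def deploy_to_staging(self, version):
        """New code always lands in staging; production is untouched."""
        self.staging = version

    def swap(self):
        """Exchange the two slots; the old version stays ready in staging."""
        if self.staging is None:
            raise RuntimeError("nothing deployed to the staging slot")
        self.production, self.staging = self.staging, self.production
```

Calling swap() once puts the new version in front of users, and calling it again rolls back, because the previous version is still sitting in the other slot.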
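To illustrate how those two values might be supplied at deployment time, the sketch below assembles an Azure CLI command with the parameters passed in rather than hardcoded. It is written under several assumptions: the template file is called main.bicep, it is deployed at subscription scope because it creates the resource group itself, and its parameters are named resourceGroupName and location. None of these names come from the actual solution.

```python
def build_deploy_command(resource_group_name, location, template_file="main.bicep"):
    """Return the argument list for a subscription-scoped Bicep deployment.

    The parameter names resourceGroupName and location are hypothetical and
    must match whatever the Bicep file actually declares.
    """
    if not resource_group_name or not location:
        raise ValueError("resource group name and location are both required")
    return [
        "az", "deployment", "sub", "create",
        "--location", location,
        "--template-file", template_file,
        "--parameters",
        f"resourceGroupName={resource_group_name}",
        f"location={location}",
    ]
```

The returned list can be handed to subprocess.run, or the same values can be exposed as pipeline variables so that whoever runs the pipeline is prompted for them.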
Conclusion
The cloud and DevOps infrastructure solution presented here addresses the client’s current concerns while also hitting the five main tenets of the Azure Well-Architected Framework. How it does so is as follows:
Performance Efficiency - All resources used are managed PaaS resources with built-in scaling and performance management. The Azure Function and Azure Queue Storage decouple the solution, which is good software design practice. The Azure CDN offloads the delivery of the site’s frontend elements and reduces latency for the end user.
Security - Credit card information is never accessed or stored by the application but instead authorised by a third party payment provider. The connection strings for the database are protected behind Azure Key Vault.
Reliability - We can be sure that the three different environments will act exactly the same as they will be provisioned from the same IAC code (with minor changes to pricing tiers for development/testing vs. production). If something goes wrong with a production deployment we can instantly swap back without redeployment thanks to deployment slots on the Azure App Service and Azure Function.
Cost Optimisation - Switching from on-prem VMs to managed PaaS resources should reduce costs in general. Azure also has built-in budget alerts that can be used to prevent overspending if need be.
Operational Excellence - The operations team can deploy the IAC solution using Azure DevOps, which provides version control in Git and governance controls in the form of PRs and code reviews to ensure only good quality code passes through to each environment. The Bicep file has been parameterised to allow the operations team to change a few basic details about each infrastructure deployment.
References
[1] david-stanford (n.d.). Microsoft Azure Well-Architected Framework - Azure Architecture Center. [online] learn.microsoft.com. Available at: https://learn.microsoft.com/en-us/azure/architecture [Accessed 5 Feb. 2023].
[2] azure.microsoft.com. (n.d.). Azure Hybrid Benefit FAQ | Microsoft Azure. [online] Available at: https://azure.microsoft.com/en-us/pricing/hybrid-benefit/faq/ [Accessed 5 Feb. 2023].
[3] robvet (n.d.). What is Cloud Native? [online] learn.microsoft.com. Available at: https://learn.microsoft.com/en-us/dotnet/architecture/cloud-native/definition [Accessed 5 Feb. 2023].
[4] GitHub. (n.d.). Home. [online] Available at: https://github.com/Azure/App-Service-Migration-Assistant/wiki [Accessed 5 Feb. 2023].
[5] Perkins, B. (n.d.). The decoupling of software solutions using some Azure products and features | The Best C# Programmer In The World - Benjamin Perkins. [online] Available at: https://www.thebestcsharpprogrammerintheworld.com/2020/08/05/the-decoupling-software-solutions-using-some-azure-products-and-features/ [Accessed 5 Feb. 2023].
[6] croblesm (n.d.). SQL Server to Azure SQL Database: Migration guide - Azure SQL Database. [online] learn.microsoft.com. Available at: https://learn.microsoft.com/en-us/azure/azure-sql/migration-guides/database/sql-server-to-sql-database-guide?view=azuresql [Accessed 5 Feb. 2023].
[7] markheath.net. (n.d.). Avoiding Azure Functions Cold Starts. [online] Available at: https://markheath.net/post/avoiding-azure-functions-cold-starts [Accessed 6 Feb. 2023].
[8] msmbaldwin (n.d.). Azure Data Encryption-at-Rest - Azure Security. [online] learn.microsoft.com. Available at: https://learn.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest.