If you have been following the Jatheon blog over the last couple of months, you may have noticed several posts explaining core AWS concepts and the most important services, such as IAM, VPC, S3, EC2 and others.
These introductory blog posts are an excellent first step for any company that’s starting its cloud journey, as well as those that have a light presence in the cloud and are looking “to do more”, in order to really embrace all the power AWS can offer.
Now it’s time to go a step further. In today’s blog post, you will get acquainted with the AWS Well-Architected Framework. It will help you learn best practices for designing and maintaining a reliable, scalable, secure and cost-effective cloud environment for your company’s needs.
What Is, Really, AWS Well-Architected Framework?
AWS Well-Architected Framework is a set of five pillars which serve as the foundation for your AWS cloud environment. You can consider them as a blueprint for your workload on Amazon’s public cloud. These five pillars are:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
The goal of the Operational Excellence pillar is to give you guidelines and best practices on how to run and maintain your AWS workloads in order to support your business needs, while continuously improving. There are six main design principles within these guidelines:
- Perform all your operations as code. Rely on tools such as AWS CloudFormation or HashiCorp Terraform for all the initial deployments or changes in your AWS environment. Try to completely eliminate manual actions through the AWS web console.
- Always have up-to-date documentation at hand. Using Infrastructure as Code will greatly reduce the need for extensive documentation of your environment, but you should still properly document the main architectural designs, such as VPC network diagrams and IAM permissions, and keep that documentation current.
- Make frequent, small, reversible changes. This is probably the most important design principle in the Operational Excellence pillar and also one of the crucial DevOps concepts. Never perform several changes at once (like deploying VPC, IAM and EC2 changes simultaneously), and always make sure that each change can easily be reverted. For example, if you’re deleting a subnet in a VPC, do it with an IaC tool so that the change can be reversed.
- Refine operations procedures frequently. As your workloads evolve, review and update your operational procedures regularly so that they stay effective and your team stays familiar with them.
- Anticipate failure. Always design your systems with the assumption that they will fail. This means that you should utilize the AWS services and features that give you fault tolerance: multi-region services, multi-AZ deployments and auto-scaling, with supporting services such as EC2, ECS, DynamoDB, RDS, etc.
- Learn from your failures. When a service goes down and you experience downtime, it goes without saying that the most important thing to do is to restore the service availability as soon as possible. But once the dust has settled and everything is back to normal, you should perform a post-mortem and document why the downtime happened and how you can prevent the failure next time.
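To make the “operations as code” idea a bit more concrete, here is a minimal sketch in Python that builds a small CloudFormation template for one VPC and one subnet. The function name, the logical IDs (`AppVpc`, `AppSubnet`) and the CIDR ranges are our own illustrative assumptions, not part of any official example. Because the template lives in code, a small change such as removing the subnet is a reviewable diff that can be reverted.

```python
import json


def build_network_template(vpc_cidr: str, subnet_cidr: str) -> dict:
    """Return a minimal CloudFormation template with one VPC and one subnet.

    The logical IDs "AppVpc" and "AppSubnet" are hypothetical names
    chosen for this sketch.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative VPC with a single subnet",
        "Resources": {
            "AppVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": vpc_cidr},
            },
            "AppSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {
                    # Ref ties the subnet to the VPC defined above.
                    "VpcId": {"Ref": "AppVpc"},
                    "CidrBlock": subnet_cidr,
                },
            },
        },
    }


if __name__ == "__main__":
    template = build_network_template("10.0.0.0/16", "10.0.1.0/24")
    # The JSON output can be kept in version control and deployed
    # with your IaC tooling instead of clicking through the console.
    print(json.dumps(template, indent=2))
```

Keeping the generated template in version control means every change is small, visible and reversible, which is the point of the first and third principles above.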
The six design principles we’ve presented here are just an overview of this pillar. To really get acquainted with it, we suggest you study this whitepaper in detail.
In our two-part blog series on cloud security, we covered the majority of the concepts from the AWS Well-Architected Framework Security pillar. Here we will mention only what AWS emphasizes as the six main design principles:
- Implement a strong identity foundation. Rely on IAM and external identity foundation systems like AD or LDAP. Implement IAM best practices.
- Enable traceability. Configure and use AWS services such as CloudTrail, Config, GuardDuty, Artifact.
- Apply security on all layers. Follow the VPC best practices, enforce NACLs and security groups, and enable VPC flow logs (network protection on OSI layers 3 and 4). Protect the application layer with WAF and with scanning tools such as DAST and SAST.
- Automate security best practices. This design principle relies heavily on operational excellence and the usage of infrastructure-as-code tools to have all changes automated, audited and easily reversible.
- Protect data both in transit and at rest. Encrypt everything!
- Prepare for security events. Establish a testing plan for all security mechanisms, perform regular audits and rotate all of your encryption keys and IAM credentials.
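The credential rotation advice in the last principle is easy to automate. The sketch below flags access keys that are older than a chosen age. The record shape (`id`, `created`) and the 90-day threshold are assumptions made for illustration; in practice you would feed it key metadata exported from IAM and pick a threshold that matches your own policy.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(keys, max_age_days=90, now=None):
    """Return the ids of keys created more than max_age_days ago.

    `keys` is a list of dicts with hypothetical fields "id" and
    "created" (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [key["id"] for key in keys if key["created"] < cutoff]


if __name__ == "__main__":
    sample = [
        {"id": "key-old", "created": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"id": "key-new", "created": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    ]
    fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(keys_due_for_rotation(sample, now=fixed_now))
```

A check like this can run on a schedule and report its findings to the team that reviews your security tooling.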
Remember, before you deploy any workload on AWS, even those in testing or staging environments, you must first ensure security, and set up proper auditing and reporting mechanisms.
And it’s not enough just to have the tools up and running. There should be a DevOps or security team regularly checking the findings of these tools. A good piece of reading we recommend here is the Security pillar whitepaper.
AWS describes the Reliability pillar as the ability of any AWS cloud system to recover from any failure or disruption. In a nutshell, it’s a set of five main design principles on how to make your services fault-tolerant to any disaster happening due to either your mistake or downtime in AWS global infrastructure. These are:
- Test your own recovery procedures. Once you have designed your systems for fault tolerance, you need to regularly test them. For this design principle, it’s essential that you familiarize yourself with the concept of chaos engineering and tools such as Netflix ChAP.
- Automatically recover from failure. Systems should be designed in such a way that they always automatically recover from failure.
- Scaling horizontally is preferred to scaling vertically. Always try to increase the number of instances / workers rather than increasing the resources of a single instance. This is also tied to using load balancing and auto-scaling, where applicable.
- Don’t guess capacity. You cannot predict your load, especially in today’s world of connected devices.
- Manage your changes with automation. This guideline is echoed in pretty much every pillar of the AWS Well-Architected Framework, because Amazon cannot stress enough how important it is to manage your changes with AWS CloudFormation or HashiCorp Terraform.
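Horizontal scaling and “don’t guess capacity” go hand in hand: instead of fixing the instance count up front, you derive it from a measured metric. Here is a simplified sketch of that idea, assuming a proportional rule (scale the fleet by the ratio of the observed metric to its target) clamped between a minimum and a maximum. The function and parameter names are ours, and this is not a description of how AWS Auto Scaling computes capacity internally.

```python
import math


def desired_instances(current, metric_value, target_value, min_size, max_size):
    """Suggest an instance count that brings the metric back to its target.

    Simplified proportional rule: if average CPU is 90% against a 60%
    target, the fleet grows by a factor of 90 / 60.
    """
    if current < 1 or target_value <= 0:
        raise ValueError("current must be >= 1 and target_value must be > 0")
    raw = math.ceil(current * metric_value / target_value)
    # Never go below the floor needed for fault tolerance
    # or above the ceiling that protects the budget.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    print(desired_instances(4, 90.0, 60.0, min_size=2, max_size=10))  # 6
```

The minimum keeps enough instances running to survive a failure, while the maximum caps the bill if the metric spikes unexpectedly.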
The Reliability pillar is interconnected on many levels with the Operational Excellence pillar and you can master most of its principles by studying the latter. Still, we recommend spending some time studying the Reliability pillar whitepaper as well.
By following the pillars of the Well-Architected Framework in the given order, it’s easy to understand that you first need to properly design your system to be reliable, secure and operational at all times before deploying your workloads.
But, this is not enough.
If your service doesn’t have enough resources to provide the required functions to its users, the fact that it’s fault tolerant and secure from breaches won’t help. That’s why Amazon introduced the Performance Efficiency pillar into the framework, with the following five design principles:
- Leverage new technologies and managed services. Perform an internal assessment of “build versus buy” for any service that AWS offers as a managed endpoint. Should you host your own databases in EC2 instances or use DynamoDB / RDS? Or should you run your own Kubernetes cluster or leverage the power of EKS?
These aren’t easy questions to answer right away, but if you carry out a proper internal review of your company’s needs, you can better understand the actual requirements from the AWS services, and then proceed to utilize them properly while consulting the AWS documentation.
- Go global in minutes. Utilize regions and AZs around the world to provide a global service which offers fast response and low latency, if required by your customers. Also remember to leverage managed services such as CloudFront, Route 53, and many others which allow you to go global in a couple of minutes with just a few API calls (or clicks from the web console, even though that’s not something AWS recommends).
- Use serverless architecture. If you’re still not using serverless services, consider doing so. Serverless isn’t just Lambda computing. There are also serverless databases such as Aurora Serverless and DynamoDB, and serverless container hosting solutions such as AWS Fargate, which eliminate the need to host and maintain your own Docker clusters.
- Experiment more often. But remember that when you do experiment with the new services and technologies, don’t do it in your production environment 🙂 Have a separate AWS account, or at least dedicated VPC for testing purposes.
- Mechanical sympathy. This guideline is tied to the previous point: experiment as much as you can, but use the technologies that are in line with your business. If your business relies on an old, monolithic application, don’t try to break it up by forcing microservices and containers onto it, as it simply won’t work. Better find a way to host it in EC2 with auto-scaling and load balancing in a multi-AZ setup.
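If you haven’t tried serverless yet, a Lambda function is a good place to start experimenting. Below is a minimal sketch of a Python handler that assumes it is invoked through an API Gateway proxy integration, which is why it reads `queryStringParameters` from the event and returns a `statusCode` and `body`. The greeting logic and the `name` parameter are purely illustrative.

```python
import json


def handler(event, context):
    """Minimal Lambda handler returning a JSON greeting.

    Assumes an API Gateway proxy-style event; "queryStringParameters"
    may be missing or None when no query string is sent.
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test, no AWS account required.
    print(handler({"queryStringParameters": {"name": "AWS"}}, None))
```

Since the handler is a plain function, you can test it locally before deploying it to a separate testing account, as suggested above.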
For an overview of compute, storage, network and database AWS services, and a take on performance testing and fault tolerance, check out the AWS Performance Efficiency whitepaper.
The Cost Optimization pillar focuses on avoiding unnecessary costs in the cloud. While this pillar won’t affect how your cloud workloads function on a day-to-day basis, it’s imperative for any business to run optimized operations in order to achieve sustainability. Amazon defines five design principles for this pillar:
- Adopt a consumption model. Pay only for the resources you use, when you use them. When the resource is not in use, either shut it down, or try to achieve only “lights-on” operational mode utilizing automation.
- Measure overall efficiency. Always measure the amount of resources you’re using compared to business efficiency you’re getting from them. It’s nice to have 1:1 staging to production environment, but if your business cannot support this decision, then scale down the staging environment. Don’t overprovision resources if you don’t have the business justification to do so.
- Stop spending money on data center operations. During the transition period from data center to the cloud, it’s quite reasonable to maintain both environments. But once you’re up and running in the cloud, and your workloads are stable and secured, lose the data center. Or keep only those services that cannot be migrated to the cloud, and scale down your hardware presence proportionally so you don’t lose additional money.
- Analyze and attribute expenditure. Use AWS billing tools to analyze your monthly spending and utilize AWS tags to properly attribute costs to different cost centers (departments in your company, or specific teams). If you have multiple RDS clusters, it’s smart to tag them so you know which teams or individuals are running those relational database clusters. Once you know the culprit, it’s advisable to reach out to them in case a resource can be optimized and further save costs.
- Use managed resources to reduce the cost of ownership. Calculate the total cost of ownership and see how running services by yourself compares to using a managed service. Maybe you will spend the same amount of money running MySQL in EC2 instances as you would on RDS, but what about the engineering time required to maintain such a solution? In this case, it will be up to your engineers to upgrade, back up and recover the MySQL databases running in EC2 instances, whereas a managed RDS instance handles much of that for you. So it’s a good idea to invest some time into comparing the TCO of self-hosted solutions vs managed ones.
Besides security, costs are the topic that raises the most concerns when it comes to migrating to the cloud. So, we will dedicate a separate article to cloud costs in a couple of weeks. In the meantime, be sure to check out the Cost Optimization whitepaper.
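The “analyze and attribute expenditure” principle boils down to grouping costs by tag. Here is a small sketch that sums costs per team from a list of billing line items. The record shape (`cost`, `tags`) and the `team` tag key are assumptions for illustration and don’t mirror the exact format of any AWS billing export.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key="team"):
    """Sum costs per value of tag_key; resources without the tag go to "untagged".

    Each line item is a dict with a hypothetical "cost" (float) and an
    optional "tags" dict.
    """
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    items = [
        {"service": "RDS", "cost": 120.0, "tags": {"team": "analytics"}},
        {"service": "RDS", "cost": 80.0, "tags": {"team": "analytics"}},
        {"service": "EC2", "cost": 50.0, "tags": {"team": "web"}},
        {"service": "S3", "cost": 10.0},
    ]
    print(cost_by_tag(items))
```

The “untagged” bucket is worth watching: it shows how much of your spending can’t be attributed to anyone yet.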
Which Tools Accompany AWS Well-Architected Framework?
You’ve probably realised reading these last few paragraphs that implementing all of the guidelines, best practices and advice is quite a daunting task. It will take you quite some time to draw all of your cloud blueprints on paper and follow all five pillars.
And that’s correct: designing a cloud environment while considering all the guidelines will probably take you and your cloud team a couple of months. But how do you know when you’re ready, and how do you verify your designs? Luckily, Amazon can help you with that as well.
The Well-Architected Tool is available as an AWS service, residing inside your account. With just a couple of clicks, you can submit your workloads to this tool, answer some questions and get a report on your environment. Before using it, please check the region availability, since this tool is not available in all regions, but it should be present in the most popular ones in the US and Europe. Besides the end result, this tool will also guide you with videos and documentation on how to strengthen your weak points and improve overall results in all five pillars.
Besides the Well-Architected Tool, AWS customers can also rely on the AWS Solutions Portfolio, a knowledge base of vetted AWS reference diagrams and implementations, in use by many technology leaders across the globe.
Each configuration in the AWS Solutions Portfolio has been verified by Amazon-employed AWS solution architects, and these implementations and diagrams are updated on a regular basis. And you can browse through such solutions either via technology category (such as databases, ML / AI or containers / microservices) or by industry.
Last but not least, if you’re really in doubt, you can always reach out to companies in your area from the AWS Partner Network: MSPs that can help you on your cloud journey and act as practical advisors while you lay your foundation following the principles of the Well-Architected pillars.
In our next blog post, we’ll be looking at the costs associated with moving to the cloud. Stay tuned in the following weeks.
Jatheon Cloud is a fourth-generation, cloud-based email archiving platform that runs entirely on AWS. To learn more about Jatheon’s cloud archive, contact us or schedule a personal demo.