I last wrote about this topic following a widely publicized AWS outage. This time, I am writing after the biggest cloud computing event of the year, AWS re:Invent 2015. Clearpath Solutions Group was proud to again sponsor this exciting and growing event, which drew more than 18,000 attendees. This year's re:Invent had several big themes, including the Internet of Things (IoT), DevOps, Big Data and Business Intelligence. Check out the full list of new services and features here. It’s a lot to absorb, and the pace of innovation never ceases to amaze me. Despite all the big news, these are my two big takeaways:
Recently, Amazon Web Services (AWS) suffered another highly publicized outage. AWS’s explanation is very much worth reading, but at a high level, the DynamoDB service (AWS’s NoSQL database) experienced timeouts caused by problems with how the database handles metadata. This had a cascading effect on other widely used AWS services that depend on DynamoDB, such as EC2 Auto Scaling, CloudWatch and Simple Queue Service (SQS). Many popular internet-facing websites and applications were affected, as were countless enterprises running critical workloads on AWS. However, some sites, such as Netflix, which is perhaps AWS’s largest and most noteworthy tenant, weathered the outage with no noticeable issues. Netflix, like all savvy AWS users, understands how to build incredibly resilient, fault-tolerant systems on AWS. In fact, they design for failure in everything they do. Enterprises of all sizes can learn valuable lessons from Netflix.
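To make "designing for failure" a bit more concrete, here is a minimal sketch of one common pattern, a circuit breaker with a fallback. This is my own illustration, not Netflix's or AWS's actual code: the CircuitBreaker class, its max_failures and reset_after parameters, and the primary and fallback callables are all hypothetical names. The idea is that after a dependency fails a few times in a row, the application stops calling it for a cool-down period and serves a degraded answer (cached data, for example) instead of letting timeouts pile up and cascade.

```python
import time


class CircuitBreaker:
    """Hypothetical sketch: stop calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, primary, fallback):
        # While the circuit is open, skip the dependency and degrade gracefully.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # A success resets the failure count.
        self.failures = 0
        return result
```

In use, primary would wrap the real call to the dependency (with its own timeout), and fallback would return whatever degraded response the application can live with. A production system would also want per-dependency breakers, metrics and thread safety, all of which this sketch leaves out.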