City Furniture and their transition to AWS for their VMware-based workloads
City Furniture is a Florida-based retailer on a mission to change the way people live with beautiful home furnishings at incredible value. Their roots date back to the 1970s, when they opened as Waterbed City, and over the years they evolved into a leading furniture and home accents destination. Style and value are at the core of everything they do, and with nearly 20 showrooms across the state and an expansive e-commerce suite, they are constantly evolving to bring customers the very best.
The Challenge
City Furniture’s business model for technology included outsourcing x86 and networking capabilities to IBM Cloud to support the company’s non-eCommerce technology platforms. At the time, the customer did not have the resources to support this endeavor and went to IBM for this type of solution. As time went on, City Furniture had numerous issues with availability and with timely updates, and was dissatisfied because IBM did not meet management or performance expectations for the environment’s networking and VMware components. More specifically, the customer’s VMware-based workloads (servicing their warehouse, logistics and corporate functions), hosted on an IBM SoftLayer managed VMware cluster, were operationally expensive (compared to similar AWS workloads), technically complex (network routing to and from the datacenter did not adhere to their corporate standards), and experienced a disproportionate amount of downtime (due to the technical complexities stated above, the pod structure of IBM SoftLayer data centers and a multi-Region, all-or-none active/passive failover approach). As the customer was already experienced with AWS, having operated their eCommerce workloads there for numerous years, the customer, in consultation with CloudHesive, opted to migrate these workloads to AWS (EC2).
The Solution
The solution comprised two key milestones: migration and operation.
From a migration perspective, once CloudHesive was engaged by City Furniture, we assessed the customer’s current environment using a Migration Readiness Assessment (MRA), evaluated the current architecture, created a to-be architecture and developed a migration strategy. This included CloudHesive establishing a new AWS Organization, provisioning sub-accounts via Control Tower and establishing a Transit Gateway to provide connectivity to their MPLS/dedicated networks. With the new Organization established, the client’s existing accounts were moved under it to provide a single billing model, utilizing CloudCheckr for cost allocation. Inventory, assessment and migration planning were performed using CloudChomp, and workloads were grouped by environment (Production/Non-Production), tier (Application/Database) and workload (Warehouse, Logistics, etc.). Using these groupings, specific migration windows were developed, along with appropriate destination accounts, VPCs, subnets and security groups, providing a scalable approach for future workloads and minimizing the blast radius otherwise found in the current environment’s (flat) network. VMware virtual machines were migrated using Application Migration Service, and replication was established to alternative Availability Zones (within the same Region), utilizing the same, for disaster recovery purposes. AWS Backup was incorporated for EBS snapshot scheduling and retention, for data recovery purposes.
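The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server names, the `Server` type and the ordering rule (non-production before production, database before application) are hypothetical and are not taken from the actual CloudChomp inventory or migration plan.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    environment: str  # "production" or "non-production"
    tier: str         # "application" or "database"
    workload: str     # e.g. "warehouse", "logistics"


def group_into_waves(servers):
    """Group server names by (environment, workload, tier)."""
    waves = defaultdict(list)
    for server in servers:
        key = (server.environment, server.workload, server.tier)
        waves[key].append(server.name)
    return dict(waves)


def order_waves(waves):
    """Return wave keys in a suggested order.

    Assumption for illustration only: non-production moves first, and
    within a workload the database tier moves before the application tier.
    """
    env_rank = {"non-production": 0, "production": 1}
    tier_rank = {"database": 0, "application": 1}
    return sorted(waves, key=lambda k: (env_rank[k[0]], k[1], tier_rank[k[2]]))


# Hypothetical inventory; the real one came from the assessment tooling.
inventory = [
    Server("wh-app-01", "production", "application", "warehouse"),
    Server("wh-db-01", "production", "database", "warehouse"),
    Server("wh-app-dev", "non-production", "application", "warehouse"),
    Server("lg-app-01", "production", "application", "logistics"),
]

waves = group_into_waves(inventory)
for key in order_waves(waves):
    print(key, waves[key])
```

Each resulting group can then be paired with a migration window and a destination account, VPC, subnet and security group, which keeps unrelated workloads from sharing the same network segment.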
From an operation perspective, as workloads were migrated from VMware to AWS, they were put under management, in which predetermined metrics, events and logs were ingested and monitored against predefined thresholds and patterns. In response to these thresholds and patterns, playbooks and runbooks were utilized to perform recovery, if required, including service/server restart, recovery from predetermined EBS backups and/or failover to alternative Availability Zones utilizing Elastic Disaster Recovery.
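As a rough sketch of how thresholds can map to runbook responses, the Python below evaluates a metric sample against rules and picks a recovery step. The metric names, threshold values and the escalation order are assumptions made for illustration; they do not describe the actual monitoring configuration or runbooks used in this engagement.

```python
from dataclasses import dataclass

# Recovery steps named in the text. Treating them as an escalation
# ladder in this order is an assumption, not the documented runbook.
RESPONSES = [
    "restart_service",
    "restart_server",
    "restore_from_ebs_backup",
    "fail_over_to_other_az",
]


@dataclass(frozen=True)
class Rule:
    metric: str            # hypothetical metric name
    threshold: float       # breach when the sample value is >= threshold
    first_response: str    # where on the ladder to start


RULES = [
    Rule("cpu_percent", 95.0, "restart_service"),
    Rule("status_check_failed", 1.0, "restart_server"),
]


def breached(rules, sample):
    """Return the rules whose threshold is met or exceeded by the sample."""
    return [r for r in rules if sample.get(r.metric, 0.0) >= r.threshold]


def next_response(rule, failed_attempts):
    """Escalate one step per failed attempt, capped at the last response."""
    start = RESPONSES.index(rule.first_response)
    return RESPONSES[min(start + failed_attempts, len(RESPONSES) - 1)]


sample = {"cpu_percent": 97.0, "status_check_failed": 0.0}
for rule in breached(RULES, sample):
    print(rule.metric, "->", next_response(rule, failed_attempts=0))
```

Keeping the rules as data rather than code makes it easier to review thresholds with the customer and to add a workload without changing the evaluation logic.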
Third Party Services Used
CloudChomp, Datadog, Sumo Logic, CloudCheckr
AWS Services Used
AWS Organizations, Control Tower (and related services), Transit Gateway, VPC, EC2, EBS, AWS Backup, Application Migration Service and Elastic Disaster Recovery were utilized in the creation of a Landing Zone and in the migration and operation of workloads.
The Results
All workloads were successfully migrated to AWS utilizing the architecture and approach described above, through multiple windows with minimal downtime. Among other achievements, the customer observed lower cost, faster responsiveness to changes in demand and better visibility through observability solutions. Individual workloads can be failed over between Availability Zones, both for proactive validation and for maintenance activities that require individual servers to be placed offline, and the entire suite of services can be failed over wholesale from one Availability Zone to another. Given the simplification of the network architecture, which is comprised entirely of AWS native services (and in turn multi-AZ), alongside diverse Direct Connect connections, no network downtime has been experienced, even during individual carrier maintenance events. In addition to highly available network connectivity (and shared services, such as Directory Services) and active/passive workloads, each workload follows a backup scheme utilizing AWS Backup that allows point-in-time recovery using VSS and EBS snapshots. These recovery processes are both tested on a periodic basis and battle-tested in real-world situations.
Lessons Learned
The customer’s journey was architected and deployed by CloudHesive, with emphasis placed on adherence to customer standards. This required working with the customer to establish a baseline understanding, baseline standard and baseline operational processes ahead of the implementation and transition to support. Given the timeline of the overall migration project, consideration was also given to scheduling around critical customer sales events and managing customer resource availability through holiday periods. From an ongoing resiliency perspective, we have found that the design, as implemented, has been able to support the customer’s RPO and RTO requirements, both through proactive testing and in response to events that required failover or recovery of services.
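One simple way to reason about an RPO during proactive testing is to compare the age of the newest recovery point, and the largest gap between recovery points, against the target. The sketch below assumes hypothetical snapshot timestamps and a hypothetical 4-hour RPO; the customer's actual RPO and RTO values are not stated here.

```python
from datetime import datetime, timedelta, timezone


def rpo_met(recovery_points, rpo, now):
    """True if the newest recovery point is no older than the RPO."""
    if not recovery_points:
        return False
    return now - max(recovery_points) <= rpo


def worst_gap(recovery_points, now):
    """Largest interval between consecutive recovery points, up to 'now'."""
    points = sorted(recovery_points) + [now]
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return max(gaps, default=timedelta(0))


# Hypothetical values for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
snapshots = [
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
]
target_rpo = timedelta(hours=4)

print("RPO met:", rpo_met(snapshots, target_rpo, now))
print("Worst gap:", worst_gap(snapshots, now))
```

A check like this only covers data loss exposure; RTO still has to be measured by timing an actual restore or failover.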