Journey to the Center of Our AWS Migration Battle: Part I

Mahmoud Yasser
11 min read · Mar 19, 2024

Introduction

An AWS migration is a journey filled with challenges, surprises, and opportunities to learn. Whether the key drivers are cost efficiency, security, or a strategic shift in operations, the migration has to be planned, executed, and overseen comprehensively. This article digs into those details, aiming to provide a complete walkthrough of the entire process, from the early planning stages through to the reflective conclusions, and to capture what such a migration actually involves.

During my time at my current company, I played a crucial role in evolving the cloud infrastructure, with a particular focus on Amazon Web Services (AWS). My involvement covered the full spectrum of cloud resource management, from establishing the initial MVP architecture through the intricate processes of resource migration and environment segmentation.

Initial Architecture and Responsibilities

The adventure began with the strategic development and implementation of the AWS infrastructure. This involved setting up the Virtual Private Cloud (VPC) network, configuring the Domain Name System (DNS), provisioning load balancers, and launching EC2 (Elastic Compute Cloud) instances and ECS (Elastic Container Service) services. I developed Lambda functions and Step Functions state machines to streamline and improve our operations. Along the way, I implemented the CI/CD pipeline with AWS tools, including CodePipeline, CodeBuild, and CodeDeploy, to fully automate the software delivery process.

Initially, the infrastructure was managed through the AWS Console, but as the volume and complexity of our data and resources grew, it became apparent that a more efficient management method was needed. Consequently, we transitioned to more sophisticated tools and processes that let us manage those resources more effectively.

Migration Strategy and Risk Assessment

Our migration strategy was critical, aimed at preventing data loss and minimizing downtime as much as possible. We started by creating detailed documentation of our current AWS setup, identifying each resource and how they were connected. This process included a deep dive into the IAM users, their roles, and the policies in place, ensuring that the migration would be secure and meet compliance standards.

It was essential to understand how different services depended on each other. Our cloud setup was complex, with many services and microservices linked together. To prevent any operational issues, we needed a clear plan that outlined how each part of our system would be moved, ensuring everything continued to work smoothly during the transition.

Identifying and moving the most critical resources first was a key strategy to reduce the risk of any system outages. I explored various migration tools, assessing them for both effectiveness and cost. For example, when we needed to move large amounts of data stored in S3 buckets, we chose AWS DataSync because it was efficient, even though it was more expensive than some other options.

Putting together a comprehensive backup and recovery plan was also a crucial step. This plan included regular backups of important data and system configurations, along with detailed procedures for restoring services quickly if anything went wrong. This approach was vital for ensuring that we could maintain continuous business operations, even in the event of data loss or system failure.

Estimating Migration Costs

The financial aspects of moving to a new AWS setup were complex, encompassing more than just the upfront costs of tools and resources. We also had to consider the hidden or indirect costs, which could arise from possible downtime or reduced system performance during the transition. These indirect expenses might include lost revenue if our services were unavailable, decreased productivity from our teams, or lower customer satisfaction due to service disruptions, and these costs could surpass the direct expenses on migration tools.

Taking a balanced approach was essential in analyzing the costs and benefits of different migration tools. This meant carefully evaluating how well each tool met our specific migration needs, such as their potential to lessen the need for manual work, speed up the migration process, and minimize any service interruptions. Additionally, we looked for ways to optimize costs, such as using AWS’s free service tier for eligible services or securing volume discounts, which played a significant part in controlling our overall migration budget.

Transition to Terraform and Account Segmentation

To improve efficiency and make the system easier to manage, I took the initiative to migrate our resource provisioning to Terraform. This proactive move allowed us to organize and manage our AWS resources in a more structured and modular way.

In the beginning, our different environments — development, production, and staging — were all housed within a single AWS account. I created a Terraform framework tailored to this consolidated account, which helped in managing the resources more effectively and in a modular fashion.
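
To make this concrete, here is a minimal sketch of how such a single-account layout might look, with one thin root configuration per environment reusing shared local modules. The module paths, region, and variables are illustrative placeholders rather than our actual setup.

```hcl
# Minimal sketch: each environment reuses the same local modules inside one
# AWS account. Module paths, region, and values are placeholders.

provider "aws" {
  region = "eu-west-1"
}

module "network" {
  source      = "./modules/network"
  environment = "staging" # one root configuration per environment: development, staging, production
  cidr_block  = "10.20.0.0/16"
}

module "ecs_services" {
  source          = "./modules/ecs"
  environment     = "staging"
  vpc_id          = module.network.vpc_id
  private_subnets = module.network.private_subnet_ids
}
```

One benefit of keeping each environment as a thin root configuration over shared modules is that, when environments later move to separate accounts, mostly the provider and state backend configuration has to change rather than the resource definitions themselves.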

As our company expanded, the need arose to separate these environments into distinct AWS accounts, specifically for development, shared services, and production. I was tasked with orchestrating this separation. The shared account was designed to centralize the billing process and to oversee shared resources, such as CodeCommit repositories, CodePipeline CI/CD pipelines, and general-purpose S3 buckets. To ensure security and efficient management, we integrated these accounts with AWS IAM Identity Center, providing a unified control and security mechanism for our cloud resources.

Strategic Account Structure

Development Account: This is essentially a testing ground where new software, updates, and features are developed and rigorously tested. By keeping this environment separate, we protect our live production environment from any unexpected disruptions. This setup ensures that all new developments are thoroughly tested and validated before they are rolled out in the production environment.

Production Account: This is the live environment where our applications and services are actively used by end-users. Isolating this account is crucial for maintaining operational stability and security. It helps prevent unintended changes or disruptions that could arise from development activities, thereby ensuring a stable and secure user experience.

Shared Services Account: This account serves as the central hub for the AWS organization, managing essential shared services such as centralized logging and auditing. It also includes shared resources like AWS Directory Service for directory-based features. Additionally, this account oversees management and financial aspects, centralizing the billing processes to facilitate better tracking and management of cloud-related costs.

This strategic division of accounts enhances security by adhering to the principle of least privilege, which means that individuals and services only have the access necessary to perform their specific roles. It also simplifies the management of cloud resources, enabling a more transparent and efficient allocation of costs and responsibilities across different departments and projects. This structured approach aids in the clear delineation of roles and facilitates easier monitoring and optimization of cloud usage and expenses.
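
As a rough illustration of that structure, the member accounts can be declared with Terraform's AWS Organizations resources along the lines below, applied from the shared-services account acting as the management account (which is where consolidated billing rolls up). The organizational unit, account names, and e-mail addresses are placeholders, not our actual values.

```hcl
# Sketch of the account layout, applied from the shared-services (management)
# account. The OU, account names, and emails are placeholders.

resource "aws_organizations_organization" "this" {
  feature_set = "ALL" # enables consolidated billing plus organization-wide policies
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "workloads"
  parent_id = aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_account" "development" {
  name      = "development"
  email     = "aws-development@example.com"
  parent_id = aws_organizations_organizational_unit.workloads.id
}

resource "aws_organizations_account" "production" {
  name      = "production"
  email     = "aws-production@example.com"
  parent_id = aws_organizations_organizational_unit.workloads.id
}
```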

Implementing IAM Identity Center for Robust Access Management

Implementing IAM Identity Center (formerly AWS SSO) in our shared services account was a crucial move to improve access control and simplify identity management across our AWS organization. The process includes several important steps, with a Terraform sketch after the list:

  • Creating Permission Sets: This involves setting up the permissions for AWS services that users or groups can have. By designing these permission sets carefully, we can make sure that each team has just the right access they need to do their jobs. This helps keep our operations secure and running smoothly.
  • Managing Groups and Users: We organize our users into groups based on what they do in the company. This structure helps us give out permissions efficiently. It means we can manage access for many users at once, while still meeting the specific needs of different parts of our business.
  • Assigning Access: We then assign these permission sets to the groups or individual users, depending on what each person needs to do their work. This requires a good understanding of our company’s structure, and the specific access needs of each role. It’s important to get this right so that everyone has the access they need, without having more than necessary. This approach follows the principle of least privilege, ensuring tight security and effective management.
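
To give a feel for how those three steps fit together, here is a hedged Terraform sketch of one group, one permission set, and one account assignment. The group name, managed policy, session duration, and target account ID are illustrative only.

```hcl
# Sketch: one permission set assigned to one group on one member account.

data "aws_ssoadmin_instances" "this" {}

locals {
  sso_instance_arn  = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]
}

# Managing groups: users are grouped by function in the business
resource "aws_identitystore_group" "developers" {
  identity_store_id = local.identity_store_id
  display_name      = "Developers"
}

# Creating permission sets: the access a role actually needs, nothing more
resource "aws_ssoadmin_permission_set" "developer" {
  name             = "DeveloperAccess"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT8H"
}

resource "aws_ssoadmin_managed_policy_attachment" "developer" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  managed_policy_arn = "arn:aws:iam::aws:policy/PowerUserAccess" # placeholder policy
}

# Assigning access: bind the group and permission set to a specific account
resource "aws_ssoadmin_account_assignment" "developers_on_dev" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  principal_id       = aws_identitystore_group.developers.group_id
  principal_type     = "GROUP"
  target_id          = "111111111111" # placeholder development account ID
  target_type        = "AWS_ACCOUNT"
}
```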

Codebase Migration

Moving your codebase to a new AWS shared account, particularly from an existing CodeCommit repository, is a key step in maintaining continuity and efficiency after the migration. This process involves not just the technical transfer of code but also a strategic shift in how development resources and practices are aligned within the new operational framework. The main steps are listed below, followed by a rough Terraform sketch of the repository scaffolding.

  • Cloning Existing Repositories: Begin by cloning your current repositories from the old AWS organization. This ensures the entire codebase, including branches, commit history, and tags, is transferred intact. Keeping this historical data is crucial for ongoing development and tracking past work.
  • Updating Configuration Settings: After cloning, check and adjust service-specific configurations that may be affected by the move. This might involve updating CI/CD settings, environment variables, or SDK settings that were linked to the previous AWS organization.
  • Auditing and Remapping Dependencies: Conduct a detailed check of all dependencies to guarantee they are correctly linked to the new AWS environment. This step is vital to avoid any interruptions in development processes or service connections after migrating.
  • Referencing AWS Migration Documentation: Use AWS’s detailed guide on migrating a repository to CodeCommit for a smooth transition. The documentation offers step-by-step instructions and best practices for a complete and efficient migration, including preserving the full commit history and branches.
  • Implementing Branch Policies: Once the codebase is moved, set up branch policies in the new repository. These policies dictate branch management practices, such as merge rules, push restrictions, and protections, keeping the code secure and in line with organizational standards.
  • Creating Notification Rules: Set up notifications for key activities like pull requests, merges, and pushes to keep the development team informed and collaborative. These alerts can be linked to email, SMS, or other communication tools to ensure timely updates on repository activities.
  • Integrating with CI/CD Pipeline: Lastly, link the repository to the CI/CD pipeline in the shared account. This setup should automatically detect repository changes, initiate builds, and manage deployments, creating an efficient end-to-end process from code update to production in the new environment.
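
The history transfer itself is done with git (the AWS guide walks through a mirror clone and push), but the surrounding scaffolding in the shared account can be captured in Terraform. Below is a rough sketch of a repository, an approval-rule template acting as a branch policy, and a notification rule; the names, approval pool, event type IDs, and SNS topic are placeholders and worth checking against your own requirements.

```hcl
# Sketch of repository scaffolding in the shared account. All values are placeholders.

resource "aws_codecommit_repository" "app" {
  repository_name = "app-backend"
  description     = "Backend service repository migrated from the old organization"
}

# Branch policy: pull requests into main need two approvals
resource "aws_codecommit_approval_rule_template" "two_reviewers" {
  name        = "require-two-approvals"
  description = "Pull requests into main need two approvals"
  content = jsonencode({
    Version               = "2018-11-08"
    DestinationReferences = ["refs/heads/main"]
    Statements = [{
      Type                    = "Approvers"
      NumberOfApprovalsNeeded = 2
      ApprovalPoolMembers     = ["arn:aws:sts::111111111111:assumed-role/Developers/*"] # placeholder pool
    }]
  })
}

resource "aws_codecommit_approval_rule_template_association" "app" {
  approval_rule_template_name = aws_codecommit_approval_rule_template.two_reviewers.name
  repository_name             = aws_codecommit_repository.app.repository_name
}

# Notification rule: alert the team on pull request activity
resource "aws_codestarnotifications_notification_rule" "app_events" {
  name        = "app-backend-pr-events"
  detail_type = "BASIC"
  resource    = aws_codecommit_repository.app.arn
  event_type_ids = [
    "codecommit-repository-pull-request-created",
    "codecommit-repository-pull-request-merged",
  ]
  target {
    address = "arn:aws:sns:eu-west-1:111111111111:repo-notifications" # placeholder SNS topic
  }
}
```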

By following these detailed steps for migrating and establishing the codebase in the new AWS shared account, you ensure a smooth technical transition that aligns with your organization’s broader goals. This careful planning and execution lay the groundwork for ongoing innovation and development success.

Data Migration Strategies

To ensure a smooth and accurate transition of data between AWS organizations, we implement detailed migration strategies, particularly for essential data storage like Amazon S3 buckets and Amazon Redshift databases. This process requires careful planning and precise execution to keep data intact, reduce downtime, and maintain ongoing operations.

Migrating S3 Buckets Using DataSync

Transferring data from S3 buckets in the original AWS organization to the new one is a detailed task that needs careful management to avoid data loss and ensure data consistency. AWS DataSync is crucial here, offering a robust and efficient way to move large amounts of data. It automates data transfer and includes features for validating data and scheduling transfers, ensuring a reliable and timely migration.

Important aspects of S3 bucket migration include the following (a Terraform sketch of the DataSync setup follows the list):

  • Data Mapping: Thoroughly planning how data will move from the original to the new buckets, taking care to rename them if necessary to meet S3’s unique naming rules.
  • Transfer Setup: Setting up DataSync tasks to use network resources wisely, including limiting bandwidth use and running multiple transfer threads to minimize the effect on other operations.
  • Data Validation: Using DataSync’s built-in data validation to check that all data transferred is accurate and intact, giving peace of mind that the data is transferred correctly.
  • Copying Permissions and Policies: Making sure that all S3 bucket policies, access permissions, and lifecycle rules are either copied over or adjusted in the new buckets to keep data secure and in compliance.
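
Under those constraints, a DataSync task for one bucket pair might be declared roughly as follows. The bucket names, the IAM role (which needs cross-account access to the destination bucket via its bucket policy), and the bandwidth cap are placeholders.

```hcl
# Sketch of one S3-to-S3 DataSync task. Buckets, role, and limits are placeholders.

resource "aws_datasync_location_s3" "source" {
  s3_bucket_arn = "arn:aws:s3:::legacy-analytics-data"
  subdirectory  = "/"
  s3_config {
    bucket_access_role_arn = "arn:aws:iam::222222222222:role/datasync-migration" # placeholder role
  }
}

resource "aws_datasync_location_s3" "destination" {
  s3_bucket_arn = "arn:aws:s3:::shared-analytics-data" # renamed to satisfy S3's global uniqueness rules
  subdirectory  = "/"
  s3_config {
    bucket_access_role_arn = "arn:aws:iam::222222222222:role/datasync-migration" # placeholder role
  }
}

resource "aws_datasync_task" "s3_migration" {
  name                     = "legacy-to-shared-analytics"
  source_location_arn      = aws_datasync_location_s3.source.arn
  destination_location_arn = aws_datasync_location_s3.destination.arn

  options {
    verify_mode      = "POINT_IN_TIME_CONSISTENT" # built-in validation after the transfer
    bytes_per_second = 104857600                  # cap at ~100 MB/s to protect other workloads
  }
}
```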

Redshift Database Migration via Snapshot and Restore

Moving Amazon Redshift databases to a new AWS organization involves taking snapshots of the existing databases, capturing their current state and data. These snapshots are then shared with the new AWS account, where they can be used to set up new Redshift clusters or integrate with existing ones in a serverless Redshift setup.

The key steps for Redshift database migration are listed below, followed by a Terraform sketch of the restore step:

  • Taking Snapshots: Creating either manual or automatic snapshots of Redshift clusters to capture all the current data in a format that can be restored later.
  • Sharing Snapshots: Adjusting settings to share these snapshots with the new AWS account, enabling the transfer of database information.
  • Restoring Data: Using the shared snapshots in the new AWS account to recreate Redshift clusters or to merge the data into serverless Redshift workgroups, ensuring continuity in data analysis.
  • Adjusting Settings Post-Restoration: After the data is restored, it’s important to go through and tweak the cluster settings, like VPC configurations, security groups, and IAM roles, to match the security and operational requirements of the new setup.
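
Once a snapshot has been shared with the new account, the provisioned-cluster restore can be expressed in Terraform along these lines; the cluster identifier, node type, account IDs, and network settings are placeholders, and a serverless target would instead be restored into a Redshift Serverless workgroup outside this sketch.

```hcl
# Sketch of restoring a cluster in the new account from a snapshot shared by the
# old account. Sharing the snapshot happens in the source account beforehand.

resource "aws_redshift_cluster" "restored" {
  cluster_identifier  = "analytics-restored"
  node_type           = "ra3.xlplus"
  snapshot_identifier = "analytics-final-snapshot" # manual snapshot taken before cutover
  owner_account       = "222222222222"             # account that owns the shared snapshot

  # Post-restore settings adjusted to match the new environment
  cluster_subnet_group_name = "analytics-subnets"
  vpc_security_group_ids    = ["sg-0123456789abcdef0"]
  iam_roles                 = ["arn:aws:iam::111111111111:role/redshift-s3-access"]
}
```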

By following these detailed strategies for migrating data, we can successfully move critical data storage systems like Amazon S3 buckets and Amazon Redshift databases to a new AWS organization. This careful approach not only secures historical data but also ensures uninterrupted data analytics and storage functions in the new setting, providing a strong basis for ongoing and future data-centric activities.

Containerization and ECR Repositories

Containerization is essential in modern software deployment, offering benefits like consistent performance in various environments, easy scalability, and process isolation. Once we have migrated our code, the next important step is to create Docker images and upload them to Amazon Elastic Container Registry (ECR). We achieve this by automating the process within our CI/CD pipeline, which builds the images every time there are changes in the code. This automation ensures that the Docker images are always current and ready for deployment. ECR serves as a secure and scalable place to store our container images and works well with Amazon ECS, which helps in managing and running these containers.

To improve teamwork and consistency across different development stages, it’s important to set up image replication in ECR. This means that when an image is uploaded to the repository in our shared account, it automatically gets copied to the ECR repositories in both the development and production accounts. This automatic replication supports a smooth process for testing and deploying the software, cuts down on manual work, and helps prevent mistakes.
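
At the registry level, that replication might be configured roughly as below, assuming placeholder account IDs and a single region; the destination registries also need a registry policy that permits replication from the shared account.

```hcl
# Sketch of registry-level replication from the shared account to the
# development and production registries. Accounts and region are placeholders.

resource "aws_ecr_replication_configuration" "cross_account" {
  replication_configuration {
    rule {
      destination {
        region      = "eu-west-1"
        registry_id = "111111111111" # development account
      }
      destination {
        region      = "eu-west-1"
        registry_id = "333333333333" # production account
      }
    }
  }
}
```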

It’s vital to automatically check container images for security weaknesses. By setting ECR to scan images when they are uploaded or replicated, we can find and fix security issues early. Additionally, using AWS tools like EventBridge event rules and AWS CodeDeploy allows us to update our applications without downtime, using methods like blue/green deployment. This approach helps maintain a stable and secure environment for running our applications in ECS clusters.
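
The scan-on-push part of that setup is a single repository setting; here is a minimal sketch with a placeholder repository name.

```hcl
# Sketch of a repository scanned on every push or replication.

resource "aws_ecr_repository" "app" {
  name                 = "app-backend"
  image_tag_mutability = "IMMUTABLE" # avoid overwriting tags that are already deployed

  image_scanning_configuration {
    scan_on_push = true
  }
}
```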

Conclusion

The AWS migration journey has been a multifaceted and enlightening experience spanning planning, execution, and optimization. By adopting a meticulous approach to migration strategy, risk assessment, and cost estimation, we were able to mitigate potential challenges and ensure a smooth transition. The shift to Terraform and the segmentation of accounts into development, production, and shared services not only streamlined operations but also bolstered security and efficiency. Implementing IAM Identity Center further refined access management, while the strategic migration of our codebase and data, including the use of AWS DataSync and Redshift snapshot restores, safeguarded our assets and preserved operational continuity.

Containerization and the use of ECR repositories enhanced our deployment processes, ensuring scalability and security. This journey, while complex, underscored the importance of thorough planning, effective tools, and a proactive approach to managing cloud resources. It has set a strong foundation for future growth and innovation, highlighting the transformative potential of cloud technology in optimizing and advancing our organizational infrastructure.

Thank you for taking the time to read this article. I genuinely hope you enjoyed it. I also recommend reading the other articles in this series to help you connect the dots.

If you have any questions or comments, please don’t hesitate to let me know! I’m always here to help and would love to hear your thoughts. 😊
