A meal kit company that specializes in delivering healthy, organic, ready meal kits was about to start its journey to the cloud. The company had recently been acquired by a leader in the food and beverage industry that intended to merge the business under one umbrella, with the same brand, customer management, operations and technologies, all managed by the parent company. The client engaged Grid Dynamics to integrate data into the parent ecosystem by developing an effective data ingestion solution that provides data model reconciliation and data backfill.
During the pandemic, the client grew substantially, expanded into new markets, and made several acquisitions, which created the need for a new approach to managing the business, running operations and maintaining technical solutions. With this rapid business growth, continuous operational improvements were required to stay competitive. Multiple IT operations, platform solutions, technical departments and integrations across acquisitions made the already complex technical landscape hard to manage.
Grid Dynamics focused specifically on integrating the acquired business into the parent company's technological ecosystem. The biggest challenge of any acquisition is merging businesses that have more differences than commonalities. Unification of business processes for this client involved:
Further considerations for integrating acquisitions into the parent architecture included:
Other derived use cases, such as unified customer 360, marketing campaigns, and customer acquisition and retention policies, were out of scope for the engagement.
For this case study, we’ll focus on the unification of the technical architecture, including the approaches we used, and the solutions we built on top of AWS. We also tackle the other major goal of the integration, which was to create a defined technical roadmap for future acquisition integrations.
With these defined requirements, Grid Dynamics developed a lightweight solution hosted on AWS. Below we explain why certain AWS services were beneficial for this particular integration use case.
At the beginning of our engagement, the client was running an on-premises platform, with some infrastructure components already migrated to AWS. Coming from an on-premises world, where supporting hardware, infrastructure, services and applications is a prerequisite, the client wanted to build a serverless platform that required close to zero infrastructure support.
Integration between the two businesses required data transformation and exposure to the parent company. As we evaluated serverless options, AWS Glue, a serverless data integration service, stood out for features that make it easy to discover, prepare, and combine data for analytics, machine learning, and application development. Furthermore, AWS Glue provides the capabilities needed for data integration out of the box, enabling greater speed to market.
There are three AWS Glue components:
Grid Dynamics opted for a solution based on AWS serverless services to help the client achieve their data integration goal quickly. Using scalable serverless services like AWS Glue and Amazon Redshift enabled us to optimize operating costs and development expenses.
The Analytics Platform that we built on AWS Glue capabilities involved data ingestion from MongoDB to the data lake, with an intermediate data lake in Amazon S3. For ingestion and transformation, Glue ETL services based on Apache Spark were used. Following best practices, the intermediate data lake was split into several logical layers:
The data ingestion process can be summarized as follows:
The project timeline was aggressive: the integration needed to go live within three months, including production infrastructure, pipelines, data quality, monitoring and support runbooks. Grid Dynamics completed the project within the timeline, providing the client with:
The solution was built on top of serverless components of the AWS cloud. Since all data pipelines are batch in nature, there is no need to run infrastructure constantly: all services are provisioned on demand and released after pipeline completion. This approach resulted in drastic infrastructure cost reductions, removed the need for dedicated infrastructure support engineers, and provides greater scalability as the client grows.
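To illustrate the on-demand batch pattern described above, here is a minimal Python sketch that starts a single AWS Glue job run and waits for it to finish. It is not the client's actual orchestration code, which the case study does not show. The job name, the `--run_date` argument and the `run_batch_job` helper are hypothetical. The function expects a Glue client object, such as the one returned by `boto3.client("glue")`, and relies only on its `start_job_run` and `get_job_run` calls.

```python
import time

# Hypothetical job name; the case study does not name the client's Glue jobs.
JOB_NAME = "mongodb-to-s3-ingest"

# Glue job run states after which the run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_batch_job(glue_client, job_name, arguments=None, poll_seconds=30, sleep=time.sleep):
    """Start one Glue job run and block until it reaches a terminal state.

    glue_client is assumed to behave like boto3.client("glue").
    Returns a (run_id, final_state) tuple.
    """
    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments or {})
    run_id = response["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        # Compute is billed only while the run is active; between batch runs
        # nothing is provisioned.
        sleep(poll_seconds)


# Example call (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   run_batch_job(boto3.client("glue"), JOB_NAME, {"--run_date": "2023-01-01"})
```

Passing the client in as a parameter keeps the sketch independent of any particular scheduler. The same function could be called from a cron-style trigger or a workflow tool, and it can be exercised locally with a stub client.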