SaaS Growth and CI/CD Process Support with Smart AWS Infrastructure
Our client is a US-based SaaS vendor that provides a cloud platform for gathering, consolidating, analyzing, and presenting data received from user devices.
The product gathers information from monitored mobile devices and desktops using installed agents and provides users with web-based dashboards to track geolocation, online activities, sent and received data, and other details.
Apriorit was hired to support our client's existing solution and gradually improve its performance while aggressively broadening its functionality. For this assignment, we needed to provide adequate support for a growing product audience from both the architectural and technical perspectives.
At the start of this project, our client had 30,000 active users, with a backend hosted on 12 Windows servers run by a US-based third-party provider.
From the technical point of view, the general purpose of the project is to gather unstructured data from multiple sources, process that data according to special rules, and then show the results to the end user. Part of this data comes from Android devices via pull requests from an agent service. The rest comes from the web via push requests made through a custom C++ library, which encapsulates a private protocol.
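As a rough illustration of that ingestion step, here is a minimal Python sketch that normalizes payloads from the two sources into one event schema. The field names and the schema itself are invented for illustration; they are not the client's actual protocol.

```python
from datetime import datetime, timezone

# Hypothetical normalizer for the two ingestion paths described above:
# agent-service pulls from Android devices and pushes from the web.
# All field names ("lat", "lon", "event", ...) are illustrative assumptions.

def normalize_record(source: str, raw: dict) -> dict:
    """Map a raw payload from a known source to a unified event schema."""
    if source == "android_pull":
        return {
            "device_id": raw["id"],
            "kind": raw.get("event", "unknown"),
            "location": (raw.get("lat"), raw.get("lon")),
            "received_at": datetime.now(timezone.utc).isoformat(),
        }
    if source == "web_push":
        return {
            "device_id": raw["device"],
            "kind": raw.get("type", "unknown"),
            "location": None,
            "received_at": datetime.now(timezone.utc).isoformat(),
        }
    raise ValueError(f"unknown source: {source}")

event = normalize_record(
    "android_pull", {"id": "a1", "event": "geo", "lat": 40.7, "lon": -74.0}
)
print(event["device_id"], event["kind"])
```

Whatever the real schema looks like, funneling both paths through one normalization point keeps the downstream processing rules source-agnostic.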
The first task for our team was to establish the right DevOps processes and structure to support aggressive development plans and continuous integration / continuous delivery (CI/CD). Our second task was to accommodate the growing number of users by providing an adequate architecture and infrastructure. Finally, we needed to optimize infrastructure costs.
The tricky thing was that our team needed to accomplish these three tasks while fulfilling ambitious new feature development plans.
First of all, our team reconsidered the development environment.
We built two additional environment levels: dev (for development testing) and staging (for QA testing before deployment to production). Setting up these environments took our DevOps engineer about two weeks.
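The resulting dev / staging / production split can be sketched as a simple settings selector. The tier names match the text, but the settings keys, hostnames, and the `APP_ENV` variable are assumptions for illustration:

```python
import os

# Illustrative per-environment settings selection for the three tiers
# described above. Keys, hostnames, and the APP_ENV variable are assumptions.

ENVIRONMENTS = {
    "dev": {"debug": True, "db_host": "localhost"},
    "staging": {"debug": False, "db_host": "staging-db.internal"},
    "production": {"debug": False, "db_host": "prod-db.internal"},
}

def load_settings(env=None):
    """Pick the settings bundle for the current deployment tier."""
    env = env or os.environ.get("APP_ENV", "dev")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return ENVIRONMENTS[env]

print(load_settings("staging")["db_host"])
```

Keeping the tiers behind a single switch like this makes it cheap to promote a build from dev to staging to production without code changes.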
To automate the build and deployment processes, we used the Bamboo tool by Atlassian. Configuring the system and setting up the corresponding plans took our specialist six weeks.
Although the platform and this new scheme initially worked well with the existing hosting provider, as the user base grew it became hard to manage system growth using the mechanisms the hosting service provider offered. The Apriorit team analyzed the pros and cons of several solutions and proposed migrating to Amazon Web Services (AWS).
AWS Migration and Cutting Costs
Once our client approved the AWS migration plan, we got to work.
By the end of the migration process, our client expected user base growth that would potentially require about 28 servers in the old environment (compared with 12 servers at the start of the project). It was obvious that the new AWS-based scheme required cost optimization right from the start.
We decided to replace the Windows servers with Linux servers to both increase performance and lower costs. Our developers ported the custom C++ library encapsulating the main data exchange protocol from Windows to Linux (CentOS). We also rewrote the web service and implemented workers using Python (Django framework).
In nine months of development with a team of four developers, a database engineer, a DevOps engineer, and two QA specialists on Apriorit's side - along with a DBA specialist and several PHP developers on the client's side - we developed a system based on:
- DynamoDB (to store all kinds of data)
- MySQL on RDS (mostly to serve configs and logs)
- ECS with two types of clusters (web services and workers)
- An EC2 instance to host RabbitMQ
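The storage split in the list above can be illustrated with a toy routing function: bulk device data goes to DynamoDB, while configs and logs live in MySQL on RDS. The record kinds and the routing rule are assumptions for illustration, not the production logic:

```python
# Toy sketch of the storage split: device data -> DynamoDB,
# configs and logs -> MySQL on RDS. Kind names are illustrative assumptions.

DYNAMO_KINDS = {"geolocation", "activity", "traffic"}
MYSQL_KINDS = {"config", "log"}

def route_record(kind: str) -> str:
    """Return the backing store for a record of the given kind."""
    if kind in DYNAMO_KINDS:
        return "dynamodb"
    if kind in MYSQL_KINDS:
        return "mysql"
    raise ValueError(f"no store mapped for kind: {kind}")

print(route_record("geolocation"))
```

Centralizing this decision in one place means adding a new data kind later only requires extending a mapping, not touching every writer.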
At first, we planned to use Elastic Beanstalk worker environments as a straightforward task queueing solution, but Amazon has known issues with long-running tasks on these workers, which is why we decided to use Celery. However, we found out that Celery has problems with the push model of serving the queue imposed by SQS, so we ended up going with RabbitMQ.
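The long-running-task issue can be sketched with a stdlib-only simulation of an SQS-style visibility timeout: if a task outlives the timeout, the queue assumes the worker died and re-delivers the message, so the task runs more than once. The numbers and the redelivery rule below are deliberately simplified assumptions, not full SQS behavior:

```python
# Simplified model of queue redelivery under a visibility timeout.
# If the worker holds a message longer than the timeout, the queue
# re-delivers it, causing duplicate processing of long tasks.

def deliveries_for_task(task_seconds: float, visibility_timeout: float) -> int:
    """Return how many times the message would be delivered in this model."""
    deliveries = 1
    elapsed = 0.0
    while task_seconds - elapsed > visibility_timeout:
        elapsed += visibility_timeout
        deliveries += 1  # queue assumes the worker died and re-delivers
    return deliveries

print(deliveries_for_task(task_seconds=10, visibility_timeout=30))   # short task: 1
print(deliveries_for_task(task_seconds=120, visibility_timeout=30))
```

A broker like RabbitMQ sidesteps this by holding an unacknowledged message for the consumer until it explicitly acks, which suits long-running workers better.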
In the new setup, all code runs in Docker containers, and shipping to the AWS environment is done with CodeBuild. Our team also managed to reduce the cost of using the Amazon infrastructure during development by applying Docker Compose to set up a local development environment (with MySQL, RabbitMQ, DynamoDB, etc.).
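A local stack along those lines might look like the following docker-compose sketch; service names, image tags, and environment variables are assumptions, not the project's real configuration:

```yaml
# Illustrative local development stack: MySQL, RabbitMQ, and a local
# DynamoDB emulator, as described above. All names and tags are assumptions.
services:
  web:
    build: .
    depends_on: [mysql, rabbitmq, dynamodb]
    environment:
      APP_ENV: dev
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: dev-only-password
  rabbitmq:
    image: rabbitmq:3-management
  dynamodb:
    image: amazon/dynamodb-local
```

Running the full stack locally means developers only touch real AWS resources at the dev-environment stage, which is where the cost savings come from.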
Results and Final Remarks
Apriorit's first achievement was rebuilding the development and deployment environment and optimizing the corresponding processes. Besides the DevOps part, our team introduced JIRA for task management and Bitbucket for code hosting. All these changes allowed us to release several features and improvements every week without having to worry about how the system would handle further growth.
The project still relies on Bamboo for CI/CD, but we're also considering Jenkins, which is faster in some cases and somewhat more flexible than Bamboo.
The results in terms of cost savings have been impressive. We cut platform maintenance costs by about 40% compared with the project's start point, and the new setup costs less than a third of what the old environment would have cost once estimated growth is accounted for. We achieved this with a smart environment setup that uses fewer and less expensive resources: instead of the estimated 28 Windows Server 2008 R2 servers, we now use only 6 CentOS servers, with a simple mechanism for onboarding new servers to support further system growth. We also replaced expensive file system storage with unified DynamoDB storage.
Now our client's system serves more than 100,000 active users who perform over one million requests per day - and this system meets all performance and UX requirements.