Telco Order Management scaling optimization — a critical requirement when deployed on private cloud

Chinmohan Biswas
4 min read · Aug 16, 2021

Authorship credit: Debajyoti Mukherjee

Dynamic, orchestration-based, containerized, scalable Telco Order Management is replacing linear, monolithic legacy order management to embrace new digital products. However, this poses an interesting challenge for the deployment model: the order management workload must still be deployed and scaled within bounded elasticity. It becomes critical to determine a scaling pattern for such an order management architecture.

To determine a scaling pattern, we should start with the demand patterns by channel for a telco Order Management system. The diagram below provides a high-level view of the applications surrounding the Order Management System that generate order requests.

And the table below shows how the various orders can be characterized by their impact on business, customer experience (CX), SLA requirements, demand volume, and load pattern:

The above characterization allows us to categorize the various orders by scaling need, so that we can meet business KPIs and SLA commitments at optimum TCO.

When the volume distribution by system is then applied, it provides another input into how resources should be allocated for optimal utilization. The table below outlines data from three different clients, from different business paradigms and different markets:

Combining the scaling pattern with the volume distribution by system, it is clear that for CSP 1 and CSP 2, 75%+ of orders require “24 x 7 availability with auto-scaling”, whereas for CSP 3 as much as ~40% of orders require “scheduled availability with forced scaling”. This tells us that CSP 1 and CSP 2 need larger compute available on demand. On top of that, there must also be a way to build isolation, so that a failure in one type of order processing does not impact other order types.

If the order apps are hosted on a private cloud, then based on the above distribution a single cluster can be partitioned as follows:

a) An Order Apps runtime for the assisted channel with the ability to auto-scale, but available only during contact-center operating hours

b) An Order Apps runtime for self-service and retail with the ability to auto-scale 24 x 7

c) An MNP and dunning Order Apps runtime with forced scaling and scheduled availability.

The first step is to use Kubernetes Namespaces to define virtual clusters for the above three types of order-processing use cases, and to allocate resource limits to each Namespace based on volume and load. For CSP 1, the assisted channel should get the highest limit, while for CSP 2 the focus would be on orders coming from self-service.
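As a minimal sketch, the per-use-case split can be expressed as one Namespace per runtime, each capped by a ResourceQuota sized from the volume distribution. The namespace names and quota figures below are illustrative assumptions, not values from the article:

```yaml
# Hypothetical Namespace for the assisted-channel Order Apps runtime
apiVersion: v1
kind: Namespace
metadata:
  name: om-assisted
---
# Quota caps this virtual cluster so other order types are isolated from it;
# the CPU/memory/pod figures are placeholders to be sized per CSP volumes.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: om-assisted-quota
  namespace: om-assisted
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 32Gi
    limits.cpu: "24"
    limits.memory: 48Gi
    pods: "60"
```

Similar Namespace/quota pairs would be created for the self-service/retail runtime and the MNP/dunning runtime, with the highest limits going to whichever channel dominates that CSP's order volume.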

Then use Helm charts to create separate deployable units from the same Order services container image, with different scaling parameters. The same Docker image is reused for all order services (activation, change order, etc.), but different deployment units are created in separate Helm charts. The sole purpose of having separate deployment units for the same image is to provide separate resource limits and concurrency settings per input channel. Each Helm chart contains all the necessary resources: Deployment, Service, Ingress, and so on. A Jenkins agent image with the Helm client needs to be created and assigned in the Jenkins configuration. For each deployment unit configured in a Helm chart, we then set the resource limits and auto-scaling configuration (HPA) based on the channel-specific load described earlier.
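One way to realize this with a single shared chart is per-channel values files that override only resources and HPA settings; the image path, tag, and numbers below are assumed for illustration:

```yaml
# values-assisted.yaml -- hypothetical overrides for the assisted-channel release
image:
  repository: registry.local/order-services   # same image for every channel (assumed path)
  tag: "1.4.2"
resources:
  requests: { cpu: 500m, memory: 1Gi }
  limits:   { cpu: "1",  memory: 2Gi }
autoscaling:                                  # rendered into an HPA by the chart
  enabled: true
  minReplicas: 4                              # sized for contact-center peak
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
```

```yaml
# values-selfservice.yaml -- same chart and image, 24 x 7 profile with more headroom
autoscaling:
  enabled: true
  minReplicas: 6
  maxReplicas: 40
  targetCPUUtilizationPercentage: 60
```

Each release is then installed into its own Namespace (e.g. `helm install om-assisted ./order-services -f values-assisted.yaml -n om-assisted`), so the same image runs as independently scaled deployment units.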

The last step is to build a framework that allows the deployable units for MNP/dunning to start only at a certain point in the day (or based on an event). An administrator should deploy these services on demand from the Jenkins dashboard, via the Jenkins remote-access API, or by triggering scripts. These deployments also come from separate Helm charts and have their own deployment configuration, such as resource allocation and the minimum number of pods to start with, as mentioned in the previous step.
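If script-based triggering is chosen over the Jenkins dashboard, one concrete option is a Kubernetes CronJob that forces the runtime up at the start of its processing window (a sibling job, not shown, would scale it back down). The deployment name, namespace, schedule, and ServiceAccount are assumptions; the ServiceAccount would need RBAC permission to scale Deployments:

```yaml
# Hypothetical CronJob bringing the dunning runtime up nightly at 01:00
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dunning-scale-up
  namespace: om-batch
spec:
  schedule: "0 1 * * *"              # start of the nightly dunning window (assumed)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scheduler-sa   # assumed SA with rights to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: scale
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deployment/dunning-order-services
                - --replicas=8       # forced scaling straight to the pre-sized capacity
                - -n
                - om-batch
```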

With the above setup, we now need a routing capability based on channel and/or order type, so that order requests are assigned to the appropriate runtime. This requires architectural changes at the order dispatcher layer. Most CSPs use an ingress API/ESB layer, which either needs the ability to parse the payload to determine the originating application/channel and/or order type (when the interfaces cannot be changed because of various drivers), or the API/ESB flow can be rewritten to introduce header parameters that drive the routing decision.
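If the dispatcher rewrites requests onto channel-specific path prefixes, the final hop can be a plain Kubernetes Ingress; routing on header parameters instead would need a gateway that supports header matching (e.g. an Istio VirtualService) or the ESB itself. The host and service names here are illustrative assumptions:

```yaml
# Sketch: path-prefix routing to channel-specific deployment units
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: order-router
spec:
  rules:
    - host: orders.csp.internal          # assumed internal host
      http:
        paths:
          - path: /assisted/orders       # dispatcher maps assisted-channel traffic here
            pathType: Prefix
            backend:
              service:
                name: order-svc-assisted
                port:
                  number: 8080
          - path: /selfservice/orders    # 24 x 7 self-service/retail runtime
            pathType: Prefix
            backend:
              service:
                name: order-svc-selfservice
                port:
                  number: 8080
```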

In summary, a dynamically orchestrated, new-age order management system should leverage the scaling capabilities of the container orchestrator to reduce TCO without compromising SLA requirements.


Chinmohan Biswas

An enterprise architect focused on telecom and hybrid cloud