Considerations for concurrency parameters in a serverless integration model on AWS for a Telecom integration layer

Chinmohan Biswas
6 min read · Mar 31, 2023


Author Credit: Debajyoti Mukherjee

In recent times, we have come across several asks about building a cloud-native integration layer that leverages cloud services such as serverless. Clients are either evaluating this as a serious option to replace traditional middleware or have already implemented it. Below is a typical component model from a channel-integration perspective, accepting various types of orders for wireless products:

While building this architecture, a number of middleware concerns must be addressed: guaranteed delivery, failure tracking, retries, throttling, and so on. With serverless, however, concurrency together with auto-scaling is a particularly important element to design for, since it has both technical and commercial impact. The sections below focus on:

· Which parameters to look for in the context of AWS Lambda functions.

· How to estimate those parameters using Telco domain knowledge.

Which parameters to look for:

a> Reserved concurrency: We need an effective way to distribute the regional account quota (1,000 by default) among the Lambda functions. Reserved concurrency caps the concurrency of a particular Lambda function, and at the same time guarantees that the function can always scale up to that level.
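As a minimal sketch (function names and numbers are hypothetical), reserved concurrency can be set per function with boto3's `put_function_concurrency`. Note that AWS keeps at least 100 executions unreserved at the account level, so the sum of allocations must leave that headroom:

```python
def allocation_fits_quota(allocations: dict, quota: int = 1000,
                          unreserved_minimum: int = 100) -> bool:
    """AWS retains at least 100 unreserved concurrent executions per
    account, so reserved allocations must leave that headroom."""
    return sum(allocations.values()) <= quota - unreserved_minimum


def set_reserved_concurrency(function_name: str, reserved: int) -> None:
    """Cap (and guarantee) a single function's concurrency."""
    import boto3  # imported lazily: the call needs AWS credentials

    boto3.client("lambda").put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
```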

b> Cold start and provisioned concurrency: Each invocation of a Lambda function runs in an ephemeral environment that is spun up on demand. Lambda downloads the code and initializes the runtime along with any packages, dependencies, or global variables the function includes. This precursor step, which runs before the function's handler method executes, is called a cold start, and it adds latency to the request turnaround time. Once the environment is ready, it can serve further requests without a cold start. A Lambda execution environment can run only one request at a time, and if it sits idle for a while, Lambda tears it down. Provisioned concurrency keeps a number of environments initialized at all times to avoid cold-start latency; it can be applied to a particular alias or version of a Lambda function.
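As an illustrative sketch (names are hypothetical), provisioned concurrency is configured per alias or version via `put_provisioned_concurrency_config`; when reserved concurrency is also set, the provisioned figure must fit inside it:

```python
def provisioned_fits_reserved(provisioned: int, reserved: int) -> bool:
    """Provisioned concurrency counts against the function's reserved
    concurrency, so it must not exceed it."""
    return 0 <= provisioned <= reserved


def set_provisioned_concurrency(function_name: str, alias: str, count: int) -> None:
    """Keep `count` execution environments initialized for an alias/version."""
    import boto3  # imported lazily: the call needs AWS credentials

    boto3.client("lambda").put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,  # provisioned concurrency attaches to an alias or version
        ProvisionedConcurrentExecutions=count,
    )
```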

How to estimate those parameters using Telco domain knowledge.

The next question is how we estimate these parameters. Here, domain knowledge and data play a key role. Working in the Telecom domain, we have laid out an approach based on Telecom order categories.

The method for estimating the above parameters comes from performance-modeling techniques, and the primary factor used here is load distribution across Telecom order types. The table below summarizes the distribution for two example clients from a "Pay As You Go" services perspective:

Based on the above distribution, we attached two business drivers to these order types: "Business Impact" and "SLA Requirement". It is a no-brainer that we reserve higher concurrency for revenue-generating and CXX-impacting services, and distribute the rest according to load distribution.
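One way to sketch that split (order types and weights below are purely illustrative, not the clients' actual distribution): weight each order type by its load share times a business-impact multiplier, then divide the reservable pool proportionally.

```python
def distribute_quota(weights: dict, pool: int) -> dict:
    """Split a concurrency pool across order types in proportion to
    their weights (load share x business-impact multiplier)."""
    total = sum(weights.values())
    alloc = {k: int(pool * w / total) for k, w in weights.items()}
    # hand any rounding remainder to the highest-weighted order type
    top = max(weights, key=weights.get)
    alloc[top] += pool - sum(alloc.values())
    return alloc


# Hypothetical figures: load share x impact multiplier per order type
weights = {
    "activation": 0.40 * 2.0,   # revenue generating, CXX impacting
    "top_up":     0.35 * 2.0,   # revenue generating
    "sim_swap":   0.15 * 1.0,
    "plan_query": 0.10 * 1.0,
}
print(distribute_quota(weights, pool=900))  # 900 = quota minus unreserved minimum
```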

Now, how do we estimate provisioned concurrency? There are two balancing factors: 1) cost, and 2) meeting availability, reliability, and performance NFRs. To derive the cost, the model below is one possible approach.

First, we derive the memory requirements for Lambda functions of various sizes and evaluate the provisioned-concurrency cost. Below is a mapping of memory requirements to the indicative monthly cost of one provisioned Lambda instance at each size; the memory can be adjusted based on the Lambda function's implementation.
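A minimal cost sketch, assuming the us-east-1 provisioned-concurrency rate at the time of writing (about $0.0000041667 per GB-second; verify against current AWS pricing) and a 730-hour month:

```python
PRICE_PER_GB_SECOND = 0.0000041667  # indicative us-east-1 rate; check current pricing
SECONDS_PER_MONTH = 730 * 3600      # ~730 hours in a month


def monthly_provisioned_cost(memory_mb: int, instances: int = 1) -> float:
    """Indicative monthly cost (USD) of keeping `instances` provisioned
    environments warm at the given memory size."""
    gb = memory_mb / 1024
    return round(gb * instances * SECONDS_PER_MONTH * PRICE_PER_GB_SECOND, 2)


# Hypothetical size buckets; tune memory per actual function implementation
for size, mem in {"tiny": 128, "small": 512, "medium": 1024, "large": 2048}.items():
    print(size, monthly_provisioned_cost(mem))
```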

Now say an engagement has a distribution of 50 Lambda functions with allocated reserved concurrency. The scenarios below show indicative costs for provisioned concurrency:

Scenario 1: 100% provisioned concurrency allocation

Scenario 2: 0% provisioned concurrency allocation

This adds no cost on top of normal Lambda usage, but it leads to a lot of cold starts. As mentioned earlier, no Lambda instance is kept warm to serve requests; if a function sits unused for a considerable time, say five minutes, the next request incurs a cold start, with an impact on NFRs.

As a result, we configure provisioned concurrency to a number lower than reserved concurrency that can still handle the majority of requests without a cold start. It is, however, a trade-off across cost, performance, business impact, and SLAs. One approach to balancing these forces is to give higher weightage to revenue-generating and CXX-impacting services and provision a higher percentage of concurrency to cover their peak load, while for the rest we allocate the least needed to satisfy average load, or the minimum load that does not disrupt the business outcome.
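To translate load figures into concurrency numbers, a common rule of thumb is Little's law: concurrency ≈ arrival rate × average duration. Below is a hedged sketch of that sizing, provisioning to peak for high-impact order types and to average for the rest (the TPS and duration figures would come from your own measurements):

```python
import math


def concurrency_needed(tps: float, avg_duration_s: float) -> int:
    """Little's law: requests in flight = arrival rate x service time."""
    return math.ceil(tps * avg_duration_s)


def provisioned_for(high_impact: bool, peak_tps: float,
                    avg_tps: float, avg_duration_s: float) -> int:
    """Size provisioned concurrency to peak for revenue/CXX-impacting
    services, to average for the rest (on-demand absorbs their spikes)."""
    tps = peak_tps if high_impact else avg_tps
    return concurrency_needed(tps, avg_duration_s)
```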

The table below shows a possible distribution across order types.

While the above concurrency distribution meets BAU demand, what if certain services breach the estimated peak load (e.g., unprecedented demand when a popular OTT platform launches a new series and CSPs see a surge of subscriptions)? If load exceeds the allotted provisioned concurrency, then depending on performance and SLA requirements we can auto-scale using the techniques below:

Based on expected traffic and an analysis of metrics and request patterns, a combination of provisioned and on-demand concurrency may be the cost-effective approach. If concurrent requests exceed the number of provisioned instances, the excess requests get a Lambda environment on demand, with a cold start. To avoid cold starts entirely, we can scale a function's provisioned instances with AWS Application Auto Scaling. The techniques below can be adopted for auto-scaling of provisioned concurrency:

a> Schedule-based: Provisioned-concurrency scaling can be scheduled at regular intervals when request spikes are well anticipated; for example, a bulk activation of GPSI numbers on a known schedule makes this method very effective. Here is an AWS blog post: https://aws.amazon.com/blogs/compute/scheduling-aws-lambda-provisioned-concurrency-for-recurring-peak-usage/
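A sketch of schedule-based scaling via the Application Auto Scaling API (function name, alias, and cron expression are hypothetical):

```python
def lambda_resource_id(function_name: str, alias: str) -> str:
    """Application Auto Scaling resource id format for Lambda aliases."""
    return f"function:{function_name}:{alias}"


def schedule_provisioned_concurrency(function_name: str, alias: str,
                                     cron: str, capacity: int) -> None:
    """Pin provisioned concurrency to `capacity` on a recurring schedule,
    e.g. ahead of an anticipated bulk-activation window."""
    import boto3  # imported lazily: the calls need AWS credentials

    aas = boto3.client("application-autoscaling")
    resource_id = lambda_resource_id(function_name, alias)
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=1,
        MaxCapacity=capacity,
    )
    aas.put_scheduled_action(
        ServiceNamespace="lambda",
        ScheduledActionName=f"{function_name}-peak-window",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        Schedule=cron,  # e.g. "cron(0 8 * * ? *)" for 08:00 UTC daily
        ScalableTargetAction={"MinCapacity": capacity, "MaxCapacity": capacity},
    )
```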

b> Dynamic scaling: This uses target-tracking scaling on a CloudWatch metric, with provisioned-concurrency utilization as the target value. It is the most effective scaling method, as no manual action is required. Here is the AWS documentation: https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html.
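A sketch of the target-tracking policy (names and the 70% target are illustrative; the target value for this metric is a utilisation fraction between 0 and 1):

```python
def valid_utilisation(target: float) -> bool:
    """The TargetValue for provisioned-concurrency utilisation is a
    fraction in (0, 1]."""
    return 0.0 < target <= 1.0


def enable_target_tracking(function_name: str, alias: str,
                           target_utilisation: float = 0.7,
                           max_capacity: int = 100) -> None:
    """Scale provisioned concurrency automatically against the
    LambdaProvisionedConcurrencyUtilization CloudWatch metric."""
    assert valid_utilisation(target_utilisation)
    import boto3  # imported lazily: the calls need AWS credentials

    aas = boto3.client("application-autoscaling")
    resource_id = f"function:{function_name}:{alias}"
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=1,
        MaxCapacity=max_capacity,
    )
    aas.put_scaling_policy(
        PolicyName=f"{function_name}-pc-target-tracking",
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_utilisation,  # scale out above 70% utilisation
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )
```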

Our learnings from implementation

Choosing the correct Lambda implementation.

Depending on your function's runtime and memory configuration, you may see some latency variability on the first invocation of an initialized execution environment. For example, .NET and other JIT runtimes can lazily load resources on the first invocation, adding latency variability (typically tens of milliseconds). We therefore suggest lightweight Lambda implementations in technologies like Python; if a Java-based implementation is needed, we recommend a GraalVM-based build.

Use SnapStart instead of provisioned concurrency.

SnapStart is another performance-improvement technique for Lambda functions; it triggers an optimization process when a new version of the function is published. Here are the details: https://aws.amazon.com/blogs/aws/new-accelerate-your-lambda-functions-with-lambda-snapstart/

Unlike provisioned concurrency it incurs no additional cost, but it also does not deliver the same performance gains. It can be applied to non-orchestrating Lambdas and to the small- and tiny-sized Lambdas mentioned earlier.
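As a sketch of enabling SnapStart (at the time of writing it supports Java runtimes only; the function name is hypothetical): the snapshot is taken when a new version is published, so a publish must follow the configuration change.

```python
def snapstart_supported(runtime: str) -> bool:
    """SnapStart launched for Java runtimes (java11 at the time of
    writing); check current AWS docs for the supported list."""
    return runtime.startswith("java")


def enable_snapstart(function_name: str) -> None:
    """Enable SnapStart and publish a version to trigger the snapshot."""
    import boto3  # imported lazily: the calls need AWS credentials

    client = boto3.client("lambda")
    client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # In production, wait for LastUpdateStatus == "Successful" here
    # (e.g. the function_updated_v2 waiter) before publishing.
    client.publish_version(FunctionName=function_name)
```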

For a medium-sized Lambda, here is a latency comparison of request turnaround time between provisioned concurrency and SnapStart (SnapStart is ~33% slower).

In summary, serverless is emerging as a great alternative for building hybrid integration platforms in Telco integration landscapes, and with the right performance models, estimated load patterns, and the concurrency parameters discussed above, it can also deliver an optimized cost point without impacting business outcomes.
