
AWS Python Lambda Timeout Handler


August 2, 2024

AWS Lambda functions are ideal for small tasks such as making API calls, retrieving data from a database or S3, or calling an LLM and returning the response. Problems arise when a task takes longer than expected, either because the work itself runs long or because an error leaves the function hanging instead of terminating it.

In such scenarios, the lambda may run into a timeout. By default, an AWS Lambda function has a timeout of 3 seconds, which can be raised to a maximum of 15 minutes. When an asynchronously invoked lambda times out, Lambda retries it with the same initial event (twice by default, unless retries are disabled). This helps with temporary issues but not with inherently lengthy operations, which simply time out again.

Problem Scenario

Consider a scenario where we need to calculate a specific prime number and store it in a database. Below is example code that finds the prime number at a given position:

python
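def find_exact_prime_number_at_position(position: int, start_prime: int = 2, start_counter: int = 1):
    """Return the prime at the given 1-based position.

    start_prime is the start_counter-th prime; by default 2 is the 1st prime.
    """
    counter = start_counter
    prime = start_prime
    while counter < position:
        prime += 1
        if is_prime(prime):
            counter += 1
    return prime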

def is_prime(number: int):
    if number < 2:
        return False
    for i in range(2, int(number ** 0.5) + 1):
        if number % i == 0:
            return False
    return True

Our lambda function could call this function with the user input position of the prime number like this:

python
def lambda_handler(event, context):
    position = event["position"]
    answer = find_exact_prime_number_at_position(position)
    # store answer in database
    return {"status": 200}

Given a position as input, our lambda function can reliably store the associated prime number in a database. However, calculating the 10,000th prime number would take around 2 minutes, which exceeds the 3-second default timeout. The obvious fix is to adjust the lambda settings and raise the timeout to, let's say, 5 minutes. But what if somebody wants even higher prime numbers?
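
Raising the timeout can be done in the console or programmatically. The following is a minimal sketch, assuming boto3 and a hypothetical function name `prime-finder`; `update_function_configuration` takes the timeout in seconds. The client is passed in as an argument so the helper can be tried out without AWS credentials.

```python
MAX_LAMBDA_TIMEOUT_SECONDS = 900  # 15 minutes, the upper limit AWS Lambda allows


def set_lambda_timeout(client, function_name: str, timeout_seconds: int) -> dict:
    """Set the timeout of a Lambda function through a boto3 Lambda client."""
    if not 1 <= timeout_seconds <= MAX_LAMBDA_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout must be between 1 and {MAX_LAMBDA_TIMEOUT_SECONDS} seconds"
        )
    return client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )


# Usage (requires AWS credentials; "prime-finder" is a hypothetical function name):
#   import boto3
#   set_lambda_timeout(boto3.client("lambda"), "prime-finder", 300)  # 5 minutes
```

Whatever value is chosen, 900 seconds remains a hard ceiling, which is why a larger timeout alone cannot cover arbitrarily long calculations.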

Solution: Timeout Handler

To handle this, we introduce a timeout catcher that intercepts the lambda timeout before it occurs, allowing the lambda to restart with an updated state and continue processing. Here’s how it’s implemented:

Lambda Timeout Handler Implementation

python
import json
import os
import signal

import boto3

# Global state: the latest prime found and how many primes have been found so far.
# 2 is the first prime, so the counter starts at 1.
latest_prime = 2
latest_counter = 1
position = 10000


class LambdaTimeoutApproaching(Exception):
    """Raised by the alarm handler to stop the running calculation."""


def timeout_handler(_signal, _frame):
    # The return value of a signal handler is ignored and execution would
    # simply resume afterwards, so raise an exception to interrupt the calculation.
    print("Timeout Handler!")
    raise LambdaTimeoutApproaching()


def invoke_self_with_current_state():
    payload = {
        "latest_prime": latest_prime,
        "latest_counter": latest_counter,
        "position": position,
    }
    client = boto3.client("lambda")
    # InvocationType="Event" invokes asynchronously; it replaces the deprecated invoke_async
    client.invoke(
        FunctionName=os.environ["LAMBDA_ARN"],
        InvocationType="Event",
        Payload=json.dumps(payload),
    )
    print("Invoked Lambda Async!")


signal.signal(signal.SIGALRM, timeout_handler)


def lambda_handler(event, context):
    print("Received event: %s" % json.dumps(event))

    global latest_prime, latest_counter, position
    latest_prime = event.get("latest_prime", 2)
    latest_counter = event.get("latest_counter", 1)
    position = event.get("position", 10000)

    # Fire the alarm 15 seconds before the lambda would time out
    # (at least 1 second, because alarm(0) would disable the alarm)
    timeout = max(1, int(context.get_remaining_time_in_millis() / 1000) - 15)
    signal.alarm(timeout)
    try:
        prime_number_we_search = find_exact_prime_number_at_position(position, latest_prime, latest_counter)
        # Store the prime number in the database
        print(f"Latest Prime: {prime_number_we_search}")
    except LambdaTimeoutApproaching:
        # Hand the current state over to a fresh invocation
        invoke_self_with_current_state()
        return {"statusCode": 200, "body": json.dumps("Invoked Lambda Async!")}
    except Exception as e:
        print(f"Error: {e}")
        return {"statusCode": 500, "body": json.dumps("Error!")}
    finally:
        # Disable the alarm
        signal.alarm(0)
    return {"statusCode": 200, "body": json.dumps("Function finished!")}

Using Python's signal module, we register a handler for SIGALRM and schedule the alarm 15 seconds before the actual lambda timeout. This leaves ample time to catch the approaching timeout and invoke the lambda again with the updated state, which effectively resets the timeout. The state is read from the global variables and passed as the payload of the new invocation.
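
The mechanism can be tried locally without AWS. The following is a minimal sketch, not the code of the lambda itself: the names `DeadlineReached`, `alarm_delay_seconds` and `run_with_deadline` are made up for illustration, and it uses `signal.setitimer` instead of `signal.alarm` only because the former accepts fractions of a second, which keeps the demo short. Both deliver SIGALRM. It runs on Unix-like systems only, which includes the Lambda runtime.

```python
import signal


class DeadlineReached(Exception):
    """Raised by the SIGALRM handler to interrupt the running computation."""


def _raise_deadline(_signum, _frame):
    # Raising here unwinds whatever the main thread is currently executing
    raise DeadlineReached()


signal.signal(signal.SIGALRM, _raise_deadline)


def alarm_delay_seconds(remaining_ms: int, safety_margin_s: int = 15) -> int:
    """Seconds until the alarm should fire; at least 1, because alarm(0) disables it."""
    return max(1, remaining_ms // 1000 - safety_margin_s)


def run_with_deadline(work, seconds: float):
    """Run work() and return (True, result), or (False, None) if the deadline fires first."""
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return True, work()
    except DeadlineReached:
        return False, None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm


def spin_forever():
    while True:
        pass


print(alarm_delay_seconds(180_000))           # 3 minutes remaining -> 165
print(run_with_deadline(lambda: 42, 1.0))     # finishes in time
print(run_with_deadline(spin_forever, 0.05))  # interrupted by the alarm
```

The important detail is that the handler raises: a signal handler's return value is discarded, so without the exception the interrupted loop would simply carry on.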

Updated Prime Number Function

The prime number function needs to update our global variables with every prime number it finds, so that the timeout handler always sees the latest state:

python
def find_exact_prime_number_at_position(position: int, start_prime: int = 2, start_counter: int = 1):
    # start_prime is the start_counter-th prime; by default 2 is the 1st prime
    global latest_prime, latest_counter
    counter = start_counter
    prime = start_prime
    while counter < position:
        prime += 1
        if is_prime(prime):
            counter += 1
            # Record the progress so it can be passed on to the next invocation
            latest_prime = prime
            latest_counter = counter
    return prime
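
To see that the saved state is enough to continue the search, the following sketch compares one uninterrupted run with a run resumed from a payload. It is a standalone illustration with a hypothetical `find_prime` helper, and it assumes the convention that 2 counts as the first prime, so the state "latest prime 11, counter 5" means that 11 is the 5th prime.

```python
def is_prime(number: int) -> bool:
    if number < 2:
        return False
    for i in range(2, int(number ** 0.5) + 1):
        if number % i == 0:
            return False
    return True


def find_prime(position: int, last_prime: int = 2, primes_found: int = 1) -> int:
    """Return the prime at `position` (1-based), continuing from a saved state.

    The state means: `last_prime` is the `primes_found`-th prime.
    """
    prime, counter = last_prime, primes_found
    while counter < position:
        prime += 1
        if is_prime(prime):
            counter += 1
    return prime


# One uninterrupted run ...
direct = find_prime(10)

# ... versus a run that stopped after the 5th prime (11) and handed over its state
payload = {"position": 10, "latest_prime": 11, "latest_counter": 5}
resumed = find_prime(payload["position"], payload["latest_prime"], payload["latest_counter"])

print(direct, resumed)  # prints: 29 29
```

Because the payload carries everything the loop needs, it does not matter how many invocations the calculation is spread across.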

Considerations

  • Asynchronous Lambdas: This approach is suitable only for lambdas called asynchronously, where a response is not expected by the caller, such as lambdas that call an LLM and store the result in a database for frontend access.
  • Main Thread: Signal handlers can only be registered in the main thread of the Python process. The lambda will not terminate if threads are still running.
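
The main-thread restriction is easy to verify. A small sketch, with a hypothetical helper name, that tries to register a SIGALRM handler from a worker thread (Unix-like systems only):

```python
import signal
import threading


def try_register_handler(results: list) -> None:
    """Try to register a SIGALRM handler and record whether it worked."""
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
        results.append("registered")
    except ValueError as error:
        # CPython refuses to install signal handlers outside the main thread
        results.append(f"refused: {error}")


results = []
worker = threading.Thread(target=try_register_handler, args=(results,))
worker.start()
worker.join()
print(results[0])
```

For a lambda this means the `signal.signal(...)` call belongs at module level or directly in the handler function, not inside a worker thread.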
