We're obsessed with performance at Legend. Not just the performance of our execution or our UI, but every millisecond between a user's intent and its execution on-chain.
A large part of our user experience depends on speed of execution. When you're trading against other users, you need the best fill possible. We knew that traditional architecture wouldn't cut it: Legend needed something that could scale to 10,000 concurrent users while maintaining blazing-fast response times.
This is the story of how we built our Telegram bot architecture to push the absolute limits of performance, and how we evolved from a simple local script to a distributed system that can handle massive scale without compromising on speed.
warning: everything below is pretty technical
There are a lot of different levers that you can pull when trying to build a robust architecture, and each comes with fundamental tradeoffs that impact performance, cost, and reliability.
Once we grew out of testing things locally on a single machine, we started weighing our options for a longer-term architecture.
Architectural Decision
Option 1: Single Persistent Server (EC2)
The simplest approach would have been to deploy our bot to a single, powerful EC2 instance.
This is appealing because it's dead simple—one machine, one deployment, known performance characteristics. No need to worry about orchestration complexity or distributed systems headaches.
But the downsides become obvious pretty quickly. Vertical scaling has a hard ceiling: you can go from 4 vCPUs to 8, then 16, then 32, but eventually you hit AWS's largest instance types and you're stuck. Worse, if that one instance goes down during a major market event, you're completely offline while your users are trying to execute time-sensitive trades. We hate single points of failure.
Option 2: Serverless Functions (Lambda)
AWS Lambda seems attractive for handling variable loads: you only pay for what you use, you get effectively unlimited scale, and there are zero servers to manage.
So basically you never think about servers? Theoretically. You get massive concurrency, pay-per-invocation pricing, and AWS handling all the infrastructure headaches. Individual function failures don't bring down your entire system either.
But Lambda functions still have to run somewhere. The "cloud" is always a physical machine, and with serverless that means pulling your code onto one and spinning up a fresh runtime whenever it hasn't been executed in a while, a.k.a. a cold start. When Lambda spins up a new function instance, it can add 100-500ms to response times. For a trading bot whose users expect sub-100ms responses, that's unacceptable. You also lose the ability to maintain persistent connections to databases and RPC nodes, which kills performance even further.
Even with provisioned concurrency (which significantly increases costs), you can't eliminate cold starts entirely during traffic spikes.
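The standard mitigation is to hoist expensive clients into module scope so that warm invocations reuse them. Here's a minimal sketch of that pattern; the handler shape and the pg connection pool are illustrative assumptions rather than our production code, and the point is that a cold start still pays the full initialization cost before the first request is served.

```typescript
import { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";
import { Pool } from "pg";

// Module scope runs once per container, i.e. on every cold start.
// Warm invocations reuse these objects, but a cold start pays for
// TLS handshakes, auth, and pool setup before serving anything.
const db = new Pool({
  connectionString: process.env.DATABASE_URL, // assumption: env-configured database
  max: 1, // a Lambda container handles one request at a time
});

export const handler = async (
  _event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  const started = Date.now();

  // Hypothetical query on the hot path: a warm container reuses the pooled
  // connection, a cold one also pays connection setup right here.
  const { rows } = await db.query("SELECT 1 AS ok");

  return {
    statusCode: 200,
    body: JSON.stringify({ ok: rows[0].ok, ms: Date.now() - started }),
  };
};
```

Even with this pattern, containers (and the connections they hold) get recycled whenever AWS decides to, which is why Lambda could never give us genuinely persistent connections.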
Option 3: Kubernetes Worker Pool
The Kubernetes approach offers a middle ground: managed infrastructure with fine-grained control over scaling and resource allocation.
You can add worker pods to handle increased load, and individual pod failures don't take down the entire system. You can right-size containers for specific workloads and maintain persistent connections to databases and RPC nodes. Most importantly, pods are already warm when a request arrives: a new pod takes a second or two to boot during scale-up, but that cost is never paid on the request path the way Lambda's 100-500ms cold start is.
The downside is operational complexity. Kubernetes can quickly become a clusterfuck: you're now managing deployments, networking, and service discovery, the learning curve is notoriously steep, and you're paying for always-running pods even when they're idle. There's also the endless YAML and the potential service-mesh complexity to consider.
Here's how the scaling characteristics compared:
Response Time Under Load:
| Concurrent Users | EC2 | Lambda | Kubernetes |
|---|---|---|---|
| 100 | 45ms | 120ms* | 42ms |
| 1,000 | 150ms | 80ms | 48ms |
| 5,000 | 500ms | 85ms | 52ms |
| 10,000 | TIMEOUT | 90ms | 58ms |

*includes cold start penalty
Why Kubernetes Won
We chose Kubernetes for three critical reasons:
- No cold starts, warm connection pools, and consistently fast response times
- Pod-level failures are automatically handled without the debugging nightmare of serverless
- We can scale aggressively during market events and scale down during quiet periods, all while maintaining performance guarantees
The operational complexity was worth it for the performance guarantees. In crypto trading, a 200ms delay can mean the difference between catching a candle and missing it entirely. Here's our full infrastructure design:
When a user invokes a Telegram action, the message travels from Telegram's servers to our entrypoint at bot.legend.trade. An AWS load balancer sits between that incoming traffic and the pods in our cluster, routing each request according to our routing rules. The cluster nodes that run worker pods also run our Grafana + Prometheus observability stack, so we can trace and profile every request that comes in.
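For a sense of what each worker pod actually runs, here's a stripped-down sketch of the entrypoint sitting behind that load balancer. The framework choices (grammY + Express), paths, and environment variables are illustrative assumptions, not a description of our production code.

```typescript
import express from "express";
import { Bot, webhookCallback } from "grammy";

// Illustrative bot: real handlers would route into quoting and trade execution.
const bot = new Bot(process.env.BOT_TOKEN!);
bot.command("start", (ctx) => ctx.reply("Welcome to Legend."));

const app = express();
app.use(express.json());

// The load balancer forwards Telegram's webhook POSTs to this path.
app.use("/webhook", webhookCallback(bot, "express"));

// Liveness/readiness endpoint for Kubernetes probes.
app.get("/healthz", (_req, res) => res.status(200).send("ok"));

app.listen(Number(process.env.PORT ?? 8080), () => {
  console.log("worker pod listening");
});
```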
To deploy our servers, we use Docker to bundle the code into images that Kubernetes pods pull at startup. And because we can autoscale aggressively in the horizontal direction, the individual instances in the cluster stay relatively cheap.
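One detail that aggressive autoscaling forces you to get right: pods are terminated routinely during scale-downs and rollouts, so each worker has to drain cleanly rather than dropping in-flight requests. A minimal sketch of the SIGTERM handling involved, with the HTTP server and connection pool as stand-ins:

```typescript
import { createServer } from "http";

// Stand-in server; in a real worker this wraps the Express app above.
const server = createServer().listen(8080);

process.on("SIGTERM", () => {
  // Kubernetes sends SIGTERM before killing a pod (scale-down or rollout).
  // Stop accepting new connections, let in-flight requests finish,
  // then release pooled connections and exit.
  server.close(() => {
    // e.g. await pool.end() here before exiting
    process.exit(0);
  });

  // Hard deadline in case something hangs past the pod's grace period.
  setTimeout(() => process.exit(1), 10_000).unref();
});
```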
Geographic Optimization
Once we decided on Kubernetes, the next critical question was: where should we deploy our cluster?
In a distributed system, geography is destiny. The speed of light imposes fundamental limits on how fast data can travel between servers, and every additional kilometer adds latency to your system.
The Network Latency Reality
Geographic distance has a direct, unavoidable impact on latency. The theoretical minimum round-trip time between two points is determined by the speed of light through fiber optic cables:
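As a back-of-the-envelope model (assuming light propagates through fiber at roughly two-thirds of c, i.e. about 200,000 km/s):

```latex
% Theoretical minimum round-trip time over a one-way distance d,
% with v_fiber ≈ 2c/3 ≈ 200,000 km/s
t_{\text{RTT,min}} = \frac{2d}{v_{\text{fiber}}} \approx \frac{2d}{200{,}000\ \text{km/s}}
\qquad\Longrightarrow\qquad
t_{\text{RTT,min}}(100\ \text{km}) \approx 1\ \text{ms}
```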
In practice, this works out to roughly:
- 1ms per 100km for a round-trip in ideal conditions
- Real-world fiber routes add 20-50% overhead due to routing inefficiencies
- Network equipment adds 1-5ms per hop
So if you're 1,000km away from a server, you're looking at a minimum of 10ms round-trip time, and realistically 15-20ms.
The Telegram Server Mystery
For our Telegram bot, the critical path includes a round trip to Telegram's servers on every message, so we profiled Telegram API latency from each of the major AWS regions we could deploy to.
Telegram API Latency by AWS Region:
| AWS Region | Mean | P95 | P99 |
|---|---|---|---|
| eu-central-1 | 12ms | 18ms | 24ms |
| eu-west-1 | 24ms | 32ms | 41ms |
| us-east-1 | 89ms | 105ms | 128ms |
| us-west-2 | 156ms | 178ms | 201ms |
| ap-southeast-1 | 167ms | 189ms | 215ms |
| ap-northeast-1 | 145ms | 168ms | 192ms |
The clear winner: eu-central-1 (Frankfurt).
The data strongly suggested that Telegram's primary API infrastructure was located in or very close to Frankfurt, Germany. This made sense—Frankfurt is one of Europe's largest internet exchange points and a natural location for serving European users.
The Latency Breakdown
With our cluster in eu-central-1, each Telegram round trip in the message flow costs us roughly 12ms on average (24ms at P99).
Had we deployed in us-west-2 (Oregon) instead, that same round trip would cost around 156ms on average (201ms at P99).
Every interaction pays that price twice, once for the inbound webhook delivery and once for the outbound reply, so the geographic choice alone saved us 288-320ms per message.
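One way to reproduce that 288-320ms figure is to take the mean and P95 deltas from the regional table over the two round trips:

```latex
% Per-message savings, assuming two Telegram round trips per interaction
\Delta t_{\text{mean}} = 2 \times (156 - 12)\ \text{ms} = 288\ \text{ms}
\qquad
\Delta t_{\text{P95}} = 2 \times (178 - 18)\ \text{ms} = 320\ \text{ms}
```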
But how does this impact our other service interactions?
You may be thinking that the latency gained by colocating with Telegram's servers is ultimately given back as extra latency to the other external APIs and servers we rely on. We tested this extensively: our wallet auth provider responds in a consistent ~35ms with sub-millisecond jitter from Frankfurt, and we've moved the rest of our own services (databases, for example) into eu-central-1 to colocate with the cluster. The relocation is basically all upside.
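The measurement itself doesn't need to be fancy. A simple probe along these lines, run from an instance in the candidate region, is enough to estimate a dependency's latency and jitter; the target URL and sample count are placeholders:

```typescript
// Latency probe: N sequential requests, report mean and jitter (stddev).
// Requires Node 18+ for the global fetch API.
const TARGET = process.env.TARGET_URL ?? "https://auth.example.com/health"; // placeholder
const SAMPLES = 50;

async function probe(): Promise<void> {
  const timings: number[] = [];
  for (let i = 0; i < SAMPLES; i++) {
    const start = performance.now();
    await fetch(TARGET);
    timings.push(performance.now() - start);
  }
  const mean = timings.reduce((a, b) => a + b, 0) / timings.length;
  const variance = timings.reduce((a, b) => a + (b - mean) ** 2, 0) / timings.length;
  console.log(`mean=${mean.toFixed(1)}ms stddev=${Math.sqrt(variance).toFixed(2)}ms`);
}

probe().catch(console.error);
```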
Monitoring and Observability
With the capacity for thousands of requests per minute flowing through our system, visibility into performance is critical. We've built a comprehensive observability stack that gives us real-time insights into every aspect of our infrastructure.
Prometheus Metrics Collection:
We track custom metrics for every component (a minimal example of the instrumentation follows this list):
- Message processing latency (by command type)
- Active user counts and session duration
- RPC node health and response times
- Database query performance and cache hit rates
- Pod resource utilization and autoscaling events
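Here's a minimal sketch of how a metric like the first one gets registered and exposed; the metric names, labels, and buckets are illustrative, not our actual definitions:

```typescript
import express from "express";
import client from "prom-client";

// Default runtime metrics: CPU, memory, event loop lag, GC pauses.
client.collectDefaultMetrics();

// Illustrative histogram: message processing latency, labeled by command type.
const messageLatency = new client.Histogram({
  name: "bot_message_processing_seconds",
  help: "Time to process a Telegram message, by command",
  labelNames: ["command"],
  buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
});

// Wrap a handler so every invocation is timed and labeled.
// Usage (hypothetical): await timed("buy", () => handleBuy(ctx));
async function timed<T>(command: string, fn: () => Promise<T>): Promise<T> {
  const end = messageLatency.startTimer({ command });
  try {
    return await fn();
  } finally {
    end();
  }
}

// Expose the /metrics endpoint that Prometheus scrapes on every pod.
const app = express();
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.send(await client.register.metrics());
});
app.listen(9100);
```

Grafana then only needs PromQL over these series to draw the latency percentile panels.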
Grafana Dashboards:
Our dashboards surface the metrics that matter most for trading performance:
- Real-time latency percentiles (P50, P95, P99)
- Traffic volumes, error rates and failure patterns
- Infrastructure and resource efficiency
Conclusion
Building our Telegram bot architecture taught us that achieving truly low latency requires optimization at every layer:
- Infrastructure: From webhook delivery to worker processing
- Code: From database queries to response formatting
- Architecture: From monolithic polling to distributed event-driven design
- Operations: From manual scaling to predictive auto-scaling
The result is a system that can handle 10,000 concurrent users while maintaining sub-60ms response times - and we're just getting started.
The fastest chain deserves the fastest trading experience. And we're committed to pushing the limits of what's possible, one millisecond at a time.