
Execution at the Limit

The fastest chain deserves the fastest trading experience



We're obsessed with performance at Legend. Not just the performance of our execution engine or our UI, but every millisecond between a user's intent and its execution on-chain.

A large part of our user experience comes down to execution speed. When you're trading against other users, you need the best fill possible. We knew traditional architecture wouldn't cut it: Legend needed something that could scale to 10,000 concurrent users while maintaining blazing-fast response times.

This is the story of how we built our Telegram bot architecture to push the absolute limits of performance, and how we evolved from a simple local script to a distributed system that can handle massive scale without compromising on speed.

warning: everything below is pretty technical

There are a lot of different levers that you can pull when trying to build a robust architecture, and each comes with fundamental tradeoffs that impact performance, cost, and reliability.

Once we grew out of testing things locally on a single machine, we started considering our options for more long-term architecture.

Architectural Decision

Option 1: Single Persistent Server (EC2)

The simplest approach would have been to deploy our bot to a single, powerful EC2 instance.

This is appealing because it's dead simple—one machine, one deployment, known performance characteristics. No need to worry about orchestration complexity or distributed systems headaches.

But the downsides become obvious pretty quickly. You can scale from 4 vCPUs to 8, then 16, then 32, but eventually you hit AWS's largest instance types and you're stuck. Worse, if that instance goes down during a major market event, you're completely offline while your users are trying to execute time-sensitive trades. We hate single points of failure.

Option 2: Serverless Functions (Lambda)

AWS Lambda seems attractive for handling variable loads: you only pay for what you use, you get effectively infinite scale, and there are zero servers to manage.

So basically you never think about servers? Theoretically. You get unlimited concurrent requests, pay-per-invocation pricing, and AWS handling all the infrastructure headaches. Individual function failures don't bring down your entire system either.

But Lambda functions have to run somewhere. The "cloud" still needs a physical machine, and with serverless that means loading your code onto one whenever it hasn't run in a while, aka a cold start. When Lambda spins up a new function instance, it can add 100-500ms to response times. For a trading bot where users expect sub-100ms responses, that's unacceptable. You also lose the ability to maintain persistent connections to databases and RPC nodes, which hurts performance even further.

Even with provisioned concurrency (which significantly increases costs), you can't eliminate cold starts entirely during traffic spikes.
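
To make the connection-reuse point concrete, here's a minimal TypeScript sketch, not our production code: a long-lived process creates its database pool once at module scope and reuses it for every request, something a freshly cold-started function instance can't rely on. The pg pool, env var, and query are illustrative assumptions.

```typescript
import { Pool } from "pg";

// Created once when the process boots, then reused for every request.
// A warm pod pays this cost once; after a Lambda cold start it lands on
// the request path, on top of the 100-500ms initialization penalty.
const db = new Pool({
  connectionString: process.env.DATABASE_URL, // illustrative env var
  max: 20,                                    // keep warm connections ready
});

// Hypothetical handler: every invocation borrows an already-open connection.
export async function getBalance(userId: string): Promise<string> {
  const { rows } = await db.query(
    "SELECT balance FROM balances WHERE user_id = $1",
    [userId],
  );
  return rows[0]?.balance ?? "0";
}
```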

Option 3: Kubernetes Worker Pool

The Kubernetes approach offers a middle ground: managed infrastructure with fine-grained control over scaling and resource allocation.

You can add worker pods to handle increased load, and individual pod failures don't take down the entire system. You can right-size containers for specific workloads and keep database connections persistent. Most importantly, pods stay warm: new capacity takes 1-2 seconds of pod boot time to come online in the background, so user requests never eat a Lambda-style 100-500ms cold start.

The downside is operational complexity. Kubernetes can quickly become a clusterfuck. You're now managing deployments, networking, and service discovery. Kubernetes has a notoriously steep learning curve, and you're paying for always-running pods even when idle. There are also endless YAML configuration files and potential service mesh complexity to consider.

Here's how the scaling characteristics compared:

Response Time Under Load:

| Concurrent Users | EC2 | Lambda | Kubernetes |
| --- | --- | --- | --- |
| 100 | 45ms | 120ms* | 42ms |
| 1,000 | 150ms | 80ms | 48ms |
| 5,000 | 500ms | 85ms | 52ms |
| 10,000 | TIMEOUT | 90ms | 58ms |

*includes cold start penalty

Why Kubernetes Won

We chose Kubernetes for three critical reasons:

  1. No cold starts, warm connection pools, and consistently fast response times
  2. Pod-level failures are automatically handled without the debugging nightmare of serverless
  3. We can scale aggressively during market events and scale down during quiet periods, all while maintaining performance guarantees

The operational complexity was worth it for the performance guarantees. In crypto trading, a 200ms delay can mean the difference between catching a candle and missing it entirely. Here's our full infrastructure design:

When a user invokes a Telegram action, the message travels from Telegram's servers to our server entrypoint at bot.legend.trade. An AWS load balancer sits between the incoming traffic and the pods in our cluster and routes requests according to our routing rules. The Kubernetes nodes that run worker pods also run our Grafana + Prometheus observability stack, so we can trace and profile every request that comes in.
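
For a rough sketch of the entrypoint side of that flow (illustrative only: the route path, port, and enqueueForWorker helper are assumptions, not our actual service), a webhook receiver acknowledges the Telegram update quickly and hands the real work to a worker pod:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Telegram POSTs every update to the webhook URL registered with the Bot API
// (e.g. something like https://bot.legend.trade/webhook/<secret-path>).
app.post("/webhook/:secret", async (req, res) => {
  const update = req.body; // Telegram Update object

  // Acknowledge fast so Telegram doesn't retry, then do the real work.
  res.sendStatus(200);

  const text = update.message?.text;
  const chatId = update.message?.chat?.id;
  if (text && chatId) {
    await enqueueForWorker({ chatId, text }); // hypothetical queue helper
  }
});

// Hypothetical stand-in for handing the job to a worker pod.
async function enqueueForWorker(job: { chatId: number; text: string }) {
  /* ... */
}

app.listen(8080);
```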

For deploying our servers, we use Docker to bundle code into images that our Kubernetes pods pull. Since we can scale horizontally and aggressively, the individual instances in the cluster are relatively cheap.
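
Autoscaling is what makes that cheap: Kubernetes' Horizontal Pod Autoscaler decides the replica count from observed load. Its documented scaling rule is simple enough to sketch in a few lines of TypeScript; the numbers below are made up for illustration.

```typescript
// Kubernetes HPA scaling rule:
// desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. average CPU or requests/sec per pod
  targetMetric: number,  // the target value configured on the HPA
): number {
  return Math.ceil(currentReplicas * (currentMetric / targetMetric));
}

// Example: a market event triples per-pod load, so 8 pods become 24.
desiredReplicas(8, 1500, 500); // => 24
// When things quiet down, the same rule scales back toward the minimum.
desiredReplicas(24, 200, 500); // => 10
```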

Geographic Optimization

Once we decided on Kubernetes, the next critical question was: where should we deploy our cluster?

In a distributed system, geography is destiny. The speed of light imposes fundamental limits on how fast data can travel between servers, and every additional kilometer adds latency to your system.

The Network Latency Reality

Geographic distance has a direct, unavoidable impact on latency. The theoretical minimum round-trip time between two points is determined by the speed of light through fiber optic cables:

Theoretical minimum latency = (Distance × 2) / (Speed of light in fiber)
The fundamental physical limit imposed by the speed of light

In practice, this works out to roughly:

  • 1ms per 100km for a round-trip in ideal conditions
  • Real-world fiber routes add 20-50% overhead due to routing inefficiencies
  • Network equipment adds 1-5ms per hop

So if you're 1,000km away from a server, you're looking at a minimum of 10ms round-trip time, and realistically 15-20ms.
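
To make the rule of thumb concrete, here's a tiny helper (assuming light travels through fiber at roughly 200,000 km/s, about two-thirds of c):

```typescript
const FIBER_SPEED_KM_PER_MS = 200; // ~200,000 km/s, i.e. ~2/3 the speed of light

// Theoretical minimum round-trip time over a straight fiber run.
function minRoundTripMs(distanceKm: number): number {
  return (distanceKm * 2) / FIBER_SPEED_KM_PER_MS;
}

minRoundTripMs(100);  // => 1ms  (the "1ms per 100km" rule of thumb)
minRoundTripMs(1000); // => 10ms, realistically 15-20ms after routing overhead
```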

The Telegram Server Mystery

For our Telegram bot, the critical path includes a round-trip to Telegram's servers:

Message Flow Performance Profile:

| Stage | Duration | Share | Window |
| --- | --- | --- | --- |
| Complete Message Flow | 116ms | 100.0% | 0-116ms |
| User Input | 5ms | 4.3% | 0-5ms |
| Telegram Mobile | 15ms | 12.9% | 5-20ms |
| Telegram → Webhook | 18ms | 15.5% | 20-38ms |
| ↳ Message Routing | 8ms | 6.9% | 20-28ms |
| ↳ Network Delivery | 10ms | 8.6% | 28-38ms |
| Our Processing | 45ms | 38.8% | 38-83ms |
| ↳ Auth & Parse | 5ms | 4.3% | 38-43ms |
| ↳ User Lookup | 3ms | 2.6% | 43-46ms |
| ↳ Balance Check | 8ms | 6.9% | 46-54ms |
| ↳ Price Query | 12ms | 10.3% | 54-66ms |
| ↳ Trade Execution | 15ms | 12.9% | 66-81ms |
| ↳ Response Format | 2ms | 1.7% | 81-83ms |
| Response → User | 33ms | 28.4% | 83-116ms |
| ↳ Network Send | 18ms | 15.5% | 83-101ms |
| ↳ Mobile Delivery | 15ms | 12.9% | 101-116ms |

Telegram API Latency by AWS Region (average):

| AWS Region | Mean | P95 | P99 |
| --- | --- | --- | --- |
| eu-central-1 | 12ms | 18ms | 24ms |
| eu-west-1 | 24ms | 32ms | 41ms |
| us-east-1 | 89ms | 105ms | 128ms |
| us-west-2 | 156ms | 178ms | 201ms |
| ap-southeast-1 | 167ms | 189ms | 215ms |
| ap-northeast-1 | 145ms | 168ms | 192ms |

The clear winner: eu-central-1 (Frankfurt)

The data strongly suggested that Telegram's primary API infrastructure was located in or very close to Frankfurt, Germany. This made sense—Frankfurt is one of Europe's largest internet exchange points and a natural location for serving European users.
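
To give a sense of where numbers like these come from, here's a sketch of a simple probe, not our exact harness: from an instance in each candidate region, time repeated calls to the Bot API's lightweight getMe method and compute percentiles. It assumes Node 18+ for the global fetch, and BOT_TOKEN is a placeholder.

```typescript
// Rough latency probe: run this from an instance in each candidate region.
async function probeTelegram(
  samples = 100,
): Promise<{ mean: number; p95: number; p99: number }> {
  const times: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    await fetch(`https://api.telegram.org/bot${process.env.BOT_TOKEN}/getMe`);
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  const pct = (p: number) =>
    times[Math.min(times.length - 1, Math.floor(p * times.length))];
  return {
    mean: times.reduce((sum, t) => sum + t, 0) / times.length,
    p95: pct(0.95),
    p99: pct(0.99),
  };
}
```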

The Latency Breakdown

With our cluster in eu-central-1, here's how our end-to-end latency looked:

Full Message Flow Timing:

  1. User input in Telegram app: 0ms
  2. Telegram app → Telegram servers (user's location dependent): X ms
  3. Telegram servers → Our webhook (our measurement): 12-18ms
  4. Our processing time (optimized): 30-50ms
  5. Our response → Telegram servers (return trip): 12-18ms
  6. Telegram servers → User app (user's location dependent): X ms

Total end-to-end latency: 54-86ms + 2X ms

Compare this to what it would have looked like had we deployed in us-west-2 (Oregon):

Suboptimal Deployment (us-west-2):

  1. User input in Telegram app: 0ms
  2. Telegram app → Telegram servers: X ms
  3. Telegram servers → Our webhook (ouch!): 156-178ms
  4. Our processing time: 30-50ms
  5. Our response → Telegram servers (double ouch!): 156-178ms
  6. Telegram servers → User app: X ms

Total end-to-end latency: 342-406ms + 2X ms

The geographic choice alone saved us 288-320ms per message.
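
The savings figure falls straight out of the two breakdowns above; the user-side hops (the "2X ms") are identical in both scenarios, so they cancel:

```typescript
// Webhook leg + processing (30-50ms) + return leg, using the measured ranges.
const frankfurt = { best: 12 + 30 + 12, worst: 18 + 50 + 18 };    // 54-86ms
const oregon = { best: 156 + 30 + 156, worst: 178 + 50 + 178 };   // 342-406ms

const savedMs = {
  best: oregon.best - frankfurt.best,    // 288ms
  worst: oregon.worst - frankfurt.worst, // 320ms
};
```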

But how does this impact our other service interactions?

You may be thinking that the latency gained by colocating with Telegram's servers is offset by the extra latency of being geographically far from the other external APIs and servers we rely on. We tested this extensively: our wallet auth provider responds in a consistent ~35ms with sub-millisecond jitter, and we've moved the rest of our servers (e.g. databases) to eu-central-1 to colocate. The relocation is basically all upside.

Monitoring and Observability

With the capacity for thousands of requests per minute flowing through our system, visibility into performance is critical. We've built a comprehensive observability stack that gives us real-time insights into every aspect of our infrastructure.

Prometheus Metrics Collection:
We track custom metrics for every component:

  • Message processing latency (by command type)
  • Active user counts and session duration
  • RPC node health and response times
  • Database query performance and cache hit rates
  • Pod resource utilization and autoscaling events

Grafana Dashboards:
Our dashboards surface the metrics that matter most for trading performance:

  • Real-time latency percentiles (P50, P95, P99)
  • Traffic volumes, error rates and failure patterns
  • Infrastructure and resource efficiency
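
As an illustration of the kind of instrumentation behind those dashboards, here's a sketch using the prom-client library; the metric name, labels, buckets, and handler wrapper are assumptions rather than our actual schema:

```typescript
import { Histogram } from "prom-client";

// Histogram of message-handling latency, labelled by command type,
// with buckets tuned around a sub-100ms target.
const messageLatency = new Histogram({
  name: "bot_message_processing_seconds",
  help: "Time spent handling a Telegram command",
  labelNames: ["command"],
  buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
});

async function handleCommand(command: string, run: () => Promise<void>) {
  const stopTimer = messageLatency.startTimer({ command });
  try {
    await run();
  } finally {
    stopTimer(); // records the observation; P50/P95/P99 come from these buckets
  }
}
```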

Conclusion

Building our Telegram bot architecture taught us that achieving true low-latency requires optimization at every layer:

  • Infrastructure: From webhook delivery to worker processing
  • Code: From database queries to response formatting
  • Architecture: From monolithic polling to distributed event-driven design
  • Operations: From manual scaling to predictive auto-scaling

The result is a system that can handle 10,000 concurrent users while maintaining sub-60ms response times - and we're just getting started.

The fastest chain deserves the fastest trading experience. And we're committed to pushing the limits of what's possible, one millisecond at a time.