We're obsessed with performance at Legend. Not just the performance of our execution or our UI, but every millisecond between a user's intent and its execution on-chain.
A large part of our user experience depends on speed of execution. When you're trading against other users, you need the best fill possible. We knew that traditional architecture wouldn't cut it: Legend needed something that could scale to 10,000 concurrent users while maintaining blazing-fast response times.
This is the story of how we built our Telegram bot architecture to push the absolute limits of performance, and how we evolved from a simple local script to a distributed system that can handle massive scale without compromising on speed.
warning: everything below is pretty technical
There are a lot of different levers that you can pull when trying to build a robust architecture, and each comes with fundamental tradeoffs that impact performance, cost, and reliability.
Once we grew out of testing things locally on a single machine, we started weighing our options for a longer-term architecture.
Architectural Decision
Option 1: Single Persistent Server (EC2)
The simplest approach would have been to deploy our bot to a single, powerful EC2 instance.
This is appealing because it's dead simple—one machine, one deployment, known performance characteristics. No need to worry about orchestration complexity or distributed systems headaches.
But the downsides become obvious pretty quickly. Vertical scaling has a hard ceiling: you can go from 4 vCPUs to 8, then 16, then 32, but eventually you hit AWS's largest instance types and you're stuck. Worse, if that one instance goes down during a major market event, you're completely offline while your users are trying to execute time-sensitive trades. We hate single points of failure.
Option 2: Serverless Functions (Lambda)
AWS Lambda seems attractive for handling variable loads: you only pay for what you use, you get effectively unlimited scale, and there are zero servers to manage.
So basically you never think about servers? Theoretically. You get massive concurrency, pay-per-invocation pricing, and AWS handling all the infrastructure headaches. Individual function failures don't bring down your entire system either.
But Lambda functions still have to run somewhere. The "cloud" is always a physical machine, and with serverless that means pulling your code onto one and spinning up a fresh runtime whenever it hasn't been executed in a while, a.k.a. a cold start. When Lambda spins up a new function instance, it can add 100-500ms to response times. For a trading bot whose users expect sub-100ms responses, that's unacceptable. You also lose the ability to maintain persistent connections to databases and RPC nodes, which kills performance even further.
Even with provisioned concurrency (which significantly increases costs), you can't eliminate cold starts entirely during traffic spikes.
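The standard mitigation is to hoist expensive clients into module scope so that warm invocations reuse them. Here's a minimal sketch of that pattern; the handler shape and the pg connection pool are illustrative assumptions rather than our production code, and the point is that a cold start still pays the full initialization cost before the first request is served.

```typescript
import { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";
import { Pool } from "pg";

// Module scope runs once per container, i.e. on every cold start.
// Warm invocations reuse these objects, but a cold start pays for
// TLS handshakes, auth, and pool setup before serving anything.
const db = new Pool({
  connectionString: process.env.DATABASE_URL, // assumption: env-configured database
  max: 1, // a Lambda container handles one request at a time
});

export const handler = async (
  _event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  const started = Date.now();

  // Hypothetical query on the hot path: a warm container reuses the pooled
  // connection, a cold one also pays connection setup right here.
  const { rows } = await db.query("SELECT 1 AS ok");

  return {
    statusCode: 200,
    body: JSON.stringify({ ok: rows[0].ok, ms: Date.now() - started }),
  };
};
```

Even with this pattern, containers (and the connections they hold) get recycled whenever AWS decides to, which is why Lambda could never give us genuinely persistent connections.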
Option 3: Kubernetes Worker Pool
The Kubernetes approach offers a middle ground: managed infrastructure with fine-grained control over scaling and resource allocation.
You can add worker pods to handle increased load, and individual pod failures don't take down the entire system. You can right-size containers for specific workloads and maintain persistent connections to databases and RPC nodes. Most importantly, pods are already warm when a request arrives: a new pod takes a second or two to boot during scale-up, but that cost is never paid on the request path the way Lambda's 100-500ms cold start is.
The downside is operational complexity. Kubernetes can quickly become a clusterfuck: you're now managing deployments, networking, and service discovery, the learning curve is notoriously steep, and you're paying for always-running pods even when they're idle. There's also the endless YAML and the potential service-mesh complexity to consider.
Here's how the scaling characteristics compared:
Response Time Under Load:
| Concurrent Users | EC2 | Lambda | Kubernetes |
|---|---|---|---|
| 100 | 45ms | 120ms* | 42ms |
| 1,000 | 150ms | 80ms | 48ms |
| 5,000 | 500ms | 85ms | 52ms |
| 10,000 | TIMEOUT | 90ms | 58ms |

*includes cold start penalty
Why Kubernetes Won
We chose Kubernetes for three critical reasons:
- No cold starts, warm connection pools, and consistently fast response times
- Pod-level failures are automatically handled without the debugging nightmare of serverless
- We can scale aggressively during market events and scale down during quiet periods, all while maintaining performance guarantees
The operational complexity was worth it for the performance guarantees. In crypto trading, a 200ms delay can mean the difference between catching a candle and missing it entirely. Here's our full infrastructure design:
When a user invokes a Telegram action, the message travels from Telegram's servers to our entrypoint at bot.legend.trade. An AWS load balancer sits between that incoming traffic and the pods in our cluster, routing each request according to our routing rules. The cluster nodes that run worker pods also run our Grafana + Prometheus observability stack, so we can trace and profile every request that comes in.
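For a sense of what each worker pod actually runs, here's a stripped-down sketch of the entrypoint sitting behind that load balancer. The framework choices (grammY + Express), paths, and environment variables are illustrative assumptions, not a description of our production code.

```typescript
import express from "express";
import { Bot, webhookCallback } from "grammy";

// Illustrative bot: real handlers would route into quoting and trade execution.
const bot = new Bot(process.env.BOT_TOKEN!);
bot.command("start", (ctx) => ctx.reply("Welcome to Legend."));

const app = express();
app.use(express.json());

// The load balancer forwards Telegram's webhook POSTs to this path.
app.use("/webhook", webhookCallback(bot, "express"));

// Liveness/readiness endpoint for Kubernetes probes.
app.get("/healthz", (_req, res) => res.status(200).send("ok"));

app.listen(Number(process.env.PORT ?? 8080), () => {
  console.log("worker pod listening");
});
```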
To deploy our servers, we use Docker to bundle the code into images that Kubernetes pods pull at startup. And because we can autoscale aggressively in the horizontal direction, the individual instances in the cluster stay relatively cheap.
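One detail that aggressive autoscaling forces you to get right: pods are terminated routinely during scale-downs and rollouts, so each worker has to drain cleanly rather than dropping in-flight requests. A minimal sketch of the SIGTERM handling involved, with the HTTP server and connection pool as stand-ins:

```typescript
import { createServer } from "http";

// Stand-in server; in a real worker this wraps the Express app above.
const server = createServer().listen(8080);

process.on("SIGTERM", () => {
  // Kubernetes sends SIGTERM before killing a pod (scale-down or rollout).
  // Stop accepting new connections, let in-flight requests finish,
  // then release pooled connections and exit.
  server.close(() => {
    // e.g. await pool.end() here before exiting
    process.exit(0);
  });

  // Hard deadline in case something hangs past the pod's grace period.
  setTimeout(() => process.exit(1), 10_000).unref();
});
```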
Geographic Optimization
Once we decided on Kubernetes, the next critical question was: where should we deploy our cluster?
In a distributed system, geography is destiny. The speed of light imposes fundamental limits on how fast data can travel between servers, and every additional kilometer adds latency to your system.
The Network Latency Reality
Geographic distance has a direct, unavoidable impact on latency. The theoretical minimum round-trip time between two points is determined by the speed of light through fiber optic cables:
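As a back-of-the-envelope model (assuming light propagates through fiber at roughly two-thirds of c, i.e. about 200,000 km/s):

```latex
% Theoretical minimum round-trip time over a one-way distance d,
% with v_fiber ≈ 2c/3 ≈ 200,000 km/s
t_{\text{RTT,min}} = \frac{2d}{v_{\text{fiber}}} \approx \frac{2d}{200{,}000\ \text{km/s}}
\qquad\Longrightarrow\qquad
t_{\text{RTT,min}}(100\ \text{km}) \approx 1\ \text{ms}
```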
In practice, this works out to roughly:
- 1ms per 100km for a round-trip in ideal conditions
- Real-world fiber routes add 20-50% overhead due to routing inefficiencies
- Network equipment adds 1-5ms per hop
So if you're 1,000km away from a server, you're looking at a minimum of 10ms round-trip time, and realistically 15-20ms.
The Telegram Server Mystery
For our Telegram bot, the critical path includes a round trip to Telegram's servers on every message, so we profiled Telegram API latency from each of the major AWS regions we could deploy to.
Telegram API Latency by AWS Region:
| AWS Region | Mean | P95 | P99 |
|---|---|---|---|
| eu-central-1 | 12ms | 18ms | 24ms |
| eu-west-1 | 24ms | 32ms | 41ms |
| us-east-1 | 89ms | 105ms | 128ms |
| us-west-2 | 156ms | 178ms | 201ms |
| ap-southeast-1 | 167ms | 189ms | 215ms |
| ap-northeast-1 | 145ms | 168ms | 192ms |
The clear winner: eu-central-1 (Frankfurt).
The data strongly suggested that Telegram's primary API infrastructure was located in or very close to Frankfurt, Germany. This made sense—Frankfurt is one of Europe's largest internet exchange points and a natural location for serving European users.
The Latency Breakdown
With our cluster in eu-central-1, each Telegram round trip in the message flow costs us roughly 12ms on average (24ms at P99).
Had we deployed in us-west-2 (Oregon) instead, that same round trip would cost around 156ms on average (201ms at P99).
Every interaction pays that price twice, once for the inbound webhook delivery and once for the outbound reply, so the geographic choice alone saved us 288-320ms per message.
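One way to reproduce that 288-320ms figure is to take the mean and P95 deltas from the regional table over the two round trips:

```latex
% Per-message savings, assuming two Telegram round trips per interaction
\Delta t_{\text{mean}} = 2 \times (156 - 12)\ \text{ms} = 288\ \text{ms}
\qquad
\Delta t_{\text{P95}} = 2 \times (178 - 18)\ \text{ms} = 320\ \text{ms}
```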
But how does this impact our other service interactions?
You may be thinking that the latency gained by colocating with Telegram's servers is ultimately given back as extra latency to the other external APIs and servers we rely on. We tested this extensively: our wallet auth provider responds in a consistent ~35ms with sub-millisecond jitter from Frankfurt, and we've moved the rest of our own services (databases, for example) into eu-central-1 to colocate with the cluster. The relocation is basically all upside.
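The measurement itself doesn't need to be fancy. A simple probe along these lines, run from an instance in the candidate region, is enough to estimate a dependency's latency and jitter; the target URL and sample count are placeholders:

```typescript
// Latency probe: N sequential requests, report mean and jitter (stddev).
// Requires Node 18+ for the global fetch API.
const TARGET = process.env.TARGET_URL ?? "https://auth.example.com/health"; // placeholder
const SAMPLES = 50;

async function probe(): Promise<void> {
  const timings: number[] = [];
  for (let i = 0; i < SAMPLES; i++) {
    const start = performance.now();
    await fetch(TARGET);
    timings.push(performance.now() - start);
  }
  const mean = timings.reduce((a, b) => a + b, 0) / timings.length;
  const variance = timings.reduce((a, b) => a + (b - mean) ** 2, 0) / timings.length;
  console.log(`mean=${mean.toFixed(1)}ms stddev=${Math.sqrt(variance).toFixed(2)}ms`);
}

probe().catch(console.error);
```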
Monitoring and Observability
With the capacity for thousands of requests per minute flowing through our system, visibility into performance is critical. We've built a comprehensive observability stack that gives us real-time insights into every aspect of our infrastructure.
Prometheus Metrics Collection:
We track custom metrics for every component (a minimal example of the instrumentation follows this list):
- Message processing latency (by command type)
- Active user counts and session duration
- RPC node health and response times
- Database query performance and cache hit rates
- Pod resource utilization and autoscaling events
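Here's a minimal sketch of how a metric like the first one gets registered and exposed; the metric names, labels, and buckets are illustrative, not our actual definitions:

```typescript
import express from "express";
import client from "prom-client";

// Default runtime metrics: CPU, memory, event loop lag, GC pauses.
client.collectDefaultMetrics();

// Illustrative histogram: message processing latency, labeled by command type.
const messageLatency = new client.Histogram({
  name: "bot_message_processing_seconds",
  help: "Time to process a Telegram message, by command",
  labelNames: ["command"],
  buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
});

// Wrap a handler so every invocation is timed and labeled.
// Usage (hypothetical): await timed("buy", () => handleBuy(ctx));
async function timed<T>(command: string, fn: () => Promise<T>): Promise<T> {
  const end = messageLatency.startTimer({ command });
  try {
    return await fn();
  } finally {
    end();
  }
}

// Expose the /metrics endpoint that Prometheus scrapes on every pod.
const app = express();
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.send(await client.register.metrics());
});
app.listen(9100);
```

Grafana then only needs PromQL over these series to draw the latency percentile panels.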
Grafana Dashboards:
Our dashboards surface the metrics that matter most for trading performance:
- Real-time latency percentiles (P50, P95, P99)
- Traffic volumes, error rates and failure patterns
- Infrastructure and resource efficiency
Conclusion
Building our Telegram bot architecture taught us that achieving truly low latency requires optimization at every layer:
- Infrastructure: From webhook delivery to worker processing
- Code: From database queries to response formatting
- Architecture: From monolithic polling to distributed event-driven design
- Operations: From manual scaling to predictive auto-scaling
The result is a system that can handle 10,000 concurrent users while maintaining sub-60ms response times - and we're just getting started.
The fastest chain deserves the fastest trading experience. And we're committed to pushing the limits of what's possible, one millisecond at a time.