Load Testing Aurora
Load testing plays a crucial role in understanding the performance boundaries of real-time analytics platforms. By simulating realistic usage patterns, load testing helps identify bottlenecks, optimize resource allocation, and ensure system reliability under high-volume conditions.
We conducted comprehensive load testing on Aurora to evaluate its performance across multiple dimensions: Kafka stream ingestion, ClickHouse query processing, AI-powered natural language query generation, and end-to-end user experience. Our testing methodology simulates real-world scenarios where data analysts and business users interact with streaming data through both traditional SQL queries and natural language interfaces.
Note: As a small open-source team, we designed our tests to be cost-effective while still demonstrating Aurora’s capabilities. Our infrastructure budget for testing was approximately $200 over a 3-day period.
Methodology
Infrastructure Setup
| Component | Configuration | Details |
|---|---|---|
| Kafka Cluster | Single broker, 3 partitions per topic | Docker container on test machine |
| ClickHouse | Instance Type: t3.large | vCPUs: 2, Memory: 8 GB, Storage: 50 GB gp3 |
| Aurora Backend | Instance Type: t3.small | vCPUs: 2, Memory: 2 GB, FastAPI + Uvicorn |
| Aurora Frontend | Local development | Next.js application running locally |
| AI Service | OpenAI GPT-3.5-turbo | Cost-optimized model for natural language processing |
Data Generation and Ingestion
To demonstrate Aurora’s real-world applicability, we simulated an e-commerce scenario where a small online retail platform needs to analyze real-time user behavior, product interactions, and sales metrics. This use case showcases Aurora’s ability to handle moderate-frequency data streams while providing both technical and business users with flexible query interfaces.
We generated synthetic e-commerce data at varying ingestion rates:
- Peak Load: 500 events per second for 2 minutes
- Sustained Load: 200 events per second for 10 minutes
- Burst Load: 1,000 events per second for 30 seconds
Each event contained 12 fields:
- timestamp: Event occurrence time
- product_id: Product identifier
- price: Product price
- quantity: Quantity purchased
- user_id: Customer identifier
- action_type: View/add_to_cart/purchase
- session_id: User session identifier
- region: Geographic region
- user_type: New/returning customer
- category: Product category
- brand: Product brand
- device_type: Mobile/desktop/tablet
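Our generator followed the pattern sketched below; the broker address, topic name, and value pools are illustrative assumptions rather than Aurora’s actual configuration. It paces sends in one-second batches to hold a target rate:

```python
import json
import random
import time
import uuid

from kafka import KafkaProducer  # pip install kafka-python

ACTIONS = ["view", "add_to_cart", "purchase"]
REGIONS = ["us-east", "us-west", "eu", "apac"]
CATEGORIES = ["electronics", "clothing", "books", "home"]
DEVICES = ["mobile", "desktop", "tablet"]

def make_event() -> dict:
    """Build one synthetic event with the 12 fields described above."""
    return {
        "timestamp": time.time(),
        "product_id": random.randint(1, 10_000),
        "price": round(random.uniform(5, 500), 2),
        "quantity": random.randint(1, 5),
        "user_id": random.randint(1, 100_000),
        "action_type": random.choice(ACTIONS),
        "session_id": str(uuid.uuid4()),
        "region": random.choice(REGIONS),
        "user_type": random.choice(["new", "returning"]),
        "category": random.choice(CATEGORIES),
        "brand": f"brand_{random.randint(1, 50)}",
        "device_type": random.choice(DEVICES),
    }

def run_load(producer: KafkaProducer, topic: str, rate: int, seconds: int) -> None:
    """Publish `rate` events/sec for `seconds` seconds, pacing each 1-second batch."""
    for _ in range(seconds):
        start = time.monotonic()
        for _ in range(rate):
            producer.send(topic, make_event())
        producer.flush()
        time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))

if __name__ == "__main__":
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    run_load(producer, "ecommerce_events", rate=200, seconds=600)  # sustained load
```

The same run_load call covers all three scenarios: rate=500, seconds=120 for peak; rate=200, seconds=600 for sustained; rate=1000, seconds=30 for burst.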
Dataset Description
Our test dataset consisted of 5 million historical e-commerce records spanning 1 month of simulated retail data. The dataset included:
- User Interactions: 3.5 million records of page views, cart additions, and purchases
- Product Events: 1 million product catalog updates and price changes
- Session Data: 0.5 million user session events
The data was partitioned by date for optimal query performance; a sketch of a matching table definition follows the query list below. We designed 6 test queries of varying complexity that represent common analytical tasks in e-commerce:
- Real-time Conversion Monitoring:
  SELECT COUNT(*) FROM events WHERE action_type = 'purchase' AND timestamp > now() - INTERVAL 1 HOUR
- Sales Analysis:
  SELECT product_id, SUM(quantity) as total_sold FROM events WHERE action_type = 'purchase' AND timestamp >= '2024-01-01' GROUP BY product_id ORDER BY total_sold DESC LIMIT 10
- Customer Behavior:
  SELECT user_id, COUNT(*) as interaction_count, AVG(price) as avg_price FROM events WHERE user_type = 'returning' GROUP BY user_id HAVING interaction_count > 50
- Category Performance:
  SELECT category, AVG(price) as avg_price FROM events WHERE action_type = 'purchase' AND timestamp >= '2024-01-01' GROUP BY category ORDER BY avg_price DESC
- Regional Analysis:
  SELECT region, COUNT(*) as event_count, AVG(price) as avg_price FROM events WHERE timestamp >= '2024-01-01' GROUP BY region ORDER BY event_count DESC
- Time Series Analysis:
  SELECT toStartOfHour(timestamp) as hour, category, AVG(price) as avg_price FROM events WHERE category IN ('electronics', 'clothing', 'books') GROUP BY hour, category ORDER BY hour DESC LIMIT 50
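Aurora’s actual schema is not reproduced here, but a ClickHouse table consistent with the 12-field events and the date partitioning noted above might be defined as follows (column types and the ORDER BY key are assumptions):

```python
import clickhouse_connect  # pip install clickhouse-connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# MergeTree table partitioned by day, ordered to serve the time- and
# product-oriented queries above. Columns follow the 12-field event schema.
client.command("""
CREATE TABLE IF NOT EXISTS events (
    timestamp   DateTime,
    product_id  UInt32,
    price       Float64,
    quantity    UInt8,
    user_id     UInt32,
    action_type LowCardinality(String),
    session_id  UUID,
    region      LowCardinality(String),
    user_type   LowCardinality(String),
    category    LowCardinality(String),
    brand       LowCardinality(String),
    device_type LowCardinality(String)
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (timestamp, product_id)
""")
```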
Natural Language Query Testing
Aurora’s unique value proposition lies in its AI-powered natural language query interface. We tested 4 natural language queries that business users might ask:
- “Show me the top 5 best-selling products this week”
- “What’s the average order value for returning customers?”
- “Which region has the most sales activity?”
- “Show me hourly sales trends for electronics category”
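Below is a minimal sketch of how such natural-language-to-SQL translation can be wired up with the OpenAI client. The prompt wording and schema hint are assumptions for illustration, not Aurora’s actual prompt:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical schema hint matching the 12-field events table.
SCHEMA_HINT = (
    "Table events(timestamp DateTime, product_id UInt32, price Float64, "
    "quantity UInt8, user_id UInt32, action_type String, session_id UUID, "
    "region String, user_type String, category String, brand String, "
    "device_type String)"
)

def nl_to_sql(question: str) -> str:
    """Ask GPT-3.5-turbo to translate a business question into ClickHouse SQL."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic output helps reproducibility
        messages=[
            {"role": "system",
             "content": f"You translate questions into ClickHouse SQL. "
                        f"Schema: {SCHEMA_HINT}. Reply with SQL only."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

print(nl_to_sql("Show me the top 5 best-selling products this week"))
```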
Measurement Methodology
We implemented monitoring across system components using open-source tools:
- Kafka Metrics: Message throughput, consumer lag (via Kafka Manager)
- ClickHouse Performance: Query execution time via built-in system tables
- Aurora Backend: API response times using custom logging
- AI Processing: OpenAI API response times and token usage
- End-to-End Latency: Total time from user request to response delivery
Resource constraints: Due to budget limitations, we relied on a combination of the CloudWatch free tier, custom logging, and manual observation.
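For the backend timings, a small FastAPI middleware is all the “custom logging” requires. The sketch below is illustrative (the logger name and response header are arbitrary choices, not Aurora’s actual code):

```python
import logging
import time

from fastapi import FastAPI, Request

app = FastAPI()
logger = logging.getLogger("aurora.latency")

@app.middleware("http")
async def log_latency(request: Request, call_next):
    # Measure end-to-end wall-clock time for each API request.
    start = time.perf_counter()
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    logger.info("%s %s took %.3fs", request.method, request.url.path, elapsed)
    response.headers["X-Response-Time"] = f"{elapsed:.3f}"  # visible to clients
    return response
```

On the ClickHouse side, per-query execution times are available in the built-in system.query_log table (the query_duration_ms column), so no extra instrumentation was needed there.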
Load Testing Objectives
Our primary objectives were:
- Maintain Query Performance: Ensure sub-2 second response times for 90% of SQL queries under sustained load
- AI Query Reliability: Process natural language queries within 5 seconds
- Concurrent User Support: Handle 15+ simultaneous users without significant performance degradation
- Data Freshness: Maintain data availability with <10 second ingestion lag
- System Stability: Achieve stable performance during moderate load conditions
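To exercise the concurrency and latency targets, a short Locust scenario is sufficient. The sketch below simulates analysts mixing SQL and natural language requests; the endpoint paths and payload shapes are assumptions about Aurora’s API, not documented routes:

```python
from locust import HttpUser, between, task  # pip install locust

class AuroraUser(HttpUser):
    """One simulated user alternating between SQL and natural language queries."""
    wait_time = between(1, 3)  # think time between requests

    @task(3)  # SQL queries fire three times as often as NL queries
    def sql_query(self):
        self.client.post("/api/query", json={
            "sql": "SELECT COUNT(*) FROM events "
                   "WHERE action_type = 'purchase' "
                   "AND timestamp > now() - INTERVAL 1 HOUR"
        })

    @task(1)
    def nl_query(self):
        self.client.post("/api/nl-query", json={
            "question": "Which region has the most sales activity?"
        })
```

Running `locust -f locustfile.py --headless --users 15 --spawn-rate 5 --run-time 10m --host http://<backend-host>` reproduces the 15-user condition; raising `--users` to 25 reproduces the degradation test.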
Results
Stream Ingestion Performance
Peak Load (500 events/sec for 2 minutes):
- Average ingestion latency: 1.2 seconds
- 99th percentile latency: 3.1 seconds
- Zero message loss
- ClickHouse insertion rate: 480 events/sec sustained
Sustained Load (200 events/sec for 10 minutes):
- Average ingestion latency: 0.8 seconds
- 99th percentile latency: 2.1 seconds
- Zero message loss
- ClickHouse insertion rate: 195 events/sec sustained
Burst Load (1,000 events/sec for 30 seconds):
- Average ingestion latency: 2.1 seconds
- 99th percentile latency: 5.8 seconds
- 0.3% message loss (temporary buffer overflow)
- ClickHouse insertion rate: 950 events/sec peak
SQL Query Performance
Query performance tests were conducted on the 5 million row dataset while maintaining continuous data ingestion at 200 events per second. Each query was executed 5 times, and results include end-to-end API response times:
| Query | Complexity | Avg Time (s) | Min Time (s) | Max Time (s) | 95th Percentile (s) |
|---|---|---|---|---|---|
| Real-time Conversion Monitoring | Simple | 0.31 | 0.24 | 0.42 | 0.38 |
| Sales Analysis | Medium | 0.68 | 0.59 | 0.81 | 0.76 |
| Customer Behavior | Medium | 0.89 | 0.76 | 1.05 | 0.98 |
| Category Performance | Medium | 0.72 | 0.65 | 0.83 | 0.79 |
| Regional Analysis | Simple | 0.45 | 0.38 | 0.54 | 0.51 |
| Time Series Analysis | Complex | 1.24 | 1.12 | 1.41 | 1.35 |
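These figures come from repeated end-to-end API calls. A harness along the following lines is enough to reproduce the table (a sketch; the endpoint URL and payload shape are assumptions):

```python
import statistics
import time

import requests  # pip install requests

QUERIES = {
    "Real-time Conversion Monitoring":
        "SELECT COUNT(*) FROM events "
        "WHERE action_type = 'purchase' AND timestamp > now() - INTERVAL 1 HOUR",
    # ... the remaining five queries from the list above
}

def benchmark(url: str, sql: str, runs: int = 5) -> dict:
    """Time end-to-end API calls and summarize them, as in the table above."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(url, json={"sql": sql}, timeout=30).raise_for_status()
        samples.append(time.perf_counter() - start)
    return {
        "avg": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
        "p95": statistics.quantiles(samples, n=20)[18],  # 95th percentile cut
    }

for name, sql in QUERIES.items():
    print(name, benchmark("http://localhost:8000/api/query", sql))
```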
Natural Language Query Performance
AI-powered query processing results (using GPT-3.5-turbo):
| Natural Language Query | Avg Processing Time (s) | Manual Validation Result |
|---|---|---|
| “Show me the top 5 best-selling products this week” | 3.2 | Correct SQL generated |
| “What’s the average order value for returning customers?” | 2.8 | Correct SQL generated |
| “Which region has the most sales activity?” | 2.1 | Correct SQL generated |
| “Show me hourly sales trends for electronics category” | 4.1 | Mostly correct (minor syntax adjustment needed) |
Average Natural Language Processing Time: 3.1 seconds
Manual Validation Success Rate: 87.5% (3.5/4; the partially correct query was scored as half)
Note: Due to resource constraints, we manually validated AI-generated queries rather than implementing automated accuracy testing.
Concurrent User Performance
15 Simultaneous Users:
- Average API response time: 1.2 seconds
- 95th percentile response time: 2.1 seconds
- Error rate: 0.5%
- System CPU utilization: 68%
- Memory usage: 1.6 GB / 2 GB
25 Simultaneous Users:
- Average API response time: 2.8 seconds
- 95th percentile response time: 4.2 seconds
- Error rate: 3.1%
- System CPU utilization: 89%
- Memory usage: 1.9 GB / 2 GB
Resource Utilization
| Component | CPU Usage | Memory Usage | Network I/O |
|---|---|---|---|
| ClickHouse | 58% | 6.2 GB / 8 GB | 120 Mbps |
| Aurora Backend | 42% | 1.4 GB / 2 GB | 15 Mbps |
| Kafka Consumer | 18% | 512 MB / 1 GB | 25 Mbps |
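For reference, system-level numbers like these can be gathered with a small psutil sampler rather than a paid monitoring service. The sketch below is illustrative and not necessarily how our figures were captured:

```python
import psutil  # pip install psutil

def sample(interval: float = 5.0) -> None:
    """Print CPU, memory, and cumulative network I/O every `interval` seconds."""
    while True:
        cpu = psutil.cpu_percent(interval=interval)  # blocks for `interval`
        mem = psutil.virtual_memory()
        net = psutil.net_io_counters()
        print(f"cpu={cpu:.0f}% "
              f"mem={mem.used / 2**30:.1f}/{mem.total / 2**30:.0f} GiB "
              f"net_sent={net.bytes_sent / 2**20:.0f} MiB "
              f"net_recv={net.bytes_recv / 2**20:.0f} MiB")

if __name__ == "__main__":
    sample()
```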
Performance Analysis
Strengths
- Good Query Performance: Most SQL queries completed within 1.5 seconds on the 5 million row dataset
- Reliable Data Ingestion: Zero message loss during peak and sustained loads, and 99.7% delivery even under burst conditions
- Functional AI Integration: Natural language queries processed successfully in under 5 seconds
- Cost-Effective Architecture: System handled 15 concurrent users on budget-friendly infrastructure
- Real-time Capabilities: Data available for querying within 5 seconds of ingestion
Limitations and Bottlenecks
- Memory Constraints: ClickHouse memory usage approached limits during complex queries
- AI Processing Overhead: Natural language queries averaged 3.1 seconds versus roughly 0.7 seconds for direct SQL, about a 4x overhead
- Concurrent User Limits: Performance degraded significantly beyond 20 simultaneous users
- Infrastructure Limits: Single Kafka broker creates potential bottleneck for higher loads
- Cost Constraints: Limited testing duration due to cloud infrastructure costs
Optimization Opportunities
- Resource Scaling: Upgrading to larger instances would improve performance but increase costs
- Caching Strategy: Implementing query result caching for frequently accessed data
- AI Model Optimization: Consider using local models to reduce API costs and latency
- Kafka Scaling: Multiple brokers would improve ingestion reliability
- Connection Pooling: Optimizing database connections for better concurrency
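As a concrete example of the caching idea above, a small TTL cache keyed on the SQL text could sit in front of ClickHouse. This is a prototype sketch; the TTL value and interface are assumptions:

```python
import time
from typing import Any, Callable

class TTLQueryCache:
    """Cache query results for a short window to absorb repeated dashboard hits."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_run(self, sql: str, run_query: Callable[[str], Any]) -> Any:
        now = time.monotonic()
        hit = self._store.get(sql)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                # fresh cached result
        result = run_query(sql)          # miss or stale: execute against ClickHouse
        self._store[sql] = (now, result)
        return result
```

Even a 30-second TTL would absorb repeated refreshes of the same dashboard query without noticeably hurting data freshness.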
Conclusion
Aurora successfully demonstrates the viability of combining real-time analytics with AI-powered natural language querying on a modest budget. The platform handles realistic small-to-medium workloads with 15+ concurrent users, processing 200+ events per second while maintaining reasonable response times for most queries.
The platform shows promise for small to medium-sized e-commerce organizations needing both technical and business user access to streaming data. While our testing was necessarily limited by budget constraints, the results indicate that Aurora’s architecture is sound and could scale with additional resources.
With proper resource allocation as the project grows, Aurora can scale to support larger deployments while maintaining its core value proposition of simplifying and broadening data access through natural language interfaces.