Skip to content

Future Work and Evolution: Building on Aurora's Foundation

Aurora’s development has been guided by a clear vision: to simplify and broaden access to real-time data analytics while maintaining the performance and reliability required for production use. As a small open-source project, our development focuses on high-impact features that leverage community contributions and existing open-source technologies.

Immediate Development Priorities (Next 6-12 Months)

Section titled “Immediate Development Priorities (Next 6-12 Months)”

Current State: The system currently supports basic natural language to SQL translation using local models or simple OpenAI API integration. While functional, this setup has limitations in handling complex queries.

Future Vision: Improve query understanding through better prompt engineering and integration with open-source language models, enabling more accurate SQL generation and basic follow-up questions.

Planned Enhancements:

  • Improved prompt templates for better SQL generation accuracy using existing models
  • Query validation to catch and correct common SQL errors before execution
  • Basic query suggestions showing related queries based on current table schema
  • Integration with open-source models like Code Llama or DeepSeek Coder for local deployment options

Expected Impact: 25% improvement in query accuracy and 30% reduction in query refinement iterations through better prompts and validation.

Current State: The platform currently relies on basic Kafka-to-ClickHouse pipelines with manual YAML configuration.

Future Vision: Simplify pipeline creation through better documentation, templates, and basic monitoring using existing open-source tools.

Planned Enhancements:

  • Pipeline templates for common data sources (PostgreSQL, MySQL, CSV files)
  • Configuration validation to catch setup errors early
  • Basic monitoring dashboard using Grafana templates
  • Docker Compose presets for different deployment scenarios

Expected Impact: 50% reduction in initial setup time through better templates and documentation.

Current State: Users create Grafana dashboards manually, which works but requires expertise.

Future Vision: Build a community-contributed library of dashboard templates and improve the dashboard creation experience.

Planned Enhancements:

  • Dashboard template repository with community contributions
  • One-click dashboard deployment from templates
  • Schema-based dashboard suggestions that recommend relevant templates
  • Export/import tools for sharing dashboard configurations

Expected Impact: 60% reduction in dashboard creation time for common use cases through community templates.

Vision: Expand data source support through a plugin architecture that enables community contributions.

Planned Features:

  • Plugin architecture for custom data connectors
  • Connector SDK with clear documentation and examples
  • Community connector registry hosted on GitHub
  • Pre-built connectors for top 10 requested data sources

Community Strategy: Focus on popular connectors (PostgreSQL CDC, MongoDB, S3, etc.) and provide clear contributor guidelines.

Vision: Provide production-ready security and monitoring features using proven open-source tools.

Planned Features:

  • Authentication integration with OAuth2/OIDC providers
  • Role-based access control for queries and dashboards
  • Audit logging for all platform activities
  • Prometheus metrics for system monitoring
  • Alert integration with common notification systems

Vision: Support diverse deployment scenarios while maintaining simplicity.

Planned Features:

  • Kubernetes Helm charts for container orchestration
  • Docker Swarm support for simpler container deployments
  • Cloud deployment guides for AWS, GCP, and Azure
  • Resource sizing calculator to help users plan deployments

Vision: Integrate with existing ML/analytics tools rather than building custom solutions.

Planned Features:

  • Jupyter notebook integration for data exploration
  • Apache Superset connection as alternative to Grafana
  • dbt integration for data transformation workflows
  • MLflow integration for model deployment and monitoring

Approach: Focus on integrations and APIs rather than building competing solutions.

Vision: Help users understand and explore their data more effectively.

Planned Features:

  • Automated schema documentation generation
  • Data profiling to understand column distributions and relationships
  • Query recommendation engine based on schema analysis
  • Basic anomaly detection using statistical methods

Technical Approach: Leverage existing statistical libraries and focus on practical features that don’t require heavy ML infrastructure.

Vision: Continuously improve query performance and resource efficiency.

Planned Features:

  • Query result caching with configurable TTL
  • Automatic index suggestions based on query patterns
  • Connection pooling optimization
  • Query plan analysis and optimization hints

Current Focus: Better integration with open-source language models for cost-effective NLP features. Future Research: Fine-tuning smaller models specifically for SQL generation, exploring quantized models for edge deployment, and developing prompt optimization techniques.

Current Focus: Optimizing ClickHouse configurations and query patterns. Future Research: Investigating time-series optimization techniques, exploring column store optimizations, and developing real-time aggregation patterns.

Current Focus: Building contributor onboarding and documentation. Future Research: Exploring governance models for open-source projects, investigating community incentive programs, and developing contributor recognition systems.

Current State: Aurora is fully open-source from the start, encouraging community contributions and transparency.

Future Vision: Build a sustainable open-source project with active community participation and clear governance.

Key Initiatives:

  • Contributor guidelines with clear coding standards and review processes
  • Good first issues program to onboard new contributors
  • Community calls for roadmap discussion and technical decisions
  • Documentation-first approach for all features

Current State: Limited integrations, focusing on core ClickHouse and Kafka functionality.

Future Vision: Build partnerships with complementary open-source projects.

Planned Integrations:

  • Apache Arrow for efficient data transfer
  • Apache Airflow for workflow orchestration
  • Grafana plugins for enhanced visualization
  • Popular data catalogs for metadata management

Current State: Small core team with occasional external contributions.

Future Vision: Active community of contributors and users sharing knowledge and improvements.

Community Activities:

  • Monthly community calls for roadmap and technical discussions
  • Contributor recognition program highlighting community contributions
  • User showcase featuring real-world Aurora deployments
  • Documentation contributions from community members
  • 2-3 core maintainers focusing on platform stability and architecture
  • 1 community manager (part-time) for documentation and community engagement
  • Community contributors handling connectors, templates, and specialized features
  • Iterative releases every 2-3 months with clear feature milestones
  • Community-first features that solve common use cases
  • Integration over invention - leverage existing tools rather than building from scratch
  • Documentation-driven development ensuring features are accessible to users
  • Setup time: Reduce from hours to under 30 minutes for basic deployment
  • Community adoption: 1000+ GitHub stars, 50+ contributors within 18 months
  • Integration coverage: Support for top 10 requested data sources
  • Performance: Sub-second response for 90% of analytical queries
  • Active contributors: 20+ monthly contributors within 24 months
  • User community: Active Discord/Slack with 500+ members
  • Real-world usage: 100+ documented production deployments
  • Content creation: Community-generated tutorials and use cases

This roadmap balances technical ambition with the practical constraints of a small open-source team. By focusing on community-driven development, leveraging existing open-source tools, and prioritizing high-impact features, Aurora can grow sustainably while maintaining its core mission of simplifying real-time data analytics.

Success will be measured not just by features delivered, but by community growth, user adoption, and the platform’s ability to solve real problems for teams working with streaming data. The emphasis on integration over invention ensures resources are spent on unique value rather than reinventing existing solutions.