Roadmap

OpenAICE is under active development. This page lists what ships in the current release and what is planned for future versions.

v1.0 (Current)

  • Canonical state model (12 entity types)
  • State bus with fragment merging and freshness tracking
  • YAML-driven policy engine (5 rules)
  • Safety guardrails (confidence, freshness, cooldown, blast-radius)
  • Explainable recommendations with structured output
  • Adapters: Prometheus, Kubernetes, Slurm, GPU, Generic Serving, Replay
  • CLI with replay, serve, and state commands
  • FastAPI REST API
  • Replay-based golden tests
  • Docker support
  • VS Code Extension (sidebar, recommendations, replay, status bar)

v1.1 (Near-term)

  • VS Code Chat Participant (@openaice in agentic chat)
  • GCM integration adapter (Meta GPU Cluster Monitoring)
  • OpenTelemetry Collector receiver adapter
  • Prometheus metrics exporter for OpenAICE itself
  • Grafana dashboard templates
  • Additional policy rules for LLM-specific scenarios
  • WebSocket support for real-time recommendation streaming
  • Helm chart for Kubernetes deployment

v2.0 (Medium-term)

  • Actuation adapters — Execute approved recommendations via K8s API / Slurm commands
  • Approval workflow — Slack/Teams/PagerDuty integration for human-in-the-loop
  • Metadata enrichment adapters — MLflow, W&B experiment tracker integration
  • Multi-cluster federation — Manage recommendations across multiple clusters
  • Persistent state — PostgreSQL/Redis backend for state bus (replace in-memory)
  • Recommendation history — Historical analysis and trend detection
  • Policy simulation — "What-if" mode to test policy changes against historical data

v3.0 (Long-term)

  • ML-augmented policies — Learned thresholds for anomaly detection
  • Cost optimization engine — Cloud spend forecasting and right-sizing
  • Compliance reporting — Audit reports for regulatory requirements
  • Go migration — Performance-critical paths rewritten in Go
  • Plugin SDK — First-class plugin architecture for third-party extensions

Possible Expansions

  • Integration with more accelerator platforms (AMD ROCm, Intel Gaudi)
  • Support for additional schedulers (PBS, LSF, Ray)
  • Custom health check framework (similar to GCM health checks)
  • Agent-based node collectors for bare-metal environments

Contributing

Want to help build the future of OpenAICE? See the Contributing Guide.