Limitations & Design Options

Critical analysis of architectural options, known limitations, and risk mapping for the AESOP platform.

Risk & Complexity Map

[Figure: risk/complexity quadrant map. Axes: Implementation Complexity (low to high) vs Strategic Value (low to high); quadrants labeled high value / low complexity, high value / high complexity, and low value / high complexity. Plotted items: OSINT pipeline (built), H3 spatial indexing, REST API layer, MIDAS cascade, AESOP combat resolution, cross-IS dependencies, Monte Carlo engine, C2AUTO full auto, NN training (future), real-time MP sync, INFOPS adjudication, fog of war, signal loss model. Legend: low / medium / high risk zones.]

Architecture Limitations

1. Apache AGE Maturity
Apache AGE is relatively young compared to Neo4j. Recursive graph queries on thousands of nodes can be slow. Mitigation: use PostgreSQL for primary filtering (date, H3 zone), then AGE only on reduced subsets. Consider Neo4j as future alternative if graph complexity grows beyond AGE's performance envelope.
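A minimal sketch of the two-phase pattern, composing the queries as Python strings. The `events` table, its `occurred_at` and `h3_cell` columns, the `aesop_graph` name, and the `DEPENDS_ON` edge label are assumptions for illustration; production code should bind parameters instead of interpolating strings.

```python
# Hypothetical sketch: filter cheaply in plain SQL first, then run the
# expensive AGE/Cypher traversal only over the surviving node ids.

def phase1_sql(since: str, h3_cells: list[str]) -> str:
    """Phase 1: relational pre-filter on date and H3 zone."""
    cells = ", ".join(f"'{c}'" for c in h3_cells)
    return (
        "SELECT graph_node_id FROM events "
        f"WHERE occurred_at >= '{since}' AND h3_cell IN ({cells})"
    )

def phase2_cypher(node_ids: list[int]) -> str:
    """Phase 2: recursive graph traversal restricted to the reduced subset."""
    ids = ", ".join(str(i) for i in node_ids)
    return (
        "SELECT * FROM cypher('aesop_graph', $$ "
        "MATCH (e)-[:DEPENDS_ON*1..3]->(n) "
        f"WHERE id(e) IN [{ids}] "
        "RETURN n $$) AS (n agtype);"
    )
```

The point of the split: the recursive `*1..3` match only ever sees the handful of ids that survived phase 1, instead of the whole graph.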
2. Cascade Propagation Ordering
The sequential propagation in MIDAS requires an arbitrary ordering of cross-IS dependencies per turn. Different orderings can produce different cascade outcomes. This is a known simplification. Mitigation: make the ordering configurable and run multiple orderings as sensitivity analysis.
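The suggested sensitivity analysis can be sketched as follows; `propagate` and the shape of a dependency are illustrative stand-ins, not the real MIDAS interfaces.

```python
import random

def propagate(state: dict, ordering: list, apply_dep) -> dict:
    """Apply cross-IS dependencies sequentially in the given order."""
    s = dict(state)
    for dep in ordering:
        s = apply_dep(s, dep)
    return s

def ordering_sensitivity(state: dict, deps: list, apply_dep,
                         samples: int = 10, seed: int = 0) -> list[dict]:
    """Run the cascade under several randomly sampled orderings and
    collect the outcomes; divergence across samples exposes how
    order-sensitive the current dependency set is."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(samples):
        order = deps[:]
        rng.shuffle(order)
        outcomes.append(propagate(state, order, apply_dep))
    return outcomes
```

Two dependencies such as "halve power" and "floor power at 0.6" already give different final states depending on which fires first, which is exactly the simplification noted above.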
3. Monte Carlo Cost at Scale
With 100 runs per combat engagement, and potentially hundreds of engagements per turn, the computational cost grows fast. Mitigation: use Celery worker pools, reduce runs for non-critical engagements, cache probability distributions.
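Two of those mitigations can be sketched together: a per-priority run budget and a cache over identical matchups. The 100/25/5 budgets, the toy win model, and all names are assumptions for illustration.

```python
from functools import lru_cache
import random

# Assumed run budgets: full fidelity only for critical engagements.
RUNS_BY_PRIORITY = {"critical": 100, "standard": 25, "screening": 5}

@lru_cache(maxsize=4096)
def win_probability(attacker: float, defender: float,
                    runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of attacker win probability; identical
    matchups hit the cache instead of being re-sampled."""
    rng = random.Random(seed)
    wins = sum(
        rng.random() < attacker / (attacker + defender) for _ in range(runs)
    )
    return wins / runs

def resolve_engagement(attacker: float, defender: float,
                       priority: str = "standard") -> float:
    """Spend the full run budget only where the outcome matters."""
    return win_probability(attacker, defender, RUNS_BY_PRIORITY[priority])
```

In the real system the cache key would cover terrain, posture, and the VE/VC variable set, not just raw strengths; the principle is the same.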
4. C2AUTO Simplification
The automated strategic decision module (auto mode) uses a simplified "strategic culture" model. Real strategic decision-making involves politics, psychology, intelligence gaps, and institutional culture that cannot be captured in parameterized rules. Auto mode should only be used for rapid screening, never as definitive analysis.
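To make concrete what "parameterized rules" means here, a hypothetical sketch: a strategic culture reduced to two scalar dials driving a threshold decision. Both parameters and the rule are invented for illustration, and the crudeness of the reduction is itself the point of the warning above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StrategicCulture:
    risk_tolerance: float   # 0 = very cautious, 1 = reckless (assumed scale)
    escalation_bias: float  # willingness to widen the conflict (assumed scale)

def decide(culture: StrategicCulture,
           expected_gain: float, expected_loss: float) -> str:
    """Engage only if the risk-weighted gain clears the loss threshold."""
    threshold = expected_loss * (1.0 - culture.risk_tolerance)
    weighted_gain = expected_gain * (1.0 + culture.escalation_bias)
    return "engage" if weighted_gain > threshold else "hold"
```

Everything a real actor brings to such a decision (politics, intelligence gaps, institutional constraints) is absent, which is why this mode is for rapid screening only.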
5. Inter-Service Consistency
Three separate databases mean no ACID transactions across services. If MIDAS state updates while simulation reads it, the simulation may see a partial state. Mitigation: use versioned snapshots (TurnResult) and read from the last completed turn, never from in-progress state.
6. Real Infrastructure Data
MIDAS is only as good as its IS graph data. Populating real infrastructure graphs (power grid topology, telecom backbone, transport networks) requires either classified data access or extensive open-data collection and validation. The entity-to-IS mapping from aesop_intell is non-trivial.
7. WebSocket Scaling
Django Channels with Redis is suitable for small to medium multiplayer sessions (2-6 belligerents with up to 8 workstations each, i.e. at most 48 concurrent connections). For larger exercises, consider dedicated WebSocket infrastructure.

Design Options to Study

Graph DB: Apache AGE vs Neo4j vs Amazon Neptune

AGE: Same PostgreSQL server, no extra infrastructure, SQL+Cypher hybrid.
Neo4j: Mature, better tooling, separate server cost.
Neptune: Managed, AWS-only.

Current choice (AGE) is pragmatic for MVP.

Spatial: H3 vs S2 vs GeoHash

H3: Uniform-area hexagons, excellent neighbor math, Uber-proven.
S2: Google's system, quad-tree cells, variable cell shape.
GeoHash: Simple, but rectangular cells with edge effects.

H3 is the best choice for military hex-grid simulation.
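The "neighbor math" argument can be made concrete. The sketch below is a generic axial-coordinate hex implementation, not the H3 library API (H3 exposes equivalent operations such as `grid_disk` and `grid_distance` over its own cell ids): every cell has exactly six equidistant neighbors and distance is closed-form, which is what a hex-grid wargame needs for movement and range checks.

```python
# Axial hex coordinates: illustrative stand-in for H3-style neighbor math.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def neighbors(q: int, r: int) -> list[tuple[int, int]]:
    """The six adjacent hexes of axial cell (q, r)."""
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]

def hex_distance(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Movement cost in hexes between two axial cells (closed form)."""
    aq, ar = a
    bq, br = b
    return (abs(aq - bq) + abs(aq + ar - bq - br) + abs(ar - br)) // 2
```

GeoHash's rectangular cells have no such uniform six-neighbor structure, which is the edge-effect problem noted above.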

Real-time: Django Channels vs Dedicated WebSocket Server

Channels: Integrated with Django auth/sessions, simpler deployment.
Dedicated (e.g., Centrifugo, Soketi): Better performance, separate scaling.

For MVP, Channels is sufficient. Scale out later if needed.

LLM Provider: Mistral vs Anthropic vs OpenAI

Mistral: Cheapest, EU-hosted (data sovereignty), adequate for NER.
Anthropic: Best reasoning (for L3 synthesis), more expensive.
OpenAI: Widest ecosystem.

Current multi-provider approach is correct.

Deployment: Monolith vs Microservices vs Modular Monolith

Current choice (3 separate Django projects) gives clear separation but adds operational complexity (3 DBs, 3 Celery workers, 3 deployments).

Alternative: a modular monolith (single Django project, apps organized by module) is simpler to deploy. The microservice approach is justified if teams work independently or if services scale differently.

Simulation Fidelity vs Speed Tradeoff

More VE/VC variables and more Monte Carlo runs = more realistic but slower.

Hot planning (auto mode, minutes): simplify.
Training exercises (manual mode, days): maximize fidelity.

Make fidelity level a session parameter.
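A sketch of that session-level knob; the preset names, run counts, variable sets, and cascade depths are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FidelityProfile:
    monte_carlo_runs: int
    ve_vc_variables: str   # which VE/VC variable set the resolver loads
    cascade_depth: int     # max cross-IS propagation depth

# Assumed presets spanning the speed/realism tradeoff described above.
FIDELITY_PRESETS = {
    "screening": FidelityProfile(monte_carlo_runs=5,   ve_vc_variables="core", cascade_depth=1),
    "standard":  FidelityProfile(monte_carlo_runs=25,  ve_vc_variables="core", cascade_depth=2),
    "exercise":  FidelityProfile(monte_carlo_runs=100, ve_vc_variables="full", cascade_depth=4),
}

def session_fidelity(mode: str) -> FidelityProfile:
    """Hot planning (auto) defaults to screening; manual training
    exercises get maximum fidelity."""
    return FIDELITY_PRESETS["screening" if mode == "auto" else "exercise"]
```

Attaching the profile to the session (rather than hard-coding it per module) lets the same engagement resolver serve both minutes-scale planning and days-scale exercises.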