System Overview

Complete technical reference — the 6-stage pipeline from data ingestion through feature engineering, ML training, real-time inference, and multi-broker trade execution.

Vamsi Denduluri • February 2026 • Production — Paper Trading Active on US Equities

1. System Overview

This system answers one question for 15 actively monitored US equities: “Will this stock go UP, DOWN, or stay FLAT in the next 5 minutes, 25 minutes, 75 minutes, or 5 hours?”

It does this through a 6-stage pipeline, fully automated via N8N workflow orchestration, that runs every 10 minutes during market hours (model training, Stage 4, runs 2–3 times daily):

Pipeline flow: Ticks (4 data sources) → Bars (18 features) → Labels (4 horizons) → Training (per-horizon models) → Inference (< 0.2s latency) → Trading (3 brokers)

Stage | What It Does | Key Output | Duration
1. Ticks | Fetches OHLCV data from 4 providers with automatic failover | Raw price/volume in staging | 5–10s
2. Bars | Computes 18 technical features across 5 categories | 23-column feature matrix | 5–15s
3. Labels | Classifies future price returns across 4 horizons | UP / DOWN / FLAT labels | 10–30s
4. Training | Trains one classifier per horizon, uploads to cloud | 4 versioned model files | ~45s
5. Inference | Loads models, generates predictions via FastAPI | Predictions with probability scores | <0.2s
6. Trading | Evaluates multi-horizon consensus, executes orders | Orders submitted to broker | 2–5s

Symbol Coverage — Active Watchlist (15 symbols)

Symbol | Company | Sector
AAPL | Apple | Consumer Electronics
MSFT | Microsoft | Software / Cloud
NVDA | NVIDIA | AI / Semiconductors
TSLA | Tesla | Electric Vehicles
GOOGL | Alphabet Inc. | Search / Cloud
AMZN | Amazon | E-commerce / Cloud
META | Meta Platforms | Social Media
AVGO | Broadcom Inc. | Semiconductors
NFLX | Netflix | Streaming Media
JPM | JPMorgan Chase | Banking
AMD | Advanced Micro Devices | Semiconductors
PLTR | Palantir Technologies | Software / AI
RIVN | Rivian Automotive | Electric Vehicles
LCID | Lucid Group | Electric Vehicles
PRM | Perimeter Solutions | Specialty Chemicals

The system is not limited to these symbols — it processes any valid US equity on-demand.

Cloud Stack: PostgreSQL (all data), Azure Blob Storage (models), FastAPI (inference), N8N (orchestration), Docker (containerized deployment). Estimated cost: ~$20–40/month using free tiers.

2. Architecture

Foundation: 5-Minute Bar Resolution

The entire system is built on a 5-minute bar foundation — a deliberate choice that balances day trading liquidity (78 bars per trading day), API rate compliance (all providers support 5-minute data on free tiers), and model training quality (sufficient observations for statistical significance).

All time horizons are expressed as bar counts, not absolute time. This makes the architecture resolution-independent:

Horizon | Bars Ahead | Time Window (at 5 min) | Trading Use Case
H1 | 1 bar | 5 minutes | Immediate scalping signals
H5 | 5 bars | 25 minutes | Short-term momentum
H15 | 15 bars | 75 minutes | Medium-term swing
H60 | 60 bars | 5 hours | Long-term position

If the foundation changes to 1-minute bars in the future, the same configuration produces 1-minute, 5-minute, 15-minute, and 60-minute windows — zero code changes required.
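To make the bar-count convention concrete, here is a minimal sketch. The names `HORIZON_BARS`, `window_minutes`, and `describe_windows` are hypothetical; the point is that horizons are stored as bar counts and converted to wall-clock time only when needed.

```python
# Hypothetical config: horizons are defined in bars, never in minutes.
HORIZON_BARS = {"H1": 1, "H5": 5, "H15": 15, "H60": 60}


def window_minutes(horizon: str, bar_minutes: int) -> int:
    """Convert a bar-count horizon into a wall-clock window for a given bar size."""
    return HORIZON_BARS[horizon] * bar_minutes


def describe_windows(bar_minutes: int) -> dict[str, int]:
    """Map every configured horizon to its window length in minutes."""
    return {name: window_minutes(name, bar_minutes) for name in HORIZON_BARS}
```

With 5-minute bars, `describe_windows(5)` yields 5, 25, 75, and 300 minutes; switching to 1-minute bars changes only the argument.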

Three-Container Deployment

Container | Port | Responsibility
Pipeline API | 8001 | Data ingestion, feature engineering, labeling, training
Inference API | 8000 | Model loading, prediction generation
Trading API | 8002 | Signal evaluation, order execution, position management

Separating these provides fault isolation (a crash in trading doesn’t affect predictions), independent scaling (inference is CPU-bound, pipelines are I/O-bound), and deployment flexibility (update trading logic without reloading ML models).

Clean Architecture

  • Interfaces — FastAPI endpoints, CLI entry points. Handles HTTP requests and responses.
  • Application — Pipeline orchestrators, decision engines. Coordinates workflow and validates constraints.
  • Domain — Trading strategies, position sizing, exit evaluation. Pure business logic with no external dependencies.
  • Infrastructure — Broker API clients, database adapters, storage adapters. All external communication.

This separation means trading strategies can be unit tested without databases or broker APIs, and swapping Alpaca for Interactive Brokers requires zero changes to business logic.

System Architecture

N8N Orchestrator (every 10 minutes during market hours) → Pipeline API :8001 (data, features, labels, training) → Inference API :8000 (model loading, predictions) → Trading API :8002 (signals, orders, positions)
Shared storage: PostgreSQL (all data) and Azure Blob Storage (model storage)

3. Stage 1: Ticks — Market Data Ingestion

What It Does

Fetches real-time OHLCV (Open, High, Low, Close, Volume) data from multiple market data providers, deduplicates overlapping data using source priority, and stores it in PostgreSQL for downstream processing.

Data Sources

Source | API Type | Free Tier Limits | Bars/Call | Priority | Primary Role
Alpaca Markets | REST | Unlimited calls | 5,000+ | 1 (highest) | Primary real-time source during market hours
Yahoo Finance | REST (yfinance) | Unlimited | Variable | 2 | Secondary source, high availability
Alpha Vantage | REST | 25 calls/day | 100 | 3 | Historical backfill
Polygon.io | REST | 5 calls/min | 50,000 | 4 | Bulk backfill for new symbols

When multiple sources provide data for the same symbol and timestamp, the system deduplicates using the priority order above. All source data is preserved — only the highest-priority version is used for downstream processing.
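A minimal sketch of priority-based deduplication, assuming each fetched bar carries its source name. The source keys, the `Bar` record, and `select_canonical` are hypothetical, and plain Python stands in for whatever SQL the system actually runs.

```python
from dataclasses import dataclass

# Lower number = higher priority, mirroring the Data Sources table (keys are hypothetical).
SOURCE_PRIORITY = {"alpaca": 1, "yahoo": 2, "alpha_vantage": 3, "polygon": 4}


@dataclass(frozen=True)
class Bar:
    symbol: str
    ts: str  # bar start timestamp, ISO 8601
    source: str
    close: float
    volume: int


def select_canonical(bars: list[Bar]) -> dict[tuple[str, str], Bar]:
    """Pick the highest-priority bar per (symbol, timestamp).

    Input rows are not modified or dropped from storage; only the returned
    selection feeds downstream stages.
    """
    best: dict[tuple[str, str], Bar] = {}
    for bar in bars:
        key = (bar.symbol, bar.ts)
        current = best.get(key)
        if current is None or SOURCE_PRIORITY[bar.source] < SOURCE_PRIORITY[current.source]:
            best[key] = bar
    return best
```

In PostgreSQL the same rule could be written as `SELECT DISTINCT ON (symbol, ts) ... ORDER BY symbol, ts, priority`, which keeps the first (highest-priority) row of each group.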

Each provider implements a common interface, making it straightforward to add new data sources.

Intelligent Mode Selection

Mode | When Selected | Behavior | Frequency
Incremental | New data available since last run | Fetches only bars after latest timestamp | ~90% of runs
Hybrid | Provider corrections detected | Fetches new data AND re-fetches revised bars | ~8% of runs
Full | Stale data (>40 days) or new symbol | Complete backfill with 40-day lookback | ~2% of runs
Skip | No new data expected (off-market hours) | Returns immediately, no API calls | Off-hours

Mode selection happens per symbol within each run. A single execution might use incremental for AAPL, hybrid for META (correction detected), and skip for PRM (no new data expected).
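The mode rules can be sketched as a single per-symbol function. The parameter names are hypothetical, and the check order (full, then skip, then hybrid, then incremental) is an assumption: the table does not say which rule wins when several apply.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=40)  # matches the 40-day full-backfill rule


def select_mode(
    latest_ts: Optional[datetime],
    now: datetime,
    new_data_expected: bool,
    corrections_detected: bool,
) -> str:
    """Choose an ingestion mode for one symbol (rule order is an assumption)."""
    if latest_ts is None or now - latest_ts > STALE_AFTER:
        return "full"         # new symbol or stale data: 40-day backfill
    if not new_data_expected:
        return "skip"         # e.g. off-market hours: no API calls
    if corrections_detected:
        return "hybrid"       # new bars plus re-fetch of revised bars
    return "incremental"      # only bars after latest_ts
```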

Revision Tracking

Market data providers occasionally correct historical values — a high price revised from 650.50 to 650.99, a volume figure adjusted after settlement. The system detects these corrections automatically, logs the revision (what changed, old value, new value, source), and propagates the correction through all downstream pipelines.
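Revision detection amounts to a field-level diff between stored bars and a re-fetch of the same timestamps. This sketch assumes bars are dicts keyed by timestamp; `Revision` and `detect_revisions` are hypothetical names, and the tiny absolute tolerance is an assumption meant to ignore float round-trip noise.

```python
import math
from dataclasses import dataclass

FIELDS = ("open", "high", "low", "close", "volume")


@dataclass(frozen=True)
class Revision:
    symbol: str
    ts: str
    field: str
    old: float
    new: float
    source: str


def detect_revisions(stored: dict[str, dict[str, float]],
                     fetched: dict[str, dict[str, float]],
                     symbol: str, source: str) -> list[Revision]:
    """Field-level diff of bars present in both snapshots, keyed by timestamp.

    Timestamps that exist only in `fetched` are ordinary new bars, not revisions.
    """
    revisions = []
    for ts in sorted(stored.keys() & fetched.keys()):
        for field in FIELDS:
            old, new = stored[ts][field], fetched[ts][field]
            if not math.isclose(old, new, rel_tol=0.0, abs_tol=1e-9):
                revisions.append(Revision(symbol, ts, field, old, new, source))
    return revisions
```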

Resilience

  • 97% network success rate across all production runs
  • 3-attempt retry with exponential backoff per API call
  • 60-second timeout prevents hanging on unresponsive providers
  • Session-aware processing: Correctly handles pre-market, regular, after-hours, and extended sessions
  • Zero data loss confirmed across all production runs since December 2025
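The retry policy above can be sketched as a small wrapper. `call_with_retry` is a hypothetical helper; the `sleep` parameter is injected so the backoff schedule can be checked without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, waiting base_delay * 2**n between tries.

    The per-request timeout belongs inside fn (e.g. requests.get(url, timeout=60)),
    so an unresponsive provider raises instead of hanging. Pass the exception
    types your HTTP client raises via retry_on (for requests: requests.RequestException).
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, then 2s with the defaults
    raise ValueError("attempts must be at least 1")
```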

4. Stage 2: Bars — Feature Engineering

What It Does

Transforms raw OHLCV data into a 23-column feature matrix by computing 18 technical features across 5 categories. This feature matrix is what ML models train on and what inference uses for predictions.

Technical Features

Base Features (5): Open, High, Low, Close, Volume — the raw OHLCV data carried forward from ticks.

Price-Based Features (5)

Feature | What It Measures | Trading Significance
EMA-20 | Short-term trend direction (20-period exponential moving average) | Price above EMA-20 suggests uptrend; crossover with EMA-50 signals trend change
EMA-50 | Medium-term trend direction (50-period exponential moving average) | Acts as dynamic support/resistance; wider spread from EMA-20 indicates strong trend
VWAP | Intraday fair value (volume-weighted average price) | Institutional benchmark; deviation signals overvaluation/undervaluation
EMA Spread | Momentum strength (EMA-20 minus EMA-50) | Widening spread = accelerating trend; narrowing = trend weakening; zero crossing = reversal
VWAP Gap | Price deviation from fair value (% difference between close and VWAP) | Large positive gap = overextended; large negative = potential mean reversion
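A plain-Python sketch of the price-based features for one session of 5-minute bars. It uses loops for readability rather than the vectorized functions the production code uses; the typical-price VWAP formula `(H + L + C) / 3` and the output key names are assumptions. The EMA recursion matches what pandas computes with `ewm(span=..., adjust=False).mean()`, and every value uses only bars up to the current one.

```python
def ema(values: list[float], span: int) -> list[float]:
    """EMA with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def session_vwap(high: list[float], low: list[float], close: list[float],
                 volume: list[float]) -> list[float]:
    """Cumulative intraday VWAP using typical price; inputs hold one session."""
    out, pv_sum, vol_sum = [], 0.0, 0.0
    for h, l, c, v in zip(high, low, close, volume):
        pv_sum += (h + l + c) / 3 * v
        vol_sum += v
        out.append(pv_sum / vol_sum if vol_sum else c)
    return out


def price_features(high, low, close, volume) -> dict[str, list[float]]:
    ema20, ema50 = ema(close, 20), ema(close, 50)
    vwap = session_vwap(high, low, close, volume)
    return {
        "ema_20": ema20,
        "ema_50": ema50,
        "vwap": vwap,
        "ema_spread": [a - b for a, b in zip(ema20, ema50)],
        "vwap_gap_pct": [(c - w) / w * 100 for c, w in zip(close, vwap)],
    }
```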

Momentum Features (4)

Feature | What It Measures | Trading Significance
RSI | Overbought/oversold conditions (14-period Relative Strength Index) | Above 70 = overbought; below 30 = oversold; divergence signals reversals
MACD Histogram | Momentum acceleration (MACD line minus its signal line, 12/26/9) | Growing histogram = increasing momentum; shrinking = fading; zero cross = directional change
Bollinger %B | Price position within volatility bands (20-period, 2 std dev) | Values above 1.0 = above upper band (breakout); below 0 = below lower band (breakdown)
RSI Divergence | Directional momentum bias (RSI deviation from neutral 50) | Positive = bullish momentum; negative = bearish; magnitude indicates strength
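The momentum features can be sketched the same way. This assumes Wilder's smoothing for RSI and population standard deviation for the Bollinger bands (conventions vary between libraries); all function names are hypothetical.

```python
import statistics
from typing import Optional


def ema(values: list[float], span: int) -> list[float]:
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rsi_wilder(close: list[float], period: int = 14) -> list[Optional[float]]:
    """Wilder's RSI; bars before index `period` are None (warm-up)."""
    out: list[Optional[float]] = [None] * len(close)
    if len(close) <= period:
        return out
    deltas = [close[i] - close[i - 1] for i in range(1, len(close))]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for i in range(period, len(close)):
        if i > period:  # Wilder smoothing once the seed window is filled
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            out[i] = 50.0 if avg_gain == 0 else 100.0
        else:
            out[i] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out


def macd_histogram(close: list[float], fast: int = 12, slow: int = 26,
                   signal: int = 9) -> list[float]:
    """MACD line (EMA fast minus EMA slow) minus its own signal-line EMA."""
    macd_line = [f - s for f, s in zip(ema(close, fast), ema(close, slow))]
    return [m - s for m, s in zip(macd_line, ema(macd_line, signal))]


def bollinger_pct_b(close: list[float], period: int = 20,
                    num_std: float = 2.0) -> list[Optional[float]]:
    """Position of close inside the bands: 0 = lower band, 1 = upper band."""
    out: list[Optional[float]] = [None] * len(close)
    for i in range(period - 1, len(close)):
        window = close[i - period + 1: i + 1]
        mid = sum(window) / period
        width = num_std * statistics.pstdev(window)  # population std (assumption)
        out[i] = 0.5 if width == 0 else (close[i] - (mid - width)) / (2 * width)
    return out


def rsi_divergence(rsi: list[Optional[float]]) -> list[Optional[float]]:
    """RSI distance from the neutral 50 level (positive = bullish bias)."""
    return [None if r is None else r - 50 for r in rsi]
```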

Volume Features (3)

Feature | What It Measures | Trading Significance
Relative Volume | Unusual activity detection (current volume / rolling average) | Spikes above 1.5x often precede large price moves
Dollar Volume | Actual monetary flow (Close × Volume) | Filters noise from low-dollar-volume periods
Volume Spike | Binary flag for abnormal trading activity | Marks bars exceeding the volume spike threshold
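A sketch of the volume features. The 20-bar averaging window and the 2.0× spike threshold are hypothetical defaults (the production values are not stated here), and relative volume divides by the average of the previous bars so that a spike is not diluted by itself.

```python
from typing import Optional


def volume_features(close: list[float], volume: list[float],
                    window: int = 20, spike_threshold: float = 2.0) -> dict:
    """Relative volume, dollar volume, and a binary spike flag per bar."""
    rel_vol: list[Optional[float]] = []
    spike: list[int] = []
    for i, v in enumerate(volume):
        if i < window:  # not enough history yet
            rel_vol.append(None)
            spike.append(0)
            continue
        avg = sum(volume[i - window:i]) / window  # previous `window` bars only
        rv = v / avg if avg > 0 else None
        rel_vol.append(rv)
        spike.append(1 if rv is not None and rv > spike_threshold else 0)
    dollar_vol = [c * v for c, v in zip(close, volume)]
    return {"relative_volume": rel_vol, "dollar_volume": dollar_vol, "volume_spike": spike}
```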

Sentiment Features (4) — Reserved for Future Integration

Feature | Purpose
Sentiment Score | Planned integration with financial news APIs
Sentiment Rolling | Smoothed sentiment trend
Sentiment Volatility | Sentiment stability measure
Sentiment-Volume Interaction | Cross-signal between sentiment and volume

Day Trading Features (2) — Computed During Labeling

Feature | What It Measures
Returns | Forward price return at each prediction horizon
Volatility | Rolling standard deviation of returns, giving market stability context

Incremental Processing

The pipeline does not blindly recalculate all bars every run. It detects which data has changed:

  1. New data — Any new market data since the last bar computation
  2. Revised data — Any corrections from data providers

Only bars matching either condition are reprocessed. During market hours, this typically means 5–15 new bars per run. Off-market, it means zero bars processed — the pipeline exits in under 1 second.
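The reprocessing rule reduces to a set union. In this sketch the names are hypothetical, and `revised` would come from the revision log kept by Stage 1.

```python
from datetime import datetime
from typing import Optional


def bars_to_reprocess(
    tick_timestamps: list[datetime],
    last_computed: Optional[datetime],
    revised: set[datetime],
) -> list[datetime]:
    """Timestamps whose features must be (re)computed this run.

    New = anything after the last computed bar; revised = timestamps whose
    source ticks were corrected since the previous run.
    """
    new = {ts for ts in tick_timestamps if last_computed is None or ts > last_computed}
    return sorted(new | (revised & set(tick_timestamps)))
```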

Computation Approach

  • Vectorized operations: All indicator calculations use vectorized functions — no row-by-row loops
  • No look-ahead bias: Features use only data up to and including the current bar
  • Idempotent writes: Results are written in batches with upsert logic
  • Typical performance: ~10 bars in under 8 seconds during incremental runs
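Idempotent batch writes map naturally onto an upsert. The sketch below uses the standard-library `sqlite3` module (SQLite 3.24 or later) so it runs anywhere; PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form, with `%s` placeholders under psycopg. The table and column names are hypothetical.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO bar_features (symbol, ts, close, ema_20)
VALUES (?, ?, ?, ?)
ON CONFLICT (symbol, ts) DO UPDATE SET
    close = excluded.close,
    ema_20 = excluded.ema_20
"""


def upsert_batch(conn: sqlite3.Connection, rows: list[tuple[str, str, float, float]]) -> None:
    """Write one batch; re-running with the same rows leaves the table unchanged."""
    with conn:  # one transaction per batch: commit on success, roll back on error
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bar_features ("
    "symbol TEXT NOT NULL, ts TEXT NOT NULL, close REAL, ema_20 REAL, "
    "PRIMARY KEY (symbol, ts))"
)
```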

5. Stage 3: Labels — Movement Classification

What It Does

Looks into the future to classify price movements as UP, DOWN, or FLAT for each of the 4 time horizons. This creates the ground truth that models learn to predict.

Classification Logic

For each bar, the pipeline calculates the forward return at each horizon:

$$\text{return}_h = \frac{\text{close}_{t+h} - \text{close}_t}{\text{close}_t}$$
Label | Condition | Meaning
UP | Forward return > +0.5% | Significant upward price movement
DOWN | Forward return < −0.5% | Significant downward price movement
FLAT | Between −0.5% and +0.5% (inclusive) | No meaningful directional change

The 0.5% threshold was chosen to capture meaningful intraday moves while filtering out noise. Too low catches random fluctuations; too high misses actionable trading opportunities.
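The classification rule translates directly into code. This sketch (hypothetical names) also shows why the newest bars stay unlabeled: a bar's H60 label can only exist once 60 later bars have arrived.

```python
from typing import Optional

THRESHOLD = 0.005  # 0.5%


def label_forward_returns(close: list[float], h: int,
                          threshold: float = THRESHOLD) -> list[tuple[Optional[float], Optional[str]]]:
    """Return (forward_return, label) per bar for horizon h (in bars)."""
    out: list[tuple[Optional[float], Optional[str]]] = []
    for t in range(len(close)):
        if t + h >= len(close):
            out.append((None, None))  # future close not available yet
            continue
        r = (close[t + h] - close[t]) / close[t]
        if r > threshold:
            label = "UP"
        elif r < -threshold:
            label = "DOWN"
        else:
            label = "FLAT"  # includes exactly +/-0.5%
        out.append((r, label))
    return out
```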

Multi-Horizon Processing

Horizon | Look-Ahead | Time at 5 min | What It Captures
H1 | 1 bar ahead | 5 minutes | Immediate momentum: will the current move continue?
H5 | 5 bars ahead | 25 minutes | Short-term trend: is there follow-through?
H15 | 15 bars ahead | 75 minutes | Medium-term swing: sustained directional move?
H60 | 60 bars ahead | 5 hours | Long-term position: session-level trend?

A single run across 15 symbols with ~540 new bars produces approximately 2,160 labeled samples (540 bars × 4 horizons).

Additional Computed Features

Feature | Calculation | Purpose
Returns | Forward return at horizon h | The actual value being classified
Volatility | Rolling standard deviation of returns | Market stability context for the model

Efficient Processing

The pipeline checks whether any bars have changed since the last run. If no new or updated bars exist, the entire labels pipeline exits in under 1 second — no computation, no writes. During off-market hours, this is the case for every run.

During market hours, only labels for new or revised bars are processed. This scoped approach keeps labeling fast even as the database grows.

6. Stage 4: Training — Model Building

What It Does

Trains one RandomForest classifier per time horizon using the labeled data from Stage 3, evaluates performance, and uploads versioned model files to Azure Blob Storage for inference.

Feature Selection

Status | Count | Features | Rationale
Used | 20 | OHLCV (5) + EMA-20, EMA-50, RSI, VWAP, MACD Histogram, Bollinger %B, Relative Volume, Dollar Volume, Volume Spike, EMA Spread, VWAP Gap, RSI Divergence, Sentiment-Volume Interaction, Returns, Volatility | Core predictive features; each captures a distinct market dimension
Excluded | 3 | Sentiment Score, Sentiment Rolling, Sentiment Volatility | Placeholder values (reserved for future news/sentiment integration)

Production Algorithm: RandomForest

Parameter | Value | Rationale
Algorithm | RandomForest Classifier | Strong baseline for tabular data; handles non-linear patterns; relatively resistant to overfitting
Number of Trees | 100 | Sufficient ensemble diversity without excessive training time
Max Depth | 10 | Limits overfitting while allowing complex decision boundaries
Target Classes | 3 (UP / DOWN / FLAT) | Matches the labeling classification
Output | Predicted class + probability scores for all 3 classes | Enables confidence-based filtering downstream

Why Separate Models Per Horizon

Each horizon exhibits different feature importance patterns:

  • Short-term (H1, H5): Momentum indicators dominate — RSI, MACD Histogram, and Bollinger %B have the highest importance.
  • Medium-term (H15): Mix of momentum and price-based features — EMA Spread and VWAP Gap gain importance as trend sustainability matters more.
  • Long-term (H60): Trend and volume indicators dominate — EMA-based features, Relative Volume, and VWAP become the strongest predictors.

Training separate models per horizon yields 60–70% accuracy versus ~55% with a single multi-horizon model. For a 3-class problem, uniform random guessing scores about 33%; when the UP/DOWN/FLAT classes are imbalanced, the majority-class rate is the stricter baseline to beat.

Model Performance

Metric | H1 (5 min) | H5 (25 min) | H15 (75 min) | H60 (5 hr)
Accuracy | 60–65% | 63–68% | 65–70% | 62–67%
Precision | 62–64% | 63–66% | 64–66% | 62–65%
Recall | 62–65% | 63–67% | 64–67% | 62–66%
Training Time | ~15s | ~15s | ~15s | ~15s

Expandable Algorithm Architecture

The system supports multiple algorithms through configuration — switching requires changing a single config value:

Tabular Models (ready to use):

Algorithm | Strengths | When to Use
RandomForest (production) | Robust baseline, interpretable feature importance, fast training | Default choice; proven reliable
XGBoost | Often outperforms RF on tabular data; strong regularization | Seeking accuracy gains over RF
LightGBM | Faster training than XGBoost on large datasets; leaf-wise tree growth | Training data exceeds 100K samples
CatBoost | Minimal hyperparameter tuning; ordered boosting reduces overfitting | Rapid experimentation needed
Logistic Regression | Highly interpretable; fast training/inference | Benchmark to verify complex models add value
SVM | Effective non-linear boundaries with kernel trick | UP/DOWN boundary is highly non-linear

Sequence Models (planned — require sliding window feature engineering):

Algorithm | Strengths | Prerequisite
LSTM | Captures temporal dependencies across many bars | Sliding window of N previous bars as input
1D CNN | Learns local patterns in bar sequences; faster than LSTM | Same sliding window input format
Transformer | Attention mechanism for complex temporal patterns | Large dataset (S&P 100+) for effective training
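Switching algorithms through one config value can be sketched as a registry of estimator classes that are imported only when selected. The config keys and default arguments are assumptions; the class paths are the public scikit-learn, XGBoost, LightGBM, and CatBoost estimators.

```python
import importlib

# Hypothetical config keys -> (module, class, default kwargs).
# Third-party packages are imported only when their key is selected.
ALGORITHMS = {
    "random_forest": ("sklearn.ensemble", "RandomForestClassifier",
                      {"n_estimators": 100, "max_depth": 10}),
    "xgboost": ("xgboost", "XGBClassifier", {}),
    "lightgbm": ("lightgbm", "LGBMClassifier", {}),
    "catboost": ("catboost", "CatBoostClassifier", {}),
    "logistic_regression": ("sklearn.linear_model", "LogisticRegression", {"max_iter": 1000}),
    # probability=True is required for SVC to expose predict_proba.
    "svm": ("sklearn.svm", "SVC", {"probability": True}),
}


def build_model(name: str, **overrides):
    """Instantiate the estimator selected by a single config value."""
    if name not in ALGORITHMS:
        raise ValueError(f"unknown algorithm {name!r}; choose from {sorted(ALGORITHMS)}")
    module_name, class_name, defaults = ALGORITHMS[name]
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**{**defaults, **overrides})
```

For example, `build_model("random_forest")` would return a `RandomForestClassifier(n_estimators=100, max_depth=10)`. Recent XGBoost versions expect integer-encoded classes, so UP/DOWN/FLAT would need encoding (for instance with scikit-learn's `LabelEncoder`) before fitting `XGBClassifier`.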

Model Storage and Versioning

Models are versioned and stored in Azure Blob Storage organized by version and horizon. All models can be hot-reloaded by the inference API without container restart.

Training is currently executed 2–3 times daily using a configurable lookback window. The flow is: new labels → retrain → upload → hot reload inference.

Model Lifecycle

Labeled Data (~535K samples) → Train (4 RF models) → Upload (Azure Blob) → Hot Reload (no restart) → Predictions (< 0.2s) → Trading (decisions)

7. Stage 5: Inference — Real-Time Predictions

What It Does

Loads all 4 trained models into memory and serves real-time predictions via a FastAPI service. For each symbol, the service produces a prediction per horizon — UP, DOWN, or FLAT — with probability scores for all three classes.

API Endpoints

Endpoint | Purpose | Input Source | Response
Predict Latest | Real-time prediction for a single horizon | Database (latest bar for requested symbol) | Predicted label, probabilities, confidence
Predict Latest All | Real-time prediction for all 4 horizons | Database (latest bar) | Array of 4 predictions
Predict | Backtesting with user-supplied data (single horizon) | Request body (feature values) | Predicted label, probabilities
Predict All | Backtesting with user-supplied data (all horizons) | Request body | Array of 4 predictions
Reload Models | Hot reload models from Azure Blob Storage | Triggers download and cache refresh | Confirmation with load times

Prediction Output

Field | Example | Description
Symbol | AAPL | Stock being predicted
Horizon | 5 | Predicting 5 bars (25 minutes) ahead
Predicted Label | UP | Model’s directional call
Probability UP | 0.73 | 73% probability of upward movement
Probability DOWN | 0.15 | 15% probability of downward movement
Probability FLAT | 0.12 | 12% probability of no significant change
Confidence Level | HIGH | HIGH if max probability > 0.70, otherwise LOW
Predicted At | 2026-02-09 10:30:00 | Timestamp of prediction generation
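Assembling that record from per-class probabilities takes a few lines. With scikit-learn, `probs` could come from `dict(zip(model.classes_, model.predict_proba(X)[0]))`; the function and field names below are hypothetical.

```python
from datetime import datetime, timezone

CLASSES = ("UP", "DOWN", "FLAT")
HIGH_CONFIDENCE = 0.70


def build_prediction(symbol: str, horizon: int, probs: dict[str, float]) -> dict:
    """Turn per-class probabilities into a prediction record."""
    label = max(CLASSES, key=lambda c: probs[c])
    return {
        "symbol": symbol,
        "horizon": horizon,
        "predicted_label": label,
        "prob_up": probs["UP"],
        "prob_down": probs["DOWN"],
        "prob_flat": probs["FLAT"],
        # Strictly greater than 0.70, as in the table above.
        "confidence_level": "HIGH" if probs[label] > HIGH_CONFIDENCE else "LOW",
        "predicted_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```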

Performance

Metric | First Request | Subsequent Requests | Notes
Single horizon | 0.125s | <0.2s | DB fetch + prediction
All 4 horizons | 0.134–1.2s | <0.2s | DB fetch + 4 model inferences
Model reload | 65–67s | n/a | Downloads and caches all 4 models

Models are cached in memory at startup. Hot reload refreshes the cache without restarting the container — critical for deploying newly trained models during market hours.
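Hot reload can be sketched as building a complete new model set and then swapping it in under a lock. `ModelCache` and the `loader` callable are hypothetical; in this system the loader would download from Azure Blob Storage, and the Reload Models endpoint would call `reload()`.

```python
import threading
from typing import Any, Callable


class ModelCache:
    """In-memory model cache that can be refreshed without a restart."""

    def __init__(self, loader: Callable[[int], Any], horizons=(1, 5, 15, 60)):
        self._loader = loader  # hypothetical: downloads and deserializes one model
        self._horizons = tuple(horizons)
        self._lock = threading.Lock()
        self._models: dict[int, Any] = {}

    def reload(self) -> None:
        # Load everything first; if any download fails, the old set stays in place.
        fresh = {h: self._loader(h) for h in self._horizons}
        with self._lock:
            self._models = fresh  # one swap: readers see the old set or the new one

    def get(self, horizon: int) -> Any:
        with self._lock:
            return self._models[horizon]
```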

Predictions are stored in the database with full linkage to the input bar, and each prediction is tracked to ensure idempotent evaluation — the same prediction never triggers multiple trades across runs.

8. Stage 6: Trading — Signal Generation & Execution

What It Does

Reads unprocessed predictions from the database, generates trading signals through multi-horizon consensus, validates each signal against portfolio constraints, submits approved orders to the configured broker, and tracks every decision for audit compliance.

The Complete Trading Workflow

  1. Symbol Resolution — Identify symbols with unprocessed predictions, plus symbols with open positions
  2. Prediction Fetch — For each symbol, fetch the latest predictions across all 4 horizons
  3. Exit Evaluation — Check all open positions for exit conditions and close positions before opening new ones
  4. Signal Generation — Pass predictions to the configured strategy to determine if a BUY or SELL signal is warranted
  5. Forecast Marking — Mark all evaluated predictions, regardless of signal outcome
  6. Signal Validation — Decision Engine validates portfolio constraints for each signal
  7. Order Submission — Approved signals are submitted to the broker
  8. Trade Persistence — Every executed order is recorded with full linkage to the triggering prediction
  9. Summary — Return a structured summary for both machine consumption and human notification (Telegram)

Trading Decision Flow

Unprocessed Predictions → Multi-Horizon Consensus Check → Signal Generated (BUY or SELL) → Decision Engine (portfolio constraints) → Order Submitted or Rejected
If no consensus: No Signal → Mark Processed

Signal Generation: Multi-Horizon Consensus

Rather than acting on any single prediction, the system requires agreement across multiple time horizons. This is intended to filter out single-horizon noise and raise signal quality.

Why consensus matters: A single horizon spike might be noise. Multiple horizons agreeing suggests a stronger underlying market pattern. Different horizons capture different dynamics — short-term momentum confirms long-term trend.
H1 (5 min) | H5 (25 min) | H15 (75 min) | H60 (5 hr) | Result
UP 75% | UP 80% | UP 72% | UP 78% | Strong BUY: all 4 horizons agree with high confidence
UP 75% | UP 70% | DOWN 65% | UP 68% | No signal: conflicting directions
UP 55% | UP 60% | UP 58% | UP 62% | No signal: confidence too low across all horizons
FLAT 70% | FLAT 75% | FLAT 80% | FLAT 72% | No signal: FLAT predictions never generate trades

Trading Strategies

Conservative Strategy:

Parameter | Value | Description
Confidence Threshold | 70% | Minimum per-prediction confidence to consider
Horizons Required | 3 of 4 | At least 3 horizons must agree on direction
Position Multiplier | 1.0× | Standard position size for all signals

Tiered Strategy (active):

Tier | Horizons Required | Confidence | Position Multiplier | Description
Tier 1 (Strong) | All 4 horizons agree | 60%+ each | 2.0× base size | Highest conviction, largest position
Tier 2 (Moderate) | Any 3 horizons agree | 65%+ each | 1.0× base size | Standard conviction, standard position
Tier 3 (Cautious) | Any 2 horizons agree | 70%+ each | 1.0× base size | Lower conviction but high per-prediction confidence

Evaluation proceeds from strongest to weakest tier. First match wins.
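The tier table and first-match rule can be sketched as follows; all names are hypothetical. Whether a horizon pointing the opposite way blocks a signal is not spelled out in the tier table, so the sketch exposes that as a hypothetical `veto_opposite` switch: with the default (False) it counts only agreeing horizons, and with True it returns no signal for the conflicting-directions row of the consensus table.

```python
from typing import Optional

# (tier, horizons that must agree, min confidence per horizon, size multiplier)
TIERS = [
    ("TIER_1_STRONG", 4, 0.60, 2.0),
    ("TIER_2_MODERATE", 3, 0.65, 1.0),
    ("TIER_3_CAUTIOUS", 2, 0.70, 1.0),
]


def evaluate_tiered(preds: dict[int, tuple[str, float]],
                    veto_opposite: bool = False) -> Optional[dict]:
    """Return the signal from the strongest matching tier, or None.

    preds maps horizon (in bars) -> (label, confidence). FLAT never trades.
    """
    labels = {label for label, _ in preds.values()}
    if veto_opposite and {"UP", "DOWN"} <= labels:
        return None  # any UP-vs-DOWN disagreement blocks the signal
    for tier, required, min_conf, multiplier in TIERS:
        matches = []
        for direction, side in (("UP", "BUY"), ("DOWN", "SELL")):
            agreeing = sorted(h for h, (label, conf) in preds.items()
                              if label == direction and conf >= min_conf)
            if len(agreeing) >= required:
                matches.append({"side": side, "tier": tier,
                                "multiplier": multiplier, "horizons": agreeing})
        if len(matches) == 1:  # first (strongest) unambiguous match wins
            return matches[0]
    return None
```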

Tiered Signal Evaluation

Example inputs: H1 (5 min) UP 75% • H5 (25 min) UP 80% • H15 (75 min) UP 72% • H60 (5 hr) UP 78% → Tiered Evaluation → Tier 1: Strong (2.0×) • Tier 2: Moderate (1.0×) • Tier 3: Cautious (1.0×)

Decision Engine: Portfolio Constraints

Check | Rule | Rejection Reason
Duplicate Position | Cannot open a second position in a symbol already held | DUPLICATE_POSITION
Max Positions | Cannot exceed 20 concurrent positions (configurable) | MAX_POSITIONS
Concentration Limit | No single position can exceed 20% of portfolio value | CONCENTRATION_LIMIT
Buying Power | Sufficient cash must be available for the order | INSUFFICIENT_BUYING_POWER
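The four checks can be sketched as a single validation function that returns the rejection reason. The `Portfolio` shape is hypothetical, and checking in table order is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Portfolio:
    positions: dict[str, float]  # symbol -> current market value
    portfolio_value: float
    buying_power: float


def validate_order(symbol: str, order_value: float, pf: Portfolio,
                   max_positions: int = 20,
                   max_concentration: float = 0.20) -> Optional[str]:
    """Return a rejection reason, or None if the order may be submitted."""
    if symbol in pf.positions:
        return "DUPLICATE_POSITION"
    if len(pf.positions) >= max_positions:
        return "MAX_POSITIONS"
    if order_value > max_concentration * pf.portfolio_value:
        return "CONCENTRATION_LIMIT"
    if order_value > pf.buying_power:
        return "INSUFFICIENT_BUYING_POWER"
    return None
```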

Position Sizing

The system uses fixed-dollar position sizing: each trade targets a configurable dollar amount (default $5,000), and the quantity is calculated based on current price. The tiered strategy applies a multiplier: Tier 1 signals get 2.0× (i.e., $10,000 positions), while Tier 2 and Tier 3 get 1.0× ($5,000). Fractional shares are supported for long positions.
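A sketch of the sizing arithmetic. The rounding rules (fractional BUY quantities rounded down to 4 decimals, whole shares for short entries) are assumptions, not documented broker limits, and the function name is hypothetical.

```python
import math


def order_quantity(price: float, side: str, multiplier: float = 1.0,
                   base_dollars: float = 5_000.0, decimals: int = 4) -> float:
    """Fixed-dollar sizing: spend roughly base_dollars * multiplier at `price`."""
    if price <= 0:
        raise ValueError("price must be positive")
    raw = base_dollars * multiplier / price
    if side == "BUY":
        factor = 10 ** decimals
        # round() first so float noise like 2899.9999999 does not lose a unit
        return math.floor(round(raw * factor, 6)) / factor
    return float(math.floor(raw))  # short entries: whole shares only
```

For a Tier 1 signal at $250, `order_quantity(250.0, "BUY", multiplier=2.0)` returns `40.0` shares, the $10,000 position described above.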

Exit Conditions

Condition | Trigger | Rationale
Stop-Loss | Unrealized loss exceeds 5% (configurable) | Limits downside risk per position
Opposite Signal | New prediction contradicts position direction | Respect the model’s updated view
End of Day | Within 10 minutes of session close | Avoids overnight risk; works across all brokers and extended hours
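The three exit rules can be sketched as one function evaluated per open position. The names, the use of a single latest label for the opposite-signal check, and the rule order are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def exit_reason(side: str, entry_price: float, last_price: float,
                latest_label: Optional[str], now: datetime, session_close: datetime,
                stop_loss_pct: float = 0.05,
                eod_window: timedelta = timedelta(minutes=10)) -> Optional[str]:
    """Return why an open position should be closed, or None to keep it."""
    direction = 1.0 if side == "LONG" else -1.0
    pnl_pct = direction * (last_price - entry_price) / entry_price
    if pnl_pct < -stop_loss_pct:
        return "STOP_LOSS"
    if latest_label == ("DOWN" if side == "LONG" else "UP"):
        return "OPPOSITE_SIGNAL"
    if session_close - now <= eod_window:
        return "END_OF_DAY"
    return None
```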

The exit framework is extensible — planned additions include smart EOD (hold overnight winners), profit-taking, and tier-based exits.

Manual Trading

The system supports manual trades through the same pipeline. A user provides symbol, quantity, and side — the request bypasses signal generation but passes through the identical Decision Engine constraint checks, position sizing validation, and audit logging. Manual trades appear alongside automated trades in all analytics and dashboards.

9. Multi-Broker Architecture

Provider-Agnostic Design

All broker interactions flow through a single abstract interface that provides rate limiting (per-broker configurable), retry logic with exponential backoff, run tracking integration, API call logging, and standardized response format regardless of broker.

Every consumer — the trading pipeline, CLI tools, REST API — works with this abstract interface. Switching brokers requires zero changes to any consumer code.

The interface defines 9 methods that every broker must implement:

Method | Purpose
Place Market Order | Submit a buy or sell order for a given symbol and quantity
Get Positions | Retrieve all open positions with current market value and P&L
Get Account Balance | Return available cash, portfolio value, and buying power
Get Market Status | Determine if the market is open, closed, or in pre/post-market
Get Latest Prices | Fetch current bid/ask/last prices for one or more symbols
Get Orders | List all open or recent orders with status
Get Order Status | Check the fill status of a specific order by ID
Get Account Details | Full account overview including margin, day trade counters, leverage
Get Activities | Execution history: fill records for audit trail and reconciliation
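The nine-method contract maps naturally onto a Python abstract base class. The method names and loose return types here are hypothetical; Python refuses to instantiate an adapter that leaves any of them unimplemented.

```python
from abc import ABC, abstractmethod
from typing import Any


class BrokerClient(ABC):
    """The nine operations every broker adapter must provide."""

    @abstractmethod
    def place_market_order(self, symbol: str, qty: float, side: str) -> dict[str, Any]: ...

    @abstractmethod
    def get_positions(self) -> list[dict[str, Any]]: ...

    @abstractmethod
    def get_account_balance(self) -> dict[str, Any]: ...

    @abstractmethod
    def get_market_status(self) -> dict[str, Any]: ...

    @abstractmethod
    def get_latest_prices(self, symbols: list[str]) -> dict[str, dict[str, float]]: ...

    @abstractmethod
    def get_orders(self, status: str = "open") -> list[dict[str, Any]]: ...

    @abstractmethod
    def get_order_status(self, order_id: str) -> dict[str, Any]: ...

    @abstractmethod
    def get_account_details(self) -> dict[str, Any]: ...

    @abstractmethod
    def get_activities(self) -> list[dict[str, Any]]: ...
```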

Three-Broker Implementation

Broker | Market | Status | API Style | Authentication
Alpaca | US equities (NASDAQ, NYSE) | Active (production, paper trading) | HTTP REST (stateless) | API Key + Secret per request
Interactive Brokers | US equities (global capable) | All 9 methods implemented, pending gateway testing | TCP Socket (event-driven) | IB Gateway pre-authentication
Kotak Securities | India NSE equities + Gold/Silver ETFs | 1 of 9 methods implemented, in progress | Python SDK (synchronous) | TOTP + MPIN (automated daily login)

Trading Pipeline → Broker Interface (9 standard methods, rate limiting, retry logic) → Alpaca (US equities, REST) • Interactive Brokers (global, TCP socket) • Kotak Securities (India NSE, SDK)

Broker Selection

  1. Runtime override — Pass a provider parameter in the API request or CLI flag
  2. Environment variable — Set for automated pipelines
  3. Config default — Read from configuration file
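A sketch of that precedence order. The environment-variable name `TRADING_BROKER`, the config key `default_broker`, and the short broker codes are hypothetical.

```python
import os
from typing import Mapping, Optional

SUPPORTED = {"alpaca", "ibkr", "kotak"}
ENV_VAR = "TRADING_BROKER"  # hypothetical variable name


def resolve_broker(runtime_override: Optional[str], config: Mapping[str, str],
                   env: Mapping[str, str] = os.environ) -> str:
    """Runtime override beats environment variable, which beats config default."""
    for candidate in (runtime_override, env.get(ENV_VAR), config.get("default_broker")):
        if candidate:
            name = candidate.strip().lower()
            if name not in SUPPORTED:
                raise ValueError(f"unsupported broker: {candidate!r}")
            return name
    raise ValueError("no broker configured")
```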

Alpaca (Active)

The production broker for US equities, using Alpaca’s Paper Trading API. Stateless HTTP REST with separate URLs for trading and market data. Supports market orders, fractional shares (long positions), real-time position tracking, and P&L calculation. 200 requests/minute rate limit.
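Keeping an adapter under a per-minute cap such as Alpaca's 200 requests/minute can be sketched as a sliding-window limiter. This is a single-threaded sketch with hypothetical names; the clock is injectable so it can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """At most `limit` calls in any `window`-second span (e.g. 200 per minute)."""

    def __init__(self, limit: int = 200, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests need not sleep
        self._calls: deque = deque()

    def _evict(self, now: float) -> None:
        # Forget calls that are at least `window` seconds old.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if a slot is free, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self._calls) < self.limit:
            self._calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until a slot frees up (0.0 if one is free now)."""
        now = self.clock()
        self._evict(now)
        if len(self._calls) < self.limit:
            return 0.0
        return self.window - (now - self._calls[0])
```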

Interactive Brokers (Code Complete — Pending Testing)

All 9 interface methods implemented. Uses an event-driven TCP socket protocol where requests trigger asynchronous callbacks. A synchronous wrapper layer makes the async API compatible with the rest of the system. Pending end-to-end testing with IB Gateway.

Kotak Securities (In Progress)

Targets India’s National Stock Exchange (NSE) for equities and Gold/Silver ETFs. Uses a synchronous Python SDK with automated TOTP-based daily login. Supports delivery-based (CNC) and intraday (MIS) product types. A single config switch selects sandbox or production.

Multi-Market Database Support

  • All trade records include currency, country, and exchange information
  • The symbol registry supports symbols across multiple exchanges and regions (US, India, Japan, UK, Germany, Canada, Australia)
  • Exchange codes are standardized across providers
  • Trading hours, settlement rules, and timezone handling are metadata-driven, not hardcoded

When adding a new market, only the broker client needs implementation. All business logic works unchanged because metadata drives market-specific behavior.

10. End-to-End Performance

Azure Production Performance

The system runs every 10 minutes during market hours as Docker containers on Azure Web App, with all pipelines executing sequentially in a single cycle.

Intelligent mode selection per symbol — incremental, hybrid, skip, or full — keeps cycle times low:

Pipeline | Typical Duration | Per Symbol | Notes
Ticks | ~74s | ~5s | API-bound (external provider response time)
Bars | 13–16s | ~1s | Incremental feature computation
Labels | 20–21s | ~1.4s | Scoped to new/revised bars only
Total Cycle | ~1.8 min |  | 15 symbols per cycle

Measured on Azure Web App (B1) with Azure PostgreSQL, February 2026.

Off-market hours: When no new data arrives, the entire pipeline completes in under 2 seconds — every stage detects “no changes” and exits immediately.

Training Performance

Training runs 2–3× daily using a configurable lookback window. The current algorithm is scikit-learn RandomForest (CPU-only). The database holds 3 years of backfilled data available for training and backtesting.

Horizon | Samples (90-day window) | Training Time
H1 (5 min) | ~340K samples | ~100s
H5 (25 min) | ~137K samples | ~37s
H15 (75 min) | ~46K samples | ~12s
H60 (5 hr) | ~10K samples | ~2s
Total | ~535K samples | ~2.5 min

15 symbols, RandomForest (100 estimators), CPU-only. February 2026.

All 4 models upload to Azure Blob Storage after training and are hot-reloaded by the inference API without restart.

Row Accounting

Every pipeline run tracks inserts, updates, and skips separately. Zero variance between expected and actual counts has been maintained across all production runs — if the pipeline says it processed 540 bars, exactly 540 rows were modified in the database.

11. Observability & Data Quality

Three-Tier Run Tracking

Tier | What It Captures | Use Case
Pipeline | Overall status, total timing, aggregate row counts, error messages | “Did the 10:30 AM run succeed? How long did it take?”
Summary | Per-symbol breakdown: rows per symbol, timing, mode used | “Why did AAPL take longer? Was it hybrid mode?”
Event | Detailed operation logs with categories (200,000+ entries, BI-ready) | “Show me all operations for TSLA in the last hour”

Validation Tools

Tool | What It Does | Key Capabilities
Data Diagnostics | 20 automated checks against the production database | Freshness validation, completeness scoring, schema verification, gap analysis
Run Analyzer | Pipeline execution analysis with drill-down | Console and UI modes, cycle detection, performance trending
Stock Analyzer | Per-symbol data quality assessment | Completeness scoring, gap detection, visual charts
Compliance Checker | 15 automated code quality checks | Layer separation, config compliance, SQL compliance, Pylance integration, duplicate detection

12. Reporting API & Appsmith Dashboard

The Fourth Container

Alongside the three core containers (Pipeline, Inference, Trading), the system includes a dedicated Reporting API on port 8003 — a read-only sidecar that serves dashboard data, portfolio analytics, and broker account information.

Separating read-heavy dashboard queries from write-critical trading actions ensures a slow analytics query never blocks trade execution.

Pipeline API :8001 • Write
Inference API :8000 • Read
Trading API :8002 • Write
Reporting API :8003 • Read-only

Single Generic Endpoint

The API follows a registry pattern — one universal endpoint handles all report types: GET /report/{report_type}. Adding a new report requires two steps: add a generator method and register it. No route changes needed.
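The registry pattern can be sketched in plain Python. The decorator, the placeholder report bodies, and `get_report` are hypothetical; a FastAPI route for `GET /report/{report_type}` would forward its path and query parameters to `get_report` and translate `LookupError` into a 404.

```python
from typing import Any, Callable, Optional

ReportFn = Callable[..., Any]
REPORTS: dict[str, ReportFn] = {}


def report(name: str) -> Callable[[ReportFn], ReportFn]:
    """Decorator: register a generator under a report_type name."""
    def register(fn: ReportFn) -> ReportFn:
        REPORTS[name] = fn
        return fn
    return register


# Generators below return placeholder data, not real broker output.
@report("positions")
def positions_report(symbol: Optional[str] = None, **_: Any) -> list[dict]:
    rows = [{"symbol": "AAPL", "qty": 10}, {"symbol": "NVDA", "qty": 5}]
    return [r for r in rows if symbol is None or r["symbol"] == symbol]


@report("market_status")
def market_status_report(**_: Any) -> dict:
    return {"is_open": False}


def get_report(report_type: str, **params: Any) -> Any:
    """What the single GET /report/{report_type} route would call."""
    generator = REPORTS.get(report_type)
    if generator is None:
        raise LookupError(f"unknown report_type: {report_type}")
    return generator(**params)
```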

Available Reports

Report Type | What It Returns | Response Time | Use Case
portfolio_summary | Account balance + open positions combined view | ~0.6s | Dashboard overview widget
positions | All open positions with current market value and P&L | ~0.3s | Position monitoring table
orders | Order history with 40+ fields per order, filterable | ~0.7s | Trade history and audit
account | Equity, buying power, margin, day trade counters | ~0.6s | Account health monitoring
market_status | Current market state with timestamps | ~0.3s | Market status indicator
activities | Execution history: fill records for reconciliation | ~0.5s | Trade reconciliation

All endpoints support optional query parameters: days (lookback), status (order filter), symbol (single-symbol filter), activity_type (FILL, DIV, etc.), and limit (max records).

The Dashboard: Appsmith on Azure

The Appsmith dashboard runs as an Azure Web App at denduluri.net, with the four API sidecars (ports 8000–8003) deployed as Azure sidecar containers alongside it. The dashboard reads pre-computed data directly from PostgreSQL via SQL views and calls sidecars over localhost for live broker data — zero middleware overhead.

Page | What It Does
Portfolio | Real-time equity, positions, P&L, activity feed; auto-refreshes every 10 seconds during market hours
Trade Entry | Manual order form with symbol validation, broker dropdown (Alpaca / IBKR / Kotak), confirmation flow
Bars Explorer | OHLCV data table + TradingView candlestick chart with 160+ professional technical indicators
Pipeline Monitor | Run health, data coverage gaps, daily bar summaries; driven by SQL views
Model Validation | Forecast accuracy by horizon and confidence tier, prediction-vs-actual attribution

Additional capabilities: dark trading theme, role-based access (Admin / Developer / Viewer), TradingView embedded charts, mobile-responsive layout, and CSV export on all data tables.

Hybrid Data Strategy: The dashboard uses the Reporting API for live broker data (positions, balance) and SQL views for pre-computed analytics (pipeline health, prediction accuracy). This avoids hammering the broker API for static data while ensuring live data is always fresh.

13. Experiment & Backtesting Framework

The Problem

The production system trains models and executes trades. But how do we know whether a different algorithm, different features, or a different strategy would perform better? Without systematic experimentation, we’re flying blind.

Gap Impact
No train/test split in production training In-sample accuracy only — no idea if model generalizes
Hardcoded hyperparameters No way to optimize model performance
Only RandomForest active 5 other algorithms implemented but unused
Fixed indicator parameters Can’t test if different params improve predictions
No P&L simulation Can’t estimate profitability without live paper trading
No experiment tracking Can’t compare configurations systematically

The Solution: YAML-Driven Experiment Framework

A dedicated backtesting framework lets a researcher define an experiment in a YAML config and run it with a single CLI command. Each run produces classification, trading, and stability metrics, all tracked for comparison.

Experiment flow: YAML config → Load data (PostgreSQL) → Feature engineering → Train/test split → Model training → Backtest simulator → Metrics & tracking

What’s Configurable

Dimension What You Can Change Example
Data Symbols, date range, lookback period Test on AAPL+NVDA+TSLA for 180 days
Features Which indicators, indicator parameters RSI-21 instead of RSI-14, add Bollinger Bandwidth
Labels Classification threshold, which horizons 0.3% threshold instead of 0.5%, focus on H5+H15
Model Algorithm, hyperparameters XGBoost with 500 trees and max_depth=8
Training Split method: holdout, k-fold, walk-forward Walk-forward with 60-day train / 5-day test
Strategy Conservative vs Tiered, confidence thresholds Tier 1 at 65% confidence instead of 60%
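One way to picture an experiment definition is as a typed Python object that the YAML file is loaded into. Validation then runs before any data is touched. This is a hypothetical sketch. The field names, defaults, and allowed values are assumptions that mirror the table above, not the framework's actual schema. In practice, a dict from a YAML loader such as PyYAML's `yaml.safe_load` could be passed in as `ExperimentConfig(**data)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical schema: field names and allowed values are illustrative.
SPLIT_METHODS = {"holdout", "kfold", "walk_forward"}
STRATEGIES = {"conservative", "tiered"}


@dataclass
class ExperimentConfig:
    name: str
    symbols: List[str]                        # Data
    lookback_days: int = 180
    rsi_period: int = 14                      # Features
    label_threshold_pct: float = 0.5          # Labels: UP/DOWN cut-off, percent
    horizons: List[str] = field(default_factory=lambda: ["H5", "H15"])
    algorithm: str = "random_forest"          # Model
    hyperparameters: Dict[str, object] = field(default_factory=dict)
    split_method: str = "holdout"             # Training
    train_days: int = 60                      # walk-forward train window
    test_days: int = 5                        # walk-forward test window
    strategy: str = "tiered"                  # Strategy
    tier1_confidence: float = 0.60

    def __post_init__(self):
        # Reject typos before any data is loaded or any model is trained.
        if self.split_method not in SPLIT_METHODS:
            raise ValueError(f"unknown split_method: {self.split_method!r}")
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")
        if not 0.0 < self.tier1_confidence <= 1.0:
            raise ValueError("tier1_confidence must be in (0, 1]")
        if not self.symbols:
            raise ValueError("at least one symbol is required")


# Mirrors the table's examples: XGBoost, 0.3% threshold, walk-forward, 65% tier 1.
cfg = ExperimentConfig(
    name="xgb_walk_forward",
    symbols=["AAPL", "NVDA", "TSLA"],
    label_threshold_pct=0.3,
    algorithm="xgboost",
    hyperparameters={"n_estimators": 500, "max_depth": 8},
    split_method="walk_forward",
    tier1_confidence=0.65,
)
```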

Walk-Forward Validation

Walk-forward is the most rigorous of the three split methods. Instead of a single train/test split, the framework slides a fixed-size training window forward through the historical data. Each model is tested on the days immediately after its training window. This produces stability metrics as well as accuracy numbers. Does the model work consistently across different time periods, or was it just lucky in one window?

📚 Train: Days 1–60
📊 Test: Days 61–65

📚 Train: Days 6–65
📊 Test: Days 66–70

📚 Train: Days 11–70
📊 Test: Days 71–75

📋 Aggregate Metrics Across All Windows
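A short Python sketch can generate the window schedule in the diagram. It assumes the window advances by the test length (5 days), as the diagram shows, and uses 1-based, inclusive day numbers. The function name is hypothetical.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_windows(
    n_days: int, train_days: int = 60, test_days: int = 5, step: Optional[int] = None
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (train_start, train_end, test_start, test_end), 1-based and inclusive."""
    step = test_days if step is None else step  # default: test windows never overlap
    start = 1
    # Stop once a full train + test window no longer fits in the history.
    while start + train_days + test_days - 1 <= n_days:
        train_end = start + train_days - 1
        test_start = train_end + 1            # test begins right after training ends
        yield start, train_end, test_start, test_start + test_days - 1
        start += step


for window in walk_forward_windows(75):
    print(window)
# (1, 60, 61, 65)
# (6, 65, 66, 70)
# (11, 70, 71, 75)
```

Each test window starts the day after its training window ends, so no model sees its own test days during training.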

Comprehensive Metrics

Classification Metrics — How accurate are the predictions?

Metric What It Measures
Accuracy Overall prediction correctness
Precision / Recall / F1 Per-class (UP/DOWN/FLAT) performance
Brier Score Mean squared error of the predicted class probabilities (lower = better); rewards calibration, e.g. 70% predictions being right about 70% of the time
High-Confidence Accuracy Accuracy only for predictions with confidence ≥ 80%
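The two less familiar classification metrics can be computed as follows. This sketch assumes each prediction is a dict of class probabilities over UP/DOWN/FLAT. It uses the common multiclass Brier definition: the sum of squared errors over classes, averaged over samples. Some formulations also divide by the number of classes, so absolute values are not always comparable across tools. The 80% threshold matches the table. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CLASSES = ("UP", "DOWN", "FLAT")


def brier_score(probs: List[Dict[str, float]], labels: List[str]) -> float:
    """Multiclass Brier score: per-sample sum of squared errors, averaged. Lower is better."""
    total = 0.0
    for p, label in zip(probs, labels):
        # The outcome is one-hot: 1.0 for the true class, 0.0 otherwise.
        total += sum((p[c] - (1.0 if c == label else 0.0)) ** 2 for c in CLASSES)
    return total / len(labels)


def high_confidence_accuracy(
    probs: List[Dict[str, float]], labels: List[str], threshold: float = 0.80
) -> Tuple[Optional[float], int]:
    """Accuracy over predictions whose top probability >= threshold, plus their count."""
    hits = count = 0
    for p, label in zip(probs, labels):
        pred = max(p, key=p.get)  # predicted class = highest probability
        if p[pred] >= threshold:
            count += 1
            hits += pred == label
    return (hits / count if count else None), count


probs = [
    {"UP": 0.90, "DOWN": 0.05, "FLAT": 0.05},
    {"UP": 0.20, "DOWN": 0.70, "FLAT": 0.10},
    {"UP": 0.85, "DOWN": 0.10, "FLAT": 0.05},
]
labels = ["UP", "DOWN", "FLAT"]
print(round(brier_score(probs, labels), 3))       # 0.597
print(high_confidence_accuracy(probs, labels))    # (0.5, 2)
```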

Trading Metrics — Would this make money?

Metric What It Measures
Total Return Overall profitability including commissions and slippage
Sharpe Ratio Risk-adjusted return: mean return divided by return volatility (higher = better risk/reward)
Max Drawdown Worst peak-to-trough decline (risk measure)
Win Rate Percentage of profitable trades
Profit Factor Gross wins / gross losses (>1 = profitable)
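Here is a minimal sketch of the trading metrics. It assumes three things:
- per-trade P&L values are already net of commissions and slippage;
- the equity curve is a list of account values;
- Sharpe is annualized with 252 periods per year and a zero risk-free rate by default.

These conventions are assumptions. The simulator's actual choices (for example, per-bar versus daily returns) may differ.

```python
import math
from statistics import mean, stdev
from typing import List


def total_return(equity: List[float]) -> float:
    """Fractional change from first to last equity value."""
    return equity[-1] / equity[0] - 1


def sharpe_ratio(returns: List[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized Sharpe: mean excess return / sample std dev, scaled by sqrt(periods)."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: List[float]) -> float:
    """Worst peak-to-trough decline, as a positive fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def win_rate(trade_pnls: List[float]) -> float:
    """Share of trades with positive net P&L."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


def profit_factor(trade_pnls: List[float]) -> float:
    """Gross profit / gross loss; > 1 means winners outweigh losers."""
    gross_win = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_win / gross_loss if gross_loss > 0 else float("inf")


trades = [120.0, -40.0, 60.0, -20.0]   # net P&L per trade
equity = [100.0, 120.0, 90.0, 130.0]   # account value over time
print(win_rate(trades), profit_factor(trades), max_drawdown(equity))  # 0.5 3.0 0.25
```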

Stability Metrics (walk-forward only) — Is performance consistent?

Metric What It Measures
Window Variance Consistency of Sharpe ratio across time windows
Positive Window % How often the strategy is profitable per window
Best/Worst Ratio Asymmetry between best and worst window outcomes
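The stability metrics reduce to simple statistics over per-window results. In this sketch, window variance is the population variance of per-window Sharpe ratios. The best/worst ratio is assumed to be the best window return divided by the magnitude of the worst. The text does not pin down either definition, so both are illustrative.

```python
from statistics import pvariance
from typing import Dict, List


def stability_metrics(window_sharpes: List[float],
                      window_returns: List[float]) -> Dict[str, float]:
    """Summarize walk-forward consistency from per-window results."""
    best, worst = max(window_returns), min(window_returns)
    positive = sum(r > 0 for r in window_returns)
    return {
        # Low variance = similar risk-adjusted performance in every window.
        "sharpe_variance": pvariance(window_sharpes),
        "positive_window_pct": 100.0 * positive / len(window_returns),
        # Assumed definition: best window return / |worst window return|.
        "best_worst_ratio": best / abs(worst) if worst != 0 else float("inf"),
    }


# Window returns in percent.
print(stability_metrics([1.0, 2.0, 3.0, 2.0], [2.0, -1.0, 3.0, 1.0]))
# {'sharpe_variance': 0.5, 'positive_window_pct': 75.0, 'best_worst_ratio': 3.0}
```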

Experiment Tracking

System Best For What It Provides
MLflow ML research and experiment comparison Rich UI for parameter search, metric comparison, artifact storage
PostgreSQL Dashboard integration and cross-referencing Appsmith widgets, JOIN with production tables, historical queries

All backtesting database objects use a bt_ prefix (4 tables, 7 views, 28 indexes) and are completely isolated from production data. They can be dropped and recreated without any impact on live trading.
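Because every backtesting object shares the bt_ prefix, teardown can be scripted by prefix. The Python sketch below uses hypothetical object names. It generates PostgreSQL DROP statements for prefixed views and tables only; dropping a table also removes its indexes.

One caveat applies to catalog queries. In SQL LIKE, `_` is a single-character wildcard, so `LIKE 'bt_%'` would also match a name such as btc_prices. Escaping it as `LIKE 'bt\_%'`, or using a literal prefix check as here, avoids that.

```python
from typing import List


def bt_drop_statements(tables: List[str], views: List[str],
                       prefix: str = "bt_") -> List[str]:
    """PostgreSQL DROP statements for backtesting objects only: views first, then tables."""
    # str.startswith compares "_" literally, unlike SQL LIKE, where "_" matches any char.
    drops = [f'DROP VIEW IF EXISTS "{v}" CASCADE;' for v in views if v.startswith(prefix)]
    # Dropping a table also drops its indexes, so no separate DROP INDEX is needed.
    drops += [f'DROP TABLE IF EXISTS "{t}" CASCADE;' for t in tables if t.startswith(prefix)]
    return drops


# Hypothetical object names; "btc_prices" would slip through an unescaped LIKE 'bt_%'.
for stmt in bt_drop_statements(["bt_experiments", "btc_prices", "orders"],
                               ["bt_v_window_summary", "v_orders"]):
    print(stmt)
# DROP VIEW IF EXISTS "bt_v_window_summary" CASCADE;
# DROP TABLE IF EXISTS "bt_experiments" CASCADE;
```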

Parameter Sweep

Beyond single experiments, the framework supports grid search: 4 algorithms × 3 tree counts × 3 thresholds = 36 experiments, each tracked and comparable. This picks the best configuration within the grid systematically rather than by intuition.
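The 36-experiment grid is a Cartesian product, which `itertools.product` expresses directly. The algorithm names, tree counts, and thresholds below are placeholders. Only the 4 × 3 × 3 shape comes from the text.

```python
from itertools import product
from typing import Dict, List

# Placeholder grid values; only the 4 x 3 x 3 shape is taken from the text.
ALGORITHMS = ["random_forest", "xgboost", "extra_trees", "gradient_boosting"]
N_ESTIMATORS = [100, 300, 500]
THRESHOLDS_PCT = [0.3, 0.5, 0.7]


def build_grid() -> List[Dict[str, object]]:
    """One experiment spec per combination, each with a unique name for tracking."""
    return [
        {"name": f"{algo}_n{n}_t{t}", "algorithm": algo,
         "n_estimators": n, "label_threshold_pct": t}
        for algo, n, t in product(ALGORITHMS, N_ESTIMATORS, THRESHOLDS_PCT)
    ]


grid = build_grid()
print(len(grid))         # 36
print(grid[0]["name"])   # random_forest_n100_t0.3
```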

Why This Matters: The backtesting framework is the bridge between paper trading and real money. It quantifies confidence, detects overfitting through walk-forward validation, estimates profitability with realistic costs, and makes live deployment a data-driven decision — not a gut feeling.

14. Developer Tooling

CLI commands for daily operations and automated tools for code quality and data maintenance — all following the same Clean Architecture patterns as the core system.

CLI Commands

Flow: CLI commands → Factories & services → Broker API (trading & sync) and PostgreSQL (read & write)

Command Purpose
execute_trades Submit orders — automated (strategy signals) or manual override (symbol / qty / side). Same code path as Trading API.
sync_orders Reconcile broker orders with local database — provider-agnostic via BaseTradingClient
manage_forecasts Verify predictions against actuals, reconcile forecast↔trade linkage, print accuracy reports
calculate_pnl Match EXIT orders to entries, compute realized P&L per trade
run_pipeline Execute any pipeline stage: bars, ticks, labels, train, trading
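The matching step behind calculate_pnl can be illustrated with first-in, first-out (FIFO) lot matching. This is a sketch under stated assumptions:
- long-only positions;
- orders already in chronological order and tagged ENTRY or EXIT;
- no commissions.

The actual matching rule the command uses is not specified here, so FIFO is a stand-in.

```python
from collections import deque
from typing import Deque, Dict, List


def realized_pnl_fifo(orders: List[Dict]) -> List[Dict]:
    """Match EXIT orders to the oldest open ENTRY lots (FIFO, long-only, no fees)."""
    open_lots: Deque[List[float]] = deque()  # each lot: [remaining_qty, entry_price]
    matches = []
    for order in orders:  # orders must be in chronological order
        if order["side"] == "ENTRY":
            open_lots.append([order["qty"], order["price"]])
            continue
        remaining = order["qty"]
        while remaining > 0:
            if not open_lots:
                raise ValueError("EXIT quantity exceeds open ENTRY quantity")
            lot = open_lots[0]
            qty = min(remaining, lot[0])
            matches.append({
                "qty": qty, "entry_price": lot[1], "exit_price": order["price"],
                "pnl": qty * (order["price"] - lot[1]),
            })
            lot[0] -= qty
            remaining -= qty
            if lot[0] == 0:
                open_lots.popleft()  # lot fully closed
    return matches


orders = [
    {"side": "ENTRY", "qty": 10, "price": 100.0},
    {"side": "ENTRY", "qty": 5, "price": 110.0},
    {"side": "EXIT", "qty": 12, "price": 120.0},
]
print([m["pnl"] for m in realized_pnl_fifo(orders)])  # [200.0, 20.0]
```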

All CLI commands share the same factories and services as the API containers — execute_trades uses the identical TradingPipelineFactory that powers the Trading API on port 8002.
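The shared code path can be sketched as two thin entry points, a CLI parser and an API handler, that both build their pipeline through one factory. Only the names execute_trades and TradingPipelineFactory come from the text. Everything else is hypothetical: the factory's interface, the flags, the default broker, and the rule that a manual override needs --symbol, --qty, and --side together.

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class TradeRequest:
    symbol: Optional[str] = None   # None => automated mode (strategy signals)
    qty: Optional[int] = None
    side: Optional[str] = None     # "buy" or "sell" in manual mode


class TradingPipeline:
    """Hypothetical stand-in: a real pipeline would evaluate signals and submit orders."""
    def __init__(self, broker: str):
        self.broker = broker

    def run(self, req: TradeRequest) -> dict:
        mode = "manual" if req.symbol else "automated"
        return {"broker": self.broker, "mode": mode, **vars(req)}


class TradingPipelineFactory:
    """Hypothetical interface; only the class name comes from the text."""
    @staticmethod
    def create(broker: str) -> TradingPipeline:
        return TradingPipeline(broker)


def cli_main(argv=None) -> dict:
    """CLI entry point (execute_trades) with an assumed flag layout."""
    parser = argparse.ArgumentParser(prog="execute_trades")
    parser.add_argument("--broker", default="alpaca")
    parser.add_argument("--symbol")
    parser.add_argument("--qty", type=int)
    parser.add_argument("--side", choices=["buy", "sell"])
    args = parser.parse_args(argv)
    manual = [args.symbol, args.qty, args.side]
    # Assumed rule: manual override is all-or-nothing.
    if any(v is not None for v in manual) and not all(v is not None for v in manual):
        parser.error("manual override needs --symbol, --qty and --side together")
    req = TradeRequest(args.symbol, args.qty, args.side)
    return TradingPipelineFactory.create(args.broker).run(req)


def api_handler(payload: dict) -> dict:
    """Body of a hypothetical Trading API route: same factory, same code path."""
    req = TradeRequest(payload.get("symbol"), payload.get("qty"), payload.get("side"))
    return TradingPipelineFactory.create(payload.get("broker", "alpaca")).run(req)
```

Because both entry points go through the same factory, a manual CLI order and the equivalent API payload yield identical pipeline calls.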

Operational Tools

Tool What It Does
Data Exporter Export any PostgreSQL table or broker data to CSV / JSON / Parquet — filterable by pipeline, run ID, symbol, date range
Portfolio Viewer Local Streamlit dashboard for real-time portfolio monitoring across brokers (calls same Trading API)
Broker Diagnostics Provider-agnostic account inspector — tests all 8 trading client methods across any configured broker
Ticks Backfill Bulk historical loader — 94 symbols × 3+ years of 5-min OHLCV from Alpaca SIP feed

Code Quality & Maintenance

Category What It Does
Compliance (15 checkers) Automated Clean Architecture validation — layer separation, SQL compliance, config compliance, Pylance integration, duplicate detection. 100% production code compliant (0 violations).
Validation Per-run quality gates — row reconciliation, timing consistency, logging standards
Diagnostics 4-way API data comparison, phantom update detection for root cause investigation
Maintenance (6 scripts) Database cleanup, deduplication, staging cleanup, session management, metadata backfill
Metadata Multi-region symbol management — 37 symbols across 10 exchanges, 7 regions. API-first from YFinance + Alpaca + Polygon.
Schema Sync Version-controlled SQL execution and database schema synchronization

15. What’s Next

Immediate Roadmap

Priority Work Item Purpose
1 Complete Kotak Securities integration Enable India NSE paper trading alongside US markets
2 Backtesting framework implementation Walk-forward validation, algorithm comparison, P&L simulation
3 Paper trading analytics Win rate, Sharpe ratio, max drawdown, strategy comparison

Future Phases

Phase Focus What It Adds
Sentiment Pipeline Financial news integration Sentiment scores per symbol — hypothesis: improves predictions during earnings seasons
Fundamentals Pipeline Earnings and financial metrics Days-until-earnings, PE ratio vs sector — hypothesis: improves longer-horizon predictions
Multi-Model Framework Algorithm experimentation A/B test model versions in parallel paper trading
Advanced Trading Additional order types and strategies Limit orders, bracket orders, smart EOD, profit-taking exits
Live Trading Real capital deployment Triggered when paper trading demonstrates consistent profitability

Design Philosophy

The project follows a disciplined approach: validate before risking capital. Every component is built to be auditable (complete decision logging), config-driven (no hardcoded values in business logic), and incrementally extensible (new algorithms, brokers, and data sources slot in through configuration rather than code changes).

Real money trading is a confidence threshold, not a deadline.