System Overview

Complete technical reference — the 6-stage pipeline from data ingestion through feature engineering, ML training, real-time inference, and multi-broker trade execution.

Vamsi Denduluri • February 2026 • Production — Paper Trading Active on US Equities

1. System Overview

This system answers one question for 15 actively monitored US equities: “Will this stock go UP, DOWN, or stay FLAT in the next 5 minutes, 25 minutes, 75 minutes, or 5 hours?”

It does this through a 6-stage pipeline that runs every 10 minutes during market hours, fully automated via N8N workflow orchestration:

📥 Ticks (4 data sources) → 📊 Bars (18 features) → 🏷️ Labels (4 horizons) → 🧠 Training (per-horizon models) → ⚡ Inference (< 0.2s latency) → 📈 Trading (3 brokers)

Stage	What It Does	Key Output	Duration
1. Ticks	Fetches OHLCV data from 4 providers with automatic failover	Raw price/volume in staging	5–10s
2. Bars	Computes 18 technical features across 5 categories	23-column feature matrix	5–15s
3. Labels	Classifies future price returns across 4 horizons	UP / DOWN / FLAT labels	10–30s
4. Training	Trains one classifier per horizon, uploads to cloud	4 versioned model files	~45s
5. Inference	Loads models, generates predictions via FastAPI	Predictions with probability scores	<0.2s
6. Trading	Evaluates multi-horizon consensus, executes orders	Orders submitted to broker	2–5s

Symbol Coverage — Active Watchlist (15 symbols)

Symbol	Company	Sector
AAPL	Apple	Consumer Electronics
MSFT	Microsoft	Software / Cloud
NVDA	NVIDIA	AI / Semiconductors
TSLA	Tesla	Electric Vehicles
GOOGL	Alphabet Inc.	Search / Cloud
AMZN	Amazon	E-commerce / Cloud
META	Meta Platforms	Social Media
AVGO	Broadcom Inc.	Semiconductors
NFLX	Netflix	Streaming Media
JPM	JPMorgan Chase	Banking
AMD	Advanced Micro Devices	Semiconductors
PLTR	Palantir Technologies	Software / AI
RIVN	Rivian Automotive	Electric Vehicles
LCID	Lucid Group	Electric Vehicles
PRM	Perimeter Solutions	Specialty Chemicals

The system is not limited to these symbols — it processes any valid US equity on-demand.

Cloud Stack: PostgreSQL (all data), Azure Blob Storage (models), FastAPI (inference), N8N (orchestration), Docker (containerized deployment). Estimated cost: ~$20–40/month using free tiers.

2. Architecture

Foundation: 5-Minute Bar Resolution

The entire system is built on a 5-minute bar foundation — a deliberate choice that balances day trading liquidity (78 bars per trading day), API rate compliance (all providers support 5-minute data on free tiers), and model training quality (sufficient observations for statistical significance).

All time horizons are expressed as bar counts, not absolute time. This makes the architecture resolution-independent:

Horizon	Bars Ahead	Time Window (at 5min)	Trading Use Case
H1	1 bar	5 minutes	Immediate scalping signals
H5	5 bars	25 minutes	Short-term momentum
H15	15 bars	75 minutes	Medium-term swing
H60	60 bars	5 hours	Long-term position

If the foundation changes to 1-minute bars in the future, the same configuration produces 1-minute, 5-minute, 15-minute, and 60-minute windows — zero code changes required.
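Because horizons are stored as bar counts, the wall-clock window is simply the bar count multiplied by the bar size. A minimal sketch of that idea follows; the names HORIZON_BARS and horizon_windows are invented for illustration and are not taken from the system's code.

```python
from datetime import timedelta

# Horizons as bar counts, taken from the table above.
HORIZON_BARS = {"H1": 1, "H5": 5, "H15": 15, "H60": 60}


def horizon_windows(bar_minutes: int) -> dict[str, timedelta]:
    """Map each horizon name to its wall-clock window for a given bar size."""
    return {name: timedelta(minutes=bars * bar_minutes)
            for name, bars in HORIZON_BARS.items()}


windows_5m = horizon_windows(5)   # H60 becomes 5 hours
windows_1m = horizon_windows(1)   # H60 becomes 60 minutes
```

Only the bar size changes between the two calls; the horizon configuration stays the same.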

Three-Container Deployment

Container	Port	Responsibility
Pipeline API	8001	Data ingestion, feature engineering, labeling, training
Inference API	8000	Model loading, prediction generation
Trading API	8002	Signal evaluation, order execution, position management

Separating these provides fault isolation (a crash in trading doesn’t affect predictions), independent scaling (inference is CPU-bound, pipelines are I/O-bound), and deployment flexibility (update trading logic without reloading ML models).

Clean Architecture

Interfaces — FastAPI endpoints, CLI entry points. Handles HTTP requests and responses.
Application — Pipeline orchestrators, decision engines. Coordinates workflow and validates constraints.
Domain — Trading strategies, position sizing, exit evaluation. Pure business logic with no external dependencies.
Infrastructure — Broker API clients, database adapters, storage adapters. All external communication.

This separation means trading strategies can be unit tested without databases or broker APIs, and swapping Alpaca for Interactive Brokers requires zero changes to business logic.

System Architecture

⏰ N8N Orchestrator (every 10 minutes during market hours)

↓

🔄 Pipeline API :8001 (Data + Features + Labels + Training)
🧠 Inference API :8000 (Model Loading + Predictions)
📈 Trading API :8002 (Signals + Orders + Positions)

↓

🗄️ PostgreSQL (All Data)
☁️ Azure Blob Storage (Model Storage)

3. Stage 1: Ticks — Market Data Ingestion

What It Does

Fetches real-time OHLCV (Open, High, Low, Close, Volume) data from multiple market data providers, deduplicates overlapping data using source priority, and stores it in PostgreSQL for downstream processing.

Data Sources

Source	API Type	Free Tier Limits	Bars/Call	Priority	Primary Role
Alpaca Markets	REST	Unlimited calls	5,000+	1 (highest)	Primary real-time source during market hours
Yahoo Finance	REST (yfinance)	Unlimited	Variable	2	Secondary source, high availability
Alpha Vantage	REST	25 calls/day	100	3	Historical backfill
Polygon.io	REST	5 calls/min	50,000	4	Bulk backfill for new symbols

When multiple sources provide data for the same symbol and timestamp, the system deduplicates using the priority order above. All source data is preserved — only the highest-priority version is used for downstream processing.
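The priority-based selection can be sketched in plain Python. This is an illustration under assumptions, not the system's actual implementation: the row layout, the source keys, and the function name select_preferred are hypothetical, and the sample rows are made-up values.

```python
# Lower number = higher priority, matching the Priority column above.
SOURCE_PRIORITY = {"alpaca": 1, "yahoo": 2, "alpha_vantage": 3, "polygon": 4}


def select_preferred(rows: list[dict]) -> list[dict]:
    """Keep one row per (symbol, timestamp): the one from the best-ranked source.

    The input list is left untouched, so every source's copy stays available.
    """
    best: dict[tuple, dict] = {}
    for row in rows:
        key = (row["symbol"], row["ts"])
        current = best.get(key)
        if current is None or SOURCE_PRIORITY[row["source"]] < SOURCE_PRIORITY[current["source"]]:
            best[key] = row
    return sorted(best.values(), key=lambda r: (r["symbol"], r["ts"]))


# Made-up sample rows: two sources overlap on the 10:30 bar.
rows = [
    {"symbol": "AAPL", "ts": "2026-02-09T10:30", "source": "yahoo", "close": 230.10},
    {"symbol": "AAPL", "ts": "2026-02-09T10:30", "source": "alpaca", "close": 230.12},
    {"symbol": "AAPL", "ts": "2026-02-09T10:35", "source": "polygon", "close": 230.40},
]
preferred = select_preferred(rows)
```

Here the 10:30 bar resolves to the Alpaca row, while the 10:35 bar keeps the only copy available.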

Each provider implements a common interface, making it straightforward to add new data sources.

Intelligent Mode Selection

Mode	When Selected	Behavior	Frequency
Incremental	New data available since last run	Fetches only bars after latest timestamp	~90% of runs
Hybrid	Provider corrections detected	Fetches new data AND re-fetches revised bars	~8% of runs
Full	Stale data (>40 days) or new symbol	Complete backfill with 40-day lookback	~2% of runs
Skip	No new data expected (off-market hours)	Returns immediately, no API calls	Off-hours

Mode selection happens per-symbol within each run. A single execution might use incremental for AAPL, hybrid for META (correction detected), and skip for PRM (no new data expected).
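One way to picture the per-symbol decision is a small function that takes a few facts about a symbol and returns a mode. The sketch below is hypothetical: the function name, its inputs, and in particular the order of the checks are assumptions, since the text only describes when each mode applies.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=40)  # matches the 40-day threshold in the table


def select_mode(now: datetime, latest_bar: Optional[datetime],
                new_data_expected: bool, corrections_detected: bool) -> str:
    """Pick an ingestion mode for one symbol.

    The check order (full, skip, hybrid, incremental) is an assumption.
    """
    if latest_bar is None or now - latest_bar > STALE_AFTER:
        return "full"          # new symbol or stale history: backfill
    if not new_data_expected:
        return "skip"          # nothing to fetch, make no API calls
    if corrections_detected:
        return "hybrid"        # new bars plus a re-fetch of revised bars
    return "incremental"       # only bars after the latest timestamp
```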

Revision Tracking

Market data providers occasionally correct historical values — a high price revised from 650.50 to 650.99, a volume figure adjusted after settlement. The system detects these corrections automatically, logs the revision (what changed, old value, new value, source), and propagates the correction through all downstream pipelines.

Resilience

97% network success rate across all production runs
3-attempt retry with exponential backoff per API call
60-second timeout prevents hanging on unresponsive providers
Session-aware processing: Correctly handles pre-market, regular, after-hours, and extended sessions
Zero data loss confirmed across all production runs since December 2025
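The retry behavior can be illustrated with a small wrapper. This is a sketch, not the production code: the helper name call_with_retry and the 1-second base delay are assumptions, and the 60-second timeout is left to whatever HTTP client performs the actual fetch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fetch: Callable[[], T], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)   # 1s, then 2s (base delay is assumed)
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.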

4. Stage 2: Bars — Feature Engineering

What It Does

Transforms raw OHLCV data into a 23-column feature matrix by computing 18 technical features across 5 categories. This feature matrix is what ML models train on and what inference uses for predictions.

Technical Features

Base Features (5): Open, High, Low, Close, Volume — the raw OHLCV data carried forward from ticks.

Price-Based Features (5)

Feature	What It Measures	Trading Significance
EMA-20	Short-term trend direction (20-period exponential moving average)	Price above EMA-20 suggests uptrend; crossover with EMA-50 signals trend change
EMA-50	Medium-term trend direction (50-period exponential moving average)	Acts as dynamic support/resistance; wider spread from EMA-20 indicates strong trend
VWAP	Intraday fair value (volume-weighted average price)	Institutional benchmark; deviation signals overvaluation/undervaluation
EMA Spread	Momentum strength (EMA-20 minus EMA-50)	Widening spread = accelerating trend; narrowing = trend weakening; zero crossing = reversal
VWAP Gap	Price deviation from fair value (% difference between close and VWAP)	Large positive gap = overextended; large negative = potential mean reversion
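To make the two derived price features concrete, here is a plain-Python sketch of EMA Spread and VWAP Gap. Several choices are assumptions rather than facts about the system: the EMA is seeded with the first value, VWAP uses the close as the bar price and does not reset per session, and explicit loops are used only for readability (the production pipeline is described as vectorized).

```python
def ema(values: list[float], period: int) -> list[float]:
    """Exponential moving average with the standard 2 / (period + 1) smoothing.

    Seeding with the first value is one common convention; it is an assumption here.
    """
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def vwap(closes: list[float], volumes: list[float]) -> list[float]:
    """Running volume-weighted average price (close used as the bar price)."""
    out, pv_sum, vol_sum = [], 0.0, 0.0
    for price, vol in zip(closes, volumes):
        pv_sum += price * vol
        vol_sum += vol
        out.append(pv_sum / vol_sum if vol_sum else price)
    return out


def ema_spread(closes: list[float]) -> list[float]:
    """EMA-20 minus EMA-50, bar by bar."""
    return [fast - slow for fast, slow in zip(ema(closes, 20), ema(closes, 50))]


def vwap_gap_pct(closes: list[float], volumes: list[float]) -> list[float]:
    """Percent difference between close and VWAP."""
    return [(c - v) / v * 100.0 for c, v in zip(closes, vwap(closes, volumes))]
```

On a steadily rising series the spread turns positive, because the 20-period average follows price faster than the 50-period one.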

Momentum Features (4)

Feature	What It Measures	Trading Significance
RSI	Overbought/oversold conditions (14-period Relative Strength Index)	Above 70 = overbought; below 30 = oversold; divergence signals reversals
MACD Histogram	Momentum acceleration (12/26/9 MACD difference)	Growing histogram = increasing momentum; shrinking = fading; zero cross = directional change
Bollinger %B	Price position within volatility bands (20-period, 2 std dev)	Values above 1.0 = above upper band (breakout); below 0 = below lower band (breakdown)
RSI Divergence	Directional momentum bias (RSI deviation from neutral 50)	Positive = bullish momentum; negative = bearish; magnitude indicates strength

Volume Features (3)

Feature	What It Measures	Trading Significance
Relative Volume	Unusual activity detection (current volume / rolling average)	Spikes above 1.5x often precede large price moves
Dollar Volume	Actual monetary flow (Close × Volume)	Filters noise from low-dollar-volume periods
Volume Spike	Binary flag for abnormal trading activity	Marks bars exceeding the volume spike threshold

Sentiment Features (4) — Reserved for Future Integration

Feature	Purpose
Sentiment Score	Planned integration with financial news APIs
Sentiment Rolling	Smoothed sentiment trend
Sentiment Volatility	Sentiment stability measure
Sentiment-Volume Interaction	Cross-signal between sentiment and volume

Day Trading Features (2) — Computed During Labeling

Feature	What It Measures
Returns	Forward price return at each prediction horizon
Volatility	Rolling standard deviation of returns — market stability context

Incremental Processing

The pipeline does not blindly recalculate all bars every run. It detects which data has changed:

New data — Any new market data since the last bar computation
Revised data — Any corrections from data providers

Only bars matching either condition are reprocessed. During market hours, this typically means 5–15 new bars per run. Off-market, it means zero bars processed — the pipeline exits in under 1 second.

Computation Approach

Vectorized operations: All indicator calculations use vectorized functions — no row-by-row loops
No look-ahead bias: Features use only data up to and including the current bar
Idempotent writes: Results are written in batches with upsert logic
Typical performance: ~10 bars in under 8 seconds during incremental runs

5. Stage 3: Labels — Movement Classification

What It Does

Looks into the future to classify price movements as UP, DOWN, or FLAT for each of the 4 time horizons. This creates the ground truth that models learn to predict.

Classification Logic

For each bar, the pipeline calculates the forward return at each horizon:

return_h = (close_{t+h} − close_t) / close_t

Label	Condition	Meaning
UP	Forward return > +0.5%	Significant upward price movement
DOWN	Forward return < −0.5%	Significant downward price movement
FLAT	−0.5% ≤ forward return ≤ +0.5%	No meaningful directional change

The 0.5% threshold was chosen to capture meaningful intraday moves while filtering out noise. Too low catches random fluctuations; too high misses actionable trading opportunities.
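The labeling rule can be sketched in a few lines. The function names and the sample closes below are illustrative only; the threshold and the return formula follow the definitions above.

```python
from typing import Optional

THRESHOLD = 0.005  # 0.5% expressed as a fraction


def forward_return(closes: list[float], t: int, h: int) -> Optional[float]:
    """Return (close[t+h] - close[t]) / close[t], or None if bar t+h does not exist yet."""
    if t + h >= len(closes):
        return None
    return (closes[t + h] - closes[t]) / closes[t]


def classify(ret: float, threshold: float = THRESHOLD) -> str:
    """Map a forward return to UP, DOWN, or FLAT using a symmetric threshold."""
    if ret > threshold:
        return "UP"
    if ret < -threshold:
        return "DOWN"
    return "FLAT"


# Made-up closes; the last bar has no label yet because its future is unknown.
closes = [100.0, 100.2, 100.8, 99.3]
labels_h1 = [classify(r) for t in range(len(closes))
             if (r := forward_return(closes, t, 1)) is not None]
```

The most recent `h` bars of any series stay unlabeled until enough future bars arrive, which is why labels for a bar can only be written after its horizon has elapsed.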

Multi-Horizon Processing

Horizon	Look-Ahead	Time at 5min	What It Captures
H1	1 bar ahead	5 minutes	Immediate momentum — will the current move continue?
H5	5 bars ahead	25 minutes	Short-term trend — is there follow-through?
H15	15 bars ahead	75 minutes	Medium-term swing — sustained directional move?
H60	60 bars ahead	5 hours	Long-term position — session-level trend?

A single run across 15 symbols with ~540 new bars produces approximately 2,160 labeled samples (540 bars × 4 horizons).

Additional Computed Features

Feature	Calculation	Purpose
Returns	Forward return at horizon h	The actual value being classified
Volatility	Rolling standard deviation of returns	Market stability context for the model

Efficient Processing

The pipeline checks whether any bars have changed since the last run. If no new or updated bars exist, the entire labels pipeline exits in under 1 second — no computation, no writes. During off-market hours, this is the case for every run.

During market hours, only labels for new or revised bars are processed. This scoped approach keeps labeling fast even as the database grows.

6. Stage 4: Training — Model Building

What It Does

Trains one RandomForest classifier per time horizon using the labeled data from Stage 3, evaluates performance, and uploads versioned model files to Azure Blob Storage for inference.

Feature Selection

Status	Count	Features	Rationale
Used	20	OHLCV (5) + EMA-20, EMA-50, RSI, VWAP, MACD Histogram, Bollinger %B, Relative Volume, Dollar Volume, Volume Spike, EMA Spread, VWAP Gap, RSI Divergence, Sentiment-Volume Interaction, Returns, Volatility	Core predictive features — each captures a distinct market dimension
Excluded	3	Sentiment Score, Sentiment Rolling, Sentiment Volatility	Placeholder values (reserved for future news/sentiment integration)

Production Algorithm: RandomForest

Parameter	Value	Rationale
Algorithm	RandomForest Classifier	Strong baseline for tabular data; handles non-linear patterns; resistant to overfitting
Number of Trees	100	Sufficient ensemble diversity without excessive training time
Max Depth	10	Prevents overfitting while allowing complex decision boundaries
Target Classes	3 (UP / DOWN / FLAT)	Matches the labeling classification
Output	Predicted class + probability scores for all 3 classes	Enables confidence-based filtering downstream

Why Separate Models Per Horizon

Each horizon exhibits different feature importance patterns:

Short-term (H1, H5): Momentum indicators dominate — RSI, MACD Histogram, and Bollinger %B have the highest importance.
Medium-term (H15): Mix of momentum and price-based features — EMA Spread and VWAP Gap gain importance as trend sustainability matters more.
Long-term (H60): Trend and volume indicators dominate — EMA-based features, Relative Volume, and VWAP become the strongest predictors.

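Training separate models per horizon yields 60–70% accuracy versus ~55% with a single multi-horizon model, a clear improvement for a 3-class classification problem (random baseline: 33%). Because production training currently has no train/test split (see Section 13), these are in-sample figures.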

Model Performance

Metric	H1 (5min)	H5 (25min)	H15 (75min)	H60 (5hr)
Accuracy	60–65%	63–68%	65–70%	62–67%
Precision	62–64%	63–66%	64–66%	62–65%
Recall	62–65%	63–67%	64–67%	62–66%
Training Time	~15s	~15s	~15s	~15s

Expandable Algorithm Architecture

The system supports multiple algorithms through configuration — switching requires changing a single config value:

Tabular Models (ready to use):

Algorithm	Strengths	When to Use
RandomForest (production)	Robust baseline, interpretable feature importance, fast training	Default choice — proven reliable
XGBoost	Often outperforms RF on tabular data; strong regularization	Seeking accuracy gains over RF
LightGBM	Faster training than XGBoost on large datasets; leaf-wise tree growth	Training data exceeds 100K samples
CatBoost	Minimal hyperparameter tuning; ordered boosting prevents overfitting	Rapid experimentation needed
Logistic Regression	Highly interpretable; fast training/inference	Benchmark to verify complex models add value
SVM	Effective non-linear boundaries with kernel trick	UP/DOWN boundary is highly non-linear

Sequence Models (planned — require sliding window feature engineering):

Algorithm	Strengths	Prerequisite
LSTM	Captures temporal dependencies across many bars	Sliding window of N previous bars as input
1D CNN	Learns local patterns in bar sequences; faster than LSTM	Same sliding window input format
Transformer	Attention mechanism for complex temporal patterns	Large dataset (S&P 100+) for effective training

Model Storage and Versioning

Models are versioned and stored in Azure Blob Storage organized by version and horizon. All models can be hot-reloaded by the inference API without container restart.

Training is currently executed 2–3 times daily using a configurable lookback window. The flow is: new labels → retrain → upload → hot reload inference.

Model Lifecycle

📊 Labeled Data (~535K samples) → 🏋️ Train (4 RF models) → ☁️ Upload (Azure Blob) → 🔄 Hot Reload (no restart) → ⚡ Predictions (< 0.2s) → 📈 Trading (decisions)

7. Stage 5: Inference — Real-Time Predictions

What It Does

Loads all 4 trained models into memory and serves real-time predictions via a FastAPI service. For each symbol, the service produces a prediction per horizon — UP, DOWN, or FLAT — with probability scores for all three classes.

API Endpoints

Endpoint	Purpose	Input Source	Response
Predict Latest	Real-time prediction for a single horizon	Database (latest bar for requested symbol)	Predicted label, probabilities, confidence
Predict Latest All	Real-time prediction for all 4 horizons	Database (latest bar)	Array of 4 predictions
Predict	Backtesting with user-supplied data (single horizon)	Request body (feature values)	Predicted label, probabilities
Predict All	Backtesting with user-supplied data (all horizons)	Request body	Array of 4 predictions
Reload Models	Hot reload models from Azure Blob Storage	Triggers download and cache refresh	Confirmation with load times

Prediction Output

Field	Example	Description
Symbol	AAPL	Stock being predicted
Horizon	5	Predicting 5 bars (25 minutes) ahead
Predicted Label	UP	Model’s directional call
Probability UP	0.73	73% probability of upward movement
Probability DOWN	0.15	15% probability of downward movement
Probability FLAT	0.12	12% probability of no significant change
Confidence Level	HIGH	HIGH if max probability > 0.70, otherwise LOW
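The relationship between the three probabilities, the predicted label, and the confidence level can be sketched as a small data class. The class and field names are hypothetical; only the 0.70 rule and the example values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    symbol: str
    horizon: int          # bars ahead
    prob_up: float
    prob_down: float
    prob_flat: float

    @property
    def label(self) -> str:
        """Class with the highest probability."""
        probs = {"UP": self.prob_up, "DOWN": self.prob_down, "FLAT": self.prob_flat}
        return max(probs, key=probs.get)

    @property
    def confidence_level(self) -> str:
        """HIGH when the top probability exceeds 0.70, otherwise LOW."""
        top = max(self.prob_up, self.prob_down, self.prob_flat)
        return "HIGH" if top > 0.70 else "LOW"


p = Prediction("AAPL", 5, prob_up=0.73, prob_down=0.15, prob_flat=0.12)
```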
Predicted At	2026-02-09 10:30:00	Timestamp of prediction generation

Performance

Metric	First Request	Subsequent Requests	Notes
Single horizon	0.125s	<0.2s	DB fetch + prediction
All 4 horizons	0.134–1.2s	<0.2s	DB fetch + 4 model inferences
Model reload	65–67s	—	Downloads and caches all 4 models

Models are cached in memory at startup. Hot reload refreshes the cache without restarting the container — critical for deploying newly trained models during market hours.

Predictions are stored in the database with full linkage to the input bar, and each prediction is tracked to ensure idempotent evaluation — the same prediction never triggers multiple trades across runs.

8. Stage 6: Trading — Signal Generation & Execution

What It Does

Reads unprocessed predictions from the database, generates trading signals through multi-horizon consensus, validates each signal against portfolio constraints, submits approved orders to the configured broker, and tracks every decision for audit compliance.

The Complete Trading Workflow

Symbol Resolution — Identify symbols with unprocessed predictions, plus symbols with open positions
Prediction Fetch — For each symbol, fetch the latest predictions across all 4 horizons
Exit Evaluation — Check all open positions for exit conditions and close positions before opening new ones
Signal Generation — Pass predictions to the configured strategy to determine if a BUY or SELL signal is warranted
Forecast Marking — Mark all evaluated predictions, regardless of signal outcome
Signal Validation — Decision Engine validates portfolio constraints for each signal
Order Submission — Approved signals are submitted to the broker
Trade Persistence — Every executed order is recorded with full linkage to the triggering prediction
Summary — Return a structured summary for both machine consumption and human notification (Telegram)

Trading Decision Flow

📊 Unprocessed Predictions → 🔍 Multi-Horizon Consensus Check

Consensus: ✅ Signal Generated (BUY or SELL) → 🛡️ Decision Engine (Portfolio Constraints) → 📤 Order Submitted or 🚫 Rejected
No consensus: ⏭️ No Signal (Mark Processed)

Signal Generation: Multi-Horizon Consensus

Rather than acting on any single prediction, the system requires agreement across multiple time horizons. This reduces noise and improves signal quality.

Why consensus matters: A single horizon spike might be noise. Multiple horizons agreeing suggests a stronger underlying market pattern. Different horizons capture different dynamics — short-term momentum confirms long-term trend.

H1 (5min)	H5 (25min)	H15 (75min)	H60 (5hr)	Result
UP 75%	UP 80%	UP 72%	UP 78%	Strong BUY — all 4 horizons agree with high confidence
UP 75%	UP 70%	DOWN 65%	UP 68%	No signal — conflicting directions
UP 55%	UP 60%	UP 58%	UP 62%	No signal — confidence too low across all horizons
FLAT 70%	FLAT 75%	FLAT 80%	FLAT 72%	No signal — FLAT predictions never generate trades

Trading Strategies

Conservative Strategy:

Parameter	Value	Description
Confidence Threshold	70%	Minimum per-prediction confidence to consider
Horizons Required	3 of 4	At least 3 horizons must agree on direction
Position Multiplier	1.0×	Standard position size for all signals
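A minimal sketch of the Conservative rule, assuming each horizon contributes one (label, confidence) pair and that a confidence of exactly 70% counts as meeting the threshold. The function name and input shape are illustrative. Under these assumptions the sketch agrees with all four rows of the consensus table earlier in this section.

```python
from collections import Counter
from typing import Optional


def conservative_signal(predictions: list[tuple[str, float]],
                        min_confidence: float = 0.70,
                        horizons_required: int = 3) -> Optional[str]:
    """Return "BUY", "SELL", or None for one symbol.

    `predictions` holds one (label, confidence) pair per horizon.
    """
    # Only UP/DOWN calls that clear the confidence threshold get a vote.
    votes = Counter(label for label, conf in predictions
                    if label in ("UP", "DOWN") and conf >= min_confidence)
    if votes["UP"] >= horizons_required:
        return "BUY"
    if votes["DOWN"] >= horizons_required:
        return "SELL"
    return None
```

FLAT predictions never vote, so a unanimous FLAT reading produces no trade regardless of confidence.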

Tiered Strategy (active):

Tier	Horizons Required	Confidence	Position Multiplier	Description
Tier 1 (Strong)	All 4 horizons agree	60%+ each	2.0× base size	Highest conviction — largest position
Tier 2 (Moderate)	Any 3 horizons agree	65%+ each	1.0× base size	Standard conviction — standard position
Tier 3 (Cautious)	Any 2 horizons agree	70%+ each	1.0× base size	Lower conviction but high per-prediction confidence

Evaluation proceeds from strongest to weakest tier. First match wins.
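The first-match-wins evaluation can be sketched as a loop over the tiers from strongest to weakest. This reads the tier table literally and makes two assumptions that the text does not spell out: a dissenting or FLAT horizon simply does not count toward agreement, and a tier where both directions qualify (a 2-vs-2 split) yields no signal. The names TIERS and tiered_signal are hypothetical.

```python
from typing import Optional

# (name, horizons that must agree, minimum confidence each, position multiplier)
TIERS = [
    ("Tier 1", 4, 0.60, 2.0),
    ("Tier 2", 3, 0.65, 1.0),
    ("Tier 3", 2, 0.70, 1.0),
]


def tiered_signal(predictions: list[tuple[str, float]]) -> Optional[tuple[str, str, float]]:
    """Return (side, tier name, multiplier) for the first matching tier, else None."""
    for name, required, min_conf, multiplier in TIERS:   # strongest tier first
        up = sum(1 for lbl, conf in predictions if lbl == "UP" and conf >= min_conf)
        down = sum(1 for lbl, conf in predictions if lbl == "DOWN" and conf >= min_conf)
        if up >= required and down >= required:
            return None            # assumed: a split vote produces no signal
        if up >= required:
            return "BUY", name, multiplier
        if down >= required:
            return "SELL", name, multiplier
    return None
```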

Tiered Signal Evaluation

Inputs: H1 (5min) UP 75% • H5 (25min) UP 80% • H15 (75min) UP 72% • H60 (5hr) UP 78%

→ ⚖️ Tiered Evaluation →

🟢 Tier 1: Strong • 2.0x
🟡 Tier 2: Moderate • 1.0x
🟠 Tier 3: Cautious • 1.0x

Decision Engine: Portfolio Constraints

Check	Rule	Rejection Reason
Duplicate Position	Cannot open a second position in a symbol already held	DUPLICATE_POSITION
Max Positions	Cannot exceed 20 concurrent positions (configurable)	MAX_POSITIONS
Concentration Limit	No single position can exceed 20% of portfolio value	CONCENTRATION_LIMIT
Buying Power	Sufficient cash must be available for the order	INSUFFICIENT_BUYING_POWER
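The four checks can be sketched as a single function that returns the first rejection reason it hits. The function signature and the order of the checks are assumptions; the limits and reason codes come from the table above.

```python
from typing import Optional


def validate_order(symbol: str, order_value: float, held_symbols: set[str],
                   portfolio_value: float, buying_power: float,
                   max_positions: int = 20,
                   max_concentration: float = 0.20) -> Optional[str]:
    """Return a rejection reason, or None when the order passes every check."""
    if symbol in held_symbols:
        return "DUPLICATE_POSITION"
    if len(held_symbols) >= max_positions:
        return "MAX_POSITIONS"
    if order_value > max_concentration * portfolio_value:
        return "CONCENTRATION_LIMIT"
    if order_value > buying_power:
        return "INSUFFICIENT_BUYING_POWER"
    return None
```

Because the function depends only on plain values, it can be unit tested without a database or broker connection.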

Position Sizing

The system uses fixed-dollar position sizing: each trade targets a configurable dollar amount (default $5,000), and the quantity is calculated based on current price. The tiered strategy applies a multiplier: Tier 1 signals get 2.0× (i.e., $10,000 positions), while Tier 2 and Tier 3 get 1.0× ($5,000). Fractional shares are supported for long positions.
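As a worked example of the sizing arithmetic, the sketch below divides the target dollar amount by the current price. The function name and the whole-share rounding rule are assumptions; the $5,000 default and the multipliers come from the text, and the $250 price is made up.

```python
def position_quantity(price: float, multiplier: float = 1.0,
                      base_dollars: float = 5000.0,
                      allow_fractional: bool = True) -> float:
    """Shares to buy so the position is worth about base_dollars * multiplier."""
    if price <= 0:
        raise ValueError("price must be positive")
    quantity = base_dollars * multiplier / price
    # Without fractional shares, round down to whole shares (assumed rule).
    return quantity if allow_fractional else float(int(quantity))


tier1_qty = position_quantity(price=250.0, multiplier=2.0)   # $10,000 target
tier2_qty = position_quantity(price=250.0)                   # $5,000 target
```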

Exit Conditions

Condition	Trigger	Rationale
Stop-Loss	Unrealized loss exceeds 5% (configurable)	Limits downside risk per position
Opposite Signal	New prediction contradicts position direction	Respect the model’s updated view
End of Day	Within 10 minutes of session close	Avoids overnight risk; works across all brokers and extended hours

The exit framework is extensible — planned additions include smart EOD (hold overnight winners), profit-taking, and tier-based exits.
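The three current exit conditions can be sketched as one function that returns the first condition to fire. The names, the LONG/SHORT and BUY/SELL encodings, and the order of the checks are assumptions; the 5% stop-loss and the 10-minute end-of-day window come from the table above.

```python
from datetime import datetime, timedelta
from typing import Optional


def exit_reason(side: str, entry_price: float, current_price: float,
                latest_signal: Optional[str], now: datetime,
                session_close: datetime, stop_loss_pct: float = 0.05,
                eod_window: timedelta = timedelta(minutes=10)) -> Optional[str]:
    """Return the first exit condition that fires for an open position, else None."""
    direction = 1 if side == "LONG" else -1
    pnl_pct = direction * (current_price - entry_price) / entry_price
    if pnl_pct < -stop_loss_pct:
        return "STOP_LOSS"
    opposite = "SELL" if side == "LONG" else "BUY"
    if latest_signal == opposite:
        return "OPPOSITE_SIGNAL"
    if timedelta(0) <= session_close - now <= eod_window:
        return "END_OF_DAY"
    return None
```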

Manual Trading

The system supports manual trades through the same pipeline. A user provides symbol, quantity, and side — the request bypasses signal generation but passes through the identical Decision Engine constraint checks, position sizing validation, and audit logging. Manual trades appear alongside automated trades in all analytics and dashboards.

9. Multi-Broker Architecture

Provider-Agnostic Design

All broker interactions flow through a single abstract interface that provides rate limiting (per-broker configurable), retry logic with exponential backoff, run tracking integration, API call logging, and standardized response format regardless of broker.

Every consumer — the trading pipeline, CLI tools, REST API — works with this abstract interface. Switching brokers requires zero changes to any consumer code.

The interface defines 9 methods that every broker must implement:

Method	Purpose
Place Market Order	Submit a buy or sell order for a given symbol and quantity
Get Positions	Retrieve all open positions with current market value and P&L
Get Account Balance	Return available cash, portfolio value, and buying power
Get Market Status	Determine if the market is open, closed, or in pre/post-market
Get Latest Prices	Fetch current bid/ask/last prices for one or more symbols
Get Orders	List all open or recent orders with status
Get Order Status	Check the fill status of a specific order by ID
Get Account Details	Full account overview including margin, day trade counters, leverage
Get Activities	Execution history — fill records for audit trail and reconciliation
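In Python, an interface like this is commonly expressed as an abstract base class. The sketch below is one possible shape, not the system's actual code: the class name, the snake_case method names, the parameters, and the dictionary return types are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class BrokerClient(ABC):
    """Hypothetical broker interface with the nine operations listed above."""

    @abstractmethod
    def place_market_order(self, symbol: str, quantity: float, side: str) -> dict[str, Any]:
        """Submit a buy or sell order."""

    @abstractmethod
    def get_positions(self) -> list[dict[str, Any]]:
        """Open positions with market value and P&L."""

    @abstractmethod
    def get_account_balance(self) -> dict[str, Any]:
        """Available cash, portfolio value, and buying power."""

    @abstractmethod
    def get_market_status(self) -> str:
        """Open, closed, or pre/post-market."""

    @abstractmethod
    def get_latest_prices(self, symbols: list[str]) -> dict[str, dict[str, float]]:
        """Bid/ask/last prices per symbol."""

    @abstractmethod
    def get_orders(self) -> list[dict[str, Any]]:
        """Open or recent orders with status."""

    @abstractmethod
    def get_order_status(self, order_id: str) -> dict[str, Any]:
        """Fill status of one order."""

    @abstractmethod
    def get_account_details(self) -> dict[str, Any]:
        """Margin, day trade counters, leverage."""

    @abstractmethod
    def get_activities(self) -> list[dict[str, Any]]:
        """Fill records for audit and reconciliation."""


class IncompleteBroker(BrokerClient):
    """Implements only one method, so Python refuses to instantiate it."""

    def get_positions(self) -> list[dict[str, Any]]:
        return []
```

With this approach a partially implemented broker fails at construction time rather than at the first missing call, which suits a broker that is still in progress.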

Three-Broker Implementation

Broker	Market	Status	API Style	Authentication
Alpaca	US equities (NASDAQ, NYSE)	Active — Production	HTTP REST (stateless)	API Key + Secret per request
Interactive Brokers	US equities (global capable)	All 9 methods implemented, pending gateway testing	TCP Socket (event-driven)	IB Gateway pre-authentication
Kotak Securities	India NSE equities + Gold/Silver ETFs	1 of 9 methods implemented, in progress	Python SDK (synchronous)	TOTP + MPIN (automated daily login)

🔄 Trading Pipeline

↓

📋 Broker Interface (9 Standard Methods • Rate Limiting • Retry Logic)

↓

🇺🇸 Alpaca (US Equities • REST)
🌐 Interactive Brokers (Global • TCP Socket)
🇮🇳 Kotak Securities (India NSE • SDK)

Broker Selection

Runtime override — Pass a provider parameter in the API request or CLI flag
Environment variable — Set for automated pipelines
Config default — Read from configuration file
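If the three options are read as a precedence order (runtime override first, config default last), the lookup could be sketched as follows. That precedence is an assumption, as are the function name and the BROKER_PROVIDER variable name.

```python
import os
from typing import Mapping, Optional


def resolve_broker(request_provider: Optional[str] = None,
                   env: Mapping[str, str] = os.environ,
                   config_default: str = "alpaca") -> str:
    """Pick the broker: runtime override, then environment variable, then config default."""
    if request_provider:
        return request_provider.lower()
    env_value = env.get("BROKER_PROVIDER")   # hypothetical variable name
    if env_value:
        return env_value.lower()
    return config_default
```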

Alpaca (Active)

The production broker for US equities, using Alpaca’s Paper Trading API. Stateless HTTP REST with separate URLs for trading and market data. Supports market orders, fractional shares (long positions), real-time position tracking, and P&L calculation. 200 requests/minute rate limit.

Interactive Brokers (Code Complete — Pending Testing)

All 9 interface methods implemented. Uses an event-driven TCP socket protocol where requests trigger asynchronous callbacks. A synchronous wrapper layer makes the async API compatible with the rest of the system. Pending end-to-end testing with IB Gateway.

Kotak Securities (In Progress)

Targets India’s National Stock Exchange (NSE) for equities and Gold/Silver ETFs. Uses a synchronous Python SDK with automated TOTP-based daily login. Supports delivery-based (CNC) and intraday (MIS) product types. Single config switch between sandbox and production.

Multi-Market Database Support

All trade records include currency, country, and exchange information
The symbol registry supports symbols across multiple exchanges and regions (US, India, Japan, UK, Germany, Canada, Australia)
Exchange codes are standardized across providers
Trading hours, settlement rules, and timezone handling are metadata-driven, not hardcoded

When adding a new market, only the broker client needs implementation. All business logic works unchanged because metadata drives market-specific behavior.

10. End-to-End Performance

Azure Production Performance

The system runs every 10 minutes during market hours as Docker containers on Azure Web App, with all pipelines executing sequentially in a single cycle.

Intelligent mode selection per symbol — incremental, hybrid, skip, or full — keeps cycle times low:

Pipeline	Typical Duration	Per Symbol	Notes
Ticks	~74s	~5s	API-bound (external provider response time)
Bars	13–16s	~1s	Incremental feature computation
Labels	20–21s	~1.4s	Scoped to new/revised bars only
Total Cycle	~1.8 min	—	15 symbols per cycle

Azure Web App (B1) → Azure PostgreSQL, February 2026.

Off-market hours: When no new data arrives, the entire pipeline completes in under 2 seconds — every stage detects “no changes” and exits immediately.

Training Performance

Training runs 2–3× daily using a configurable lookback window. The current algorithm is scikit-learn RandomForest (CPU-only). The database holds 3 years of backfilled data available for training and backtesting.

Horizon	Samples (90-day window)	Training Time
H1 (5 min)	~340K samples	~100s
H5 (25 min)	~137K samples	~37s
H15 (75 min)	~46K samples	~12s
H60 (5 hr)	~10K samples	~2s
Total	~535K samples	~2.5 min

15 symbols, RandomForest (100 estimators), CPU-only. February 2026.

All 4 models upload to Azure Blob Storage after training and are hot-reloaded by the inference API without restart.

Row Accounting

Every pipeline run tracks inserts, updates, and skips separately. Zero variance between expected and actual counts has been maintained across all production runs — if the pipeline says it processed 540 bars, exactly 540 rows were modified in the database.

11. Observability & Data Quality

Three-Tier Run Tracking

Tier	What It Captures	Use Case
Pipeline	Overall status, total timing, aggregate row counts, error messages	“Did the 10:30 AM run succeed? How long did it take?”
Summary	Per-symbol breakdown — rows per symbol, timing, mode used	“Why did AAPL take longer? Was it hybrid mode?”
Event	Detailed operation logs with categories (200,000+ entries, BI-ready)	“Show me all operations for TSLA in the last hour”

Validation Tools

Tool	What It Does	Key Capabilities
Data Diagnostics	20 automated checks against the production database	Freshness validation, completeness scoring, schema verification, gap analysis
Run Analyzer	Pipeline execution analysis with drill-down	Console and UI modes, cycle detection, performance trending
Stock Analyzer	Per-symbol data quality assessment	Completeness scoring, gap detection, visual charts
Compliance Checker	15 automated code quality checks	Layer separation, config compliance, SQL compliance, Pylance integration, duplicate detection

12. Reporting API & Appsmith Dashboard

The Fourth Container

Alongside the three core containers (Pipeline, Inference, Trading), the system includes a dedicated Reporting API on port 8003 — a read-only sidecar that serves dashboard data, portfolio analytics, and broker account information.

Separating read-heavy dashboard queries from write-critical trading actions ensures a slow analytics query never blocks trade execution.

🔄 Pipeline API :8001 • Write
🧠 Inference API :8000 • Read
📈 Trading API :8002 • Write
📊 Reporting API :8003 • Read-only

Single Generic Endpoint

The API follows a registry pattern — one universal endpoint handles all report types: GET /report/{report_type}. Adding a new report requires two steps: add a generator method and register it. No route changes needed.
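The registry idea can be sketched without any web framework: a dictionary maps report types to generator functions, and a single dispatcher looks them up. All names here (REPORTS, register_report, get_report) are hypothetical, and the returned payload is a placeholder rather than real broker data.

```python
from typing import Any, Callable

REPORTS: dict[str, Callable[..., Any]] = {}


def register_report(report_type: str):
    """Decorator that adds a generator function to the registry."""
    def decorator(func):
        REPORTS[report_type] = func
        return func
    return decorator


@register_report("market_status")
def market_status(**params: Any) -> dict[str, Any]:
    return {"is_open": False}   # placeholder payload, not real broker data


def get_report(report_type: str, **params: Any) -> Any:
    """What a single GET /report/{report_type} handler could delegate to."""
    try:
        generator = REPORTS[report_type]
    except KeyError:
        raise LookupError(f"unknown report type: {report_type}") from None
    return generator(**params)
```

Adding a report then means writing one decorated function; the dispatcher and the route stay untouched.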

Available Reports

Report Type	What It Returns	Response Time	Use Case
portfolio_summary	Account balance + open positions combined view	~0.6s	Dashboard overview widget
positions	All open positions with current market value and P&L	~0.3s	Position monitoring table
orders	Order history with 40+ fields per order, filterable	~0.7s	Trade history and audit
account	Equity, buying power, margin, day trade counters	~0.6s	Account health monitoring
market_status	Current market state with timestamps	~0.3s	Market status indicator
activities	Execution history — fill records for reconciliation	~0.5s	Trade reconciliation

All endpoints support optional query parameters: days (lookback), status (order filter), symbol (single-symbol filter), activity_type (FILL, DIV, etc.), and limit (max records).

The Dashboard: Appsmith on Azure

The Appsmith dashboard runs as an Azure Web App at denduluri.net, with the four API sidecars (ports 8000–8003) deployed as Azure sidecar containers alongside it. The dashboard reads pre-computed data directly from PostgreSQL via SQL views and calls sidecars over localhost for live broker data — zero middleware overhead.

Page	What It Does
Portfolio	Real-time equity, positions, P&L, activity feed — auto-refreshes every 10 seconds during market hours
Trade Entry	Manual order form with symbol validation, broker dropdown (Alpaca / IBKR / Kotak), confirmation flow
Bars Explorer	OHLCV data table + TradingView candlestick chart with 160+ professional technical indicators
Pipeline Monitor	Run health, data coverage gaps, daily bar summaries — driven by SQL views
Model Validation	Forecast accuracy by horizon and confidence tier, prediction-vs-actual attribution

Additional capabilities: dark trading theme, role-based access (Admin / Developer / Viewer), TradingView embedded charts, mobile-responsive layout, and CSV export on all data tables.

Hybrid Data Strategy: The dashboard uses the Reporting API for live broker data (positions, balance) and SQL views for pre-computed analytics (pipeline health, prediction accuracy). This avoids hammering the broker API for static data while ensuring live data is always fresh.

13. Experiment & Backtesting Framework

The Problem

The production system trains models and executes trades — but how do we know if a different algorithm, different features, or different strategy would perform better? Without systematic experimentation, we’re flying blind.

Gap	Impact
No train/test split in production training	In-sample accuracy only — no idea if model generalizes
Hardcoded hyperparameters	No way to optimize model performance
Only RandomForest active	5 other algorithms implemented but unused
Fixed indicator parameters	Can’t test if different params improve predictions
No P&L simulation	Can’t estimate profitability without live paper trading
No experiment tracking	Can’t compare configurations systematically

The Solution: YAML-Driven Experiment Framework

A dedicated backtesting framework that lets a researcher define an experiment in a YAML config, run it from a single CLI command, and get comprehensive classification + trading + stability metrics — all tracked for comparison.

📄 Experiment (YAML Config) → 📥 Load Data (PostgreSQL) → 📊 Features (Engineering) → ✂️ Split (Train/Test) → 🧠 Train (Model) → 📈 Backtest (Simulator) → 📋 Metrics (& Tracking)

What’s Configurable

Dimension	What You Can Change	Example
Data	Symbols, date range, lookback period	Test on AAPL+NVDA+TSLA for 180 days
Features	Which indicators, indicator parameters	RSI-21 instead of RSI-14, add Bollinger Bandwidth
Labels	Classification threshold, which horizons	0.3% threshold instead of 0.5%, focus on H5+H15
Model	Algorithm, hyperparameters	XGBoost with 500 trees and max_depth=8
Training	Split method: holdout, k-fold, walk-forward	Walk-forward with 60-day train / 5-day test
Strategy	Conservative vs Tiered, confidence thresholds	Tier 1 at 65% confidence instead of 60%
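To make the six dimensions concrete, here is a minimal sketch of what a parsed experiment config might look like in memory. The class, field names, and validation rules are hypothetical and are not taken from the framework's actual YAML schema; the values mirror the examples in the table above.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentConfig:
    """Hypothetical in-memory shape of one experiment; field names are illustrative."""
    name: str
    symbols: list
    lookback_days: int
    indicators: dict            # indicator name -> parameter overrides
    label_threshold_pct: float
    horizons: list
    algorithm: str
    hyperparameters: dict
    split_method: str
    strategy: str
    confidence_thresholds: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        if not self.symbols:
            problems.append("at least one symbol is required")
        if self.lookback_days <= 0:
            problems.append("lookback_days must be positive")
        if self.label_threshold_pct <= 0:
            problems.append("label_threshold_pct must be positive")
        if self.split_method not in {"holdout", "k-fold", "walk-forward"}:
            problems.append(f"unknown split_method: {self.split_method}")
        return problems

config = ExperimentConfig(
    name="xgb_rsi21_walkforward",
    symbols=["AAPL", "NVDA", "TSLA"],
    lookback_days=180,
    indicators={"rsi": {"period": 21}, "bollinger_bandwidth": {}},
    label_threshold_pct=0.3,
    horizons=["H5", "H15"],
    algorithm="xgboost",
    hyperparameters={"n_estimators": 500, "max_depth": 8},
    split_method="walk-forward",
    strategy="tiered",
    confidence_thresholds={"tier_1": 0.65},
)
print(config.validate())
```

Validating the config before any data is loaded means a typo in a split method or threshold fails fast instead of surfacing midway through a long run.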

Walk-Forward Validation

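Walk-forward validation is the most rigorous of the three split methods. Instead of a single train/test split, the framework slides a fixed-length training window across historical data and tests on the days immediately after it, so the model is always evaluated on data later than anything it was trained on. The result is not just accuracy numbers but stability metrics: does the model work consistently across different time periods, or was it just lucky in one window?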

Window 1: 📚 Train: Days 1–60 → 📊 Test: Days 61–65
Window 2: 📚 Train: Days 6–65 → 📊 Test: Days 66–70
Window 3: 📚 Train: Days 11–70 → 📊 Test: Days 71–75

↓

📋 Aggregate Metrics Across All Windows
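The sliding windows shown above can be generated with a few lines of Python. This is a minimal sketch, assuming 1-based inclusive day indices and a step equal to the test length; the function name and its parameters are illustrative, not the framework's API.

```python
def walk_forward_windows(total_days, train_days=60, test_days=5, step_days=5):
    """Yield (train_start, train_end, test_start, test_end) as 1-based inclusive days."""
    start = 1
    # Stop once the test block would run past the available history.
    while start + train_days + test_days - 1 <= total_days:
        train_end = start + train_days - 1
        test_start = train_end + 1
        test_end = test_start + test_days - 1
        yield (start, train_end, test_start, test_end)
        start += step_days

windows = list(walk_forward_windows(total_days=75))
for train_start, train_end, test_start, test_end in windows:
    print(f"train {train_start}-{train_end}, test {test_start}-{test_end}")
```

With 75 days of history this yields exactly the three windows in the diagram. Each test block starts the day after its training block ends, so no test data leaks into training.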

Comprehensive Metrics

Classification Metrics — How accurate are the predictions?

Metric	What It Measures
Accuracy	Overall prediction correctness
Precision / Recall / F1	Per-class (UP/DOWN/FLAT) performance
Brier Score	Probability calibration — are 70% predictions correct 70% of the time?
High-Confidence Accuracy	Accuracy only for predictions with confidence ≥ 80%
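The two less familiar metrics in this table can be sketched as follows. This assumes the standard multi-class Brier score (mean summed squared error against the one-hot outcome) and takes "confidence" to mean the highest class probability; the function names and sample data are illustrative.

```python
CLASSES = ["UP", "DOWN", "FLAT"]

def brier_score(probabilities, actuals):
    """Multi-class Brier score: mean over samples of the summed squared error
    between predicted probabilities and the one-hot outcome. Lower is better."""
    total = 0.0
    for probs, actual in zip(probabilities, actuals):
        total += sum((probs[c] - (1.0 if c == actual else 0.0)) ** 2 for c in CLASSES)
    return total / len(actuals)

def high_confidence_accuracy(probabilities, actuals, min_confidence=0.80):
    """Accuracy restricted to predictions whose top probability meets the cutoff.
    Returns None when no prediction qualifies."""
    hits = 0
    count = 0
    for probs, actual in zip(probabilities, actuals):
        predicted = max(CLASSES, key=lambda c: probs[c])
        if probs[predicted] >= min_confidence:
            count += 1
            hits += int(predicted == actual)
    return hits / count if count else None

# Made-up predictions for illustration only.
probs = [
    {"UP": 0.85, "DOWN": 0.05, "FLAT": 0.10},
    {"UP": 0.20, "DOWN": 0.70, "FLAT": 0.10},
    {"UP": 0.10, "DOWN": 0.10, "FLAT": 0.80},
    {"UP": 0.90, "DOWN": 0.05, "FLAT": 0.05},
]
actuals = ["UP", "DOWN", "FLAT", "DOWN"]

brier = brier_score(probs, actuals)
hc_accuracy = high_confidence_accuracy(probs, actuals)
print(brier, hc_accuracy)
```

The last sample is a confident miss (90% UP, actual DOWN). It dominates the Brier score, which is the point of the metric: confident wrong predictions are penalised far more than hesitant ones.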

Trading Metrics — Would this make money?

Metric	What It Measures
Total Return	Overall profitability including commissions and slippage
Sharpe Ratio	Mean return per unit of return volatility (higher = better risk-adjusted performance)
Max Drawdown	Worst peak-to-trough decline (risk measure)
Win Rate	Percentage of profitable trades
Profit Factor	Gross wins / gross losses (>1 = profitable)
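A minimal sketch of how these five metrics could be computed from a list of per-trade returns. It assumes the returns are fractional and already net of commissions and slippage, a zero risk-free rate, and no annualisation of the Sharpe ratio; the simulator's actual conventions may differ.

```python
import math

def trading_metrics(trade_returns):
    """Summarise a list of per-trade fractional returns, assumed net of costs."""
    n = len(trade_returns)
    gross_wins = sum(r for r in trade_returns if r > 0)
    gross_losses = -sum(r for r in trade_returns if r < 0)

    # Compound the trades into an equity curve and track the worst fall from a peak.
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, (peak - equity) / peak)

    mean = sum(trade_returns) / n
    variance = sum((r - mean) ** 2 for r in trade_returns) / (n - 1) if n > 1 else 0.0
    std = math.sqrt(variance)

    return {
        "total_return": equity - 1.0,
        "sharpe_per_trade": mean / std if std > 0 else float("nan"),
        "max_drawdown": max_drawdown,
        "win_rate": sum(1 for r in trade_returns if r > 0) / n,
        "profit_factor": gross_wins / gross_losses if gross_losses > 0 else float("inf"),
    }

# Made-up trade returns for illustration only.
metrics = trading_metrics([0.02, -0.01, 0.03, -0.02, 0.01])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Win rate and profit factor answer different questions: a strategy can win most of its trades and still lose money if the losses are larger than the wins, which is why both are reported.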

Stability Metrics (walk-forward only) — Is performance consistent?

Metric	What It Measures
Window Variance	Variance of the Sharpe ratio across time windows (lower = more consistent)
Positive Window %	Share of windows in which the strategy is profitable
Best/Worst Ratio	Asymmetry between best and worst window outcomes
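The exact formulas behind these stability metrics are not spelled out here, so the sketch below is one plausible reading rather than the framework's definition. It assumes sample variance for the Sharpe ratios, "profitable" meaning a window return above zero, and best/worst ratio meaning the best window return divided by the magnitude of the worst.

```python
from statistics import variance

def stability_metrics(window_sharpes, window_returns):
    """Aggregate per-window results into three consistency measures."""
    best = max(window_returns)
    worst = min(window_returns)
    positive = sum(1 for r in window_returns if r > 0)
    return {
        "window_variance": variance(window_sharpes),
        "positive_window_pct": 100.0 * positive / len(window_returns),
        "best_worst_ratio": best / abs(worst) if worst != 0 else float("inf"),
    }

# Made-up per-window results for illustration only.
stability = stability_metrics(
    window_sharpes=[1.2, -0.5, 0.3, 0.9],
    window_returns=[0.04, -0.02, 0.01, 0.03],
)
print(stability)
```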

Experiment Tracking

System	Best For	What It Provides
MLflow	ML research and experiment comparison	Rich UI for parameter search, metric comparison, artifact storage
PostgreSQL	Dashboard integration and cross-referencing	Appsmith widgets, JOIN with production tables, historical queries

The backtesting schema uses a bt_ prefix (4 tables, 7 views, 28 indexes) and is completely isolated from production data, so it can be dropped and recreated without any impact on live trading.

Parameter Sweep

Beyond single experiments, the framework supports grid search. For example, 4 algorithms × 3 tree counts × 3 thresholds = 36 experiments, each tracked and comparable. The sweep identifies the best configuration within the grid rather than relying on intuition.
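A grid like this is a Cartesian product of the parameter lists. The sketch below expands one with itertools.product; the specific algorithm names, tree counts, and thresholds are illustrative values, not the framework's configured grid.

```python
from itertools import product

# Illustrative values only; the real grid is defined per experiment.
algorithms = ["random_forest", "xgboost", "lightgbm", "extra_trees"]
tree_counts = [100, 300, 500]
label_thresholds_pct = [0.3, 0.5, 0.7]

# One dict per combination, ready to be merged into a base experiment config.
experiments = [
    {"algorithm": algo, "n_estimators": trees, "label_threshold_pct": threshold}
    for algo, trees, threshold in product(algorithms, tree_counts, label_thresholds_pct)
]

print(len(experiments))
print(experiments[0])
```

Grid size grows multiplicatively, so adding a fourth dimension with three values would triple the run count to 108.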

Why This Matters: The backtesting framework is the bridge between paper trading and real money. It quantifies confidence, detects overfitting through walk-forward validation, estimates profitability with realistic costs, and makes live deployment a data-driven decision — not a gut feeling.

14. Developer Tooling

CLI commands for daily operations and automated tools for code quality and data maintenance — all following the same Clean Architecture patterns as the core system.

CLI Commands

🖥️ CLI Commands → ⚙️ Factories & Services → 🏦 Broker API (Trading & Sync) + 🗄️ PostgreSQL (Read & Write)

Command	Purpose
execute_trades	Submit orders — automated (strategy signals) or manual override (symbol / qty / side). Same code path as Trading API.
sync_orders	Reconcile broker orders with local database — provider-agnostic via `BaseTradingClient`
manage_forecasts	Verify predictions against actuals, reconcile forecast↔trade linkage, print accuracy reports
calculate_pnl	Match EXIT orders to entries, compute realized P&L per trade
run_pipeline	Execute any pipeline stage: bars, ticks, labels, train, trading

All CLI commands share the same factories and services as the API containers — execute_trades uses the identical TradingPipelineFactory that powers the Trading API on port 8002.
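To illustrate the kind of matching calculate_pnl performs, here is a minimal sketch that pairs EXIT fills with earlier entries. The matching rule is not stated in this document: first-in-first-out, long-only positions, and the order field names are all assumptions made for the example.

```python
from collections import deque

def realized_pnl_fifo(orders):
    """Match each EXIT against the oldest open ENTRY lots (assumed FIFO, long-only)."""
    open_lots = {}  # symbol -> deque of [remaining_qty, entry_price]
    results = []
    for order in orders:
        lots = open_lots.setdefault(order["symbol"], deque())
        if order["kind"] == "ENTRY":
            lots.append([order["qty"], order["price"]])
            continue
        remaining = order["qty"]
        pnl = 0.0
        # Consume the oldest lots first until the exit quantity is covered.
        while remaining > 0 and lots:
            lot = lots[0]
            matched = min(remaining, lot[0])
            pnl += matched * (order["price"] - lot[1])
            lot[0] -= matched
            remaining -= matched
            if lot[0] == 0:
                lots.popleft()
        results.append(
            {"symbol": order["symbol"], "qty": order["qty"] - remaining, "pnl": pnl}
        )
    return results

# Hypothetical fills, in chronological order.
orders = [
    {"symbol": "AAPL", "kind": "ENTRY", "qty": 10, "price": 100.0},
    {"symbol": "AAPL", "kind": "ENTRY", "qty": 5, "price": 102.0},
    {"symbol": "AAPL", "kind": "EXIT", "qty": 12, "price": 105.0},
]
trades = realized_pnl_fifo(orders)
print(trades)
```

Here the exit of 12 shares closes the full 10-share lot and 2 shares of the second lot, leaving 3 shares open for a later exit.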

Operational Tools

Tool	What It Does
Data Exporter	Export any PostgreSQL table or broker data to CSV / JSON / Parquet — filterable by pipeline, run ID, symbol, date range
Portfolio Viewer	Local Streamlit dashboard for real-time portfolio monitoring across brokers (calls same Trading API)
Broker Diagnostics	Provider-agnostic account inspector — tests all 8 trading client methods across any configured broker
Ticks Backfill	Bulk historical loader — 94 symbols × 3+ years of 5-min OHLCV from Alpaca SIP feed

Code Quality & Maintenance

Category	What It Does
Compliance (15 checkers)	Automated Clean Architecture validation — layer separation, SQL compliance, config compliance, Pylance integration, duplicate detection. 100% production code compliant (0 violations).
Validation	Per-run quality gates — row reconciliation, timing consistency, logging standards
Diagnostics	4-way API data comparison, phantom update detection for root cause investigation
Maintenance (6 scripts)	Database cleanup, deduplication, staging cleanup, session management, metadata backfill
Metadata	Multi-region symbol management — 37 symbols across 10 exchanges, 7 regions. API-first from YFinance + Alpaca + Polygon.
Schema Sync	Version-controlled SQL execution and database schema synchronization

15. What’s Next

Immediate Roadmap

Priority	Work Item	Purpose
1	Complete Kotak Securities integration	Enable India NSE paper trading alongside US markets
2	Backtesting framework implementation	Walk-forward validation, algorithm comparison, P&L simulation
3	Paper trading analytics	Win rate, Sharpe ratio, max drawdown, strategy comparison

Future Phases

Phase	Focus	What It Adds
Sentiment Pipeline	Financial news integration	Sentiment scores per symbol — hypothesis: improves predictions during earnings seasons
Fundamentals Pipeline	Earnings and financial metrics	Days-until-earnings, PE ratio vs sector — hypothesis: improves longer-horizon predictions
Multi-Model Framework	Algorithm experimentation	A/B test model versions in parallel paper trading
Advanced Trading	Additional order types and strategies	Limit orders, bracket orders, smart EOD, profit-taking exits
Live Trading	Real capital deployment	Triggered when paper trading demonstrates consistent profitability

Design Philosophy

The project follows a disciplined approach: validate before risking capital. Every component is built to be auditable (complete decision logging), config-driven (no hardcoded values in business logic), and incrementally extensible (new algorithms, brokers, and data sources slot in through configuration rather than code changes).

Real money trading is a confidence threshold, not a deadline.