Paper 1: A Multi-Model Adaptive Q-Learning Framework for Robust Portfolio Management in Stochastic Markets
Abstract: This study presents TAQLA, a novel Tabular Adaptive Q-Learning Agent for portfolio management in stochastic financial markets. TAQLA is built on a multi-model reinforcement learning (RL) architecture that combines parameter-adaptive Q-Learning mechanisms with softmax-based exploration to balance short-term profit maximization against long-term capital preservation. The method is compared against vanilla Q-Learning, SARSA, and a random trading policy on simulated equity market data. Empirical analysis shows that TAQLA outperforms these baselines on profitability, risk-adjusted performance, and drawdown minimization, achieving a final portfolio value of $1687.45 (a 68.74% gain on initial capital), a Sharpe ratio of 1.41, and a maximum drawdown of just 12.8%. Q-Learning and SARSA, by contrast, yield Sharpe ratios below 1.0 and drawdowns exceeding 18%. Parameter sensitivity analysis across β (softmax temperature), α (learning rate), and γ (discount factor) indicates that aggressive exploration (β ≈ 1.0–1.5) and moderate discounting (γ ≈ 0.4–0.6) produce the most competitive and robust outcomes. These results position TAQLA as a robust RL-based method for adaptive portfolio control under uncertainty, with improved capital appreciation and resilience to adverse market conditions.
Keywords: Reinforcement learning; Q-Learning; tabular reinforcement learning; portfolio management; dynamic asset allocation
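For orientation, the two textbook building blocks named in the abstract, softmax (Boltzmann) action selection and the one-step tabular Q-Learning update, can be sketched as follows. This is a minimal illustration and not TAQLA itself: the parameter-adaptive mechanism is not shown. The function names, the 3-state/3-action toy table, and the reward value are hypothetical, and β is assumed to act as a temperature that divides the Q-values, so that larger β means more exploration.

```python
import math
import random


def softmax_policy(q_values, beta):
    """Boltzmann action probabilities.

    Assumption: beta is a temperature dividing the Q-values,
    so a larger beta gives a flatter, more exploratory policy.
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / beta) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_learning_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard one-step tabular Q-Learning update, applied in place."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])


# Hypothetical toy setup: 3 market states x 3 actions (sell, hold, buy).
rng = random.Random(0)
q_table = [[0.0, 0.0, 0.0] for _ in range(3)]

probs = softmax_policy(q_table[0], beta=1.0)
action = rng.choices(range(3), weights=probs)[0]
q_learning_update(q_table, state=0, action=action, reward=1.0,
                  next_state=1, alpha=0.1, gamma=0.5)
```

With an all-zero table the policy is uniform over the three actions, and a single update with reward 1.0 and α = 0.1 moves the chosen entry to 0.1. Values of β near zero make the policy nearly greedy, while β in the 1.0–1.5 range keeps a substantial share of probability on actions that are not currently the best.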