Why do machine learning models not perform well when used in live stock market prediction?
“Machine learning” by definition assumes there is a repeatable pattern that can be learned, but we all know the markets have some degree of randomness, so a pattern found in past data may not repeat. On top of that, there are some common mistakes:
Overfitting the model to the historical data, so backtests look good but real performance is poor.
Not running the model through a real-time simulator for some period of time as an “out of sample” test before going live.
Not taking into account the REAL transaction costs of trading, such as ECN, SEC, FINRA, and broker fees, borrow fees (if shorting), platform fees, and market data fees.
Not comparing a backtest to a benchmark ($SPY at the very least).
Using backtest market data that doesn’t account for dividend payouts or stock splits. This is a common mistake.
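
Three of the points above (transaction costs, the benchmark comparison, and stock splits) come down to simple arithmetic. Here is a minimal Python sketch of that arithmetic. The function names, the fee values, and the prices are all made up for illustration and don't come from any real broker or data feed, and dividends are left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trade:
    """One hypothetical long round trip: buy at entry_price, sell at exit_price."""
    shares: int
    entry_price: float
    exit_price: float


def net_pnl(trade: Trade, fee_per_share: float, fixed_fee_per_order: float) -> float:
    """Profit after costs. Both fee parameters are assumed placeholders that
    lump together whatever per-share and per-order charges apply to you."""
    gross = (trade.exit_price - trade.entry_price) * trade.shares
    # A round trip is two orders: one to enter and one to exit.
    costs = 2 * (fee_per_share * trade.shares + fixed_fee_per_order)
    return gross - costs


def split_adjust(prices: Sequence[float], split_index: int, ratio: float) -> List[float]:
    """Divide every price before split_index by the split ratio so the
    series has no artificial drop on the split date."""
    return [p / ratio if i < split_index else p for i, p in enumerate(prices)]


def total_return(prices: Sequence[float]) -> float:
    """Simple return from the first price to the last."""
    return prices[-1] / prices[0] - 1.0


def excess_return(strategy_return: float, benchmark_return: float) -> float:
    """How much the strategy beat (or trailed) the benchmark over the same period."""
    return strategy_return - benchmark_return


if __name__ == "__main__":
    # A 10-cent gain on 100 shares shrinks once fees are subtracted.
    trade = Trade(shares=100, entry_price=50.0, exit_price=50.10)
    print("net P&L:", net_pnl(trade, fee_per_share=0.005, fixed_fee_per_order=1.0))

    # Made-up prices with a 2-for-1 split taking effect at index 2.
    raw = [100.0, 102.0, 51.0, 52.0]
    adjusted = split_adjust(raw, split_index=2, ratio=2.0)
    print("raw return:", total_return(raw))            # looks like a crash
    print("adjusted return:", total_return(adjusted))  # actually a small gain

    # Compare against a made-up benchmark return for the same period.
    print("excess return:", excess_return(total_return(adjusted), 0.10))
```

Even in this toy case, fees eat a real chunk of a small gain, the unadjusted prices make a split look like a big loss, and a positive return can still trail the benchmark.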