This has been great fun. Rather than just predicting the finishing positions, I decided to try to predict the running time of each horse. The data scraping took me a week or so, and then I had another week of errors because I'm still shit at Python and also didn't really understand how regression models and machine learning work.
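For anyone wondering how a predicted time turns into a finishing order, here's a minimal sketch of the idea: predict a time for each runner, then sort fastest first. The runner names, the times and the function name `rank_by_predicted_time` are all made up for illustration; this is not output from the model.

```python
def rank_by_predicted_time(predicted_times):
    """Return the runners ordered from fastest to slowest predicted time."""
    return sorted(predicted_times, key=predicted_times.get)


# Hypothetical predicted running times in seconds (illustrative only).
example_times = {
    "Runner A": 250.4,
    "Runner B": 248.8,
    "Runner C": 252.1,
    "Runner D": 249.9,
}

predicted_order = rank_by_predicted_time(example_times)
print(predicted_order)  # ['Runner B', 'Runner D', 'Runner A', 'Runner C']
```

The first entry is the predicted winner and the next couple are the predicted places.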
Anyway, I finally had a breakthrough on Sunday night and the backtesting was successful, so last night I thought I'd try to predict today's racing. Unfortunately I've only set it up for chase races using 2018-2023 data so far (partly because I kept running into self-induced errors using the full dataset, but also because I don't think my CPU will cope with it all together). There's a whopping two chase races today, both at the clown course Hereford, so we've got just two shitty Class 5s. Not the best start. That might be a blessing in disguise though, because I've not cracked the input of racecards yet, so it took a lot of manual work to enter today's 15 runners into the model.
Today's predictions:
14:30 - #3 wins in 248.8303s (2nd #6, 3rd #2)
15:30 - #2 wins in 333.093s (2nd #4, 3rd #1)
Actual result:
14:30 - #3 wins in 248.9s (2nd #5, 3rd #2; paid 3 places)
15:30 - #5 wins in 336.6s (2nd #4, 3rd #2; paid only 2 places)
14:30 returns a win at 15/2 and a place at 11/4.
15:30 returns a place at 3/1. The predicted time for the actual winner (#5) was 336.2674s against an actual 336.6s, so again, incredibly, pretty bang on. It just turned out to be a slower race than anticipated.
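To put a number on "pretty bang on", here's a quick sketch that compares the predicted and actual times quoted above for the horse that won each race. The helper name `time_error` and the dictionary layout are just for illustration.

```python
# Predicted vs actual times (seconds) for the actual winner of each race,
# taken from the figures quoted in this post.
results = {
    "14:30": {"predicted": 248.8303, "actual": 248.9},
    "15:30": {"predicted": 336.2674, "actual": 336.6},
}


def time_error(predicted, actual):
    """Return (actual minus predicted in seconds, that gap as a % of actual)."""
    diff = actual - predicted
    return diff, 100 * diff / actual


errors = {
    race: time_error(r["predicted"], r["actual"]) for race, r in results.items()
}

for race, (secs, pct) in errors.items():
    print(f"{race}: off by {secs:.2f}s ({pct:.2f}% of the actual time)")
```

Both gaps come out well under half a second, which is roughly a tenth of a percent or less of the race time. Two races is obviously a tiny sample though.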
Obviously I haven't just stumbled across a quick route to millions, but this is so much fun. It seems like it might be a good predictor; I just need to find a way to input racecards that isn't 100% manual for this to be worth bothering with, because it takes way too long as it is. I'm excited to see what it says for the National on Saturday, although I feel this model might work better on the flat (fewer random opportunities for falls, and even more trend indicators like stall draw).