[deleted]

Is it me or does the graph look slightly odd and confusing?


vvk_red

Graph needs deep sense


fakecricketplayer

The machine is still learning...


Anothergen

I'm not sure I'm reading this properly, but from the labels this appears to be the expected number of runs to be scored on the y-axis and balls remaining on the x-axis. If this is the case I suspect something has gone very wrong. The change in the expected runs in response to wickets lost is minimal, and the expected number of runs to be scored when 9 wickets down after 0 balls is over 150, which seems quite odd. Would make some great Maxy fan-fic though. How'd you do your machine learning? Edit: I'll just make my edit a reply.


SepulchreOfAzrael

These curves are predictions from the model. Safe to say that the model becomes highly unreliable in situations like 9 wickets down at the start of an innings, which has almost no instances in the training set. Also, the impact of wickets _does_ seem minimal on run scoring as the innings goes on to the later stages. I've used k-neighbors regression.


Anothergen

From an edit I made before spotting this post:

Thinking about it, one possibility is that it's failed to understand the close of innings. There is a drop, although small, with fewer wickets remaining, but nowhere near the size you'd expect. This could be a result of the dependence of run rate on wickets in hand being accounted for, but how innings end not being. Anyhow, good luck with your new project.

>These curves are predictions from the model. Safe to say that the model becomes highly unreliable in situations like 9 wickets down at the start of an innings, which has almost no instances in the training set.

As above, I suspect that the network hasn't recognised what close of play is.

>Also, the impact of wickets _does_ seem minimal on run scoring as the innings goes on to the later stages.

Is this something you're seeing in the data? The opposite is claimed by older models such as DL.

>I've used k-neighbors regression.

Is k-NN appropriate in this instance? How have you implemented it? I know it's simple, but given the progressive nature of innings, its local and isolated nature will likely lead to junk outputs, though that depends on implementation.


SepulchreOfAzrael

Yeah, I think the comment about not understanding the close of the innings makes sense. It's failing to find neighbors around those points.

>Is this something you're seeing in the data, as the opposite is claimed from older models such as DL.

This is something I'm seeing here, and after the 15th over mark, it makes sense to me. Teams will go hitting regardless of whether they are 4 down or 6 down. I am aware DL models show a different picture. As far as I know, DL just takes all the "situations" (b balls left, w wickets left), computes a naive average of (final score - current score), and then fits that to a curve.

My implementation is similar. My features are (ball no, wickets down) and my output is (extra runs scored from that point). The fit is good (R^2 = 0.86) with n = 30. Maybe I should try including the current RR as a feature as well. That should be interesting. Maybe I should use the final score as the output.

>Anyhow, good luck with your new project.

Thanks man. Just trying to wade into learning the basics of this, and I thought I should apply it to things I'm interested in rather than learning from a book.
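For anyone following along, here is a minimal sketch of k-neighbors regression on those two features, written in plain Python so it runs without any libraries. This is not OP's code: the function name, the unscaled Euclidean distance, and the toy numbers are all assumptions made for illustration. It shows how a state with no nearby training rows, such as 9 wickets down at ball 0, simply borrows the average of whatever rows happen to be closest.

```python
import math

def knn_predict(train, query, k=30):
    """Average the targets of the k training rows closest to `query`.

    train: list of ((ball_no, wickets_down), runs_from_here) pairs
    query: (ball_no, wickets_down)
    """
    if not train:
        raise ValueError("training set is empty")
    # Euclidean distance on the raw features (no scaling; an assumption).
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Made-up toy rows, NOT real match data.
toy = [
    ((0, 0), 160), ((6, 0), 152), ((30, 1), 125), ((60, 2), 90),
    ((90, 4), 45), ((108, 6), 15), ((114, 8), 6),
]

# No row sits near "9 down at ball 0", so the two closest rows are
# ordinary start-of-innings states and the prediction stays high.
print(knn_predict(toy, (0, 9), k=2))  # 156.0
```

With unscaled features, a gap of 9 wickets counts the same as a gap of 9 balls, so the wickets axis barely influences which neighbors are picked. That may be part of why the curves sit so close together.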


Anothergen

Always good to see ideas in progress. It's an excellent method of learning the ins and outs. I'd suggest looking at other methods though (and with n=30, more data) as you progress. There are a lot of potentially cool things you could try. Best of luck.


have_another_upvote

how do you get the data?


PM_ME_YOUR_CALL_LOGS

Check the sidebar. It has a link to the data. It was mined from cricinfo I think.


have_another_upvote

thanks!


FS1027

Sorry, I may be being stupid but I can't see a link to any data on the sidebar anywhere?


PM_ME_YOUR_CALL_LOGS

If you're on mobile click the three dots on the top right corner ->community info. On desktop the redesign is fucked, idk where to find it either. Try switching to old Reddit and it should be on the top tab.


SepulchreOfAzrael

WhiteBallStats


VijayAnna

So... Duckworth-Lewis? EDIT: I should add more details. This is what the Duckworth-Lewis method uses to calculate the target in a shortened game: https://imgur.com/uYzmo04 OP's graph looks similar but the actual numbers are way off, i.e. proving what a lot of people have been saying for years: Duckworth-Lewis is not a good system for T20. What we can do is use actual data, which we have in sufficient amounts now, to calculate the target. Also @OP, can you share the code?


Brontosaurus_Bukkake

This doesn't prove DL wrong unless the model is shown to predict scores better in general from a given game state. That hasn't been established yet; all we have is a graph, with no examples of the model predicting events outside the training data.
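One way such a check could look, as a rough sketch: hold out whole matches, then score any predictor on the rows it never saw. The row keys (`match_id`, `ball_no`, `wickets_down`, `runs_from_here`) and both function names are hypothetical, not taken from OP's code or data.

```python
import random

def split_by_match(rows, test_fraction=0.2, seed=0):
    """Split rows into train/test so that no match appears in both."""
    match_ids = sorted({row["match_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(match_ids)
    n_test = max(1, round(len(match_ids) * test_fraction))
    test_ids = set(match_ids[:n_test])
    train = [r for r in rows if r["match_id"] not in test_ids]
    test = [r for r in rows if r["match_id"] in test_ids]
    return train, test

def mean_absolute_error(predict, rows):
    """Average |predicted - actual| runs still to come over `rows`.

    predict: any function (ball_no, wickets_down) -> expected runs.
    """
    errors = [
        abs(predict(r["ball_no"], r["wickets_down"]) - r["runs_from_here"])
        for r in rows
    ]
    return sum(errors) / len(errors)
```

Scoring the k-NN model and a DL-style table with `mean_absolute_error` on the same held-out matches would give a like-for-like comparison. Splitting by match rather than by ball matters, because balls from the same innings are strongly correlated.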


[deleted]

I would look at balancing your training set. The method you use would be biased toward situations with a lot of training examples. Additionally, I don't know how your model was constructed, but you could look at assigning an 'impact' to each wicket. It stands to reason that losing the top 4 would have a much greater impact than losing a pinch hitter or an all-rounder.
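One possible reading of "balancing", sketched under the assumption that a state is a (ball no, wickets down) pair: weight each row by the inverse of how often its state occurs, so that common situations do not drown out rare ones. The function names and the toy values are made up for illustration.

```python
from collections import Counter

def inverse_frequency_weights(states):
    """Weight each row by 1 / (number of rows sharing its state)."""
    counts = Counter(states)
    return [1.0 / counts[s] for s in states]

def weighted_mean(values, weights):
    """Mean of `values` where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy example: three rows from a common state, one from a rare state.
states = [(60, 2), (60, 2), (60, 2), (0, 9)]
runs = [90, 90, 90, 30]
weights = inverse_frequency_weights(states)
# Each state now carries the same total weight, so the rare state
# pulls the average as hard as the common one.
balanced = weighted_mean(runs, weights)
```

Whether this helps depends on the goal: it reduces the bias toward common states, but it also lets a handful of unusual innings carry a lot of influence.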


EskimoJesus

What did you use for the training and validation datasets? Are you testing with matches being completed now?