Cricinfo Blogs

December 3, 2007

Posted by Ananth Narayanan at 2:21 PM in Trivia - batting

Tackling not-outs, and answering reader queries

First let me explain the reasons for undertaking this whole exercise of extended batting averages:

  • The purpose was not to replace the conventional batting average. It was a suggestion to complement the batting average.
  • It was not a Tendulkar v Lara article. Their figures were just used for comparison.

    Let me start by replacing the first para of my article with the following, just to put to bed the Tendulkar v Lara arguments. Consider the following two outstanding batsmen, among the best of their generation.

    Richards and Kallis in Tests
    Batsman          Tests  Innings  Not-outs  Runs  Average
    Viv Richards       121      182        12  8540    50.24
    Jacques Kallis     111      189        31  9197    58.21

    Richards’ average is nearly eight behind Kallis', but is he that far behind? One of the main reasons for the difference in average is the wide disparity in not-outs between the two: 12 against 31. It might be partly because of the way Richards played, almost always in an attacking mode. Richards and Kallis have similar Batting Position Index values (the index being the average position at which a batsman has batted) of 4.16 and 3.77 respectively, indicating similar batting positions. This analysis seeks a way to normalise such situations.
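
    The table's figures can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and it takes the conventional definition of the batting average (runs divided by dismissals) as given.

```python
def batting_average(runs, innings, not_outs):
    """Conventional batting average: runs divided by completed innings."""
    dismissals = innings - not_outs
    if dismissals <= 0:
        raise ValueError("the average is undefined without a dismissal")
    return runs / dismissals


def runs_per_innings(runs, innings):
    """Runs divided by all innings, treating not-outs as completed."""
    if innings <= 0:
        raise ValueError("at least one innings is needed")
    return runs / innings


# (runs, innings, not-outs) as listed in the table above
careers = {
    "Viv Richards": (8540, 182, 12),
    "Jacques Kallis": (9197, 189, 31),
}

for name, (runs, innings, not_outs) in careers.items():
    avg = batting_average(runs, innings, not_outs)
    rpi = runs_per_innings(runs, innings)
    print(f"{name}: average {avg:.2f}, runs per innings {rpi:.2f}")
```

    On runs per innings the two are under two runs apart (about 46.9 against 48.7), against roughly eight on the conventional average.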

    Now to respond to some of the comments that came in:

    The 1500 runs cut-off wasn’t meant to exclude Vinod Kambli, as someone suggested (Kambli is incidentally one of my favourite players). It was determined that the overall runs per Test for a top-order batsman was around 75, so 1500 runs meant that one would have played about 20 Tests, which is a fair number of games. It also allowed me to include Hussey, which ensured further discussion on this phenomenal cricketer. The top 25 batsmen were selected so as to include Lara and Pietersen, who were two of the five batsmen whose EBA was greater than their batting average.

    The average of the last ten innings could be construed as an arbitrary decision. Come to think of it, if I had taken five innings, it would have seemed too few, while 20 might have seemed too many. Ten innings represents about seven Tests, which in turn is a minimum of two Test series.

    Chris made a valid point about the order of the first table: it should have been ordered by batting average rather than by EBA. I apologise for overlooking the significance. I had split the EBA-ordered wide table into two smaller ones and should have re-ordered them.

    A number of people have commented that this exercise was not needed since the final EBA table is more or less the same as the batting average table. My argument is that the result does not invalidate the analysis process.

    The question of not-outs

    The extension of not-out innings has attracted the most comments, and rightly so. The approach I have taken can be construed as arbitrary. However, it must be remembered that what has been done is neither a statistical extension nor a simulation-based computation. It is a fourth-dimension prediction and should be taken as it is. I can only repeat that the EBA should be taken to complement the current and far better understood batting average. The EBA can never be a substitute for the batting average, since the common man can neither compute it on his own nor understand it easily.

    When the concept was first created, the batting average was added to the not-out innings. It was only when I reworked the concept for this blog that I changed it slightly to include current form.
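
    For readers who want to experiment, here is one possible reading of the not-out extension as a Python sketch. It is not the exact computation behind the published figures. The assumptions are: each not-out score is extended by the mean of the scores in the (up to) ten innings before it, the extended score is capped at the batsman's highest score, and the total is then divided by all innings. The function name and the innings data are hypothetical.

```python
def extended_batting_average(innings, window=10):
    """Sketch of an extended average under the assumptions stated above.

    innings: chronological list of (runs, was_out) tuples.
    """
    if not innings:
        raise ValueError("at least one innings is needed")
    highest = max(runs for runs, _ in innings)
    total = 0.0
    for i, (runs, was_out) in enumerate(innings):
        if was_out:
            total += runs
            continue
        # Assumption: "current form" is the mean score of the preceding
        # innings, looking back at most `window` innings.
        recent = [r for r, _ in innings[max(0, i - window):i]]
        form = sum(recent) / len(recent) if recent else 0.0
        # Assumption: an extended innings may not exceed the highest score.
        total += min(runs + form, highest)
    # Every innings now counts as completed.
    return total / len(innings)


# Hypothetical batsman: 100, 0, then 20 not out.
sample = [(100, True), (0, True), (20, False)]
print(round(extended_batting_average(sample), 2))
```

    Here the 20 not out is extended by the recent mean of 50 to 70, giving 170 runs over three innings, where the conventional average is 120 runs over two dismissals.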

    Some of the responses to the not-out issue have been interesting. Stuart says:

    A batting average measures the number of runs between dismissals. If you get 20* and 27, that is equivalent to a single innings of 47 for your batting average. It also means you cobbled together 47 runs before you got out, whether it was over two innings or one. As it stands, interpreted correctly, a batting average is a perfect measure and needs no adjustments or fiddling.

    That’s a fine analysis, and we could take this as an additional measure.
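
    Stuart's equivalence is easy to verify. The sketch below uses an illustrative helper (not part of the original analysis) that computes the conventional average from a list of innings.

```python
def average_from_innings(innings):
    """Conventional average from a list of (runs, was_out) tuples."""
    runs = sum(r for r, _ in innings)
    dismissals = sum(1 for _, was_out in innings if was_out)
    if dismissals == 0:
        raise ValueError("the average is undefined without a dismissal")
    return runs / dismissals


two_innings = [(20, False), (27, True)]  # 20* followed by 27
one_innings = [(47, True)]               # a single innings of 47

print(average_from_innings(two_innings), average_from_innings(one_innings))
```

    Both calls return 47.0, since each sequence has 47 runs and one dismissal.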

    One of the best alternatives, and quite simple to implement also, was provided by Arvind Agarwal. It is given below.

    EBA = Batting Average x (1 - (Not Out Inngs / Total Inngs) ^ 2). The computed values are:
    Lara = 52.80 (0.998 x Average)

    Sachin = 53.82 (0.980 x Average)

    Bradman = 97.93 (0.980 x Average)

    Ponting = 58.08 (0.977 x Average)

    M Hussey = 82.04 (0.945 x Average)

    My gut feel is that Arvind's computations match mine almost completely, without getting into any of the not-out extension complications, and are very easy to compute. Again, this has to be taken as an additional measure rather than a replacement of the batting average.
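
    Arvind's formula can be sketched in a couple of lines of Python. The sketch reads it as the average multiplied by one minus the squared share of not-out innings, and takes "Total Inngs" to mean all innings batted; both readings are assumptions, as are the function name and the sample figures.

```python
def arvind_eba(average, not_outs, innings):
    """Average scaled by 1 - (not-outs / innings) squared."""
    if innings <= 0:
        raise ValueError("at least one innings is needed")
    if not 0 <= not_outs <= innings:
        raise ValueError("not-outs must lie between 0 and innings")
    return average * (1 - (not_outs / innings) ** 2)


# Hypothetical batsman: average 50.0 with 20 not-outs in 100 innings.
print(round(arvind_eba(50.0, 20, 100), 2))
```

    For this hypothetical batsman the factor is 1 - 0.2 ^ 2 = 0.96, so the adjusted figure is 48.0. Because the not-out share is squared, the penalty stays small unless a large share of innings are unbeaten.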

    There have been suggestions to take into account the match conditions, bowling attack etc., but it would be too complicated an exercise for this simple task. Similarly, the idea of using weighted averages instead of the average of the last ten innings is a good one, but it makes the process more difficult and the results harder to comprehend for people who are not statistically oriented.

    Glossus has suggested considering only those innings in which the batsman was dismissed, and ignoring the not-out innings. The table below has the results for this exercise.

    Out batting average, and extended batting averages
    Batsman           Tests  Career average  Out batting average  Extended batting average
    Don Bradman          52           99.94                83.83                     97.81
    Michael Hussey       18           86.18                69.05                     81.34
    George Headley       22           60.83                45.61                     61.33
    Herbert Sutcliffe    54           60.73                54.64                     60.54
    Graeme Pollock       23           60.97                54.43                     59.68
    Everton Weekes       48           58.62                54.88                     58.53
    Ricky Ponting       112           59.40                49.46                     58.52
    Wally Hammond        85           58.46                46.19                     58.43
    Garry Sobers         93           57.78                44.06                     58.16
    Ken Barrington       82           58.67                50.37                     58.11
    Eddie Paynter        20           59.23                48.31                     57.71
    Jack Hobbs           61           56.95                53.34                     56.52
    Jacques Kallis      111           58.21                42.42                     56.43
    Len Hutton           79           56.67                47.89                     56.41
    Kumar Sangakkara     68           55.74                46.16                     56.26
    Clyde Walcott        44           56.69                51.03                     56.14
    Rahul Dravid        113           56.26                47.60                     55.54
    Mohammad Yousuf      77           55.72                48.84                     55.28
    Sachin Tendulkar    141           54.94                44.33                     53.90
    Dudley Nourse        34           53.82                47.49                     53.40
    Brian Lara          131           52.89                49.76                     52.97
    Kevin Pietersen      30           52.69                50.44                     52.84
    Greg Chappell        87           53.86                44.57                     52.79
    Matthew Hayden       91           52.57                49.19                     52.50
    Javed Miandad       124           52.57                41.97                     51.62
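
    The "out batting average" column follows Glossus's suggestion of counting only the innings that ended in a dismissal. A minimal Python sketch, with an illustrative function name and hypothetical innings, would be:

```python
def out_batting_average(innings):
    """Mean score over dismissed innings only; not-outs are ignored."""
    dismissed = [runs for runs, was_out in innings if was_out]
    if not dismissed:
        raise ValueError("no dismissed innings to average")
    return sum(dismissed) / len(dismissed)


# Hypothetical batsman: 50, 100 not out, 10.
sample = [(50, True), (100, False), (10, True)]
print(out_batting_average(sample))
```

    The hypothetical batsman has a conventional average of 80 (160 runs, two dismissals) but an out batting average of 30, because the unbeaten 100 is dropped entirely. This is why the out batting averages in the table sit well below the career averages.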

    Charles Davis, in his blog, has commented on this computation. Some of the answers to Charles can be found elsewhere in this article. Our first basis was the career average, which would probably have been more apt. However, I must point out to Charles that the "not exceeding the highest score" idea was only introduced to prevent extremely high scores, especially when batsmen (like Sangakkara/Yousuf/Kallis) are going through an outstanding run of form. That restriction may not be needed if the career average is used. I must also point out that the standard deviation differential between the career average and last 10 innings, according to Charles himself, is less than 10%. Charles, many thanks for your comments.

     

    Comments

    Posted by: Justin Keeling at December 3, 2007 6:29 PM

    Not sure if this is the correct forum for this 'theory', but why do bowlers' runs conceded also have to include fielders' mistakes (such as dropped catches, misfields and overthrows)? Why can't something like they have in baseball be instituted? (Whenever I bowled I used to loathe others causing runs to be added to my tally, and by the reactions of the internationals I think they are of a similar opinion.)

    Posted by: Marcus at December 3, 2007 7:12 PM

    An off-topic question: you mentioned players' Batting Position Index. Are there any weighted versions of this stat? It is clear that a player who has played 100 tests at 1 and 100 tests at 11 (batting position index of 6) will not have a similar result as a player who has played 200 times at 6. Do you know of a version that weights this such that the first player would have a position index much closer to 11?

    Posted by: Aditya Banerjee at December 4, 2007 11:45 AM

    I was just wondering what the adjusted stats for Sangakkara the batsman would look like using this method after going through this week's Ask Steven column. He seems to have scored over 2000 runs at a 90+ average. From my calculations, the average drops to around 71 from 96 (including the latest test vs Eng)

    Posted by: Sriram Subramanian at December 18, 2007 4:58 PM

    In many distributions the average as a measure of central tendency not only provides the 'average value', i.e. the area under the curve divided by the number of instances, but is also a good predictor of the most likely value.

    Batting averages though mean nothing of the sort. Batsmen's scores invariably have an inverted bell (or U-shaped) distribution - the greatest # of outs is single digit, followed by a large number before the batsman crosses twenty. Then you get a few instances around the actual 'batting average' and again the # of instances goes up towards the higher scores. This holds for every batsman, and Lara's and Dravid's distributions are not that dissimilar as you would expect. So the least probability is actually around the batting average. So if Ponting has an 'average' of 59, he is actually least likely to score b/w 50 - 60. So not really sure what the average indicates for a given innings. At best it can be used the way we use it - comparing greats across time...

    Posted by: joel at December 18, 2007 9:52 PM

    The average should suggest to the spectator how many runs a particular player is likely to make before he leaves the field. After all, having made 25 not out twice is less likely to help his team win than a battling fifty, followed by a 0 not out. Both players would have an average of fifty but everyone would know that once player A got into the twenties, he would be on shaky, unfamiliar ground.
    I think therefore that Runs Per Innings is fairer to spectators, rather than the deceitful Average that is currently employed.

    Posted by: Malcolm at December 19, 2007 7:05 AM

    An average is also the sum of observations multiplied by the probability of that observation occurring, i.e. the sum of all (x*P(x)). So while the probability of scoring on or around the average is probably very low, the average does take into consideration that you might, as the fielding team, spend a few days watching a Lara compile a 400 run innings. To determine the plausibility of the statistic, you should ask yourself: if you were fielding captain and Lara walked in, would you be expecting the mode (the value with the highest probability) or the average?

    Posted by: Sriram Subramanian at December 19, 2007 9:19 AM

    Precisely my point, Malcolm. In terms of expectation from an innings when a batsman walks in, you should be expecting a modal value, which in all batsmen's case whether a Lara or a Harmison is less than 20.

    Batting averages are higher or lower depending on 3 factors - a) The shape of the distribution - while the U shape holds in general, for the better batsmen, the % of cases in the 10s, 20s and 30s tends to be higher, b) The really high scores, and I find this is a huge influencer - the really great batsmen tend to run up very high scores which drive their average and c) the proportion of not-out innings.

    Of the three, only a) the shape of the distribution really influences the modal value. My point being that if you compare someone with an average of 40 with someone with a 55, your start expectations may be significantly different; but between 55 and 57, your start expectations should be no different.

    Posted by: Malcolm at December 19, 2007 10:59 AM

    The runs per innings is also deceptive. If you are a number 5 or 6 batsman coming in after four really good batsmen, you could realistically be called on to get small scores or have a large number of your innings cut off by declarations. You would then end up with a low average runs per innings. It would not be a true reflection of your talent, which is what the average is supposed to be. Obviously there are more sophisticated statistical techniques that could be used to analyse the performance of a player, but the average, strike rate and conversion rate that you get are an excellent indication of the quality of the player, remembering, of course, that the accuracy of any statistic increases as the number of observations increases.

    Posted by: Joel at December 19, 2007 8:54 PM

    Hmm, good point Malcolm.
    I wonder if anyone out there would care to perform a statistical analysis of the top 25 players' MODE, to determine their most likely score on walking out to the middle? It may raise a few eyebrows, not to mention the ire of millions!!

    Posted by: Kamlesh at March 10, 2008 5:10 AM

    I don't think EBA can be more than normal batting average, as shown for Sir Sobers and other following batsmen:
    Garry Sobers 93 57.78 44.06 58.16
    George Headley 22 60.83 45.61 61.33
    Brian Lara 131 52.89 49.76 52.97
    Kevin Pietersen 30 52.69 50.44 52.84
    Kumar Sangakkara 68 55.74 46.16 56.26

    Kindly rectify.

    There is nothing in the calculation methodology to conclude that the EBA cannot be greater than the normal Average. In fact it can be seen that I have referred to these 5 batsmen in my article.

    Ananth

  • The Contributors

    Y Anantha Narayanan has over 35 years of IT background. Over the past 15 years, he has been concentrating on Cricket analysis and software development. He has been involved with StumpVision, Wisden, Hallmark Software and his own site www.thirdslip.com during this period.
    David Barry
    David Barry was cricket-starved when teaching English in France, and study of cricket stats was his only way to stay sane. He is now back in Brisbane, Australia, and working towards a PhD in Physics. He once played for the worst team in the G-division of Muscat's cricket league.

    After doing an MBA in marketing and working in an advertising agency, S Rajesh decided that his skills might be put to better use by number-crunching on cricket. He hasn’t regretted that decision in the last six years, and edits the Numbers Game column on cricinfo.com every Friday.

    Andrew Samson had his moments with bat and ball, once scoring 43 and taking 3 for 14 with his legbreaks, but he was much better at arithmetic, which explains why he is where he is today. Andrew has been keeping cricket stats since the days when it used to be done with pen and paper, and has been involved in scoring/stats for Radio and TV since 1987. He has been Cricket South Africa's official statistician since 1994.
    Charles Davis
    A former scientist and occasional TV quiz champion, Charles Davis now works full time at sports statistics in Melbourne. His only real contribution to the Test record books came at age 4, when he formed part of the record 90,800 crowd who saw West Indies at the MCG in 1961. He has two books to his credit, and claims to be the only cricket statistician ever who has been quoted in the New York Times and in Australian Federal Parliament on the same day. Not to be confused with the West Indian batsman Charlie Davis, especially in terms of ability.
    Ric Finlay
    Having just taken early retirement as a Mathematics teacher in Hobart, Ric Finlay now fully devotes his time to recording cricket, both past and present, for the popular CSW cricket database, along with his colleague David Fitzgerald (www.tastats.com.au). His interest in the game is inversely proportional to his ability as a player, but he did once score a century after being dropped at 3 and running out three of his team-mates. His first memory of international cricket is the 1962-63 MCC tour of Australia, described as one of the most boring ever. Totally fascinated, he was instantly hooked, and has never looked back. Author of three books on cricket of a historical nature, he has provided statistics and scored for radio and television cricket coverage since 1983.