First let me explain the reasons for undertaking this whole exercise of extended batting averages:
The purpose was not to replace the conventional batting average. It was a suggestion to complement the batting average.
It was not a Tendulkar v Lara article. Their figures were just used for comparison.
Let me start by replacing the first para of my article with the following, just to put to bed the Tendulkar v Lara arguments. Consider the following two outstanding batsmen, among the best of their generation.
Richards and Kallis in Tests

| Batsman | Tests | Innings | Not-outs | Runs | Average |
|---|---|---|---|---|---|
| Viv Richards | 105 | 182 | 12 | 8540 | 50.24 |
| Jacques Kallis | 111 | 189 | 31 | 9197 | 58.21 |
Richards’ average is nearly eight runs behind Kallis', but is he really that far behind? One of the main reasons for the difference is the wide disparity in not-outs between the two: 12 against 31. This may be partly because of the way Richards played, almost always in attacking mode. The two have similar Batting Position Index values (the average position at which a batsman has batted): 4.16 for Richards and 3.77 for Kallis, indicating nearly the same place in the order. This analysis seeks a way to normalise such situations.
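The arithmetic behind these figures can be checked with a short sketch. It assumes the conventional definition of the average, runs divided by dismissals (innings minus not-outs); the function name `batting_average` is illustrative and does not come from the original article.

```python
def batting_average(runs: int, innings: int, not_outs: int) -> float:
    """Conventional average: runs per dismissal (assumed definition)."""
    dismissals = innings - not_outs
    if dismissals <= 0:
        raise ValueError("average is undefined without at least one dismissal")
    return runs / dismissals


# Career figures as given in the Richards and Kallis table.
richards = batting_average(runs=8540, innings=182, not_outs=12)
kallis = batting_average(runs=9197, innings=189, not_outs=31)

# Share of innings that ended not out, the disparity discussed in the text.
richards_not_out_share = 12 / 182
kallis_not_out_share = 31 / 189

print(f"Richards {richards:.2f}, Kallis {kallis:.2f}, gap {kallis - richards:.2f}")
print(f"Not-out share: Richards {richards_not_out_share:.1%}, "
      f"Kallis {kallis_not_out_share:.1%}")
```

Because not-outs are removed from the divisor, a batsman with more not-outs divides his runs by proportionally fewer innings, which is the effect the analysis tries to normalise.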
Now to respond to some of the comments that came in:
The 1500 runs cut-off wasn’t meant to exclude Vinod Kambli, as someone suggested (Kambli is incidentally one of my favourite players). The overall runs per Test for a top-order batsman was found to be around 75, so 1500 runs means that a batsman would have played about 20 Tests, which is a fair number of games. It also allowed me to include Hussey, which ensured further discussion on this phenomenal cricketer. The top 25 batsmen were selected so that I could include Lara and Pietersen, who were two of the five batsmen whose EBA was greater than their batting average.
The average of last ten innings could be construed as an arbitrary decision. Come to think of it, if I had taken five innings, it would have seemed too few, while 20 might have seemed too many. Ten innings represents about seven tests, which in turn is a minimum of two Test series.
Chris made a valid point about the first table, stating that it should have been ordered by batting average rather than by EBA. I apologise for overlooking the significance of this. I had split the wide EBA-ordered table into two smaller ones and should have re-ordered the first of them.
A number of people have commented that this exercise was not needed since the final EBA table is more or less the same as the batting average table. My argument is that the result does not invalidate the analysis process.
The question of not-outs
The extension of not-out innings has attracted the most comments, and rightly so. The approach I have taken can be construed as arbitrary. However, it must be remembered that what has been done is neither a statistical extension nor a simulation-based computation. It is a fourth-dimension prediction and should be taken as it is. I can only repeat that the EBA should be taken to complement the current and much better understood batting average. The EBA can never be a substitute for the batting average, since the common man can neither compute it on his own nor understand it easily.
When the concept was first created, the batting average was added to the not-out innings. It was only when I reworked the concept for this blog that I changed it slightly to include current form.
Some of the responses to the not-out issue have been interesting. Stuart says:
A batting average measures the number of runs between dismissals. If you get 20* and 27, that is equivalent to a single innings of 47 for your batting average. It also means you cobbled together 47 runs before you got out, whether it was over two innings or one. As it stands, interpreted correctly, a batting average is a perfect measure and needs no adjustments or fiddling.
That’s a fine analysis, and we could take this as an additional measure.
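Stuart's point can be illustrated with a minimal sketch. It assumes each innings is recorded as a pair of runs and a not-out flag; the names `Innings` and `runs_per_dismissal` are hypothetical helpers introduced here for illustration.

```python
from typing import List, Tuple

Innings = Tuple[int, bool]  # (runs scored, True if the batsman was not out)


def runs_per_dismissal(innings: List[Innings]) -> float:
    """Batting average read as total runs divided by number of dismissals."""
    total_runs = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, not_out in innings if not not_out)
    if dismissals == 0:
        raise ValueError("average is undefined without at least one dismissal")
    return total_runs / dismissals


# Stuart's example: 20 not out followed by 27, against a single innings of 47.
two_innings = [(20, True), (27, False)]
one_innings = [(47, False)]

same_average = runs_per_dismissal(two_innings) == runs_per_dismissal(one_innings)
print(same_average)
```

Under this reading the two sequences are indistinguishable, because the average only sees runs accumulated between dismissals.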
One of the best alternatives, and quite simple to implement also, was provided by Arvind Agarwal. It is given below.
EBA = Batting Average x (1 - (Not Out Inngs / Total Inngs) ^ 2). The computed values are:
Lara = 52.80 (0.998 x Average)
Sachin = 53.82 (0.980 x Average)
Bradman = 97.93 (0.980 x Average)
Ponting = 58.08 (0.977 x Average)
M Hussey = 82.04 (0.945 x Average)
My gut feel is that Arvind's computations match mine almost completely, without getting into any of the not-out extension complications, and they are very easy to compute. Again, this has to be taken as an additional measure rather than a replacement for the batting average.
There have been suggestions to take into account the match conditions, bowling attack etc., but that would be too complicated an exercise for this simple task. Similarly, the idea of using weighted averages instead of the average of the last ten innings is a good one, but it makes the process more difficult and the results harder to comprehend for non-statistically oriented readers.
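Arvind's adjustment is simple enough to sketch in a few lines. This assumes the reading in which the squared not-out ratio is subtracted from 1 before multiplying by the average; the function name `arvind_eba` is illustrative, and the inputs are the Richards and Kallis career figures quoted earlier in this article.

```python
def arvind_eba(average: float, not_outs: int, innings: int) -> float:
    """Average scaled by 1 - (not-outs / innings) ** 2 (assumed reading)."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    not_out_ratio = not_outs / innings
    return average * (1 - not_out_ratio ** 2)


# Conventional averages from runs / (innings - not-outs).
richards_avg = 8540 / (182 - 12)
kallis_avg = 9197 / (189 - 31)

richards_eba = arvind_eba(richards_avg, not_outs=12, innings=182)
kallis_eba = arvind_eba(kallis_avg, not_outs=31, innings=189)

print(f"Richards: {richards_avg:.2f} -> {richards_eba:.2f}")
print(f"Kallis:   {kallis_avg:.2f} -> {kallis_eba:.2f}")
```

Because the ratio is squared, a batsman with few not-outs is barely penalised, while the reduction grows quickly as the not-out share rises.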
Glossus has suggested considering only those innings in which the batsman was dismissed, and ignoring the not-out innings. The table below has the results for this exercise.
Out batting average, and extended batting averages

| Batsman | Tests | Career average | Out batting average | Extended batting average |
|---|---|---|---|---|
| Don Bradman | 52 | 99.94 | 83.83 | 97.81 |
| Michael Hussey | 18 | 86.18 | 69.05 | 81.34 |
| George Headley | 22 | 60.83 | 45.61 | 61.33 |
| Herbert Sutcliffe | 54 | 60.73 | 54.64 | 60.54 |
| Graeme Pollock | 23 | 60.97 | 54.43 | 59.68 |
| Everton Weekes | 48 | 58.62 | 54.88 | 58.53 |
| Ricky Ponting | 112 | 59.40 | 49.46 | 58.52 |
| Wally Hammond | 85 | 58.46 | 46.19 | 58.43 |
| Garry Sobers | 93 | 57.78 | 44.06 | 58.16 |
| Ken Barrington | 82 | 58.67 | 50.37 | 58.11 |
| Eddie Paynter | 20 | 59.23 | 48.31 | 57.71 |
| Jack Hobbs | 61 | 56.95 | 53.34 | 56.52 |
| Jacques Kallis | 111 | 58.21 | 42.42 | 56.43 |
| Len Hutton | 79 | 56.67 | 47.89 | 56.41 |
| Kumar Sangakkara | 68 | 55.74 | 46.16 | 56.26 |
| Clyde Walcott | 44 | 56.69 | 51.03 | 56.14 |
| Rahul Dravid | 113 | 56.26 | 47.60 | 55.54 |
| Mohammad Yousuf | 77 | 55.72 | 48.84 | 55.28 |
| Sachin Tendulkar | 141 | 54.94 | 44.33 | 53.90 |
| Dudley Nourse | 34 | 53.82 | 47.49 | 53.40 |
| Brian Lara | 131 | 52.89 | 49.76 | 52.97 |
| Kevin Pietersen | 30 | 52.69 | 50.44 | 52.84 |
| Greg Chappell | 87 | 53.86 | 44.57 | 52.79 |
| Matthew Hayden | 91 | 52.57 | 49.19 | 52.50 |
| Javed Miandad | 124 | 52.57 | 41.97 | 51.62 |
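The "out batting average" suggested by Glossus can be sketched as follows. The innings list is made up purely for illustration and is not any real batsman's record; the function names are hypothetical, and the sketch assumes that not-out innings are dropped entirely, runs included.

```python
from typing import List, Tuple

Innings = Tuple[int, bool]  # (runs scored, True if the batsman was not out)


def conventional_average(innings: List[Innings]) -> float:
    """All runs divided by the number of dismissals."""
    dismissals = sum(1 for _, not_out in innings if not not_out)
    return sum(runs for runs, _ in innings) / dismissals


def out_batting_average(innings: List[Innings]) -> float:
    """Only the runs from dismissed innings, divided by those dismissals."""
    dismissed_scores = [runs for runs, not_out in innings if not not_out]
    return sum(dismissed_scores) / len(dismissed_scores)


# Hypothetical sequence: three dismissals and two not-outs.
sample = [(50, False), (30, True), (10, False), (100, True), (0, False)]

career = conventional_average(sample)
out_only = out_batting_average(sample)
print(f"Career average {career:.2f}, out batting average {out_only:.2f}")
```

Both measures share the same divisor, so discarding the not-out runs can only lower the figure or leave it unchanged. That is consistent with every out batting average in the table sitting below the career average.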
Charles Davis has commented on this computation in his blog. Some of the answers to Charles can be found elsewhere in this article. Our first basis was the career average, which would probably have been more apt. The "not exceeding the highest score" condition was included only to prevent extremely high scores, especially when batsmen (like Sangakkara, Yousuf and Kallis) are going through an outstanding run of form. That restriction may not be needed if the career average is used. I should also point out that the standard deviation differential between the career average and the last 10 innings, according to Charles himself, is less than 10%. Charles, many thanks for your comments.
Posted by: Justin Keeling at December 3, 2007 6:29 PM
Not sure if this is the correct forum for this 'theory', but why do bowlers' runs conceded also have to include fielders' mistakes (such as dropped catches, misfields and overthrows)? Why can't something like they have in baseball be instituted? (Whenever I bowled I used to loathe others causing runs to be added to my tally, and judging by the reactions of the internationals I think they are of a similar opinion.)
Posted by: Marcus at December 3, 2007 7:12 PM
An off-topic question: you mentioned players' Batting Position Index. Are there any weighted versions of this stat? It is clear that a player who has played 100 tests at 1 and 100 tests at 11 (batting position index of 6) will not have a similar result as a player who has played 200 times at 6. Do you know of a version that weights this such that the first player would have a position index much closer to 11?
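Marcus's example can be reproduced with a small sketch. It takes the Batting Position Index to be the plain mean of batting positions, as defined in the article, and adds the standard deviation purely as an illustrative companion figure; it is not an established weighted version of the index, and the variable names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical careers from the comment: one entry per appearance.
split_career = [1] * 100 + [11] * 100   # half at No. 1, half at No. 11
steady_career = [6] * 200               # always at No. 6

split_bpi = mean(split_career)
steady_bpi = mean(steady_career)

# Spread of positions, shown only to expose what the mean hides.
split_spread = pstdev(split_career)
steady_spread = pstdev(steady_career)

print(f"Split career:  index {split_bpi}, spread {split_spread}")
print(f"Steady career: index {steady_bpi}, spread {steady_spread}")
```

Both careers produce the same index, so the mean alone cannot tell them apart; some measure of spread would have to be reported alongside it.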
Posted by: Aditya Banerjee at December 4, 2007 11:45 AM
I was just wondering what the adjusted stats for Sangakkara the batsman would look like using this method after going through this week's Ask Steven column. He seems to have scored over 2000 runs at a 90+ average. From my calculations, the average drops to around 71 from 96 (including the latest test vs Eng)
Posted by: Sriram Subramanian at December 18, 2007 4:58 PM
In many distributions the average, as a measure of central tendency, not only provides the 'average value', i.e. the area under the curve divided by the number of instances, but is also a good predictor of the most likely value.
Batting averages, though, mean nothing of the sort. Batsmen's scores invariably have an inverted bell (or U-shaped) distribution: the greatest number of dismissals is in single digits, followed by a large number before the batsman crosses twenty. Then you get a few instances around the actual 'batting average', and again the number of instances goes up towards the higher scores. This holds for every batsman, and Lara's and Dravid's distributions are not as dissimilar as you would expect. So the least probability is actually around the batting average. If Ponting has an 'average' of 59, he is actually least likely to score between 50 and 60. So I am not really sure what the average indicates for a given innings. At best it can be used the way we use it: comparing greats across time...
Posted by: joel at December 18, 2007 9:52 PM
The average should suggest to the spectator how many runs a particular player is likely to make before he leaves the field. After all, having made 25 not out twice is less likely to help his team win than a battling fifty followed by a 0 not out. Both players would have an average of fifty, but everyone would know that once player A got into the twenties, he would be on shaky, unfamiliar ground.
I think therefore that Runs Per Innings is fairer to spectators, rather than the deceitful Average that is currently employed.
Posted by: Malcolm at December 19, 2007 7:05 AM
An average is also the sum of observations multiplied by the probability of that observation occurring, i.e. the sum of all (x*P(x)). So while the probability of scoring on or around the average is probably very low, the average does take into consideration that you might, as the fielding team, spend a few days watching a Lara compile a 400-run innings. To determine the plausibility of the statistic, you should ask yourself: if you were the fielding captain and Lara walked in, would you be expecting the mode (the value with the highest probability) or the average?
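The identity Malcolm describes, and its contrast with the mode, can be shown with a minimal sketch. The scores below are invented for illustration and are not any real batsman's record; not-outs are ignored, so the figure here is strictly runs per innings rather than the batting average.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical innings scores, skewed towards low values with one big score.
scores = [0, 0, 4, 4, 4, 12, 25, 60, 150]

# Empirical probability of each distinct score.
counts = Counter(scores)
probabilities = {score: n / len(scores) for score, n in counts.items()}

# Expected value: sum of x * P(x) over the distinct scores.
expected_value = sum(score * p for score, p in probabilities.items())

arithmetic_mean = mean(scores)
most_likely_score = mode(scores)

print(f"sum of x*P(x): {expected_value:.2f}")
print(f"mean:          {arithmetic_mean:.2f}")
print(f"mode:          {most_likely_score}")
```

The expected value and the mean coincide by construction, while the mode sits far lower because a single large score pulls the mean up without changing which score is most frequent.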
Posted by: Sriram Subramanian at December 19, 2007 9:19 AM
Precisely my point, Malcolm. In terms of expectation from an innings when a batsman walks in, you should be expecting a modal value, which in all batsmen's case whether a Lara or a Harmison is less than 20.
Batting averages are higher or lower depending on three factors: a) the shape of the distribution (while the U shape holds in general, for the better batsmen the percentage of cases in the 10s, 20s and 30s tends to be higher); b) the really high scores, which I find are a huge influence, since the really great batsmen tend to run up very high scores that drive their average; and c) the proportion of not-out innings.
Of the three, only a) the shape of the distribution really influences the modal value. My point being that if you compare someone with an average of 40 with someone with a 55, your start expectations may be significantly different; but between 55 and 57, your start expectations should be no different.
Posted by: Malcolm at December 19, 2007 10:59 AM
The runs per innings is also deceptive. If you are a number 5 or 6 batsman coming in after four really good batsmen, you could realistically be called on to get small scores or have a large number of your innings cut off by declarations. You would then end up with a low runs per innings figure. It would not be a true reflection of your talent, which is what the average is supposed to be. Obviously there are more sophisticated statistical techniques that could be used to analyse the performance of a player, but the average, strike rate and conversion rate are an excellent indication of the quality of the player, remembering, of course, that the accuracy of any statistic increases as the number of observations increases.
Posted by: Joel at December 19, 2007 8:54 PM
Hmm, good point Malcolm.
I wonder if anyone out there would care to perform a statistical analysis of the top 25 players' MODE, to determine their most likely score on walking out to the middle? It may raise a few eyebrows, not to mention the ire of millions!!
Posted by: Kamlesh at March 10, 2008 5:10 AM
I don't think EBA can be more than the normal batting average, as shown for Sir Garry Sobers and the other following batsmen:
Garry Sobers 93 57.78 44.06 58.16
George Headley 22 60.83 45.61 61.33
Brian Lara 131 52.89 49.76 52.97
Kevin Pietersen 30 52.69 50.44 52.84
Kumar Sangakkara 68 55.74 46.16 56.26
Kindly rectify.
There is nothing in the calculation methodology to conclude that the EBA cannot be greater than the normal Average. In fact it can be seen that I have referred to these 5 batsmen in my article.
Y Anantha Narayanan has over 35 years of IT background. Over the past 15 years, he has been concentrating on Cricket analysis and software development. He has been involved with StumpVision, Wisden, Hallmark Software and his own site www.thirdslip.com during this period.