Cricinfo Blogs

February 7, 2009

Posted by Ananth Narayanan in Grounds

19 grounds, 19 years - an in-depth study





The National Stadium in Dhaka has the lowest mean, but that's also because Bangladesh has batted there so often © Cricinfo
A few readers had made comments in response to my article on "Test Openers" that the pitch/ground conditions should be taken into account while determining the value of an opener's innings. I had responded with a short message on the difficulties of determining the true nature of any pitch/ground. I have been thinking over these comments and have felt that it is essential to explore this point in depth. Thanks to Tushar, and others before, for raising queries in this regard.

The most comical feature of an ODI telecast is the pitch specialist's commentary, which is about as reliable as a weather forecast. When Ravi Shastri pontificates that "it is a belter", one can rest assured that one innings in two will flounder to 201 for 7 in 50 overs. Alternatively, when David Lloyd says in his "Roses" twang that "250 should be a winning score", I always look at the situation seven hours later, when the batting team has successfully chased a 300+ total. I wish the broadcasters would show a split screen of the pitch specialist's comments alongside the innings scores.

Test matches are different. Normally the specialists comment on the first session and then make general observations. Of one thing I am sure: no pitch specialist, no analyst and, for that matter, no curator can forecast with confidence how a pitch will behave.

This analysis covers 19 premier Test grounds across 9 countries. MCG, SCG, WACA, Lord's, Oval and Headingley lead the field. These are the major Test playing grounds, with most of these grounds clocking in at over 100 Tests. Then I have taken two grounds from each of the other six major Test playing countries. One ground from Bangladesh completes the selection. This brings up the 19 grounds.

I have taken matches played in these grounds during the last 19 years (from 1.1.1990 onwards) for consideration. Barring Calcutta and Chennai where only 9 Tests have been played during these 19 years (because of BCCI's rotation policies), the other grounds have completed 10 or more Test matches, with 32 Tests at Lord's, London leading the field. A total of 338 Tests are analysed.

Anticipating the readers' comments, I looked at excluding the Test matches played against Bangladesh and Zimbabwe. However that is fundamentally wrong since this is a statistical analysis and I cannot take casual liberties with my selection methodology. Also one of the grounds is in Bangladesh. One should also not forget the fact that a strong team like India was dismissed for 75 on the opening day by South Africa in India and the same team, a few months back, scored 705 against a strong Australia at Sydney. So all the Tests are considered.

In order to have uniform conditions I have taken only completed first innings (all out or declared). This avoids counting a Test abandoned with the first innings standing at 24 for 3 or 150 for 5. Later innings vary a lot and would distort the figures considerably.

Readers should remember that this is a departure from my usual analysis insofar as it is a purely statistical analysis. I have tried to make the analysis simple and understandable and explained the statistical terms. With this background, let us look at the tables.

The first is a simple table listed in order of the Mean. The mean is an alternate term for Average. It is worked out by the following formula.

        Sum of all values
Mean  = -----------------
          No. of values

Mean is a very useful value for analysis. One can make a generalised observation on a possible score at the ground. However the Mean is strongly affected by very high and very low values, so a pinch of salt should be kept nearby. I have also worked out the mean of the five most recent Tests played at each ground and shown its ratio to the overall mean. That indicates the recent trend.
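As a small illustration of the two calculations described above, here is a sketch that reworks the Dhaka row of the table of mean scores. The function names are my own, chosen for the example.

```python
def mean(total_runs, innings):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return total_runs / innings


def recent_ratio(recent_mean, overall_mean):
    """Ratio of the last-five-Tests mean to the overall mean."""
    return recent_mean / overall_mean


# Dhaka row of the table: 10 Tests, 2229 runs, last-5 mean of 238.
dhaka_mean = mean(2229, 10)
dhaka_ratio = recent_ratio(238, dhaka_mean)

print(round(dhaka_mean, 1), round(dhaka_ratio, 2))  # 222.9 1.07
```

A ratio above 1 means recent first innings have been higher than the 19-year mean, and a ratio below 1 means they have been lower.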

Table of Mean scores (in order of Mean)

Ground                    Num   Total   Mean   Last Ratio
                         Tests   Runs         5 mat

National Stadium, Dhaka    10    2229   222.9   238  1.07
Asgiriya Stadium, Kandy    16    4098   256.1   173  0.68
Kingsmead, Durban          16    4333   270.8   247  0.91
Basin Reserve, Wellington  24    6752   281.3   231  0.82
National Stadium, Karachi  12    3446   287.2   299  1.04
Sabina Park, Kingston      15    4373   291.5   320  1.10
Eden Park, Auckland        16    4706   294.1   282  0.96
S.S.C Ground, Colombo      29    8966   309.2   278  0.90
M.A.C Stadium, Chennai      9    2871   319.0   285  0.89
Kensington, Bridgetown     19    6115   321.8   344  1.07
Wanderers, Johannesburg    19    6118   322.0   261  0.81
Gaddafi Stadium, Lahore    16    5204   325.2   363  1.12
Melbourne Cricket Ground   20    6707   335.4   318  0.95
Sydney Cricket Ground      22    7900   359.1   399  1.11
Lord's, London             32   11665   364.5   449  1.23
Headingley, Leeds          16    5860   366.2   407  1.11
Eden Gardens, Calcutta      9    3348   372.0   426  1.15
W.A.C.A. Ground, Perth     19    7090   373.2   431  1.16
Kennington Oval, London    19    7380   388.4   374  0.96

National Stadium, Dhaka has the lowest mean. That is understandable, since the figure includes 7 innings by Bangladesh, 6 of them below 204. Asgiriya Stadium, Kandy also has a fairly low mean value. Here different teams have been dismissed for low scores. Surprisingly, Kingsmead, Durban has also shown a penchant for low scores.

At the other end, Eden Gardens, the WACA and the Oval have fairly high Mean values. It is surprising that the highest Mean (388.4 at the Oval) is almost 75% above the lowest (222.9 at Dhaka).

Asgiriya Stadium, Kandy has shown an alarming dip in first innings scores recently: the ratio is 0.68. At Basin Reserve, Wellington the Mean has dipped by nearly 20% (ratio 0.82). At the other end, there is a marked increase in first innings scores at Lord's (ratio 1.23).

The Mean does not truly reflect how the data is distributed. A simple example: a batsman scoring 100 and 0 in the two innings of a Test has a Mean of 50, the same as that of another batsman who has scored 50 and 50. However the two scores of the first batsman are far more widely spread. This spread is measured by the Standard Deviation, which is probably the most widely used of all statistical measures.

Table of Standard Deviation and CoV (in order of CoV)

Ground                      Mean StdDevn  CoV
                                  
National Stadium, Karachi  287.2   77.2  26.9 %
Melbourne Cricket Ground   335.4   92.0  27.5 %
Sabina Park, Kingston      291.5   85.9  29.5 %
Kingsmead, Durban          270.8   84.0  31.1 %
Eden Gardens, Calcutta     372.0  126.6  34.1 %
W.A.C.A. Ground, Perth     373.2  136.7  36.7 %
Sydney Cricket Ground      359.1  132.2  36.9 %
Eden Park, Auckland        294.1  116.0  39.5 %
Kennington Oval, London    388.4  154.1  39.7 %
Kensington, Bridgetown     321.8  129.1  40.2 %
National Stadium, Dhaka    222.9   92.0  41.3 %
S.S.C Ground, Colombo      309.2  130.6  42.3 %
Wanderers, Johannesburg    322.0  139.1  43.3 %
Lord's, London             364.5  163.4  44.9 %
Asgiriya Stadium, Kandy    256.1  115.3  45.1 %
M.A.C Stadium, Chennai     319.0  148.2  46.5 %
Headingley, Leeds          366.2  172.3  47.1 %
Gaddafi Stadium, Lahore    325.2  161.5  49.7 %
Basin Reserve, Wellington  281.3  147.7  52.5 %

Standard deviation measures the distribution of data about the Mean value and describes the dispersion of data on either side. A low standard deviation indicates that the data set is clustered around the mean value, whereas a high standard deviation indicates that the data is widely spread, with figures significantly higher and lower than the mean. Squaring the differences and then taking the square root removes the problem of negative differences cancelling positive ones.

The calculation is given by the formula in Fig. 1, where the two 'x' values represent the Mean and an individual value (the sign of their difference is immaterial, since it is squared). The divisor is n-1 rather than n.





Fig. 1: Standard deviation formula © Ananth Narayanan
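For readers who prefer code to formulas, here is a minimal sketch of the standard deviation with n-1 as the divisor, applied to the two batsmen from the earlier example (100 and 0 against 50 and 50). The function name `sample_sd` is my own. The check against Python's `statistics.stdev` is there only to show that the library routine uses the same n-1 convention.

```python
import math
import statistics


def sample_sd(values):
    """Standard deviation with n-1 as the divisor."""
    m = sum(values) / len(values)
    squared_diffs = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (len(values) - 1))


erratic = [100, 0]  # the batsman who scored 100 and 0
steady = [50, 50]   # the batsman who scored 50 and 50

print(round(sample_sd(erratic), 1))  # 70.7
print(sample_sd(steady))             # 0.0

# statistics.stdev also divides by n-1, so the two results agree.
print(math.isclose(sample_sd(erratic), statistics.stdev(erratic)))  # True
```

Both batsmen have a Mean of 50, but the standard deviation separates them at once.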

The three English grounds have a very high value of SD, indicating quite a lot of dispersion. Karachi, Durban and Kingston have low SD values indicating a clustering of values around the Mean value.

Standard Deviation has little interpretable meaning on its own unless the Mean value is reported along with it. A given standard deviation indicates a high or low degree of variability only in relation to the mean value. For this reason, it is easier to get an idea of variability in a distribution by dividing the Standard Deviation by the Mean. When this is expressed as a % of the Mean, it is called the Coefficient of Variation (CoV), which is a dimensionless ratio.

In general, a low CoV indicates a low SD relative to the Mean and a high CoV indicates the opposite. Where the CoV is quite high, as at Basin Reserve and Lahore, it would be next to impossible to predict expected scores. For these and a few other grounds, the SD is around half the Mean value and there is a wide dispersion of scores. On the other hand, look at MCG and Karachi. The low CoV indicates a heavy clustering of values around the Mean, and one can make a decent attempt at predicting a score or at least a score range.
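The CoV calculation can be sketched as follows, using the Mean and SD figures for Karachi and Basin Reserve from the table above. The function name is my own. Recomputing from the rounded table figures can differ from the published CoV column in the last digit for some grounds.

```python
def cov_percent(sd, mean):
    """Coefficient of Variation: SD as a percentage of the Mean."""
    return 100.0 * sd / mean


karachi = cov_percent(77.2, 287.2)      # low CoV: scores cluster around the Mean
wellington = cov_percent(147.7, 281.3)  # high CoV: scores widely dispersed

print(round(karachi, 1), round(wellington, 1))  # 26.9 52.5
```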

Now we come to an analysis of the quartile scores and the Median. Three measures are important in this analysis. Q1 is the first quartile score, the score at the 25% position. Q3 is the third quartile score, the score at the 75% position. But the most important score is Q2, better known as the Median, which is the score at the mid-point. If there is an odd number of entries, the Median is the middle score. If there is an even number of entries, the Median is the average of the two middle scores.
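The odd and even cases of the Median can be seen in a short sketch. The scores below are hypothetical, made up only for illustration. I have not shown Q1 and Q3 here because there are several conventions for locating quartiles and the one used for the table is not stated.

```python
import statistics

# Hypothetical first-innings scores, for illustration only.
odd_scores = [203, 291, 350, 407, 570]
even_scores = [203, 291, 350, 407]

# Odd number of entries: the Median is the middle score.
print(statistics.median(odd_scores))   # 350

# Even number of entries: the Median is the average of the two middle scores.
print(statistics.median(even_scores))  # 320.5
```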

Table of Quartile values and QVC (in order of QVC)

Ground                      SD    Q1  Median  Q3    QVC

Eden Gardens, Calcutta    119.4  305  371.0  428   0.17
Melbourne Cricket Ground   89.6  270  342.5  394   0.19
Sydney Cricket Ground     129.2  291  317.5  451   0.22
National Stadium, Karachi  73.9  216  270.5  337   0.22
S.S.C Ground, Colombo     128.3  234  285.0  380   0.24
M.A.C Stadium, Chennai    139.8  235  257.0  391   0.25
Sabina Park, Kingston      83.0  225  265.0  374   0.25
Wanderers, Johannesburg   135.4  226  302.0  411   0.29
Eden Park, Auckland       112.3  203  283.5  380   0.30
Kingsmead, Durban          81.3  198  261.5  366   0.30
National Stadium, Dhaka    87.3  160  193.5  298   0.30
Basin Reserve, Wellington 144.6  174  245.0  342   0.33
Kensington, Bridgetown    125.7  224  298.0  446   0.33
W.A.C.A. Ground, Perth    133.1  239  373.0  485   0.34
Asgiriya Stadium, Kandy   111.7  150  263.5  305   0.34
Kennington Oval, London   150.0  236  380.0  484   0.34
Lord's, London            160.8  255  350.5  528   0.35
Gaddafi Stadium, Lahore   156.4  183  291.0  398   0.37
Headingley, Leeds         166.9  198  375.5  515   0.44

The Quartile Variation Coefficient (QVC), which is determined by the formula given below, represents a measure of central dispersion. It is also a dimensionless ratio. Even though this takes into account only 50% of the data, the QVC is a very valuable measure since the 50% considered is the most important area, on either side of the middle. This can also be expressed as a % value.

        Q3 - Q1
QVC   = -------
        Q3 + Q1

A low value indicates a very strong clustering of values around the Median. For instance, for MCG the Median is 342 runs, the Q1 value is only about 72 runs away and the Q3 is only about 52 runs away. So the Q3 - Q1 spread is only 124 while the overall range, as seen next, is a whopping 392. The situation is similar for Eden Gardens and SCG.

On the other hand, a high QVC indicates a thinning of the central area. Take Headingley. The median is 375, Q1 is 177 away and Q3 is 140 away. The Q3 - Q1 spread is a high 317 out of a total Range of 481 runs.
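A minimal sketch of the QVC formula, using the Q1 and Q3 values for MCG and Headingley from the table. The function name is my own.

```python
def qvc(q1, q3):
    """Quartile Variation Coefficient: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


mcg = qvc(270, 394)         # tight middle half of the scores
headingley = qvc(198, 515)  # thin, spread-out middle half

print(round(mcg, 2), round(headingley, 2))  # 0.19 0.44
```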

Table of Ranges and SDs (in order of Range-SD ratio)

Ground                       SD   Low High Range Ratio
                                    Score

M.A.C Stadium, Chennai     148.2  167  560  393   2.65
Headingley, Leeds          172.3  172  653  481   2.79
Sabina Park, Kingston       85.9  164  431  267   3.11
National Stadium, Dhaka     92.0  107  400  293   3.18
Kennington Oval, London    154.1  173  664  491   3.19
National Stadium, Karachi   77.2  196  450  254   3.29
Gaddafi Stadium, Lahore    161.5  147  679  532   3.29
Kingsmead, Durban           84.0  139  420  281   3.35
Eden Gardens, Calcutta     126.6  185  616  431   3.40
Asgiriya Stadium, Kandy    115.3   71  469  398   3.45
Lord's, London             163.4   77  653  576   3.52
Basin Reserve, Wellington  147.7  110  660  550   3.72
W.A.C.A. Ground, Perth     136.7   82  602  520   3.80
Wanderers, Johannesburg    139.1  119  652  533   3.83
Kensington, Bridgetown     129.1  102  605  503   3.90
S.S.C Ground, Colombo      130.6   89  600  511   3.91
Eden Park, Auckland        116.0  139  621  482   4.16
Sydney Cricket Ground      132.2  150  705  555   4.20
Melbourne Cricket Ground    92.0  159  551  392   4.26

Another important measure is the Range, which is the difference between the lowest and highest scores. By itself the Range is of no great relevance. It has to be seen in relation to the SD. Hence I have worked out the ratio of Range to SD. The above table is sequenced by this ratio.





Fig 2: Scores at Lord's © Ananth Narayanan

Normally the ratio is between 2.0 and 6.0. Anything outside these values indicates a way-out distribution of values, either a completely dispersed distribution or a completely centralized distribution.

A low value, say 2.65 for Chennai, indicates a high SD relative to the Range, while a high value, such as 4.26 for MCG, indicates a low SD relative to the Range. A low ratio indicates a wide dispersion and a high ratio indicates central clustering.
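The ratio can be reworked from the table figures as in this sketch, which uses the low score, high score and SD for Chennai and MCG. The function name is my own.

```python
def range_sd_ratio(low, high, sd):
    """Range (high score minus low score) divided by the SD."""
    return (high - low) / sd


chennai = range_sd_ratio(167, 560, 148.2)  # Range 393
mcg = range_sd_ratio(159, 551, 92.0)       # Range 392

print(round(chennai, 2), round(mcg, 2))  # 2.65 4.26
```

The two grounds have almost the same Range, so the difference in the ratio comes almost entirely from the SD.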

Conclusion:

1. Mean scores are a reasonable indicator of the expected score. Prediction based on Mean and SD is feasible. Let us take Kingston. The mean is 292 and the SD is 83. If one takes an empirical formula of Mean plus or minus 0.5 of SD, one can estimate a first innings score of between 251 and 333. One could even scale this by the last-5-Tests ratio, 1.10, leading to an educated estimate of 276 to 366. Let me see what happens, since I am writing this before even the Kingston toss. (On 6/2/09) Ha! England scored 318, close to the mid-point of this projection. Not a bad attempt.

2. Evaluation of an innings and individual score is virtually impossible. Headingley has had scores of 570 for 7 during 2007 vs West Indies and 203 all out during 2008 against South Africa. Let us say that Australia or England score 350 in the first innings at Headingley a few months later. Compared to 2008, it is a great performance, while compared to 2007, it is a poor performance. What can one conclude with any degree of confidence? One can use the Mean value for such analysis, with no great degree of confidence. However, as a single point of measure in a broad frame of analysis, it is worth considering.





Fig 3: Scores at Headingley © Ananth Narayanan

3. How does one evaluate an innings at Dhaka? The low Mean value will increase the valuation of most innings. However the low Mean has been caused by a string of low Bangladeshi scores. If we exclude the Bangladeshi scores, there will be hardly any data left, since they account for 7 of the 10 innings. Other grounds do not present this difficulty since there are not many Bangladeshi innings. Especially in India, where BCCI, with its infinite arrogance, has never invited Bangladesh.

4. The wide variations in innings scoring patterns between grounds belonging to the same country are amazing. Look at the figures for the two Pakistani grounds and the two Indian grounds.

5. There is a recent batting domination in England and a drop in scores at Kandy and, to a lesser extent, at the Wanderers and Basin Reserve.
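The empirical estimate in point 1 can be written out as a short sketch. The function name and its parameters are my own, and the half-SD band is just the rule of thumb used above, not an established statistical interval. The code works with unrounded figures, so its end points can differ by a run or so from the rounded ones quoted in point 1.

```python
def score_band(mean, sd, k=0.5, recent_factor=1.0):
    """Band of Mean +/- k * SD, optionally scaled by the recent-form ratio."""
    low = (mean - k * sd) * recent_factor
    high = (mean + k * sd) * recent_factor
    return low, high


# Kingston: Mean 292, SD 83, last-5-Tests ratio 1.10.
plain = score_band(292, 83)
adjusted = score_band(292, 83, recent_factor=1.10)

print(plain)  # (250.5, 333.5)
print(tuple(round(x, 1) for x in adjusted))

# England's 318 falls inside the adjusted band.
print(adjusted[0] < 318 < adjusted[1])  # True
```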

Graphs: I have drawn graphs for only two grounds, Lord's and Headingley. These plot the scores chronologically to show the yo-yo nature of scoring. A box plot is an excellent means of depicting the quartile variations pictorially, but one would be needed for each ground.

Please click here for a chronological list of Tests, for selected grounds.

Please click here for a list of Tests sequenced by runs scored, for selected grounds.

I have given explanations to the best of my knowledge. However since my knowledge of statistics is of an acquired nature, there might be errors and/or alternate explanations. I call upon my fellow columnists and readers to come in with their own suggestions and comments.

The Contributors

Y Anantha Narayanan has over 35 years of IT background. Over the past 15 years, he has been concentrating on Cricket analysis and software development. He has been involved with StumpVision, Wisden, Hallmark Software and his own site www.thirdslip.com during this period.