Cross-posted from my blog:

Sabermetrics (noun): the analysis of baseball through objective evidence, especially baseball statistics.

Sabermetrics, a collection of formulas and math-rooted ideas that allows interested parties to obtain an improved understanding of baseball, is a topic about which I've blogged sporadically. The Hardball Times, Fire Joe Morgan, Baseball Prospectus, and Baseball Think Factory, among a bunch of other web sites, are excellent sources for sabermetric perspective.

Here, I discussed predictive baseball statistics. The CliffsNotes version: a team's adjusted Pythagorean record has better predictive value than its actual record, a hitter's predicted OPS (PrOPS) has better predictive value than his actual OPS, and a pitcher's FIP has better predictive value than his ERA. These statements hold because Pythagorean record, PrOPS, and FIP are less influenced by luck than won-loss record, OPS, and ERA. The idea is to isolate what players control and to strip out what is more or less random, because doing so yields information that predicts future performance better than traditional statistics do.

For whatever reason, such thinking has yet to become popular in non-baseball circles. Sure, there's this (basketball's version of sabermetrics) and this (football's), but neither is as advanced or as accepted as the baseball version (to my knowledge, work in the sabermetric vein has yet to be performed in hockey, soccer, etc.). This is probably the case because baseball is an inherently better match with math and statistics than other sports are, but who knows.

Anyway, I caught myself wondering, the other day, about sabermetrics and tennis. A Google search for "sabermetrics tennis" does return this blog as its third result, but while semi-interesting, it's fluffy and doesn't offer much math.

I decided to do a little thinking on my own. The goal? To obtain an improved understanding of tennis (sound familiar?). Specifically, my aim was to manipulate tennis statistics in a way that caused them to have improved predictive value.

I mentioned Pythagorean record above. I asked myself, "Would it make sense to attempt to apply this concept to tennis?" At this point, I'm not completely sure of the answer to this question.

That said, I created a procedure to derive something resembling the tennis equivalent. Here are the steps I followed:

1. Find a player's profile at atptennis.com.

2. Open a new tab/window for the player's "YTD [year to date] match facts" link in the upper right portion of the screen.

3. Multiply the player's "Service Games Won %" number by his "Service Games Played" number.

4. Multiply the player's "Return Games Won %" number by his "Return Games Played" number.

5. Round the results of step three and step four to the nearest integers and sum these numbers.

6. Divide the result of step five by the sum of the player's "Service Games Played" number and his "Return Games Played" number.

7. Multiply the result of step six by the player's total number of matches (the sum of the player's wins [W] and the player's losses [L], as seen in his profile page).

8. Round the result of step seven to the nearest tenth in order to obtain a number to which I'll refer as Adjusted wins (Wa).

9. Multiply Wa by a multiplier (yikes, awkward use of language) of 1.309 (see below) in order to obtain a total for Pythagorean wins (Wp).

10. Subtract Wp from the player's total number of matches in order to obtain a total for Pythagorean losses (Lp; Pythagorean Record = Wp-Lp).

Example (Roger Federer):

1. Above

2. Above

3. (.88)(546) = 480.48

4. (.29)(533) = 154.57

5. 480 + 155 = 635

6. 635/(546 + 533) = .5885078777

7. (.5885078777)(37 + 8) = 26.48285449

8. 26.5 = Wa

9. (26.5)(1.309) = 34.7

10. (37 + 8) - 34.7 = 10.3; Pythagorean record = 34.7-10.3
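
For anyone who would rather not do the arithmetic by hand, here is a minimal Python sketch of steps three through ten. The function and parameter names are my own inventions for illustration, and two things are assumptions: ties are rounded half up, and Wp and Lp are rounded to the nearest tenth, as in the Federer example (the steps themselves only call for rounding in steps five and eight).

```python
import math


def round_half_up(x, digits=0):
    """Round to `digits` decimal places, with ties going up (an assumption)."""
    scale = 10 ** digits
    return math.floor(x * scale + 0.5) / scale


def pythagorean_record(serve_won_pct, serve_games, return_won_pct,
                       return_games, wins, losses, multiplier=1.309):
    """Return (Wa, Wp, Lp) from a player's YTD match facts."""
    # Steps 3-5: estimated games won on serve and on return, rounded to integers.
    serve_games_won = round_half_up(serve_won_pct * serve_games)
    return_games_won = round_half_up(return_won_pct * return_games)
    # Step 6: share of all games played that the player won.
    game_win_rate = (serve_games_won + return_games_won) / (serve_games + return_games)
    # Steps 7-8: scale by matches played and round to the nearest tenth (Wa).
    matches = wins + losses
    adjusted_wins = round_half_up(game_win_rate * matches, 1)
    # Step 9: apply the multiplier to get Pythagorean wins (Wp).
    pyth_wins = round_half_up(adjusted_wins * multiplier, 1)
    # Step 10: Pythagorean losses (Lp) are whatever matches remain.
    pyth_losses = round_half_up(matches - pyth_wins, 1)
    return adjusted_wins, pyth_wins, pyth_losses


# Federer's numbers from the example above.
federer = pythagorean_record(0.88, 546, 0.29, 533, 37, 8)
print(federer)  # (26.5, 34.7, 10.3)
```

The multiplier defaults to 1.309 only because that is the 2008 top 10 average used in this post; for any other set of players it would need to be recomputed.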

Here are the results for the rest of the players ranked in the top 10 in the world as of Monday, June 23, 2008:

Observant viewers of the chart will understand my methodology. I divided each player's W-L percentage (1) by his Wa-La percentage (2) in order to obtain a ratio (column eight). I added up each player's ratio and divided by 10 in order to derive the mean, which turned out to be 1.309. All that this means is that so far in 2008, the average top 10 player has a W-L percentage of 1.309 times his Wa-La percentage.
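
The column eight calculation can be sketched in a few lines of Python. The function names are hypothetical, and I am assuming that the Wa-La percentage is simply Wa divided by total matches. Only Federer's inputs appear in this post, so he is the only player plugged in here; the mean would need the ratios of all 10 players in the chart.

```python
def win_pct_ratio(wins, losses, adjusted_wins):
    """Actual W-L percentage divided by Wa-La percentage (column eight)."""
    matches = wins + losses
    actual_pct = wins / matches
    # Assumption: Wa-La percentage = Wa / total matches.
    adjusted_pct = adjusted_wins / matches
    return actual_pct / adjusted_pct


def mean_ratio(ratios):
    """Average the column eight ratios across a group of players."""
    return sum(ratios) / len(ratios)


# Federer: 37-8 actual record, Wa = 26.5 from the worked example.
federer_ratio = win_pct_ratio(37, 8, 26.5)
print(round(federer_ratio, 3))  # 1.396
```

Because total matches cancel out, the ratio reduces to W divided by Wa. The chart's own figures may differ slightly in the last decimal place depending on where rounding was applied.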

Taking that into account, in theory, top 10 players whose column-eight ratio is greater than 1.309 have been "lucky" (Federer, Nadal, Djokovic, Davydenko, Ferrer, Roddick, and Nalbandian fall into this category). Meanwhile, top 10 players whose column-eight ratio is less than 1.309 have been "unlucky" (Blake, Wawrinka, and Gasquet have been "unlucky" so far). NOTE THAT MY METHODOLOGY COMPLETELY IGNORES THE BELIEF THAT CERTAIN PLAYERS ARE MORE "CLUTCH" THAN OTHER PLAYERS!

In column nine (the rightmost column), I took each player's result from column eight and subtracted the column eight average (1.309, the multiplier I used to derive Pythagorean record). The result was each player's "Luck Score," the difference between his "luck" and the "luck" of the average top 10 player thus far in 2008.
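
As a rough illustration, the Luck Score and the "lucky"/"unlucky" label can be computed as below. The names are hypothetical, and Federer's ratio is recomputed here as W divided by Wa from the worked example, which is an assumption about how the chart value was derived, so the chart's own figure may differ slightly.

```python
TOP10_MEAN_RATIO = 1.309  # column eight average for the 2008 top 10


def luck_score(ratio, mean=TOP10_MEAN_RATIO):
    """Column nine: a player's ratio minus the group average."""
    return ratio - mean


def luck_label(score):
    """Positive scores count as "lucky", negative as "unlucky"."""
    if score > 0:
        return "lucky"
    if score < 0:
        return "unlucky"
    return "average"


# Federer: 37 actual wins, Wa = 26.5 (assumed ratio = W / Wa).
federer_score = luck_score(37 / 26.5)
print(round(federer_score, 3), luck_label(federer_score))  # 0.087 lucky
```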