The algorithm used to compute these rankings is the same as that used in 1996, 1997, and 1998.
Alternative algorithms will be evaluated during 2000, just as they were during 1999. Don't Panic! The algorithms being considered give substantially the same results as the current one; the major difference is that they represent attempts to utilize the difficulty of the race rather than its importance. At high-level races, this makes almost no difference: important races like Team Trials are hard. Where it does make a difference is at mid-level races, which fluctuate in difficulty due to course design, changing water conditions, etc.
So don't change your training or travel plans. The way to move up in the rankings remains unchanged: go to harder races and beat stronger paddlers.
If you are interested in participating in the discussion or development process, then contact me -- my info's below.
Here's a step-by-step explanation of the algorithm used for 1999 rankings. If you don't want to read all this, skip to the end.
0. The data used is that which is available right here, on the 1999 Race Results page.
1. Each race's data is reshuffled into the same format:
Class Name(s) Time-1 Penalty-1 Total-1 Time-2 Penalty-2 Total-2 Better-Score Total-Score
Obviously, not all of this information is available for every race; fields which are not available from the race results are left blank. This is done in order to convert the many different formats in which race data is supplied into a single format that can be used for subsequent steps.
2. The classes for each race are translated from the many names that show up in results to a list of canonical racing classes. In other words, the "race classes" are turned into "ranking classes". Here's an example of part of the translation table used to do this (race class on the left, ranking class on the right):
C-1W          C-1W
C-1W expert   C-1W
C-1           C-1
C-1 (A/B)     C-1
C-1 A/B       C-1
C-1 C/D       C-1
C-1 Cadet     C-1
C-1 Expert    C-1
C-1 Jr        C-1
3. Results for classes which don't currently get ranked (that would be open boats, squirt boats, sit-on-tops, etc.) are dropped.
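Steps 2 and 3 amount to a table lookup followed by a filter. Here is a minimal Python sketch of that idea (the real scripts are Perl and are not shown here). The table holds only the C-1 and C-1W rows from the example above; the function names and the "Open Canoe" class name in the usage note are made up for illustration.

```python
# Excerpt of the race-class -> ranking-class table. Only the rows from the
# example above are included; the real table is much longer.
CLASS_TABLE = {
    "C-1W": "C-1W",
    "C-1W expert": "C-1W",
    "C-1": "C-1",
    "C-1 (A/B)": "C-1",
    "C-1 A/B": "C-1",
    "C-1 C/D": "C-1",
    "C-1 Cadet": "C-1",
    "C-1 Expert": "C-1",
    "C-1 Jr": "C-1",
}


def to_ranking_class(race_class):
    """Return the canonical ranking class, or None if the class is not ranked."""
    return CLASS_TABLE.get(race_class.strip())


def keep_ranked(results):
    """Translate (race_class, name) pairs and drop classes that are not ranked."""
    kept = []
    for race_class, name in results:
        ranking_class = to_ranking_class(race_class)
        if ranking_class is not None:
            kept.append((ranking_class, name))
    return kept
```

For example, `keep_ranked([("C-1 Cadet", "Paddler A"), ("Open Canoe", "Paddler B")])` keeps only the first entry, relabelled as `C-1`, because the second class has no entry in the table.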
4. Scores with DNS, DNR, or DNF are interpreted numerically, with 999.99 used for every one of those. Mostly, this is just used to gather per-race statistics, because races where someone DNF'd don't count toward their ranking. Speaking of which, statistics for every race used in rankings are available.
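As a rough Python illustration of step 4 (not the actual Perl), a score field can be turned into a number with 999.99 standing in for any non-finish. The function names are hypothetical, and treating the field as case-insensitive text is an assumption.

```python
DNF_SCORE = 999.99  # value used for DNS, DNR and DNF alike
NON_FINISH_CODES = {"DNS", "DNR", "DNF"}


def score_to_number(raw):
    """Convert a raw score field to a float, mapping DNS/DNR/DNF to 999.99."""
    text = raw.strip().upper()
    if text in NON_FINISH_CODES:
        return DNF_SCORE
    return float(text)


def counts_toward_ranking(score):
    """A non-finish is kept for per-race statistics but not for rankings."""
    return score < DNF_SCORE
```

Keeping the sentinel as a number means the per-race statistics can be computed without special cases, while the ranking passes simply skip anything at 999.99.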
5. If a race was scored with 2 seconds for touches/50 seconds for misses, nothing happens in this pass. But if it was scored with 5 seconds/50 seconds, the penalties are recalculated to the 2/50 system. This is done via a look-up table and a small algorithm. The impact of mixing results like this is negligible: in 1998 and 1997 I computed rankings both ways and the differences in final rankings were insignificant.
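The look-up table and algorithm used for this conversion aren't reproduced here, so the following Python sketch is only a guess at the general idea. It assumes a run's penalty total is a sum of 5-second touches and 50-second misses, and that a run has fewer than ten touches, so every full 50 seconds can be read as a miss. Both assumptions, and the function name, are mine.

```python
def convert_penalty_5_50_to_2_50(penalty):
    """Re-express one run's penalty total from 5/50 scoring in 2/50 scoring.

    Assumption: fewer than ten touches in the run, so each full 50 seconds
    is a missed gate and the remainder is made up of 5-second touches.
    """
    if penalty < 0 or penalty % 5 != 0:
        raise ValueError("a 5/50 penalty must be a non-negative multiple of 5")
    misses, remainder = divmod(penalty, 50)
    touches = remainder // 5
    return 50 * misses + 2 * touches
```

Under those assumptions a 65-second penalty (one miss and three touches) becomes 56 seconds. A total such as 50 is ambiguous in principle (one miss or ten touches), which is presumably part of why a look-up table is involved.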
6. Every paddler's name is converted to canonical form, which I've hopefully spelled correctly. Here's an example:
Carleton Goold     Goold, Carleton
Carleton Gould     Goold, Carleton
Carlton Gould      Goold, Carleton
Goold, C           Goold, Carleton
Goold, Carlton     Goold, Carleton
Gould, Carleton    Goold, Carleton
7. Every race result is converted into this form:
Class=Name Score Ratio
where "Score" is their combined-run total score for the race, and "Ratio" is the ratio of their score to the best-score-of-the-day. For example, from the Riversport Slalom in 1998:
K-1W=Thomas,_Natalie   471.65  1.904
K-1W=Potochny,_Evy     462.75  1.868
K-1W=Gelblat,_Renee    531.46  2.145
K-1W=Hearn,_Cathy      270.49  1.092
K-1W=Weld,_Kara        272.07  1.098
K-1W=Beakes,_Nancy     317.22  1.280
Boats which did not complete two runs are dropped at this point. (See previous comment about how DNFs don't count toward rankings.)
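For anyone who wants to work with lines in this form, here is a small Python sketch that parses one record and computes the ratio. It assumes, going by the example, that underscores stand in for spaces inside names so that each line splits cleanly on whitespace. The `Result` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    ranking_class: str
    name: str
    score: float
    ratio: float


def parse_result(line):
    """Parse a 'Class=Last,_First Score Ratio' line into a Result."""
    key, score, ratio = line.split()
    ranking_class, name = key.split("=", 1)
    return Result(ranking_class, name.replace("_", " "), float(score), float(ratio))


def ratio_to_best(score, best_score):
    """Ratio of a boat's combined score to the best score of the day."""
    return score / best_score
```

Calling `parse_result("K-1W=Hearn,_Cathy 270.49 1.092")` gives a record with class `K-1W`, name `Hearn, Cathy`, score 270.49 and ratio 1.092.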
8. The ratio from the previous pass is inverted to give a competitor's race ratio: this number reflects how far off they were from the best-score-of-the-day. (The boat with the best score of the day has a race ratio of 1.000.) Two lookups happen: last year's rank class (A, B, C, D or Unranked), and membership on the national A team. (These lookups are needed because the strength-of-field assignment is based on them.)
K-1W=Thomas,_Natalie   0.525  C
K-1W=Potochny,_Evy     0.535  U
K-1W=Gelblat,_Renee    0.466  C
K-1W=Hearn,_Cathy      0.916  ATEAM
K-1W=Weld,_Kara        0.911  ATEAM
K-1W=Beakes,_Nancy     0.781  B
In the case of boats which competed in the same race class more than once (e.g. K-1 Masters and K-1) only the better of those two results is used. This is done in order to comply as best as possible with our rules concerning competition in two age classes and to try to level the playing field. (Because, for example, someone who is 41 can take four runs, while someone who is 39 can only take two. It seems that the person taking four already has an advantage, so we shouldn't give them an additional advantage by counting this as two races instead of one.)
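A Python sketch of this pass, offered as an illustration rather than the actual Perl: invert the ratio, then keep only the better result when the same boat shows up twice in one ranking class at one race. The function names and the "Paddler A" entries in the usage note are hypothetical.

```python
def race_ratio(ratio):
    """Invert score/best so that the best boat of the day gets 1.000."""
    return 1.0 / ratio


def best_per_boat(entries):
    """Keep the highest race ratio per (ranking_class, name) within one race.

    entries is an iterable of (ranking_class, name, race_ratio) tuples.
    """
    best = {}
    for ranking_class, name, value in entries:
        key = (ranking_class, name)
        if key not in best or value > best[key]:
            best[key] = value
    return best
```

For instance, `round(race_ratio(1.092), 3)` reproduces the 0.916 shown above, and `best_per_boat([("K-1", "Paddler A", 0.70), ("K-1", "Paddler A", 0.75)])` keeps only the 0.75 entry.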
9. Each race result is weighted by the race weight; the race weight is given by
(Field Strength + Importance Factor)
------------------------------------
                 20
where field strength and importance factor both have maximum values of 10; thus the race weight has a maximum value of 1.000. The table of assigned field strength and importance factors, along with the criteria used to make these assignments, is here. Continuing the example above, and using Riversport's field strength of 9 and importance factor of 5:
K-1W=Thomas,_Natalie   0.367
K-1W=Potochny,_Evy     0.374
K-1W=Gelblat,_Renee    0.326
K-1W=Hearn,_Cathy      0.641
K-1W=Weld,_Kara        0.638
K-1W=Beakes,_Nancy     0.547
This number is the competitor's race ratio multiplied by the race weight (0.7 for Riversport); step 10 refers to it as the competitor's race weight. Think of it as "how much credit you get for doing this well at this race against this competition".
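The weighting is simple enough to check by hand, but here is a short Python sketch of it anyway (function names are hypothetical, and the range check is my own addition).

```python
def race_weight(field_strength, importance):
    """(field strength + importance factor) / 20, each factor on a 0-10 scale."""
    for factor in (field_strength, importance):
        if not 0 <= factor <= 10:
            raise ValueError("factors must be between 0 and 10")
    return (field_strength + importance) / 20


def weighted_result(race_ratio, field_strength, importance):
    """Credit earned at one race: race ratio times the race weight."""
    return race_ratio * race_weight(field_strength, importance)
```

With Riversport's factors, `race_weight(9, 5)` is 0.7, so a race ratio of 0.916 earns 0.916 x 0.7 = 0.6412, which shows up as 0.641 in the table.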
10. All results from all races are combined. If a paddler has done more than three races, their best (highest) three race weights are selected. If a paddler has done only two races, they're assessed a 5% penalty; if only one race, a 10% penalty. These results are then averaged to give the competitor's rank ratio. Same example as before:
K-1W=Thomas,_Natalie   0.350  Bellefonte,Riversport
K-1W=Potochny,_Evy     0.366  Lehigh,Riversport,Bellefonte
K-1W=Gelblat,_Renee    0.369  Lehigh,Codorus,Farmington
K-1W=Hearn,_Cathy      0.841  Trials-3,Trials-2,Nationals
K-1W=Weld,_Kara        0.824  Trials-3,Trials-2,Nationals
K-1W=Beakes,_Nancy     0.654  Trials-2,Trials-1,NOC-DBH-2
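Step 10 can be sketched in Python as follows. This is an illustration, not the Perl actually used; the function name is hypothetical, and the numbers in the usage note are made up because the per-race values behind the table above aren't listed here.

```python
def rank_ratio(weighted_results):
    """Average the best three weighted results, with a short-season penalty.

    One race: multiply by 0.90. Two races: multiply by 0.95.
    Three or more races: no penalty.
    """
    if not weighted_results:
        raise ValueError("at least one race result is required")
    best = sorted(weighted_results, reverse=True)[:3]
    penalty = {1: 0.90, 2: 0.95}.get(len(best), 1.0)
    return penalty * sum(best) / len(best)
```

A paddler with (made-up) results of 0.5, 0.7, 0.6 and 0.8 is scored on the best three, giving 0.7. A paddler with only 0.4 and 0.6 averages 0.5 and then takes the 5% penalty, giving 0.475.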
11. The results are sorted by rank ratio and separated by racing class. Each boat's percentile rank within its class is assigned, based on the top boat having a value of 100.0. Letter classes (A, B, C, D) are assigned based on percentile rank. If there's a tie to three significant digits, both boats are assigned the same (ordinal) rank and the next number gets skipped. (Example: two K-1s wind up at 87.8; they are both boats A14. The next boat is A16.) This doesn't really mean much, because we use the rank ratio, not the ordinal, to decide things like eligibility for team trials.
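Here is one way this step could look in Python. It rests on several assumptions of mine: the "value of 100.0" is read as belonging to the best boat in the class, so a percentile is 100 times a boat's rank ratio divided by the top rank ratio; percentiles are rounded to one decimal before ties are compared; ordinals run through the whole racing class; and the letter breakpoints (85, 65, 40, from the letter-class table for this step) are treated as lower bounds. The function names are hypothetical.

```python
def letter_class(percentile):
    """Letter class from percentile rank, treating 85/65/40 as lower bounds."""
    if percentile >= 85:
        return "A"
    if percentile >= 65:
        return "B"
    if percentile >= 40:
        return "C"
    return "D"


def rank_class(rank_ratios):
    """Rank one racing class.

    rank_ratios maps name -> rank ratio. Returns a list of
    (ordinal, letter, percentile, name) tuples, best boat first.
    """
    top = max(rank_ratios.values())
    scored = sorted(
        ((round(100 * value / top, 1), name) for name, value in rank_ratios.items()),
        key=lambda item: (-item[0], item[1]),
    )
    ranked = []
    for position, (percentile, name) in enumerate(scored, start=1):
        if ranked and ranked[-1][2] == percentile:
            ordinal = ranked[-1][0]  # tie: reuse the previous boat's ordinal
        else:
            ordinal = position  # after a tie this skips the unused numbers
        ranked.append((ordinal, letter_class(percentile), percentile, name))
    return ranked
```

With made-up rank ratios of 0.80, 0.72, 0.72 and 0.40, the ordinals come out as 1, 2, 2, 4: the two tied boats share a rank and the number 3 is skipped, just like A14, A14, A16 in the example.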
Here's the table used to decide the assignment of letter classes based on the percentile rank:
"A" Ranked   85% to 100%
"B" Ranked   65% to 84%
"C" Ranked   40% to 64%
"D" Ranked   below 40%
12. If the paddler is a citizen of a country other than the US, a notation
to that effect is added to their name. The lookup table that I use for
this is slowly becoming more accurate, but I wouldn't be surprised to find
that I've missed someone.
Note 1: Age groups are provided for informational purposes only and have no impact on rankings.
Note 2: The presence of non-US paddlers also has no impact on rankings, since the breakpoints for class assignments as well as the cutoff for automatic admission to Team Trials are assigned on a percentile basis, not on the number of boats. All it really does is provide our guests with an inkling of how they rank among people who have competed here in the last year.
13. In the case of rec boats (plastic, cruiser, etc.), all of the above is repeated *except* that better-of-two instead of combined runs are used. In order to provide an adequate statistical basis for comparison, the rec boats are lumped together with the race boats to crunch through the numbers, then the race boats are dropped out. This ensures that at races where the overwhelming majority of boats are glass (e.g. Mid-America #2) there are enough boats to compare against. (And since the race boats are dropped out of these calculations *before* the percentiles are calculated, people who race rec boats aren't penalized for doing so.)
Also, because rec boats haven't been previously ranked, I found it necessary to assign guesstimates to a handful (4) of rec boats in order to provide a starting point for computations. I minimized the number of such estimates (because I loathe making up numbers, even when I can do so with a high degree of confidence). It's also worth noting that if my estimates are wrong, the errors thus introduced will diminish with each iteration of rankings. To put it another way: each time the rankings are run, the effect of my initial estimates decreases, so after a few times through, even gross errors will disappear...and hopefully I didn't make any of those.
Here are the four estimates I made for rec boats:
Class     Name               Rank Ratio (within REC)
K-1 Rec   Beakes, Jason      .950
K-1 Rec   Poindexter, Mark   .685
K-1 Rec   Maxwell, Tyler     .550
K-1 Rec   Collins, Dave      .550
These were arrived at by comparing performance in glass vs. performance in plastic and were done only to make it possible to compute initial rankings for rec boats. I think my estimates were reasonably close, given that the final rankings for these boats were:
Class     Name               Rank Ratio (within REC)
K-1 Rec   Beakes, Jason      1.000
K-1 Rec   Poindexter, Mark   .730
K-1 Rec   Maxwell, Tyler     .606
K-1 Rec   Collins, Dave      .643
14. That's it. Please note that although all the calculations were done to several decimal places, that does not mean that rankings are accurate to that degree. For example, the difference between a rank ratio of .453 and .456 falls well within the variability of manual timing systems. And boats which fall at, say, percentiles 91 and 88 are essentially indistinguishable.
Also, please note that paddlers who did well at important races with high field strength numbers may be ranked ahead of paddlers who beat them head-to-head.
For those of you who want the techie details, all the scripts are written in Perl and run on a Unix system.
best score of race     (field strength + importance factor)
------------------  X  ------------------------------------  X  penalty
competitor's score                      20
where the field strength and importance are assigned from the table below. The penalty is 1.0 (no penalty) for any boat doing >= 3 races, .95 for any boat doing 2 races, .9 for any boat doing 1 race.
|Factor Points||Field Strength (fastest times)||Importance of Race|
|10||4 National "A" Team athletes||Olympic/National Team Trials|
|9||3 National "A" Team athletes||CIWS Finals|
|8||2 National "A" Team athletes||CIWS Qualifiers|
|7||1 National "A" Team athlete||Team Trials Qualifier/USOF Qualifiers|
|6||"A" ranked athlete||Divisional Championships; Major Cup Series, Major Double Headers; Junior Olympic Qualifiers|
|5||"B" ranked athlete||Other Local/Regional Races; C-D Race Series|
|4||"C" ranked athlete||Citizens Races|
|2||"D" ranked athlete||Flatwater/Pool/Jiffy Slaloms|
The penalty (see step 10 above) is 1.0 for any boat doing >= 3 races, .95 for any boat doing 2 races, and .9 for any boat doing 1 race.