Everything Statistics - September 29, 2001 (2) (idea) by 1010011010

It's flattering that Professor Pi would spend so much time on my method. What's disconcerting is how little his presentation represents what I was describing, though his mistakes are understandable. It's obvious that the /msg function of the catbox is not the best place to discuss statistical analysis and calculus.

The one that jumps out at me first is that he is not using the midpoints of the sections, but rather the endpoints. This, as he notes, incorporates the extreme reputations and, as expected, can skew the results. This is exactly the reason why midpoints are used.

The second, and most egregious, error is that Professor Pi has swapped the domain and range in his dataset. The reputation of the write-up represents the Y-values. The position of a write-up in the ranked list gives you an X-value.

                    The Second Dataset

12 |                                         -
11 |                                         
10 |                                       --
 9 |                                      -  
 8 |                                    --
 7 |                                 ---     
 6 |                              ---         
 5 |                          ----           
 4 |                      ----               
 3 |                 -----                   
 2 |           ------                        
 1 |      -----                              
 0 |------____________________________________
    0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2
      0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 0
                        0 0 0 0 0 0 0 0 0 0 0

If we were going to use three points we would pick the write-ups at 25%, 50% and 75% down the list.
In this case #52, #103, and #155, as there are 206 write-ups. The reputations are as follows: #52=1 #103=4 #155=6.
These data apply to both datasets.
Node-Fu (MpNF)? 3.7
User-X's contribution to E2 (MpNFP)? 755.3

Whoops! It's been pointed out to me that the 25th and 75th percentiles do not represent the midpoints of the first and last third. It should be 16.5% and 82.5%, or the #34 and #170 write-ups.
The reputation of #34 is still 1, but the reputation of #170 is 7.
The result? MpNF=4 and MpNFP=824.
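
The arithmetic above can be checked with a short sketch. It rests on two assumptions that are inferred from the numbers rather than stated anywhere: that MpNF is the plain average of the sampled reputations, and that MpNFP is MpNF multiplied by the number of write-ups. Rounding list positions up is likewise a guess that happens to reproduce the positions quoted. The function names are made up for illustration.

```python
import math

def pick_ranks(total, fractions):
    """1-based list positions at the given fractions of the list.
    Rounding up is an assumption; it reproduces the positions quoted above."""
    return [math.ceil(total * f) for f in fractions]

def mpnf(reputations):
    """Assumed definition: plain mean of the sampled reputations."""
    return sum(reputations) / len(reputations)

TOTAL = 206  # number of write-ups in the example

# First attempt: picks at 25%, 50% and 75% down the list.
quartile_ranks = pick_ranks(TOTAL, [0.25, 0.50, 0.75])
first = mpnf([1, 4, 6])

# Corrected attempt: picks at 16.5%, 50% and 82.5%.
midpoint_ranks = pick_ranks(TOTAL, [0.165, 0.50, 0.825])
second = mpnf([1, 4, 7])

# Assumed definition: MpNFP = MpNF * number of write-ups.
print(quartile_ranks, round(first, 1), round(first * TOTAL, 1))
print(midpoint_ranks, round(second, 1), round(second * TOTAL, 1))
```

Under those assumptions the first line gives 3.7 and 755.3 and the second gives 4.0 and 824.0, which agrees with the figures quoted.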

What was originally done was basically taking data points evenly spaced throughout the list and then discarding the highest and lowest... which may actually be a superior measure and is definitely easier to understand.

My main complaint with MNF and MNFP is not that they're easy to manipulate because they're based on only one data point. It's that they're not very representative.

User A has 66 nodes of reputation 1 and 33 nodes of reputation 40
User B has 33 nodes of reputation 1, 33 nodes of reputation -20, and 33 nodes at 40
User C has 99 nodes at 1
User D has 33 at 1, 33 at -40, and 33 at 20
User F has 66@1 and 33@-40

All of these noders have the same number of write-ups and the same median value. MNF and MNFP treat them as if they all have identical noding habits, which is clearly not the case. Whatever method is finally chosen should be able to distinguish between the above five noders and order them correctly. It should not be adversely affected if each of them adds a write-up to Uses of Soy in Lesbian Monkey Foreplay which is immediately C!'d and voted to 200, and it should still protect users from malicious downvoting.
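
To make the comparison concrete, here is a rough sketch that builds the five noders' reputation lists, confirms the shared median, and scores each with a three-point average taken at the 16.5%, 50% and 82.5% positions of the list sorted from lowest to highest reputation. The averaging rule and the round-up rule for positions are assumptions carried over from the worked example, and the helper names are invented. It does not test the Soy outlier or the downvoting case.

```python
import math
from statistics import median

def sorted_reps(groups):
    """Expand (count, reputation) pairs into a list sorted low to high."""
    reps = []
    for count, rep in groups:
        reps.extend([rep] * count)
    return sorted(reps)

def midpoint_score(reps, fractions=(0.165, 0.5, 0.825)):
    """Assumed scoring rule: mean of the reputations found at the given
    fractions of the sorted list (positions rounded up, 1-based)."""
    picks = [reps[math.ceil(len(reps) * f) - 1] for f in fractions]
    return sum(picks) / len(picks)

# (count, reputation) pairs for each noder listed above.
users = {
    "A": [(66, 1), (33, 40)],
    "B": [(33, 1), (33, -20), (33, 40)],
    "C": [(99, 1)],
    "D": [(33, 1), (33, -40), (33, 20)],
    "F": [(66, 1), (33, -40)],
}

medians = {}
scores = {}
for name, groups in users.items():
    reps = sorted_reps(groups)
    medians[name] = median(reps)
    scores[name] = midpoint_score(reps)
    print(name, medians[name], round(scores[name], 1))

# Noders ordered from highest to lowest three-point score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Every median comes out as 1, while the three-point scores separate the noders in the order A, B, C, D, F. Whether that is the "correct" order is a judgment call the sketch does not settle.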
