A statistical analysis of current voting behavior on everything2

(log) by ushdfgakjasgh Sat Jan 19 2008 at 20:10:05

Introduction

There is very little data on e2 about voting behavior of any sort, and even less (if any at all) on how votes are distributed. This study attempts an impartial comparison of the reputation received by writeups of different "types," such as "thing," "idea," "recipe," etc., currently posted on everything2.

Data collection

The data was collected by BookReader and me by means of samples of each node type. Through in10se's Writeups By Type script, we were able to view a listing of the most recent writeups posted of any type. Out of these, we selected the 20 (21 on three types; waste not, want not) most recently posted. Consequently, it must be acknowledged that, with the exception of nodes of type "Lede," which are so rare that our sample extended back to writeups posted in 2001, this study serves only as a "snapshot," so to speak, of e2's current voting behavior. Since we had to vote on the writeups for their rep data to become available, our own votes were discounted so as not to create a bias of any sort. Data was collected in the form of node_id, total rep, upvotes and downvotes.
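
As a rough illustration of the record format and the vote discounting described above, here is a minimal sketch. The names `WriteupSample` and `discount_own_vote` are hypothetical, the sample values are made up, and it assumes that total rep equals upvotes minus downvotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteupSample:
    """One collected row: node_id, total rep, upvotes, downvotes."""
    node_id: int
    rep: int
    upvotes: int
    downvotes: int


def discount_own_vote(sample: WriteupSample, own_vote: int) -> WriteupSample:
    """Return a copy of the sample with the collector's own vote removed.

    own_vote is +1 if the collector upvoted, -1 if they downvoted.
    Assumes rep == upvotes - downvotes (an assumption, not stated in the study).
    """
    if own_vote == 1:
        return WriteupSample(sample.node_id, sample.rep - 1,
                             sample.upvotes - 1, sample.downvotes)
    if own_vote == -1:
        return WriteupSample(sample.node_id, sample.rep + 1,
                             sample.upvotes, sample.downvotes - 1)
    raise ValueError("own_vote must be +1 or -1")


# Made-up example row, not taken from the study's raw data.
observed = WriteupSample(node_id=123456, rep=9, upvotes=11, downvotes=2)
adjusted = discount_own_vote(observed, own_vote=1)
```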

Analysis

After the data was collected, I separated it into two sets: total rep per writeup type, and the ratio of upvotes to total votes per writeup type. I described the resulting data with median, mean, standard deviation, minimum, maximum, and range functions, learning to use OpenOffice in the process. The data for the "rep" portion follows, with each statistic sorted in ascending order (I apologize for the formatting):

	
Mean    Type        Stdev   Type        Median  Type
5       Poetry      7.19    Idea        3.5     Poetry
8.45    Fiction     7.26    Poetry      6       Person
8.62    Person      7.39    Fiction     8       Idea
10.1    Idea        7.4     Dream       9       Fiction
11.8    Dream       9.35    Thing       11      Dream
14.65   Thing       9.58    Person      11      Lede
14.6    Personal    9.85    Recipe      14      Personal
15.95   Review      9.97    Log         15      Thing
16.5    Essay       10.3    Review      15      Review
16.8    Log         10.57   Personal    16      Recipe
18.14   Recipe      11.39   Essay       17      Log
18.25   Lede        11.41   Place       17.5    Essay
20.3    Event       12.62   Event       21.5    Event
21.1    Place       19.67   Lede        22      Place

		
Max     Type        Min     Type        Range   Type
21      Poetry      -7      Poetry      24      Dream
25      Dream       -5      Essay       28      Poetry
27      Idea        -5      Thing       29      Idea
27      Person      -4      Person      31      Fiction
29      Fiction     -4      Personal    31      Person
32      Essay       -2      Idea        39      Thing
34      Log         -2      Fiction     36      Log
34      Thing       -2      Log         36      Recipe
38      Personal    -2      Review      37      Essay
38      Review      -1      Event       40      Review
40      Recipe      -1      Lede        42      Event
41      Event       1       Dream       42      Personal
51      Place       4       Place       55      Place
65      Lede        4       Recipe      72      Lede
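
For anyone who would rather not fight a spreadsheet, the six summary functions named above can be sketched with Python's standard library. The helper name `summarize` and the input list are made up for illustration. `statistics.stdev` is the sample (n-1) standard deviation; which variant was used for the tables is not stated, so that choice is an assumption.

```python
import statistics


def summarize(values):
    """Compute the six descriptive statistics used in the tables."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample stdev (n-1), an assumption
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Made-up reputations for one writeup type, not the study's raw data.
reps = [3, -1, 12, 7, 20, 5]
summary = summarize(reps)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```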

At this point, I spent about three hours shouting at people in the #openoffice.org IRC chat and crying into my keyboard. Following that, the same functions were applied to the arrays of upvote/total vote ratios for each writeup:

Mean    Type        Stdev   Type        Median  Type
0.63    Poetry      0.09    Recipe      0.64    Poetry
0.72    Person      0.1     Place       0.77    Lede
0.73    Fiction     0.13    Dream       0.77    Person
0.78    Idea        0.14    Lede        0.78    Fiction
0.78    Lede        0.15    Event       0.79    Idea
0.79    Dream       0.15    Log         0.8     Dream
0.81    Essay       0.16    Idea        0.86    Log
0.81    Log         0.17    Essay       0.87    Essay
0.81    Personal    0.17    Personal    0.88    Personal
0.86    Event       0.18    Fiction     0.89    Thing
0.86    Thing       0.18    Poetry      0.91    Event
0.88    Review      0.18    Thing       0.94    Recipe
0.91    Recipe      0.19    Review      0.95    Place
0.92    Place       0.21    Person      0.95    Review
									
									
Max     Type        Min     Type        Range   Type
0.94    Poetry      0.22    Thing       0.37    Recipe
1       Dream       0.29    Person      0.38    Place
1       Essay       0.29    Poetry      0.45    Dream
1       Event       0.3     Personal    0.51    Event
1       Fiction     0.33    Review      0.54    Log
1       Idea        0.39    Essay       0.56    Fiction
1       Lede        0.44    Fiction     0.56    Idea
1       Log         0.44    Idea        0.56    Lede
1       Person      0.44    Lede        0.61    Essay
1       Personal    0.46    Log         0.65    Poetry
1       Place       0.49    Event       0.67    Review
1       Recipe      0.55    Dream       0.7     Personal
1       Review      0.63    Place       0.71    Person
1       Thing       0.63    Recipe      0.78    Thing
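
The ratio summarized in the two tables above is upvotes divided by total votes. A minimal sketch of that per-writeup calculation follows; the function name `upvote_ratio` and the vote counts are made up, and returning `None` for a writeup with no votes is a design choice of this sketch, not something the study specifies.

```python
def upvote_ratio(upvotes, downvotes):
    """Fraction of all votes that were upvotes, or None if there are no votes."""
    total = upvotes + downvotes
    if total == 0:
        return None  # ratio is undefined without votes
    return upvotes / total


# Made-up (upvotes, downvotes) pairs, not the study's raw data.
votes = [(11, 2), (4, 4), (7, 0)]
ratios = [upvote_ratio(up, down) for up, down in votes]
print(ratios)
```

A ratio of 1 means every vote was an upvote, which is why nearly every type reaches a maximum of 1.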

Error analysis

The sample size was the only significant source of error in the experiment, a compromise we accepted in order to finish it in a reasonable amount of time. The e2 statistics nodelet says that there's a total of 439,495 writeups (these statistics don't exactly have a very good reputation for accuracy, but will be used here for the sake of completeness), or an average of about 29,300 per writeup type (15 types including definition, which cannot be voted upon). A margin of error of 21.91% applies to each per-type sample. If a function is run across all reps collected, its margin of error would be 5.85%. The margins of error calculated through this method, in spite of their questionable accuracy, should still be taken into consideration in any interpretation of the data.
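
The 21.91% figure matches the usual worst-case margin of error for a sample of 20 at 95% confidence, so the sketch below assumes that is the method used; the study does not say so explicitly. The function name `margin_of_error` and the pooled sample size of 280 (14 votable types times 20 writeups) are assumptions. Strictly speaking, this formula describes an estimated proportion, and the population of roughly 29,300 per type is large enough that a finite-population correction barely changes the result.

```python
import math


def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion at ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)


per_type = margin_of_error(20)   # one writeup type, 20 samples
pooled = margin_of_error(280)    # assumed pooled size: 14 types x 20

print(f"per type: {per_type:.2%}")
print(f"pooled:   {pooled:.2%}")
```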

Conclusion

The data speaks for itself, for the most part. The chance that a writeup of any given type is going to be deleted can be calculated easily with the mean and the standard deviation. I'm not quite positive how to do that, however (integration? bleh), so I'll leave it to someone else. My data shows that writeups of type "recipe" have the best chance of success after posting, with the smallest standard deviation of upvote ratio of any type and one of the largest mean ratios of upvotes to total votes. On the other hand, writeups of type "poetry" are in much more danger of deletion, as the difference between the mean and standard deviation of their upvote ratio is only 0.45 (0.63 - 0.18), suggesting that roughly 20-25% of all writeups of type "poetry" are deleted.
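
The integration alluded to above does not have to be done by hand. If reputation within a type is assumed to be roughly normally distributed (a strong assumption the samples may not satisfy), the chance of falling below some cutoff is the normal cumulative distribution function. The sketch below uses the mean and standard deviation of rep from the tables; the cutoff of 0 is purely hypothetical, since the study does not state a deletion threshold.

```python
import math


def normal_cdf(x, mean, stdev):
    """P(X < x) for a normal distribution with the given mean and stdev."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))


# Mean and stdev of rep taken from the tables; cutoff 0 is hypothetical.
poetry_below_zero = normal_cdf(0, mean=5, stdev=7.26)
recipe_below_zero = normal_cdf(0, mean=18.14, stdev=9.85)

print(f"poetry: {poetry_below_zero:.1%}")
print(f"recipe: {recipe_below_zero:.1%}")
```

Under these assumptions, a little under a quarter of poetry writeups would sit below zero rep, against only a few percent of recipes.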

As I previously mentioned, the lede data is significantly older than the rest of the data in the sample. To my surprise, writeups of type "lede," as measured by mean rep, ranked third highest of any type, despite usually containing no more than two or three sentences. This supports the oft-hypothesized notion that writeups sitting and collecting votes over time will accumulate massive reputations regardless of quality.

In the continued interest of impartiality, it would be in poor taste to make any recommendations regarding the use of this information, and so I won't. Many thanks to BookReader and every god who funneled votes my way. Thank you for your time.

The raw data of the study can be found here or here, the latter posted on e2 and the former in ODS, HTML and XLS formats.

Feel free to point out any useful corrections that should be made.
