Saturday, December 8, 2007

The 1 to 10 scale is broken, or Why I hate the number 7.

Welcome to Leave Luck to Us. Here we rate video games. But we don't rate them from 1 to 10. We rate them a different way, but more on that later.


Whether it's women, restaurants, movies, one-liners, or video games, I've never liked the idea of rating things on a scale of 1 to 10. It seems to me that asking someone to rate something on such a scale more often than not turns into "Pick a number from 1 to 10 that's higher than 5."


It's not natural. When you're riffing with friends about the latest RPG, seldom does one of them comment on its 1-to-10-ness. Rather, he'll comment on whether it's good or bad. Buy or sell. See or skip. Yes or no.


So why do most sites rate games numerically, and how useful is the scale? How honest are the reviews?


I guess sites rate games with a number because they think that's what their readers want. To be fair, it is a convenient way to quickly sum up a review. But in my opinion, therein lies a problem: it's too easy to wrap a review, whether positive or negative, in a 6 or 7 rating. Putting it into a box with a number on it can disguise much of what the author really means. For example, the writer may complain about control issues, audio problems, and inferior graphics, yet at the end of the review you may be surprised to find that the game has been given a 7/10.


This way the writer can have it both ways: he isn't a fan of the game, but a 7 means it's still playable for most of his audience. He's neither right nor wrong. If the author manned up and rated it an honest 3, he would be bombarded with complaints from fans of the game. With a 7, if a fan approaches him, he can say, “Look. I gave it a 7. That means it's a decent title.” Conversely, if a critic of the game approaches him, he can respond, “Yeah. That's why I only gave it a 7.” So my theory is that a rating of 7 is meaningless and has replaced 5 as the new average.


I wondered if I could somehow prove that 7 has become the new average, and settled on the following methodology:


I went to IGN.com for my dataset. I chose to look at PS2 games because I felt the number of reviews would provide a fair test. Also, these reviews span the system's whole life, from launch to retirement, which should help flatten out any personal feelings people may have held about the system.


IGN allows users to select games with a given rating, rounded down to the nearest whole number. I simply counted the titles found under each rating. (IGN actually scores games to a precision of 0.1. I've discarded the decimals and scored the games as whole numbers; for example, 4.2 becomes a 4. It might be interesting to break the scores down further to include the decimals. I wonder whether they would follow a normal distribution or IGN's curve. We'll leave that one to the chaos mathematicians.)
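The counting step amounts to building a histogram. Here's a minimal Python sketch of that bucketing, assuming the scores are available as a plain list of decimals; it is an illustration, not the exact process used for this post. It floors each score the way IGN's filter rounds down (4.2 becomes 4) and counts the titles in each whole-number bucket. The function name `bucket_scores` and the sample scores are hypothetical and are not IGN data.

```python
import math
from collections import Counter


def bucket_scores(scores):
    """Floor each decimal score (e.g. 4.2 -> 4) and count titles per whole-number bucket."""
    return Counter(math.floor(s) for s in scores)


# Hypothetical scores for illustration only; these are not IGN's numbers.
example = [4.2, 6.9, 7.0, 7.5, 8.8, 7.3]
print(sorted(bucket_scores(example).items()))
```

Note that a perfect 10.0 lands in its own bucket, 10, just as it would under IGN's "rounded down" filter.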


After counting all of the scores, I was able to create the following graph:


The red bars indicate the scores given to games, while the black curve approximates a "normal distribution," or bell curve.

In any healthy population you'd expect something like the black curve, but IGN's scores are bunched toward the right (higher) side of the graph. In fact, 7 is the actual peak of the graph, much as I expected to find.
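To make the comparison concrete, here's a rough Python sketch of how the bars and the bell curve could be checked against each other. It assumes the per-bucket counts are already in a dictionary, and it centers the hypothetical bell curve on 5 (the old average) with a standard deviation of 1.5; both parameters are assumptions, since the curve in the graph is only an approximation. The counts in `observed` are made up for illustration and are not IGN's numbers.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def expected_bell_counts(total, mu=5.0, sigma=1.5, buckets=range(0, 11)):
    """Expected titles per floored bucket [k, k+1) if scores were normally distributed.

    mu and sigma are assumed values, not figures taken from the graph.
    """
    return {
        k: total * (normal_cdf(k + 1, mu, sigma) - normal_cdf(k, mu, sigma))
        for k in buckets
    }


def peak_and_mean(histogram):
    """Return the most common bucket (the graph's peak) and the count-weighted mean score."""
    total = sum(histogram.values())
    peak = max(histogram, key=histogram.get)  # on a tie, the first bucket listed wins
    mean = sum(score * count for score, count in histogram.items()) / total
    return peak, mean


# Hypothetical counts for illustration only; these are not IGN's numbers.
observed = {4: 1, 5: 2, 6: 3, 7: 5, 8: 3, 9: 1}
peak, mean = peak_and_mean(observed)
expected = expected_bell_counts(sum(observed.values()))
for k in sorted(expected):
    print(k, observed.get(k, 0), round(expected[k], 1))
print("peak:", peak, "mean:", round(mean, 2))
```

Swapping in the real counts would show whether the peak lands on 7 and how far the mean sits above 5.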


Basically, this graph shows that 7 is indeed the new average: it's the score IGN hands out most often. I believe that's because, as mentioned earlier, a 7 is a nice “safe” score to give a game if a writer doesn't want to commit.


I started examining this issue in response to the recent controversy over Jeff Gerstmann's firing from GameSpot, allegedly for scoring Kane & Lynch a 6. Had he scored it a 7, he probably would have protected himself from much of the fallout.


At any rate, I have decided on a new system for rating games: Buy, Hold, and Sell. It's the same rating system investors use when evaluating stocks. It's honest, natural, compact, and tells you everything you need to know in an easily digested nugget. It also eliminates much of the infighting among fans who are upset that a game they like “only” scored an 8.8. It's also harder to gloss over bad reviews with a middling score. With only three available ratings (not counting the qualifier “strong,” as in “strong buy”), an editor is forced to be more honest. No longer can he hide behind a meaningless 6 or 7.


Thoughts?




5 comments:

Risto said...

The Finnish school system uses a 4-10 grading, where 7 is both the arithmetic and the practical average. When faced with a 1-10 grading, I tend to follow this system ingrained in me, with 7 as the expectation value.

Anonymous said...

Many or most European schools use a grading system in which 1 (sometimes 0) is the lowest possible grade, and 10 is the highest. In these systems, a 6 (or 5.5) is considered the lowest passing grade, 5 and lower being a failure. 7 is commonly considered the average passing grade, 8 and higher being above average.

In this system, everything under 6 is unacceptably bad, which is inappropriate for a game that is reasonably fun to play.

Your normal distribution would mean that over 50% of all games released are unacceptably bad. On the other hand, a seven as the average would mean that the typical game is nothing special, but not especially disappointing either, which matches my personal experience.

Anonymous said...

I don't think even accurately rated games would follow a normal distribution.

Game companies are there to make money, so I'd expect two things to happen:

1. If the game is judged internally to be worth about a "5", effort would be spent on polishing it, bringing it up to at least a 6 or 7. Delaying a game makes sense when a few easy fixes to annoying problems would make it noticeably better, even if it's still far from perfect.

2. If the game is judged internally to be impossible to bring up to a good quality level, it will probably be published anyway as an attempt to cut the losses.

The Stone said...

“I have decided on a new system for rating games: Buy, Hold, and Sell.”

I actually remember a magazine years ago that used that scale, but for the life of me I can't remember which one. Anyways, nice article.

Anonymous said...

I suggest that we adopt a compass-like system, with the four points based on semistandard variables and each point graded from red to yellow to green based on the strength of the title. This would make it more difficult to compare the value of games, and rightly so, as enjoying and reviewing games is entirely subjective.