Tuesday 30 December 2014

A collection of data for the analysis of video game ratings

In light of the recent controversy surrounding video games journalism and the subsequent allegations of nepotism and corruption among professional game critics, I thought it was time to take a closer look at the available review data from some well-known critics. This blog post discusses a simple desktop application for extracting review data from the Metacritic website and, more importantly, the resulting collection of data assembled for the analysis of game review scores. All source code and game review data can be freely downloaded, used, modified and distributed. Download links are provided near the bottom of the post.

The MetaCritScan Application
MetaCritScan is a simple Windows application developed in C# for searching game review data from Metacritic. The data is displayed in a listview table and can be saved to an XML file for later analysis. It supports three different types of searches, which correspond to the different types of pages available on the Metacritic site:
  1. Critic reviews by game
  2. Critic reviews by critic name
  3. User reviews by game
The program is quite straightforward to use: just click the appropriate tab at the top for the type of search you want to do, choose the game and platform (or critic) and then click the SEARCH button. The game and critic lists are read from XML files supplied with the program, and these can be modified to include additional games or critics if desired. A screenshot is shown below. Note that the application has been designed and tested on the December 2014 version of the metacritic.com website. Any future updates to the site's HTML formatting will likely break functionality.


Games List
To facilitate an organized analysis of critic and user review data, it's helpful to have a list of relevant games to work from. My interpretation of relevance is confined to all console and PC games released over the last 5 years. Mobile and handheld titles, as well as any games released prior to Jan 2010, were excluded. To ensure a good sample size of review data from popular, currently active critics, some of which haven't even been publishing review scores for 5 years, the games list includes all console and PC titles reviewed by Giant Bomb, The Escapist, Polygon or Joystiq from Jan 2010 to Nov 2014 inclusive. This is a total of 937 games covering the vast majority of big studio and well-known indie releases, along with some obscure titles you've probably never heard of.

Each entry in the games list contains the basic information available on Metacritic: the game name, available platforms, developer, publisher, and release date. Additionally, the list provides data about the game's genre and any awards it received, which I think could lead to some interesting analysis on the opinions of critics and players alike for specific types of games. The games list is stored as an XML file (games.xml) with individual entries in the format shown below.

<Game>
   <Name>Shovel Knight</Name>
   <MetaCritUrlName>shovel-knight</MetaCritUrlName>
   <Developer>Yacht Club Games</Developer>
   <Publisher>Yacht Club Games</Publisher>
   <Date>Jun 26, 2014</Date>
   <Platform>PC, 3DS, Wii U</Platform>
   <Genre>Action, Platformer, Indie</Genre>
   <Award>TGA-W</Award>
</Game>
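
As a minimal sketch of how an entry in this format can be read, the Python snippet below parses the Shovel Knight entry with the standard library's xml.etree.ElementTree and splits the comma-delimited fields into lists. The helper name parse_game and the use of a plain dict are illustrative assumptions and not part of MetaCritScan. The overall layout of games.xml (for example its root element) is not shown in this post, so only a single <Game> element is handled.

```python
import xml.etree.ElementTree as ET

# One entry copied from the format shown above.
GAME_XML = """
<Game>
   <Name>Shovel Knight</Name>
   <MetaCritUrlName>shovel-knight</MetaCritUrlName>
   <Developer>Yacht Club Games</Developer>
   <Publisher>Yacht Club Games</Publisher>
   <Date>Jun 26, 2014</Date>
   <Platform>PC, 3DS, Wii U</Platform>
   <Genre>Action, Platformer, Indie</Genre>
   <Award>TGA-W</Award>
</Game>
"""

# Fields that hold comma-delimited lists rather than a single value.
LIST_FIELDS = {"Platform", "Genre", "Award"}


def parse_game(element):
    """Convert a <Game> element into a dict (hypothetical helper)."""
    game = {}
    for child in element:
        text = (child.text or "").strip()
        if child.tag in LIST_FIELDS:
            # Split on commas and drop surrounding whitespace and empty items.
            game[child.tag] = [item.strip() for item in text.split(",") if item.strip()]
        else:
            game[child.tag] = text
    return game


game = parse_game(ET.fromstring(GAME_XML.strip()))
print(game["Name"], game["Platform"], game["Genre"], game["Award"])
```

Keeping Platform, Genre and Award as lists makes later filtering (for example, selecting all entries tagged Indie) a simple membership test.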

Following is a more detailed description of the Platform, Genre and Award fields.

Platform
A comma delimited list of platforms supported by the game.

Possible entries:
PC, Xbox One, Xbox 360, Xbox, PlayStation 4, PlayStation 3, PlayStation 2, PlayStation, PlayStation Vita, PSP, Wii U, Wii, GameCube, Nintendo 64, 3DS, DS, Game Boy Advance, Dreamcast, iOS, iPhone/iPad, Mobile.

As a result of considering only the more recent games available on console or PC, the platform string will always contain at least one of PC, Xbox One, Xbox 360, PlayStation 4, PlayStation 3, Wii U, Wii.
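
The rule above can be checked mechanically. The following Python sketch splits a Platform string and keeps only the seven console and PC platforms named in the previous paragraph. The function names are hypothetical and chosen for illustration only.

```python
# Console and PC platforms that define the scope of the games list.
IN_SCOPE = {"PC", "Xbox One", "Xbox 360", "PlayStation 4",
            "PlayStation 3", "Wii U", "Wii"}


def split_platforms(platform_field):
    """Split a comma-delimited Platform string into a list of names."""
    return [p.strip() for p in platform_field.split(",") if p.strip()]


def in_scope_platforms(platform_field):
    """Return the in-scope platforms of an entry, in their original order."""
    return [p for p in split_platforms(platform_field) if p in IN_SCOPE]


# The Shovel Knight entry lists PC, 3DS and Wii U; 3DS is a handheld.
print(in_scope_platforms("PC, 3DS, Wii U"))
```

Comparing whole names after splitting, rather than searching for substrings, avoids confusing Wii with Wii U or Xbox with Xbox 360.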

Genre
Presents the game's genre(s) as a comma delimited list of classifiers. As Metacritic's genre data is hopelessly inadequate, the data provided here was taken primarily from Wikipedia and Steam with occasional use of Giant Bomb.

Possible entries:
Action, Adventure, Beat-em-up, Card, Collection, Exercise, F2P, FPS, Fantasy, Fighting, Graphic, Hack-and-slash, Horror, Indie, MMO, MOBA, Music, Party, Platformer, Point-and-click, Puzzle, RPG, RTS, Racing, Rhythm, Sandbox, Sci-Fi, Shoot-em-up, Shooter, Simulation, Sports, Stealth, Strategy, TPS, Trivia, Turn-based.

Genre classifiers such as Shooter, Action, Adventure, and Strategy provide only a broad idea of the type of game. They are often accompanied by a second, more informative classifier such as FPS, TPS (third person shooter), Graphic, Point-and-click, Platformer, Stealth, Hack-and-slash, Shoot-em-up, Turn-based, RTS, etc. Other classifiers such as Fantasy, Sci-Fi and Horror speak more to the general theme of a game than its mechanics.

Some clarification is needed on what qualifies as an indie game. I find that Steam and Wikipedia take a permissive view of the indie designation, often applying the label to games developed by medium-size studios that are merely independent, i.e., privately held, without financial backing from a big publisher. The definition of indie used here excludes game studios with more than roughly 15-20 employees. Data from LinkedIn, Wikipedia and other websites was used to estimate studio size. I do realize that reliable information on studio size at the time of development, let alone the exact number of employees actually working on a specific game, is difficult to come by. Nevertheless, considerable effort was expended to identify indie games based on the criteria mentioned above, and the data provided is believed to be reasonably accurate.

Award
A comma-delimited list of awards received by a game at selected festivals and award shows. Data was gathered from Wikipedia and the respective websites for each competition. Entries in this field are of the form [competition]-[prize], using the codes listed below.

Possible competitions:
IndieCade = IndieCade Festival
IGF = Independent Games Festival
GDC = Game Developers Choice Awards
VGA = The Video Game Awards
TGA = The Game Awards
GJA = Golden Joystick Awards

Possible prizes:
GW = grand prize winner
GF = grand prize finalist (nominee)
W = minor category winner
F = minor category finalist (nominee)

Rather than attempting to enumerate all the different awards handed out by these competitions, the only distinction made is for the grand prize or game of the year. All other awards are categorized as minor. If a game was a winner or finalist of multiple minor awards, only a single instance of the award entry will be present (no duplication). A few other notable award shows, such as BAFTA and The Game Critics Awards, were neglected.
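
To illustrate the [competition]-[prize] convention, here is a small Python sketch that splits an Award string into (competition, prize) pairs and rejects codes outside the two lists above. The function name parse_awards is hypothetical, and the two-award string in the usage line is a made-up input for demonstration, not an entry taken from the games list.

```python
# Codes taken from the two lists above.
COMPETITIONS = {"IndieCade", "IGF", "GDC", "VGA", "TGA", "GJA"}
PRIZES = {"GW", "GF", "W", "F"}


def parse_awards(award_field):
    """Split an Award string into (competition, prize) tuples."""
    awards = []
    for entry in award_field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # an empty field means the game has no listed awards
        # Split on the last hyphen: the prize code never contains one.
        competition, _, prize = entry.rpartition("-")
        if competition not in COMPETITIONS or prize not in PRIZES:
            raise ValueError(f"unrecognized award entry: {entry!r}")
        awards.append((competition, prize))
    return awards


print(parse_awards("TGA-W"))            # the Shovel Knight entry
print(parse_awards("IGF-GF, GJA-F"))    # made-up example input
```

Raising an error on unknown codes is a design choice that makes typos in a hand-edited games list visible early.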

Game Review Data
Using the MetaCritScan application along with the game and critic lists described above, a fairly comprehensive collection of game review data from both critics and users has been compiled. The collection comprises three basic types of review data, which correspond to the different types of pages available on Metacritic: critic reviews by game, critic reviews by critic name, and user reviews by game. The information obtained from each review page was stored as an XML file, the format of which is described in greater detail below. The size of the collection is:
  1. Critic reviews by game – 1931 files
  2. Critic reviews by critic name – 583 files
  3. User reviews by game – 1931 files
Notice that the number of critic and user game review files (1931) is larger than the number of games (937). This is because some games have been reviewed on multiple platforms. The critic reviews by critic name data set includes the complete record of reviews from numerous prominent critics, namely 1UP, Destructoid, Edge, Eurogamer, Game Informer, Gamereactor Denmark, GameSpot, GamesRadar, GameTrailers, Giant Bomb, IGN, Joystiq, Kill Screen, LevelUp, Machinima, PC Gamer, Polygon, RPG Fan, The Escapist, VideoGamer, XGN.

Critic reviews by game
This type of page presents a list of critic reviews for a specific game and platform. Prominently displayed near the top of the page is the Metascore, which is a weighted average of the individual critic scores. Each entry includes the critic name, their review score, a short blurb from the critic's review, and a link to the full review article on the critic's website. Note that the critic scores are on a 0-100 scale.

Example page:
http://www.metacritic.com/game/xbox-one/call-of-duty-advanced-warfare/critic-reviews?num_items=100

The critic review data for a game is stored in an XML file with the naming convention critic_[game]_[platform] where [game] and [platform] are in lowercase dashed style. For example, in the present case the relevant file is critic_call-of-duty-advanced-warfare_xbox-one.xml. This file contains up to the first hundred individual critic reviews as well as a summary of all critic review scores for the game.
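
The naming convention can be sketched in Python as follows. The helper names are hypothetical, and two assumptions are made: the game part is taken from the MetaCritUrlName field of the games list rather than derived from the display name (the display name contains punctuation such as the colon in "Call of Duty: Advanced Warfare"), and the platform part is obtained by lowercasing the platform name and joining its words with dashes, which matches the example here but has not been verified for every platform.

```python
def dashed(text):
    """Lowercase a name and join its words with dashes ('Xbox One' -> 'xbox-one')."""
    return "-".join(text.lower().split())


def critic_file_name(url_name, platform):
    """Build the file name critic_[game]_[platform].xml described above."""
    return f"critic_{url_name}_{dashed(platform)}.xml"


def critic_reviews_url(url_name, platform):
    """Build a critic review page URL in the shape of the example page above."""
    return (f"http://www.metacritic.com/game/{dashed(platform)}/"
            f"{url_name}/critic-reviews?num_items=100")


print(critic_file_name("call-of-duty-advanced-warfare", "Xbox One"))
print(critic_reviews_url("call-of-duty-advanced-warfare", "Xbox One"))
```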

Individual review format: 
<CriticReview>
   <Game>Call of Duty: Advanced Warfare</Game>
   <Platform>Xbox One</Platform>
   <Critic>Destructoid</Critic>
   <Score>80</Score>
   <Date>Nov 3, 2014</Date>
   <FullReviewUrl>http://www.destructoid.com/review-call-of-duty-advanced-warfare-283217.phtml</FullReviewUrl>
   <Comments>Advanced Warfare plays it a little too safe with the campaign, but it feels like a real core entry in the series, and will please fans who are jaded after last year's release. While Treyarch is still the king of Call of Duty in my eyes, Sledgehammer Games has shown itself to be more than capable of taking over with its debut entry. Infinity Ward is now the odd man out.</Comments>
</CriticReview>

Review summary format: 
<CriticReviewSummary>
   <Game>Call of Duty: Advanced Warfare</Game>
   <Platform>Xbox One</Platform>
   <Metascore>82</Metascore>
   <NumReviews>50</NumReviews>
   <NumPosReviews>43</NumPosReviews>
   <NumMixReviews>7</NumMixReviews>
   <NumNegReviews>0</NumNegReviews>
</CriticReviewSummary>
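
A quick sanity check on a summary like this is that the positive, mixed and negative counts add up to the total. The Python sketch below parses the summary shown above and performs that check. The helper name parse_summary is an assumption for illustration, and the summary is parsed as a standalone fragment because the enclosing file layout is not described in this post.

```python
import xml.etree.ElementTree as ET

# Summary copied from the format shown above.
SUMMARY_XML = """
<CriticReviewSummary>
   <Game>Call of Duty: Advanced Warfare</Game>
   <Platform>Xbox One</Platform>
   <Metascore>82</Metascore>
   <NumReviews>50</NumReviews>
   <NumPosReviews>43</NumPosReviews>
   <NumMixReviews>7</NumMixReviews>
   <NumNegReviews>0</NumNegReviews>
</CriticReviewSummary>
"""

# Fields stored as whole numbers; everything else is kept as text.
INT_FIELDS = {"Metascore", "NumReviews", "NumPosReviews",
              "NumMixReviews", "NumNegReviews"}


def parse_summary(element):
    """Convert a <CriticReviewSummary> element into a dict (hypothetical helper)."""
    summary = {}
    for child in element:
        text = (child.text or "").strip()
        summary[child.tag] = int(text) if child.tag in INT_FIELDS else text
    return summary


summary = parse_summary(ET.fromstring(SUMMARY_XML.strip()))
counted = (summary["NumPosReviews"] + summary["NumMixReviews"]
           + summary["NumNegReviews"])
consistent = counted == summary["NumReviews"]
print(summary["Game"], summary["Metascore"], consistent)
```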

Critic reviews by critic name
This type of page, referred to by Metacritic as a 'publication profile', presents a list of game reviews by a specific critic. Also displayed is a summary of the critic's aggregate stats, which includes their average review score, number of positive and negative scores, a comparison to other critics, etc. The more prolific critics who've been around for a long time, such as IGN and GameSpot, have generated many thousands of reviews which are presented over a series of pages.

Example page:
http://www.metacritic.com/publication/joystiq?filter=games&num_items=100&page=0

The data from a dedicated critic page is stored in an XML file with the naming convention critic_[criticname]_[page] where [criticname] is in lowercase dashed style. For example, in the present case the relevant file is critic_joystiq_page1.xml. Each file contains up to one hundred individual critic reviews as well as a summary of the critic's review scores over all time.

Individual review format:
<CriticReview>
   <Game>Saints Row IV</Game>
   <MetaCritUrlPlatform>pc</MetaCritUrlPlatform>
   <Critic>Joystiq</Critic>
   <Score>100</Score>
   <Date>Aug 14, 2013</Date>
   <FullReviewUrl>http://www.joystiq.com/2013/08/14/saints-row-4-review/</FullReviewUrl>
   <Comments>Every single thing in Saints Row 4 is worth doing, which is a huge accomplishment on its own, but its story missions in particular are inventive, hilariously unexpected examples of truly inspired game design.</Comments>
</CriticReview>

Review summary format:
<CriticReviewSummary>
   <Critic>Joystiq</Critic>
   <AvgScore>73</AvgScore>
   <NumReviews>766</NumReviews>
   <NumPosReviews>421</NumPosReviews>
   <NumMixReviews>277</NumMixReviews>
   <NumNegReviews>68</NumNegReviews>
</CriticReviewSummary>

User reviews by game
This type of page presents a list of user reviews for a specific game and platform. The overall User Score near the top of the page is an average of the individual user scores. Each entry includes the user name, their review score, and a comment from the user. Note that the user scores are on a 0-10 scale.

Example page:
http://www.metacritic.com/game/playstation-4/grand-theft-auto-v/user-reviews?num_items=100

The user review data for a game is stored in an XML file with the naming convention user_[game]_[platform] where [game] and [platform] are in lowercase dashed style. For example, in the present case the relevant file is user_grand-theft-auto-v_playstation-4.xml. This file contains up to the first hundred individual user reviews as well as a summary of all user review scores for the game.

Individual review format:
<UserReview>
   <Game>Grand Theft Auto V</Game>
   <Platform>PlayStation 4</Platform>
   <User>VideoGamePlayer</User>
   <Score>7</Score>
   <Date>Nov 18, 2014</Date>
   <Comments>Graphics could be better. I was expecting more. First Person mechanics could use some ironing out. Same game as before but with better graphics which aren't all that good.</Comments>
</UserReview>

Review summary format:
<UserReviewSummary>
   <Game>Grand Theft Auto V</Game>
   <Platform>PlayStation 4</Platform>
   <UserScore>8.2</UserScore>
   <NumReviews>747</NumReviews>
   <NumPosReviews>614</NumPosReviews>
   <NumMixReviews>31</NumMixReviews>
   <NumNegReviews>102</NumNegReviews>
</UserReviewSummary>
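
Because user scores are on a 0-10 scale and critic scores on a 0-100 scale, any comparison between the two needs a common scale. The Python sketch below shows one simple way to do this, along with converting the positive/mixed/negative counts into shares so that games with very different review counts can be compared. The function names are hypothetical, and multiplying by ten is an assumed convention rather than something prescribed by Metacritic or by the data collection.

```python
def user_score_to_100(user_score):
    """Rescale a 0-10 user score to the 0-100 scale used for critic scores."""
    return user_score * 10.0


def review_shares(num_pos, num_mix, num_neg):
    """Return the positive, mixed and negative counts as fractions of the total."""
    total = num_pos + num_mix + num_neg
    if total == 0:
        return (0.0, 0.0, 0.0)  # no reviews: avoid dividing by zero
    return (num_pos / total, num_mix / total, num_neg / total)


# Values from the Grand Theft Auto V (PlayStation 4) summary above.
rescaled = user_score_to_100(8.2)
shares = review_shares(614, 31, 102)
print(f"{rescaled:.0f}", [f"{share:.1%}" for share in shares])
```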

Summary and Download Links
This post presented a software tool for searching video game review data from the Metacritic website. If you'd like to give it a try or inspect/modify the source code, here are the (Google Drive) direct download links:
In addition, a sizable collection of critic and user game review data for recent (2010-2014) console and PC games has been created, including a games list supplemented with genre and award data not available from Metacritic. The data collection is intended to be used for research on game review scores. I think there are quite a few potentially interesting angles of analysis here. For example, is there a significant disparity in the way critics and users rate indie vs big studio titles? It has also been suggested that certain critics may be biased towards specific platforms. I'll be looking at some of these ideas in future blog entries. The XML version of the collection can be downloaded here:
My preferred software package for data analysis is Matlab, so I also created a more compact version of the collection in MAT (Matlab data file) format. It includes the relevant critic review by critic name data as well as the summaries for the critic and user review by game data, which is believed to be sufficient for most purposes. If you don't have Matlab or prefer a different package, nearly every programming environment offers a DOM parser which should allow you to work with the XML files instead.
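
As a rough illustration of the DOM route outside Matlab, the following Python sketch uses the standard library's xml.dom.minidom to pull the critic names, scores and Metascore out of a small document. Several things here are assumptions: the root element name <Reviews> is invented because the post does not show how the review and summary elements are wrapped in a file, the review is abbreviated to a few of its fields, and the helper name child_text is hypothetical.

```python
from xml.dom import minidom

# Abbreviated sample built from the formats shown earlier in the post.
# The <Reviews> root element is an assumption, not the documented layout.
DOC = """<Reviews>
  <CriticReview>
    <Game>Call of Duty: Advanced Warfare</Game>
    <Platform>Xbox One</Platform>
    <Critic>Destructoid</Critic>
    <Score>80</Score>
    <Date>Nov 3, 2014</Date>
  </CriticReview>
  <CriticReviewSummary>
    <Game>Call of Duty: Advanced Warfare</Game>
    <Platform>Xbox One</Platform>
    <Metascore>82</Metascore>
    <NumReviews>50</NumReviews>
  </CriticReviewSummary>
</Reviews>"""


def child_text(node, tag):
    """Return the text of the first <tag> element under node, or '' if absent."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return ""
    return matches[0].firstChild.data.strip()


dom = minidom.parseString(DOC)

# getElementsByTagName matches exact tag names, so <CriticReviewSummary>
# elements are not returned when asking for <CriticReview>.
reviews = [
    {"critic": child_text(node, "Critic"), "score": int(child_text(node, "Score"))}
    for node in dom.getElementsByTagName("CriticReview")
]
summary_node = dom.getElementsByTagName("CriticReviewSummary")[0]
metascore = int(child_text(summary_node, "Metascore"))
print(reviews, metascore)
```

Because the lookup is by tag name rather than by position, the same code should work whatever the actual root element of the files turns out to be.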
Finally, it's also worthwhile to provide some Matlab functions and scripts for reading the games, critics, platforms, and review data from their respective XML files into Matlab struct arrays. These should be helpful for performing analysis with the XML version of the data collection in Matlab. On the other hand, working with the MAT version of the data collection allows the saved variables to be loaded directly into the workspace, such that most of the code provided here isn't needed.
