Introducing Telemetrics: The Weighted Tweet Index
Back in October, as I was flying to and from Denver for the AoIR conference, I was reading a number of books about sporting metrics. In one of these, “Trading Bases”, Joe Peta argues that many of the analytical tools used in, and developed for, the sports world apply equally to his past in Wall Street trading (which, for those familiar with my work comparing day trading to sports gambling, will come as no surprise I consider a sensible link). As I was reading, it struck me that we were sitting on a pile of baseline data for television series, sporting events, and Twitter, and so “Telemetrics” was born.
Telemetrics, which I define as the application of Sabermetric principles to television ratings and social media engagement, has thus been the focus of my work over the past couple of months. We are beginning to make substantial progress in both the historical analysis, and future prediction, of Twitter engagement around television. At this stage, I should also mention that our progress here would not be possible without a team of research assistants & embedded student projects, which includes Katie Prowd, Portia Vann and Kate Guy, who have (respectively) put together the data and assisted in testing the metrics, taken on the work of developing dashboards to display our results, and investigated and designed infographics to better communicate results to a wide audience.
Our work to date shows that we are able to match the Nielsen SocialGuide capture technology fairly accurately, at least for shows which do not exceed the 1% Twitter API restrictions (as I have discussed previously in relation to Scandal). In terms of correlations between tweets and television ratings, we have observed substantial variation across genre, format and country. In particular, early results suggest a significant difference between reality shows and more standard television fare such as sitcoms and dramas. Additionally, ‘specials’ (which includes events such as the 2012 US Election debates and award shows, as well as potentially premieres and finales of series) do not appear comparable with standard episodes of series. Finally, for now, we have also noted substantial differences between similar formats in Australia as compared to the United States, which is particularly significant since much of the literature, and examples of best practice, come from the work of researchers and organisations such as Nielsen (and their SocialGuide subsidiary) which have focused on the US market. All of this means that drawing a fixed correlation between traditional television ratings and Twitter use does not seem a sensible approach.
Big Brother 15: CCI vs Nielsen Data
The first thing we had to do was verify our collection methodology. Luckily, we have long been collecting tweets around television, as part of work discussed previously here on Big Brother 15, which expanded to comparisons between Big Brother in Australia and the United States, and subsequently to reality TV shows as a whole. With the start of the 2013 TV season, we added a range of new terms, including popular sitcoms, dramas and sci-fi shows, in order to broaden the number of exemplars available to us. Returning to the Big Brother data made sense for verifying our methods, however, as both we and Nielsen SocialGuide had recorded the number of tweets and unique users for a large portion of the season, and Nielsen had published their figures on their website, as visualised below:
From our perspective, that was pretty good. Tweets matched almost exactly, and the only major difference, on 23 August (22 August in the United States), was easily attributable: that was the night on which the Head of Household competition continued after the show, resulting in us recording an oversampling of tweets (compared to Nielsen) attributable to the live feed rather than the network broadcast. We were very happy with these numbers, which meant we could be confident our methodology was producing accurate results. Thus, excluding known data outages, we were happy to move forward with the data we had collected. It is worth noting that while our unique user counts differed slightly from Nielsen’s, they did so by a relatively consistent amount. We’re still not sure of the cause; it seems unlikely that we were counting just enough tweets from a group of repeat users to match the overall volumes while differing on unique users, and with Nielsen’s methodology essentially a black box, the mystery will remain.
The weighted tweet index
The Weighted Tweet Index (WTI) was the first metric we created as part of the Telemetrics project. Essentially, the goal was to break raw volume numbers down into their constituent parts. Perhaps the best way to illustrate this is by way of an example. The 1,750th ranked show (at the time) by pure volume, excluding specials and sporting events, was an airing of the film “Space Jam” on Cartoon Network from Friday 6 July 2012. The data (in this case from Nielsen SocialGuide) records a total of 25,033 tweets attributable to the show, making it the 3rd ranked broadcast of the day. But what if it had aired on an average day, in an average month, and on an average network (taken in this case to be an average broadcast network – e.g. ABC, CBS, FOX, NBC)?
According to our current metrics, it turns out that a Cartoon Network show can expect to see about 16% of the Twitter activity of an average broadcast network. Similarly, Fridays can expect to receive around 52% of the Twitter activity of an average day (nothing new here for TV execs; Friday is historically a dumping ground for non-performing shows), and July – being in the offseason for TV – receives 65% of an average month. Note that there’s a lot of refinement possible here: a kids’ movie during the school holidays should (I assume) be expected to do better than average, and we currently don’t account for the time of day, nor for what was scheduled against it. But, for now, let’s stick with our numbers, so:
Weighted Tweets = (Raw Tweets) / (Network Factor * Month Factor * Day Factor)
Therefore, weighted tweets = 25,033 / (0.16 * 0.52 * 0.65) = 25,033 / 0.0541 ≈ 462,717 tweets if shown on an average broadcast network, on an average day, in an average month. With the greater degree of precision in the model (i.e. not rounding to two decimal places), our actual figure here is 466,805. Once ranked by weighted tweets, Space Jam thus moves from being the 1,750th ranked show in this period to the 317th.
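As a rough sketch, the weighting can be expressed in a few lines of Python. The factor values below are the rounded, illustrative ones from the Space Jam example (the model itself derives them from baseline data at higher precision, which is why this sketch lands near, but not exactly on, the more precise figure above):

```python
# Illustrative factors from the Space Jam example: each expresses expected
# Twitter activity as a fraction of an "average" broadcast network/day/month.
NETWORK_FACTOR = 0.16  # Cartoon Network vs. an average broadcast network
DAY_FACTOR = 0.52      # Friday vs. an average day
MONTH_FACTOR = 0.65    # July (TV offseason) vs. an average month

def weighted_tweets(raw_tweets, network, day, month):
    """Strip the scheduling context out of a raw tweet count."""
    return raw_tweets / (network * day * month)

space_jam = weighted_tweets(25_033, NETWORK_FACTOR, DAY_FACTOR, MONTH_FACTOR)
print(f"{space_jam:,.0f}")  # roughly 463,000 with these rounded factors
```

Dividing by the factors, rather than multiplying, is what inflates a low-expectation slot (Friday, offseason, cable) up to its "average conditions" equivalent.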
Predicting the Future
Much of the utility of this approach comes not necessarily from analyzing the past, though we’ll certainly do a lot of that over the coming weeks, but from the ability to predict the future. Because our ‘Weighted Tweets’ figure has been stripped of much of its context, we can take this figure (across all episodes of a particular series that we have data for) and apply to it the factors that will hold when the show next airs. Here are two examples:
Here you can see one interface to our model. Once a show is selected at the top, historic data is pulled from the data store, and a number of metrics are calculated. Which of these metrics is the best estimator varies depending on the volume, source, and age of the data we hold. In this case, we have a number of recent episodes and so the weighted tweets for the last month were selected. We entered the day & month the show was airing, as well as the network, and (for now) set the growth factor as 1. Growth Factor is a variable that we are still experimenting with, although currently it is largely contained within the Month factor for current/future dates. Ultimately, we predicted 157,511 tweets, and when the Nielsen data became available, we saw the results:
Not too shabby. An error of roughly 7,000 tweets on a total of around 150,000 works out to 4.80%; for the first version of the model we’ll certainly take that. Looking at the last 10 shows, a pure average of the raw tweet totals over that time would have predicted 99,591 tweets, and an average of the last four shows 113,100 tweets, so our prediction certainly appears to be outperforming simple averages.
Both Teen Wolf and The Bachelor were essentially premieres (Teen Wolf being a second-half premiere), and Wolf Watch appears to be a companion show to Teen Wolf, so we are currently unable to predict any of those three; let’s move on to WWE Monday Night Raw.
Here, again, we have a lot of data, with this being a weekly show, and so the monthly average was a sensible choice. Again we added in the new variables (Monday, Jan 2014 on USA), left the growth factor alone, and predicted 215,507 tweets, and the results were again quite pleasing:
We were off by 10,594 tweets, a 4.69% error, although this time in the other direction. It’s interesting to note that simply averaging the previous 10 Raw shows would have predicted 153,398 tweets, and averaging the last four shows would have predicted 165,275 tweets, so again we do seem to be on the right track.
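For concreteness, the error and baseline comparison for the Raw example can be sketched as follows. Note that the `actual` figure here is back-calculated from the stated 10,594-tweet underprediction, so treat it as an assumption rather than source data:

```python
def pct_error(predicted, actual):
    """Absolute prediction error as a percentage of the actual figure."""
    return abs(predicted - actual) / actual * 100

# WWE Monday Night Raw example. The actual figure is back-calculated from the
# stated 10,594-tweet underprediction (an assumption, not a published number).
predicted = 215_507
actual = predicted + 10_594  # 226,101

model_error = pct_error(predicted, actual)   # ~4.69%
baseline_10 = pct_error(153_398, actual)     # simple average of last 10 shows
baseline_4 = pct_error(165_275, actual)      # simple average of last 4 shows

print(f"model {model_error:.2f}% vs baselines {baseline_10:.1f}% / {baseline_4:.1f}%")
```

Under these assumptions the simple averages miss by well over 25%, which is what makes the model’s sub-5% error look promising.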
Note here that we’re not necessarily predicting what it will get, but what it should expect to get. In part that’s because there is still a wide range of factors we don’t account for, but even as we add more refinements to the model over the coming months, there’s still one we can never really account for: content. Just as we saw in the Big Brother work I’ve published here previously (and more formal journal versions of that are still in the works), there are some things you cannot account for, such as the racism scandal that engulfed Big Brother 15 and saw a rapid increase in tweet volume.
However, in some ways knowing what a show should expect to get can be more useful, both in providing a barometer of success for networks, producers and social media strategists, and in providing industry and researchers with a list of shows which either exceed or fail to reach these levels, thus allowing an analysis of what may have contributed to the success or failure of a particular episode or series on social media.
More on our progress here in the coming weeks, but for now we’re pushing ahead with refinements to the above model, and new metrics galore!