The Week in Telemetrics: Advanced Metrics & Fine-Tuning

Posted In News, Projects - By Darryl Woodford On Monday, January 20th, 2014 With 0 Comments

A new week, and so time for an update on progress in Telemetrics. In general, the last 10 days have primarily been concerned with adding additional fields and detail to our data store, as the first stage in a process to investigate the relative importance of each of the factors impacting on tweets, in order to better account for them in both our historic analysis and future predictions. We have pulled data from a number of sources, and so a significant amount of time over the past week has been spent scraping and formatting the data, establishing a key system so that the sources become relational, and spot-checking (and using Excel formulae) to verify the integrity and accuracy of the data.

Evaluating our predictions

Over on my own blog, we have been predicting tweet numbers for a selection of upcoming TV shows. It became immediately obvious that our prediction methodology coped badly with second half premieres. We had deliberately excluded season premieres and finales from our predictive data set, but it now appears clear that second half premieres (i.e. after a show goes on a winter hiatus, but within the same season) also receive a significant boost in tweets. With our new data (scraped from EpGuides & Wikipedia), we are now able to systematically identify these episodes, and plan to evaluate whether we can make a simple adjustment to account for the boost such shows receive. Overall, our average error for last week was 33.19%; however, if we exclude the second half premieres that falls to 9.21%, which is within the realms of what we were expecting, and easily out-performs the simple average of the last 10 or 4 shows. Taking a simple average of the last 10 episodes would have recorded an overall error of 35%, falling to 25.5% once the second half premieres are excluded, a much larger figure than our 9.21%, although of course the sample sizes here are still small.
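To make the comparison above concrete, here is a minimal sketch of how an average percentage error with and without second half premieres could be computed. The rows are made-up numbers rather than our real predictions, and measuring the error relative to the actual tweet count is an assumption; the function names are purely illustrative.

```python
from statistics import mean


def pct_error(predicted, actual):
    """Absolute error as a percentage of the actual tweet count (assumed definition)."""
    return abs(predicted - actual) / actual * 100


def average_error(rows, exclude_second_half_premieres=False):
    """Mean percentage error over (predicted, actual, is_second_half_premiere) rows."""
    kept = [
        (predicted, actual)
        for predicted, actual, is_premiere in rows
        if not (exclude_second_half_premieres and is_premiere)
    ]
    return mean(pct_error(predicted, actual) for predicted, actual in kept)


# Made-up figures for illustration only.
rows = [
    (9_000, 10_000, False),  # under-predicted by 10%
    (5_500, 5_000, False),   # over-predicted by 10%
    (4_000, 8_000, True),    # second half premiere, badly under-predicted
]

print(average_error(rows))
print(average_error(rows, exclude_second_half_premieres=True))
```

A single badly missed premiere is enough to drag the overall figure well above the error on ordinary episodes, which is the pattern we saw last week.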

Filling out the last two months: Replacement Value & Tweets Above Replacement (TAR)

One significant issue we had been experiencing using our preferred “average weighted tweets over past two months” (of episodes) metric was that of small sample size. While I am reasonably confident that we can demonstrate the last 8 shows are a better estimator than the weighted average of all shows in our database, for shows we are not manually tracking there can be many holes in this data. A show which barely scratches the Nielsen top 5, and isn’t in our tracking system, may be missing data for 75% of the last two months’ shows, as seen here:

If we, as we have previously, simply don’t count shows for which we don’t have tweet figures, we are heavily biasing the estimates. As you can see in our prediction data above, we predicted 10,245 tweets last week for “The Carrie Diaries”, which turned out to be a 12.2% over-estimate. A better solution here is to adapt a concept from sabermetrics called “replacement value”. Essentially, in baseball and other sports, replacement level is defined as the production you could expect from signing a “free agent”, that is, the next best player who is not under contract to any major league team. This allows us, for example, to predict the impact of an injury, trade or contract expiry on a team’s production, but also to measure other players against this level, for example through Wins Above Replacement. I plan to return to the latter concept, or in this case Tweets Above Replacement (TAR), in the future, as a means of measuring how well a show does in comparison to throwing a “replacement level” show on in that timeslot.

For our current purposes, and the Carrie Diaries example, we are more concerned with predicting how many tweets a particular episode may have got, despite the final statistics not being (freely) available. However, we do know what the bottom ranked show on Friday is, which, for example, was 3,700 tweets on 15 November, 1,600 tweets on 22 November, and 6,600 tweets on 6 December. Here too, though, we are slightly thrown by Nielsen’s method of reporting, which ranks by number of impressions rather than tweets: the Tonight Show was ranked fourth with 1,600 tweets, ahead of BBC America’s “An Adventure in Space and Time”, which ranked 5th with 9,200.
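As a rough sketch of what a replacement value per Day/Month combination might look like, the snippet below averages the bottom ranked show’s tweets for each (weekday, month) pair, using the three Friday figures quoted above. Taking a plain mean of the bottom ranked show is an assumption for illustration, as is the function name; it also inherits the ranking quirk just described, since the 22 November figure is not actually the lowest tweet count in the top 5.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def replacement_values(bottom_ranked):
    """Average the bottom ranked show's tweets per (weekday, month) pair.

    Weekdays follow date.weekday(): Monday is 0, so Friday is 4.
    """
    groups = defaultdict(list)
    for air_date, tweets in bottom_ranked:
        groups[(air_date.weekday(), air_date.month)].append(tweets)
    return {key: mean(values) for key, values in groups.items()}


# Bottom ranked Friday shows quoted in the text above.
fridays = [
    (date(2013, 11, 15), 3_700),
    (date(2013, 11, 22), 1_600),
    (date(2013, 12, 6), 6_600),
]

for (weekday, month), value in sorted(replacement_values(fridays).items()):
    print(f"weekday={weekday} month={month}: {value}")
```

With so few data points per bucket the values swing a lot from month to month, which is one reason to settle on a rounded working figure while the full calculation is in progress.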

So, for now, let’s call the replacement value for Friday nights 5,000 (we’re currently calculating replacement values for each Day/Month combination). If we plug 5,000 in for those missing episodes, instead of ignoring them, our weighted tweets formula would have predicted 8,552 tweets, an error of just 5.2% compared to the previous 12.2%.
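The difference between ignoring missing episodes and plugging in a replacement value can be sketched as follows. This is not our actual weighted tweets formula: the linear recency weights (1 for the oldest episode, rising by 1 each episode), the function name and the episode figures are all assumptions chosen to keep the arithmetic easy to follow.

```python
def weighted_tweets(episodes, replacement=None):
    """Recency-weighted average of tweets per episode.

    episodes: tweet counts ordered oldest to newest, None where data is missing.
    replacement: value to substitute for missing episodes; if None, missing
    episodes are skipped entirely (the earlier behaviour).
    """
    total = 0.0
    weight_sum = 0.0
    for weight, tweets in enumerate(episodes, start=1):
        if tweets is None:
            if replacement is None:
                continue  # ignore the episode and its weight
            tweets = replacement
        total += weight * tweets
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no usable episodes")
    return total / weight_sum


# Hypothetical show with half of its recent episodes missing.
episodes = [None, 12_000, None, 9_000]

print(weighted_tweets(episodes))                     # 10000.0
print(weighted_tweets(episodes, replacement=5_000))  # 8000.0
```

Skipping the gaps means only the episodes strong enough to be reported get counted, so the estimate drifts upwards; filling them with the replacement value pulls it back down.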

We’re probably a few days away from properly integrating this into the prediction algorithm, but we do hold high hopes for this improvement.

Sometimes you can be too precise, or the problems of small samples

Another, perhaps short lived, tweak over the past 10 days was experimenting with weekly indexes rather than monthly. The idea here was that such a system would better account for weeks such as ‘sweeps’, when networks tend to put on their strongest programming, as well as for the effect of major events such as the Super Bowl, Oscars, Golden Globes and so on. Particularly with the wrestling shows, such a method would also account for the cycle around events such as WrestleMania and SummerSlam, which anecdotally appear to signal increased tweeting. For much of our older data, where we have 50-100 shows per week, weekly indexes appear to offer small incremental advantages in forward-predictions over the monthly indexes utilised in the above predictions.

However, when predicting for the future (where weekly indexes can be more volatile), and for the last few months of data when we often only have 25-50 shows in a week, the volatility of this index negated such advantages and actually showed a decrease in prediction accuracy. In particular, weeks 52 and 53 of 2013 and week 1 of 2014 (i.e. over the Christmas break) saw such low weekly indexes that any show that had mild success during those weeks was suddenly forecast to have a season high in its next episode, when in actuality the index was primarily impacted by a few extremely low days over the actual Christmas holiday (particularly when we exclude sporting events, i.e. the NBA and College Football bowls which dominate American broadcasting over the Christmas period).
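The Christmas effect is easy to reproduce with a toy calculation. The sketch below assumes an index is a simple multiplier (1.0 for a typical period) that is divided out of the observed tweets and then re-applied for the period being forecast; the index values, the tweet count and the function name are invented for illustration and are not taken from our data.

```python
def forecast(tweets, index_then, index_next):
    """Divide out the index of the observed period, then apply the next period's index."""
    return tweets / index_then * index_next


observed = 8_000  # hypothetical tweets for an episode aired over Christmas

# A weekly index dragged down by a few very quiet days.
weekly = forecast(observed, index_then=0.4, index_next=1.0)

# A monthly index, where the quiet days are diluted by the rest of December.
monthly = forecast(observed, index_then=0.8, index_next=1.0)

print(round(weekly), round(monthly))
```

The same episode produces a forecast twice as large under the weekly index, purely because a handful of holiday days pushed that week’s index down.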

It may be that eventually we can recover those incremental gains, but for current purposes we plan to return to the monthly indexes. This also means we can more easily predict shows for the current month, where the index has already been somewhat established, as opposed to ‘predicting’ the monthly index itself based on historical data and trends.

Some predictions for the coming week

So, with all that said, let’s head back to the spreadsheets and give our top 10 predictions for the coming week:

Twitter Excitement Index (TEI)

Finally, in the tradition of all good TV shows, we’ll leave you with a teaser for the coming week. Based on work by Brian Burke at Advanced NFL Stats, we have developed a measure we’re calling the “Twitter Excitement Index”, which essentially measures the peaks and troughs in Twitter conversation to establish a measure of excitement. That is, a show may be averaging 3,000 tweets / minute, but have a time series graph which hovers slightly above and below the average over the course of an episode; in this case, it has attracted a large Twitter audience, but that audience doesn’t seem to be particularly provoked by the content of the show to tweet. By contrast, a show averaging 300 tweets / minute may be spiking continuously throughout the episode, showing that people were reacting to the events as they happened (as we have seen, for example, with Big Brother 15 in our previous work). By again stripping out those factors related to audience size, we are producing a metric which analyses the reaction of Twitter users to the content of the show. But more on this next week...
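Until then, here is one simple way the “peaks and troughs” idea could be turned into a number. This is only a sketch and not the TEI formula itself: it assumes excitement is the average absolute minute-to-minute change in tweets, divided by the show’s mean tweets per minute so that audience size drops out. The two series are invented to mirror the examples above.

```python
def excitement(tweets_per_minute):
    """Average absolute minute-to-minute change, scaled by the mean tweet rate."""
    if len(tweets_per_minute) < 2:
        raise ValueError("need at least two minutes of data")
    mean_rate = sum(tweets_per_minute) / len(tweets_per_minute)
    movement = sum(
        abs(later - earlier)
        for earlier, later in zip(tweets_per_minute, tweets_per_minute[1:])
    )
    return movement / (mean_rate * (len(tweets_per_minute) - 1))


# Large audience hovering around 3,000 tweets / minute.
flat_big = [3000, 3050, 2950, 3000, 3020, 2980]

# Small audience averaging 300 tweets / minute, spiking throughout.
spiky_small = [100, 700, 150, 600, 50, 200]

print(round(excitement(flat_big), 3), round(excitement(spiky_small), 3))
```

Despite having a tenth of the volume, the spiky show scores far higher, because the measure rewards movement relative to a show’s own baseline rather than raw tweet counts.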