Sunday, 13 October 2013

UK House Prices vs. Household Incomes

There seems nothing the British press likes more than a good house price story. Both the OECD and 'The Economist' studies quoted in The Telegraph recently use the house price to household income ratio as a consideration of affordability and sustainability of the market. Most often this is a ratio of average house prices to average incomes; I keep wondering if this ratio is itself a function of income? What follows is a first (and not that rigorous!) look at this idea.

To simplify I want to assume the top 1% of earners will consider the top 1% of houses and so forth down to the bottom 1% of earners who will be assumed to pay the bottom 1% of house prices. To approximate this I compare the percentiles of the two distributions. Of course, this is all a gross oversimplification but it does provide a tractable starting point and the results are certainly interesting.

For the income data I used the ready prepared figures from an excellent Guardian article published last year. The data is available here. The details on how the data is 'equivalised' are all on the Guardian site and in the original source. Let's take a quick look at the percentiles of the income data:
It looks a bit different to the Guardian's graph. I think this is because I'm looking at gross income instead of net.

The house prices took a little more work but fortunately the Land Registry has a complete record of house transaction prices. Here are the percentiles of their distribution between 2010-2011 (with the period chosen to match the income data):
To the naked eye the distributions look fairly similar and pretty much as we'd expect. To get a better sense of things we need to look at the ratios:
Note: The 1% point is out of view at c. 23.
Whilst the ratio is fairly constant, at around 8, some variation is apparent. The interesting things are really happening at either end of the graph. At the low end perhaps there is some minimum price for which a house can reasonably be built, if the same hard minimum didn't apply to household income at the low end the ratio would spike up as we observe here. From around 10% there is a steady increase in the ratio. I have two ideas: perhaps percentage of income available to spend on a house increases with income or perhaps the amount of 'bubble' in the market increases as one heads towards 'prime investment opportunities'?!

As always, thanks for reading and I'd be grateful for your opinions!

For those interested please find the R code below or on github.

Monday, 10 June 2013

Microsoft Office Metadata with R

Sometimes I need to retrieve various items of metadata from Microsoft Office files. For the 'old-style' (i.e. '.doc' and '.xls') files perhaps a solution in python, such as hachoir, was the best way to extract this data from the ole2 file format - although perhaps it was always possible in R too? When I started digging around for a similar solution for the 'new-style' (i.e. '.xlsx' and '.docx') files I was pleasantly surprised to find the file structure is much more open, indeed it is called Office Open XML. I am by no means an expert but basically it is a zipped set of xml type files. This makes getting at the metadata so much easier. I found a simple example in python by zeekay on stack overflow. My code below is an unashamed replication of this in R.

Friday, 26 April 2013

The Financial Crisis on Tape Part II

EDIT: I am aware of some browsers failing to load the github code below. I will try to improve this as soon as possible. Until then it may work better on than at Finally all the code is available at github.

First, I would like to welcome all of you reading this via R-Bloggers. I have been a user of this superb resource for years and I'm proud that I now have the opportunity to contribute!

In The Financial Crisis on Tape Part I I demonstrated how inter-sector and inter-asset correlation changed during the Financial Crisis by using a heat map to visualise such correlations. Today, I will use these heat maps as individual frames to create a video. I hope the video will provide a nice trailer to the Financial Crisis if not completely catching it on Tape.

As usual, some of my readers may be more interested in the finance-side than the R-side of this post (although I hope to inspire as much migration from excel based analytics as possible). Therefore, I like to provide the results at the start. I know it may not be as humorous as some you tube clips, but it is only 40 seconds long. So, enjoy!

The following assets are shown in the video:
SPY: S&P 500
QQQ: Nasdaq 100
EEM: Emerging Markets
IWM: Russel 2000
EFA: EAFE (Europe, Australasia and Far East)
TLT: 20 Year Treasury
IYR: U.S. Real Estate

The chart on the right is hopefully familiar enough; it is a heat map showing the correlation between a number of assets. As the video progresses, the six month calculation period for the correlations moves forward in time. We start with a heat map of the correlation computed over the first 6 months of 2005 and end with the heat map of the correlation computed over the most recent 6 months of data. The changes in the correlation structure - especially the increases in correlation during periods of stress - should be evident.

The graph on the left is intended to provide the context, as some of us may not recall the exact market movements of the assets since 2005. The grey box highlights the six month period of each asset's time series which was used to compute the correlations shown in the heat map to the right. This should make it easier to link the changes in the correlation to market conditions. The coloured dots represent the last day of this backward looking window.

If you're still reading, I guess you might like to know how the video was made? You create all the individual frames as png files using R. Then you combine the frames into an mpeg using ffmeg (windows download). Nothing easier than that?!
It's really not as complicated as it sounds. For more detail, please refer to this previous R-Blog - thanks Civil Statistician. The animation package could provide a nice introduction too. But now let's take a walk through the R-code to see how to make the individual png files.

First, we need to load the data. I have used the QuantMod package and I'd like to thank Systematic Investor for his useful posts on the subject.

Next, we compute the historical returns. These historical returns will be used to compute the correlations. We also need to keep the price time series to produce the graph on the left of the video. Finally we massage the results a little to make the calculations a easier and get the assets plotted in the desired order.

Now we have all the data we need to compute the rolling-correlations. The comments within the code provide details for each step.

Then, the fun part: plotting! This relies heavily on ggplot2. Perhaps unsurprisingly PlotCorrelsForThisDate takes a date as its argument, looks up the correlation data (which we obtained earlier) and plots a heat map.

We create the price time series graph with the ProduceFilmStill and plot it next to the correlation heat map using the gtable package to line everything up nicely. Finally, we save the plot as a png in a directory. Using numbers rather than dates to name the files makes using ffmpeg super-easy because the argument of ProduceFilmStill is an integer referencing a position in and not the date itself.

To finish, we run this function for all the dates for which we have data and use the ffmpeg command shown (at the command line) to create the mpeg. Done!

And well done for making it this far; I hope you enjoyed it. As I'm still new to blogging, any questions or comments, especially suggestions how to improve my posts, are much appreciated!

For further reading, here's the link to the static version of today's video from The Financial Crisis on Tape Part I. It's worth a read if you haven't seen it yet. While it doesn't show the changes in correlation with the same granularity as the video, it might be easier to compare disparate periods of time when one can see them all at once. You be the judge!

Sunday, 3 March 2013

All That Glitters

"The law itself follows Gold" Sextus Propertius.

But what about stocks, bonds and real estate? Do they follow Gold too? Using the correlation data from my previous post, The Financial Crisis on Tape Part I, this question is easy to investigate. Indeed, it's possible to see the correlations in last week's graph:

where GLD is a Gold tracking fund. Fortunately, the excellent ggplot2 package means that in two slightly chunky lines we can create a bespoke heat-map to answer this question and we don't have to strain our eyes finding them in the graph above!
SPY: S&P 500
QQQ: Nasdaq 100
EEM: Emerging Markets
IWM: Russel 2000
EFA: EAFE (Europe, Australasia and Far East)
TLT: 20 Year Treasury
IYR: U.S. Real Estate
Here are the two lines used to create it (using data obtained in The Financial Crisis on Tape Part I):
In quieter times it seems that Gold is reasonably un-correlated to the other sectors, but in times of stress, strong correlations appear as expected. The really interesting point is the strong negative correlation seen in Q3 2008. Let's take a look at the price time series of the S&P, Nasdaq and Gold during '07-'08:
In terms of the units, these are the prices of representative funds not the assets themselves (see the links to Yahoo Finance which I've provided above). As the prices are similar and fit nicely I've not used any scaling. The negative correlation is probably related to Gold's safe haven status. As stocks started dropping, a strong bid for Gold supported its price. This can result in a negative correlation. The graph was created using the code below.

Saturday, 23 February 2013

The Financial Crisis on Tape Part I

Hello and welcome to Joe's Data Diner's first ever post!

Today, I will touch on both R and Finance, but I'll try and make it accesible for those with an interest in either and not just Quants like myself!

Almost everyone is now aware that asset correlation increases in times of stress. This topic has made its way into the most esteemed journals and even the popular press. However, it remains a defining feature of the '07-'08 financial crisis and, for those working in the Credit markets, it remains a topic of paramount importance.

My idea of graphing the time evolution of the correlation structure between assets is not revolutionary. However, my aim here is to show how easy it is to reproduce such graphics using only the free R software.

In Part I, I will produce a static graph showing quarterly measurements of the correlation structure and, in Part II, I will create a video showing the daily evolution in the corrrelation.

As some of you may be more interested in the results than the R code, let's look at them straight away!

The assets shown are as follow:
SPY: S&P 500
QQQ: Nasdaq 100
EEM: Emerging Markets
IWM: Russel 2000
EFA: EAFE (Europe, Australasia and Far East)
TLT: 20 Year Treasury
IYR: U.S. Real Estate
GLD: Gold

Even from the small thumbnail, the increase in correlation during the crisis, and the decrease by 2013 are immediately evident. I'm certain, a detailed study could yield many more observations - please, feel free to add coments, if any you observe anything! 

However, I will now jump into the code! I'm new to blogging in general and R-blogging in particular, so I've decided well commented code is probably the clearest way to explain the majority of the process while returning to prose to discuss some of the more interesting coding decisions. 

First, let's get the data. Thanks to Systematic Investor for teaching (via his blog) me how to use QuantMod:

Next let's compute some correlations! Later we'll use a rolling window to produce daily updates for the video, but for now lets look at the correlation matrix for each quarter:
And now we're ready to use another of Hadley's packages the famous ggplot (2) to create the first iteration of the graph:
I make two small changes to improve the visual impact. First, the series are re-ordered to put the ones which tend to be negatively-correlated to one side, making for a less cluttered looking graph. Secondly, I feel that the important range of correlations from +0.5 - +1.0 is somewhat difficult to analyse as the colours are very similar. Inspired by a topographical map of Lanzarote (where I've just been cycling), I thought I'd change the very top of the correlation range to purple, increasing the definition in this correlation range:
and, finally, we have the graph shown at the top! However, I don't like how we've sacrificed the pleasant visuals of the mainly red palette to obtain this greater definition. If anyone has any suggestions how to improve this, please let me know!

I hope you enjoyed the opening night at Joe's Data Diner; tips (non-monetary) are always welcome and I hope you'll return to see the film in"The Financial Crisis on Tape Part II" in the next couple of weeks!

Sunday, 4 November 2012

Hello World!

Eventually this will be a blog about a reformed physicist's forays into data analysis in the world of finance. I'll be using R, Python, d3 and anything else I can get my hands on to serve up some tasty nuggets of data!