Sunday, 28 October 2012

Passing networks

My dad sent me a link to an eBook called Objective Barcelona: How To Beat The Most Powerful Team In The World. It was free a few days ago; sorry for not advertising it sooner! It was quite an interesting read, though it's more a very extended essay than a book. It was also very editorial, in that it didn't have much data to back up its arguments.

For me, the most interesting topic in the book was the discussion of Sergio Busquets. It suggested that cutting off the passing lines from the defence to Busquets forces Xavi to drop deep to pick up possession. Sometimes the two swap positions, which is good for the opposition because Xavi is more dangerous higher up the field. This got me thinking about passing networks.

I've seen some diagrams of passing networks from the MCFC data before.

I'm a big fan of simplicity, so I think a first-order attempt should simply plot the passes. I'm sure plenty of people have done this already, but it's still informative.
Zero-order figure. Blue lines show assists (passes leading directly to a goal). All passes, successful and failed, are included here.

My next idea was to round the xy-coordinates to boxes, effectively binning the start and end points of each pass. It's not until we round to the nearest 5% (10% in the y-direction, to keep the boxes roughly in proportion) that structure starts to appear. With 10x20% boxes we've probably gone a bit too far for this particular data set.
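A minimal sketch of the rounding step, assuming each pass is available as an (x_start, y_start, x_end, y_end) tuple in Opta's percent coordinates. The function names and sample passes are hypothetical; the sketch just counts how many passes connect each pair of rounded boxes, which could then drive the width or colour of the lines in a plot.

```python
from collections import Counter


def round_to_box(x, y, dx=5.0, dy=10.0):
    """Round percent coordinates to the nearest multiple of dx (along) and dy (across).

    Python's round() sends exact ties to the nearest even multiple.
    """
    return round(x / dx) * dx, round(y / dy) * dy


def pass_network(passes, dx=5.0, dy=10.0):
    """Count passes between rounded boxes.

    passes: iterable of (x_start, y_start, x_end, y_end) tuples in percent (assumed layout).
    Returns a Counter mapping (start_box, end_box) -> number of passes.
    """
    counts = Counter()
    for x0, y0, x1, y1 in passes:
        counts[(round_to_box(x0, y0, dx, dy), round_to_box(x1, y1, dx, dy))] += 1
    return counts


# Hypothetical passes, not taken from the real data set.
passes = [(21.0, 48.0, 62.0, 71.0), (19.0, 52.0, 61.0, 69.0), (40.0, 10.0, 45.0, 30.0)]
network = pass_network(passes)
for (start, end), n in network.most_common():
    print(start, "->", end, n)
```

The first two passes start and finish in the same rounded boxes, so they merge into a single line with a count of two; raising dx and dy merges more passes, which is the over-smoothing seen with the 10x20% boxes.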

Rounding to the nearest 5x10% box.
Rounding to the nearest 10x20% box.

I think it's fair to say that Bolton play more directly: even in the non-rounded plot you can see the long passes towards the attacking line. It definitely looks like most of their lines run along the length of the pitch (left to right) rather than across it (up and down), especially compared to Man City.

This is another diagram that will be more interesting with more data; hopefully the main lines of passing will then stand out. There might also be ways to make it more interesting: breaking the pitch into defined boxes, with more detail in midfield; player-by-player analysis; and some sort of directional analysis to show whether a pass is arriving at or departing from a particular point.

Tuesday, 9 October 2012


A rose plot is often used in meteorology to show the direction and strength of winds. Since Opta were kind enough to include the angle of each pass (in radians, no less) relative to the direction of play, I thought I'd create some "pass roses". These show the direction and distance of each pass a player made in the game.
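A rough sketch of how a pass rose could be built with NumPy and matplotlib, assuming the pass angles (radians, 0 towards the opposition goal) and pass lengths are already in arrays. The length bands, the number of sectors, where 0 is drawn and the sense of rotation are all assumptions, and the split into successful and failed passes is left out for brevity.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np


def pass_rose(angles, lengths, n_sectors=16, length_edges=(0, 10, 20, 30, 100)):
    """Draw a stacked "pass rose" and return (figure, counts).

    angles: pass angles in radians, 0 = towards the opposition goal.
    lengths: pass lengths in the same unit as length_edges (yards assumed here).
    counts[i, j] is the number of passes in length band i and direction sector j.
    """
    angles = np.mod(np.asarray(angles, dtype=float), 2 * np.pi)
    edges = np.asarray(length_edges, dtype=float)
    # Clip so passes longer than the last edge land in the top band instead of vanishing.
    lengths = np.clip(np.asarray(lengths, dtype=float), edges[0], edges[-1])
    sector_edges = np.linspace(0.0, 2 * np.pi, n_sectors + 1)
    counts, _, _ = np.histogram2d(lengths, angles, bins=[edges, sector_edges])

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.set_theta_zero_location("N")  # draw 0 rad (attacking direction) at the top
    # Sense of rotation is an assumption; try ax.set_theta_direction(-1) if left/right look swapped.
    width = sector_edges[1] - sector_edges[0]
    centres = sector_edges[:-1] + width / 2
    bottom = np.zeros(n_sectors)
    for i in range(len(edges) - 1):
        label = f"{edges[i]:g}-{edges[i + 1]:g}"
        ax.bar(centres, counts[i], width=width, bottom=bottom, edgecolor="k", label=label)
        bottom += counts[i]  # stack the next length band on top
    ax.legend(loc="lower left", bbox_to_anchor=(1.05, 0.0), fontsize="small")
    return fig, counts


# Hypothetical passes: two forward, one played almost straight back.
fig, counts = pass_rose([0.1, 0.1, 3.0], [5.0, 25.0, 15.0], n_sectors=4)
# fig.savefig("pass_rose.png", bbox_inches="tight")
```

Each stacked bar is one direction sector, and its segments show how many passes of each length band went that way.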

I'm not convinced how much you should read into these plots, but that should never stop an armchair football pundit. Can't be worse than the MOTD lot ;). In these plots 0° points towards the opposition goal, and 180° points towards the defending goal.
  • Sergio Agüero mostly passes backwards, which makes sense since he's probably the most advanced player. Most of his failed passes (4 out of 6) go in the (vague) direction of the goal. Also, he never passes the ball further than about 23 yards.
  • Kolarov mostly passes infield (not surprising for a left back); Richards is similar, but mirrored.
  • Joe Hart passes short and wide much more often than Jääskeläinen. When going long, both keepers are fairly unsuccessful.
  • Jääskeläinen's rose shows that I haven't plotted the (successful) patch quite right, and I don't have time to work out a fix. The shading is only there for aesthetic purposes.

I'm not sure there is much future in pass roses; the lack of information about where each pass starts makes them a bit superficial. Over the course of the season, I suppose they might show that a player can't reliably pass in a particular direction ;). Pretty unlikely, though. Anyway, it was a bit of fun.

Monday, 8 October 2012

More maps...

This time I've done some maps of "duelling", which is a bit more difficult to classify. The supporting documentation specifies which event types make up each metric. "Ground duels won" uses take on, foul, tackle and smother events with the success attribute; "Ground duels lost" uses take on, foul, tackle, challenge and dispossessed events with the non-success attribute. These seem a bit asymmetrical: smother only counts towards duels won, while challenge and dispossessed only count towards duels lost. Aerial duelling is more straightforward: there is an aerial (duel) event, and the success attribute describes the outcome.

To create the maps below I took two kinds of event: ground duels won and aerial duels. If the outcome was a success, the position is added to a list for that team; if it was a failure, it is added to the opposition's list. I'm not completely convinced this is robust (because of the ground duel asymmetry), but there doesn't seem to be any double counting, and it gives a general idea of what is going on.
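The success/failure rule described above could be sketched as follows. The event dictionaries, their keys and the team labels are hypothetical stand-ins for the parsed Opta feed, and rotating a failed duel's coordinates into the opposition's frame assumes each team's coordinates point towards the goal it attacks (if the y-axis is absolute, only x should be flipped). The symmetric difference of the two ground-duel definitions picks out the event types behind the asymmetry.

```python
# Event-type sets as listed in the supporting documentation.
GROUND_DUELS_WON = {"take on", "foul", "tackle", "smother"}
GROUND_DUELS_LOST = {"take on", "foul", "tackle", "challenge", "dispossessed"}
AERIAL = "aerial"


def duel_locations(events, home, away):
    """Credit each duel's location to the team that won it.

    events: iterable of dicts with hypothetical keys "team", "type", "x", "y", "success".
    A success goes to the event's team; a failure goes to the opposition, with the
    point rotated by 180 degrees into the opposition's frame (assumed coordinate convention).
    """
    duel_types = GROUND_DUELS_WON | {AERIAL}
    locations = {home: [], away: []}
    for ev in events:
        if ev["type"] not in duel_types:
            continue  # not a duel, e.g. a pass
        if ev["success"]:
            locations[ev["team"]].append((ev["x"], ev["y"]))
        else:
            opponent = away if ev["team"] == home else home
            locations[opponent].append((100.0 - ev["x"], 100.0 - ev["y"]))
    return locations


# Event types that appear in only one of the two ground-duel definitions.
print(sorted(GROUND_DUELS_WON ^ GROUND_DUELS_LOST))  # ['challenge', 'dispossessed', 'smother']

events = [
    {"team": "MCFC", "type": "tackle", "x": 30.0, "y": 40.0, "success": True},
    {"team": "BWFC", "type": "aerial", "x": 70.0, "y": 20.0, "success": False},
    {"team": "BWFC", "type": "pass", "x": 50.0, "y": 50.0, "success": True},
]
locations = duel_locations(events, "MCFC", "BWFC")
```

Because each duel event is only ever appended to one list, this rule cannot double count, although it inherits whatever asymmetry is in the event definitions.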

Each time I histogram (bin) something, I'm surprised by how sparse the data is. As I suggested previously, I'm probably binning too finely. If the data for the full season is released, there should be much more to work with.

Anyway, a final note before the plots: I've over-plotted the locations of the events with circles and pluses, just to convince myself I'd got the signs correct.

Sunday, 7 October 2012

Interaction maps

I have created some 2D histograms of events for each player, or "interaction maps" (as I'm going to call them). The events are binned into 5x10% boxes using numpy.histogram2d. These maps only include events involving each player, so they don't necessarily show a player's positioning throughout the match. A true positional "heat map" can't be created from the current data set, which is a bit disappointing since I wanted to look at player positions in depth.
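For reference, a minimal sketch of the binning with numpy.histogram2d, assuming the x and y coordinates (in percent) of one player's events are in two sequences. Passing explicit bin edges gives 20 boxes along the pitch and 10 across it; the sample events are hypothetical.

```python
import numpy as np

# Bin edges in percent: 5% boxes along the pitch (20 bins), 10% boxes across it (10 bins).
X_EDGES = np.arange(0, 101, 5)
Y_EDGES = np.arange(0, 101, 10)


def interaction_map(xs, ys):
    """Count one player's events in each 5x10% box.

    xs, ys: event coordinates in Opta percent units.
    Returns a (20, 10) array; counts[i, j] is x-bin i, y-bin j.
    The last bin in each direction includes its right edge, so 100% is counted.
    """
    counts, _, _ = np.histogram2d(xs, ys, bins=[X_EDGES, Y_EDGES])
    return counts


# Hypothetical events for one player.
counts = interaction_map([2.0, 3.0, 99.0, 100.0], [5.0, 5.0, 95.0, 100.0])
# To draw it: plt.imshow(counts.T, origin="lower", extent=(0, 100, 0, 100))
```

Because the array is indexed x-first, it needs transposing before imshow, which expects rows to run along the vertical axis.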

It's probably hard to draw conclusions from a map of events, except perhaps about where a player positions himself to get involved in the game. A few things stand out:
  • Kevin Davies is involved in a lot of events between the midfield and defensive lines. I assume he's dropping deep to pick up the ball, or receiving long balls. On the Man City side, it looks like Lescott and Gareth Barry are most likely to be dealing with this threat.
  • The Man City midfield is quite asymmetrical, favouring the left-hand side and relying on Richards to play quite far up the pitch.
The number of events is given in brackets after each player's name.

Wednesday, 3 October 2012

More on pass completion

I've done some more work on my earlier pass completion histograms. The first plot shows the pass completion percentage for each team, and the second breaks it down by individual player.

Thanks to Neil for suggesting a percentage; it does look a bit tidier. However, you do lose the magnitude information: a bin containing a single completed long-range pass shows up as 100%, which skews the picture a bit. This is exacerbated by the relatively small number of passes (small for statistical sampling rather than for a football match; or perhaps I'm just using too many bins). To keep a sense of scale, I've added the total numbers of successful and failed passes to the legends.
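A sketch of the percentage calculation that keeps the counts alongside it, assuming each pass has a value to bin (pass length in yards is used in the example, which is an assumption) and a success flag. Empty bins become NaN instead of 0%, and the hypothetical data shows how a lone completed pass produces a 100% bin.

```python
import numpy as np


def completion_by_bin(values, success, edges):
    """Return (percent_completed, completed, total) for each bin.

    values: the per-pass quantity being binned (pass length in the example; an assumption).
    success: True for a completed pass.
    Bins with no passes are NaN rather than a misleading 0%.
    """
    values = np.asarray(values, dtype=float)
    success = np.asarray(success, dtype=bool)
    total, _ = np.histogram(values, bins=edges)
    completed, _ = np.histogram(values[success], bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):  # silence 0/0 in empty bins
        pct = np.where(total > 0, 100.0 * completed / total, np.nan)
    return pct, completed, total


# Hypothetical pass lengths (yards) and outcomes.
edges = [0, 10, 20, 30, 40, 50]
pct, completed, total = completion_by_bin([5, 8, 12, 45], [True, False, True, True], edges)
for lo, hi, p, n in zip(edges[:-1], edges[1:], pct, total):
    print(f"{lo}-{hi}: {p:.0f}% of {n} passes")
```

Printing the count next to each percentage is the same idea as putting the pass totals in the legend: the 40-50 bin reads 100%, but only one pass stands behind it.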

Apologies for the size of the second plot, there's a lot of information there.

Figure 1: Pass completion for the teams

Figure 2: Pass completion for each player. 

Monday, 1 October 2012


I wrote a routine to draw a pitch using matplotlib, so that I can plot the spatial distribution of a variable. This also highlighted the problem with coordinates in units of percent: when I tried to draw the centre circle it came out as an ellipse, because I had changed the aspect ratio of the pitch (one percent along the pitch is a longer distance than one percent across it).
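One way around the ellipse, sketched under the assumption of a 105 x 68 m pitch (real pitches vary): convert to metres before drawing and fix the axes' aspect ratio. In percent units the 9.15 m centre-circle radius would be about 8.7% along the pitch but about 13.5% across it, which is why a circle drawn in percent comes out as an ellipse. The function and constant names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Rectangle

PITCH_LENGTH = 105.0  # metres; an assumed size, real pitches vary
PITCH_WIDTH = 68.0  # metres; also assumed
CENTRE_CIRCLE_RADIUS = 9.15  # metres (10 yards)


def percent_to_metres(x_pct, y_pct, length=PITCH_LENGTH, width=PITCH_WIDTH):
    """Convert Opta percent coordinates to metres on a pitch of the given size."""
    return x_pct / 100.0 * length, y_pct / 100.0 * width


def draw_pitch(ax=None, length=PITCH_LENGTH, width=PITCH_WIDTH):
    """Draw the outline, halfway line and centre circle in metres."""
    if ax is None:
        _, ax = plt.subplots()
    ax.add_patch(Rectangle((0, 0), length, width, fill=False, edgecolor="k"))
    ax.plot([length / 2, length / 2], [0, width], color="k")
    ax.add_patch(Circle((length / 2, width / 2), CENTRE_CIRCLE_RADIUS, fill=False, edgecolor="k"))
    ax.set_xlim(-2, length + 2)
    ax.set_ylim(-2, width + 2)
    ax.set_aspect("equal")  # same scale on both axes, so the circle stays circular
    return ax


ax = draw_pitch()
x, y = percent_to_metres(50.0, 50.0)  # centre spot
ax.plot(x, y, "ko")
```

Keeping the data in percent and converting only at plot time means the same events can be redrawn for a different pitch size by changing two numbers.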

Anyway, here is a collection of Andy Warhol-inspired pitches...

Sunday, 30 September 2012


Annoyingly, the Opta position coordinates are in units of percent. In the x-direction, 0% is the defending goal line and 100% is the opposition goal line. In the y-direction (cross-field), the "near" side is 0% and the "far" side is 100%.

I can see the reason for using percent rather than a typical distance unit (metres, yards, chains, nanometres): not every pitch has the same dimensions. However, it means you need to be very careful when calculating anything related to distance.
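A minimal sketch of a distance calculation that converts percent to metres first, assuming a 105 x 68 m pitch by default; pass the real dimensions when they are known. The yard conversion uses the exact definition 1 yard = 0.9144 m, and the function name is hypothetical.

```python
import math

METRES_PER_YARD = 0.9144  # exact, by definition


def pass_length(x0, y0, x1, y1, length=105.0, width=68.0):
    """Distance in metres between two points given in Opta percent coordinates.

    length and width are the pitch dimensions in metres (105 x 68 is an assumed default).
    Each axis is scaled separately before Pythagoras, because 1% along the pitch
    is not the same distance as 1% across it.
    """
    dx = (x1 - x0) / 100.0 * length
    dy = (y1 - y0) / 100.0 * width
    return math.hypot(dx, dy)


along = pass_length(40.0, 50.0, 50.0, 50.0)  # 10% along the pitch
across = pass_length(50.0, 40.0, 50.0, 50.0)  # 10% across the pitch
print(f"{along:.1f} m vs {across:.1f} m "
      f"({along / METRES_PER_YARD:.1f} yd vs {across / METRES_PER_YARD:.1f} yd)")
```

On the assumed pitch, a 10% move along the pitch is 10.5 m but the same 10% across it is only 6.8 m, so taking Pythagoras directly on the percentages would distort every diagonal pass.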

Anyway, groan....