Ludovic Tavernier and I have spent quite some time on our collaboration project about the rise of tennis legend Boris Becker. One key element of our viz is something we call ‘bump trees’: a two-layer chart that combines (horizontal) tournament trees in the first layer with (vertical) bump lines in the second layer, which connect a player’s different tournament trees (click the image to play with the interactive version and to read our entire data essay on Tableau Public).
The first step to create such bump trees is to build the tournament trees. In this post we want to share the steps and the process we went through to do this.
For our visualization we used Jeff Sackmann’s huge database (find the data here).
Let’s have a look at the data and at one randomly chosen tournament:
Most tennis tournaments are organized as single-elimination tournaments.
What does this mean?
Look at the draw_size in column D: it is 32, meaning that there are 32 players in the tournament. Then look at match_num in column G: the maximum match_num is 31. Every match eliminates exactly one player, so you need 31 matches to eliminate 31 losers and leave the one lucky player who wins the tournament. This rule holds for every draw size: a draw of n players needs n - 1 matches.
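As a quick sanity check of this rule, here is a minimal Python sketch (not part of the Tableau workbook). The function name `matches_needed` is made up for illustration; it simply plays the rounds and counts the matches.

```python
def matches_needed(draw_size: int) -> int:
    """Count the matches of a single-elimination draw by playing it round by round."""
    players = draw_size
    matches = 0
    while players > 1:
        matches += players // 2   # every match in this round removes one player
        players //= 2             # only the winners advance
    return matches


for size in (16, 32, 64, 128):
    print(size, matches_needed(size))
```

For every power-of-two draw size the result is the draw size minus one, which is exactly the maximum match_num you see in the data.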
The Data Prep
To create our tournament trees we first needed to do some data prep. Notice that each record in the data has a winner and a loser. In our tournament tree we wanted to display both players individually, so we pivoted the data and got a field [Player Type]. Based on [Player Type] we were then able to calculate a [Player Id] and a [Player Name].
We also densified the data to get curved lines in our trees using this model:
This step added the field [Position Type] to our data and created 49 points for each record, enough to draw the shape of the curve.
For example, after data prep a tournament with a draw size of 128 had the following number of records:
127 x 2 (Player Type) x 49 (Position Type) = 12,446 records
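The pivot and the densification can be sketched in a few lines of Python. This is only an illustration under assumptions, not our actual data prep: the match list is synthetic, the ids are invented, and the 49 [Position Type] values are assumed to run from -6 to 6 in steps of 0.25. The column names winner_id and loser_id follow the source data.

```python
from itertools import product

# Assumption: 49 evenly spaced values from -6 to 6 (step 0.25).
POSITION_TYPES = [-6 + 0.25 * i for i in range(49)]


def pivot_and_densify(matches):
    """Turn one row per match into one row per match, player type and position type."""
    records = []
    for match, player_type, position_type in product(
        matches, ("winner", "loser"), POSITION_TYPES
    ):
        records.append(
            {
                "match_num": match["match_num"],
                "player_type": player_type,
                "player_id": match[f"{player_type}_id"],
                "position_type": position_type,
            }
        )
    return records


# Synthetic 128 draw: 127 matches with invented player ids.
matches = [
    {"match_num": n, "winner_id": 2 * n, "loser_id": 2 * n + 1}
    for n in range(1, 128)
]
records = pivot_and_densify(matches)
print(len(records))
```

The record count matches the arithmetic above: 127 matches, two player types and 49 position types.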
When we started working on the model to calculate the tournament trees, we quite soon realized that the whole flow, including every position for every player id, was already encoded in match_num.
The challenge was to convert this first draft into a scalable algorithm.
And that’s the result:
So let’s walk through this model step by step:
Step 1: Nb Round and Round Id
First of all, the positioning depends on the rounds of a tournament:
To calculate [Round Id] you first need the total number of rounds [Nb Round].
In a single-elimination tournament whose draw size is a power of two (what we call an exponential structure), the number of games per round doubles with every step back from the final: 1 game in the final, 2 games in the semis, 4 games in the quarter-finals, etc. That’s why you can use the base-2 logarithm to calculate [Nb Round].
For a draw size of 128, for example: 127 (max. Match Num) + 1 = 128 players = 2^7, so 7 rounds.
The [Round Id] for every match can then be calculated based on [Nb Round] and [Match Num].
We will need [Round Id] and [Nb Round] for the positioning on the x-axis and the y-axis.
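To make the logic of Step 1 concrete, here is a minimal Python sketch. It is an illustration, not the calculated fields from the workbook, and it rests on one loud assumption: match numbers run sequentially from 1 in the first round up to the final, with no gaps. The function names are made up.

```python
def nb_round(max_match_num: int) -> int:
    """Number of rounds: log2 of the draw size (max match number + 1)."""
    draw_size = max_match_num + 1
    # For a power of two, bit_length() - 1 is the exact base-2 logarithm.
    return draw_size.bit_length() - 1


def round_id(match_num: int, max_match_num: int) -> int:
    """Round of a match (1 = first round), ASSUMING sequential numbering by round."""
    remaining = max_match_num + 1 - match_num   # matches still to play, plus one
    return nb_round(max_match_num) - (remaining.bit_length() - 1)


print(nb_round(127))       # 128 draw
print(round_id(17, 31))    # 32 draw, first match after the 16 first-round matches
```

Using integer bit lengths instead of a floating-point logarithm avoids rounding problems at the round boundaries. If your data numbers the matches differently, the round has to be derived in another way.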
Step 2: Position
The position of a player in the tree depends on three factors: [position.Round], [position.Match] and [position.Player].
[position.Round] can be calculated based on [Nb Round] and [Round Id]:
To calculate [position.Match] we first need a match id within each round, [Match ID.Round], which can be calculated based on [Match Num], [Nb Round], and [Round Id]:
This enabled us to calculate the position for each Match:
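For readers without the workbook at hand, a per-round match index could be sketched in Python as follows. This is a hedged illustration and not necessarily how the calculated field is written: it assumes that match numbers run sequentially from the first round to the final, and the function name is hypothetical.

```python
def match_id_in_round(match_num: int, max_match_num: int) -> int:
    """1-based index of a match inside its round, ASSUMING sequential numbering."""
    draw_size = max_match_num + 1
    remaining = draw_size - match_num
    # Number of matches in the round this match belongs to (16, 8, 4, 2, 1, ...).
    matches_in_round = 1 << (remaining.bit_length() - 1)
    # First match number of that round under the sequential-numbering assumption.
    first_match_of_round = draw_size - 2 * matches_in_round + 1
    return match_num - first_match_of_round + 1


# 32 draw: match 17 opens the second round, match 30 is the second semi-final.
print(match_id_in_round(17, 31), match_id_in_round(30, 31))
```

With an index like this, every round starts counting at 1 again, which is what you need to space the matches evenly within a round.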
The positioning of a player depends on the round in which a match takes place. For the first round we can position the two players of a match arbitrarily. We therefore used the [Player Id] and built an if-statement that returns 0 for the player with the lower Id and 1 for the other.
The positioning of a player in the next rounds is determined by the positioning in the first round. This position can be taken from the [Match Num] by calculating the minimum [Match Num] for every player as [Min_Match Id.Player]:
To get a consistent tournament tree, the player with the lower first-round [Match Num] always has to come first.
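The idea of a minimum match number per player can be illustrated with a small Python sketch. The four-player draw and the player ids below are invented for the example, and the function name is hypothetical; in Tableau this would be a calculation at the player level rather than a loop.

```python
def min_match_per_player(matches):
    """Map each player id to the lowest match number the player appears in."""
    first_match = {}
    for match_num, winner_id, loser_id in matches:
        for player_id in (winner_id, loser_id):
            first_match[player_id] = min(
                match_num, first_match.get(player_id, match_num)
            )
    return first_match


# Invented 4-player draw: (match_num, winner_id, loser_id)
matches = [(1, 101, 102), (2, 103, 104), (3, 101, 103)]
first_match = min_match_per_player(matches)

# In the final (match 3), order the two players by their first-round match.
final_order = sorted((101, 103), key=first_match.get)
print(first_match, final_order)
```

Player 101 started in match 1 and player 103 in match 2, so 101 is drawn first in the final, which keeps the lines from crossing.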
Based on this we were finally able to calculate [position.Player]:
[position] finally is just the sum of [position.Round], [position.Match], and [position.Player].
Resulting in these exemplary values:
Step 3: Y
Based on [position], [position_next], and [Sigmoid] we were now able to calculate the Y-values for our tournament trees.
[position_next] tells the two lines for the players of a match where to go:
Using ‘AVG’ lets the lines end centrally between the two players.
The last step in calculating the Y-values is to bring in the Sigmoid function:
Bringing it all together gives us our Y-values.
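A common way to combine these three ingredients is to let the sigmoid blend the current position into the next one. The Python sketch below shows that pattern; it is an assumption about the shape of the formula, not a copy of the calculated field, and it assumes [Position Type] runs from -6 to 6.

```python
import math


def sigmoid(t: float) -> float:
    """Standard logistic function: close to 0 at t = -6, close to 1 at t = 6."""
    return 1.0 / (1.0 + math.exp(-t))


def y_value(position: float, position_next: float, position_type: float) -> float:
    """Blend from position (left end of the curve) to position_next (right end)."""
    return position + (position_next - position) * sigmoid(position_type)


# Two players of one match at positions 0 and 1, both heading for the average 0.5.
for t in (-6, 0, 6):
    print(t, y_value(0.0, 0.5, t), y_value(1.0, 0.5, t))
```

At the left end each line sits almost exactly on the player's own position, and at the right end both lines meet close to the average of the two.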
Step 4: X
Based on the [Round Id] and our [Position Type] (the field we added in the densification step) we can calculate the positioning on the x-axis.
If we had only [Position Type] on columns, our flows for the different rounds would all be stacked on top of each other:
To ‘unstack’ our flows, we have to multiply them by [Round Id] and by the amplitude between maximum and minimum [Position Type] which is 12 in our case (6 – (-6)):
We also added a parameter [Gap for Text] to bring in some space between the flows for our labels.
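One plausible reading of this step, sketched in Python: each round is shifted to the right by its [Round Id] times the amplitude, plus the gap. The exact formula in the workbook may differ; the function name, the default gap of 0 and the -6 to 6 range are assumptions for the illustration.

```python
POSITION_TYPE_MIN = -6.0
POSITION_TYPE_MAX = 6.0
AMPLITUDE = POSITION_TYPE_MAX - POSITION_TYPE_MIN   # 12, as in the text


def x_value(position_type: float, round_id: int, gap_for_text: float = 0.0) -> float:
    """Shift the curve of a round to the right by round_id * (amplitude + gap)."""
    return position_type + round_id * (AMPLITUDE + gap_for_text)


# Without a gap, round 1 ends exactly where round 2 begins.
print(x_value(POSITION_TYPE_MAX, 1), x_value(POSITION_TYPE_MIN, 2))

# With a gap of 3, there is room of width 3 between the two rounds for labels.
print(x_value(POSITION_TYPE_MAX, 1, 3.0), x_value(POSITION_TYPE_MIN, 2, 3.0))
```

Without the offset, all rounds would share the same x-range from -6 to 6, which is the stacking effect described above.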
Here you can see densification at work:
That’s it! This works perfectly for all 16/32/64/128 draw sizes in single-elimination & exponential tournaments!
We hope you enjoyed reading and find your own use cases for this! 🙂 You can find the workbook on Tableau Public.
Ludovic & Klaus