On my way to become a Tableau Desktop Specialist we will finish the first topic this week: Connecting to & Preparing Data. Therefore, we look at data connections and how to manage data properties.
This blog is a part of the “Becoming a Tableau Desktop Specialist Series”. Check out the further posts to learn more about it:
- Becoming a Tableau Desktop Specialist: Introduction
- Becoming a Tableau Desktop Specialist – Part 1: Create and Save Data Connections
- Becoming a Tableau Desktop Specialist – Part 2: Connecting and Preparing Data
“Modify data connections”
There are many different ways to modify your data connections. And we distinguish between three main types:
Let’s start with joins. A join offers you the opportunity to combine information from multiple tables. The combination has to be based on at least one shared field. You can choose between different join types. These types have an impact on how your data will be combined.
I found the following overview on the Tableau website. Moreover, the Tableau website offers a free training video that provides a lot of information. This is why I think it is not necessary to write the facts down again.
A union is another possibility to combine your data. But, compared to joins, unions add rows and not columns. This is why unions are described as vertical and joins as horizontal. Unions make the table longer and joins make the table wider. For unions, the tables should have the same number of columns. Moreover, the field names and data types should match.
Important to know is that Tableau always follows the behavior of ‘union all’. The above-listed video includes the following explanation of ‘union all’:
“All rows will be returned in the union, even if there are duplicate values for some rows.” In Tableau, each partition can be identified based on the automatically generated dimension ‘Table Name‘, “which indicates metadata about the union’s source.
It’s worth noting that if field names don’t match across the unioned data sources, Tableau will function similarly to an Outer join, appending the new rows and new columns, with nulls as needed. If the new columns should be considered the same field, the columns can be merged in the data grid.”
Another source I can highly recommend is this blog from Ken Flerlage.
A blend brings your data from different data sources together into a single view. With a blend, Tableau will send separate queries to the data sources and then aggregate the results. Therefore, Tableau needs at least one common field between the data sources. Tableau recognizes fields with the same name, but you can also manually define a relationship.
While working with blends you always have to take care of which data source you put first into the view. The first one is your primary source. The other one is the secondary source. The primary source has a blue checkmark next to the data source and the secondary source shows an orange checkmark.
There are two more symbols you should know: The red link and the grey broken link. The red link shows the common field (or fields) which is (are) used for the blend. The grey broken link marks all potential but currently unused fields. You can change this by clicking the grey link or red link.
You can change the primary and secondary sources when using blends for every worksheet.
Differences Between Joins and Blends
Now that we have an overview, I want to go more into detail and discuss the differences between joins and blends.
The first difference is that joins combine data from the same data source and blends combine data from different data sources. This is easy to understand and to remember. Moreover, a blend doesn’t combine the data on row level. Tableau will always send separate queries to different data sources.
During my research I found the following statement from Tharashasank Davuluru:
“A blend is a post-aggregate left join. That is, the data on the secondary source is grouped by the dimension being joined, and then left-joined with the primary.”
This is what I really want to understand. Let’s go through this statement step by step.
Secondary source: we remember that the first data source we bring into the view is the primary source and the other one is the secondary source.
Group by: I found a good description on Ken Flerlage’s blog. He says that ‘group by’ “allows us to specify the level of detail at which we wish to aggregate our data”.
In our case, the dimension for the ‘group by’ is always the dimension that is being ‘joined’.
left-join: So, after the aggregation our data will be joined. And we have already learned the characteristics of a left join.
“The left part of the join means members of the secondary that have no equivalent in the primary do not get put in the results, but if there are members of the primary without a corresponding member of the secondary, they are joined (or blended in this case) with NULL.”
For this topic, I can highly recommend watching this presentation from Marianne Shaw and Dmitry Chirkov from Tableau Conference 2017.
I already told you that joins combine data from one data source, but there is also an option to use a join for combining different data sources: It’s called cross-database join.
Cross-database joins are similar to blends. Both can combine data from a different data source, the difference is when the (left) join happens. We have already learned that a join happens on row-level and the same applies to cross-database joins. Row-level joins can end up with a replication problem. This can cause incorrect aggregations if your row level of detail and your viz level of detail is not the same.
A blend will aggregate the results from the secondary data source first and then join the data sources. This prevents duplicate values when your data sources have different granularity.
If you think you are on the safe side if you always use data blending I have to disappoint you. In some cases, you can’t use blending and have to choose the cross-database join. Again, a blend tries to aggregate over the linking fields before it joins the data. But what happens when you try to aggregate strings is that Tableau will show you a star (*), because there is no way to aggregate two different strings.
Another problem of data blending is that you cannot chain together data sources. A blend needs a single primary source because the primary source will be extended by the multiple secondaries. Furthermore, you cannot create extracts over multiple data sources.
The last thing I want to add is the fact that blends always use a left join. When you work with cross-database joins you can choose between the different join types.
- Tableau free training video: Cross-database join
- Difference between blend and join by quora
- The application of a blend by edureka
Approach in Tableau
To create a join in Tableau you connect to your data source and drag the first table into the field. Then you double click on the second table. When connected to a database you can also use custom SQL queries and join them.
Click on the circle to choose the join type and the field(s) on which the join should base. It is also possible to define join calculations.
A simple opportunity to create a union is to drag a table into the field and then drag the second table under the existing table.
For tables that are not shown in the left pane, you can define the union manually or you can use the wildcard search. For both the first step is to click on ‘New Union’ on the left pane on your data editing interface. More detailed information you can find here.
To blend your data you have to add a new data source first. To link the data you can use the above-listed symbols (red link & grey broken link) like described above.
To make a cross-database join you have to add a new connection first. And then follow the same steps that I have explained earlier for joins.
Enough talked and written about joins, unions and blends. Let’s head over to the next topic!
“Manage data properties”
Now we want to take a look at our opportunities for managing data properties. I will show you how to do this in Tableau for the following examples:
- Rename a data field
- Assign an alias to a data value
- Assign a geographic role to a data field
- Change data type for a data field
- Change default properties for a data field:
All your changes do not influence the underlying data because Tableau is only referencing the data (remember last week’s post on changing metadata). Setting your data properties cleverly is something that can make your work in Tableau much easier :).
To be honest I didn’t care about data properties until last week. So, last week I worked on a dashboard and from my point of view and my actual Tableau level this dashboard included a lot of calculations. One of my calculations only showed ones and zeros and I thought I made a mistake. It took me over forty minutes to recognize that I only had to change the number format to percentage to get the right value. Klaus then showed me how to set the number format in the default properties to not run into this issue again.
Well, today I think setting the ‘default properties‘ before creating the view is just a smart way to save me from doing the same formatting over and over again.
Rename a data field
By renaming your data fields you can ensure that all names are unique and easy to understand.
Assign an alias to a data value
Aliases give you the opportunity to rename individual members of a dimension to be shown in the view. But, be careful when using these members in calculations because Tableau draws on the original data. So, if you create e.g. an IF-statement you must indicate the original name of the member.
Assign a geographic role to a data field
If Tableau doesn’t recognize automatically that a data field is geographic, you can define the geographic role manually.
Change data type for a data field
You can choose between different options to change the data type.
Furthermore, it is also possible to change the data type in the data source.
Change default properties for a data field
For measures you have five options to edit a data field’s default properties. The first option is to write a comment which will be shown when hovering the data field in the data pane. You can also define a colour and a number format. Furthermore, you can set the default aggregation and determine how totals will be computed.
For dimensions the options are slightly different. It’s possible to sort your dimension, to set a default shape, and to set date properties.
Additional Information: this website lists a lot of symbols and explain their meaning.
Modifying data connections is such a big topic, but I hope this blog gave you a good overview and mentioned enough additional resources to dig more into the details.
I am happy about your feedback, you can reach out to me on Twitter or in the comments below. Stay tuned until next week and the next episode of ‘Becoming a Tableau Desktop Specialist’!