Let’s start with the first skill measured in the Tableau Desktop Specialist Exam: Connecting to & Preparing Data. This week we will learn how to create and save a data connection.
This blog is a part of the “Becoming a Tableau Desktop Specialist Series”. Check out the introduction to learn more about it:
- Create and Save Data Connections
“Create a live connection to a data source”
First of all, when you open Tableau, you can see on the left side your options for connecting a data source in the Connect Pane. You can choose between connecting to a file or a server (click “More…” to see all opportunities). On the bottom of the connect pane you can find the saved data sources. (Found on this Tableau Help page.)
“Explain the differences between using live connections versus extracts”
Next skill measured is to explain the differences between using live connections versus extracts. For this topic, there are a lot of information available and authors explain it very well. I couldn’t find better words for it so, let me just quote the essence of what I’ve found and show you where you can find more.
“Live connections offer the convenience of real-time updates, with any changes in the data source reflected in Tableau. But live connections also rely on the database for all queries. And unlike extracts, databases are not always optimized for fast performance. With live connections, your data queries are only as fast as the database itself.” – Diego Medrano
“A Tableau data extract is a compressed snapshot of data stored on disk and loaded into memory as required to render a Tableau viz” – Gordon Rose
“When you create an extract from a local file (such as a .csv or an Excel workbook) or an on-premise database, you’re speeding up the workbook through optimization. As a result, Tableau doesn’t need the database to build the visualization. Instead, Tableau’s in-memory data engine queries the extract directly.”- Diego Medrano
The biggest advantage of a data extract is that tableau can work faster especially when the visualization is complex and includes a big dataset, a lot of calculations, filters, etc. In return, you lose the real-time updates. This is why you have to refresh your data.
Zen Master Ryan Sleeper indicates in his book ‘Practical Tableau’ that you don’t need an extract, when your dataset is held on powerful, in-memory databases. In this case, working with live connections is recommended.
In My Words
The explanations include two words I didn’t understand immediately: ‘On-premise database’ and ‘in-memory data engine’. For those of you who also have no clear understanding of these terms, maybe the further notes will help.
‘on-premise’ or ‘on-prem’ refers to a database which is managed and maintained locally from a company itself. An opposite of an on-premises database is a cloud database. In this case, the access to the database is provided as a service.
When trying to find out what ‘in-memory data engine‘ means, google only gives me explanations for the ‘in-memory database‘ back. Sounds good, but I was not sure if the database and data engine are used synonymously. Luckily, I found a statement that says: A data engine is a “purpose-built analytic database for extracts”.
This is why I conclude that the explanation for in-memory databases will also fit for the in-memory data engine.
So what is this all about:
The main difference between the in-memory database and a traditional (on-prem) database is that the in-memory database is stored in a computer’s main memory (RAM) contrary to a traditional database stored on disk drives. This is the reason why the extract loads faster: the access to RAM is faster than to a disk drive.
Find my additional sources for this paragraph below:
1. Explanation of ‘on-premises’ by hpe
2. Explanation of ‘cloud database’ by wikipedia
3. ‘In-memory database’ by FlipHTML5
4. Difference between in-memory database and a traditional database by ramia
“Create an extract”
Now it is time to learn how to create an extract in Tableau. There are two options to create an extract. The first one you can see at the upper right corner on the data editing interface.
For the second one, you have to open the authoring interface. Now you can click on the data source and choose ‘Extract Data…’. This option also offers opportunities to filter your data.
After creating the extract, the icon changed from one cylinder to two cylinders with an arrow on it. Every time you see this symbol you now know that the data has been extracted into a faster data engine.
Find all the information about creating extracts also in this video from the Tableau Website.
“Save metadata properties in a TDS”
Finally our last topic for today is how to save metadata properties in a TDS.
What does ‘metadata’ mean for our work in Tableau?
First of all, we need to know what metadata exactly means. In general, metadata defined as “data that provides information about other data”.
Tableau makes a great job and recognizes a lot of details about our data. For example, Tableau often allocates the right datatype like the geographic role automatically.
But there are still some things for us to do to organize our data. I found the following list for the most popular examples:
- Rename variable
- Group variables into folders
- Calculated Fields
- Parameters (Note: A TDS file will only save a parameter if it is referenced by a calculated field)
- Dimension or Measure
- Discrete or Continuous
- Data Type
- Geographical Role
- Default Properties (This includes comments, color, number formats, aggregation and totals)
A TDS file will save all your changes on the metadata.
Let’s come to the main topic we want to discuss: How to save the data source with all the metadata. Therefore, you can click on the data source itself and choose ‘Add to Saved Data Sources…’.
Now Tableau will save the connection as a .tds. If you chose the default directory, Tableau would show the data source on the bottom of the connect pane. But you can of course choose any other directory to save your .tds file.
Note: “This is not the data itself, just the information about the connection”. The consequence is that if you change the place where you store the data, the connection won’t work anymore. You have to locate the original data source in your .tds before you can start working with it.
Publish data sources
Another option to save your data sources is to publish it to a Tableau Server or Tableau Online. Therefore, you click on server-menu and say publish data-source. Then choose the data source you want to publish.
A detailed description of how to publish a data source you can find here on the Tableau Website.
Additional Information from the Tableau Website: Look here to get a good overview of file types and folders.
I hope you gained a good understanding of this first topic. Let me know in the comments below (or reach out to me on Twitter) if you have any suggestions for improvements.
Stay tuned – next week we will go on!