Introduction
This article is intended to bring a user up to speed on steps and usage of tools from importing data from a data source to the data warehouse and then to tableau environment. It assumes that data warehouse server and tableau server has already been setup and users in each environment has been setup appropriately. it also assumes one is already somewhat familiar with SQL and Visual Studio.
Overview of moving data from source to data warehouse
The overall process of moving data from any source to data warehouse happens in multiple steps.
- First the data is copied from source database to DATA WAREHOUSE source DATABASE.
- Data is then copied from DATA WAREHOUSE source to DATA WAREHOUSE staging DATABASE table.
- After viewing data warehousing requirement, data is then copied from DATA WAREHOUSE staging to DATA WAREHOUSE permanent location. In this step data is split into dimension and fact tables. Dimensions tables are data that will be used to group items while facts tables contain actual data.
- Finally views are created that joins dimension tables and facts tables to fulfill business requirements.
Tools for moving data from source to data warehouse
Visual Studio and MicrosoftSQL Server Integration Services (SSIS) plugin is used to move data from an external source DATABASE to the DATA WAREHOUSE. In VISUAL STUDIO, control flow is setup to import/export/query data. Control flow is placing and organizing various objects to control the flow of data in VISUAL STUDIO. For example a step in the control flow in VISUAL STUDIO is to truncate a table in DATA WAREHOUSE staging first before importing data.
The “Data Flow Task” object contains more objects within it. It contains the data connection for source DATABASE and DATA WAREHOUSE staging DATABASE.
The screenshot to the right shows Data Flow screen shot in visual studio showing the
connection between source database and data warehouse database.
After setting up the control flow and the data flow steps in visual studio (these are saved as packages), the packages can be executed in VISUAL STUDIO to copy the data over to the data warehouse. Upon success, the package can be saved into the SQL server so it can be run on a schedule.
So after coping data to the DATA WAREHOUSE source table, data is inspected for errors, etc. If all is good, the data is then copied into the staging database. The program Microsoft SQL Server Management Studio is used to prepare the SQL for coping the DATA WAREHOUSE staging area. After the SQL is tested it can then be pasted into VISUAL STUDIO data flow packages for execution.
Then eventually data is copied into the DATA WAREHOUSE database in usually two or more tables comprising of dimension tables and facts tables. First evaluate the business requirement and then prepare the SQL in Microsoft SQL Server Management Studio. Setup or prepare the dimension table SQL query first and fact table second. Organize these in VISUAL STUDIO as packages then run the packages.
Finally views are created to fulfill the original business requirement. Using Microsoft SQL Server Management Studio, create view(s) joining dimension and facts tables together along with data filters specified in the business requirements. At this point the data is ready to be used by Tableau publishers.
Accessing data in tableau
In tableau desktop there are multiple ways to access data. To access data in the DATA WAREHOUSE stored on a Microsoft SQL Server, in tableau desktop, click on Microsoft SQL Server and log in to the appropriate DATA WAREHOUSE server.
After logging in to the data warehouse database, usually a view from the data source is chosen to build complete tableau sheet or dashboard.
But for evaluating data, a fact table or dimension could be chosen at first.
Tableau also has great support help web page.
Leave a Reply