Oluwafunmilayo C. Sofuwa
4 min readMay 1, 2019

--

Data wrangling: Seychelles travel data

Data wrangling which is often known as data cleaning is an important step in the data analysis process. Data cleaning has often been done with Excel among other tools in the past which was time consuming. The preparation of data with Excel constituted 80% of the work to be done while data analysis constituted 20%. However, with the introduction of Tableau prep, data cleaning can be easily done in a lesser amount of time than before. Our project at Git Girl for the month of April was wrangling the travel data for Seychelles from 2000–2017 with Excel and Tableau. Here is a link to the raw data: https://www.nbs.gov.sc/statistics/tourism

First things first, I wanted opened the data set to explore it as well as understand what the data was all about. The Seychelles travel data contains information regarding the country of departure as well as the continent, monthly and quarterly travel data, mode of transport(air and sea). Thereafter, I took some tutorials on Tableau prep which you can find here: https://www.tableau.com/learn/tutorials/on-demand/join-step?product=all&version=tableau_prep_2019_1_2&topic=tableau_prep

With Tableau prep, you can easily create a union between two or more data set if they have similar structures or headers or fields. You can also join two or more data if they have at least one common field. The pivot step provides you with the ability to convert your data from columns to rows.

The raw data set is readable in Excel as you can see in the screenshot below, however, connecting your data to Tableau prep in that format would make your data unreadable and hard to clean. The data set had two data tables on each sheet(the visitor arrivals by country and visitors arrivals by purpose). I did a basic cleaning on the data by getting rid of the arrivals by purpose table and focused on the arrivals by country. This is because joining these two data tables in Tableau prep would not be possible as they do not have common fields(Data was either collected either through continent and country or through purpose of visit). I also got rid of the grand totals and had to replace the blank cells with 0. To do this, I highlighted the area where the figures are, selected the ‘Go To Special’ function which is in the ‘Find and Select’ function under the ‘Home’ tab. I selected blanks, then used the replace function to fill the blanks with 0. I also defined my headers properly by removing any data above my headers.

The table format for certain years (2013–2017) was quite different from the previous years which can be seen in the image below. I had to restructure it so that they had the same format as the previous years.

I also added another column(country of destination) to my table to make my table more understandable. Below is what my my excel sheet looked liked after the basic data cleaning and prior to cleaning my data with Tableau Prep.

Data prior to Tableau Prep cleaning

I initially struggled with cleaning my data with Tableau Prep. I did a lot of unions and cleaning steps to merge all 18 years which although is not bad but is time consuming. Here is what my Tableau Prep flow looked liked originally:

I had already produced my output when I realized my mistake and had to do it all over again. This time around, I used the data interpreter and a wildcard union rather than single table to combine the different sheets of data. A data interpreter can be found when you connect your data to Tableau Prep and it helps to clean your data properly by defining what the header and data should be. A wildcard union is especially useful when you need to combine data with similar structures that spans across several sheets in Excel.

This allowed me to clean my data quickly and effectively. After connecting my data to Tableau Prep, I cleaned my data by removing certain tabs, renamed some tabs, used the automatic split function on table names to derive the years etc. Thereafter, I used the pivot step to convert the month and mode of transport from columns to rows. This makes the data easier to analyze and visualize. I further cleaned my data by renaming the tabs I pivoted as months and mode of transport. The final step was to create my output which I saved as ‘csv’ rather than ‘tde’ or ‘hyper’. This is what my data looked liked after cleaning my data with Tableau Prep:

That’s it for this project. Hope you had fun!! If you have questions, you can leave your comments. Till another time, Happy Analyzing!

--

--