Oluwafunmilayo C. Sofuwa
7 min read · Jun 23, 2019


An analysis of the FIFA World Cup

What is football to you?

Pelé, regarded by many as the greatest player of all time, describes football as “a beautiful game”.

I do not watch football regularly, but I do follow the FIFA World Cup. I would describe football as a game of emotions, friends and family. People experience a range of emotions when watching football, including but not limited to happiness, anger, sadness and tension.

The FIFA World Cup was first held in 1930 and has been held every four years since, with the exception of 1942 and 1946, when the tournament was cancelled because of the Second World War. Wikipedia states: “The World Cup is the most prestigious association football tournament in the world, as well as the most widely viewed and followed sporting event in the world, exceeding even the Olympic Games; the cumulative viewership of all matches of the 2006 World Cup was estimated to be 23.29 billion with an estimated 715.1 million watching the final match, a ninth of the entire population of the planet.”

THE DATA-SET

One of our final projects at Gitgirl was to analyze FIFA World Cup data from 1930–2014 with the aid of Tableau and create a dashboard. I will take you step by step through how I analyzed the data. Before I could begin, I had to understand the data-set. It had three sheets. The first and second sheets were similar, with fields such as year, date, time of the match, round, city, country, observation, stadium, home and away teams, and home and away goals. The third sheet had data on year, country, winner, runner-up, third, fourth, total goals scored, teams qualified, total matches played and total attendance. I ended up using the first and third sheets because their total goals scored seemed closer. I also had to do some minor cleaning before I could start the analysis: removing symbols from the stadium names and removing duplicates.

DATA ANALYSIS

To begin analyzing my data, I had to come up with questions I wanted to answer and visualize.

The questions I went with are discussed below:

Here is a link to my visualization: https://public.tableau.com/shared/GFNZ5GNCG?:display_count=yes&:origin=viz_share_link

Which countries have won the FIFA World Cup and how many times have they won?

To determine this, I created a new group from Winner, which I named World Cup Winners, because I had to group Germany and Germany FR as one. I then dragged World Cup Winners to the column label to show the name of each winner, and Winner to the row label as a count to show how many times each country won the FIFA World Cup title.

As of 2014, Brazil had won the FIFA World Cup title five times; Italy and Germany four times each; Uruguay and Argentina twice each; and France, Spain and England once each. Hovering over or selecting any of the bars brings up a tooltip. I linked my visualization of the year(s) each country won the title to this tooltip, so it shows the name of the country, how many times it won the FIFA World Cup title between 1930 and 2014, and a visualization of the year(s) in which it won.

What year(s) did these countries win the FIFA World Cup title?

I wanted to further explore the specific years in which each country won the FIFA World Cup title. To visualize this, I dragged World Cup Winners to the column label and Year to the row label. I also added a color mark to distinguish between the eight countries that have won the title. The tooltip, which comes up when hovering over or selecting any of the data points, states the winning country, the year they won and the host country.
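The two views above (titles per country and winning years per country) come down to one grouping step followed by a count. Here is a minimal Python sketch of that logic, not the Tableau workbook itself. The winners are taken from the public record for 1930 to 2014, the "Germany FR" label follows the spelling mentioned above, and the names `WINNER_GROUPS` and `years_won_by_country` are hypothetical.

```python
from collections import defaultdict

# World Cup winners by year, 1930-2014 (public record). West Germany's
# titles are written "Germany FR", as in the data-set described above.
WINNERS_BY_YEAR = {
    1930: "Uruguay", 1934: "Italy", 1938: "Italy", 1950: "Uruguay",
    1954: "Germany FR", 1958: "Brazil", 1962: "Brazil", 1966: "England",
    1970: "Brazil", 1974: "Germany FR", 1978: "Argentina", 1982: "Italy",
    1986: "Argentina", 1990: "Germany FR", 1994: "Brazil", 1998: "France",
    2002: "Brazil", 2006: "Italy", 2010: "Spain", 2014: "Germany",
}

# Hypothetical alias table standing in for the "World Cup Winners" group.
WINNER_GROUPS = {"Germany FR": "Germany"}


def years_won_by_country(winners_by_year):
    """Map each grouped winner name to the sorted list of years it won."""
    years = defaultdict(list)
    for year, winner in sorted(winners_by_year.items()):
        years[WINNER_GROUPS.get(winner, winner)].append(year)
    return dict(years)


years_won = years_won_by_country(WINNERS_BY_YEAR)
# The bar chart is simply the number of winning years per country.
titles = {country: len(years) for country, years in years_won.items()}

for country, count in sorted(titles.items(), key=lambda item: -item[1]):
    print(f"{country}: {count} title(s) in {years_won[country]}")
```

Without the alias table, Germany's titles would be split across two bars, which is why the grouping has to happen before the count.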

Brazil is the most successful team to win the FIFA World Cup title. What are their statistics at the finals over the years?

Brazil is the most successful team at the FIFA World Cup with five titles and, according to Wikipedia, is the only team that has played in every World Cup tournament. For my visualization, I had to convert my year dimension from continuous to discrete. Leaving year as continuous data messed up my visualization, and I also wanted my year filter to be a single-value drop-down menu rather than a slider.

From the round dimension, I created a group called stage, where I combined all the groups (Group A, B etc.) into one and also merged the play-off for third place, which had multiple names in the data, into one. I also wrote two SQL queries to determine whether each match ended in a win, tie or loss, and named them match results for home team and match results for away team. I did this because some teams, such as Brazil, played as the home team in some matches and the away team in others, and I wanted to get the full scope of the match statistics.

For Brazil's match statistics as the home team, I dragged home goals, which automatically aggregated as a sum, to the column label, and year, stage, home team, away team and match results for home team onto my row label. I used year, home team and stage as my filters, with year set to all, home team to Brazil and stage to final. For Brazil's match statistics as the away team, I repeated the above in another sheet, using away goals rather than home goals in my column label and match results for away team rather than match results for home team. I used year, away team and stage as my filters, with year set to all, away team to Brazil and stage to final.

SQL Query for Home Team
SQL Query for Away Team
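The queries themselves are not reproduced in the text, so here is a minimal Python sketch of the same idea rather than the original SQL. It labels a match as a win, tie or loss from each team's point of view and collapses round names into broader stages. The function names and the exact round labels being matched are assumptions, since the raw labels in the data-set are not listed here.

```python
def match_result(goals_for, goals_against):
    """Classify a match from one team's point of view."""
    if goals_for > goals_against:
        return "Win"
    if goals_for < goals_against:
        return "Loss"
    return "Tie"


def results_for_both_teams(home_goals, away_goals):
    """Return (result for the home team, result for the away team)."""
    return (
        match_result(home_goals, away_goals),
        match_result(away_goals, home_goals),
    )


def stage_from_round(round_name):
    """Collapse round labels into broader stages, like the stage group.

    The matching rules are assumptions: any label starting with "Group"
    becomes one group stage, and any label mentioning "third place" becomes
    one third-place play-off. Everything else is kept as it is.
    """
    label = round_name.strip().lower()
    if label.startswith("group"):
        return "Group stage"
    if "third place" in label:
        return "Third place play-off"
    return round_name.strip()


# A goalless final is a tie for both sides before penalties are considered.
print(results_for_both_teams(0, 0))
# A 2-0 scoreline is a win for the team that scored twice.
print(match_result(2, 0))
print(stage_from_round("Group A"))
```

Computing the result once per side is what lets the home and away sheets show the same match correctly from either team's perspective.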

The visualization shows the years in which Brazil won, tied or lost their final match, as well as the scores against their opponents. Brazil won the World Cup title in 1958, 1962, 1970, 1994 and 2002. In 1994, they tied with Italy but went on to win on penalties. Hovering over or selecting any of the bars brings up the tooltip, which shows the year, host country, stadium, home and away teams, home and away goals, stage, match results for the home or away team, and comments if any (for example, penalties).

Is there a correlation between sum of goals scored and number of qualified teams at each FIFA World Cup?

Finally, I wanted to determine whether there was any correlation between the total number of goals scored and the number of teams that qualified for each World Cup. To visualize this, I simply dragged goals scored to the column label and qualified teams to the row label. The number of teams that qualified for the FIFA World Cup between 1930 and 1978 ranged from 13 to 16. This increased to 24 between 1982 and 1994 and then to 32 in 1998, which has been the format ever since. However, the number of teams that qualify will increase from 32 to 48 starting in 2026.

The visualization shows a positive correlation between the total goals scored at each World Cup and the number of qualified teams. With every significant increase in the number of teams qualified to play at the World Cup, the total number of goals also increases significantly. My tooltip shows the number of qualified teams, year, host country and total number of goals scored.
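To put a number on the relationship described above, one option is a Pearson correlation coefficient. The sketch below is only an illustration: Tableau was used to plot the data, not to compute this statistic, and the five rows are a small sample from public tournament records, not the full data-set. The function name `pearson_correlation` is hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# (year, qualified teams, total goals) for five tournaments, from public
# records. This is a partial sample, so the value is only indicative.
SAMPLE = [
    (1930, 13, 70),
    (1954, 16, 140),
    (1982, 24, 146),
    (1998, 32, 171),
    (2014, 32, 171),
]

teams = [row[1] for row in SAMPLE]
goals = [row[2] for row in SAMPLE]
r = pearson_correlation(teams, goals)
print(f"Pearson r between qualified teams and goals scored: {r:.2f}")
```

A value close to 1 would support the positive relationship seen in the chart. Bear in mind that more teams also means more matches, so goals per match may be the fairer comparison.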

PUTTING IT ALL TOGETHER

My final objective was to merge my visualizations into a single dashboard. Creating the dashboard was relatively easy: I dragged the sheets I wanted into it, rearranged them and made some minor tweaks. One of those tweaks was formatting my home and away team sheet names so that whichever team is selected in the filter also appears in the sheet name. For example, if I choose Italy as my home team, the sheet name updates from “Brazil World Cup Statistics (Home Team)” to “Italy World Cup Statistics (Home Team)”. I named my dashboard and applied the year and stage filters to both my home and away team sheets, so that whatever year or stage is selected applies only to those sheets. The home team filter affects only the home team sheet and the away team filter affects only the away team sheet. To get the full scope of each country’s statistics, I noted on the dashboard that the home and away team filters should be set to the same country.

I hope you enjoyed this because I sure enjoyed working on this!

Till next time,

Bye.
