PROBABILITY AND DATA SCIENCE

Oluwafunmilayo C. Sofuwa
3 min read · Oct 18, 2020

Leveraging data through data science and analytics is becoming more important across industries. We live in a world of technology, and as that technology develops, the amount of data generated by individuals and corporations keeps increasing.

How then does probability affect data science and analytics?

Probability is a measure of how likely an event is to occur. We apply it in our daily lives without even realizing it, because almost nothing is 100% certain; there is always some form of uncertainty. Probability is measured on a scale from 0 to 1, where 0 means the event cannot happen and 1 means it is certain to happen (100% certainty).

Some examples include the likelihood of being selected for a job or admitted to a school you applied to, the likelihood of getting to a particular place early given the flow of traffic, or even the likelihood that someone you really like will like you back. What if there was a way to measure the likelihood of these events? Probability is the way to go.

It is at the core of decision making, the applications we build, advances in technology and more. When you use Google Maps to find out how long it will take to get from your current location to a certain place, the estimate depends on probability under certain conditions, e.g. weather, traffic and driving speed. It is never 100% accurate, but it does give a close enough estimate.

Let’s take a look at some basic concepts in probability along with some examples.

Random experiment: An experiment or process whose outcome cannot be predicted with certainty. It is called random if it has more than one possible outcome, e.g. heads or tails in a coin toss.
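To make this concrete, here is a minimal sketch that simulates the coin toss in Python. The function name toss_coin and the seed parameter are illustrative choices, not something from a particular library, and a fair coin is assumed.

```python
import random


def toss_coin(n_tosses, seed=None):
    """Simulate n_tosses of a fair coin and count each outcome.

    Hypothetical helper for illustration: a fair coin is assumed, and
    the seed is only there to make a run repeatable.
    """
    rng = random.Random(seed)
    counts = {"heads": 0, "tails": 0}
    for _ in range(n_tosses):
        # Each toss has two possible outcomes that cannot be known in advance.
        counts[rng.choice(["heads", "tails"])] += 1
    return counts


counts = toss_coin(1000, seed=42)
print(counts)
```

No single toss can be predicted, yet over many tosses each outcome tends to show up roughly half of the time.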

Empirical probability: Probability that is calculated by actually performing an experiment, usually many times. It divides the number of successful outcomes by the number of times the experiment was carried out. E.g. what is the probability of making a basket when you toss a basketball at the hoop? To find out, you can toss the basketball, say, 500 times (the number of times the experiment was carried out) and count the number of times you made a basket, say 245 (the successful outcomes). The probability of making a basket will be 245/500 = 0.49. For better understanding and readability, convert your answer to a percentage: 0.49 x 100% = 49%. Such an individual therefore has a 49% chance of making a basket at his/her current skill level.
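The basketball calculation can be written as a short Python sketch. The function name empirical_probability is a hypothetical helper chosen for this example.

```python
def empirical_probability(successes, trials):
    """Return successes / trials, the share of trials that succeeded."""
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return successes / trials


# 245 baskets out of 500 attempts, as in the example above.
p_basket = empirical_probability(245, 500)
print(f"{p_basket:.2f} = {p_basket:.0%}")  # prints "0.49 = 49%"
```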

Theoretical probability: In real-life scenarios, we can't always perform an experiment many times. Theoretical probability instead divides the number of successful outcomes by the total number of possible outcomes, assuming every outcome is equally likely. E.g. a candy jar holds 200 candies: 65 blue, 80 red and 55 purple. What is the probability of choosing a red candy at random?

Total number of possible outcomes = 200

Number of red candies = 80

Probability = 80/200 = 0.4, and 0.4 x 100% = 40%

The probability of choosing a red candy is therefore 40%.
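The same calculation as a minimal Python sketch, assuming every candy is equally likely to be picked. The function name theoretical_probability is a hypothetical helper; Fraction comes from the standard library and keeps the result exact.

```python
from fractions import Fraction


def theoretical_probability(favourable, total):
    """Favourable outcomes over all equally likely outcomes, as an exact fraction."""
    if total <= 0:
        raise ValueError("total must be a positive integer")
    if not 0 <= favourable <= total:
        raise ValueError("favourable must be between 0 and total")
    return Fraction(favourable, total)


# 80 red candies out of 200 candies in the jar.
p_red = theoretical_probability(80, 200)
print(p_red, float(p_red), f"{float(p_red):.0%}")  # prints "2/5 0.4 40%"
```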

In data science, probability plays a huge role in machine learning, deep learning and prediction in general. For example, to predict the profit of a company, past and present data are used to build a machine learning model. From this data, profit can be predicted for the next month(s) or even year(s). The prediction tells us what the profit is likely to be for the period in question, and how accurate the predicted values can be expected to be.
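As a rough illustration of the idea, and not a method taken from any particular company or library, the sketch below fits a straight-line trend to past monthly profits and predicts the next month. The profit figures are made up for the example, and the helper names fit_line and forecast_next are hypothetical. A real machine learning model would be more involved, but the principle is the same: a prediction together with a measure of how far off it might be.

```python
from statistics import mean


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    if len(ys) < 2:
        raise ValueError("need at least two observations")
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast_next(ys):
    """Predict the next value and report the typical size of past errors."""
    slope, intercept = fit_line(ys)
    fitted = [intercept + slope * x for x in range(len(ys))]
    # Root-mean-square residual: a crude indication of how far the
    # trend line has been from the actual figures so far.
    spread = (sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys)) ** 0.5
    return intercept + slope * len(ys), spread


# Made-up monthly profits (in thousands), purely for illustration.
monthly_profit = [12.0, 13.5, 13.1, 14.8, 15.2, 16.0]
prediction, spread = forecast_next(monthly_profit)
print(f"Predicted profit next month: {prediction:.1f} (typical error about {spread:.1f})")
```

The spread is only a simple summary of past errors; it hints at how uncertain the prediction is rather than guaranteeing a range.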
