I recently started my three-month mentorship program with She Code Africa (Cohort 3), and it has been great so far. I took up an interest in data science and analytics some months ago, and while I have been trying my hand at working with datasets from Kaggle, it’s important for me to go back to the basics once in a while. Another reason I’m glad to participate in this program is that I have the opportunity to be mentored.
In this post, I’ll give you a brief insight into the tasks I did, the new things I learnt, and the like.
The task this week was to write two functions: a guess-the-number function and a password generator. The guess-the-number function asks the user to guess a number between 0 and 20. In my function, the user gets three guesses, after which the right number is printed if all the guesses were wrong. On each attempt, if the guess is above or below the secret number, a warning is printed telling the user that their guess is too high or too low and that they should guess lower or higher.
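A minimal sketch of that logic might look like the following (the names are my own, and the secret number can be passed in as a parameter to make the function easy to test; the real version would read each guess with input()):

```python
import random

def guess_the_number(guesses, low=0, high=20, secret=None):
    """Check up to three guesses against a secret number between low and high.

    Returns a list of feedback messages, one per attempt.
    """
    if secret is None:
        secret = random.randint(low, high)
    feedback = []
    for guess in guesses[:3]:
        if guess == secret:
            feedback.append("Correct!")
            return feedback
        elif guess > secret:
            feedback.append("Your guess is high, guess lower.")
        else:
            feedback.append("Your guess is low, guess higher.")
    # Three wrong guesses: reveal the right number
    feedback.append(f"The right number was {secret}.")
    return feedback
```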
The second function was a password generator. I particularly enjoyed writing this function because it allowed me to work with Python’s ‘string’ module, which I had never used before. The picture below shows my code for the password generator.
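A generator along those lines, built from the ready-made character sets the ‘string’ module provides, might be sketched like this (the function name and default length are illustrative):

```python
import random
import string

def generate_password(length=12):
    """Build a random password from letters, digits, and punctuation."""
    # string.ascii_letters, string.digits and string.punctuation are
    # predefined character sets shipped with the 'string' module
    characters = string.ascii_letters + string.digits + string.punctuation
    return "".join(random.choice(characters) for _ in range(length))
```

For passwords you actually rely on, the standard library’s ‘secrets’ module is the safer choice, since ‘random’ is not cryptographically secure.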
Probability was the focus this week. I wrote a Medium post about it, which you can find here: https://medium.com/@oluwafunmilayo/probability-and-data-science-8842c6f51601 .
I encountered the Chinook dataset once again. The first time I worked with this dataset was when I was learning the basics of SQL, so it was definitely interesting to work with it in Python. Doing so required a lot of merges, which I wasn’t accustomed to: the dataset has multiple tables, so to get information about certain customers, or about sales revenue, I needed to merge across several data frames.
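To illustrate the kind of merging involved, here is a small sketch with toy stand-ins for two of the Chinook tables (the column names follow the Chinook schema, but the rows are invented; the real tables would be read from the database):

```python
import pandas as pd

# Toy stand-ins for the Chinook 'customers' and 'invoices' tables
customers = pd.DataFrame({
    "CustomerId": [1, 2, 3],
    "Country": ["Brazil", "Germany", "Brazil"],
})
invoices = pd.DataFrame({
    "InvoiceId": [10, 11, 12, 13],
    "CustomerId": [1, 1, 2, 3],
    "Total": [5.94, 3.96, 8.91, 1.98],
})

# Attach customer info to each invoice, then total revenue per country
merged = invoices.merge(customers, on="CustomerId", how="left")
revenue_by_country = merged.groupby("Country")["Total"].sum()
print(revenue_by_country)
```

With more tables (tracks, albums, artists, and so on), the same pattern simply chains further merge calls before the final aggregation.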
This is one of the most important steps when analyzing data. The data you are working with has to be sorted, filtered, cleaned and so on, because real-world data is almost always dirty. For example, your data may contain null values, also known as missing values. You have to deal with these either by getting rid of such rows or by replacing the missing values with a statistic such as the mean, median or mode. Dealing with these values has to be done with caution.
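As a small sketch of those two options in pandas (the column name and values here are invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 22, np.nan]})

# Option 1: drop rows with missing values
dropped = df.dropna()

# Option 2: replace missing values with the column mean
filled = df.fillna({"age": df["age"].mean()})
```

Dropping rows shrinks the dataset, while filling with the mean keeps the row count but can distort the distribution (the mean is sensitive to outliers, which is one reason the median or mode is sometimes preferred). That trade-off is where the caution comes in.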
I’m definitely looking forward to what the next two months will hold in store.
That’s it for today!
See you soon! 😀