Welcome, Guest: Register On Nairaland / LOGIN! / Trending / Recent / New
Stats: 3,151,395 members, 7,812,160 topics. Date: Monday, 29 April 2024 at 09:16 AM

Chronicle Of A Data Scientist/analyst - Programming (47) - Nairaland

Nairaland Forum / Science/Technology / Programming / Chronicle Of A Data Scientist/analyst (330700 Views)

Chronicle Of A Data/cloud Engineer / Net Salary For A Data Analyst Or Scientist Or Web Dev / Aspiring Data Scientist. (2) (3) (4)

(1) (2) (3) ... (44) (45) (46) (47) (48) (49) (50) ... (146) (Reply) (Go Down)

Re: Chronicle Of A Data Scientist/analyst by yemyke001(m): 4:51pm On Jun 09, 2020
KunSegzy100:
hello guys, i have my first assessment (aptitude test followed by interview) for data analyst role scheduled for tomorrow, kindly give me tips to aid preparation.

I will be glad if you can share the site where you see the opening and my mail is yemyke@gmail.com.. I will be more glad if you can remove your vital info from your CV and help send the carcass to my mail. Thank you in anticipation.
Re: Chronicle Of A Data Scientist/analyst by yemyke001(m): 5:05pm On Jun 09, 2020
I will be glad if you can help with a template of your CV to model mine and good sites to search for opening. Thanks
Re: Chronicle Of A Data Scientist/analyst by yemyke001(m): 5:06pm On Jun 09, 2020
mcemmy0z:

Be ready to work on their data. Is one thing of knowing how to do all those data trick, plotting chat, or graph, prediction, but explaining them is the main work there, ground yourself well on how to tell a story with your data. Know how to explain a relationship between data, correlation. And you will be asked to show your previous works. If part of their requirements required you to know SQL, Google on self join, there is this one common question they use to ask, manger to employee relations. If you are going to work on python, make sure you store your code somewhere, if u are using Jupiter just create a new page or folder and store most of your code for data preprocessing and machine learning on it.. perhaventure if you will be giving any data just copy the data into the folder where you stored your code and fire on with Jupiter.
I will also advice you to use power Bi for your Visualization, tableau will be good cause of it's dashboard and story telling page, it will catch the audience than using seaborn or matplotlib.
Don't just do anything without knowing how to explain them. I can assure you that you will forget all your code.. store them well. They might ask u to present your work with power point. Be ready to defend every u will do.
Master that power Bi well, for now is really treading in Nigeria, lost a job opportunity last two months cause I was yet to mastered it then, they were looking for a professional in PowerBI and I was new to that app then.


I will be glad if you can help with a template of your CV to model mine and good sites to search for opening. Thanks
Re: Chronicle Of A Data Scientist/analyst by Boxers02: 5:23pm On Jun 09, 2020

Re: Chronicle Of A Data Scientist/analyst by mcemmy0z: 6:46pm On Jun 09, 2020
yemyke001:
I will be glad if you can help with a template of your CV to model mine and good sites to search for opening. Thanks
Download indeed app on Google play store register as a data analyst and wait for data analyst job on your mail everyday.

11 Likes 3 Shares

Re: Chronicle Of A Data Scientist/analyst by yemyke001(m): 8:02pm On Jun 09, 2020
mcemmy0z:

Download indeed app on Google play store register as a data analyst and wait for data analyst job on your mail everyday.
Thanks Bro
Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 8:10pm On Jun 09, 2020
mcemmy0z:

Be ready to work on their data. Is one thing of knowing how to do all those data trick, plotting chat, or graph, prediction, but explaining them is the main work there, ground yourself well on how to tell a story with your data. Know how to explain a relationship between data, correlation. And you will be asked to show your previous works. If part of their requirements required you to know SQL, Google on self join, there is this one common question they use to ask, manger to employee relations. If you are going to work on python, make sure you store your code somewhere, if u are using Jupiter just create a new page or folder and store most of your code for data preprocessing and machine learning on it.. perhaventure if you will be giving any data just copy the data into the folder where you stored your code and fire on with Jupiter.
I will also advice you to use power Bi for your Visualization, tableau will be good cause of it's dashboard and story telling page, it will catch the audience than using seaborn or matplotlib.
Don't just do anything without knowing how to explain them. I can assure you that you will forget all your code.. store them well. They might ask u to present your work with power point. Be ready to defend every u will do.
Master that power Bi well, for now is really treading in Nigeria, lost a job opportunity last two months cause I was yet to mastered it then, they were looking for a professional in PowerBI and I was new to that app then.



pls I am just a learner, curious to know something..pls u indicated that the guy should store his code somewhere..

Do pple have any particular code they work with?

I believe code will be based on the task given,How will he know the task he would be given to be sure that his code will work.
?.

pls enlighten me more or I want to get the part u said "store your code".

1 Like

Re: Chronicle Of A Data Scientist/analyst by Nobody: 8:33pm On Jun 09, 2020
Boxers02:
Guys I'm giving out one of this

All you need to do is, send a screenshot of you watching this video to the link in my signature and you receive the course

https://www.youtube.com/watch?v=xYNKKXgtt0Q&feature=youtu.be
are they udemy videos ?
Re: Chronicle Of A Data Scientist/analyst by mcemmy0z: 9:33pm On Jun 09, 2020
Gcool2:


pls I am just a learner, curious to know something..pls u indicated that the guy should store his code somewhere..

Do pple have any particular code they work with?

I believe code will be based on the task given,How will he know the task he would be given to be sure that his code will work.
?.

pls enlighten me more or I want to get the part u said "store your code".
If you familiarize yourself well enough to python for data preprocessing you will realize that most of the process are same, from importing all necessary libraries to importing your data, checking the head, doing some descriptive analysis on it, removing NAN values or dropping unwanted columns, melting columns, changing columns name.. there are sepeate code for all these, but have saved everything somewhere cause I don't have time to code, I just have to rename the data name to the one am using. Don't know when last I even type code. Same with machine learning, both supervised and unsupervised, whether to create dummie variable or I want to scale data. I saved all the code there for each machine learning model too ranging from linear to logistics regression, KNN, decision trees etc.. even when using Visualization libraries like seaborn there are some crazy plot that I can't even cram the code so I have to save them all and all those code for different lines, colors too. Working smart is what it takes.

15 Likes 2 Shares

Re: Chronicle Of A Data Scientist/analyst by Boxers02: 10:03pm On Jun 09, 2020
Electricboy:
are they udemy videos ?
Yes they are
Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 10:13pm On Jun 09, 2020
mcemmy0z:

If you familiarize yourself well enough to python for data preprocessing you will realize that most of the process are same, from importing all necessary libraries to importing your data, checking the head, doing some descriptive analysis on it, removing NAN values or dropping unwanted columns, melting columns, changing columns name.. there are sepeate code for all these, but have saved everything somewhere cause I don't have time to code, I just have to rename the data name to the one am using. Don't know when last I even type code. Same with machine learning, both supervised and unsupervised, whether to create dummie variable or I want to scale data. I saved all the code there for each machine learning model too ranging from linear to logistics regression, KNN, decision trees etc.. even when using Visualization libraries like seaborn there are some crazy plot that I can't even cram the code so I have to save them all and all those code for different lines, colors too. Working smart is what it takes.
Thanks for this explicit response ..pls check ur mail.
Re: Chronicle Of A Data Scientist/analyst by Theflint1(m): 12:44pm On Jun 10, 2020
Macsjebs check the visualization for top front-page posters.
cochtrane:
As a budding data scientist who visits NL often, it's not surprising that you start to get more than interested in the some of the topics making front page and how frequent topics from individual sections reach the top. I have been looking into this for a while and thought it would be nice to do some investigation in this regard. For example, which section makes front page most often? How often do we see programming topics get to the front page? Who posts more often on the front page? Is it really lalasticlala, as is frequently supposed, or is it someone else? What exactly has been the relationship between lalasticalala and snakes over the past year? Some people think he loves to push snake topics to the frontpage more often than other topics. What else can we learn from the topics making frontpage? Like for example, are they mostly about Buhari or something else?

To this end, I scrapped the front page data and obtained more than 28,000 records. You can download this data set I obtained here on my github. If you are a data science enthusiast who also likes Nairaland, this may be good motivation to dig into a topic that interests you. You will find a metadata file in the sublink as well and can investigate what the attributes are about. You've got titles, links, sections and time that posts made front page. It's a year of data from 31st May 2019 till date. It turns out to get the whole frontpage information may need more than 230,000 records! That's huge, and probably not so wise to collect for a quick, lazy analysis. Except, of course, you have business motives

For me, I was interested in a few topics.
First, from which section did we get the most frontpage material over the past year? Apparently, it is "Politics". It trumps everything. "Celebrities" come a close second. Not surprising, right? What with the volume of Bobrisky posts and co. And then "Crime" comes third. Does this point to a high frequency of crime in Nigeria? I leave that question to you. "Programming"? Didn't even make bank one time!
The fact that politics make frontpage more often clearly shows that top on our discourse as Nigerians is probably politics, if Nairaland reflects a microcosm of the Nigerian environment, which I feel it does.

Who posts more often on the frontpage? Not lalasticlala like you might think. It's a person called dre11; at least over the last year. Maybe you know him, may you don't. Lalasticlala is not even in the top three.

One quirky thing I found, however, was that the time it takes for a post to get to frontpage has a heavily right-skewed distribution. Before plotting this, I lazily thought it might be normally distributed, cos...well, a lot of things are normally distributed and it shouldn't be unusual to have this normally distributed as well; few make front page early, few late, and most are in between. Right? On the contrary, the reality is skewed. I feel the heavy skewness probably points to deliberate human intervention. Most posts make front page early, not late. They are created and in little time pushed to the front page. Evidently in a deliberate fashion. Else the data should be normally distributed, don't you think? Anyways, that's what my data shows. Maybe, better insight could be derived though if one scraped randomly over the past several years in order to obtain a truly random sample.

And there were a few threads which made front page late. Very late! In the past year, we have had threads from 8 years ago make frontpage. Yes, 8 years ago! Thats's 2012. And then there are those that were initially posted 5 years ago before they made front page. Perhaps you can find more if you looked into the data set?

Anyways, getting your hands dirty with a data set is always a good way to learn data analysis. If you need help with navigating this, you can buzz me.

2 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by Grandlord: 1:02pm On Jun 10, 2020
Saw this somewhere and I thought it might help someone on here organize their learning process. You're welcome cool

20 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:15pm On Jun 10, 2020
cochtrane:
As a budding data scientist who visits NL often, it's not surprising that you start to get more than interested in the some of the topics making front page and how frequent topics from individual sections reach the top. I have been looking into this for a while and thought it would be nice to do some investigation in this regard. For example, which section makes front page most often? How often do we see programming topics get to the front page? Who posts more often on the front page? Is it really lalasticlala, as is frequently supposed, or is it someone else? What exactly has been the relationship between lalasticalala and snakes over the past year? Some people think he loves to push snake topics to the frontpage more often than other topics. What else can we learn from the topics making frontpage? Like for example, are they mostly about Buhari or something else?

To this end, I scrapped the front page data and obtained more than 28,000 records. You can download this data set I obtained here on my github. If you are a data science enthusiast who also likes Nairaland, this may be good motivation to dig into a topic that interests you. You will find a metadata file in the sublink as well and can investigate what the attributes are about. You've got titles, links, sections and time that posts made front page. It's a year of data from 31st May 2019 till date. It turns out to get the whole frontpage information may need more than 230,000 records! That's huge, and probably not so wise to collect for a quick, lazy analysis. Except, of course, you have business motives

For me, I was interested in a few topics.
First, from which section did we get the most frontpage material over the past year? Apparently, it is "Politics". It trumps everything. "Celebrities" come a close second. Not surprising, right? What with the volume of Bobrisky posts and co. And then "Crime" comes third. Does this point to a high frequency of crime in Nigeria? I leave that question to you. "Programming"? Didn't even make bank one time!
The fact that politics make frontpage more often clearly shows that top on our discourse as Nigerians is probably politics, if Nairaland reflects a microcosm of the Nigerian environment, which I feel it does.

Who posts more often on the frontpage? Not lalasticlala like you might think. It's a person called dre11; at least over the last year. Maybe you know him, may you don't. Lalasticlala is not even in the top three.

One quirky thing I found, however, was that the time it takes for a post to get to frontpage has a heavily right-skewed distribution. Before plotting this, I lazily thought it might be normally distributed, cos...well, a lot of things are normally distributed and it shouldn't be unusual to have this normally distributed as well; few make front page early, few late, and most are in between. Right? On the contrary, the reality is skewed. I feel the heavy skewness probably points to deliberate human intervention. Most posts make front page early, not late. They are created and in little time pushed to the front page. Evidently in a deliberate fashion. Else the data should be normally distributed, don't you think? Anyways, that's what my data shows. Maybe, better insight could be derived though if one scraped randomly over the past several years in order to obtain a truly random sample.

And there were a few threads which made front page late. Very late! In the past year, we have had threads from 8 years ago make frontpage. Yes, 8 years ago! Thats's 2012. And then there are those that were initially posted 5 years ago before they made front page. Perhaps you can find more if you looked into the data set?

Anyways, getting your hands dirty with a data set is always a good way to learn data analysis. If you need help with navigating this, you can buzz me.

Following up on this dataset, I started wondering, can one define a machine learning question with this data set given its limited number of features? Apparently yes!

In this second part, I examine procedures for fitting a model with this data set. The research question takes the form: given a post title, can one tell which section it is from? For example, given a title "COVID-19: Governor Ikpeazu's Two Aides Test Positive", can our model tell that it is from the Health section?

Using NLP procedures, one can design a machine learning model which takes some part of this data set and fits a model to it, so that with the test data, we can ask questions of the remainder titles. This is a typical unsupervised model design known as classification. This particular task is multi-class classification with about 37 classes (all sections on Nairaland). This is a little harder than binary classification which has just two labels, because there are many more labels and the chances of being right for any one prediction is quite low (1/37 in this case, if we consider independence).

Before fitting, I generated a wordcloud to see which words are the most prominent. Apparently, "buhari" has been a prominent word over the past year on Nairaland's front page. Little wonder it chances of occurrence was quite high in the initial analysis I did. "lagos" is also prominent. And unsurprisingly, "coronavirus" is also.

SVM with a linear kernel was used for the classification task, and worked quite well. Ended up with an overall accuracy of about 69%. For some specific keywords, the accuracy was even higher. For example, for the keyword "buhari", the model placed the frontpage topic in "Politics" all the time and was correct for 96% of the time. For the keyword "rape", it had a choice of three different sections and still managed an accuracy of 84%. For the keyword "coronavirus", it didn't do so well. Managed only an accuracy of 69%. In any case, it shows that some of these predictions are possible. One can probably improve this model by training it on more features such as number of posts, post author, time of post. More features should improve its accuracy. I may get around to that if I've got more time. If you are able to do it, drop a message.

10 Likes 3 Shares

Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:18pm On Jun 10, 2020
Finally. this won't be complete without mentioning the resultant "Confusion Matrix".
Managed to create a visualization for it. When visualized, we see that for most of the sections, the correct prediction was made. There were a few sections where probably not enough values to form a cell. These appear to be "Programming", "Pets", etc. Red cells mean zero. For example, there were no successful predictions for "Webmasters". Light cells mean successful predictions. Most of the cells along the diagonal are light, correlating with the fairly good accuracy obtained.
If this catches your interest, you can download the notebook here on my github and play around with it, if you want. The code is in R.

5 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 2:49pm On Jun 10, 2020
Guys,I want to drop two vital information.

1. Pls,I will advise everyone to get a Twitter account.I get motivated towards Tech everyday cos of the community of pple I surround myself with,both online and offline.You can try using Twitter and add some Tech gurus who through their posts share more insights towards programming.I will recommend @DThompsonDev .This guy will simplify every problem you feel you are encountering as a beginner.

2.If you are a beginner,I recommend "Automate the boring stuff udemy course" for you as the first step towards learning data science or data analysis.With this,other stuff you would be learning would be easier.For the beginners who started with hash analytics internship,I have spoken with a lot and I cam conclude the lack of python basics serve as setback towards understanding major concepts.

Here is the link for the free udemy course.

https://udemycoupons.me/automate-the-boring-stuff-with-python-programming-100-off/


pls you can follow me on Twitter and let's build each other,I follow back ASAP.

Follow me on Twitter @DatumCue

I am a beginner like you,I want to see us Succeed .

11 Likes 4 Shares

Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 2:55pm On Jun 10, 2020
Theflint1:
Macsjebs check the visualization for top front-page posters.
Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 3:02pm On Jun 10, 2020
cochtrane:
Finally. this won't be complete without mentioning the resultant "Confusion Matrix".
Managed to create a visualization for it. When visualized, we see that for most of the sections, the correct prediction was made. There were a few sections where probably not enough values to form a cell. These appear to be "Programming", "Pets", etc. Red cells mean zero. For example, there were no successful predictions for "Webmasters". Light cells mean successful predictions. Most of the cells along the diagonal are light, correlating with the fairly good accuracy obtained.
If this catches your interest, you can download the notebook here on my github and play around with it, if you want. The code is in R.


cochtrane,you have done a wonderful job... Well-done...You made my day with this insight.keep it up.I will pm you.
Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:24pm On Jun 10, 2020
Gcool2:
cochtrane,you have done a wonderful job... Well-done...You made my day with this insight.keep it up.I will pm you.
Thanks man
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 9:12pm On Jun 10, 2020
cochtrane:


Following up on this dataset, I started wondering, can one define a machine learning question with this data set given its limited number of features? Apparently yes!

In this second part, I examine procedures for fitting a model with this data set. The research question takes the form: given a post title, can one tell which section it is from? For example, given a title "COVID-19: Governor Ikpeazu's Two Aides Test Positive", can our model tell that it is from the Health section?

Using NLP procedures, one can design a machine learning model which takes some part of this data set and fits a model to it, so that with the test data, we can ask questions of the remainder titles. This is a typical unsupervised model design known as classification. This particular task is multi-class classification with about 37 classes (all sections on Nairaland). This is a little harder than binary classification which has just two labels, because there are many more labels and the chances of being right for any one prediction is quite low (1/37 in this case, if we consider independence).

Before fitting, I generated a wordcloud to see which words are the most prominent. Apparently, "buhari" has been a prominent word over the past year on Nairaland's front page. Little wonder it chances of occurrence was quite high in the initial analysis I did. "lagos" is also prominent. And unsurprisingly, "coronavirus" is also.

SVM with a linear kernel was used for the classification task, and worked quite well. Ended up with an overall accuracy of about 69%. For some specific keywords, the accuracy was even higher. For example, for the keyword "buhari", the model placed the frontpage topic in "Politics" all the time and was correct for 96% of the time. For the keyword "rape", it had a choice of three different sections and still managed an accuracy of 84%. For the keyword "coronavirus", it didn't do so well. Managed only an accuracy of 69%. In any case, it shows that some of these predictions are possible. One can probably improve this model by training it on more features such as number of posts, post author, time of post. More features should improve its accuracy. I may get around to that if I've got more time. If you are able to do it, drop a message.

Man you are on another level....
Seun should employ you.
I'll like to ask what you use for NLP; spacy, NLTK or Textblob?
Re: Chronicle Of A Data Scientist/analyst by Singingbae(m): 12:19am On Jun 11, 2020
mcemmy0z:
If you reside around songo Ota I have these available
*Udemy - Beginner to Pro in Excel Financial Modeling and Valuation
*Udemy - SQL - MySQL for Data Analytics and Business Intelligence
*Tableau 10 A-Z Hands-On Tableau Training For Data Science!
*Tableau Hands-on Learn Data Visualization with Tableau
*Udemy - Power BI A-Z Hands-On Power BI Training For Data Science
*Udemy - Machine Learning A-Z™ Hands-On Python & R In Data Science
*Udemy - Python for Financial Analysis and Algorithmic Trading

Good evening sir, please I don’t know how I can meet you, I reside in Agege, but I can come to Sango one of these weekends if you can give me which is comfortable for you, I just finished python and I just got admitted to a virtual data science boot camp, I just want to be ahead of them! Thanks in anticipation
Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 1:51am On Jun 11, 2020

1 Like

Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 6:53am On Jun 11, 2020
ibromodzi:


Man you are on another level....
Seun should employ you.
I'll like to ask what you use for NLP; spacy, NLTK or Textblob?
The code is written in R, and uses the tm package
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 8:25am On Jun 11, 2020
cochtrane:

The code is written in R, and uses the tm package

Alright! Thanks.
Re: Chronicle Of A Data Scientist/analyst by elunico: 9:38am On Jun 11, 2020
Gcool2:
Guys,I want to drop two vital information.

1. Pls,I will advise everyone to get a Twitter account.I get motivated towards Tech everyday cos of the community of pple I surround myself with,both online and offline.You can try using Twitter and add some Tech gurus who through their posts share more insights towards programming.I will recommend @DThompsonDev .This guy will simplify every problem you feel you are encountering as a beginner.

2.If you are a beginner,I recommend "Automate the boring stuff udemy course" for you as the first step towards learning data science or data analysis.With this,other stuff you would be learning would be easier.For the beginners who started with hash analytics internship,I have spoken with a lot and I cam conclude the lack of python basics serve as setback towards understanding major concepts.

Here is the link for the free udemy course.

https://udemycoupons.me/automate-the-boring-stuff-with-python-programming-100-off/


pls you can follow me on Twitter and let's build each other,I follow back ASAP.

Follow me on Twitter @DatumCue

I am a beginner like you,I want to see us Succeed .

That course isn't free.
Re: Chronicle Of A Data Scientist/analyst by Gcool2(m): 12:22pm On Jun 11, 2020
elunico:


That course isn't free.
They revoked it..it was free..I downloaded couple of free courses there
Re: Chronicle Of A Data Scientist/analyst by elunico: 4:59pm On Jun 11, 2020
Gcool2:
They revoked it..it was free..I downloaded couple of free courses there

OK.
Re: Chronicle Of A Data Scientist/analyst by DrinkWater10: 10:55pm On Jun 11, 2020
Ejiod:
Loading........
Re: Chronicle Of A Data Scientist/analyst by DrinkWater10: 2:44am On Jun 12, 2020
Graspad:

Kindly explain how it works, because I'm having issues working with csv files on the laptop.
If I try to import the csv,it's saying not found.
Hi. Tried downloading tutorial files from freetutorial website but I keep getting a compressed torrent file.
How can I get around it?
Re: Chronicle Of A Data Scientist/analyst by Mikechinos(m): 6:14am On Jun 12, 2020
DrinkWater10:

Hi. Tried downloading tutorial files from freetutorial website but I keep getting a compressed torrent file.
How can I get around it?
download it with UC browser
Re: Chronicle Of A Data Scientist/analyst by Generalkorex(m): 9:24am On Jun 12, 2020
Kaycee54321:


I think so too. He made Python Basics, a walk in the park. My email is ble2gen.360@gmail.com

Hope to hear from you soon.
Link to video please
Re: Chronicle Of A Data Scientist/analyst by BoleAndFish: 9:41am On Jun 12, 2020
DrinkWater10:

Hi. Tried downloading tutorial files from freetutorial website but I keep getting a compressed torrent file.
How can I get around it?
What course were you trying to download?

(1) (2) (3) ... (44) (45) (46) (47) (48) (49) (50) ... (146) (Reply)

I Want To Learn Programming. Which Language Should I Start With?

(Go Up)

Sections: politics (1) business autos (1) jobs (1) career education (1) romance computers phones travel sports fashion health
religion celebs tv-movies music-radio literature webmasters programming techmarket

Links: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

Nairaland - Copyright © 2005 - 2024 Oluwaseun Osewa. All rights reserved. See How To Advertise. 94
Disclaimer: Every Nairaland member is solely responsible for anything that he/she posts or uploads on Nairaland.