
Chronicle Of A Data Scientist/analyst - Programming (41) - Nairaland

Nairaland Forum / Science/Technology / Programming / Chronicle Of A Data Scientist/analyst (332196 Views)


Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 10:57am On Jun 03, 2020
Dum20:


Okay great.

1. Firstly how did you start?
I started with self learning. But now I am taking some courses online (edX, Dataquest and DataCamp).

2. What are the courses you took to get to this stage of competence.
Not a lot. I don't even think I am so competent just yet. Still a work in progress. But most of the courses I have taken have been by myself, online.
Scraping Nairaland was with BeautifulSoup, and I just used the documentation.

3. For how long have been in Data Science
Six months, give or take. I have been learning Python for 9 months.

My background: I am taking a Business Intelligence course on Udemy. I have taken the statistics, SQL and Tableau parts of the course. I have just started the Python section. But I still feel inadequate.
I have also taken courses on Excel Power tools and Power BI.
Try to get into the act of doing, not only while you are watching the tutorials, but also at other times.
Doing makes you learn a lot. Start with something that interests you. If you like Nairaland, you may actually start with that nairaland dataset.


N.B. Can you give a step by step guide on how you got to final charts above.
You do not need to go into details. Something like:
1. You used XYZ to extract data
2. You used ABC to clean the data
3. Used MNOP to visualise the data
-Use Jupyter Notebook. Most of the libraries are pre-installed.
-Import the required libraries.
-Scraping was with BeautifulSoup, and you can consult the documentation online.
-Used pandas to clean the data. It was very dirty.
-Used seaborn to visualize the data. There are other options, including plotly.
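The steps above can be sketched roughly as follows. This is only an illustration, not the code actually used for the Nairaland analysis: the HTML snippet, its tags, and the column names `title`, `link` and `section` are all made up, and the real front page markup is different.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for a scraped page (not Nairaland's real HTML).
SAMPLE_HTML = """
<ul>
  <li><a href="/101/budget-debate">  Budget Debate </a> <span>politics</span></li>
  <li><a href="/102/new-album">New Album</a> <span> Celebrities</span></li>
  <li><a href="/101/budget-debate">  Budget Debate </a> <span>politics</span></li>
  <li><a href="/103/untitled"></a> <span>Crime</span></li>
</ul>
"""


def scrape(html):
    """Extract one row per list item: title, link and section."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li"):
        link = item.find("a")
        section = item.find("span")
        rows.append(
            {
                "title": link.get_text(),
                "link": link["href"],
                "section": section.get_text(),
            }
        )
    return pd.DataFrame(rows)


def clean(df):
    """Trim whitespace, normalise section names, drop empty titles and duplicates."""
    out = df.copy()
    out["title"] = out["title"].str.strip()
    out["section"] = out["section"].str.strip().str.title()
    out = out[out["title"] != ""]
    out = out.drop_duplicates(subset="link")
    return out.reset_index(drop=True)


frontpage = clean(scrape(SAMPLE_HTML))
print(frontpage)
```

For the visualisation step, something like seaborn's `sns.countplot(data=frontpage, y="section")` would draw one bar per section; it is left out here so the sketch runs without a plotting backend.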


There are a lot of courses to study. I am wondering if I should stop for now and practice real-life examples of the subjects I have learnt.
Make sure to practice real-life examples. That's something you should always do. You can stop studying for now. Practice, and when you have challenges, go back to refer to your notes or to the videos.

11 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by lalasticlala(m): 11:06am On Jun 03, 2020
cochtrane:

You tried

3 Likes 8 Shares

Re: Chronicle Of A Data Scientist/analyst by Abcruz(m): 11:54am On Jun 03, 2020
@cochtrane

Your data analysis is mind-blowing. Keep it up, bro!

6 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:27pm On Jun 03, 2020
lalasticlala:

You tried
Haha.
Should I create a new thread for you to put on the front page? At least it would get "programming" noticed a bit.
Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:27pm On Jun 03, 2020
Abcruz:
@cochtrane

Your data analysis is mind-blowing. Keep it up, bro!
Thanks.
Re: Chronicle Of A Data Scientist/analyst by iCode2: 1:19pm On Jun 03, 2020
Cochtrane

You're just 6 months into Data Science? Wow
Can I pm you and probably talk on WhatsApp?

1 Like

Re: Chronicle Of A Data Scientist/analyst by Toppytek(m): 1:51pm On Jun 03, 2020
Zabiboy:

Nice one @cochtrane.
I'll use Tableau/Power BI to analyse mine.
Pandas needs more logic, not to talk of matplotlib...
Although I'll still use them (pandas and mpl) later.

I haven't used Tableau before, but I doubt Power BI can go to this length in web scraping.

1 Like

Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:00pm On Jun 03, 2020
DivineGrace123:


Nice one, Cochtrane.
Thanks

4 Likes

Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:01pm On Jun 03, 2020
iCode2:
Cochtrane

You're just 6 months into Data Science? Wow
Can I pm you and probably talk on WhatsApp?
You could probably just ask questions here, so everyone can learn.

5 Likes

Re: Chronicle Of A Data Scientist/analyst by uuspace(m): 2:19pm On Jun 03, 2020
Abcruz:
@cochtrane

Your data analysis is mind-blowing. Keep it up, bro!

I tell you. He is on another level. He thought outside the box.

Seun should contact you for more analysis.
Re: Chronicle Of A Data Scientist/analyst by Dum20: 2:25pm On Jun 03, 2020
cochtrane:
I started with self learning

Please can you explain more about starting with self learning?

But now I am taking some courses online (edX, Dataquest and DataCamp)


Not a lot. I don't even think I am so competent just yet. Still a work in progress. But most of the courses I have taken have been by myself, online.
Scraping Nairaland was with BeautifulSoup, and I just used the documentation.


Six months, give or take. I have been learning Python for 9 months.
Do you advise learning Python first before going into Data Science, or studying Data Science courses that have Python as one of the subjects?


Try to get into the act of doing, not only while you are watching the tutorials, but also at other times.
Can you give an example? Most of the courses have exercises.

Doing makes you learn a lot. Start with something that interests you. If you like Nairaland, you may actually start with that nairaland dataset.



-Use Jupyter Notebook. Most of the libraries are pre-installed.
-Import the required libraries.
-Scraping was with BeautifulSoup, and you can consult the documentation online.
-Used pandas to clean the data. It was very dirty.
-Used seaborn to visualize the data. There are other options, including plotly.



Make sure to practice real-life examples.
Can you suggest where to get real life examples?

Thanks for your answers so far.
Re: Chronicle Of A Data Scientist/analyst by Dum20: 2:28pm On Jun 03, 2020
lalasticlala:

You tried

Lala, can you push some threads in the programming section to the front page once in a while?
Re: Chronicle Of A Data Scientist/analyst by Zabiboy: 3:10pm On Jun 03, 2020
Toppytek:


I haven't used Tableau before, but I doubt Power BI can go to this length in web scraping.

Yeah, that's true...
Neither can web-scrape...
I'll work out a means.
I'll post it here when I'm done.

3 Likes

Re: Chronicle Of A Data Scientist/analyst by lalasticlala(m): 3:36pm On Jun 03, 2020
Dum20:


Lala, can you push some threads in the programming section to the front page once in a while?

Yes if there are interesting ones

1 Like 6 Shares

Re: Chronicle Of A Data Scientist/analyst by Regards2U: 6:00pm On Jun 03, 2020
grin
Re: Chronicle Of A Data Scientist/analyst by Regards2U: 6:01pm On Jun 03, 2020
grin
Re: Chronicle Of A Data Scientist/analyst by Regards2U: 6:15pm On Jun 03, 2020
cool
Re: Chronicle Of A Data Scientist/analyst by brashear: 7:02pm On Jun 03, 2020

3 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by Samzeal(m): 7:29pm On Jun 03, 2020
Please, who knows about Hash Analytics? I was offered an internship with them yesterday, but I don't know how good their internship is for data analysis. I'm currently in the Dataville Research internship, where we have not been learning any software.

1 Like

Re: Chronicle Of A Data Scientist/analyst by Nobody: 7:34pm On Jun 03, 2020
Please, who knows if I can do machine learning on this old laptop??

1 Like

Re: Chronicle Of A Data Scientist/analyst by Ajibade123(m): 8:54pm On Jun 03, 2020
lalasticlala:


Yes if there are interesting ones
Please sir, why is the anti-spam bot always banning me anytime I post a comment with a link?
Re: Chronicle Of A Data Scientist/analyst by teamoneline: 9:10pm On Jun 03, 2020
cochtrane:
As a budding data scientist who visits NL often, it's not surprising that you start to get more than interested in some of the topics making the front page and how frequently topics from individual sections reach the top. I have been looking into this for a while and thought it would be nice to do some investigation in this regard. For example, which section makes the front page most often? How often do we see programming topics get to the front page? Who posts more often on the front page? Is it really lalasticlala, as is frequently supposed, or is it someone else? What exactly has been the relationship between lalasticlala and snakes over the past year? Some people think he loves to push snake topics to the front page more often than other topics. What else can we learn from the topics making the front page? For example, are they mostly about Buhari or something else?

To this end, I scraped the front page data and obtained more than 28,000 records. You can download this data set I obtained here on my GitHub. If you are a data science enthusiast who also likes Nairaland, this may be good motivation to dig into a topic that interests you. You will find a metadata file in the sublink as well and can investigate what the attributes are about. You've got titles, links, sections and the time that posts made the front page. It's a year of data from 31st May 2019 till date. It turns out that getting the whole front page information may need more than 230,000 records! That's huge, and probably not so wise to collect for a quick, lazy analysis. Unless, of course, you have business motives.

For me, I was interested in a few topics.
First, from which section did we get the most front page material over the past year? Apparently, it is "Politics". It trumps everything. "Celebrities" comes a close second. Not surprising, right? What with the volume of Bobrisky posts and co. And then "Crime" comes third. Does this point to a high frequency of crime in Nigeria? I leave that question to you. "Programming"? Didn't even make bank one time!
The fact that politics makes the front page most often suggests that politics tops our discourse as Nigerians, if Nairaland reflects a microcosm of the Nigerian environment, which I feel it does.

Who posts more often on the front page? Not lalasticlala like you might think. It's a person called dre11, at least over the last year. Maybe you know him, maybe you don't. Lalasticlala is not even in the top three.
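A minimal sketch of how counts like the section and poster rankings above could be computed with pandas. The rows are made up, and the column names `section` and `poster` are assumptions; check the metadata file of the real data set for the actual attribute names.

```python
import pandas as pd

# Made-up rows for illustration; these are not the real front page records.
frontpage = pd.DataFrame(
    {
        "section": ["Politics", "Politics", "Celebrities", "Crime", "Politics", "Celebrities"],
        "poster": ["user_a", "user_b", "user_a", "user_a", "user_c", "user_b"],
    }
)


def top_counts(df, column, n=3):
    """Return the n most frequent values in a column, with their counts."""
    return df[column].value_counts().head(n)


top_sections = top_counts(frontpage, "section")
top_posters = top_counts(frontpage, "poster")

print(top_sections)
print(top_posters)
```

`value_counts()` sorts from most to least frequent, so the first index entry is the section (or poster) that appears most often.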

One quirky thing I found, however, was that the time it takes for a post to get to the front page has a heavily right-skewed distribution. Before plotting this, I lazily thought it might be normally distributed, cos...well, a lot of things are normally distributed and it shouldn't be unusual to have this normally distributed as well; few make the front page early, few late, and most are in between. Right? On the contrary, the reality is skewed. I feel the heavy skewness probably points to deliberate human intervention. Most posts make the front page early, not late. They are created and in little time pushed to the front page. Evidently in a deliberate fashion. Else the data should be normally distributed, don't you think? Anyways, that's what my data shows. Maybe better insight could be derived, though, if one scraped randomly over the past several years in order to obtain a truly random sample.

And there were a few threads which made the front page late. Very late! In the past year, we have had threads from 8 years ago make the front page. Yes, 8 years ago! That's 2012. And then there are those that were initially posted 5 years ago before they made the front page. Perhaps you can find more if you looked into the data set?
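A rough sketch of how the time-to-front-page figures discussed above could be checked with pandas. The rows are invented, and the column names `created_at` and `frontpage_at` are assumptions, not the real attribute names in the data set.

```python
import pandas as pd

# Invented timestamps for illustration only.
posts = pd.DataFrame(
    {
        "title": ["a", "b", "c", "d", "e"],
        "created_at": [
            "2020-05-01 08:00",
            "2020-05-01 09:00",
            "2020-05-02 10:00",
            "2020-05-03 07:00",
            "2012-04-10 12:00",
        ],
        "frontpage_at": [
            "2020-05-01 09:00",
            "2020-05-01 11:00",
            "2020-05-02 13:00",
            "2020-05-03 12:00",
            "2020-05-20 12:00",
        ],
    }
)

posts["created_at"] = pd.to_datetime(posts["created_at"])
posts["frontpage_at"] = pd.to_datetime(posts["frontpage_at"])

# Hours between a thread being created and it reaching the front page.
posts["delay_hours"] = (
    posts["frontpage_at"] - posts["created_at"]
).dt.total_seconds() / 3600

mean_delay = posts["delay_hours"].mean()
median_delay = posts["delay_hours"].median()
skewness = posts["delay_hours"].skew()  # positive values indicate right skew

# Threads that took more than a year to reach the front page.
late = posts[posts["delay_hours"] > 365 * 24]

print(median_delay, mean_delay, skewness)
print(late[["title", "delay_hours"]])
```

A mean far above the median, together with a positive skew value, is the numeric signature of a right-skewed distribution. Waiting times cannot go below zero, so some right skew is common in this kind of data whatever the cause.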

Anyways, getting your hands dirty with a data set is always a good way to learn data analysis. If you need help with navigating this, you can buzz me.

If I want to be the first to comment, I now know where to look... Don't let those spammers see this.
Re: Chronicle Of A Data Scientist/analyst by Regards2U: 9:31pm On Jun 03, 2020
Ebooks for data science from Manning, O'Reilly, etc.

All of Statistics
https://drive.google.com/file/d/1kbVBjMhYcmuXowAZFVvRnCiC1SiFizDT/view?usp=drivesdk

Classic Computer Science Problems in Python
https://drive.google.com/file/d/1541MZpnnNzf_9Ih9dcZoz4QsxYAlP4KI/view?usp=drivesdk

Deep Learning by François Chollet (creator of Keras)
https://drive.google.com/file/d/159n0bKb4kQ5g6zsqBjMgjH9dgEiWrOFy/view?usp=drivesdk

Grokking Deep Learning
https://drive.google.com/file/d/13TYmwfr5OSIZcmwxv62CTGxEA4jBcy4t/view?usp=drivesdk

Introduction to Machine Learning with Python
https://drive.google.com/file/d/13IpRS47fR99U5WW8Co1PNJbVQ-uUcijM/view?usp=drivesdk

Learning SQL
https://drive.google.com/file/d/15BwoXruI5yyiormxw-LznULkh_ISesLI/view?usp=drivesdk

Machine Learning for Hackers
https://drive.google.com/file/d/16kPGPqBsGwHKwtjnHOa0ZVSqNRwrJT1l/view?usp=drivesdk

Machine Learning with Python Cookbook: practical solutions for programmers
https://drive.google.com/file/d/1jF8RI7qa45vPFmFFXW9fwqhKwAlEiff6/view?usp=drivesdk

Mathematics for Machine Learning
https://drive.google.com/file/d/1Qz9rrxhkzM1AAkqv1EMhLm9qKUaQVm2s/view?usp=drivesdk

Probability: For the Enthusiastic Beginner
https://drive.google.com/file/d/12wiXnCmSsplvp_QNYZBdiEusvU8-HK8F/view?usp=drivesdk

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
https://drive.google.com/file/d/1RkvJOsrci8L9C_-lbg-bhON3_3c_wF0P/view?usp=drivesdk

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics
https://drive.google.com/file/d/16yeCKWfC0Kax_mYgsbOpEZdB2ewee6vt/view?usp=drivesdk

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
https://drive.google.com/file/d/16nsYinyl36LYZAphbnRlJ9UQ64Vd2xB5/view?usp=drivesdk

The Hundred-Page Machine Learning Book
https://drive.google.com/file/d/13lKPbOtm37VoM1kCjOi46dVU5CK_1WPy/view?usp=drivesdk

Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
https://drive.google.com/file/d/16uS1MM2161tXR3zUTQwh2T1UUKMaOEO0/view?usp=drivesdk

I'm sorry about the restrictions earlier

11 Likes 8 Shares

Re: Chronicle Of A Data Scientist/analyst by Regards2U: 9:32pm On Jun 03, 2020
brashear:


You blocked access to the books
Try the one below now.
Re: Chronicle Of A Data Scientist/analyst by sleit: 9:35pm On Jun 03, 2020
cochtrane:

Guy this is really impressive.

What's your educational background like?
Re: Chronicle Of A Data Scientist/analyst by Abcruz(m): 9:44pm On Jun 03, 2020
Graspad:
Please, who knows if I can do machine learning on this old laptop??

Yes, you can, but it has to be in the cloud, i.e. you'll have to use a Jupyter notebook from your web browser.

1 Like

Re: Chronicle Of A Data Scientist/analyst by Regards2U: 9:54pm On Jun 03, 2020

7 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by Ejiod(m): 10:43pm On Jun 03, 2020
cochtrane:
Awesome!!!!
Love this...

4 Likes

Re: Chronicle Of A Data Scientist/analyst by Samzeal(m): 12:11am On Jun 04, 2020
Regards2U:
Thank you for these resourceful materials.
Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:32am On Jun 04, 2020
Graspad:
Please, who knows if I can do machine learning on this old laptop??
You can. It's only going to be a little inefficient. I think you should get something better.
