
Chronicle Of A Data Scientist/analyst - Programming (24) - Nairaland


Re: Chronicle Of A Data Scientist/analyst by scave(m): 8:55pm On Apr 15, 2020
Ejiod:

This is beautiful. Nice work
thank you Sir

1 Like

Re: Chronicle Of A Data Scientist/analyst by Chukwudaalu(m): 1:21pm On Apr 16, 2020
scave:
https://scave222.github.io/covid19statistics/

check out my covid19 stat
Great work.
Re: Chronicle Of A Data Scientist/analyst by Evanspaul(m): 1:59pm On Apr 16, 2020
scave:
https://scave222.github.io/covid19statistics/

check out my covid19 stat
its fine
Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 9:27pm On Apr 16, 2020
Bibitayo2:
I started with an excel basics eBook and took courses on coursera for the advanced level.
The eBook is in PDF format and I have been unable to share it on Nairaland.

Pls bro kindly send the ebook to my email;
adebowalefemi2468@gmail.com
Thanks

1 Like

Re: Chronicle Of A Data Scientist/analyst by Vecto(m): 10:36pm On Apr 16, 2020
Evanspaul:

No bro, he is a white dude.
Has tutorials on YouTube
Just search for his name and put programming tutorials

Seen, thanks
Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 2:19am On Apr 17, 2020
Toppytek:


Python Pandas, Sql(MySQL), Power Bi or Tableau.

Hello bro, I already downloaded the Python and PyCharm apps and I'm now seeing this new one, i.e. Python pandas. I want to ask if the two do the same thing, and if not, can I still download pandas?
Thanks.

3 Likes

Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 2:27am On Apr 17, 2020
peterincredible:
Yes, there is a very good Udemy tutorial for MySQL by Colt Steele and one for Python by Jose Portilla. The problem is that I stay in Ogun State (Sagam). If you are serious, I will give it to you for free if you can come to Sagam.
Please bro, I am also interested in getting this, probably after the lockdown over there. I would like it if you could give me a means of contacting you, if possible. Thanks.

1 Like

Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 3:39am On Apr 17, 2020
abdeiz:


Lenovo and Dell are bae. I got a Dell laptop, Core i7 vPro, 8GB RAM and 3-4 hours of battery, for a bit less than 100k. Lenovo is the same price with the same specs where I am, with even better battery life.
Hello sir, pls can I know the particular model of the Dell laptop?
I'm currently using a Dell Latitude E6320, but I'm planning to upgrade as soon as I start getting my hands dirty.
Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 3:50am On Apr 17, 2020
Hello here, pls I am in need of someone who can help in or can give details of how to crack a modem of any network into a universal one.
Thanks.

1 Like

Re: Chronicle Of A Data Scientist/analyst by Ejiod(m): 5:25am On Apr 17, 2020
Phemysam19:
Hello here, pls I am in need of someone who can help in or can give details of how to crack a modem of any network into a universal one.
Thanks.
Please create a separate thread for this.
We speak data here

4 Likes

Re: Chronicle Of A Data Scientist/analyst by abdeiz(m): 9:16am On Apr 17, 2020
Phemysam19:

Hello sir, pls can I know the particular model of the Dell laptop?
I'm currently using a Dell Latitude E6320, but I'm planning to upgrade as soon as I start getting my hands dirty.

Dell Latitude E7450, 500GB, 8GB RAM, 13 inches, 4 hours battery life

2 Likes

Re: Chronicle Of A Data Scientist/analyst by Toppytek(m): 10:14am On Apr 17, 2020
So I downloaded the covid-19 dataset from the link @ejiod shared.

I dropped the province/state, lat, and long column as well.

Re: Chronicle Of A Data Scientist/analyst by Toppytek(m): 10:17am On Apr 17, 2020
The problem is that I want to group and aggregate the country/region column so that each country will fall on one row instead of multiple rows

1 Like

Re: Chronicle Of A Data Scientist/analyst by Toppytek(m): 10:25am On Apr 17, 2020
It’s quite unfortunate that I’ve been getting errors.
I need help concerning this, I want to have a total column of each row as well so that I can ignore the dates and plot a graph of each country and total number of deaths.

Thanks

@mcemmy0z

1 Like

Re: Chronicle Of A Data Scientist/analyst by mcemmy0z: 10:56am On Apr 17, 2020
Toppytek:
It’s quite unfortunate that I’ve been getting errors.
I need help concerning this, I want to have a total column of each row as well so that I can ignore the dates and plot a graph of each country and total number of deaths.

Thanks

@mcemmy0z

Let me give you an example.
To group by a particular column, the column has to have some repeating values in it, e.g. the countries in the coronavirus data, which also has regions, confirmed, deaths, and recovered.
So if you want to plot this and get clearer values, you have to group them.
data.groupby('country')[['confirmed', 'death', 'recovered']].max()

You can first group by the country column alone to confirm that it is OK: data.groupby('country').max()

To use datetime, first convert the date column to a datetime series
data['date'] = pd.to_datetime(data['date'])

and then group by the date
data.groupby('date')[['confirmed', 'death', 'recovered']].max()
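
The grouping described above can be sketched on a tiny table. This is only an illustration: the rows are made up, and the column names (country, region, date, confirmed, death, recovered) are assumptions about how the long-format data is laid out, not the real dataset.

```python
import pandas as pd

# Hypothetical long-format sample: one row per country, region and date,
# holding cumulative counts (names and numbers are invented for illustration).
data = pd.DataFrame({
    "country": ["China", "China", "China", "China", "Nigeria", "Nigeria"],
    "region": ["Hubei", "Hubei", "Beijing", "Beijing", "", ""],
    "date": ["4/15/20", "4/16/20", "4/15/20", "4/16/20", "4/15/20", "4/16/20"],
    "confirmed": [100, 120, 10, 15, 5, 8],
    "death": [4, 5, 0, 1, 0, 1],
    "recovered": [50, 60, 2, 3, 1, 2],
})

# Convert the date strings to real datetimes before grouping on them.
data["date"] = pd.to_datetime(data["date"], format="%m/%d/%y")

# Selecting several columns after groupby needs a list, hence the double brackets.
by_country = data.groupby("country")[["confirmed", "death", "recovered"]].max()
by_date = data.groupby("date")[["confirmed", "death", "recovered"]].max()

print(by_country)
print(by_date)
```

Note that when a country is split into regions, as China is in this sample, max per country returns the largest single region's figure. To get a country total you would add up the regions for each date first.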

5 Likes

Re: Chronicle Of A Data Scientist/analyst by mcemmy0z: 11:00am On Apr 17, 2020
Toppytek:
So I downloaded the covid-19 dataset from the link @ejiod shared.

I dropped the province/state, lat, and long column as well.
All the other remaining columns need to be melted down.
Use
pd.melt(data, id_vars=[columns you don't want to melt])
Re: Chronicle Of A Data Scientist/analyst by Toppytek(m): 11:23am On Apr 17, 2020
mcemmy0z:


Let me give you an example.
To group by a particular column, the column has to have some repeating values in it, e.g. the countries in the coronavirus data, which also has regions, confirmed, deaths, and recovered.
So if you want to plot this and get clearer values, you have to group them.
data.groupby('country')[['confirmed', 'death', 'recovered']].max()

You can first group by the country column alone to confirm that it is OK: data.groupby('country').max()

To use datetime, first convert the date column to a datetime series
data.groupby('date')[['confirmed', 'death', 'recovered']].max()
Thanks so much, really appreciate this, I’ll give it a try later, machine is down for now.
Re: Chronicle Of A Data Scientist/analyst by mcemmy0z: 11:31am On Apr 17, 2020
Toppytek:

Thanks so much, really appreciate this, I’ll give it a try later, machine is down for now.

You have to melt first before you start analysing and plotting; the date column comes out of the melt.
If you want to melt that data, you can keep lat and long if you are going to plot on a map.

df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'])
You can now rename the variable and value columns using
df.rename(columns={'variable': 'date', 'value': 'confirmed'})

then you can group
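
Put together, the melt, rename and group steps might look like the sketch below. The sample rows are invented, and the column names assume the wide layout discussed in this thread (one column per date, plus Province/State, Country/Region, Lat and Long). Summing the regions per date is my own choice for getting one row per country and date.

```python
import pandas as pd

# Hypothetical slice of a wide time-series table: one column per date.
df = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "Nigeria"],
    "Lat": [30.97, 40.18, 9.08],
    "Long": [112.27, 116.41, 8.68],
    "4/15/20": [100, 10, 5],
    "4/16/20": [120, 15, 8],
})

# Melt the date columns into rows; the id_vars columns are kept as they are.
long_df = df.melt(id_vars=["Province/State", "Country/Region", "Lat", "Long"])

# melt names the two new columns "variable" and "value" by default.
long_df = long_df.rename(columns={"variable": "date", "value": "confirmed"})
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")

# Add up the regions of each country per date: one row per country and date.
per_country = long_df.groupby(
    ["Country/Region", "date"], as_index=False
)["confirmed"].sum()

print(per_country)
```

Both melt and rename return a new DataFrame, so the result has to be assigned to a name, as done here with long_df.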

2 Likes

Re: Chronicle Of A Data Scientist/analyst by Abcruz(m): 11:44am On Apr 17, 2020
Phemysam19:


Hello bro, I already downloaded the Python and PyCharm apps and I'm now seeing this new one, i.e. Python pandas. I want to ask if the two do the same thing, and if not, can I still download pandas?
Thanks.

Pandas is a package for data analysis that you'll install into Python. For now focus on learning Python basics and you'll get to know how it works as time goes on.

4 Likes

Re: Chronicle Of A Data Scientist/analyst by graciousolo(m): 2:54pm On Apr 17, 2020
Ejiod:
DAY 1: GET DATA

Task one was to pull the data directly online rather than downloading.
Since I'm using Jupyter Notebook, I already have it in the .ipynb file, which you can always rerun. There is also the Excel file of the data. The data is dirty and needs to be cleaned.


To get online data use this,

from bs4 import BeautifulSoup
import requests
import pandas as pd
s = requests.Session()
s.headers = {'user-agent': 'Corona'}
page = s.get('https://www.worldometers.info/coronavirus/')
data = pd.read_html(page.text)[0]
data.to_excel('Corona.xlsx')


Will continue tomorrow.

Did this analysis with excel... Could be better.

Plan is to keep learning Excel and SQL then Python.

4 Likes

Re: Chronicle Of A Data Scientist/analyst by Nobody: 4:24pm On Apr 17, 2020
mcemmy0z:


Let me give you an example.
To group by a particular column, the column has to have some repeating values in it, e.g. the countries in the coronavirus data, which also has regions, confirmed, deaths, and recovered.
So if you want to plot this and get clearer values, you have to group them.
data.groupby('country')[['confirmed', 'death', 'recovered']].max()

You can first group by the country column alone to confirm that it is OK: data.groupby('country').max()

To use datetime, first convert the date column to a datetime series
data['date'] = pd.to_datetime(data['date'])

and then group by the date
data.groupby('date')[['confirmed', 'death', 'recovered']].max()


I have been using SUM instead of MAX in cases like this, but I saw in an online tutorial that we should use MAX and you confirmed it but I haven't gotten the reason.

If you can explain, I'll appreciate.

1 Like

Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 4:24pm On Apr 17, 2020
Ejiod:

Please create a separate thread for this.
We speak data here
Ok sir Ejiod, I'm sorry for that.
Re: Chronicle Of A Data Scientist/analyst by Nobody: 4:30pm On Apr 17, 2020
Small time series I did comparing rate of increase of death and confirmed cases using Seaborn library (python).

1 Like

Re: Chronicle Of A Data Scientist/analyst by mcemmy0z: 5:00pm On Apr 17, 2020
Graspad:



I have been using SUM instead of MAX in cases like this, but I saw in an online tutorial that we should use MAX and you confirmed it but I haven't gotten the reason.

If you can explain, I'll appreciate.

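It's not that you can't use SUM, but it will inflate the values. The counts are cumulative: each day's figure already includes the days before it, so summing across dates counts the same cases again and again. It is advisable to use MAX, which gives you the latest cumulative total.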
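
A small made-up example shows the difference. It assumes the death column holds running totals, as in the datasets discussed here; the numbers are invented.

```python
import pandas as pd

# Hypothetical cumulative death counts for one country over three days.
data = pd.DataFrame({
    "country": ["Nigeria", "Nigeria", "Nigeria"],
    "date": pd.to_datetime(["2020-04-15", "2020-04-16", "2020-04-17"]),
    "death": [11, 13, 17],
})

total_by_sum = data.groupby("country")["death"].sum()
total_by_max = data.groupby("country")["death"].max()

print(total_by_sum["Nigeria"])  # 41: every day's running total is counted again
print(total_by_max["Nigeria"])  # 17: the latest running total
```

If the column held new cases per day instead of running totals, SUM would be the right choice.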

1 Like

Re: Chronicle Of A Data Scientist/analyst by Nobody: 5:04pm On Apr 17, 2020
mcemmy0z:


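It's not that you can't use SUM, but it will inflate the values. The counts are cumulative: each day's figure already includes the days before it, so summing across dates counts the same cases again and again. It is advisable to use MAX, which gives you the latest cumulative total.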

Thanks!!!

3 Likes

Re: Chronicle Of A Data Scientist/analyst by Abcruz(m): 6:22pm On Apr 17, 2020
Kaggle is no longer working on any of my browsers. Has anyone encountered the same issue?

3 Likes

Re: Chronicle Of A Data Scientist/analyst by RealTrump: 6:44pm On Apr 17, 2020
KunSegzy100:

Experts in the house, please help with this issue. I just pip-installed NumPy, pandas and Jupyter so I can start learning to use pandas for analysis. I created a new folder to store the file to be processed, navigated to the folder through the CMD prompt and launched Jupyter from there. But to my surprise I was not able to import the file until I went on YouTube and saw a video where he added (encoding="latin", nrows=(number of rows)). I tried it and it worked, but I don't understand jack about what I did or why I had to add the encoding for the file to be imported in Jupyter Notebook.

Secondly, I have been going through some pandas videos online, and 80% of the analysis I have seen can be done using Excel or DAX functions in Power BI. Can someone highlight the important and unique things used when preparing a file for analysis in platforms like Power BI?

Also, I have been seeing videos where they use Pandas for analysis where they extract useful information from a file for example determining countries with over 5000cases of Corona virus from a dataframe, after getting this separate info, how can this information be stored separately?

Also I'd like to know for those using matplotlib, after getting the visuals what next? Do they export the graph? Generally I just want to know how they gather findings together in Pandas for purpose of reuse or continuation of analysis in other platforms.

Pardon my questions I am inquisitive by nature at things I don't know. God bless you for answering

It's better to always make sure the file has the right encoding before loading it in Jupyter or PyCharm.
Just download Sublime Text 3; it only takes up a few MB. Load the file in Sublime and save it with UTF-8 encoding. After that the file will load every time. Then just read it as a CSV file, that's all.
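
The other route, telling pandas which encoding the file uses, can be sketched like this. The file and its contents are made up for the demonstration, and "latin-1" is only an assumption about how such a file was saved.

```python
import os
import tempfile

import pandas as pd

# Write a small CSV in Latin-1 to stand in for a file that will not load as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "cases.csv")
with open(path, "w", encoding="latin-1") as f:
    f.write("country,cases\nCôte d'Ivoire,654\nNigeria,493\n")

# pandas assumes UTF-8 by default, so the Latin-1 byte for "ô" cannot be decoded.
try:
    pd.read_csv(path)
    failed = False
except UnicodeDecodeError:
    failed = True

# Naming the encoding the file was actually saved in fixes the read.
# nrows is optional and only limits how many rows are loaded.
df = pd.read_csv(path, encoding="latin-1", nrows=1)

print(failed)
print(df)
```

Either way works: re-save the file as UTF-8 once, or pass the encoding argument every time you read it.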

4 Likes 1 Share

Re: Chronicle Of A Data Scientist/analyst by RealTrump: 6:56pm On Apr 17, 2020
KunSegzy100:

Experts in the house, please help with this issue. I just pip-installed NumPy, pandas and Jupyter so I can start learning to use pandas for analysis. I created a new folder to store the file to be processed, navigated to the folder through the CMD prompt and launched Jupyter from there. But to my surprise I was not able to import the file until I went on YouTube and saw a video where he added (encoding="latin", nrows=(number of rows)). I tried it and it worked, but I don't understand jack about what I did or why I had to add the encoding for the file to be imported in Jupyter Notebook.

Secondly, I have been going through some pandas videos online, and 80% of the analysis I have seen can be done using Excel or DAX functions in Power BI. Can someone highlight the important and unique things used when preparing a file for analysis in platforms like Power BI?

Also, I have been seeing videos where they use Pandas for analysis where they extract useful information from a file[b] for example determining countries with over 5000cases of Corona virus from a dataframe, after getting this separate info, how can this information be stored separately?[/b]

Also I'd like to know for those using matplotlib, after getting the visuals what next? Do they export the graph? Generally I just want to know how they gather findings together in Pandas for purpose of reuse or continuation of analysis in other platforms.

Pardon my questions I am inquisitive by nature at things I don't know. God bless you for answering

I know the solution to the problem in bold; it actually took me 4 days to find it out. In the end it was a small thing, but the 4 days were not wasted, as I learnt other things relating to data cleaning.

I will give a hint: after you filter out the cases over 5k, you need to assign the filtering expression to a new name (a variable). From there, you can always call up the filtered values. Try it out; if you don't get it, I will give the full code you need.
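
A minimal sketch of that hint, using made-up numbers and hypothetical column names (country, cases) rather than the real dataset:

```python
import pandas as pd

# Hypothetical case counts, invented for illustration.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "cases": [12000, 4800, 5000, 7300],
})

# Assign the filtered rows to a new name so they can be reused later.
over_5k = df[df["cases"] > 5000]

# With no file path, to_csv returns the CSV text instead of writing a file.
csv_text = over_5k.to_csv(index=False)

print(over_5k)
print(csv_text)
```

To store the result separately for Excel, Power BI or Tableau, pass a file name instead, e.g. over_5k.to_csv("over_5k.csv", index=False), where the file name is just an example.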
Re: Chronicle Of A Data Scientist/analyst by RealTrump: 7:10pm On Apr 17, 2020
Phemysam19:


Hello bro, I already downloaded the Python and PyCharm apps and I'm now seeing this new one, i.e. Python pandas. I want to ask if the two do the same thing, and if not, can I still download pandas?
Thanks.

Pandas is used in handling any data that involves rows and columns, i.e tables.

To your bolded question, yes, you can still download pandas...you might not need it yet though.

Open PyCharm and click on Terminal (see screenshot). The cursor will then be blinking in the terminal window; just type "pip install pandas". Make sure you have an internet connection and chill; it will print a success message when it is done.
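
To check that the install worked, something like this can be run in a Python file or console. It assumes the PyCharm terminal installs into the same interpreter the project runs with.

```python
import pandas as pd

# If this import works, pandas is installed for the interpreter running the script.
print(pd.__version__)
```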

5 Likes

Re: Chronicle Of A Data Scientist/analyst by RealTrump: 7:25pm On Apr 17, 2020
@Ejiod
@mcemmy0z
@Chukwudaalu

Please rate my racing chart on Tableau. I did this after learning Tableau for under 24 hours and plotted the animation based on my own intuition; there was no direct example anywhere to help me. I will add more countries, but I started with one. There is still a lot to learn, but Tableau is making Seaborn look like a big waste of time.

https://public.tableau.com/profile/bola2980#!/vizhome/covid-OneNine/Covid19-BelgiumCases?publish=yes

To play the chart, 1. drag the slider back. 2. Press play (unfortunately, you won't be able to fast forward it, you can drag the slider to like halfway to save time). The slider and the play button are around the bottom right corner...see attached file.

Ladies and gentlemen, encourage your boy if you like this small animation I put together. I intend to celebrate every victory! Programming is sweet when you are getting it.

3 Likes

Re: Chronicle Of A Data Scientist/analyst by Phemysam19(m): 9:57pm On Apr 17, 2020
RealTrump:


Pandas is used in handling any data that involves rows and columns, i.e tables.

To your bolded question, yes, you can still download pandas...you might not need it yet though.

Open PyCharm and click on Terminal (see screenshot). The cursor will then be blinking in the terminal window; just type "pip install pandas". Make sure you have an internet connection and chill; it will print a success message when it is done.
Thanks so much sir.
