Nairaland Forum / Cochtrane's Profile / Cochtrane's Posts
Travel / Re: Canadian Express Entry/federal Skilled Workers Program - Connect Here Part 9 by cochtrane(m): 11:06am On Jun 11, 2020 |
pinkbananas:That's fine. Send a pm. Would be nice if we had more people though, so this can be maximized. We can come up with a convenient date. I used to give tutorials here for free months ago. So, it'd be cool to do this again. 9 Likes 1 Share |
Travel / Re: Canadian Express Entry/federal Skilled Workers Program - Connect Here Part 9 by cochtrane(m): 10:01am On Jun 11, 2020 |
Mirian91:I'll help you with your writing. Send me a pm. 1 Like 1 Share |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 6:53am On Jun 11, 2020 |
ibromodzi:The code is written in R, and uses the tm package |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:24pm On Jun 10, 2020 |
Gcool2:Thanks man |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:18pm On Jun 10, 2020 |
Finally, this won't be complete without mentioning the resultant "Confusion Matrix". Managed to create a visualization for it. When visualized, we see that for most of the sections, the correct prediction was made. There were a few sections that probably did not have enough values to form a cell. These appear to be "Programming", "Pets", etc. Red cells mean zero. For example, there were no successful predictions for "Webmasters". Light cells mean successful predictions. Most of the cells along the diagonal are light, consistent with the fairly good accuracy obtained. If this catches your interest, you can download the notebook here on my GitHub and play around with it. The code is in R. 5 Likes 1 Share
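For anyone who wants to see how a confusion matrix is put together, here is a minimal sketch. The notebook itself is in R; this is plain Python with only the standard library, and the labels below are made up for illustration (they are not the Nairaland results). The function names are my own.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; rows are true sections, columns are predictions."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

def accuracy(matrix):
    """Share of predictions on the diagonal, i.e. where prediction equals truth."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Made-up labels for illustration only (not the Nairaland results).
actual = ["Politics", "Politics", "Health", "Health", "Webmasters", "Politics"]
predicted = ["Politics", "Politics", "Health", "Politics", "Politics", "Politics"]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(f"{label:<12}", row)
print("accuracy:", round(accuracy(matrix), 3))
```

A row whose diagonal cell is zero (like "Webmasters" in this toy data) is a section the model never got right, which is what a red diagonal cell in the plot would mean.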
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:15pm On Jun 10, 2020 |
cochtrane: Following up on this dataset, I started wondering: can one define a machine learning question with this data set, given its limited number of features? Apparently yes! In this second part, I examine procedures for fitting a model to this data set. The research question takes the form: given a post title, can one tell which section it is from? For example, given the title "COVID-19: Governor Ikpeazu's Two Aides Test Positive", can our model tell that it is from the Health section?
Using NLP procedures, one can design a machine learning model that is fitted to one part of this data set (the training data) and then asked to predict the sections of the remaining titles (the test data). This is a typical supervised learning task known as classification. This particular task is multi-class classification with about 37 classes (all sections on Nairaland). This is a little harder than binary classification, which has just two labels, because there are many more labels and the chance of being right on any one prediction by guessing is quite low (1/37 in this case, if every section is equally likely).
Before fitting, I generated a wordcloud to see which words are the most prominent. Apparently, "buhari" has been a prominent word over the past year on Nairaland's front page. Little wonder its chance of occurrence was quite high in the initial analysis I did. "lagos" is also prominent. And unsurprisingly, so is "coronavirus".
SVM with a linear kernel was used for the classification task, and worked quite well. Ended up with an overall accuracy of about 69%. For some specific keywords, the accuracy was even higher. For example, for the keyword "buhari", the model placed the frontpage topic in "Politics" every time and was correct 96% of the time. For the keyword "rape", it had a choice of three different sections and still managed an accuracy of 84%. For the keyword "coronavirus", it didn't do so well. Managed only an accuracy of 69%.
In any case, it shows that some of these predictions are possible. One can probably improve this model by training it on more features such as number of posts, post author, time of post. More features should improve its accuracy. I may get around to that if I've got more time. If you are able to do it, drop a message. 10 Likes 3 Shares
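To make the per-keyword numbers and the 1/37 chance level concrete, here is a rough sketch of how such figures can be computed once a model has made its predictions. It is plain Python rather than the R I used, the function name `keyword_accuracy` is my own, and the titles and predictions below are made up for illustration.

```python
def keyword_accuracy(rows, keyword):
    """Accuracy over the titles that contain `keyword` (case-insensitive).

    `rows` is a list of (title, actual_section, predicted_section) tuples.
    Returns None when no title contains the keyword.
    """
    hits = [(actual, pred) for title, actual, pred in rows
            if keyword.lower() in title.lower()]
    if not hits:
        return None
    return sum(actual == pred for actual, pred in hits) / len(hits)

# Chance level for a uniform random guess over 37 sections.
n_classes = 37
baseline = 1 / n_classes

# Made-up predictions for illustration only.
rows = [
    ("Buhari Signs New Bill", "Politics", "Politics"),
    ("Buhari Meets Governors", "Politics", "Politics"),
    ("Coronavirus: Lagos Records New Cases", "Health", "Health"),
    ("Coronavirus: Airlines Suspend Flights", "Travel", "Health"),
]
print("baseline:", round(baseline, 3))
print("buhari:", keyword_accuracy(rows, "buhari"))
print("coronavirus:", keyword_accuracy(rows, "coronavirus"))
```

A keyword that shows up in several sections (like "coronavirus" in this toy data) gives the model more ways to be wrong than one that lives almost entirely in a single section.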
Business / Re: Giving Out Free Call Credits by cochtrane(m): 12:05pm On Jun 10, 2020 |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:29pm On Jun 09, 2020 |
KunSegzy100:are you familiar with data structures and algorithms? |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:58pm On Jun 08, 2020 |
teewhydope:I used Beautiful Soup. I'm working on consolidating the code I used. I'll share it on my GitHub when done. 2 Likes |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 5:43pm On Jun 07, 2020 |
Samzeal:How do you mean? The file format is csv. You just import it into whatever package you are using. |
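As a rough illustration of what "importing the csv" looks like, here is a sketch in plain Python (R, pandas and so on each have their own one-liner for this). The column names `title`, `section` and `time` are assumptions for the example, not necessarily the headers in the actual file, and the two rows are placeholders. The timestamp format matches the one in the data (e.g. 2019-05-31 15:38:00).

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Stand-in for the real file; the column names here are assumptions.
sample = io.StringIO(
    "title,section,time\n"
    "Example Title One,Politics,2019-05-31 15:38:00\n"
    "Example Title Two,Health,2019-06-01 09:10:00\n"
)

def load_posts(handle):
    """Read rows into dicts and parse the timestamp column."""
    posts = []
    for row in csv.DictReader(handle):
        row["time"] = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
        posts.append(row)
    return posts

posts = load_posts(sample)
per_section = Counter(post["section"] for post in posts)
print(len(posts), "posts;", dict(per_section))
```

With the downloaded file, you would pass an open file handle (opened with `newline=""`) to `load_posts` instead of the in-memory sample.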
Programming / Re: Funny Programming Memes. Just For Laughs by cochtrane(m): 5:20pm On Jun 06, 2020 |
SUPREMOTM:Apt lol |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 11:20am On Jun 06, 2020 |
Maybe the mods have an answer as to why there was such a decline. There is likely a causal factor, as the statistics show it's not due to chance.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 11:19am On Jun 06, 2020 |
Hardheolar:This is good. Not unexpectedly, we saw lots more posts from the health section. I'm guessing you are onto something there regarding a factor directly causing the decline in the number of posts hitting frontpage from around November. I'd say that decline is statistically significant. That's what I thought originally, before going into R to check. And it's kinda true. The mean decline in posts between September and October wasn't statistically significant. But once we got to November, the mean decline became statistically significant between November and the previous month. Same results between December and the previous month. This is inferential statistics. The density plots show why this may be the case, since the peaks are aligned differently. But one shouldn't judge only visually; a two-sample t-test like the one below is the proper check.
t-test between September and October values: p-value is quite high and not less than 0.05
> t.test(t.sep, t.oct)
Decline between November and October shows some statistical significance (p-value < 0.05)
> t.test(t.nov, t.oct)
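For those curious about what the test computes under the hood: R's `t.test` runs Welch's two-sample test by default (unequal variances). Below is a minimal sketch of that calculation in plain Python, standard library only. The daily counts are made up for illustration and are not the actual frontpage numbers; the p-value is obtained by numerically integrating the t density, which is a simplification I chose to avoid extra packages.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def t_pdf(u, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

# Made-up daily post counts for illustration only.
sep_counts = [30, 28, 31, 29, 32, 30, 27]
oct_counts = [29, 30, 28, 31, 30, 29, 28]
nov_counts = [22, 21, 24, 20, 23, 22, 21]

for name, a, b in [("sep vs oct", sep_counts, oct_counts),
                   ("nov vs oct", nov_counts, oct_counts)]:
    t, df = welch_t(a, b)
    print(name, "t =", round(t, 2), "df =", round(df, 1),
          "significant at 0.05:", two_sided_p(t, df) < 0.05)
```

In this toy data the September and October samples overlap heavily, so the test finds no significant difference, while the November sample sits well below October and the difference comes out significant.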
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:47am On Jun 06, 2020 |
Hardheolar:Brilliant! That monthly analysis thing is quite revealing. I also wonder if there's any insight derivable from the influence of coronavirus on hours/days of Frontpage posts. |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 5:22pm On Jun 05, 2020 |
Hardheolar:Did some reorganization. Here it is. I'll be uploading some more scraped data which includes details of each thread, including user, sex and number of posts. |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 3:44pm On Jun 05, 2020 |
elunico:probability distributions, inferential statistics, of course descriptive statistics. I feel like descriptive statistics is what you wanna focus more on, cos it's foundational. And naturally, Bayesian statistics. Check out R for applied statistics, if you use R. Or the one here: faculty(dot)marshall(dot)usc(dot)edu(forward_slash)gareth(hyphen)james(forward_slash)ISL(forward_slash) 6 Likes |
Programming / Re: SQL Basics & Advanced Topics: Free Webinar! by cochtrane(m): 2:50pm On Jun 05, 2020 |
Ogechukwu01:Send a PM, if interested. |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:09pm On Jun 05, 2020 |
Hardheolar:Just looked at the csv now. This is where it stops. That's 31st of May: 2019-05-31 15:38:00 1 Like |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 10:08am On Jun 05, 2020 |
elunico:I actually love the statistics. I'm building my knowledge on inferential statistics. It's beautiful. I think it's not been very difficult generally for me to understand most of the topics. The difficult things are the maths associated with PCA, Gaussian processes (and related...). I'm familiar with linear algebra however, so I believe in a while I'll be good with these topics. 1 Like |
Programming / Re: SQL Basics & Advanced Topics: Free Webinar! by cochtrane(m): 2:46pm On Jun 04, 2020 |
Dum20:replied |
Programming / SQL Basics & Advanced Topics: Free Webinar! by cochtrane(m): 12:45pm On Jun 04, 2020 |
I think many data scientists/enthusiasts will agree that SQL is the foundation of data analysis. In fact, some jobs require you to know SQL and nothing more. If you are getting into data science and don't have knowledge of SQL, then you've still got a long way to go. For those struggling to learn SQL, I've got some news for you. I'll be organizing a free SQL class in some days (maybe a few weeks?). If you are interested, send me a PM and I'll add your name to the list. When we get to a reasonable number, I'll open the class. Perhaps we can get new people interested in SQL, or help beginners firm up their knowledge.
Mode of delivery: online (Zoom)
Time: about 7-9pm Nigerian time on a weekend
Resources needed: a laptop and a good internet connection
What to expect: you will be doing some coding yourself while following the lecture
We will use SQLite, but most of what will be learnt can be extended to MySQL and Postgres. Shoot me a PM if interested. Hopefully we can get some more engagement going here for enthusiasts! If we get enough interested candidates, we'll open the class. 1 Like 1 Share |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:42pm On Jun 04, 2020 |
Abcruz:Okay, good idea actually. 1 Like |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 6:53am On Jun 04, 2020 |
5 Likes |
Programming / Re: Funny Programming Memes. Just For Laughs by cochtrane(m): 6:48am On Jun 04, 2020 |
Hilarious sht lol 1 Like |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:32am On Jun 04, 2020 |
Ejiod:Thanks! |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:32am On Jun 04, 2020 |
Graspad:You can. It's only going to be a little inefficient. I think you should get something better. |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:01pm On Jun 03, 2020 |
iCode2:You could probably just ask questions here, so everyone can learn. 5 Likes |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:00pm On Jun 03, 2020 |
DivineGrace123:Thanks 4 Likes |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:27pm On Jun 03, 2020 |
Abcruz:Thanks. |
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:27pm On Jun 03, 2020 |
lalasticlala:haha Make I create new thread, make you put am for front page? At least, get "programming" noticed a bit. |