Stats: 3,152,721 members, 7,816,969 topics. Date: Friday, 03 May 2024 at 09:34 PM

Cochtrane's Posts

Travel / Re: Canadian Express Entry/federal Skilled Workers Program - Connect Here Part 9 by cochtrane(m): 11:06am On Jun 11, 2020
pinkbananas:


Pls can you help me too? I'm writing next month
That's fine. Send a pm. Would be nice if we had more people though, so this can be maximized. We can come up with a convenient date.
I used to give tutorials here for free months ago. So, it'd be cool to do this again.

9 Likes 1 Share

Travel / Re: Canadian Express Entry/federal Skilled Workers Program - Connect Here Part 9 by cochtrane(m): 10:01am On Jun 11, 2020
Mirian91:
People that passed IELTS writing, please how did you guys do it? I've used all the materials at my disposal and online: Liz, Chris Pell, Buddy, all to no avail.

My ideas are always poor and difficult to develop. Are there other things I'm missing to get a 7 in writing? I'm stuck at 6.5.

Pls help!
I'll help you with your writing. Send me a pm.

1 Like 1 Share

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 6:53am On Jun 11, 2020
ibromodzi:


Man you are on another level....
Seun should employ you.
I'll like to ask what you use for NLP; spacy, NLTK or Textblob?
The code is written in R, and uses the tm package
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:24pm On Jun 10, 2020
Gcool2:
cochtrane,you have done a wonderful job... Well-done...You made my day with this insight.keep it up.I will pm you.
Thanks man
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:18pm On Jun 10, 2020
Finally, this won't be complete without mentioning the resultant "Confusion Matrix".
I managed to create a visualization for it. It shows that for most of the sections, the correct prediction was made. A few sections, such as "Programming" and "Pets", probably did not have enough values to fill a cell. Red cells mean zero. For example, there were no successful predictions for "Webmasters". Light cells mean successful predictions. Most of the cells along the diagonal are light, which agrees with the fairly good accuracy obtained.
If this catches your interest, you can download the notebook here on my github and play around with it. The code is in R.
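For anyone who wants to see what sits behind such a plot, here is a minimal sketch of how a confusion matrix can be counted up. It is in plain Python rather than the R used in the notebook, and the labels below are made-up toy values, not results from the actual model.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Rows are actual sections, columns are predicted sections."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

def overall_accuracy(matrix):
    """Share of predictions that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Toy labels for illustration only (not the real predictions).
actual = ["Politics", "Politics", "Health", "Crime", "Health", "Webmasters"]
predicted = ["Politics", "Politics", "Health", "Politics", "Politics", "Politics"]

labels, matrix = confusion_matrix(actual, predicted)
accuracy = overall_accuracy(matrix)
```

A zero on the diagonal, like the "Webmasters" row in this toy example, is what a red diagonal cell in the plot would correspond to.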

5 Likes 1 Share

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:15pm On Jun 10, 2020
cochtrane:
As a budding data scientist who visits NL often, it's not surprising that you start to get more than interested in some of the topics making front page and how frequently topics from individual sections reach the top. I have been looking into this for a while and thought it would be nice to do some investigation in this regard. For example, which section makes front page most often? How often do we see programming topics get to the front page? Who posts more often on the front page? Is it really lalasticlala, as is frequently supposed, or is it someone else? What exactly has been the relationship between lalasticlala and snakes over the past year? Some people think he loves to push snake topics to the front page more often than other topics. What else can we learn from the topics making front page? For example, are they mostly about Buhari or something else?

To this end, I scraped the front page data and obtained more than 28,000 records. You can download the data set here on my github. If you are a data science enthusiast who also likes Nairaland, this may be good motivation to dig into a topic that interests you. You will find a metadata file in the sublink as well and can investigate what the attributes are about. You've got titles, links, sections and the time that posts made front page. It's a year of data from 31st May 2019 till date. It turns out that getting the whole front page history may need more than 230,000 records! That's huge, and probably not so wise to collect for a quick, lazy analysis. Unless, of course, you have business motives.

For me, I was interested in a few topics.
First, from which section did we get the most front page material over the past year? Apparently, it is "Politics". It trumps everything. "Celebrities" comes a close second. Not surprising, right? What with the volume of Bobrisky posts and co. And then "Crime" comes third. Does this point to a high frequency of crime in Nigeria? I leave that question to you. "Programming"? Didn't even make bank one time!
The fact that politics makes front page most often suggests that politics is probably top of our discourse as Nigerians, if Nairaland reflects a microcosm of the Nigerian environment, which I feel it does.

Who posts more often on the front page? Not lalasticlala like you might think. It's a person called dre11, at least over the last year. Maybe you know him, maybe you don't. Lalasticlala is not even in the top three.
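These "who tops the list" questions boil down to counting. A minimal sketch in plain Python follows; the rows are made-up stand-ins for the scraped records, and the field names "section" and "author" are assumptions, not necessarily the column names in the csv.

```python
from collections import Counter

# Hypothetical rows standing in for the scraped front page records.
# Field names and values are assumptions for illustration only.
rows = [
    {"section": "Politics", "author": "user_a"},
    {"section": "Politics", "author": "user_b"},
    {"section": "Celebrities", "author": "user_c"},
    {"section": "Crime", "author": "user_a"},
    {"section": "Politics", "author": "user_a"},
]

def top_counts(records, field, n=3):
    """Return the n most frequent values of `field` with their counts."""
    return Counter(r[field] for r in records).most_common(n)

top_sections = top_counts(rows, "section")
top_authors = top_counts(rows, "author")
```

The same helper works for any other attribute in the data set, for example counting by weekday once the timestamps are parsed.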

One quirky thing I found, however, was that the time it takes for a post to get to the front page has a heavily right-skewed distribution. Before plotting this, I lazily assumed it would be normally distributed, cos... well, a lot of things are normally distributed: a few posts make front page early, a few late, and most somewhere in between. Right? On the contrary, the reality is skewed. Most posts make front page early, not late. They are created and in little time pushed to the front page. I feel the heavy skewness probably points to deliberate human intervention; otherwise I'd have expected the data to look closer to normal, don't you think? Anyways, that's what my data shows. Better insight could probably be derived if one scraped randomly over the past several years in order to obtain a truly random sample.
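If you'd rather put a number on the skew than eyeball a plot, here is a minimal sketch using the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). It is plain Python, and the waiting times are made-up values, not taken from the data set.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (positive = right-skewed)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up hours between thread creation and reaching the front page.
hours_to_front_page = [1, 1, 2, 2, 3, 4, 6, 12, 48, 200]

skew = skewness(hours_to_front_page)
mean_hours = mean(hours_to_front_page)
median_hours = median(hours_to_front_page)
```

In a right-skewed sample like this one the mean sits well above the median, because a handful of very late threads drag it up.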

And there were a few threads which made front page late. Very late! In the past year, we have had threads from 8 years ago make front page. Yes, 8 years ago! That's 2012. And then there are those that were initially posted 5 years ago before they made front page. Perhaps you can find more if you look into the data set?

Anyways, getting your hands dirty with a data set is always a good way to learn data analysis. If you need help with navigating this, you can buzz me.

Following up on this dataset, I started wondering, can one define a machine learning question with this data set given its limited number of features? Apparently yes!

In this second part, I examine procedures for fitting a model with this data set. The research question takes the form: given a post title, can one tell which section it is from? For example, given a title "COVID-19: Governor Ikpeazu's Two Aides Test Positive", can our model tell that it is from the Health section?

Using NLP procedures, one can fit a machine learning model to one part of this data set (the training data) and then use the remaining titles (the test data) to check its predictions. This is a typical supervised learning design known as classification, since every title comes with a known section label. This particular task is multi-class classification with about 37 classes (all sections on Nairaland). This is a little harder than binary classification, which has just two labels, because there are many more labels and the chance of being right for any one prediction by pure guessing is quite low (1/37 in this case, if every section is equally likely).
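To put any accuracy figure in context, it helps to compare it with two simple baselines: guessing uniformly at random, and always predicting the most common section. A minimal sketch in plain Python, with toy labels that are assumptions rather than the real section counts:

```python
from collections import Counter

def uniform_guess_accuracy(n_classes):
    """Expected accuracy when every class is guessed with equal probability."""
    return 1 / n_classes

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Toy section labels for illustration only.
toy_labels = ["Politics"] * 5 + ["Celebrities"] * 3 + ["Crime"] * 2

random_baseline = uniform_guess_accuracy(37)
majority_baseline = majority_baseline_accuracy(toy_labels)
```

Since one section dominates the front page, the majority baseline is usually the tougher of the two to beat.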

Before fitting, I generated a wordcloud to see which words are the most prominent. Apparently, "buhari" has been a prominent word over the past year on Nairaland's front page. Little wonder its chance of occurrence was quite high in the initial analysis I did. "lagos" is also prominent. And unsurprisingly, so is "coronavirus".

SVM with a linear kernel was used for the classification task, and it worked quite well, ending up with an overall accuracy of about 69%. For some specific keywords, the accuracy was even higher. For example, for the keyword "buhari", the model placed the front page topic in "Politics" all the time and was correct 96% of the time. For the keyword "rape", it had a choice of three different sections and still managed an accuracy of 84%. For the keyword "coronavirus", it didn't do so well and managed an accuracy of only 69%. In any case, it shows that some of these predictions are possible. One can probably improve this model by training it on more features such as number of posts, post author and time of post. I may get around to that if I've got more time. If you are able to do it, drop a message.
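The per-keyword figures are just accuracy computed over the titles that contain the keyword. A minimal sketch in plain Python; the titles, sections and predictions below are invented for illustration and are not output from the SVM.

```python
def keyword_accuracy(titles, actual, predicted, keyword):
    """Accuracy over only those titles that contain `keyword` (case-insensitive).

    Returns None when no title contains the keyword.
    """
    hits = [
        a == p
        for title, a, p in zip(titles, actual, predicted)
        if keyword.lower() in title.lower()
    ]
    if not hits:
        return None
    return sum(hits) / len(hits)

# Invented examples, not real front page records or model predictions.
titles = [
    "Buhari Signs New Bill",
    "Buhari Visits Lagos",
    "Coronavirus Cases Rise",
    "Coronavirus: New Test Centre Opens",
]
actual = ["Politics", "Politics", "Health", "Health"]
predicted = ["Politics", "Politics", "Politics", "Health"]

buhari_acc = keyword_accuracy(titles, actual, predicted, "buhari")
corona_acc = keyword_accuracy(titles, actual, predicted, "coronavirus")
snake_acc = keyword_accuracy(titles, actual, predicted, "snake")
```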

10 Likes 3 Shares

Business / Re: Giving Out Free Call Credits by cochtrane(m): 12:05pm On Jun 10, 2020
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:29pm On Jun 09, 2020
KunSegzy100:
hello guys, i have my first assessment (aptitude test followed by interview) for data analyst role scheduled for tomorrow, kindly give me tips to aid preparation.
Are you familiar with data structures and algorithms?
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:58pm On Jun 08, 2020
teewhydope:


please how did you scrape the data from NL using powerBI. I'm more interested in how you went about getting the sections the topics came from and the author. did you also implement it with python script
I used Beautiful Soup. I'm working on putting together the code I used. I'll share it on my github when done.

2 Likes

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 5:43pm On Jun 07, 2020
Samzeal:


Please how can this dataset be arranged in the form of a table for easy access?
How do you mean?
The file format is csv. You just import it into whatever package you are using.
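For example, with nothing but Python's standard library the import could look like the sketch below. The column names and the two sample rows are assumptions made up for illustration (check the metadata file for the real attribute names); the timestamp format follows the "2019-05-31 15:38:00" style seen in the csv.

```python
import csv
import io
from datetime import datetime

# Stand-in for the downloaded file; header names and rows are hypothetical.
sample_csv = io.StringIO(
    "title,section,front_page_time\n"
    "Example Topic One,Politics,2019-05-31 15:38:00\n"
    "Example Topic Two,Health,2020-03-14 09:10:00\n"
)

# Each row becomes a dict keyed by the header names, i.e. a simple table.
rows = list(csv.DictReader(sample_csv))

# Turn the timestamp text into datetime objects for time-based analysis.
for row in rows:
    row["front_page_time"] = datetime.strptime(
        row["front_page_time"], "%Y-%m-%d %H:%M:%S"
    )
```

With the real file, you would pass `open("your_file.csv", newline="", encoding="utf-8")` to `csv.DictReader` instead of the `StringIO` stand-in.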
Programming / Re: Funny Programming Memes. Just For Laughs by cochtrane(m): 5:20pm On Jun 06, 2020
SUPREMOTM:
When all the children are mature and the papa is senile.
Apt lol
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 11:20am On Jun 06, 2020
Maybe the mods can explain why there was such a decline. There likely is a causal factor, as the statistics show it's not due to chance.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 11:19am On Jun 06, 2020
Hardheolar:


The advent of coronavirus in March increased the number of posts hitting front page from health section as seen in the image attached below. 1,617 threads on coronavirus have made it to front page.

Instead of having the months start from January to December, which might be confusing, I rearranged the months to show when the data started from, which was June 2019. There has been a huge decline in the number of posts that made it to front page since December 2019, with a slight increase of 17% in March 2020 due to Covid.

The questions that need to be answered are:
- was there a change in NL policy regarding the number of posts that hits front page?
- or the change in moderators during that period?
This is good. Not unexpectedly, we saw lots more posts from health section.
I'm guessing you are onto something there regarding a factor directly causing the decline in the number of posts hitting front page from around November. I'd say that decline is statistically significant. That's what I thought originally, before going into R to check. And it's kinda true.

The change in the mean number of posts between September and October wasn't statistically significant. But once we got to November, the decline from the previous month became statistically significant. Same result between December and the previous month. This is inferential statistics.

The density plots show why this may be the case, since the peaks are aligned differently. But one shouldn't judge only visually; a two-sample t-test like the ones below backs it up.

t-test between September and October values: p-value is quite high and not less than 0.05
> t.test(t.sep, t.oct)

Welch Two Sample t-test

data: t.sep and t.oct
t = 0.012042, df = 56.259, p-value = 0.9904
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-13.86717 14.03492
sample estimates:
mean of x mean of y
100.6000 100.5161



The decline between October and November is statistically significant (p-value < 0.05)
> t.test(t.nov, t.oct)

Welch Two Sample t-test

data: t.nov and t.oct
t = -2.7617, df = 56.929, p-value = 0.007727
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-32.57471 -5.19088
sample estimates:
mean of x mean of y
81.63333 100.51613
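
For those who want to see what the Welch test is doing under the hood, here is a minimal sketch of the t statistic and the Welch-Satterthwaite degrees of freedom. It is in plain Python rather than R, and the two samples are toy numbers, not the monthly values used above. The p-value needs the t-distribution CDF, which the Python standard library lacks; scipy.stats.ttest_ind(x, y, equal_var=False) gives the full test.

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's two-sample t statistic and degrees of freedom.

    Uses the sample variances (n - 1 denominator) and does not assume
    the two groups have equal variance.
    """
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Toy samples for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(x, y)
```

The non-integer degrees of freedom in the R output (56.259 and 56.929) come from this same Welch-Satterthwaite formula.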

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:47am On Jun 06, 2020
Hardheolar:
Hi guys, I did further digging on the dataset that cochtrane scraped using a business intelligence tool called Power BI.
Note: The analysis is for the date when the thread made it to front page, not when it was created which spanned from 3rd of June 2019 to 2nd of June 2020 . The data has 28,516 threads from 38 different sections by 5025 different accounts.

Few insights from the dataset
-dre11 is the king of political threads with 361 threads followed by Islie(329) and ijustdey(225) during the time captured. Lala is more interested in celebrity threads compared to politics, followed by Alex. Ogbiwa is the defending champion of sports threads.
-Threads are mostly pushed to front page in the morning, which is reasonable since that is when the day begins.
-There was spike in threads that were pushed to front page in July 2019, but I can't tell if that is the norm during that period since we don't have previous year's data to make the comparison.
-You will think that threads in religion section should top the threads that make it to front page on Sundays, but it came third after politics and celebrity threads.
-853 threads made it to front page with "kill" keyword. That is worrisome and a cause of concern.
etc.
Lots of insight can be derived from the data.
Brilliant! That monthly analysis is quite revealing. I also wonder if there's any insight derivable from the influence of coronavirus on the hours/days of front page posts.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 5:22pm On Jun 05, 2020
Hardheolar:

Was actually looking at the date it hit front page.
Btw, the link is no longer accessible
Did some reorganization. Here it is.
I'll be uploading some more scraped data which includes details of each thread, including user, sex and number of posts.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 3:44pm On Jun 05, 2020
elunico:


Yeah, your background in engineering wouldn't see the associated math much as a hurdle to surmount.

Please, I'd like to know the areas of statistics I'd need to arm myself with in order to obtain proficiency in the Data Scientist field. Materials you have found helpful too.
Probability distributions, inferential statistics and, of course, descriptive statistics.
I feel like descriptive statistics is what you wanna focus more on, cos it's foundational. And naturally, Bayesian statistics.
Check out R for applied statistics, if you use R. Or the one here: faculty.marshall.usc.edu/gareth-james/ISL/

6 Likes

Programming / Re: SQL Basics & Advanced Topics: Free Webinar! by cochtrane(m): 2:50pm On Jun 05, 2020
Ogechukwu01:
cochtrane... very interested
Send a PM, if interested.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 1:09pm On Jun 05, 2020
Hardheolar:

The data in the link you pasted is showing 3rd of June 2019 to 2nd of June 2020, which differs from your stated starting date.
Just looked at the csv now. This is where it stops. That's 31st of May:

2019-05-31 15:38:00

1 Like

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 10:08am On Jun 05, 2020
elunico:


Great!! I'm also an engineer. I haven't much knowledge in programming; in fact, I just started learning HTML. How has your limited knowledge of statistics played a role in your progress?

I'm good with calculations though.
I actually love the statistics. I'm building my knowledge on inferential statistics. It's beautiful. I think it's not been very difficult generally for me to understand most of the topics. The difficult things are the maths associated with PCA, Gaussian processes (and related...). I'm familiar with linear algebra however, so I believe in a while I'll be good with these topics.

1 Like

Programming / Re: SQL Basics & Advanced Topics: Free Webinar! by cochtrane(m): 2:46pm On Jun 04, 2020
Dum20:
Sent you a pm
replied
Programming / SQL Basics & Advanced Topics: Free Webinar! by cochtrane(m): 12:45pm On Jun 04, 2020
I think many data scientists/enthusiasts will agree that SQL is the foundation of data analysis. In fact, some jobs require you to know SQL and nothing more. If you are getting into data science and don't have knowledge of SQL, then you've still got a long way to go.

For those struggling to learn SQL, I've got some news for you. I'll be organizing a SQL class for free in some days (maybe a few weeks?). If you are interested, send me a PM and I'll add your name to the list. When we get to a reasonable number, I'll open the class. Perhaps we can get new people interested in SQL, or help beginners firm up their knowledge.

Mode of delivery: online (zoom)
Time: will be about 7-9pm Nigerian time on a weekend.
Resources needed: A laptop and good internet connection
What to expect: You will be doing some coding yourself, while following the lecture

We will use SQLite. But most of what will be learnt can be extended to MySQL and Postgres.
Shoot me a PM if interested. Hopefully we can get some more engagement going here for enthusiasts!

If we get enough interested candidates, we'll open the class.

1 Like 1 Share

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:42pm On Jun 04, 2020
Abcruz:


I suggest you create a thread concerning this for wider exposure.
Okay, good idea actually.

1 Like

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 6:53am On Jun 04, 2020

5 Likes

Programming / Re: Funny Programming Memes. Just For Laughs by cochtrane(m): 6:48am On Jun 04, 2020
Hilarious sht lol

1 Like

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:32am On Jun 04, 2020
Ejiod:

Awesome!!!!
Love this...
Thanks!
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 4:32am On Jun 04, 2020
Graspad:
Please, who knows if I can do machine learning on this old laptop??
You can. It's only going to be a little inefficient. I think you should get something better.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:01pm On Jun 03, 2020
iCode2:
Cochtrane

You're just 6 months into Data Science? Wow
Can I pm you and probably talk on WhatsApp?
You could probably just ask questions here, so everyone can learn.

5 Likes

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 2:00pm On Jun 03, 2020
DivineGrace123:


Nice one, Cochtrane.
Thanks

4 Likes

Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:27pm On Jun 03, 2020
Abcruz:
@cochtrane

Your data analysis is mind blowing keep it up bro!
Thanks.
Programming / Re: Chronicle Of A Data Scientist/analyst by cochtrane(m): 12:27pm On Jun 03, 2020
lalasticlala:

You tried
haha
Make I create new thread, make you put am for front page? At least, get "programming" noticed a bit.


Nairaland - Copyright © 2005 - 2024 Oluwaseun Osewa. All rights reserved. See How To Advertise. 66
Disclaimer: Every Nairaland member is solely responsible for anything that he/she posts or uploads on Nairaland.