
How I Built A Nairaland Web Scraper (1122 Views)


How I Built A Nairaland Web Scraper by DataMina: 11:20am On Oct 19, 2023
Hello everyone. I am a data analyst who enjoys web scraping. Nairaland has been my go-to place for keeping up with trending news, aside from Twitter, and as a lover of this platform I decided to build a web scraper with Selenium and Python.

The scraper is designed to extract thread topics along with the counts of views, users, and guests from Nairaland.

The scraped data can be used to build a Power BI report that is refreshed each time scraping is done. The published dashboard can be used to monitor the threads that generate the highest and lowest views. You can check out the project in my GitHub repository: https://github.com/StephDAnalyst/Nairaland
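
For anyone curious how the ranking step might look, here is a minimal sketch in Python. It assumes each scraped heading looks like "Topic (N Views)"; the helper names (`parse_heading`, `rank_by_views`, `to_csv`) and the sample headings are made up for illustration and are not taken from the linked repository. CSV is just one plausible hand-off format for a BI tool such as Power BI.

```python
import csv
import io
import re

# Pattern for a heading like "Some Topic (1122 Views)"; the format is an
# assumption about how thread headings are written.
HEADING_RE = re.compile(r"^(?P<topic>.+?)\s*\((?P<views>[\d,]+)\s+Views\)\s*$")


def parse_heading(heading):
    """Split a thread heading into (topic, views); return None if no match."""
    match = HEADING_RE.match(heading)
    if match is None:
        return None
    views = int(match.group("views").replace(",", ""))
    return match.group("topic"), views


def rank_by_views(headings):
    """Parse headings, drop unparseable ones, sort from most to least viewed."""
    rows = [row for row in map(parse_heading, headings) if row is not None]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_csv(rows):
    """Serialise (topic, views) rows as CSV text that a BI tool could import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["topic", "views"])
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder headings for illustration only, not real scraped data.
sample = [
    "Example Topic A (250 Views)",
    "Example Topic B (1,500 Views)",
    "A line without a view count",
]
ranked = rank_by_views(sample)
print(to_csv(ranked))
```

Writing the CSV to a file instead of printing it would let a report be refreshed from the same path after each scraping run.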

3 Likes 3 Shares

Re: How I Built A Nairaland Web Scraper by Babangidapikin: 11:27am On Oct 19, 2023
DataMina:
Hello everyone. I am a data analyst who enjoys web scraping. Nairaland has been my go-to place for keeping up with trending news, aside from Twitter, and as a lover of this platform I decided to build a web scraper with Selenium and Python.

The scraper is designed to extract thread topics along with the counts of views, users, and guests from Nairaland.

The scraped data can be used to build a Power BI report that is refreshed each time scraping is done. The published dashboard can be used to monitor the threads that generate the highest and lowest views. You can check out the project in my GitHub repository: https://github.com/StephDAnalyst/Nairaland
Good for practice. Can you scrape LinkedIn for email addresses?
Re: How I Built A Nairaland Web Scraper by DataMina: 11:30am On Oct 19, 2023
I use a different trick to do it. If you need the service you can let me know
Re: How I Built A Nairaland Web Scraper by BlackhatMentor: 11:32am On Oct 19, 2023
A scraper that scrapes emails and phone numbers along with their usernames would be better.

This one you did isn't very useful.
Re: How I Built A Nairaland Web Scraper by DataMina: 11:35am On Oct 19, 2023
I know. I have worked on a scraper that scrapes emails from LinkedIn using Google search, after which I used the stringr library (R) to extract the emails from the text. I get your point though...
Re: How I Built A Nairaland Web Scraper by princely4ever: 11:37am On Oct 19, 2023
My own webscraper/datascraper lets you extract web assets including html, css and javascript files

1 Like 1 Share

Re: How I Built A Nairaland Web Scraper by Babangidapikin: 11:40am On Oct 19, 2023
DataMina:
I use a different trick to do it. If you need the service you can let me know
Okay. By the way, can you visualize data?
Re: How I Built A Nairaland Web Scraper by airsaylongcome: 11:48am On Oct 19, 2023
BlackhatMentor:
A scraper that scrapes emails and phone numbers along with their usernames would be better.

This one you did isn't very useful.

How isn't this useful? A social media management team could use the report from this scrape to target the threads and topics they should be driving content and engagement with. Some of you need to think laterally.

1 Like

Re: How I Built A Nairaland Web Scraper by airsaylongcome: 11:49am On Oct 19, 2023
OP,

Is it possible to do the "impossible"? Scrape the entire site's content and dump in an LLM

1 Like

Re: How I Built A Nairaland Web Scraper by BlackhatMentor: 11:52am On Oct 19, 2023
airsaylongcome:


How isn't this useful? A social media management team could use the report from this scrape to target the threads and topics they should be driving content and engagement with. Some of you need to think laterally.

I don't just comment for commenting sake.

It's not useful...

Nairaland already provides that data so what's the point creating another one
Re: How I Built A Nairaland Web Scraper by airsaylongcome: 11:56am On Oct 19, 2023
BlackhatMentor:


I don't just comment for commenting sake.

It's not useful...

Nairaland already provides that data so what's the point creating another one

Ehhhh...what's the point? How about the OP learning? How about having a visual Dashboard instead of Nairaland bland numbers only without any visual perception of how those numbers compare? I don't get up and make random comments. But to say the OP's work isn't useful is absolutely not correct. It is useful. I find it useful and I'm sure there are loads and loads of non-data science or data engineering folks that will find it very useful
Re: How I Built A Nairaland Web Scraper by DataMina: 11:57am On Oct 19, 2023
airsaylongcome:
OP,

Is it possible to do the "impossible"? Scrape the entire site's content and dump in an LLM
Re: How I Built A Nairaland Web Scraper by DataMina: 11:59am On Oct 19, 2023
It is very much doable, but you know you have to contend with CAPTCHAs.
Re: How I Built A Nairaland Web Scraper by BlackhatMentor: 11:59am On Oct 19, 2023
airsaylongcome:


Ehhhh...what's the point? How about the OP learning? How about having a visual Dashboard instead of Nairaland bland numbers only without any visual perception of how those numbers compare? I don't get up and make random comments. But to say the OP's work isn't useful is absolutely not correct. It is useful. I find it useful and I'm sure there are loads and loads of non-data science or data engineering folks that will find it very useful

It's useful for learning purpose only.

I don't see any problem it offers a solution to sha.

1 Like

Re: How I Built A Nairaland Web Scraper by DataMina: 12:00pm On Oct 19, 2023
Babangidapikin:

Okay. By the way, can you visualize data?
Integrating the data into Power BI allows you to do that. With BI tools you can even refresh the data.

1 Like

Re: How I Built A Nairaland Web Scraper by Cheryl463337: 12:19pm On Oct 19, 2023
DataMina:
Hello everyone. I am a data analyst who enjoys web scraping. Nairaland has been my go-to place for keeping up with trending news, aside from Twitter, and as a lover of this platform I decided to build a web scraper with Selenium and Python.

The scraper is designed to extract thread topics along with the counts of views, users, and guests from Nairaland.

The scraped data can be used to build a Power BI report that is refreshed each time scraping is done. The published dashboard can be used to monitor the threads that generate the highest and lowest views. You can check out the project in my GitHub repository: https://github.com/StephDAnalyst/Nairaland
Why not use plain requests and BeautifulSoup, which I believe would be faster than Selenium?
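
To make the browser-free approach concrete, here is a rough sketch. It swaps in the standard library (`urllib.request` and `html.parser`) for requests and BeautifulSoup so that it runs with no installs; the idea is the same: download the raw HTML and parse it directly. The markup in `page` is hypothetical, the "(N Views)" pattern is an assumption, and this only works when the data is present in the static HTML.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url, timeout=10):
    """Download a page without a browser (plays the role of requests.get)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_thread(html):
    """Return the page title and the first '(N Views)' count found, if any."""
    parser = TitleParser()
    parser.feed(html)
    views = re.search(r"\(([\d,]+)\s+Views\)", html)
    return {
        "title": parser.title.strip(),
        "views": int(views.group(1).replace(",", "")) if views else None,
    }


# Hypothetical static markup; the real page structure may differ.
page = (
    "<html><head><title>Demo Thread</title></head>"
    "<body><h2>Demo Thread (42 Views)</h2></body></html>"
)
print(parse_thread(page))
```

In real use, `parse_thread(fetch_html(url))` would replace the inline `page` string. No browser is started, which is where the speed advantage over Selenium would come from.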
Re: How I Built A Nairaland Web Scraper by DataMina: 1:25pm On Oct 19, 2023
Cheryl463337:
Why not use plain requests and BeautifulSoup, which I believe would be faster than Selenium?
I was just experimenting with it. When I tried using Octoparse (a badass no-code tool) to scrape the Nairaland website, it didn't work because the site didn't appear structured. So I decided to experiment with Selenium, and it worked.
Re: How I Built A Nairaland Web Scraper by BeLookingIDIOT(m): 2:26pm On Oct 19, 2023
You're doing something illegal while announcing it on the very platform grin

1 Like

Re: How I Built A Nairaland Web Scraper by airsaylongcome: 3:08pm On Oct 19, 2023
BeLookingIDIOT:
You're doing something illegal while announcing it on the very platform grin

Illegal is pushing it a bit. Unethical, yes. But if NL doesn’t expose APIs for devs to legally consume data, then people have no option than to scrape shege from it.

2 Likes

Re: How I Built A Nairaland Web Scraper by airsaylongcome: 3:10pm On Oct 19, 2023
BlackhatMentor:


It's useful for learning purpose only.

I don't see any problem it offers a solution to sha.

Think of it as first alpha. They definitely would refine it until they get to v1
Re: How I Built A Nairaland Web Scraper by Cheryl463337: 3:18pm On Oct 19, 2023
DataMina:

I was just experimenting with it. When I tried using Octoparse (a badass no-code tool) to scrape the Nairaland website, it didn't work because the site didn't appear structured. So I decided to experiment with Selenium, and it worked.
Okay
Re: How I Built A Nairaland Web Scraper by Felixitie(m): 3:44pm On Oct 19, 2023
DataMina:

I was just experimenting with it. When I tried using Octoparse (a badass no-code tool) to scrape the Nairaland website, it didn't work because the site didn't appear structured. So I decided to experiment with Selenium, and it worked.

Though it seems the page loads dynamically, which makes it hard for Bs4 alone to get the data out. Selenium can load the page and render the JavaScript, and then you can use Bs4 to soup it and get the stuff (a combination of Selenium and Bs4). Scrapy works easily too.

You can also grab all the front page topic links first and then loop through them using Bs4 to get all the data points, which improves the speed.

You have done so well.

Can we work on a portfolio project together, using Scrapy with Splash or Scrapy with Playwright to generate leads, then dump the results into a database and schedule the runs with Airflow?
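
The two-step idea above (collect the front page links first, then loop through them) could be sketched roughly as follows. This is an illustration under assumptions, not anyone's actual project code: the standard library's `HTMLParser` stands in for Bs4 so the sketch runs without installs, `selenium_fetch` shows where a rendering browser would plug in, and the URLs, page contents, and "(N Views)" pattern are all hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def selenium_fetch(url):
    """Render a page in a real browser; needs selenium and a Chrome driver."""
    from selenium import webdriver  # imported lazily so the rest runs without it

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def collect_links(front_html, base_url):
    """Phase 1: absolute, de-duplicated links from the front page, in order."""
    collector = LinkCollector()
    collector.feed(front_html)
    absolute = [urljoin(base_url, href) for href in collector.links]
    return list(dict.fromkeys(absolute))


def crawl(front_html, base_url, fetch):
    """Phase 2: fetch each link with `fetch` and pull out a view count."""
    results = []
    for url in collect_links(front_html, base_url):
        match = re.search(r"\(([\d,]+)\s+Views\)", fetch(url))
        views = int(match.group(1).replace(",", "")) if match else None
        results.append((url, views))
    return results


# Hypothetical pages standing in for the network, so the sketch runs offline.
BASE = "https://forum.example/"
FRONT = '<a href="/t1">One</a> <a href="/t2">Two</a> <a href="/t1">One again</a>'
PAGES = {BASE + "t1": "<h2>One (10 Views)</h2>", BASE + "t2": "<h2>Two</h2>"}
print(crawl(FRONT, BASE, PAGES.__getitem__))
```

Because `fetch` is just a function argument, the same loop can run with `selenium_fetch` for pages that need JavaScript rendering or with a plain HTTP fetcher for pages that do not.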
Re: How I Built A Nairaland Web Scraper by Felixitie(m): 3:49pm On Oct 19, 2023
airsaylongcome:


Illegal is pushing it a bit. Unethical, yes. But if NL doesn’t expose APIs for devs to legally consume data, then people have no option than to scrape shege from it.

To scrape 'SHEGE' from it. Lol. grin
Re: How I Built A Nairaland Web Scraper by BlackhatMentor: 4:08pm On Oct 19, 2023
airsaylongcome:


Think of it as first alpha. They definitely would refine it until they get to v1

If you say so lol
Re: How I Built A Nairaland Web Scraper by DataMina: 5:41pm On Oct 19, 2023
Felixitie:


Though it seems the page loads dynamically, which makes it hard for Bs4 alone to get the data out. Selenium can load the page and render the JavaScript, and then you can use Bs4 to soup it and get the stuff (a combination of Selenium and Bs4). Scrapy works easily too.

You can also grab all the front page topic links first and then loop through them using Bs4 to get all the data points, which improves the speed.

You have done so well.

Can we work on a portfolio project together, using Scrapy with Splash or Scrapy with Playwright to generate leads, then dump the results into a database and schedule the runs with Airflow?
I was trying to use Scrapy with Playwright, and it turned out that Playwright didn't work well for me on Windows. I tried virtualizing with WSL2, yet it still didn't work. So I decided to stick with Selenium until I can lay hands on a Mac or Linux PC.

My WhatsApp is zero813six3six5six03

1 Like

Re: How I Built A Nairaland Web Scraper by landiqa(m): 6:10pm On Oct 19, 2023
Can you develop one that scrapes phone numbers and validates them for WhatsApp?
Re: How I Built A Nairaland Web Scraper by DataMina: 8:09pm On Oct 19, 2023
landiqa:
Can you develop one that scrapes phone numbers and validates them for WhatsApp?
I can scrape Jiji for phone numbers, but the validation on WhatsApp is what I don't know about.

1 Like

Re: How I Built A Nairaland Web Scraper by Paystack: 10:02pm On Oct 19, 2023
DataMina:

I can scrape Jiji for phone numbers, but the validation on WhatsApp is what I don't know about.

I believe validating on WhatsApp shouldn't be an issue tho
Re: How I Built A Nairaland Web Scraper by DyingFetus: 2:39am On Oct 20, 2023
I did something similar, though not web scraping as such: just extraction of useful threads and posts by certain monikers, using requests.
Re: How I Built A Nairaland Web Scraper by DyingFetus: 2:42am On Oct 20, 2023
airsaylongcome:
OP,

Is it possible to do the "impossible"? Scrape the entire site's content and dump in an LLM

cheesy
Re: How I Built A Nairaland Web Scraper by turmacs(f): 6:00am On Oct 20, 2023
BlackhatMentor:
A scraper that scrapes emails and phone numbers along with their usernames would be better.

This one you did isn't very useful.
idiot, do your own then.
Re: How I Built A Nairaland Web Scraper by airsaylongcome: 8:01am On Oct 20, 2023
DyingFetus:
I did something similar, though not web scraping as such: just extraction of useful threads and posts by certain monikers, using requests.

That would be an interesting one. There are some monikers that I believe are alts for the Nairaland bbq griller. Would be interesting to scrape his main and the suspected alts for comparison of writing style and similarity
