
How Do I Extract Data From Online Websites? - Programming - Nairaland




How Do I Extract Data From Online Websites? by parkicism: 5:39pm On May 17, 2016
I want to be able to extract data from online news sites like punchng, naij, and nairaland for later offline viewing. Do you guys know of any APIs or tools that can help me extract this data easily?
Thanks
Re: How Do I Extract Data From Online Websites? by Nobody: 6:06pm On May 17, 2016
parkicism:
I want to be able to extract data from online news sites like punchng, naij, and nairaland for later offline viewing. Do you guys know of any APIs or tools that can help me extract this data easily?
Thanks

Simple.

Learn SQL injection
Re: How Do I Extract Data From Online Websites? by elfico(m): 6:49pm On May 17, 2016
parkicism:
I want to be able to extract data from online news sites like punchng, naij, and nairaland for later offline viewing. Do you guys know of any APIs or tools that can help me extract this data easily?
Thanks
It depends on what you want to achieve. There is a Firefox extension called ScrapBook that can save an entire website, including links to other pages. To target other kinds of data, you can try import.io.
Re: How Do I Extract Data From Online Websites? by elfico(m): 6:51pm On May 17, 2016
Sugarhugs:

Simple.
Learn SQL injection
Yeah. Good luck with that.
Re: How Do I Extract Data From Online Websites? by Nobody: 7:07pm On May 17, 2016
You need to study "web scraping".
Google that.
Re: How Do I Extract Data From Online Websites? by parkicism: 7:20pm On May 17, 2016
elfico:
It depends on what you want to achieve. There is a Firefox extension called ScrapBook that can save an entire website, including links to other pages. To target other kinds of data, you can try import.io.
Essentially, I just want to extract the title text and the content text from these websites so that I can save them in my program for later offline viewing. I want this to run automatically every day, so I wanted to know if there are any data APIs out there for these websites.
Thanks for your help!
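For the title-plus-content case described above, a minimal sketch using only Python's standard library might look like this. It assumes the article body sits in ordinary `<p>` tags, which will not hold for every site; `extract_article` and `fetch_html` are hypothetical helper names, not part of any tool mentioned in this thread. Running it once a day could be left to cron or Windows Task Scheduler.

```python
import urllib.request
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            # Text from nested tags (<a>, <b>, ...) is kept too.
            self._buf.append(data)


def extract_article(html):
    """Hypothetical helper: return {"title": ..., "content": ...} for one page."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "content": "\n\n".join(parser.paragraphs)}


def fetch_html(url):
    """Hypothetical helper: download a page and decode it using its declared charset."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `extract_article(fetch_html(url))`, with the result saved wherever the app keeps its offline copy.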
Re: How Do I Extract Data From Online Websites? by Kodejuice: 7:58pm On May 17, 2016
If the websites use RSS, you could just grab the feed URL and use an XML parser to convert it to objects and arrays. XML parsers are available in all web programming languages, or a plugin for parsing XML will certainly exist for the language you wish to use.
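A minimal Python sketch of the RSS route, using the standard library's `xml.etree.ElementTree` as the XML parser. It assumes a plain RSS 2.0 feed (`<item>` elements with `title`, `link`, `pubDate` and `description` children); Atom feeds use different element names and namespaces. `parse_rss` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Turn an RSS 2.0 document into a list of dicts, one per <item>.

    Passing the raw bytes from the network (rather than a decoded str)
    lets the parser honour the feed's declared encoding.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None when a child element is missing.
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items
```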

Or you could go hardcore. I actually built a Nairaland app for Android using PhoneGap (screenshots included below). My dad seized my laptop until I finish writing my NECO exams; after that I could push the app to Android app stores and probably build the iOS version.

You know Nairaland provides no API for getting its posts. It has RSS, but it only includes 12 posts from the FP (front page), which is very annoying, so I parsed the pages head-on with JavaScript.

First of all, I got the source code of each section on Nairaland, inspected the HTML, and found the elements that hold the posts, comments, topics and so on. I then used jQuery to get the contents of these elements and stored them in an array. I did this for all sections apart from the FP, which is quite different: it uses different elements for its posts.
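The original script used jQuery; the same idea (find the elements that hold posts, pull out their text, keep it in a list) can be sketched in Python with `html.parser`. The class name `post` in the test data is hypothetical: the real class names have to be found by inspecting the page HTML as described above, and the sketch breaks if the site's markup changes.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains css_class."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._depth = 0  # > 0 while inside a matching element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self._depth:
            self._depth += 1  # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def collect_by_class(html, css_class):
    """Hypothetical helper: one text string per matching element, in page order."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Unlike a jQuery selection, this sketch does not report a matching element nested inside another match as a separate result.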

It acts much like an API, but it's just a hack. I'm not saying you should follow this route, because it's a very cumbersome task and a fragile one: any slight change in Nairaland's HTML may cause the app to fail. The app lets you do most things, but I couldn't include features like login, so I added a 'Nairaland web' option that gives you the Nairaland webview inside the app, allowing you to do everything. I will be releasing the app after my NECO exam.

If you need it, I could give you the script I wrote for parsing NL pages. It doesn't parse the search, edit profile, New and other irrelevant pages.

The screenshot I included is an old one; before my laptop got seized, I updated the app with more options like Trending, Recent, ...

So, best of luck with the hacks!


Re: How Do I Extract Data From Online Websites? by Nobody: 8:09pm On May 17, 2016
Sugarhugs:


Simple.

Learn SQL injection

SQL injection? Really?
Re: How Do I Extract Data From Online Websites? by Laolballs: 8:13pm On May 17, 2016
What you want to do is easy: just get the RSS feed of the sites, then use the Google RSS-to-JSON API. Digest the JSON in your app with AngularJS, then save it to localStorage. Every time you go online, the app checks for new updates and caches them in localStorage; otherwise it just loads the results from web storage.
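The same fetch-then-cache flow, sketched in Python with a JSON file standing in for the browser's localStorage. It assumes each item is a dict with a unique `link` key (as an RSS parser would produce) and that `fetch_items` is a hypothetical callable that raises `OSError` when the network is unavailable; `urllib.error.URLError` is a subclass of `OSError`, so a fetcher built on `urllib.request` behaves that way.

```python
import json
import os


def load_cache(path):
    """Return the cached items, or an empty list on first run."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def refresh(path, fetch_items):
    """Merge freshly fetched items into the cache; fall back to the cache when offline."""
    cached = load_cache(path)
    try:
        fresh = fetch_items()
    except OSError:
        return cached  # offline: serve what we already have
    seen = {item["link"] for item in cached}
    new_items = [item for item in fresh if item["link"] not in seen]
    merged = new_items + cached  # newest items first
    with open(path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False)
    return merged
```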
Re: How Do I Extract Data From Online Websites? by Urine: 10:07pm On May 17, 2016
Sugarhugs:

Simple.
Learn SQL injection
Lmao! This is what happens when ignorance is high on confidence.


Re: How Do I Extract Data From Online Websites? by parkicism: 10:21pm On May 17, 2016
Kodejuice:
If the websites use RSS, you could just grab the feed URL and use an XML parser to convert it to objects and arrays. ...

Thanks a lot for your help. A lot of the websites do use RSS, so I'll just extract the XML content from those websites. Could you please send the script you used? I would really love to check it out, and do let me know when you release the app.
Re: How Do I Extract Data From Online Websites? by Darangi007: 12:09am On May 18, 2016
Have you heard of Kimono Labs?
Re: How Do I Extract Data From Online Websites? by Nobody: 6:56am On May 18, 2016
Urine:


Lmao! This is what happens when ignorance is high on confidence.

Keep laughing, OK.
Re: How Do I Extract Data From Online Websites? by Kodejuice: 11:25am On May 18, 2016
parkicism:


Thanks a lot for your help. A lot of the websites do use RSS, so I'll just extract the XML content from those websites. Could you please send the script you used? I would really love to check it out, and do let me know when you release the app.

Here's the download link (nl-pageparse.zip). It includes tests to see how it works and a manual.
Re: How Do I Extract Data From Online Websites? by parkicism: 3:14pm On May 18, 2016
Darangi007:
Have you heard of Kimono Labs?

Never heard of it.
I'll check it out right now.
Thanks!
Re: How Do I Extract Data From Online Websites? by parkicism: 3:15pm On May 18, 2016
Kodejuice:


Here's the download link (nl-pageparse.zip). It includes tests to see how it works and a manual.

Thanks so much!
Re: How Do I Extract Data From Online Websites? by DavidTheGeek: 11:03pm On May 18, 2016
Kodejuice:

So, best of luck with the hacks!.

Bro, that blue "add" icon at the bottom: did you use FrameLayout or RelativeLayout (alignParentBottom & wrap_content)?
Re: How Do I Extract Data From Online Websites? by DavidTheGeek: 11:06pm On May 18, 2016
Sugarhugs:
Keep laughing, OK.
SQL injection, though.
Re: How Do I Extract Data From Online Websites? by Nobody: 11:07pm On May 18, 2016
DavidTheGeek:


SQL injection, though.

View my new topic and help out if you can.
Re: How Do I Extract Data From Online Websites? by DavidTheGeek: 11:11pm On May 18, 2016
Sugarhugs:

View my new topic and help out if you can.
Alright
*Modified: Can't help, sorry.
Re: How Do I Extract Data From Online Websites? by Djade007: 11:27pm On May 18, 2016
Check out this programming competition we did: www.nairaland.com/2959841/programming-competition-search-engine-task. Just look through any of the source code and you will get your answer; most of the projects implemented crawling of Nairaland.
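Crawling boils down to fetching a page, pulling out its links, and queueing the ones on the same site. A minimal breadth-first sketch in Python, assuming a hypothetical `fetch_html(url)` callable that returns the page HTML (for example, a thin wrapper around `urllib.request.urlopen`); it is not taken from any of the competition projects.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.append(absolute)


def same_site_links(html, base_url):
    """Unique links on the same host as base_url, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    seen, out = set(), []
    for link in parser.links:
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            out.append(link)
    return out


def crawl(start_url, fetch_html, max_pages=50):
    """Breadth-first crawl; returns {url: html} for up to max_pages pages."""
    queue, visited, pages = deque([start_url]), set(), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch_html(url)
        pages[url] = html
        for link in same_site_links(html, url):
            if link not in visited:
                queue.append(link)
    return pages
```

A real crawler would also want to honour robots.txt (the standard library ships `urllib.robotparser` for that) and pause between requests.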
Re: How Do I Extract Data From Online Websites? by teampregar(m): 5:20pm On May 19, 2016
Sugarhugs:

Simple.
Learn SQL injection
This is what happens when Instagram babes visit the Nairaland programming section.
Re: How Do I Extract Data From Online Websites? by overtlyderanged: 10:29pm On May 19, 2016
Darangi007:
Have you heard of Kimono Labs?

Man... you know those guys got acquired this year? They won't be supporting all of that scraping anymore, so they gave me a notice period after which they'd clear out my stuff. Nice app, but they were probably playing jump rope with copyright infringement law, though.
Re: How Do I Extract Data From Online Websites? by overtlyderanged: 10:33pm On May 19, 2016
parkicism:
I want to be able to extract data from online news sites like punchng, naij, and nairaland for later offline viewing. Do you guys know of any APIs or tools that can help me extract this data easily?
Thanks

Baba, are you sure it's for offline viewing and not that you're trying to build a content curation site/app?
Re: How Do I Extract Data From Online Websites? by monk1: 1:29pm On Nov 16, 2016
You need to study "web scraping".

Google that.

You could use jsoup for web scraping. It's very easy to implement in Java.

Check this out: Scraping HTML using jsoup (https://www.evertechie.com/jsoup-parsing-html-using-jsoup)
Re: How Do I Extract Data From Online Websites? by Sibrah: 7:41pm On Nov 16, 2016
Google a program called HTTrack. I used it to grab a significant portion of w3school.com and view it offline. Also try Free Download Manager (FDM).


Nairaland - Copyright © 2005 - 2024 Oluwaseun Osewa. All rights reserved.
Disclaimer: Every Nairaland member is solely responsible for anything that he/she posts or uploads on Nairaland.