
How To Extract Web Data Using Node.js?

By Author: 3i Data Scraping

In this article, we’ll see how to use Node.js and its packages for quick and efficient data extraction from web pages. This helps us collect and use important data that isn’t always accessible through APIs. Let’s go through it.

Tip: Share and reuse JS modules with bit.dev

Use Bit to encapsulate components or modules with all their setup and dependencies. Share them on Bit’s cloud, collaborate with your team, and use them anywhere.

What is Web Data Extraction?
Web data extraction, or web scraping, is a method of collecting data from websites with a script. It automates the tedious task of copying data from different websites by hand.

Generally, web scraping is performed when the target website doesn’t expose an API for fetching the data. Some common data scraping scenarios include:

Extracting emails from different websites for sales leads.
Extracting news headlines from different news websites.
Extracting product data from different e-commerce sites.

Why do we need web scraping when e-commerce sites expose APIs (Product Advertising APIs) to fetch or collect product data?

E-commerce sites expose only some of a product’s data through their APIs, so web scraping is a more effective way to collect as much product data as possible.

Product comparison websites typically rely on data scraping. Even Google scrapes and crawls the web to index search results.

What Will We Need?
Getting started with data scraping is easy, and the work divides into two simple parts:

Fetching the data by making an HTTP request
Extracting the important data by parsing the HTML DOM

We will use Node.js for data scraping, along with two open-source npm modules:

Axios: a promise-based HTTP client for the browser and Node.js.
Cheerio: makes it easy to select, edit, and view DOM elements.

You can learn more by comparing well-known HTTP request libraries.

Tip: Don’t duplicate common code. Use tools like Bit to organize, share, and discover components across apps so you can build faster. Take a look.

Setup
The setup is very easy. We create a new folder and run the following command inside it to create a package.json file. Let’s start the recipe for a delicious dish.

npm init -y
Before we start cooking, let’s gather the ingredients for the recipe. Add Axios and Cheerio from npm as our dependencies.

npm install axios cheerio
Then, require them in the `index.js` file:

const axios = require('axios');
const cheerio = require('cheerio');
Making a Request
With all the ingredients collected, let’s start cooking. We are extracting data from the Hacker News site, so we first have to make an HTTP request to get the website’s content. That’s where Axios comes in.
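As a minimal sketch of this step, assuming the Hacker News front page URL and a hypothetical helper named `fetchHtml` (neither is taken from the original listing), the request could look something like this:

```javascript
// Sketch only: the function name `fetchHtml` and the URL constant are
// illustrative assumptions, not names taken from the article.
const HACKER_NEWS_URL = 'https://news.ycombinator.com/';

// `client` defaults to Axios. It is a parameter only so the function can be
// tried with a stub object instead of a live network call.
async function fetchHtml(url, client = require('axios')) {
  const response = await client.get(url);
  // Axios puts the response body (here, the HTML string) on `response.data`.
  return response.data;
}

// Example usage (performs a real HTTP request, so it is left commented out):
// fetchHtml(HACKER_NEWS_URL)
//   .then((html) => console.log(html))
//   .catch((error) => console.error(`Request failed: ${error.message}`));
```

Passing the HTTP client as an optional argument is a design choice made for this sketch; calling `axios.get(url)` directly works just as well in a small script.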

Our response will be the page’s HTML, which includes the page title “Hacker News” followed by the rest of the markup.

We get HTML content similar to what we would receive when requesting the page from a browser like Chrome. Next, we need some help from Chrome Developer Tools to search through the page’s HTML and select the required data.

We want to extract the news headlines and their associated links. You can view a page’s HTML by right-clicking on the page and choosing “Inspect”.

Parsing HTML Using Cheerio.js
Cheerio is the jQuery for Node.js: we use selectors to select tags of an HTML document, and the selector syntax is borrowed from jQuery. Using Chrome DevTools, we need to find the selectors for the news headlines and their links. Let’s add some spices to our food.

First, we need to load the HTML. This step is implicit in jQuery, because jQuery operates on the one, built-in DOM. With Cheerio, we need to pass in the HTML document. After loading the HTML, we iterate over every matching table row to scrape each news item on the page.

The result will appear like this:

[
  {
    title: 'Malaysia seeks $7.5B in reparations from Goldman Sachs (reuters.com)',
    link: 'https://www.reuters.com/article/us-malaysia-politics-1mdb-goldman/malaysia-seeks-7-5-billion-in-reparations-from-goldman-sachs-ft-idUSKCN1OK0GU'
  },
  {
    title: 'The World Through the Eyes of the US (pudding.cool)',
    link: 'https://pudding.cool/2018/12/countries/'
  },
  ...
]
Now we have an array of JavaScript objects containing the titles and links of the news items from the Hacker News site. In the same way, we can extract data from a large number of other websites. So, our food is prepared, and it looks delicious too.

Conclusion
In this article, we first looked at what web scraping is and how we can use it to automate the collection of data from different websites.

Many websites use a Single Page Application (SPA) architecture to generate their content dynamically with JavaScript. With Axios, or other similar npm packages such as request, we only get the response to the initial HTTP request and cannot execute the JavaScript that renders the dynamic content. Therefore, with this approach we can extract data from static sites only.

For more information, contact 3i Data Scraping or ask for a free quote!

More About the Author

3i Data Scraping is an Experienced Web Scraping Services Company in the USA. We are Providing a Complete Range of Web Scraping, Mobile App Scraping, Data Extraction, Data Mining, and Real-Time Data Scraping (API) Services. We have 11+ Years of Experience in Providing Website Data Scraping Solutions to Hundreds of Customers Worldwide.
