
How To Extract Different Prices From An E-commerce Website? - Ecommerce Website Data Scraping Services

By Author: owen wilsonn

Let’s take a quick look at a few product pages and identify common design patterns in how product prices are displayed on different sites.

Sephora.com

Amazon.com

Patterns and Observations
Some patterns we noticed by looking at the product pages:

The price usually appears above other currency figures
The price is the currency figure with the largest font size
Prices look like currency figures (never like words)
The price appears within the first 600 pixels of the page height
There will certainly be exceptions to these observations; we’ll discuss how to deal with them later in this article. We can combine these observations to build a reasonably effective, general scraper.

Implementing a General E-commerce Scraper
1st Step: Installation
This tutorial uses the Google Chrome web browser. If you don’t have it, install it and then follow the instructions.

Instead of the regular Chrome browser, developers often use Puppeteer, a Node.js library that controls Chrome programmatically. It removes the need to run a GUI application in order to run a scraper. That, however, is outside the scope of this tutorial.

2nd Step: Chrome Developer Tools
The code presented here is kept as simple as possible, so it won’t be able to fetch the price from every product page out there.

For now, we’ll visit a Sephora or Amazon product page in Google Chrome.

Visit the product page in Google Chrome
Right-click anywhere on the page and choose ‘Inspect’ to open Chrome DevTools
Click the Console tab in DevTools
In the Console tab you can enter JavaScript code, and the browser will execute it in the context of the loaded web page. You can learn more about DevTools in the official documentation.

3rd Step: Running a JavaScript Snippet
Copy the JavaScript snippet below and paste it into the console.

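// Collect every element inside <body>
let elements = [
    ...document.querySelectorAll('body *')
]

// Turn an element into a plain record: text, position and font size
function createRecordFromElement(element) {
    const text = element.textContent.trim()
    const record = {}
    const bBox = element.getBoundingClientRect()

    // The middle of this function was lost from the listing. The length
    // limit (30) and the visibility check are assumed reconstructions.
    if (text.length <= 30 && !(bBox.x == 0 && bBox.y == 0)) {
        record['fontSize'] = parseInt(getComputedStyle(element)['fontSize'])
    }
    record['y'] = bBox.y
    record['x'] = bBox.x
    record['text'] = text
    return record
}

let records = elements.map(createRecordFromElement)

// Keep records in the top 600 pixels that have a font size
// and whose text looks like a currency figure
function canBePrice(record) {
    if (record['y'] > 600 ||
        record['fontSize'] == undefined ||
        !record['text'].match(/(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/))
        return false
    else return true
}

let possiblePriceRecords = records.filter(canBePrice)

// Largest font first; ties go to the record nearer the top of the page.
// A sort comparator must return a number, not a boolean.
let priceRecordsSortedByFontSize = possiblePriceRecords.sort(function(a, b) {
    if (a['fontSize'] == b['fontSize']) return a['y'] - b['y']
    return b['fontSize'] - a['fontSize']
})

console.log(priceRecordsSortedByFontSize[0]['text']);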

Press the ‘Enter’ key and you should see the product price displayed in the console.

If you don’t, you have probably visited a product page that is an exception to our observations. That is completely normal; we’ll discuss how to extend the script to cover more product pages of this kind. You can try one of the sample pages mentioned in step 2.

The animated GIF below shows how we extract the price from Amazon.com

How Does It Work?
First, we need to collect all the HTML DOM elements on the page

let elements = [
...document.querySelectorAll(' body *')
]

We need to convert each of these elements into a simple JavaScript object that stores its XY position, font size (in pixels) and text content, something like {'text':'Tennis Ball', 'fontSize':14, 'x':100,'y':200}. So we write a function for that, as shown below:

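function createRecordFromElement(element) {
    const text = element.textContent.trim() // Gets the text content of the element
    const record = {} // Starts a simple JavaScript object
    const bBox = element.getBoundingClientRect()
    // getBoundingClientRect is a standard DOM method; it returns
    // an object that contains the x and y values, width and height

    // The middle of this function was lost from the listing. The length
    // limit (30) and the visibility check are assumed reconstructions.
    if (text.length <= 30 && !(bBox.x == 0 && bBox.y == 0)) {
        record['fontSize'] = parseInt(getComputedStyle(element)['fontSize'])
    }
    record['y'] = bBox.y
    record['x'] = bBox.x
    record['text'] = text
    return record
}

let records = elements.map(createRecordFromElement)

Next, a second function decides whether a record could be a price: it must sit within the first 600 pixels, have a font size, and contain text that looks like a currency figure.

function canBePrice(record) {
    if (record['y'] > 600 ||
        record['fontSize'] == undefined ||
        !record['text'].match(/(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/))
        return false
    else return true
}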

We use a regular expression to check whether the given text is a currency figure. You can modify this regular expression if it doesn’t cover the pages you are testing with.
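To see what the regular expression accepts, it can be tried against a few strings outside the browser. The sketch below reuses the pattern from the tutorial; the helper name looksLikePrice, the sample strings, and the extended pattern that also accepts € and £ are illustrative assumptions, not part of the original script.

```javascript
// The currency pattern used by the tutorial's canBePrice check.
const PRICE_PATTERN = /(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/;

// Hypothetical helper: true when the whole trimmed string looks like a currency figure.
function looksLikePrice(text) {
  return PRICE_PATTERN.test(text.trim());
}

// Hypothetical extension: the same pattern with the euro and pound signs
// added to the list of accepted currency prefixes.
const EXTENDED_PATTERN = /^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|€|£|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$/;

// Made-up sample strings, printed with the verdict of each pattern.
const samples = ["$19.99", "Rs. 1,299", "US $45", "100 AED", "€25.00", "Add to cart", "1299"];
for (const sample of samples) {
  console.log(sample, looksLikePrice(sample), EXTENDED_PATTERN.test(sample));
}
```

Note that the currency prefix is optional, so a bare number such as 1299 also passes; the font size and position checks are what narrow the candidates down afterwards.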

Now we can keep only the records that could be prices

let possiblePriceRecords = records.filter(canBePrice)

Finally, as we observed, the price is the currency figure with the largest font size. If several currency figures share the largest font size, the price is probably the one positioned highest on the page. We sort the records on these conditions using JavaScript’s sort function.

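let priceRecordsSortedByFontSize = possiblePriceRecords.sort(function(a, b) {
    // Equal font sizes: the record nearer the top of the page comes first
    if (a['fontSize'] == b['fontSize']) return a['y'] - b['y']
    // Otherwise the larger font size comes first
    return b['fontSize'] - a['fontSize']
})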
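A comparator passed to JavaScript’s sort is expected to return a negative number, zero, or a positive number; one that returns true or false does not order records reliably. The sketch below shows the ordering rule with a numeric comparator on a few made-up records. The names byFontSizeThenPosition and pickPriceText are hypothetical, and the null return for an empty list is an added safeguard rather than part of the tutorial’s script.

```javascript
// Sample records in the shape used by the tutorial (values are made up).
const sampleRecords = [
  { text: "$5.00", fontSize: 12, x: 40, y: 300 },
  { text: "$24.99", fontSize: 22, x: 40, y: 180 },
  { text: "$29.99", fontSize: 22, x: 40, y: 120 },
  { text: "$3.50", fontSize: 9, x: 40, y: 90 },
];

// Largest font first; among equal fonts, the smaller y (higher on the page) first.
function byFontSizeThenPosition(a, b) {
  if (a.fontSize === b.fontSize) return a.y - b.y;
  return b.fontSize - a.fontSize;
}

// Hypothetical helper: sorts a copy so the input array is left untouched,
// and returns null instead of throwing when no record qualified.
function pickPriceText(records) {
  const sorted = [...records].sort(byFontSizeThenPosition);
  return sorted.length > 0 ? sorted[0].text : null;
}

console.log(pickPriceText(sampleRecords)); // prints $29.99
console.log(pickPriceText([])); // prints null
```

Two records share the largest font size here, so the one nearer the top of the page (y = 120) wins.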

Now we just need to print it to the console

console.log(priceRecordsSortedByFontSize[0]['text'])

Taking It Further
Moving to a GUI-less, Scalable Program
You can replace Google Chrome with a headless version of it, driven by a library called Puppeteer. It is arguably one of the fastest options for headless web rendering, and it works on the same ecosystem as Google Chrome. Once Puppeteer is set up, you can inject our script into the headless browser programmatically and have the price returned to a function in your program.
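A minimal sketch of that idea follows, assuming Node.js with the puppeteer package installed (npm install puppeteer). The function names, the viewport size, the networkidle2 wait condition and the 30-character text limit are assumptions made for illustration; only launch, newPage, setViewport, goto, evaluate and close are Puppeteer calls.

```javascript
// Runs inside the page: the tutorial's heuristic condensed into one function.
// It must be self-contained because Puppeteer serializes it into the browser.
function findPriceInPage() {
  const pattern = /^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$/;
  const candidates = [...document.querySelectorAll("body *")]
    .map((element) => {
      const text = element.textContent.trim();
      const box = element.getBoundingClientRect();
      const visible = !(box.x === 0 && box.y === 0);
      // Assumed limit: only short, visible text gets a font size.
      const fontSize = text.length <= 30 && visible
        ? parseInt(getComputedStyle(element).fontSize, 10)
        : undefined;
      return { text, fontSize, y: box.y };
    })
    .filter((r) => r.y <= 600 && r.fontSize !== undefined && pattern.test(r.text))
    .sort((a, b) => (a.fontSize === b.fontSize ? a.y - b.y : b.fontSize - a.fontSize));
  return candidates.length > 0 ? candidates[0].text : null;
}

// Hypothetical helper: works with any object exposing Puppeteer's page.evaluate().
async function extractPriceFromPage(page) {
  return page.evaluate(findPriceInPage);
}

// Hypothetical entry point. Puppeteer is loaded lazily, inside the function,
// so the rest of the file can be loaded without the package installed.
async function scrapePrice(url) {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    await page.goto(url, { waitUntil: "networkidle2" });
    return await extractPriceFromPage(page);
  } finally {
    await browser.close();
  }
}

// Usage (the URL is a placeholder):
// scrapePrice("https://example.com/some-product").then(console.log);
```

Returning null when nothing matches lets the calling program decide how to handle pages that do not follow the observed patterns.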

Improving and Enhancing the Script
You will quickly notice that some product pages won’t work with the script because they don’t follow the assumptions we made about how product prices are displayed and the patterns we identified.

Unfortunately, there is no “holy grail” or perfect solution to this problem. It is possible, though, to look at more pages, identify more patterns and improve the scraper.

Another step you could take to handle other pages is to use Artificial Intelligence or Machine Learning based techniques to identify and classify patterns and automate the process to a larger degree. This is an evolving field, and we at X-Byte already use such techniques with varying degrees of success.

If you need help with Amazon price scraping, you can look at our tutorial written specifically for Amazon:


Please DO NOT contact X-Byte for help with the tutorials or code through the contact form or by phone; instead, add a comment at the end of this tutorial page to get help.

Disclaimer
The code in these tutorials is provided for learning and illustration purposes only. We are not responsible for how it is used and assume no liability for any harmful use of the source code. The mere presence of this code on our website does not imply that we encourage scraping, or scraping of the sites referenced in the code and the accompanying tutorial. This tutorial only illustrates the technique of programming a web scraper for general websites. We are not obligated to provide support for the code; however, if you add your questions in the comment section, we may address them from time to time.
