How To Extract Different Prices From An E-commerce Website? - Ecommerce Website Data Scraping Services
Let’s take a quick look at a few product pages and identify some design patterns in how product prices are displayed on different sites.
Sephora.com
Amazon.com
Patterns and Observations
Some patterns we identified by looking at these product pages:
The price usually appears above the other currency figures on the page
The price is the currency figure with the largest font size
The price looks like a currency figure (never like words)
The price appears within the first 600 pixels of page height
There will certainly be exceptions to these observations; we discuss how to deal with them later in this article. We can combine these observations to build a reasonably effective, generic scraper.
Implementing a Generic E-commerce Scraper
1st Step: Installation
This tutorial uses Google Chrome as the web browser. If you do not have it, install it before following the instructions.
Instead of the regular Chrome window, developers usually work with Puppeteer, a Node.js library that controls Chrome programmatically. It removes the need to run a GUI application in order to run a scraper. That, however, is beyond the scope of this tutorial.
2nd Step: Chrome Developer Tool
The code presented here is kept as simple as possible, so it will not be able to fetch the price from every product page out there.
For now, we will visit a Sephora or Amazon product page in Google Chrome.
Open the product page in Google Chrome
Right-click anywhere on the page and choose ‘Inspect’ to open Chrome DevTools
Click the Console tab in DevTools
In the Console tab you can enter JavaScript code, and the browser will execute it in the context of the loaded web page. You can learn more about DevTools in the official documentation.
3rd Step: Running a JavaScript snippet
Copy the JavaScript snippet below and paste it into the console.
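let elements = [
    ...document.querySelectorAll(' body *')
]
function createRecordFromElement(element) {
    const text = element.textContent.trim()
    var record = {}
    const bBox = element.getBoundingClientRect()
    // Assumption: the length limit was lost from the source; 30 characters is illustrative
    if (text.length <= 30) {
        record['fontSize'] = parseFloat(getComputedStyle(element).fontSize)
    }
    record['x'] = bBox.x
    record['y'] = bBox.y
    record['text'] = text
    return record
}
let records = elements.map(createRecordFromElement)
function canBePrice(record) {
    // Reject records below the first 600 pixels, without a font size, or not shaped like a currency figure
    if (record['y'] > 600 ||
        record['fontSize'] == undefined || !record['text'].match(/(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/) )
        return false
    else return true
}
let possiblePriceRecords = records.filter(canBePrice)
// Largest font first; on a tie, the record nearest the top of the page first
let priceRecordsSortedByFontSize = possiblePriceRecords.sort(function(a, b) {
    if (a['fontSize'] == b['fontSize']) return a['y'] - b['y']
    return b['fontSize'] - a['fontSize']
})
console.log(priceRecordsSortedByFontSize[0]['text']);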
Press the ‘Enter’ key and you should see the product price printed in the console.
If you do not, you have probably visited a product page that is an exception to our observations. That is perfectly normal; we discuss later how to extend the script to cover more pages of this kind. You can try one of the sample pages mentioned in step 2.
The animated GIF below shows the price being extracted from an Amazon.com product page.
How Does It Work?
First, we collect all the HTML DOM elements in the body of the page
let elements = [
...document.querySelectorAll(' body *')
]
We need to convert each of these elements into a simple JavaScript object that stores its XY position, font size (in pixels) and text content, looking something like {'text':'Tennis Ball', 'fontSize':14, 'x':100, 'y':200}. To do that, we write a function like the one below:
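function createRecordFromElement(element) {
    const text = element.textContent.trim() // Text content of the element
    var record = {} // Start with an empty JavaScript object
    const bBox = element.getBoundingClientRect()
    // getBoundingClientRect is a standard DOM method; it returns
    // an object holding the element's x, y position, width and height
    // Assumption: the length limit was lost from the source; 30 is illustrative
    if (text.length <= 30) {
        record['fontSize'] = parseFloat(getComputedStyle(element).fontSize)
    }
    record['x'] = bBox.x
    record['y'] = bBox.y
    record['text'] = text
    return record
}
Next, we convert every element into a record and define a function that decides whether a record could be a price:
let records = elements.map(createRecordFromElement)
function canBePrice(record) {
    if (record['y'] > 600 ||
        record['fontSize'] == undefined || !record['text'].match(/(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/) )
        return false
    else return true
}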
We use a regular expression to check whether the text is a currency figure. You can modify the regular expression if it does not cover the pages you are testing with.
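To get a feel for what the regular expression accepts, it can be tried on a few strings outside the browser, for example in Node.js. The sketch below copies the pattern used in canBePrice; the helper name looksLikePrice and the sample strings are made up for illustration.

```javascript
// Pattern copied from canBePrice; the samples below are hypothetical.
const pricePattern = /(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/

// Hypothetical helper: true when the whole trimmed string is a currency figure.
function looksLikePrice(text) {
  return pricePattern.test(text.trim())
}

const samples = ['$19.99', 'US $1,299.00', '₹ 499', '120 AED', '42',
                 '$19.99 each', 'Free shipping', '€20']
for (const sample of samples) {
  console.log(sample, '->', looksLikePrice(sample))
}
```

The first five samples are accepted and the last three are rejected. Two things stand out: a bare number such as 42 passes because the currency symbol is optional, and a symbol that is not listed in the pattern, such as the euro sign, is rejected until you add it.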
Now we can keep only the records that could be price records
let possiblePriceRecords = records.filter(canBePrice)
Finally, as we observed, the price is the currency figure with the largest font size. If several currency figures share the largest font size, the price is most likely the one positioned highest on the page. We sort the records by these conditions using JavaScript’s sort function.
let priceRecordsSortedByFontSize = possiblePriceRecords.sort(function(a, b) {
    // A compare function must return a number, not a boolean
    if (a['fontSize'] == b['fontSize']) return a['y'] - b['y']
    return b['fontSize'] - a['fontSize']
})
Now we just need to print it to the console
console.log(priceRecordsSortedByFontSize[0]['text'])
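The ordering rule can be checked in isolation on a few hand-made records. This is a minimal sketch: the records and the function name byFontSizeThenPosition are hypothetical, and the comparator returns a negative, zero, or positive number, which is what Array.prototype.sort expects from a compare function.

```javascript
// Hypothetical records, shaped like the ones the script builds.
const sampleRecords = [
  { text: '$5.00', fontSize: 12, y: 300 },
  { text: '$24.99', fontSize: 18, y: 250 },
  { text: '$29.99', fontSize: 18, y: 120 },
]

// Largest font first; on a tie, the record nearer the top (smaller y) first.
function byFontSizeThenPosition(a, b) {
  if (a.fontSize === b.fontSize) return a.y - b.y
  return b.fontSize - a.fontSize
}

// Copy before sorting so the input array is left untouched.
const sorted = [...sampleRecords].sort(byFontSizeThenPosition)
console.log(sorted.map(r => r.text)) // [ '$29.99', '$24.99', '$5.00' ]
```

The two 18-pixel figures outrank the 12-pixel one, and the tie between them goes to the figure that sits higher on the page.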
Taking It Further
Moving to a GUI-less, Scalable Program
You can replace Google Chrome with Puppeteer, a Node.js library that drives a headless (GUI-less) Chrome. It is arguably one of the fastest options for headless web rendering, and it works within the same ecosystem as Google Chrome. Once Puppeteer is set up, you can inject our script into the headless browser programmatically and have the price returned to a function in your program.
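A minimal sketch of that setup in Node.js follows, assuming Puppeteer has been installed with npm install puppeteer. The function names findPriceInPage and scrapePrice, the viewport size, the 30-character text limit and the networkidle2 wait condition are illustrative choices, not part of the original script. The in-page function repeats the console logic in compact form because Puppeteer serializes it and runs it inside the page, so it cannot use variables defined in the Node.js program.

```javascript
// Runs inside the page, so it must be self-contained (no outer variables).
function findPriceInPage() {
  const pricePattern = /(^(US ){0,1}(rs\.|Rs\.|RS\.|\$|₹|INR|USD|CAD|C\$){0,1}(\s){0,1}[\d,]+(\.\d+){0,1}(\s){0,1}(AED){0,1}$)/
  const candidates = [...document.querySelectorAll('body *')]
    .map(element => {
      const text = element.textContent.trim()
      const bBox = element.getBoundingClientRect()
      // Assumed limit of 30 characters, used to skip long text blocks.
      const fontSize = text.length <= 30
        ? parseFloat(getComputedStyle(element).fontSize)
        : undefined
      return { text, fontSize, x: bBox.x, y: bBox.y }
    })
    .filter(r => r.y <= 600 && r.fontSize !== undefined && pricePattern.test(r.text))
    .sort((a, b) => (a.fontSize === b.fontSize ? a.y - b.y : b.fontSize - a.fontSize))
  return candidates.length > 0 ? candidates[0].text : null
}

// Loads the URL in headless Chrome and resolves to the detected price text, or null.
async function scrapePrice(url) {
  const puppeteer = require('puppeteer') // assumption: installed via npm
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.setViewport({ width: 1280, height: 800 })
    await page.goto(url, { waitUntil: 'networkidle2' })
    return await page.evaluate(findPriceInPage)
  } finally {
    await browser.close() // always release the browser, even on errors
  }
}

// Usage, with a placeholder URL:
// scrapePrice('https://www.example.com/some-product').then(console.log)
```

Returning null instead of throwing when no candidate is found lets the calling program decide how to handle pages that do not fit the observed patterns.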
Improving and Enhancing the Script
You will quickly notice that some product pages do not work with this script, because they do not follow the assumptions we made about how product prices are displayed or the patterns we identified.
Unfortunately, there is no “holy grail” or perfect solution to this problem. It is possible, however, to examine more pages, identify more patterns and improve the scraper.
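When a page defeats the script, it can help to look at several candidates instead of only the first one, since that often reveals which assumption failed. The helper below is a hypothetical addition, not part of the original script: it takes records that are already sorted and returns the top few with their font size and vertical position.

```javascript
// Hypothetical helper: returns the top `limit` candidates for inspection.
function topCandidates(sortedRecords, limit = 5) {
  return sortedRecords.slice(0, limit).map(r => ({
    text: r.text,
    fontSize: r.fontSize,
    y: Math.round(r.y),
  }))
}

// Hand-made example records, already sorted the way the script sorts them.
const exampleRecords = [
  { text: '$49.00', fontSize: 24, x: 10, y: 80.4 },
  { text: '$39.00', fontSize: 24, x: 90, y: 80.4 },
  { text: '$4.99', fontSize: 14, x: 10, y: 310 },
]
console.table(topCandidates(exampleRecords, 2))
```

In the browser console you would pass priceRecordsSortedByFontSize instead of the example records. Two figures with the same font size on the same line, as in this made-up example, are the kind of case the current tie-breaking rule cannot separate.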
Another step you could take to handle more pages is to use Artificial Intelligence or Machine Learning based techniques to identify and classify patterns and to automate the process to a greater degree. This is an evolving field, and we at X-Byte already use such techniques with varying degrees of success.
If you need help with Amazon price scraping, see our tutorial written specifically for Amazon.
Disclaimer
Any code provided in our tutorials is for learning and illustration purposes only. We are not responsible for how it is used and assume no liability for any harmful use of the source code. The mere presence of this code on our website does not imply that we encourage scraping, or scraping the websites referenced in the code and the accompanying tutorial. This tutorial only illustrates the technique of programming a web scraper for general websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may occasionally address them.