Welcome to 123ArticleOnline.com!
ALL >> Service >> View Article

How To Scrape Amazon Stores For Generating Price Alerts?

By Author: 3i Data Scraping
Total Articles: 46
Comment this article

Initially, you will need a file named Tracker_PRODUCTS.csv with the links for the products you wish to check. After executing run, the scraper will save the results in a different file known as “search_history_[date].xlsx. These are the files placed inside search_history folder.

screenshot
For completing this task, we will require BeautifulSoup as our web scraping tool. If you need to install any of them, simple script that includes simple pip/conda install will do. There are various sources which will help, but usually Python package Index will have it.

Code
import requests
from glob import glob
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from time import sleep
# http://www.networkinghowtos.com/howto/common-user-agent-list/
HEADERS = ({'User-Agent':
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
'Accept-Language': 'en-US, en;q=0.5'})
# imports a csv file with the url's to scrape
prod_tracker = pd.read_csv('trackers/TRACKER_PRODUCTS.csv', ...
... sep=';')
prod_tracker_URLS = prod_tracker.url
# fetch the url
page = requests.get(prod_tracker_URLS[0], headers=HEADERS)
# create the object that will contain all the info in the url
soup = BeautifulSoup(page.content, features="lxml")

The HEADERS variables need to pass along the get method. Once the file is passes, it might be a good time to get the CSV file known as TRACKER_PRODUCTS.csv from the repository and place it in the folder named “trackers.”

Then, we will execute this through BeautifulSoup. This will convert the HTML in to some more convenient file named soup.

If ever you will find soup.find, it will mean that we are in search of an element of the page that uses its HTML tag(such as div, or span, etc.) With soup.select we will use CSS selectors.

# product title
title = soup.find(id='productTitle').get_text().strip()
# to prevent script from crashing when there isn't a price for the product
try:
price = float(soup.find(id='priceblock_ourprice').get_text().replace('.', '').replace('€', '').replace(',', '.').strip())
except:
price = ''
# review score
review_score = float(soup.select('.a-star-4-5')[0].get_text().split(' ')[0].replace(",", "."))
# how many reviews
review_count = int(soup.select('#acrCustomerReviewText')[0].get_text().split(' ')[0].replace(".", ""))
# checking if there is "Out of stock" and if not, it means the product is available
try:
soup.select('#availability .a-color-state')[0].get_text().strip()
stock = 'Out of Stock'
except:
stock = 'Available'
(De)constructing the soup

screenshot
There is also script for getting prices in USD is added

screenshot
screenshot
screenshot
Once the testing is completed, the proper script is to be written which will:

Fetch the URLs from a csv file.
Will use a while loop to scrape every product and save the information.
Save all the results that will include previous searches in an excel file.
You will also need a scraper, and we will call it Amazon_scraper.py.

import requests
from glob import glob
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from time import sleep

# http://www.networkinghowtos.com/howto/common-user-agent-list/
HEADERS = ({'User-Agent':
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
'Accept-Language': 'en-US, en;q=0.5'})

def search_product_list(interval_count = 1, interval_hours = 6):
"""
This function lods a csv file named TRACKER_PRODUCTS.csv, with headers: [url, code, buy_below]
It looks for the file under in ./trackers

It also requires a file called SEARCH_HISTORY.xslx under the folder ./search_history to start saving the results.
An empty file can be used on the first time using the script.

Both the old and the new results are then saved in a new file named SEARCH_HISTORY_{datetime}.xlsx
This is the file the script will use to get the history next time it runs.

Parameters
----------
interval_count : TYPE, optional
DESCRIPTION. The default is 1. The number of iterations you want the script to run a search on the full list.
interval_hours : TYPE, optional
DESCRIPTION. The default is 6.

Returns
-------
New .xlsx file with previous search history and results from current search

"""
prod_tracker = pd.read_csv('trackers/TRACKER_PRODUCTS.csv', sep=';')
prod_tracker_URLS = prod_tracker.url
tracker_log = pd.DataFrame()
now = datetime.now().strftime('%Y-%m-%d %Hh%Mm')
interval = 0 # counter reset

while interval < interval_count:

for x, url in enumerate(prod_tracker_URLS):
page = requests.get(url, headers=HEADERS)
soup = BeautifulSoup(page.content, features="lxml")

#product title
title = soup.find(id='productTitle').get_text().strip()

# to prevent script from crashing when there isn't a price for the product
try:
price = float(soup.find(id='priceblock_ourprice').get_text().replace('.', '').replace('€', '').replace(',', '.').strip())
except:
# this part gets the price in dollars from amazon.com store
try:
price = float(soup.find(id='priceblock_saleprice').get_text().replace('$', '').replace(',', '').strip())
except:
price = ''

try:
review_score = float(soup.select('i[class*="a-icon a-icon-star a-star-"]')[0].get_text().split(' ')[0].replace(",", "."))
review_count = int(soup.select('#acrCustomerReviewText')[0].get_text().split(' ')[0].replace(".", ""))
except:
# sometimes review_score is in a different position... had to add this alternative with another try statement
try:
review_score = float(soup.select('i[class*="a-icon a-icon-star a-star-"]')[1].get_text().split(' ')[0].replace(",", "."))
review_count = int(soup.select('#acrCustomerReviewText')[0].get_text().split(' ')[0].replace(".", ""))
except:
review_score = ''
review_count = ''

# checking if there is "Out of stock"
try:
soup.select('#availability .a-color-state')[0].get_text().strip()
stock = 'Out of Stock'
except:
# checking if there is "Out of stock" on a second possible position
try:
soup.select('#availability .a-color-price')[0].get_text().strip()
stock = 'Out of Stock'
except:
# if there is any error in the previous try statements, it means the product is available
stock = 'Available'

log = pd.DataFrame({'date': now.replace('h',':').replace('m',''),
'code': prod_tracker.code[x], # this code comes from the TRACKER_PRODUCTS file
'url': url,
'title': title,
'buy_below': prod_tracker.buy_below[x], # this price comes from the TRACKER_PRODUCTS file
'price': price,
'stock': stock,
'review_score': review_score,
'review_count': review_count}, index=[x])

try:
# This is where you can integrate an email alert!
if price < prod_tracker.buy_below[x]:
print('************************ ALERT! Buy the '+prod_tracker.code[x]+' ************************')

except:
# sometimes we don't get any price, so there will be an error in the if condition above
pass

tracker_log = tracker_log.append(log)
print('appended '+ prod_tracker.code[x] +'\n' + title + '\n\n')
sleep(5)

interval += 1# counter update

sleep(interval_hours*1*1)
print('end of interval '+ str(interval))

# after the run, checks last search history record, and appends this run results to it, saving a new file
last_search = glob('[REPLACE WITH YOUR OWN PATH -> C:/Amazon Webscraper/search_history/*.xlsx')[-1] # path to file in the folder
search_hist = pd.read_excel(last_search)
final_df = search_hist.append(tracker_log, sort=False)

final_df.to_excel('search_history/SEARCH_HISTORY_{}.xlsx'.format(now), index=False)
print('end of search')
Tracker Products
Observations on the file TRACKER_PRODUCTS.csv It's a simple file only with three columns ("link," "code," and "buy below"). That's where you'll enter the product URLs you'd like to track.

You may even put this file in a synced Dropbox folder (and then update the script with the updated file URLs) so that you really can update it from your phone at any time. If you execute the script on a server or on your personal laptop at home, it will pick up that new product link from the file in its next run.

Search History
The SEARCH HISTORY files are the same way. On the very first run, you must add an empty file to both the folder "search history" (which may be found in the repository). When establishing the last search variable on line 116 of the script above, we're looking for the most recent file in the search history folder. As a result, you must additionally put your own directory here. Simply change the text with the name of the folder where you'll be working on this project (in my case, "Amazon Scraper").

Price Alert
Line 97 of the script above contains a section that you may use to send an email with an alert if the price falls below your limit. The reason it's inside a try command is that we don't always get a real price from the product page, so the logical comparison would fail – for example, if the product was unavailable.

Setting up a Schedule Task for Running Script
Setting up an automated task for executing small scripts.

1. You will initiate it by opening “Task Scheduler”. Then select “Create task” and pick up the “triggers” tab.

screenshot
2. Next, you will need to move to the actions tab. Here, you will need to add an action and pick your Python folder location for “Program/script” box.

3. In the arguments box, you will need to type the name of our file with function

4. We will tell the system to start the command in the folder where our file Amazon_Scraper.py is.

screenshot
After running this script, task is ready to run.

For any queries related to scraping Amazon stores for generating price alerts, contact 3i Data Scraping.

More About the Author

3i Data Scraping is an Experienced Web Scraping Services Company in the USA. We are Providing a Complete Range of Web Scraping, Mobile App Scraping, Data Extraction, Data Mining, and Real-Time Data Scraping (API) Services. We have 11+ Years of Experience in Providing Website Data Scraping Solutions to Hundreds of Customers Worldwide.

Total Views: 465Word Count: 1961See All articles From Author

Add Comment

Service Articles

1. How Professional Tile & Grout Cleaning Extends Floor Life
Author: Steamly Pro LLC

2. Premium Metal Backlight Signage Boards & Ms Fabrication Welding Work In Hyderabad
Author: ledsignboard

3. Acp Cladding Signage Boards & Building Hoarding Signage In Hyderabad – Modern Branding Solutions For Businesses
Author: ledsignboard

4. 3d Acrylic Signage Board: The Perfect Choice For Modern Building Name Signage
Author: edsignboard

5. Building Construction Signage Boards & Building Elevation Signage Boards: Enhancing Safety,visibility,and Brand Identity
Author: ledneonsigncompany

6. Metal Backlight Signage Boards & Uv Digital Printing Signage: Elevating Brand Visibility With Modern Display Solutions
Author: ledneonsigncompany

7. 3d Acrylic Signage Board & Acp Sign Boards: The Perfect Branding Solution For Modern Businesses
Author: ledneonsigncompany

8. Professional Acp Sign Board Fabrication Hyderabad For Modern Business Branding
Author: allsignvc

9. Precision Laser Cutting Services In Hyderabad For Every Industry
Author: allsignvc

10. Quality Flex & Vinyl Printing Services In Hyderabad
Author: allsignvc

11. Garage Door Repair In Alexandria Va: Reliable Solutions For Safe And Smooth Operation
Author: ABC Garage Doors & Repair

12. Premium Wine And Liquor Labels Printing | 3p Labels
Author: 3p Label

13. How Custom Web Applications Enhance Customer Experience?
Author: Webpadi

14. Moving To A New City? A Complete Guide To Planning A Stress-free Relocation
Author: Rudra Packer Mover

15. Unlock Greater Energy Independence For Your Home
Author: 3P solar