How To Use Web Scraping With Selenium And Beautifulsoup For Dynamic Pages?

By Author: 3i Data Scraping

Web scraping can be defined as:

“The creation of an agent for downloading, parsing, as well as organizing data from the web in an automated manner.”

In other words, instead of a human clicking through pages in a web browser and copy-pasting the interesting parts into a spreadsheet, web scraping hands the job to a computer program that can do it much faster, and more accurately, than a person can.

Web scraping is widely used in data science as a way to collect data.

Why Is Python a Good Language for Web Scraping?
Python has one of the richest and most helpful ecosystems for web scraping. Many languages have libraries that help with scraping, but Python’s libraries are among the most mature and full-featured.

A few Python libraries used for web scraping include:

BeautifulSoup
LXML
Requests
Scrapy
Selenium
In this article, we will use Selenium and BeautifulSoup to extract reviews from TripAdvisor review pages.

Why Use Selenium as Well? Isn’t BeautifulSoup Enough on Its Own?
Web scraping with Python often needs nothing more than BeautifulSoup. BeautifulSoup is a powerful library that makes it easy to scrape data by navigating the DOM (Document Object Model). However, it only supports static scraping. Static scraping ignores JavaScript: the page is fetched from the server (for example with Requests) without the help of a browser, so you get exactly what you see in “view page source” and then slice and dice it. If the data you are looking for is available in “view page source”, you don’t need to go any further.

If, however, the data lives in components that are rendered only after clicking JavaScript links, you need dynamic scraping, and the combination of Selenium and BeautifulSoup handles that job. Selenium drives a web browser from Python. Data revealed by JavaScript links can therefore be made available by automating the button clicks with Selenium and then scraped with BeautifulSoup.
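
To illustrate the limit of static scraping, here is a minimal sketch. The HTML snippet is made up for this example (it is not TripAdvisor's real markup), and it uses Python's built-in "html.parser" so that no extra parser is needed. BeautifulSoup only sees the truncated text that is present in the source, because the `showFull()` script is never executed.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the short text is in the HTML, while the full text is
# only written into the page by JavaScript after a click.
STATIC_HTML = """
<div class="review">
  <p class="partial">Great flight, but...</p>
  <span class="more" onclick="showFull()">More</span>
</div>
<script>
  function showFull() {
    document.querySelector('.partial').textContent =
      'Great flight, but the seats were cramped.';
  }
</script>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# BeautifulSoup parses the markup as text; it does not run the script,
# so only the truncated paragraph is available.
visible_text = soup.find("p", class_="partial").get_text(strip=True)
print(visible_text)  # prints: Great flight, but...
```

Getting the full sentence requires something that actually runs the page's JavaScript, which is the role Selenium plays in the rest of this article.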

Installation

pip install beautifulsoup4 selenium lxml
Using Selenium for JavaScript Links and Buttons
First, we will use Selenium to automate the button clicks needed to render the hidden but useful data. On TripAdvisor review pages, longer reviews are only partially available in the DOM after the page loads. They become fully available after clicking the “More” button. We will therefore automate the clicks on the “More” buttons with Selenium.

Selenium Needs the Browser’s Driver
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
# Selenium 4 takes the driver path through a Service object and the options
# through the `options` keyword (the old `chrome_options` keyword was removed).
driver = webdriver.Chrome(service=Service("/usr/lib/chromium-browser/chromedriver"), options=options)
Here, Selenium starts the Chrome browser driver in incognito mode and without opening a browser window (that is what the --headless argument does).

Open the TripAdvisor Review Page and Click the Relevant Buttons
import time
from selenium.webdriver.common.by import By

driver.get("https://www.tripadvisor.com/Airline_Review-d8729157-Reviews-Spirit-Airlines#REVIEWS")
# find_elements_by_class_name was removed in Selenium 4; use find_elements with By.
more_buttons = driver.find_elements(By.CLASS_NAME, "moreLink")
for button in more_buttons:
    if button.is_displayed():
        # Click through JavaScript, then pause so the full text can render.
        driver.execute_script("arguments[0].click();", button)
        time.sleep(1)
page_source = driver.page_source
Here, the Selenium web driver searches the DOM of the TripAdvisor review page and finds all the “More” buttons. It then iterates through them and clicks each one. Once the “More” buttons have been clicked, the reviews that were only partially available become fully available.

After that, Selenium hands the modified page source over to BeautifulSoup.

BeautifulSoup to Extract Data
The page source received from Selenium now contains the complete reviews.

from bs4 import BeautifulSoup

soup = BeautifulSoup(page_source, 'lxml')
reviews = []
# Each review on the page lives in its own "reviewSelector" div.
reviews_selector = soup.find_all('div', class_='reviewSelector')
for review_selector in reviews_selector:
    # Prefer the expanded review; fall back to the short one.
    review_div = review_selector.find('div', class_='dyn_full_review')
    if review_div is None:
        review_div = review_selector.find('div', class_='basic_review')
    review = review_div.find('div', class_='entry').find('p').get_text()
    review = review.strip()
    reviews.append(review)
Here, BeautifulSoup loads the page source and scrapes the review texts by iterating through the review divs. The logic in this code is specific to TripAdvisor review pages and will differ with the HTML structure of other pages. For later use, you can write the scraped reviews to a file.
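
Because the HTML structure can change, the extraction can be made more tolerant of missing elements. The following is a sketch, not the original code: the function name `extract_reviews` is hypothetical, the class names are the ones used above, and it uses the built-in "html.parser" instead of lxml to stay dependency-free.

```python
from bs4 import BeautifulSoup


def extract_reviews(page_source):
    """Return review texts, skipping blocks that lack the expected markup."""
    soup = BeautifulSoup(page_source, "html.parser")
    reviews = []
    for block in soup.find_all("div", class_="reviewSelector"):
        # Prefer the expanded review; fall back to the short one.
        container = block.find("div", class_="dyn_full_review")
        if container is None:
            container = block.find("div", class_="basic_review")
        if container is None:
            continue
        entry = container.find("div", class_="entry")
        paragraph = entry.find("p") if entry is not None else None
        if paragraph is None:
            continue
        reviews.append(paragraph.get_text().strip())
    return reviews
```

Calling `extract_reviews(page_source)` returns a list of strings. A block without the expected divs is skipped instead of raising an `AttributeError` on `None`.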

Practical
We opened a TripAdvisor review page, scraped the reviews, and wrote them to a file.
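
The article does not say which file format was used. As one possible approach, the sketch below stores the list of reviews as a JSON array, which keeps quotes and line breaks inside a review intact. The function names and any file name you pass in (for example "reviews.json") are hypothetical.

```python
import json
from pathlib import Path


def save_reviews(reviews, path):
    """Write the list of review strings to `path` as a UTF-8 JSON array."""
    text = json.dumps(reviews, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_reviews(path):
    """Read the reviews back as a list of strings."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `save_reviews(reviews, "reviews.json")` writes the list built in the previous step, and `load_reviews("reviews.json")` reads it back later for analysis.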

Here are the reviews that we have scraped from one of TripAdvisor's pages.

JOKE of an airline. You act like you have such low fares, then turn around and charge people for EVERYTHING you could possibly think of. $65 for carry on, a joke. No seating assignments without an upcharge for newlyweds, a joke. Charge a veteran for a carry on, a f***ing joke. Personally, I will never fly spirit again, and I’ll gladly tell everyone I know the kind of company this airline is. No room, no amenities, nothing. A bunch of penny pinchers, who could give two sh**s about the customers. Take my flight miles and shove them, I won’t be using them with this pathetic a** airline again.
My first travel experience with NK. Checked in on the mobile app and printed the boarding pass at the airport kiosk. My fare was $30.29 for a confirmed ticket. I declined all the extras as I would when renting a car. No, no, no and no. My small backpack passed the free item test as a personal item. I was a bit thirsty so I purchased a cold bottle of water in flight for $3.00 but I brought my own snacks. The plane pushed off the gate in Las Vegas on time and arrived in Dallas early. Overall an excellent flight.
Original flight was at 3:53pm and now the most recent time in 9:28pm. Have waisted an entire day on the airport. Worst airline. I have had the same thing happen in the past were it feels like the are trying to combine two flights to make more money. If I would have know it would have taken this long I would have booked a different airline without a doubt.
Made a bad weather flight great. Bumpy weather but they got the beverage and snack service done in style
Flew Spirit January 23rd and January 26th (flights 1672 from MCO to CMH and 1673 CMH to MCO). IF you plan accordingly you will have a good flight. We made sure our bag was correct, and checked in online. I do think the fees are ridiculous and aren't needed. $10 to check in at the terminal? Really.. That's dumb in my opinion. Frontier does not do that, and they are a no frill airline (pay for extras). I will say the crew members were very nice, and there was decent leg room. We had the Airbus A320. Not sure if I'd fly again because I prefer Frontier Airlines, but Spirit wasn't bad for a quick flight. If you get the right price on it, I would recommend it... just prepare accordingly, and get your bags early. Print your boarding pass at home!
worst flight i have ever been on. the rear cabin flight attendents were the worst i have sever seen. rude, no help. the seats are the most cramped i have every seen. i looked up the seat pitch is the smallest in the airline industry. 28" delta and most other arilines are 32" plus. maybe ok for a short hop but not for a 3 or 4 hour flight no free water or anything. a manwas trying to get settle in with his kids and asked the male flight attendent for some help with luggage in the overhead andthe male flight attendent just said put your bags in the bin and offered no assitance. my son got up and help the manget the kidscarryons put away
I was told incorrect information by the flight counter representative which costed me over $450 i did not have. I spoke with numerous customer service reps who were all very rude and unhelpful. It is not fair for the customer to have to pay the price for being told incorrect information.
We got a great price on this flight. Unfortunately, we were going on a cruise and had to take our luggage. By the time we added our luggage and seats the price more than doubled.
Fun crew. Very friendly and happy--from the tag your bag kiosk to the ticket desk to the flight crew--everyone was exceptionally happy to help and friendly. We find this to be true of the many Spirit flights we've taken.
Not impressed with the Spirit check-in staff at either airport. Very rude and just not inviting. The seats were very comfortable and roomy on my first flight in the exit row. On the way back there was a very little cushion and narrow seats. The flight attendants and pilots were respectful, direct, and welcoming. Overall would fly Spirit again, but please improve airport staff at check-in.
Conclusion
BeautifulSoup is a powerful tool for web scraping. However, when content is hidden behind JavaScript, BeautifulSoup and Selenium together get the job done. Selenium can also be used to navigate to the next page. You can also use Scrapy or other web scraping tools instead of BeautifulSoup. Finally, once the data has been collected, you can feed it into your data science work.

If you have any queries, you can contact 3i Data Scraping and if you want any web scraping services, ask for a free quote!

More About the Author

3i Data Scraping is an Experienced Web Scraping Services Company in the USA. We are Providing a Complete Range of Web Scraping, Mobile App Scraping, Data Extraction, Data Mining, and Real-Time Data Scraping (API) Services. We have 11+ Years of Experience in Providing Website Data Scraping Solutions to Hundreds of Customers Worldwide.
