
How To Extract Ebay Data For Original Comic Art Sales Information?

By Author: 3i Data Scraping

Data Fields to Be Scraped
Original comic art is typically drawn by hand in pencil by one artist and then inked over by another, usually on 11 × 17-inch panels. The vitality of the drawing style and the obvious skill involved appeal to many collectors.

Suppose you bought two panels of original art for interior pages of Spider-Man comics from the 1980s a few years ago, around 2010. You might have paid $200 or $300 for them and made slightly more than twice that much when you sold them a year later.

Now suppose you are interested in purchasing several pieces at the $200 level and want additional information before doing so.

The code below produces its main output as two csv files.

The first csv file contains the leading 800 listings of original comic art from Marvel or DC comics (interior pages, covers, or splash pages), ordered by price. The following fields are scraped from eBay into the csv:

the title (which usually includes a 20-word description of the item, character, and type of page)
the price
a link to the item's full eBay sales page
a complete list of all characters in the artwork

After the first eBay search, the software cycles through the page numbers of further matches at the bottom of the page. eBay flags the application as a bot and prevents it from scraping pages numbered higher than four. This is acceptable because the later pages only include goods that normally sell for less than $75, and almost none of them are original comic art; they are largely copies or fan art.

The second csv file does the same thing with the same search criteria, but for items that have already been sold. This part requires Selenium. If you run Selenium more than two or three times in an hour, eBay will block it, and you will have to download the HTML of the sold comic art manually.

Expected Result
To check the result, run the code once a day and look through the csv file for pieces, mostly featuring lesser-known characters, in the $100-$300 US dollar range that are currently for sale.

Tools used: Python, requests, BeautifulSoup, pandas
We will follow the steps below:

We will scrape the following search results page:

https://ebay.to/3qaWDIw
using “original comic art” as the search string
only covers, interior pages, or splash pages
only comic art from Marvel or DC
only listings priced above $50
sorted by price + shipping, highest to lowest
200 results per page
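As a rough sketch of how these search settings map onto a URL, the helper below assembles a query string with the standard library. The parameter names (_nkw, _udlo, _sop, _ipg, LH_BIN, LH_Sold, LH_Complete) are copied from the search URLs used later in this article; the function name build_search_url and its arguments are hypothetical, and the Type and Publisher filters are left out to keep the example short.

```python
from urllib.parse import urlencode

EBAY_SEARCH = "https://www.ebay.com/sch/i.html"


def build_search_url(keywords, min_price=50, per_page=200, sold=False):
    """Assemble an eBay search URL from a few of the article's settings."""
    params = {
        "_nkw": keywords,    # search string
        "_udlo": min_price,  # lower price limit
        "_sop": 16,          # sort order used in the article's URL
        "_ipg": per_page,    # results per page
        "LH_BIN": 1,         # Buy It Now listings only
    }
    if sold:
        # flags that appear in the article's "sold items" URL
        params["LH_Sold"] = 1
        params["LH_Complete"] = 1
    return EBAY_SEARCH + "?" + urlencode(params)


url = build_search_url("original comic art")
print(url)
```

Building the URL from a dictionary makes it easier to change one setting, such as the minimum price, without editing a long string by hand.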
We'll build a comprehensive list of the available original comic art that matches these search parameters. For each listing, we'll retrieve the title / brief description (as a single string), the URL of the actual listing page, and the price.

We'll get the main comic book character's name in one field and the identities of all the characters in the image in a second field for each listing.
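One simple way to fill these two fields is to match each title against a list of known character names. The sketch below is only an illustration of that idea: the KNOWN_CHARACTERS list and the find_characters function are hypothetical, and the article does not say how the original code identifies characters.

```python
# Hypothetical lookup list; a real one would need many more names.
KNOWN_CHARACTERS = ["Spider-Man", "Captain Marvel", "Superman", "Doom Patrol", "X-Men"]


def find_characters(title, known=KNOWN_CHARACTERS):
    """Return (main_character, all_characters) found in a listing title."""
    lowered = title.lower()
    # record where each known name first appears in the title
    hits = [(lowered.find(name.lower()), name)
            for name in known if name.lower() in lowered]
    hits.sort()  # order the names by their position in the title
    names = [name for _, name in hits]
    main = names[0] if names else ""
    return main, names


title = "Superman vs Captain Marvel Double page splash art by Rich Buckler DC 1978 YOWZA!"
main_character, all_characters = find_characters(title)
print(main_character, all_characters)
```

Here the first name that appears in the title is treated as the main character, which is a simplifying assumption rather than a rule taken from the article.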

We'll make a CSV file using an eBay product data scraper in the following format: a title, a price, a link, a main character, and a multi-character field that lists every character in the artwork.
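To make the target layout concrete, here is a minimal sketch that writes one row in that format with Python's csv module. The column names in FIELDS, the rows_to_csv helper, and the sample row (including its price and the example.com link) are placeholders for illustration, not output from the scraper.

```python
import csv
import io

# Hypothetical column names for the five fields described above.
FIELDS = ["title", "price", "link", "character", "multi_character"]


def rows_to_csv(rows):
    """Serialize a list of listing dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_rows = [{
    "title": "Superman vs Captain Marvel Double page splash art by Rich Buckler DC 1978 YOWZA!",
    "price": 16000.0,                       # illustrative value
    "link": "https://example.com/listing",  # placeholder link
    "character": "Superman",
    "multi_character": "Superman; Captain Marvel",
}]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Joining several characters with a semicolon keeps them in a single csv column, so the comma-separated layout is not disturbed.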

Installing all the Packages for the Project

!pip install requests --upgrade --quiet
!pip install bs4 --upgrade --quiet
!pip install pandas --upgrade --quiet
# datetime is part of the Python standard library, so it does not need to be installed
!pip install selenium --upgrade --quiet
!pip install selenium_stealth --upgrade --quiet
First, import the time and datetime packages so that you can keep a record of the program's progress and later use the date and time in the csv file name

import time
from datetime import date
from datetime import datetime

now = datetime.now()
today = date.today()
today = today.strftime("%b-%d-%Y")
date_time = now.strftime("%H-%M-%S")
today = today + "-" + date_time
print("date and time:", today)
date and time: Jul-17-2021-15-14-55
Create a Function to Print the Date and Time
def update_datetime():
    # refresh the module-level timestamp variables
    global now
    global today
    global date_time
    now = datetime.now()
    today = date.today()
    today = today.strftime("%b-%d-%Y")
    date_time = now.strftime("%H-%M-%S")
    today = today + "-" + date_time
    print("date and time:", today)
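The same timestamp can also be produced without global variables. The sketch below is an alternative, not the article's method: the function name timestamp is hypothetical, and it returns the string instead of printing it so the caller can reuse it in file names.

```python
from datetime import datetime


def timestamp(now=None):
    """Return a string such as 'Jul-17-2021-15-14-55' for use in file names."""
    if now is None:
        now = datetime.now()
    # same format as above: month abbreviation, day, year, then hours-minutes-seconds
    return now.strftime("%b-%d-%Y-%H-%M-%S")


# A fixed datetime is passed in here only to make the example reproducible.
stamp = timestamp(datetime(2021, 7, 17, 15, 14, 55))
print("date and time:", stamp)
```

Calling timestamp() with no argument uses the current time, so each saved file gets a fresh name.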
Next, Scrape the Search URL

Use the requests package to download the page.
Use Beautiful Soup (BS4) to parse the page and look for the appropriate HTML tags.
Transform the artwork information into a Pandas dataframe.
import requests
from bs4 import BeautifulSoup

# original comic art, marvel or dc only, buy it now, over 50, interior splash or cover, sorted by price high to low
orig_comicart_marv_dc_50plus_200perpage = 'https://www.ebay.com/sch/i.html?_dcat=3984&_fsrp=1&Type=Cover%7CInterior%2520Page%7CSplash%2520Page&_from=R40&_nkw=original+comic+art&_sacat=0&Publisher=Marvel%2520Comics%7CDC%2520Comics%7CUltimate%2520Marvel%7CMarvel%2520Age%7CMarvel%2520Adventures%7CMarvel&LH_BIN=1&_udlo=50&_sop=16&_ipg=200'
orig_comicart_marv_dc_50plus_200perpage_sold = 'https://www.ebay.com/sch/i.html?_fsrp=1&_from=R40&_sacat=0&LH_Sold=1&_mPrRngCbx=1&_udlo=50&_udhi&LH_BIN=1&_samilow&_samihi&_sadis=15&_stpos=10002&_sop=16&_dmd=1&_ipg=200&_fosrp=1&Type=Cover%7CInterior%2520Page%7CSplash%2520Page&LH_Complete=1&_nkw=original%20comic%20art&_dcat=3984&rt=nc&Publisher=DC%2520Comics%7CMarvel%7CMarvel%2520Comics'

search_url = orig_comicart_marv_dc_50plus_200perpage

# The headers argument of requests.get() can set a different user agent so the
# site thinks the request is coming from different computers with different
# browsers, but I could not get this working:
# response = requests.get(search_url, headers=headers)
response = requests.get(search_url)

if response.status_code != 200:
    raise Exception('Failed to load page {}'.format(search_url))
page_contents = response.text
doc = BeautifulSoup(page_contents, 'html.parser')
Unless there is an error, the status code of the response will be 200. If it is anything else, the code raises an exception that names the page; otherwise, it continues. doc is a BeautifulSoup (BS4) object that makes searching for HTML tags and navigating the Document Object Model (DOM) a breeze.

Now Save the HTML File
# first use the date and time in the file name
filename = 'comic_art_marvel_dc-' + today + '.html'

with open(filename, 'w', encoding='utf-8') as f:
    f.write(page_contents)
We can use h3 tags with the class 's-item__title' to acquire each listing's title/description.

title_class = 's-item__title'

title_tags = doc.find_all('h3', {'class': title_class})
This locates all of the h3 tags with that class in the BS4 document.

# make a list for all the titles
title_list = []
Loop through the tags and obtain only the contents of each one

for i in range(len(title_tags)):
    # make sure there are contents first
    if title_tags[i].contents:
        title_contents = title_tags[i].contents[0]
        title_list.append(title_contents)

len(title_list)
202

print(title_list[:5])
['WHAT IF ASTONISHING X-MEN #1 ORIGINAL J. SCOTT CAMPBELL COMIC COVER ART MARVEL',
'CHAMBER OF DARKNESS #7 COVER ART (VERY FIRST BERNIE WRIGHTSON MARVEL COVER) 1970',
'MANEELY, JOE - WILD WESTERN #46 GOLDEN AGE MARVEL COMICS COVER (LARGE ART) 1955',
'Superman vs Captain Marvel Double page splash art by Rich Buckler DC 1978 YOWZA!',
'SIMON BISLEY 1990 DOOM PATROL #39 ORIGINAL COMIC COVER ART PAINTING DC COMICS']
We'll use the findNext function because the price is in the same section of the HTML page as the title. This time we'll look for a 'span' element with the class 's-item__price'. When I tried running separate searches for the titles and then the prices, there were sometimes duplicate title tags, so the lengths of the lists did not match: I would get a title list with 202 items and a price list with 200 items, which could not be joined in a dataframe.

Using findNext() and findPrevious() might also make the whole search process a little faster.

price_class = 's-item__price'

price_list = []

for i in range(len(title_tags)):
    # make sure there are contents first
    if title_tags[i].contents:
        # the price span follows the title tag within the same listing
        price = title_tags[i].findNext('span', {'class': price_class})
        if i == 1:
            print(price)
Once the loop finishes, price holds the tag for the last item listed on the first search page, out of a total of 200. Printing its contents displays that item's price.

print(price.contents)
['$60.00']
Now Check Whether the Content Is a String or a Tag, and Strip the Dollar Sign
from re import sub

# the first child of the price tag is either a plain string or a nested tag
price_string = price.contents[0]
if isinstance(price_string, str):
    price_string = sub(r'[^\d.]', '', price_string)
else:
    # the price text is wrapped in a nested tag, so go one level deeper
    price_string = price.contents[0].contents[0]
    price_string = sub(r'[^\d.]', '', price_string)

print(price_string)

60.00
Converting the Price into a Floating-Point Decimal
price_num = float(price_string)
print(price_num)
60.0
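The strip-and-convert steps can be wrapped in one small function. This is a sketch under assumptions rather than the article's code: the name parse_price is hypothetical, and the handling of price ranges (a string such as '$10.00 to $20.00' is assumed here) simply takes the first amount, because stripping every non-digit character from such a string would leave two decimal points and make float() fail.

```python
import re


def parse_price(text):
    """Return the first dollar amount in text as a float, or None if there is none."""
    # digits with optional thousands separators and an optional decimal part
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(parse_price("$60.00"))
print(parse_price("$14,999.99"))
print(parse_price("$10.00 to $20.00"))
```

Returning None for text without a number lets the calling loop decide whether to skip the listing or record a missing price, which helps keep the title and price lists the same length.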
Put It All Together in a Loop and Add All the Prices to a List
for i in range(len(title_tags)):
    if title_tags[i].contents:
        price = title_tags[i].findNext('span', {'class': price_class})
        # skip listings where no price tag follows the title
        if price is not None and price.contents:
            price_string = price.contents[0]
            if isinstance(price_string, str):
                price_string = sub(r'[^\d.]', '', price_string)
            else:
                price_string = price.contents[0].contents[0]
                price_string = sub(r'[^\d.]', '', price_string)
            price_num = float(price_string)
            price_list.append(price_num)

print(len(price_list))
202
print(price_list[:5])

[50000.0, 45000.0, 18000.0, 16000.0, 14999.99]
Now find the preceding anchor tag that has an href attribute, and add the link for each distinct art listing

item_page_link = title_tags[i].findPrevious('a', href=True)
link_list = []
Clearing the Other Lists

title_list.clear()
price_list.clear()

for i in range(len(title_tags)):
    if title_tags[i].contents:
        title_contents = title_tags[i].contents[0]
        title_list.append(title_contents)
        price = title_tags[i].findNext('span', {'class': price_class})
        if price.contents:
            price_string = price.contents[0]
            if isinstance(price_string, str):
                price_string = sub(r'[^\d.]', '', price_string)
            else:
                price_string = price.contents[0].contents[0]
                price_string = sub(r'[^\d.]', '', price_string)
            price_num = float(price_string)
            price_list.append(price_num)
        # the listing's link is the closest anchor tag before the title
        item_page_link = title_tags[i].findPrevious('a', href=True)  # {'class': 's-item__link'})
        if item_page_link.text:
            link_list.append(item_page_link['href'])

len(link_list)
202
print(link_list[:5])
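Creating a DataFrame Using a Dictionary
import pandas as pd

# collect the three lists into a dictionary, one key per column
# (the 'price' column name is assumed; the output below only shows title and link)
title_and_price_dict = {'title': title_list, 'price': price_list, 'link': link_list}
title_price_link_df = pd.DataFrame(title_and_price_dict)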

len(title_price_link_df)

202
print(title_price_link_df[:5])
                                               title  ...                                               link
0  WHAT IF ASTONISHING X-MEN #1 ORIGINAL J. SCOTT...  ...  https://www.ebay.com/itm/123753951902?hash=ite...
1  CHAMBER OF DARKNESS #7 COVER ART (VERY FIRST B...  ...  https://www.ebay.com/itm/312520261257?hash=ite...
2  MANEELY, JOE - WILD WESTERN #46 GOLDEN AGE MAR...  ...  https://www.ebay.com/itm/312525381131?hash=ite...
3  Superman vs Captain Marvel Double page splash ...  ...  https://www.ebay.com/itm/233849382971?hash=ite...
4  SIMON BISLEY 1990 DOOM PATROL #39 ORIGINAL COM...  ...  https://www.ebay.com/itm/153609370179?hash=ite...

[5 rows x 3 columns]
For now, we are only interested in the first six pages of results returned by our search URL. With 200 listings per page, that would potentially give 1200 listings ordered by price. Unfortunately, eBay stops processing requests after the fourth page, which leaves 800 listings. Given the current traffic on eBay, this should be enough to cover all products over $75. The listings below this amount are almost entirely fan art rather than actual comic art.

So, the quick and simple method is to look for the page links in the lower-left corner of the results page and follow each one to get the listings on that page.
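An alternative to following the page links is to generate the page URLs directly. The sketch below assumes that eBay selects a results page with a _pgn query parameter; that parameter does not appear in this article, so treat it as an assumption to verify against the links on a live results page. The function name page_urls is hypothetical, and the default of four pages mirrors the limit described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(search_url, max_pages=4):
    """Return one URL per results page by setting the assumed _pgn parameter."""
    parts = urlsplit(search_url)
    # keep every existing parameter except a page number that is already present
    base_query = [(key, value)
                  for key, value in parse_qsl(parts.query, keep_blank_values=True)
                  if key != "_pgn"]
    urls = []
    for page in range(1, max_pages + 1):
        query = urlencode(base_query + [("_pgn", str(page))])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


pages = page_urls("https://www.ebay.com/sch/i.html?_nkw=original+comic+art&_ipg=200")
for page_url in pages:
    print(page_url)
```

Each generated URL could then be downloaded with requests.get() in a loop, ideally with a pause such as time.sleep() between requests, since eBay flags rapid automated requests as a bot.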

More About the Author

3i Data Scraping is an Experienced Web Scraping Services Company in the USA. We are Providing a Complete Range of Web Scraping, Mobile App Scraping, Data Extraction, Data Mining, and Real-Time Data Scraping (API) Services. We have 11+ Years of Experience in Providing Website Data Scraping Solutions to Hundreds of Customers Worldwide.
