@B.Rabbit wrote:
Hello people,
I'm trying to scrape the names of books (Business & Economics) and their other details from the following link:
My objective is to scrape the names of all books from the first 10 pages of the website.
I used the following code to do this:

import requests
from bs4 import BeautifulSoup

def amazon_spider(max_pages):
    page = 1
    i = 1
    while page <= max_pages:
        url = 'https://www.amazon.in/s/ref=sr_pg_3?rh=n%3A976389031%2Cn%3A%21976390031%2Cn%3A1318068031&page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.findAll('a', {'class': 'a-link-normal s-access-detail-page a-text-normal'}):
            print(i)
            data = link.get('title')
            href = link.get('href')
            print(href)
            print(link.string)
            get_single_item_data(href)  # defined elsewhere in my script (not shown here)
            i = i + 1
        page = page + 1

amazon_spider(10)
When I run the above code I should get the names of 160 books, but I rarely do. Sometimes I get 64 or 128 names, and sometimes none at all.
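In case it helps narrow this down, here is a minimal diagnostic sketch, assuming the same search URL and the same 'a-link-normal s-access-detail-page a-text-normal' class as above (check_pages is just an illustrative helper name). It prints the HTTP status code and the number of matched links for each page, which would show whether some requests come back with a different page than expected:

import requests
from bs4 import BeautifulSoup

def check_pages(max_pages):
    for page in range(1, max_pages + 1):
        url = 'https://www.amazon.in/s/ref=sr_pg_3?rh=n%3A976389031%2Cn%3A%21976390031%2Cn%3A1318068031&page=' + str(page)
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        links = soup.findAll('a', {'class': 'a-link-normal s-access-detail-page a-text-normal'})
        # A non-200 status, or 0 matched links on a 200 response,
        # means the server did not return the normal results page.
        print('page', page, 'status', response.status_code, 'links found', len(links))

check_pages(10)

If the link count drops to 0 on some runs while the status code is still 200, that would suggest the server is returning a different HTML layout rather than the connection failing.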
Why does the output keep varying? Is the issue with the code or with my internet connection?

Regards