Channel: Data Science, Analytics and Big Data discussions - Latest topics

Different output every time the program is run while scraping the web with Beautiful Soup


@B.Rabbit wrote:

Hello people,
I'm trying to scrape the names of books (Business & Economics) and their other details from the following link:


My objective is to scrape the names of all books from the first 10 pages of the website.
I used the following code to do the same:

import requests
from bs4 import BeautifulSoup

def amazon_spider(max_pages):
    page = 1
    i = 1  # running count of the books printed so far
    while page <= max_pages:
        url = 'https://www.amazon.in/s/ref=sr_pg_3?rh=n%3A976389031%2Cn%3A%21976390031%2Cn%3A1318068031&page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')

        # Each book title is an <a> tag whose class attribute is this exact string.
        for link in soup.find_all('a', {'class': 'a-link-normal s-access-detail-page  a-text-normal'}):
            print(i)
            data = link.get('title')
            href = link.get('href')
            print(href)
            print(link.string)
            get_single_item_data(href)  # helper that is not shown in this post
            i = i + 1
        page = page + 1

amazon_spider(10)

I should get the names of 160 books when I run the above code, but I rarely do. Sometimes I get 64 or 128 names, and sometimes none at all.
Why does the output keep varying? Is the issue with the code or with my internet connection?
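
The counts reported above are all multiples of 16 (64 = 4 x 16, 128 = 8 x 16, 160 = 10 x 16), which suggests that whole pages are coming back without any matching links rather than individual books being missed. One way to narrow this down is to record the HTTP status of every request and retry the ones that fail. The following is only a minimal sketch under assumptions: `fetch_with_retry`, `max_attempts`, and `delay` are hypothetical names that do not appear in the original code, and `get` stands for any callable such as `requests.get` that returns an object with `status_code` and `text` attributes.

```python
import time


def fetch_with_retry(get, url, max_attempts=3, delay=1.0):
    """Call get(url) until it returns HTTP 200 or the attempts run out.

    `get` is any callable returning an object with `.status_code` and
    `.text` attributes (for example `requests.get`). This helper is a
    hypothetical illustration, not part of the original program.

    Returns (text, statuses): `text` is None if every attempt failed,
    and `statuses` lists the status code seen on each attempt.
    """
    statuses = []
    for attempt in range(max_attempts):
        response = get(url)
        statuses.append(response.status_code)
        if response.status_code == 200:
            return response.text, statuses
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(delay)
    return None, statuses
```

Inside `amazon_spider`, the two lines that fetch the page could then become `plain_text, statuses = fetch_with_retry(requests.get, url)`, printing `statuses` and skipping the page whenever `plain_text` is `None`. A response with status 200 may still contain no results, so printing the number of links matched on each page is a useful second check.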

Regards

Posts: 4

Participants: 2
