session.get(url) is not returning all of the data in a page. How do I fix this?

Christian Warjri
April 26, 2024

I'm using a Python script with requests and BeautifulSoup to extract all the tables from a Confluence page. I fetch the page with this function:

```python
def fetch_page(session, url):
    """Fetch page content using a requests session for persistent connection."""
    response = session.get(url)
    if response.status_code == 200:
        return response.content
    raise ValueError(f"Failed to fetch the page: {response.status_code}")
```


I then parse the tables with this function:

```python
from bs4 import BeautifulSoup

def parse_table(content, tag):
    """Parse HTML tables and extract relevant data based on tag."""
    soup = BeautifulSoup(content, 'html.parser')
    tables = soup.find_all('table')
    all_tables_data = []
    ...
```
For some reason, I can't get the last few tables on the page. There are about 12 tables, but I only get as far as table 5 or 6; the rest never show up in my DataFrame. Why is this happening?
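One way to narrow this down is to check whether the missing tables are absent from the fetched response itself or are being lost during parsing. The sketch below counts `<table>` start tags in the raw content using only the standard library's `html.parser`, so BeautifulSoup is taken out of the picture. `TableCounter` and `count_tables` are hypothetical helpers, not part of the original script.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document (stdlib only)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so "TABLE" matches too.
        if tag == "table":
            self.count += 1


def count_tables(content):
    """Return how many <table> elements appear in raw HTML (bytes or str)."""
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    counter = TableCounter()
    counter.feed(content)
    counter.close()
    return counter.count
```

For example, `print(count_tables(fetch_page(session, url)))`. If this reports roughly 12 while `soup.find_all('table')` returns fewer, the content is being mangled during parsing; if it also reports only 5 or 6, the response never contained the remaining tables. Nested tables are counted individually.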

Accepted answer
Christian Warjri
April 26, 2024

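The solution is to return `response.json()['body']['view']['value']` instead of `response.content`, so that BeautifulSoup receives the page's rendered HTML extracted from the JSON response rather than the raw response body.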
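The thread doesn't show which URL was requested, but a `body` → `view` → `value` structure is what Confluence's REST content endpoint returns when asked to expand `body.view`. A plausible reading is that the script was receiving JSON and handing it to BeautifulSoup as-is; the HTML is then embedded in a JSON string with escaped quotes, which the parser can misread. The sketch below assumes Confluence Cloud's REST API v1 path `/wiki/rest/api/content/{page_id}`; `fetch_page_body`, `base_url`, and `page_id` are hypothetical names, so adjust the path for your deployment.

```python
def fetch_page_body(session, base_url, page_id):
    """Return the rendered HTML body of a Confluence page via the REST API.

    Assumes the Confluence Cloud REST API v1 content endpoint; base_url
    (e.g. "https://your-site.atlassian.net") and page_id are placeholders.
    """
    url = f"{base_url}/wiki/rest/api/content/{page_id}"
    # expand=body.view asks the API to include the rendered HTML in the JSON.
    response = session.get(url, params={"expand": "body.view"})
    if response.status_code != 200:
        raise ValueError(f"Failed to fetch the page: {response.status_code}")
    # The rendered page HTML is a plain string at body -> view -> value.
    return response.json()["body"]["view"]["value"]
```

Usage, assuming API-token authentication on a `requests.Session`: set `session.auth = ("you@example.com", "API_TOKEN")`, then call `parse_table(fetch_page_body(session, base_url, page_id), tag)`. BeautifulSoup accepts the returned `str` directly, so `parse_table` needs no changes.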

Deployment type: Cloud
Product plan: Premium