Crawl4AI
Episode 5: JavaScript Execution and Dynamic Content Handling
Quick Intro
Explain JavaScript code injection with examples (e.g., simulating scrolling, clicking āload moreā). Demo: Extract content from a page that uses dynamic loading with lazy-loaded images.
Hereās a focused outline for the JavaScript Execution and Dynamic Content Handling video:
JavaScript Execution & Dynamic Content Handling
1) Why JavaScript Execution Matters:
- Many modern websites load content dynamically via JavaScript, requiring special handling to access all elements.
- Crawl4AI can execute JavaScript on pages, enabling it to interact with elements like āload moreā buttons, infinite scrolls, and content that appears only after certain actions.
2) Basic JavaScript Execution:
- Use
js_code
to execute JavaScript commands on a page: - This command scrolls to the bottom, triggering any lazy-loaded or dynamically added content.
3) Multiple Commands & Simulating Clicks:
- Combine multiple JavaScript commands to interact with elements like āload moreā buttons:
- This script scrolls down and then clicks the āload moreā button, useful for loading additional content blocks.
4) Waiting for Dynamic Content:
- Use
wait_for
to ensure the page loads specific elements before proceeding: - This example waits until elements with
.dynamic-content
are loaded, helping to capture content that appears after JavaScript actions.
5) Handling Complex Dynamic Content (e.g., Infinite Scroll):
- Combine JavaScript execution with conditional waiting to handle infinite scrolls or paginated content:
result = await crawler.arun( url="https://example.com", js_code=[ "window.scrollTo(0, document.body.scrollHeight);", "const loadMore = document.querySelector('.load-more'); if (loadMore) loadMore.click();" ], wait_for="js:() => document.querySelectorAll('.item').length > 10" # Wait until 10 items are loaded )
- This example scrolls and clicks "load more" repeatedly, waiting each time for a specified number of items to load.
6) Complete Example: Dynamic Content Handling with Extraction:
- Full example demonstrating a dynamic load and content extraction in one process:
async with AsyncWebCrawler() as crawler: result = await crawler.arun( url="https://example.com", js_code=[ "window.scrollTo(0, document.body.scrollHeight);", "document.querySelector('.load-more').click();" ], wait_for="css:.main-content", css_selector=".main-content" ) print(result.markdown[:500]) # Output the main content extracted
7) Wrap Up & Next Steps:
- Recap how JavaScript execution allows access to dynamic content, enabling powerful interactions.
- Tease the next video: Content Cleaning and Fit Markdown to show how Crawl4AI can extract only the most relevant content from complex pages.
This outline explains how to handle dynamic content and JavaScript-based interactions effectively, enabling users to scrape and interact with complex, modern websites.