šŸš€ Crawl4AI 0.4.2 Update: Smarter Crawling Just Got Easier (Dec 12, 2024)

Hey Developers,

Iā€™m excited to share Crawl4AI 0.4.2ā€”a major upgrade that makes crawling smarter, faster, and a whole lot more intuitive. Iā€™ve packed in a bunch of new features to simplify your workflows and improve your experience. Letā€™s cut to the chase!


šŸ”§ Configurable Browser and Crawler Behavior

Youā€™ve asked for better control over how browsers and crawlers are configured, and now youā€™ve got it. With the new BrowserConfig and CrawlerRunConfig objects, you can set up your browser and crawling behavior exactly how you want. No more cluttering arun with a dozen argumentsā€”just pass in your configs and go.

Example:

from crawl4ai import BrowserConfig, CrawlerRunConfig, AsyncWebCrawler

browser_config = BrowserConfig(headless=True, viewport_width=1920, viewport_height=1080)
crawler_config = CrawlerRunConfig(cache_mode="BYPASS")

async with AsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(url="https://example.com", config=crawler_config)
    print(result.markdown[:500])

This setup is a game-changer for scalability, keeping your code clean and flexible as we add more parameters in the future.

Remember: If you like to use the old way, you can still pass arguments directly to arun as before, no worries!


šŸ” Streamlined Session Management

Hereā€™s the big one: You can now pass local storage and cookies directly. Whether itā€™s setting values programmatically or importing a saved JSON state, managing sessions has never been easier. This is a must-have for authenticated crawlsā€”just export your storage state once and reuse it effortlessly across runs.

Example: 1. Open a browser, log in manually, and export the storage state. 2. Import the JSON file for seamless authenticated crawling:

result = await crawler.arun(
    url="https://example.com/protected",
    storage_state="my_storage_state.json"
)

šŸ”¢ Handling Large Pages: Supercharged Screenshots and PDF Conversion

Two big upgrades here:

  • Blazing-fast long-page screenshots: Turn extremely long web pages into clean, high-quality screenshotsā€”without breaking a sweat. Itā€™s optimized to handle large content without lag.

  • Full-page PDF exports: Now, you can also convert any page into a PDF with all the details intact. Perfect for archiving or sharing complex layouts.


šŸ”§ Other Cool Stuff

  • Anti-bot enhancements: Magic mode now handles overlays, user simulation, and anti-detection features like a pro.
  • JavaScript execution: Execute custom JS snippets to handle dynamic content. No more wrestling with endless page interactions.

šŸ“Š Performance Boosts and Dev-friendly Updates

  • Faster rendering and viewport adjustments for better performance.
  • Improved cookie and local storage handling for seamless authentication.
  • Better debugging with detailed logs and actionable error messages.

šŸ”  Use Cases Youā€™ll Love

  1. Authenticated Crawls: Login once, export your storage state, and reuse it across multiple requests without the headache.
  2. Long-page Screenshots: Perfect for blogs, e-commerce pages, or any endless-scroll website.
  3. PDF Export: Create professional-looking page PDFs in seconds.

Letā€™s Get Crawling

Crawl4AI 0.4.2 is ready for you to download and try. Iā€™m always looking for ways to improve, so donā€™t hold backā€”share your thoughts and feedback.

Happy Crawling! šŸš€