š Crawl4AI 0.4.2 Update: Smarter Crawling Just Got Easier (Dec 12, 2024)
Hey Developers,
Iām excited to share Crawl4AI 0.4.2āa major upgrade that makes crawling smarter, faster, and a whole lot more intuitive. Iāve packed in a bunch of new features to simplify your workflows and improve your experience. Letās cut to the chase!
š§ Configurable Browser and Crawler Behavior
Youāve asked for better control over how browsers and crawlers are configured, and now youāve got it. With the new BrowserConfig
and CrawlerRunConfig
objects, you can set up your browser and crawling behavior exactly how you want. No more cluttering arun
with a dozen argumentsājust pass in your configs and go.
Example:
from crawl4ai import BrowserConfig, CrawlerRunConfig, AsyncWebCrawler
browser_config = BrowserConfig(headless=True, viewport_width=1920, viewport_height=1080)
crawler_config = CrawlerRunConfig(cache_mode="BYPASS")
async with AsyncWebCrawler(config=browser_config) as crawler:
result = await crawler.arun(url="https://example.com", config=crawler_config)
print(result.markdown[:500])
This setup is a game-changer for scalability, keeping your code clean and flexible as we add more parameters in the future.
Remember: If you like to use the old way, you can still pass arguments directly to arun
as before, no worries!
š Streamlined Session Management
Hereās the big one: You can now pass local storage and cookies directly. Whether itās setting values programmatically or importing a saved JSON state, managing sessions has never been easier. This is a must-have for authenticated crawlsājust export your storage state once and reuse it effortlessly across runs.
Example: 1. Open a browser, log in manually, and export the storage state. 2. Import the JSON file for seamless authenticated crawling:
result = await crawler.arun(
url="https://example.com/protected",
storage_state="my_storage_state.json"
)
š¢ Handling Large Pages: Supercharged Screenshots and PDF Conversion
Two big upgrades here:
-
Blazing-fast long-page screenshots: Turn extremely long web pages into clean, high-quality screenshotsāwithout breaking a sweat. Itās optimized to handle large content without lag.
-
Full-page PDF exports: Now, you can also convert any page into a PDF with all the details intact. Perfect for archiving or sharing complex layouts.
š§ Other Cool Stuff
- Anti-bot enhancements: Magic mode now handles overlays, user simulation, and anti-detection features like a pro.
- JavaScript execution: Execute custom JS snippets to handle dynamic content. No more wrestling with endless page interactions.
š Performance Boosts and Dev-friendly Updates
- Faster rendering and viewport adjustments for better performance.
- Improved cookie and local storage handling for seamless authentication.
- Better debugging with detailed logs and actionable error messages.
š Use Cases Youāll Love
- Authenticated Crawls: Login once, export your storage state, and reuse it across multiple requests without the headache.
- Long-page Screenshots: Perfect for blogs, e-commerce pages, or any endless-scroll website.
- PDF Export: Create professional-looking page PDFs in seconds.
Letās Get Crawling
Crawl4AI 0.4.2 is ready for you to download and try. Iām always looking for ways to improve, so donāt hold backāshare your thoughts and feedback.
Happy Crawling! š