🚀 Crawl4AI 0.4.2 Update: Smarter Crawling Just Got Easier (Dec 12, 2024)
Hey Developers,
I’m excited to share Crawl4AI 0.4.2—a major upgrade that makes crawling smarter, faster, and a whole lot more intuitive. I’ve packed in a bunch of new features to simplify your workflows and improve your experience. Let’s cut to the chase!
🔧 Configurable Browser and Crawler Behavior
You’ve asked for better control over how browsers and crawlers are configured, and now you’ve got it. With the new BrowserConfig
and CrawlerRunConfig
objects, you can set up your browser and crawling behavior exactly how you want. No more cluttering arun
with a dozen arguments—just pass in your configs and go.
Example:
from crawl4ai import BrowserConfig, CrawlerRunConfig, AsyncWebCrawler
browser_config = BrowserConfig(headless=True, viewport_width=1920, viewport_height=1080)
crawler_config = CrawlerRunConfig(cache_mode="BYPASS")
async with AsyncWebCrawler(config=browser_config) as crawler:
result = await crawler.arun(url="https://example.com", config=crawler_config)
print(result.markdown[:500])
This setup is a game-changer for scalability, keeping your code clean and flexible as we add more parameters in the future.
Remember: If you like to use the old way, you can still pass arguments directly to arun
as before, no worries!
🔐 Streamlined Session Management
Here’s the big one: You can now pass local storage and cookies directly. Whether it’s setting values programmatically or importing a saved JSON state, managing sessions has never been easier. This is a must-have for authenticated crawls—just export your storage state once and reuse it effortlessly across runs.
Example: 1. Open a browser, log in manually, and export the storage state. 2. Import the JSON file for seamless authenticated crawling:
result = await crawler.arun(
url="https://example.com/protected",
storage_state="my_storage_state.json"
)
🔢 Handling Large Pages: Supercharged Screenshots and PDF Conversion
Two big upgrades here:
-
Blazing-fast long-page screenshots: Turn extremely long web pages into clean, high-quality screenshots—without breaking a sweat. It’s optimized to handle large content without lag.
-
Full-page PDF exports: Now, you can also convert any page into a PDF with all the details intact. Perfect for archiving or sharing complex layouts.
🔧 Other Cool Stuff
- Anti-bot enhancements: Magic mode now handles overlays, user simulation, and anti-detection features like a pro.
- JavaScript execution: Execute custom JS snippets to handle dynamic content. No more wrestling with endless page interactions.
📊 Performance Boosts and Dev-friendly Updates
- Faster rendering and viewport adjustments for better performance.
- Improved cookie and local storage handling for seamless authentication.
- Better debugging with detailed logs and actionable error messages.
🔠 Use Cases You’ll Love
1. Authenticated Crawls: Login once, export your storage state, and reuse it across multiple requests without the headache. 2. Long-page Screenshots: Perfect for blogs, e-commerce pages, or any endless-scroll website. 3. PDF Export: Create professional-looking page PDFs in seconds.
Let’s Get Crawling
Crawl4AI 0.4.2 is ready for you to download and try. I’m always looking for ways to improve, so don’t hold back—share your thoughts and feedback.
Happy Crawling! 🚀