Introduction

Welcome to the documentation for Crawl4AI v0.2.5! 🕷️🤖

Crawl4AI is designed to simplify the process of crawling web pages and extracting useful information for large language models (LLMs) and AI applications. Whether you're using it as a REST API, a Python library, or through a Google Colab notebook, Crawl4AI provides powerful features to make web data extraction easier and more efficient.

Key Features ✨

  • 🆓 Completely Free and Open-Source: Crawl4AI is free to use and its source is open, making it accessible to everyone.
  • 🤖 LLM-Friendly Output Formats: Outputs JSON, cleaned HTML, and Markdown.
  • 🌍 Concurrent Crawling: Crawl multiple URLs simultaneously to save time.
  • 🎨 Media Extraction: Extract all media tags including images, audio, and video.
  • 🔗 Link Extraction: Extract all external and internal links from web pages.
  • 📚 Metadata Extraction: Extract metadata from web pages for additional context.
  • 🔄 Custom Hooks: Define custom hooks for authentication, headers, and page modifications before crawling.
  • 🕵️ User Agent Support: Customize the user agent for HTTP requests.
  • 🖼️ Screenshot Capability: Take screenshots of web pages during crawling.
  • 📜 JavaScript Execution: Execute custom JavaScript on the page before crawling.
  • 📚 Advanced Chunking and Extraction Strategies: Use topic-based, regex, and sentence chunking, along with cosine clustering and LLM extraction strategies.
  • 🎯 CSS Selector Support: Extract specific content using CSS selectors.
  • 📝 Instruction/Keyword Refinement: Pass instructions or keywords to refine the extraction process.
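
To make the link extraction feature above more concrete, here is a minimal sketch of what separating internal from external links can look like. It uses only the Python standard library and is not Crawl4AI's own implementation; the names `LinkCollector` and `split_links` are hypothetical and chosen for illustration, and the rule "same host means internal" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(html, base_url):
    """Return (internal, external) lists of absolute http(s) links found in html.

    Assumption: a link is "internal" when its host matches the host of base_url.
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parsed.netloc == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


sample = (
    '<a href="/docs">Docs</a> '
    '<a href="https://example.org/x">Other</a> '
    '<a href="mailto:a@b.c">Mail</a>'
)
internal, external = split_links(sample, "https://example.com/start")
print("internal:", internal)
print("external:", external)
```

A real crawler would likely also normalize URLs and decide how to treat subdomains; the sketch leaves those choices out.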

Check the Changelog for more details.

Power and Simplicity of Crawl4AI 🚀

Crawl4AI provides an easy way to crawl and extract data from web pages. With the REST API on our server, you don't need to install any library; alternatively, you can run the server locally on your machine. For more advanced control, use the Python library to customize your crawling and extraction strategies.

Explore the documentation to learn more about the features, installation process, usage examples, and how to contribute to Crawl4AI. Let's make the web more accessible and useful for AI applications! 💪🌐🤖