CrawlResult Reference

The CrawlResult class encapsulates everything returned after a single crawl operation. It provides the raw or processed content, details on links and media, plus optional metadata (like screenshots, PDFs, or extracted JSON).

Location: crawl4ai/crawler/models.py (for reference)
```python
class CrawlResult(BaseModel):
    url: str
    html: str
    success: bool
    cleaned_html: Optional[str] = None
    media: Dict[str, List[Dict]] = {}
    links: Dict[str, List[Dict]] = {}
    downloaded_files: Optional[List[str]] = None
    screenshot: Optional[str] = None
    pdf: Optional[bytes] = None
    markdown: Optional[Union[str, MarkdownGenerationResult]] = None
    markdown_v2: Optional[MarkdownGenerationResult] = None
    fit_markdown: Optional[str] = None
    fit_html: Optional[str] = None
    extracted_content: Optional[str] = None
    metadata: Optional[dict] = None
    error_message: Optional[str] = None
    session_id: Optional[str] = None
    response_headers: Optional[dict] = None
    status_code: Optional[int] = None
    ssl_certificate: Optional[SSLCertificate] = None
    ...
```
Below is a field-by-field explanation and possible usage patterns.
1. Basic Crawl Info
1.1 url (str)
What: The final crawled URL (after any redirects).
Usage:
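For example, assuming result is the CrawlResult from a crawl:

```python
print("Final URL (after redirects):", result.url)
```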
1.2 success (bool)
What: True if the crawl pipeline ended without major errors; False otherwise.
Usage:
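For example:

```python
if result.success:
    print("Crawl succeeded:", result.url)
else:
    print("Crawl failed:", result.error_message)
```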
1.3 status_code (Optional[int])
What: The page’s HTTP status code (e.g., 200, 404).
Usage:
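For example:

```python
if result.status_code == 404:
    print("Page not found:", result.url)
```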
1.4 error_message (Optional[str])
What: If success=False, a textual description of the failure.
Usage:
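For example:

```python
if not result.success:
    print(f"Crawl of {result.url} failed: {result.error_message}")
```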
1.5 session_id (Optional[str])
What: The ID used for reusing a browser context across multiple calls.
Usage:
```python
# If you used session_id="login_session" in CrawlerRunConfig, see it here:
print("Session:", result.session_id)
```
1.6 response_headers (Optional[dict])
What: Final HTTP response headers.
Usage:
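For example:

```python
if result.response_headers:
    # Header-name casing depends on what the server sent back
    print("Content-Type:", result.response_headers.get("content-type"))
```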
1.7 ssl_certificate (Optional[SSLCertificate])
What: If fetch_ssl_certificate=True in your CrawlerRunConfig, result.ssl_certificate contains an SSLCertificate object describing the site’s certificate. You can export the cert in multiple formats (PEM/DER/JSON) or access its properties like issuer, subject, valid_from, and valid_until.
Usage:
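A sketch using the properties listed above; the export helper name is assumed from the SSL-certificate docs:

```python
if result.ssl_certificate:
    cert = result.ssl_certificate
    print("Issuer:", cert.issuer)
    print("Valid until:", cert.valid_until)
    # Export is also possible, e.g. (method name assumed from the SSL docs):
    # cert.to_pem("certificate.pem")
```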
2. Raw / Cleaned Content
2.1 html (str)
What: The original unmodified HTML from the final page load.
Usage:
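For example, saving the raw HTML for offline inspection:

```python
print("Raw HTML length:", len(result.html))
with open("page.html", "w", encoding="utf-8") as f:
    f.write(result.html)
```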
2.2 cleaned_html (Optional[str])
What: A sanitized HTML version; scripts, styles, and excluded tags are removed based on your CrawlerRunConfig.
Usage:
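For example:

```python
if result.cleaned_html:
    print("Sanitized snippet:", result.cleaned_html[:500])
```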
2.3 fit_html (Optional[str])
What: If a content filter or heuristic (e.g., Pruning/BM25) modifies the HTML, the “fit” or post-filter version.
When: This is only present if your markdown_generator or content_filter produces it.
Usage:
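For example:

```python
if result.fit_html:
    print("Post-filter HTML snippet:", result.fit_html[:300])
```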
3. Markdown Fields
3.1 The Markdown Generation Approach
Crawl4AI can convert HTML→Markdown, optionally including:
- Raw markdown
- Links as citations (with a references section)
- Fit markdown if a content filter is used (like Pruning or BM25)
3.2 markdown_v2 (Optional[MarkdownGenerationResult])
What: The structured object holding multiple markdown variants. Soon to be consolidated into markdown.
MarkdownGenerationResult includes:
- raw_markdown (str): The full HTML→Markdown conversion.
- markdown_with_citations (str): Same markdown, but with link references as academic-style citations.
- references_markdown (str): The reference list or footnotes at the end.
- fit_markdown (Optional[str]): If content filtering (Pruning/BM25) was applied, the filtered “fit” text.
- fit_html (Optional[str]): The HTML that led to fit_markdown.
Usage:
```python
if result.markdown_v2:
    md_res = result.markdown_v2
    print("Raw MD:", md_res.raw_markdown[:300])
    print("Citations MD:", md_res.markdown_with_citations[:300])
    print("References:", md_res.references_markdown)
    if md_res.fit_markdown:
        print("Pruned text:", md_res.fit_markdown[:300])
```
3.3 markdown (Optional[Union[str, MarkdownGenerationResult]])
What: In future versions, markdown will fully replace markdown_v2. Right now, it might be a str or a MarkdownGenerationResult.
Usage:
```python
# Soon, you might see:
if isinstance(result.markdown, MarkdownGenerationResult):
    print(result.markdown.raw_markdown[:200])
else:
    print(result.markdown)
```
3.4 fit_markdown (Optional[str])
What: A direct reference to the final filtered markdown (legacy approach).
When: This is set if a filter or content strategy explicitly writes there. Usually overshadowed by markdown_v2.fit_markdown.
Usage:
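For example (prefer markdown_v2.fit_markdown where available):

```python
if result.fit_markdown:
    print("Legacy fit markdown:", result.fit_markdown[:200])
```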
Important: “Fit” content (in fit_markdown/fit_html) only exists if you used a filter (like PruningContentFilter or BM25ContentFilter) within a MarkdownGenerationStrategy.
4. Media & Links
4.1 media (Dict[str, List[Dict]])
What: Contains info about discovered images, videos, or audio. Typical keys: "images", "videos", "audios".
Common Fields in each item:
- src (str): Media URL
- alt or title (str): Descriptive text
- score (float): Relevance score if the crawler’s heuristic found it “important”
- desc or description (Optional[str]): Additional context extracted from surrounding text
Usage:
```python
images = result.media.get("images", [])
for img in images:
    if img.get("score", 0) > 5:
        print("High-value image:", img["src"])
```
4.2 links (Dict[str, List[Dict]])
What: Holds internal and external link data. Usually two keys: "internal" and "external".
Common Fields:
- href (str): The link target
- text (str): Link text
- title (str): Title attribute
- context (str): Surrounding text snippet
- domain (str): If external, the domain
Usage:
```python
for link in result.links["internal"]:
    print(f"Internal link to {link['href']} with text {link['text']}")
```
5. Additional Fields
5.1 extracted_content (Optional[str])
What: If you used an extraction_strategy (CSS, LLM, etc.), the structured output (JSON).
Usage:
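For example, parsing the JSON string (assuming your extraction strategy emitted JSON):

```python
import json

if result.extracted_content:
    data = json.loads(result.extracted_content)
    print("Extracted items:", len(data) if isinstance(data, list) else data)
```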
5.2 downloaded_files (Optional[List[str]])
What: If accept_downloads=True and a downloads_path are set in your BrowserConfig, lists local file paths for downloaded items.
Usage:
```python
if result.downloaded_files:
    for file_path in result.downloaded_files:
        print("Downloaded:", file_path)
```
5.3 screenshot (Optional[str])
What: Base64-encoded screenshot if screenshot=True in CrawlerRunConfig.
Usage:
```python
import base64

if result.screenshot:
    with open("page.png", "wb") as f:
        f.write(base64.b64decode(result.screenshot))
```
5.4 pdf (Optional[bytes])
What: Raw PDF bytes if pdf=True in CrawlerRunConfig.
Usage:
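For example, writing the bytes straight to disk:

```python
if result.pdf:
    with open("page.pdf", "wb") as f:
        f.write(result.pdf)
```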
5.5 metadata (Optional[dict])
What: Page-level metadata if discovered (title, description, OG data, etc.).
Usage:
```python
if result.metadata:
    print("Title:", result.metadata.get("title"))
    print("Author:", result.metadata.get("author"))
```
6. Example: Accessing Everything
```python
async def handle_result(result: CrawlResult):
    if not result.success:
        print("Crawl error:", result.error_message)
        return

    # Basic info
    print("Crawled URL:", result.url)
    print("Status code:", result.status_code)

    # HTML
    print("Original HTML size:", len(result.html))
    print("Cleaned HTML size:", len(result.cleaned_html or ""))

    # Markdown output
    if result.markdown_v2:
        print("Raw Markdown:", result.markdown_v2.raw_markdown[:300])
        print("Citations Markdown:", result.markdown_v2.markdown_with_citations[:300])
        if result.markdown_v2.fit_markdown:
            print("Fit Markdown:", result.markdown_v2.fit_markdown[:200])
    else:
        print("Raw Markdown (legacy):", result.markdown[:200] if result.markdown else "N/A")

    # Media & Links
    if "images" in result.media:
        print("Image count:", len(result.media["images"]))
    if "internal" in result.links:
        print("Internal link count:", len(result.links["internal"]))

    # Extraction strategy result
    if result.extracted_content:
        print("Structured data:", result.extracted_content)

    # Screenshot/PDF
    if result.screenshot:
        print("Screenshot length:", len(result.screenshot))
    if result.pdf:
        print("PDF bytes length:", len(result.pdf))
```
7. Key Points & Future
1. markdown_v2 vs markdown
- Right now, markdown_v2 is the more robust container (MarkdownGenerationResult), providing raw_markdown, markdown_with_citations, references, plus possible fit_markdown.
- In future versions, everything will unify under markdown. If you rely on advanced features (citations, fit content), check markdown_v2.
2. Fit Content
- fit_markdown and fit_html appear only if you used a content filter (like PruningContentFilter or BM25ContentFilter) inside your MarkdownGenerationStrategy or set them directly.
- If no filter is used, they remain None.
3. References & Citations
- If you enable link citations in your DefaultMarkdownGenerator (options={"citations": True}), you’ll see markdown_with_citations plus a references_markdown block (see the sketch after this list). This helps large language models or academic-like referencing.
4. Links & Media
- links["internal"] and links["external"] group discovered anchors by domain.
- media["images"] / ["videos"] / ["audios"] store extracted media elements with optional scoring or context.
5. Error Cases
- If success=False, check error_message (e.g., timeouts, invalid URLs).
- status_code might be None if the crawl failed before an HTTP response arrived.
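As a companion to point 3, here is a minimal sketch of enabling citations; the import path for DefaultMarkdownGenerator follows the markdown-generation docs and may differ in your installed version:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

async def main():
    # Ask the markdown generator to render links as citations
    config = CrawlerRunConfig(
        markdown_generator=DefaultMarkdownGenerator(options={"citations": True})
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=config)
        if result.markdown_v2:
            print(result.markdown_v2.markdown_with_citations[:300])
            print(result.markdown_v2.references_markdown)

asyncio.run(main())
```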
Use CrawlResult to gather all final outputs and feed them into your data pipelines, AI models, or archives. With a properly configured BrowserConfig and CrawlerRunConfig, the crawler can produce robust, structured results in CrawlResult.