# `SSLCertificate` Reference

The `SSLCertificate` class encapsulates an SSL certificate's data and allows exporting it in various formats (PEM, DER, JSON, or text). It is used within Crawl4AI whenever you set `fetch_ssl_certificate=True` in your `CrawlerRunConfig`.
## 1. Overview

Location: `crawl4ai/ssl_certificate.py`

```python
class SSLCertificate:
    """
    Represents an SSL certificate with methods to export in various formats.

    Main Methods:
    - from_url(url, timeout=10)
    - from_file(file_path)
    - from_binary(binary_data)
    - to_json(filepath=None)
    - to_pem(filepath=None)
    - to_der(filepath=None)
    ...

    Common Properties:
    - issuer
    - subject
    - valid_from
    - valid_until
    - fingerprint
    """
```
### Typical Use Case

1. You enable certificate fetching in your crawl by setting `fetch_ssl_certificate=True` in `CrawlerRunConfig`.
2. After `arun()`, if `result.ssl_certificate` is present, it is an instance of `SSLCertificate`.
3. You can read basic properties (issuer, subject, validity) or export them in multiple formats.
## 2. Construction & Fetching

### 2.1 `from_url(url, timeout=10)`

Manually load an SSL certificate from a given URL (port 443). Typically used internally, but you can call it directly if you want:

```python
cert = SSLCertificate.from_url("https://example.com")
if cert:
    print("Fingerprint:", cert.fingerprint)
```
### 2.2 `from_file(file_path)`

Load from a file containing certificate data in ASN.1/DER form. Rarely needed unless you have local cert files:
### 2.3 `from_binary(binary_data)`

Initialize from raw binary. E.g., if you captured it from a socket or another source:
## 3. Common Properties

After obtaining an `SSLCertificate` instance (e.g. `result.ssl_certificate` from a crawl), you can read:

1. `issuer` (dict)
   - E.g. `{"CN": "My Root CA", "O": "..."}`
2. `subject` (dict)
   - E.g. `{"CN": "example.com", "O": "ExampleOrg"}`
3. `valid_from` (str)
   - NotBefore date/time, often in ASN.1/UTC format.
4. `valid_until` (str)
   - NotAfter date/time.
5. `fingerprint` (str)
   - The SHA-256 digest (lowercase hex), e.g. `"d14d2e..."`.
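Because `valid_until` is a plain string, a common follow-up is an expiry check. The format below (`YYYYMMDDHHMMSSZ`, ASN.1/UTC) is an assumption; inspect your actual value and adjust the format string if it differs:

```python
from datetime import datetime, timezone

def is_expired(valid_until: str) -> bool:
    """Parse an ASN.1/UTC-style timestamp such as '20261231235959Z'."""
    expiry = datetime.strptime(valid_until, "%Y%m%d%H%M%SZ").replace(
        tzinfo=timezone.utc
    )
    return datetime.now(timezone.utc) > expiry

print(is_expired("20991231235959Z"))  # far-future date -> False
print(is_expired("20000101000000Z"))  # long-past date  -> True
```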
## 4. Export Methods

Once you have an `SSLCertificate` object, you can export or inspect it:

### 4.1 `to_json(filepath=None)` → `Optional[str]`

- Returns a JSON string containing the parsed certificate fields.
- If `filepath` is provided, saves it to disk instead, returning `None`.

Usage:

```python
json_data = cert.to_json()        # returns JSON string
cert.to_json("certificate.json")  # writes file, returns None
```
### 4.2 `to_pem(filepath=None)` → `Optional[str]`

- Returns a PEM-encoded string (common for web servers).
- If `filepath` is provided, saves it to disk instead.

### 4.3 `to_der(filepath=None)` → `Optional[bytes]`

- Returns the original DER (binary ASN.1) bytes.
- If `filepath` is specified, writes the bytes there instead.
### 4.4 (Optional) `export_as_text()`

- If you see a method like `export_as_text()`, it typically returns an OpenSSL-style textual representation.
- Not always needed, but can help for debugging or manual inspection.
## 5. Example Usage in Crawl4AI

Below is a minimal sample showing how the crawler obtains an SSL cert from a site, then reads or exports it:

```python
import asyncio
import os

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

async def main():
    tmp_dir = "tmp"
    os.makedirs(tmp_dir, exist_ok=True)

    config = CrawlerRunConfig(
        fetch_ssl_certificate=True,
        cache_mode=CacheMode.BYPASS,
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=config)
        if result.success and result.ssl_certificate:
            cert = result.ssl_certificate

            # 1. Basic info
            print("Issuer CN:", cert.issuer.get("CN", ""))
            print("Valid until:", cert.valid_until)
            print("Fingerprint:", cert.fingerprint)

            # 2. Export to files
            cert.to_json(os.path.join(tmp_dir, "certificate.json"))
            cert.to_pem(os.path.join(tmp_dir, "certificate.pem"))
            cert.to_der(os.path.join(tmp_dir, "certificate.der"))

if __name__ == "__main__":
    asyncio.run(main())
```
## 6. Notes & Best Practices

1. **Timeout**: `SSLCertificate.from_url` uses a 10-second default for the socket connection before wrapping it in SSL.
2. **Binary Form**: The certificate is loaded in ASN.1 (DER) form, then re-parsed by `OpenSSL.crypto`.
3. **Validation**: This does *not* validate the certificate chain or trust store; it only fetches and parses.
4. **Integration**: Within Crawl4AI, you typically just set `fetch_ssl_certificate=True` in `CrawlerRunConfig`; the final result's `ssl_certificate` is built automatically.
5. **Export**: If you need to store or analyze a cert, `to_json` and `to_pem` are the most broadly supported formats.
## Summary

- `SSLCertificate` is a convenience class for capturing and exporting the TLS certificate from your crawled site(s).
- Common usage is via the `CrawlResult.ssl_certificate` field, accessible after setting `fetch_ssl_certificate=True`.
- Offers quick access to essential certificate details (`issuer`, `subject`, `fingerprint`) and easy export (PEM, DER, JSON) for further analysis or server usage.

Use it whenever you need insight into a site's certificate or require some form of cryptographic or compliance check.