Firecrawl
Scrape, search, crawl, map, and extract web data
Integrate Firecrawl into the workflow. Scrape pages, search the web, crawl entire sites, map URL structures, and extract structured data with AI.
Extract structured content from web pages with comprehensive metadata support. Converts content to markdown or HTML while capturing SEO metadata, Open Graph tags, and page information.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape content from (e.g., "https://example.com/page") |
| scrapeOptions | json | No | No description |
| apiKey | string | Yes | No description |
| pricing | custom | No | No description |
| metadata | string | No | No description |
| rateLimit | string | No | No description |

| Parameter | Type | Description |
|---|---|---|
| markdown | string | Page content in markdown format |
| html | string | Raw HTML content of the page |
| metadata | object | Page metadata including SEO and Open Graph information |
| ↳ title | string | Page title |
| ↳ description | string | Page meta description |
| ↳ language | string | Page language code (e.g., "en") |
| ↳ sourceURL | string | Original source URL that was scraped |
| ↳ statusCode | number | HTTP status code of the response |
| ↳ keywords | string | Page meta keywords |
| ↳ robots | string | Robots meta directive (e.g., "follow, index") |
| ↳ ogTitle | string | Open Graph title |
| ↳ ogDescription | string | Open Graph description |
| ↳ ogUrl | string | Open Graph URL |
| ↳ ogImage | string | Open Graph image URL |
| ↳ ogLocaleAlternate | array | Alternate locale versions for Open Graph |
| ↳ ogSiteName | string | Open Graph site name |
| ↳ error | string | Error message if scrape failed |
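
The nested `metadata` object mixes plain page fields with Open Graph fields, so a consumer usually has to decide which one wins. The sketch below assumes the scrape output arrives as a plain Python dict shaped like the table above; `summarize_scrape` and the sample values are hypothetical and not part of Firecrawl.

```python
from typing import Any, Dict, Optional


def summarize_scrape(result: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pick display fields from a scrape result (hypothetical helper).

    Assumes `result` mirrors the output table: top-level content fields
    plus a nested `metadata` object whose keys may be missing.
    """
    meta = result.get("metadata") or {}
    return {
        # Prefer the Open Graph value, fall back to the plain page value.
        "title": meta.get("ogTitle") or meta.get("title"),
        "description": meta.get("ogDescription") or meta.get("description"),
        "url": meta.get("ogUrl") or meta.get("sourceURL"),
        "error": meta.get("error"),
    }


# Illustrative sample data, not real output.
sample = {
    "markdown": "# Hello",
    "metadata": {
        "title": "Hello page",
        "ogTitle": "Hello (OG)",
        "sourceURL": "https://example.com/page",
        "statusCode": 200,
    },
}
summary = summarize_scrape(sample)
```

Every lookup uses `.get`, so a page without Open Graph tags still yields a usable summary.
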

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | No description |
| apiKey | string | Yes | No description |
| pricing | custom | No | No description |
| metadata | string | No | No description |
| rateLimit | string | No | No description |

| Parameter | Type | Description |
|---|---|---|
| data | array | Search results data with scraped content and metadata |
| ↳ title | string | Search result title from search engine |
| ↳ description | string | Search result description/snippet from search engine |
| ↳ url | string | URL of the search result |
| ↳ markdown | string | Page content in markdown (when scrapeOptions.formats includes "markdown") |
| ↳ html | string | Processed HTML content (when scrapeOptions.formats includes "html") |
| ↳ rawHtml | string | Unprocessed raw HTML (when scrapeOptions.formats includes "rawHtml") |
| ↳ links | array | Links found on the page (when scrapeOptions.formats includes "links") |
| ↳ screenshot | string | Screenshot URL (expires after 24 hours, when scrapeOptions.formats includes "screenshot") |
| ↳ metadata | object | Metadata about the search result page |
| ↳ title | string | Page title |
| ↳ description | string | Page meta description |
| ↳ sourceURL | string | Original source URL |
| ↳ statusCode | number | HTTP status code |
| ↳ error | string | Error message if scrape failed |
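
Because content fields such as `markdown` are only present when the matching format was requested, and a result may carry an error in its metadata, downstream code generally needs to filter the `data` array. A minimal sketch, assuming `data` is a list of dicts shaped like the table above; `usable_results` is a hypothetical helper and the sample entries are made up.

```python
from typing import Any, Dict, List


def usable_results(data: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep search results that have markdown and no scrape error."""
    kept = []
    for item in data:
        meta = item.get("metadata") or {}
        if meta.get("error"):
            continue  # the page could not be scraped
        markdown = item.get("markdown")
        if not markdown:
            continue  # "markdown" was not requested or came back empty
        kept.append(
            {
                "url": item.get("url", ""),
                "title": item.get("title", ""),
                "markdown": markdown,
            }
        )
    return kept


# Illustrative sample data, not real output.
sample_data = [
    {"title": "A", "url": "https://example.com/a", "markdown": "# A"},
    {"title": "B", "url": "https://example.com/b"},
    {
        "title": "C",
        "url": "https://example.com/c",
        "markdown": "# C",
        "metadata": {"error": "timeout"},
    },
]
results = usable_results(sample_data)
```
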

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The website URL to crawl (e.g., "https://example.com" or "https://docs.example.com/guide") |
| limit | number | No | No description |
| maxDepth | number | No | Maximum depth to crawl from the starting URL (e.g., 1, 2, 3). Controls how many levels deep to follow links |
| formats | json | No | Output formats for scraped content (e.g., ["markdown"], ["markdown", "html"], ["markdown", "links"]) |
| excludePaths | json | No | URL paths to exclude from crawling (e.g., ["/blog/", "/admin/", "/*.pdf"]) |
| includePaths | json | No | URL paths to include in crawling (e.g., ["/docs/", "/api/"]). Only these paths will be crawled |
| onlyMainContent | boolean | No | No description |
| apiKey | string | Yes | No description |
| pricing | custom | No | No description |
| metadata | string | No | No description |
| rateLimit | string | No | No description |
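
Several crawl inputs are typed `json`, so it can help to assemble them as native lists and serialize once. The sketch below is one way to do that under stated assumptions: `build_crawl_params` is a hypothetical helper, the validation rules are illustrative rather than documented behavior, and how the resulting values are passed to the tool depends on your setup.

```python
import json
from typing import Any, Dict, List, Optional


def build_crawl_params(
    url: str,
    api_key: str,
    limit: Optional[int] = None,
    max_depth: Optional[int] = None,
    formats: Optional[List[str]] = None,
    exclude_paths: Optional[List[str]] = None,
    include_paths: Optional[List[str]] = None,
    only_main_content: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble crawl inputs using the parameter names from the table."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if max_depth is not None and max_depth < 0:
        raise ValueError("maxDepth must not be negative")
    candidates = {
        "url": url,
        "apiKey": api_key,
        "limit": limit,
        "maxDepth": max_depth,
        "formats": formats,
        "excludePaths": exclude_paths,
        "includePaths": include_paths,
        "onlyMainContent": only_main_content,
    }
    # Drop unset optional parameters so defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}


params = build_crawl_params(
    "https://docs.example.com/guide",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    max_depth=2,
    formats=["markdown", "links"],
    exclude_paths=["/blog/", "/*.pdf"],
)
payload = json.dumps(params, indent=2)
```
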

| Parameter | Type | Description |
|---|---|---|
| pages | array | Array of crawled pages with their content and metadata |
| ↳ markdown | string | Page content in markdown format |
| ↳ html | string | Processed HTML content of the page |
| ↳ rawHtml | string | Unprocessed raw HTML content |
| ↳ links | array | Array of links found on the page |
| ↳ screenshot | string | Screenshot URL (expires after 24 hours) |
| ↳ metadata | object | Page metadata from crawl operation |
| ↳ title | string | Page title |
| ↳ description | string | Page meta description |
| ↳ language | string | Page language code |
| ↳ sourceURL | string | Original source URL |
| ↳ statusCode | number | HTTP status code |
| ↳ ogLocaleAlternate | array | Alternate locale versions |
| total | number | Total number of pages found during crawl |

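
A quick health check on a crawl is to tally pages by HTTP status code. This sketch assumes the crawl output is a dict with a `pages` list shaped like the table above; `status_counts` is a hypothetical helper and the sample is made up.

```python
from collections import Counter
from typing import Any, Dict


def status_counts(result: Dict[str, Any]) -> Counter:
    """Count crawled pages per metadata.statusCode (None if missing)."""
    pages = result.get("pages") or []
    return Counter(
        (page.get("metadata") or {}).get("statusCode") for page in pages
    )


# Illustrative sample data, not real output.
sample_crawl = {
    "pages": [
        {"markdown": "# Home", "metadata": {"sourceURL": "https://example.com/", "statusCode": 200}},
        {"markdown": "# Docs", "metadata": {"sourceURL": "https://example.com/docs", "statusCode": 200}},
        {"markdown": "", "metadata": {"sourceURL": "https://example.com/old", "statusCode": 404}},
    ],
    "total": 3,
}
counts = status_counts(sample_crawl)
```
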
Get a complete list of URLs from any website quickly and reliably. Useful for discovering all pages on a site without crawling them.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The base URL to map and discover links from (e.g., "https://example.com") |
| search | string | No | Filter results by relevance to a search term (e.g., "blog") |
| sitemap | string | No | Controls sitemap usage: "skip", "include" (default), or "only" |
| includeSubdomains | boolean | No | No description |
| ignoreQueryParameters | boolean | No | No description |
| limit | number | No | Maximum number of links to return (e.g., 100, 1000, 5000). Max: 100,000, default: 5,000 |
| timeout | number | No | No description |
| location | json | No | No description |
| apiKey | string | Yes | No description |
| pricing | custom | No | No description |
| metadata | string | No | No description |
| rateLimit | string | No | No description |
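
The `sitemap` and `limit` parameters have documented allowed values, which makes them easy to check before a call. The sketch below takes the three sitemap modes, the default of 5,000, and the maximum of 100,000 from the table above; the lower bound of 1 and the helper name `build_map_params` are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

SITEMAP_MODES = ("skip", "include", "only")
DEFAULT_LIMIT = 5_000
MAX_LIMIT = 100_000


def build_map_params(
    url: str,
    search: Optional[str] = None,
    sitemap: str = "include",
    limit: int = DEFAULT_LIMIT,
) -> Dict[str, Any]:
    """Validate and assemble map inputs (hypothetical helper)."""
    if sitemap not in SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {SITEMAP_MODES}")
    # Lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params: Dict[str, Any] = {"url": url, "sitemap": sitemap, "limit": limit}
    if search:
        params["search"] = search
    return params


map_params = build_map_params("https://example.com", search="blog", limit=1000)
```
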

| Parameter | Type | Description |
|---|---|---|
| success | boolean | Whether the mapping operation was successful |
| links | array | Array of discovered URLs from the website |

Extract structured data from entire webpages using natural language prompts and a JSON schema.

| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | json | Yes | Array of URLs to extract data from (e.g., ["https://example.com/page1", "https://example.com/page2"] or ["https://example.com/*"]) |
| prompt | string | No | No description |
| schema | json | No | No description |
| enableWebSearch | boolean | No | No description |
| ignoreSitemap | boolean | No | No description |
| includeSubdomains | boolean | No | No description |
| showSources | boolean | No | No description |
| ignoreInvalidURLs | boolean | No | No description |
| scrapeOptions | json | No | No description |
| apiKey | string | Yes | No description |
| pricing | custom | No | No description |
| metadata | string | No | No description |
| rateLimit | string | No | No description |
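
To make the `prompt` and `schema` inputs concrete, here is a small sketch that pairs a natural language prompt with a JSON Schema object. The schema, the prompt text, and the helper `build_extract_params` are all hypothetical examples; the only parameter names taken from the table above are `urls`, `prompt`, and `schema`.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical schema describing the fields we want back.
product_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "inStock": {"type": "boolean"},
    },
    "required": ["name"],
}


def build_extract_params(
    urls: List[str],
    prompt: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble extract inputs; `urls` is the only required field here."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    params: Dict[str, Any] = {"urls": list(urls)}
    if prompt:
        params["prompt"] = prompt
    if schema:
        params["schema"] = schema
    return params


extract_params = build_extract_params(
    ["https://example.com/page1", "https://example.com/page2"],
    prompt="Extract the product name, price, and stock status.",
    schema=product_schema,
)
extract_payload = json.dumps(extract_params)
```

Serializing once with `json.dumps` confirms that the schema contains only JSON-compatible values before it is handed to a `json` typed input.
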

| Parameter | Type | Description |
|---|---|---|
| success | boolean | Whether the extraction operation was successful |
| data | object | Extracted structured data according to the schema or prompt |

Autonomous web data extraction agent. Searches and gathers information based on natural language prompts without requiring specific URLs.

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | No description |
| urls | json | No | Optional array of URLs to focus the agent on (e.g., ["https://example.com", "https://docs.example.com"]) |
| schema | json | No | No description |
| maxCredits | number | No | No description |
| strictConstrainToURLs | boolean | No | No description |
| apiKey | string | Yes | No description |

| Parameter | Type | Description |
|---|---|---|
| success | boolean | Whether the agent operation was successful |
| status | string | Current status of the agent job (processing, completed, failed) |
| data | object | Extracted data from the agent |
| expiresAt | string | Timestamp when the results expire (24 hours) |
| sources | object | Array of source URLs used by the agent |
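
Since the agent output carries both a job `status` and an `expiresAt` timestamp, a consumer may want to check both before using `data`. The sketch below assumes `expiresAt` is an ISO 8601 string (the table does not state the format) and treats a timestamp without a timezone as UTC; `usable_agent_data` and `parse_timestamp` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string; assumes UTC when no offset is given."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def usable_agent_data(
    result: Dict[str, Any], now: Optional[datetime] = None
) -> Optional[Dict[str, Any]]:
    """Return `data` only for a completed, unexpired agent job."""
    now = now or datetime.now(timezone.utc)
    if not result.get("success") or result.get("status") != "completed":
        return None  # still processing, or failed
    expires_at = result.get("expiresAt")
    if expires_at and parse_timestamp(expires_at) <= now:
        return None  # results have expired
    return result.get("data")


# Illustrative sample data, not real output.
agent_result = {
    "success": True,
    "status": "completed",
    "data": {"answer": 42},
    "expiresAt": "2030-01-02T00:00:00Z",
    "sources": ["https://example.com"],
}
before_expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
after_expiry = datetime(2030, 1, 3, tzinfo=timezone.utc)
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.
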