Introduction
Search1API's Crawl endpoint provides developers with a straightforward way to extract clean, readable content from any webpage. This API is perfect for content aggregation, data analysis, and feeding AI models with web content.
Authentication
All Search1API endpoints require authentication using a Bearer token. Include your API key in the Authorization header:
Authorization: Bearer your_api_key_here

Basic Usage
Single URL Crawl
POST https://api.search1api.com/crawl
{
"url": "https://example.com/article"
}

The API will respond with the extracted content:
{
"crawlParameters": {
"url": "https://example.com/article"
},
"results": {
"title": "Example Article Title",
"link": "https://example.com/article",
"content": "The full extracted content of the webpage..."
}
}

Batch Processing
The Crawl API supports batch processing for improved efficiency. Send multiple URLs in a single API call:
Batch Crawl Request
POST https://api.search1api.com/crawl
[
{
"url": "https://example.com/article1"
},
{
"url": "https://example.com/article2"
},
{
"url": "https://example.com/article3"
}
]

Batch Response
[
{
"crawlParameters": {
"url": "https://example.com/article1"
},
"results": {
"title": "First Article Title",
"link": "https://example.com/article1",
"content": "Content from first article..."
}
},
{
"crawlParameters": {
"url": "https://example.com/article2"
},
"results": {
"title": "Second Article Title",
"link": "https://example.com/article2",
"content": "Content from second article..."
}
},
{
"crawlParameters": {
"url": "https://example.com/article3"
},
"results": {
"title": "Third Article Title",
"link": "https://example.com/article3",
"content": "Content from third article..."
}
}
]
Response Fields
- title: The extracted title of the webpage (if available)
- link: The original URL that was crawled
- content: The main content extracted from the webpage, cleaned of ads and navigation elements
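As a quick illustration, a single-URL response can be unpacked like this. The sample dictionary below simply mirrors the response shape shown above; note that "title" may be missing for some pages, so a fallback is used:

```python
# Sample body shaped like the Crawl API response shown above.
response_body = {
    "crawlParameters": {"url": "https://example.com/article"},
    "results": {
        "title": "Example Article Title",
        "link": "https://example.com/article",
        "content": "The full extracted content of the webpage...",
    },
}

results = response_body["results"]
# "title" is only present when one could be extracted, so use .get().
title = results.get("title", "(untitled)")
content = results["content"]
print(title, "-", len(content), "characters")
```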
Key Features
- Clean Content Extraction
- Removes ads and navigation elements
- Preserves important formatting
- Extracts main article content intelligently
- Smart Processing
- Handles different character encodings
- Processes JavaScript-rendered content
- Maintains proper text formatting
- Batch Processing
- Process multiple URLs in one request
- Improve efficiency and reduce API calls
- Handle bulk content extraction
Best Practices
Batch Processing
- Recommended batch size: 5-10 URLs
- Implement retry logic for failed requests
- Handle partial successes appropriately
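One way to follow the batch-size recommendation above is to split a large URL list into chunks before sending. This is a minimal sketch; `chunked` is an illustrative helper, not part of the API:

```python
def chunked(items, size=10):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

urls = [f"https://example.com/article{i}" for i in range(23)]
# Each chunk becomes one batch request body for the /crawl endpoint.
batches = [[{"url": u} for u in chunk] for chunk in chunked(urls, size=10)]
# 23 URLs split into batches of 10, 10, and 3.
```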
Authentication
- Keep your API key secure
- Use environment variables for key storage
- Implement proper error handling
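Keeping the key out of source code might look like this. `SEARCH1API_KEY` is an illustrative environment variable name, not one the service mandates:

```python
import os

# SEARCH1API_KEY is an example variable name; export it in your shell, e.g.
#   export SEARCH1API_KEY=your_api_key_here
api_key = os.environ.get("SEARCH1API_KEY", "your_api_key_here")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```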
Content Handling
- Cache content when appropriate
- Respect robots.txt guidelines
- Implement rate limiting
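A minimal in-memory cache combined with a fixed delay between live calls could look like the sketch below. The dict cache and the interval value are illustrative choices, not service requirements; `crawl_fn` stands in for whatever function actually calls the API:

```python
import time

class CachedCrawler:
    """Illustrative wrapper: caches results per URL and spaces out live calls."""

    def __init__(self, crawl_fn, min_interval=1.0):
        self.crawl_fn = crawl_fn          # function that actually calls the API
        self.min_interval = min_interval  # seconds between live requests
        self.cache = {}
        self._last_call = 0.0

    def fetch(self, url):
        if url in self.cache:             # serve repeat URLs from the cache
            return self.cache[url]
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:                      # crude rate limit between live calls
            time.sleep(wait)
        self._last_call = time.monotonic()
        result = self.crawl_fn(url)
        self.cache[url] = result
        return result

# Usage with a stand-in crawl function:
crawler = CachedCrawler(lambda url: {"link": url}, min_interval=0.1)
first = crawler.fetch("https://example.com/article")
second = crawler.fetch("https://example.com/article")  # served from cache
```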
Use Cases
- Content Aggregation
- Build content archives
- Create research databases
- Develop news aggregators
- AI Training
- Collect training data
- Build content analysis systems
- Create text summarization datasets
- Research Tools
- Academic research
- Market analysis
- Competitive intelligence
Integration Examples
Python Example
import requests
headers = {
'Authorization': 'Bearer your_api_key_here',
'Content-Type': 'application/json'
}
# Single URL crawl
single_data = {
'url': 'https://example.com/article'
}
response = requests.post(
'https://api.search1api.com/crawl',
headers=headers,
json=single_data
)
# Batch crawl
batch_data = [
{'url': 'https://example.com/article1'},
{'url': 'https://example.com/article2'}
]
batch_response = requests.post(
'https://api.search1api.com/crawl',
headers=headers,
json=batch_data
)

Error Handling Example
def crawl_with_retry(urls, max_retries=3):
    """Batch-crawl `urls`, retrying the request up to `max_retries` times."""
    batch_data = [{'url': url} for url in urls]
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://api.search1api.com/crawl',
                headers=headers,  # reuses the headers defined above
                json=batch_data,
                timeout=30
            )
            response.raise_for_status()  # retry on HTTP errors, not just network errors
            return response.json()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
Why Choose Our Crawl API?
- Reliable: Robust content extraction
- Clean: Get only the content you need
- Fast: Optimized for quick response times
- Economical: pricing starts at free
- Batch-enabled: Process multiple URLs efficiently
Get Started
Visit our API documentation to start using Search1API's Crawl endpoint today. Transform your content extraction capabilities with our powerful API!