Add Websites & URLs

Automatically crawl your website to add all its content to your Knowledge Base. Your agent learns from your web pages, blog posts, and documentation without manual uploads.

What is Website Crawling?

Instead of uploading documents one by one, ChatCrafterAI can visit your website and extract all the text content. It’s like giving your agent a tour of your website and letting it memorize everything.

What it crawls:
  • All text content from your pages
  • Product pages and descriptions
  • Blog posts and articles
  • Support documentation
  • FAQ pages
  • Pricing pages
  • About pages
  • Any other text content
What it skips:
  • Navigation menus
  • Ads and banners
  • Images (unless they contain text)
  • JavaScript interactive elements

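As a rough illustration, here is how those rules map onto a page's markup (a hypothetical page; the crawler's exact extraction rules may differ):

```html
<nav>Home | Products | Pricing</nav>        <!-- skipped: navigation menu -->
<main>
  <h1>Premium Support Service</h1>          <!-- crawled: headings and text -->
  <p>Our support team responds within one business day.</p>
</main>
<img src="pricing-table.png" alt="">        <!-- skipped: text inside images -->
<script>/* interactive widget */</script>   <!-- skipped: JavaScript content -->
```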
How to Add Your Website

Option 1: Single URL

Add just one specific page:
  1. Go to Knowledge Base
  2. Click “Add URL”
  3. Paste one URL - Example: www.yoursite.com/about
  4. Click Add
  5. Done - That page is now in your Knowledge Base
Best for:
  • Support FAQ pages
  • Specific product pages
  • Documentation pages
  • Blog posts

Option 2: Multiple URLs

Add several specific pages at once:
  1. Go to Knowledge Base
  2. Click “Add Multiple URLs”
  3. Paste URLs - One per line:
    www.yoursite.com/products
    www.yoursite.com/pricing
    www.yoursite.com/faq
    www.yoursite.com/support
    
  4. Click Add All
  5. Done - All pages added to Knowledge Base
Best for:
  • Key pages only
  • Specific sections you want indexed
  • Avoiding non-important pages

Option 3: Full Website Crawl

Let the agent explore your entire website:
  1. Go to Knowledge Base
  2. Click “Crawl Website”
  3. Enter your domain - www.yoursite.com
  4. Choose crawl depth:
    • Shallow - Homepage + main pages only (fastest)
    • Medium - Homepage + 2 levels deep (balanced)
    • Deep - Homepage + 3 levels deep (most complete)
  5. Click Start Crawl
  6. Wait - Takes 5-30 minutes depending on site size
  7. Review & Approve - See what was crawled before finalizing
Best for:
  • Your entire website
  • Starting from scratch
  • Comprehensive coverage
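Crawl depth corresponds to how many link levels the crawler follows from your homepage. A minimal sketch of the idea, as a depth-limited breadth-first traversal over a made-up link graph (the real crawler's behavior may differ):

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
SITE = {
    "/": ["/products", "/pricing"],
    "/products": ["/products/widget"],
    "/pricing": [],
    "/products/widget": ["/products/widget/specs"],
    "/products/widget/specs": [],
}

def crawl(start, max_depth):
    """Collect pages reachable within max_depth link levels of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't follow links beyond the chosen depth
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

# Shallow ≈ depth 1, Medium ≈ depth 2, Deep ≈ depth 3
print(sorted(crawl("/", 1)))  # homepage + directly linked pages only
```

A deeper setting only adds pages that are more clicks away from the homepage, which is why Shallow is fastest and Deep is most complete.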

Option 4: Sitemap

Upload your website’s sitemap (if you have one):
  1. Go to Knowledge Base
  2. Click “Import Sitemap”
  3. Paste sitemap URL - Usually: www.yoursite.com/sitemap.xml
  4. Click Import
  5. Agent crawls all URLs from the sitemap automatically
Best for:
  • Large websites
  • When you have a sitemap
  • Complex site structures
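For reference, sitemaps follow the standard sitemaps.org format: an XML file with one `<loc>` entry per URL (the URLs and date below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.yoursite.com/products</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.yoursite.com/pricing</loc>
  </url>
</urlset>
```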

Option 5: Homepage Only

Just add your homepage content:
  1. Go to Knowledge Base
  2. Click “Add URL”
  3. Enter: www.yoursite.com (or www.yoursite.com/index.html)
  4. Click Add
Best for:
  • Starting simple
  • Testing before crawling whole site
  • Small websites

Review Crawled Content

After crawling, you’ll see a preview:
  1. List of crawled pages - Shows all pages found
  2. Preview of content - See what was extracted from each page
  3. Approve or reject - Keep pages you want, skip others
  4. See stats:
    • Total pages crawled
    • Total content added
    • Processing time
Check for:
  • ✅ Are important pages included?
  • ❌ Are duplicate pages or ads included?
  • ✅ Is the extracted text accurate?
  • ❌ Does it include navigation menus?
If something looks wrong, you can re-crawl with different settings or manually add/remove specific pages.

Keep Content Updated

Auto-Recrawl (Optional)
  • Set schedule: Daily, Weekly, or Monthly
  • Agent automatically updates your content
  • Prices and information stay current
  • No manual work needed
Manual Recrawl
  • Anytime you make website changes
  • Just click “Recrawl” next to the website
  • Takes a few minutes
When to recrawl:
  • After changing prices
  • After updating policies
  • After adding new blog posts
  • After restructuring your site
  • Monthly (if auto-recrawl isn’t enabled)

Tips for Best Results

Clear Site Structure
  • Use proper headings (<h1>, <h2>, <h3>)
  • Organize content in sections
  • A clear hierarchy helps the agent understand your content
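For example, a well-structured page might look like this (illustrative markup only):

```html
<h1>Return Policy</h1>              <!-- one top-level topic per page -->
<h2>Eligibility</h2>                <!-- major sections -->
<p>...</p>
<h2>How to Start a Return</h2>
<h3>Online Orders</h3>              <!-- subsections nested under a section -->
<h3>In-Store Purchases</h3>
```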
Descriptive Page Titles
  • Instead of “Page 1” use “Return Policy”
  • Instead of “Service XYZ” use “Premium Support Service”
  • Good titles help the agent find the right content
Keep Content Clear
  • Avoid putting text in images (the agent can’t read it)
  • Use text for important information
  • Avoid duplicate content on multiple pages
Remove Outdated Pages
  • Delete old blog posts if information changed
  • Exclude draft and test pages from crawling
  • Exclude login/admin pages
Check Navigation
  • Exclude navigation from crawling if possible
  • Focus crawler on content, not menus
  • Cleaner results = better agent answers

Troubleshooting

Some pages didn’t crawl
  • Pages might be behind a login - make them public for crawling
  • JavaScript-rendered content won’t be extracted - add a plain-text version
  • robots.txt might block crawling - check your robots.txt file
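To check whether robots.txt is blocking a page, you can test a rule with Python's standard-library parser (the rules below are an assumed example, not your actual file):

```python
import urllib.robotparser

# Example robots.txt rules, parsed from a string instead of fetched live.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A page under /admin/ is blocked; public pages are allowed.
print(parser.can_fetch("*", "https://www.yoursite.com/admin/settings"))  # False
print(parser.can_fetch("*", "https://www.yoursite.com/pricing"))         # True
```

In practice, fetch your live file at www.yoursite.com/robots.txt and make sure it doesn't disallow the pages you want crawled.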
Too many irrelevant pages
  • Re-crawl with “Shallow” depth setting
  • Manually select pages instead of full crawl
  • Remove pages from Knowledge Base after crawling
Old content still there
  • Delete old crawled content manually
  • Re-crawl to get updated version
  • Or enable auto-recrawl
Duplicate content
  • Some sites have pages accessible via multiple URLs
  • Keep one version, delete duplicates
  • Duplicates don’t hurt the agent, but they waste space

What Happens Next

After adding URLs:
  1. Crawling - Agent visits your pages (minutes to hours, depending on site size)
  2. Extraction - Agent reads and extracts text content
  3. Indexing - Content becomes searchable by agent
  4. Ready - Agent can answer questions using this content

Next: Train Your Agent to test and improve answers