Add Websites & URLs
Automatically crawl your website to add all of its content to your Knowledge Base. Your agent learns from your web pages, blog posts, and documentation without manual uploads.
What is Website Crawling?
Instead of uploading documents one by one, ChatCrafterAI can visit your website and extract all the text content. It’s like giving your agent a tour of your website and letting it memorize everything.
What it crawls:
- All text content from your pages
- Product pages and descriptions
- Blog posts and articles
- Support documentation
- FAQ pages
- Pricing pages
- About pages
- Any other text content
What it skips:
- Navigation menus
- Ads and banners
- Images (the agent can’t read text inside images)
- JavaScript interactive elements (script-rendered content isn’t extracted)
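As a rough illustration of what extraction keeps versus skips, consider a simple page like this (hypothetical markup, not your actual site):

```html
<nav>Home | Products | Contact</nav>        <!-- skipped: navigation menu -->
<h1>About Us</h1>                            <!-- kept: heading text -->
<p>We build scheduling tools for teams.</p>  <!-- kept: body text -->
<img src="team.jpg" alt="Our team photo">    <!-- skipped: image content -->
```

Only the heading and paragraph text would end up in your Knowledge Base.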
How to Add Your Website
Option 1: Single URL
Add just one specific page:
- Go to Knowledge Base
- Click “Add URL”
- Paste one URL - Example: www.yoursite.com/about
- Click Add
- Done - That page is now in your Knowledge Base
Best for:
- Support FAQ pages
- Specific product pages
- Documentation pages
- Blog posts
Option 2: Multiple URLs
Add several specific pages at once:
- Go to Knowledge Base
- Click “Add Multiple URLs”
- Paste URLs - one per line, as shown in the example after these steps
- Click Add All
- Done - All pages added to Knowledge Base
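For example, the pasted list might look like this (placeholder URLs in the same style as above):

```
www.yoursite.com/pricing
www.yoursite.com/faq
www.yoursite.com/blog/getting-started
```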
Best for:
- Key pages only
- Specific sections you want indexed
- Skipping unimportant pages
Option 3: Full Website Crawl
Let the agent explore your entire website:
- Go to Knowledge Base
- Click “Crawl Website”
- Enter your domain - www.yoursite.com
- Choose crawl depth (see the sketch after these steps):
- Shallow - Homepage + main pages only (fastest)
- Medium - Homepage + 2 levels deep (balanced)
- Deep - Homepage + 3 levels deep (most complete)
- Click Start Crawl
- Wait - Takes 5-30 minutes depending on site size
- Review & Approve - See what was crawled before finalizing
Best for:
- Your entire website
- Starting from scratch
- Comprehensive coverage
Option 4: Sitemap
Upload your website’s sitemap (if you have one):
- Go to Knowledge Base
- Click “Import Sitemap”
- Paste sitemap URL - Usually: www.yoursite.com/sitemap.xml
- Click Import
- Agent crawls all URLs from the sitemap automatically
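If you’re unsure whether you have one, a sitemap is a standard XML file that lists your page URLs. A minimal one looks roughly like this (placeholder URLs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.yoursite.com/about</loc></url>
  <url><loc>https://www.yoursite.com/pricing</loc></url>
  <url><loc>https://www.yoursite.com/blog/getting-started</loc></url>
</urlset>
```

Many CMS platforms generate this file for you automatically.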
Best for:
- Large websites
- When you have a sitemap
- Complex site structures
Option 5: Homepage Only
Just add your homepage content:
- Go to Knowledge Base
- Click “Add URL”
- Enter: www.yoursite.com (or www.yoursite.com/index.html)
- Click Add
Best for:
- Starting simple
- Testing before crawling whole site
- Small websites
Review Crawled Content
After crawling, you’ll see a preview:
- List of crawled pages - Shows all pages found
- Preview of content - See what was extracted from each page
- Approve or reject - Keep pages you want, skip others
- See stats:
- Total pages crawled
- Total content added
- Processing time
What to check:
- ✅ Important pages are included
- ❌ Duplicate pages and ads are excluded
- ✅ The extracted text is accurate
- ❌ Navigation menus are excluded
Keep Content Updated
Auto-Recrawl (Optional)
- Set schedule: Daily, Weekly, or Monthly
- Agent automatically updates your content
- Prices and information stay current
- No manual work needed
Manual Recrawl
- Anytime you make website changes
- Just click “Recrawl” next to the website
- Takes a few minutes
When to recrawl:
- After changing prices
- After updating policies
- After adding new blog posts
- After restructuring your site
- Monthly (if auto-recrawl isn’t enabled)
Tips for Best Results
Clear Site Structure
- Use proper headings (<h1>, <h2>, <h3>) - see the example below
- Organize content in sections
- A clear hierarchy helps the agent understand your content
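For instance, a well-structured support page might use headings like this (a hypothetical page, for illustration):

```html
<h1>Return Policy</h1>
<h2>Eligibility</h2>
<p>Which items can be returned, and within what window.</p>
<h2>How to Start a Return</h2>
<h3>Domestic Orders</h3>
<h3>International Orders</h3>
```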
Descriptive Page Titles
- Instead of “Page 1” use “Return Policy”
- Instead of “Service XYZ” use “Premium Support Service”
- Good titles help the agent find the right content
Text-Based Content
- Avoid putting content in images (the agent can’t read it)
- Use text for important information
Clean Up Your Content
- Avoid duplicating content across multiple pages
- Delete old blog posts if the information has changed
- Exclude draft/test pages from crawling
Exclude Non-Content Pages
- Exclude login/admin pages
- Exclude navigation from crawling if possible
- Focus the crawler on content, not menus
- Cleaner results = better agent answers
Troubleshooting
Some pages didn’t crawl
- Pages might be behind a login - make them public for crawling
- JavaScript-rendered content won’t be extracted - add a text version
- robots.txt might block crawling - check your robots.txt file (see the example below)
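For reference, robots.txt lives at www.yoursite.com/robots.txt and tells crawlers what they may visit. The paths below are hypothetical:

```
# Allows crawling everywhere except /admin/
User-agent: *
Disallow: /admin/

# By contrast, this pair of lines would block crawlers from the entire site:
# User-agent: *
# Disallow: /
```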
Too many pages were crawled
- Re-crawl with the “Shallow” depth setting
- Manually select pages instead of a full crawl
- Remove unwanted pages from the Knowledge Base after crawling
Content is outdated
- Delete the old crawled content manually
- Re-crawl to get the updated version
- Or enable auto-recrawl
Duplicate pages appear
- Some sites serve the same page at multiple URLs - see the example below
- Keep one version, delete the duplicates
- Duplicates don’t hurt the agent, but they waste space
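For example, all of these (hypothetical) URLs may serve the same page:

```
www.yoursite.com/pricing
www.yoursite.com/pricing/
www.yoursite.com/pricing?ref=nav
```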
What Happens Next
After adding URLs:
- Crawling - Agent visits your pages (might take minutes to hours)
- Extraction - Agent reads and extracts text content
- Indexing - Content becomes searchable by agent
- Ready - Agent can answer questions using this content
Next: Train Your Agent to test and improve answers