In the era of Agentforce, your AI is only as good as the data that grounds it. While many teams focus on structured CRM data, the "gold mine" of institutional knowledge often lives on your website: in documentation, blogs, and product manuals. Unifying that web content with your customer data is what makes an agent genuinely useful.
But how do you get CRM data and static web content into Salesforce Data 360 (formerly Salesforce Data Cloud) to power your autonomous agents?
Here are the five definitive methods to bridge the gap between your website and your AI.
1. The Automated Explorer: Web Content (Crawler) Connector
If you need to ingest vast amounts of public-facing data without manual intervention, the Web Crawler is your best friend. It "reads" your site much like a search engine would.
- How it works: You provide a "Starting URL," and the crawler follows internal links up to 3 or 4 levels deep within your domain.
- Best for: Indexing entire resource centers, blogs, or public documentation hubs.
- Key Requirements: You’ll need the starting URL, desired crawl depth, and—crucially—permission in your site’s robots.txt for the Salesforce bot to enter.
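To illustrate the robots.txt requirement, here is a minimal sketch. The user-agent string `Salesforce-Crawler` is a placeholder, not the confirmed bot name; check Salesforce's connector documentation for the exact value, and the paths shown are hypothetical.

```text
# robots.txt: open the sections you want ingested to the Salesforce bot.
# "Salesforce-Crawler" is a placeholder user-agent; confirm the real
# string in Salesforce's Web Content Connector documentation.
User-agent: Salesforce-Crawler
Allow: /docs/
Allow: /blog/

# Keep everything else closed to bots you haven't approved.
User-agent: *
Disallow: /account/
```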
2. The Surgical Strike: Web Content (Sitemap) Connector
Sometimes a crawler is too broad. If you want to be selective about which pages your AI "learns" from, use the Sitemap method. For a site with several hundred pages, you can create a separate sitemap that lists only the URLs you want ingested into Data 360.
- How it works: Instead of following links randomly, Data 360 targets the exact URLs listed in your website’s sitemap.xml.
- Best for: High-value pages like FAQs or Product Catalogues, while ignoring low-value pages like "Terms of Service" or "Contact Us."
- Key Requirements: A valid Sitemap URL and proper authentication settings.
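A curated sitemap for this purpose is just a standard sitemap.xml restricted to your target pages. This sketch uses placeholder `example.com` URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Curated sitemap: only the pages Data 360 should ingest. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/faq</loc></url>
  <url><loc>https://www.example.com/products/catalogue</loc></url>
  <!-- Low-value pages such as /terms or /contact are deliberately omitted. -->
</urlset>
```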
3. The Secure Bridge: Managed File Ingestion (S3 / GCS)
What if your data is behind a login or requires significant cleaning before it’s AI-ready? Moving files to a cloud bucket is the most robust enterprise path.
- How it works: You scrape or export your content into formats like PDF, DOCX, or TXT and drop them into Amazon S3 or Google Cloud Storage. Salesforce Data 360 then syncs these buckets via a native connector.
- Best for: Secure, private data or websites with complex structures that traditional crawlers can't navigate.
- Key Requirements: A manual or scripted export process and a configured Cloud Storage connector.
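When you script the export, it helps to derive bucket object keys deterministically from page URLs so re-syncs overwrite the same objects. A minimal sketch, assuming a hypothetical `website-export/` prefix and a text-file export format:

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def url_to_object_key(url: str, prefix: str = "website-export") -> str:
    """Map a page URL to a stable S3/GCS object key for its exported text.

    Keys derived from the URL path keep repeat exports deterministic, so
    the Data 360 connector re-syncs updated objects instead of duplicates.
    """
    path = urlparse(url).path.strip("/") or "index"
    # Swap the web extension for .txt, the format of the exported copy.
    stem = PurePosixPath(path).with_suffix("").as_posix()
    return f"{prefix}/{stem}.txt"

# Example: a scraped docs page maps to a predictable object key.
key = url_to_object_key("https://www.example.com/docs/setup.html")
# The actual upload would use your cloud CLI or SDK, e.g.:
#   aws s3 cp setup.txt s3://my-bucket/website-export/docs/setup.txt
```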
4. The Quick Upload: Data Library
For smaller, static sets of documents that don't change often, the Data Library offers the path of least resistance.
- How it works: You manually upload up to 2,000 files (PDFs, HTML exports, etc.) directly into a library within the Data 360 interface.
- Best for: "One-and-done" uploads of legal white papers, employee handbooks, or specific technical manuals.
- Key Requirements: Manually prepared files ready for direct upload.
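Before a bulk upload, a quick pre-flight check saves a failed batch. This sketch enforces the 2,000-file limit mentioned above; the `SUPPORTED` extension set is an assumption, so confirm the full supported-format list in Salesforce's documentation:

```python
from pathlib import Path

# Formats named above; confirm the full supported list in Salesforce's docs.
SUPPORTED = {".pdf", ".html", ".txt", ".docx"}
MAX_FILES = 2000  # per-library upload limit cited above

def preflight(filenames: list[str]) -> list[str]:
    """Filter a candidate batch down to supported files and enforce the cap."""
    ready = [name for name in filenames if Path(name).suffix.lower() in SUPPORTED]
    if len(ready) > MAX_FILES:
        raise ValueError(f"{len(ready)} files exceeds the {MAX_FILES}-file limit")
    return ready
```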
5. The Gold Standard: CMS to Salesforce Knowledge Migration
When accuracy is paramount, moving your content into Salesforce Knowledge first is the superior strategy. This puts a "human-in-the-loop" to verify information before the AI sees it.
- How it works: Export content from your CMS (like WordPress or SharePoint) and import it into Salesforce Knowledge Articles. Once published, the Salesforce CRM Connector syncs these articles directly into Data 360.
- Best for: Content that requires strict version control, multi-language support, or executive approval.
- Key Requirements: A migration plan to move CMS data into the Knowledge__kav object and a publication workflow.
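The migration step above can be sketched as a script that renders a CMS export into a Data Loader CSV for `Knowledge__kav`. `Title` and `UrlName` are standard Knowledge fields; `Body__c` is a placeholder, since rich-text body field names vary by org:

```python
import csv
import io
import re

def to_url_name(title: str) -> str:
    """Slugify a CMS title into UrlName format (alphanumerics and hyphens)."""
    return re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")

def build_import_csv(posts: list[dict]) -> str:
    """Render CMS posts as a Data Loader CSV for Knowledge__kav.

    'Body__c' is a hypothetical rich-text field; substitute your org's
    actual article body field before importing.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["Title", "UrlName", "Body__c"])
    writer.writeheader()
    for post in posts:
        writer.writerow({
            "Title": post["title"],
            "UrlName": to_url_name(post["title"]),
            "Body__c": post["content"],
        })
    return out.getvalue()
```

Once the CSV is imported and the articles pass your publication workflow, the CRM Connector handles the sync into Data 360 with no further scripting.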
Which Method is Right for You?
Choosing the right data ingestion path depends on your balance of automation vs. control:
| Method | Automation | Privacy | Best Use Case |
| --- | --- | --- | --- |
| Crawler | High | Public | Massive public hubs |
| Sitemap | Medium | Public | Curated public pages |
| S3 / GCS | Medium | Private | Secure/complex data |
| Data Library | Low | Both | Small, static file sets |
| Knowledge | Low | Private | High-stakes, verified content |
Summary
Regardless of the path you choose, remember that ingestion is only step one. Once your data is in Salesforce Data 360, you must create a Search Index to vectorize that content, transforming raw text into the "intelligence" that powers your Agentforce agents.