Smart Crawling (Advanced)
What is Smart Crawling?
Smart Crawling is an advanced mode that runs in the browser, so it can see and interact with the page just like a user. This helps WebSync capture content from modern sites where simple link-following misses important content or grabs too much navigation noise.
What it can do:
- Read the page DOM and extract the main content instead of menus and sidebars.
- Click buttons like “Load more” to reveal additional items.
- Scroll to trigger lazy-loaded content.
- Navigate pagination and multi-step layouts.
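To make the first capability concrete, here is a minimal sketch of main-content extraction. It is not WebSync's implementation: it parses static HTML with Python's standard library rather than a live browser DOM, and the names `NOISE_TAGS`, `MainContentExtractor`, `extract_main_text`, and `SAMPLE_PAGE` are all made up for illustration. It assumes the content sits inside a `<main>` element.

```python
from html.parser import HTMLParser

# Tags treated as navigation noise in this sketch (an assumption, not WebSync's list).
NOISE_TAGS = {"nav", "aside", "header", "footer", "script", "style"}


class MainContentExtractor(HTMLParser):
    """Collects text inside <main>, skipping anything nested in noise tags."""

    def __init__(self):
        super().__init__()
        self.main_depth = 0   # > 0 while inside <main>
        self.noise_depth = 0  # > 0 while inside a noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag == "main":
            self.main_depth = max(0, self.main_depth - 1)
        elif tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside <main> and outside every noise tag.
        if text and self.main_depth > 0 and self.noise_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: menu, main content with a sidebar, and a footer.
SAMPLE_PAGE = """
<html><body>
  <nav><a href="/">Home</a><a href="/docs">Docs</a></nav>
  <main>
    <h1>Install guide</h1>
    <aside>Related links</aside>
    <p>Run the installer.</p>
  </main>
  <footer>Copyright</footer>
</body></html>
"""

print(extract_main_text(SAMPLE_PAGE))  # Install guide Run the installer.
```

The menu, sidebar, and footer text are dropped. A generic link-following crawl would typically keep all of it.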
Crawl specs (in plain terms)
A crawl spec is a small set of rules that tell Smart Crawling how to handle a specific site. It focuses on:
- Which links to follow.
- Where the main content lives on the page.
- Which UI interactions are needed to reveal all content.
You do not need to configure crawl specs yourself. WebSync applies them automatically when available.
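For readers curious what such a rule set might look like, the sketch below models the three concerns above as a small data structure. This is purely illustrative: WebSync's actual spec format is not described here, and every name in the block (`CrawlSpec`, `Interaction`, `follow_patterns`, `content_selector`, `BLOG_SPEC`) and every selector is a hypothetical chosen for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One UI step needed to reveal content (hypothetical shape)."""
    action: str          # e.g. "click" or "scroll"
    selector: str = ""   # CSS selector of the target; empty for a page-level scroll
    repeat: int = 1      # maximum number of times to perform the step


@dataclass
class CrawlSpec:
    """Illustrative crawl spec: link rules, content location, interactions."""
    follow_patterns: list[str]   # regexes that a URL path must fully match
    content_selector: str        # where the main content lives on the page
    interactions: list[Interaction] = field(default_factory=list)

    def should_follow(self, path: str) -> bool:
        # A link is followed only if its path fully matches some pattern.
        return any(re.fullmatch(p, path) for p in self.follow_patterns)


# Hypothetical spec for a blog with a "Load more" button on its post list.
BLOG_SPEC = CrawlSpec(
    follow_patterns=[r"/blog/[\w-]+", r"/blog/page/\d+"],
    content_selector="article.post",
    interactions=[Interaction(action="click", selector="button.load-more", repeat=5)],
)

print(BLOG_SPEC.should_follow("/blog/hello-world"))  # True
print(BLOG_SPEC.should_follow("/about"))             # False
```

Keeping link rules, content location, and interactions in one declarative object is one plausible design: the crawler logic stays generic and only the per-site data changes.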
Examples of sites with built-in crawl specs
We maintain crawl specs for popular patterns and platforms, including:
- Documentation sites (docs-style navigation and deep link trees).
- Blog platforms (post lists, categories, pagination).
- Knowledge bases and help centers (nested topics and sidebars).
If you use a site that fits one of these patterns, Smart Crawling typically produces much cleaner results than a generic crawl.
Request support for your website
If you have a site that is not captured well by generic crawling, let us know. We can add or improve a crawl spec for it.
Submit a request here: https://tally.so/r/nGkYRL