Some of Moonlit’s core functionalities require web scraping. That includes Apps that use functions such as the Sitemap Extractor or Webpage Scraper, as well as the “Import from Website” option in the knowledge base. However, some websites are protected by anti-bot services, so in this guide we’ll show you how to whitelist requests coming from Moonlit’s servers.
How Moonlit’s Scraping Bot Identifies Itself
A user agent is a short string identifier for an agent performing a web request. Moonlit’s User Agent string is:
Mozilla/5.0 (compatible; Moonlit/1.0; +https://moonlitplatform.com)
Depending on your hosting provider or security software, you can add rules that allow this user agent to scrape your website.
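If you run your own server-side filtering, the check usually comes down to inspecting the User-Agent header. A minimal sketch in Python (the function name and matching rule here are our own illustration, not part of Moonlit’s API) that matches on the “Moonlit/” product token rather than the full string, so minor version bumps don’t break the rule:

```python
import re

# The User-Agent string Moonlit sends (from the section above).
MOONLIT_UA = "Mozilla/5.0 (compatible; Moonlit/1.0; +https://moonlitplatform.com)"

def is_moonlit(user_agent: str) -> bool:
    """Return True if the request's User-Agent identifies Moonlit's bot.

    Matching the "Moonlit/" product token is more robust than comparing
    the full string, since the version number may change over time.
    """
    return bool(re.search(r"\bMoonlit/\d", user_agent or ""))
```

A request carrying the string above would pass this check, while an ordinary browser User-Agent would not.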
Whitelisting in Robots.txt
If you have custom rules set in your site’s robots.txt file, you can whitelist Moonlit by adding the following lines:
User-agent: Moonlit
Allow: /
Hosting Websites
SiteGround - whitelisting instructions
Bluehost - whitelisting instructions
HostGator - whitelisting instructions
GoDaddy - whitelisting instructions
GreenGeeks - whitelisting instructions
If your hosting provider is not listed above, please contact us through the live chat widget on the bottom right and we’ll help you.
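On hosts running Apache, whitelisting is often done in the .htaccess file. A hypothetical sketch (assuming Apache 2.4 with mod_setenvif enabled; the variable name bad_bot and the “BadBot” pattern are placeholders for whatever blocklist your host already applies) that exempts Moonlit’s user agent from a User-Agent blocklist:

```apache
# Flag blocked user agents, then explicitly unflag Moonlit.
SetEnvIfNoCase User-Agent "BadBot" bad_bot
SetEnvIfNoCase User-Agent "Moonlit" !bad_bot

<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Adapt the pattern and variable names to the rules already present in your file rather than copying this verbatim.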
Bot Detection Software
Cloudflare
Log in to your Cloudflare account, go to the Security tab, and select Firewall Rules.
Click Create a Firewall Rule and give it a name (e.g. Moonlit).
Set the field to User Agent, the operator to ‘includes’, and the value to “Moonlit”.
Then turn on the Allow switch and click Deploy in the bottom-right corner.
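If you prefer Cloudflare’s expression editor to the form fields, the rule above corresponds roughly to an expression like the following (written in Cloudflare’s rules language; double-check the field name in your dashboard, as the UI changes between plan versions):

```
(http.user_agent contains "Moonlit")
```

Pair this expression with the Allow action so matching requests bypass your other firewall rules.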
Incapsula
In your control panel, navigate to Settings > Security and click Whitelist Specific Sources.
Click Add Exception; this opens the whitelist firewall modal.
From the dropdown menu, select ‘User Agent’ and set the value to “Moonlit”.