Site indexing policy
For Web Site Administrators
You may have noticed “IAS Crawler (ias_crawler; http://integralads.com/site-indexing-policy/)” or “IAS Wombles (ias_wombles; http://integralads.com/site-indexing-policy/)” in your logs and wondered why these crawlers are visiting your site, or you may want to invite one or both of these robots to crawl your site. Integral uses machine learning and other analysis to provide content rating and certification services for Brands, Agencies, Ad-Networks and Publishers. Allowing our robots to crawl your site is essential for us to provide an accurate rating of your site’s content and to further refine our certification services for content rating and invalid traffic. If our “IAS Crawler” robot is blocked, we will be unable to provide an accurate rating of your site, and your site will therefore be inaccessible to our partner advertisers. If our “IAS Wombles” robot is blocked, we will not be able to capture the additional information we use to analyze our invalid traffic models.
You can find our public-facing IPs here.
To block “IAS Crawler (ias_crawler; http://integralads.com/site-indexing-policy/)” and “IAS Wombles (ias_wombles; http://integralads.com/site-indexing-policy/)” from crawling your site, see the instructions below. Additional information regarding our privacy policy and technology can be found on the Privacy Policy and Products pages.
If there are areas of your site that you would like to prohibit the robots from crawling, simply inform us of your crawling parameters via the Standard for Robot Exclusion (SRE). The SRE governs the practices of most of the major Web-crawling groups, and Integral strictly adheres to it.
When crawling your site, the Integral crawlers seek out a file called “robots.txt”, which website administrators can place at the top level of a site to direct the behavior of web crawling robots.
The Integral crawlers will always pick up a copy of the robots.txt file prior to their respective crawl of the Web. If you change your robots.txt file while we are crawling your site, please let us know so that we can instruct the crawlers to retrieve the updated instructions contained in the robots.txt file.
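Because the file must sit at the top level of the site, its address depends only on the scheme and host of any page on that site. The following is a minimal sketch of that derivation using Python's standard library; the helper name robots_txt_url and the example.com addresses are illustrative assumptions, not part of Integral's tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the top-level robots.txt address for the site hosting page_url.

    Hypothetical helper for illustration: it keeps the scheme and host
    and replaces the path, query, and fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/news/2024/story.html?id=7"))
```

A robots.txt file placed anywhere other than this top-level location (for example, inside /news/) would not be found by a crawler that follows the standard.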
To exclude all robots, the robots.txt file should look like this:
User-agent: *
Disallow: /
To exclude just one directory (and its subdirectories), say, the /images/ directory, the file should look like this:
User-agent: *
Disallow: /images/
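To check how a standards-following crawler would read the two files above before you publish them, you can parse them locally. This sketch uses Python's built-in urllib.robotparser module, which is a generic parser and not Integral's own implementation; the helper name is_allowed and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

EXCLUDE_ALL = """\
User-agent: *
Disallow: /
"""

EXCLUDE_IMAGES = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every path is off limits under the first file.
    print(is_allowed(EXCLUDE_ALL, "ias_crawler", "https://example.com/index.html"))
    # Only /images/ and anything beneath it is off limits under the second.
    print(is_allowed(EXCLUDE_IMAGES, "ias_crawler", "https://example.com/images/a/logo.png"))
    print(is_allowed(EXCLUDE_IMAGES, "ias_crawler", "https://example.com/index.html"))
```

Note that Disallow rules are prefix matches on the URL path, which is why subdirectories of /images/ are covered without being listed.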
Web site administrators can allow or disallow specific robots from visiting part or all of their site. Integral crawlers identify themselves as ias_crawler and ias_wombles. To allow ias_crawler to visit while preventing all others, your robots.txt file should look like this (add an equivalent record for ias_wombles to admit that robot as well):
User-agent: ias_crawler
Disallow:

User-agent: *
Disallow: /
To prevent ias_crawler from visiting (while allowing all others), your robots.txt file should look like this:
User-agent: ias_crawler
Disallow: /
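Per-robot rules can be verified the same way. The sketch below, again using Python's standard urllib.robotparser rather than anything specific to Integral, checks an allow-only file written with an explicit catch-all record that turns other robots away, and a block-only file that names ias_crawler alone. The name otherbot, the helper can_visit, and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Admit ias_crawler; the catch-all record turns every other robot away.
ALLOW_ONLY_IAS = """\
User-agent: ias_crawler
Disallow:

User-agent: *
Disallow: /
"""

# Turn ias_crawler away; robots with no matching record stay unrestricted.
BLOCK_ONLY_IAS = """\
User-agent: ias_crawler
Disallow: /
"""


def can_visit(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1.html"
    for label, text in (("allow-only", ALLOW_ONLY_IAS), ("block-only", BLOCK_ONLY_IAS)):
        for agent in ("ias_crawler", "ias_wombles", "otherbot"):
            print(label, agent, can_visit(text, agent, page))
```

In both files ias_wombles is treated like any other unnamed robot, so it needs its own User-agent record if you want to allow or block it independently of ias_crawler.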
For more information regarding robots, crawling, and robots.txt, visit the Web Robots Pages at www.robotstxt.org, an excellent source for the latest information on the Standard for Robot Exclusion.
There are a few reasons that Integral may not have visited your site. Your site may be new or we may not have been directed to your site by our Brand, Agency, or Ad-Network partners. It is also possible that your web site administrator has disallowed crawlers from visiting your site. Please read the information about robots.txt that we have provided above to ensure your preferences are being honored.