What is oBot?
oBot is the web crawling bot of the Content Security Division of IBM Germany Research & Development GmbH. We use several computers to crawl webpages and a large computer cluster to categorize the content of these pages.
The result of this analysis is a compact webfilter database that is made available to our customers in several content filtering products, including an SDK for OEM partners. Using several algorithms, we can assign more than 65 different categories (https://exchange.xforce.ibmcloud.com/faq#url_categories_list) to webpages.
Both the crawling process and the subsequent analysis are fully automated. oBot uses different parameters to determine the interval of its visits as well as how much data it needs to classify a webpage. Starting on your homepage, oBot loads HTML and other text documents, images, animations and binaries from your webserver to analyze the content. During its visits, oBot obeys the entries in your robots.txt and tries to keep its footprint small. Links on the same host are not processed in parallel but sequentially, with short pauses in between. oBot stores links in a database and uses them for its next visits. This might cause some “404 - page not found” errors on your webserver if your URLs have changed in the meantime. We apologize for any inconvenience this might cause.
How can administrators keep oBot from crawling (parts of) their site?
If you prefer to keep oBot from crawling your site, your administrator can place a robots.txt (http://en.wikipedia.org/wiki/Robots.txt) file in the root folder of your website that specifies which files and folders oBot and other crawlers may or may not access.
The IBM Content Security Crawler's ID that should be used in your robots.txt file is: oBot.
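As an illustration, a robots.txt that addresses oBot by this ID could look like the string in the sketch below. The path /private/ and the domain example.com are placeholders chosen for this example, not values taken from oBot's documentation. The sketch uses Python's standard urllib.robotparser to check how such rules would be read by a parser that follows the usual robots.txt conventions; it does not claim to reproduce oBot's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: "/private/" is a placeholder path.
# The first group applies only to oBot, the second to every other crawler.
ROBOTS_TXT = """\
User-agent: oBot
Disallow: /private/

User-agent: *
Disallow:
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # oBot is kept out of /private/ but may still read the rest of the site.
    print(can_fetch("oBot", "https://example.com/private/page.html"))
    print(can_fetch("oBot", "https://example.com/index.html"))
```

To keep oBot away from the whole site rather than one folder, the rule in the oBot group would be `Disallow: /` instead.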
How to verify oBot’s identity?
Here’s some information on how to verify oBot’s identity.
- user-agent: Mozilla/5.0 (compatible; oBot/2.3.1; +http://www.xforce-security.com/crawler/)
- our IPv4 ranges are 206.253.224.x, 194.153.113.x, 206.253.225.x, 206.253.226.x and 18.104.22.168/27
- our IPv6 ranges are 2001:1be0:1000:160:0:0:0:0/64, 2001:1be0:1000:167:0:0:0:0/64, 2001:1be0:1000:168:0:0:0:0/64, 2001:1be0:1000:169:0:0:0:0/64 and 2a03:8180:1c01:c1::/64
- use nslookup or dig on obot.cobion.com to verify the IP address
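The checks listed above can be combined in a short script, for example when reviewing webserver logs. The following is a minimal sketch under some assumptions: the “x” notation is read as a full /24 network, the function names are made up for this example, and the /27 entry is left out because the address as listed is not aligned to a /27 boundary and would have to be added by hand. The DNS helper simply resolves obot.cobion.com and needs network access when called.

```python
import ipaddress
import re
import socket

# Published ranges; interpreting "x" as a /24 is an assumption of this sketch.
OBOT_NETWORKS = [ipaddress.ip_network(n) for n in (
    "206.253.224.0/24", "194.153.113.0/24",
    "206.253.225.0/24", "206.253.226.0/24",
    "2001:1be0:1000:160::/64", "2001:1be0:1000:167::/64",
    "2001:1be0:1000:168::/64", "2001:1be0:1000:169::/64",
    "2a03:8180:1c01:c1::/64",
)]

# Matches the "oBot/<version>" token of the user-agent string.
OBOT_UA = re.compile(r"\boBot/\d+(\.\d+)*")


def is_obot_address(ip: str) -> bool:
    """Return True if ip falls into one of the ranges listed above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OBOT_NETWORKS)


def looks_like_obot(user_agent: str, ip: str) -> bool:
    """Require both the oBot user-agent token and a matching source address."""
    return bool(OBOT_UA.search(user_agent)) and is_obot_address(ip)


def resolve_obot_host(host: str = "obot.cobion.com") -> list[str]:
    """Resolve the host name, like nslookup or dig would (needs network)."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})
```

A user-agent string alone is easy to forge, so the sketch only accepts a request as oBot when the source address matches as well.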
Why is it called "oBot"?
Our crawler has a history that goes back to the late 1990s, when it was developed and run by an Internet start-up company called ‘ONLY Solutions GmbH’; that is what the ‘O’ stands for. The crawler’s ID was never changed in the years that followed, although the company’s name was. Changing the ID would render existing robots.txt entries addressing “oBot” useless. We do not want administrators to have to change existing entries just because we changed our name.
How to contact us?
Please use the following email address: technology at kassel.ibm.com