Integral employs complex heuristics to evaluate all content on web pages through multiple layers of proprietary and best-in-class data science technologies. We currently have industry-leading visibility, at the individual page level, across more than 98% of the commercial web.
Integral’s proprietary data science and human verification technologies have been designed by leading academics in the data science and machine learning fields. Our rating methodology never relies on a single data source to evaluate content; instead, it weighs competing evidence sources to create the most accurate and comprehensive rating of page content.
Our Content Rating Standard Data Sources Include:
- Network Neighboring: Examines the interaction between host pages and the pages they are connected to via inbound and outbound links.
- Human Scoring: Utilizes micro-outsourcing technology to score massive numbers of web pages to verify and improve AdSafe’s ratings.
- Semantic Filters: Multiple internally designed, state-of-the-art semantic filters evaluate all text and HTML content on web pages.
- Image Analysis: Partner image analysis technologies are leveraged to evaluate images appearing on web pages.
- Site / Domain Registration Information: Utilizes domain registration information with detailed probabilistic models to identify the source of content on pages.
- HTML Source Code Evidence: Leverages non-textual HTML features to reveal the nature of a web page.
- Auxiliary Constructed Variables: Utilizes extraction techniques from data mining and information retrieval to transform raw page content into statistics and features.
- Search Engine Toxic URL Lists: Utilizes data on URL pass-through from search engines as an indication of content character.
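As an illustration of what weighing competing evidence sources can look like in practice, the sketch below combines per-source risk scores into a single page-level rating using a weighted average of log-odds. This is a minimal sketch under stated assumptions, not Integral's actual model: the function names, the source labels, the weights, and the choice of log-odds pooling are all hypothetical and chosen only for illustration.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds, clamping to avoid infinities."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))


def combine_evidence(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Pool per-source risk probabilities into one rating in (0, 1).

    scores:  hypothetical per-source probabilities that a page is unsafe
    weights: hypothetical trust placed in each source (larger = more trusted)
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one source needs a positive weight")
    # Weighted average in log-odds space, so a confident source can
    # outweigh a hesitant one without any single source deciding alone.
    pooled = sum(weights[name] * logit(p) for name, p in scores.items()) / total_weight
    return 1 / (1 + math.exp(-pooled))


# Illustrative values only; the labels mirror some of the sources listed above.
page_scores = {"semantic_filters": 0.80, "network_neighboring": 0.30, "human_scoring": 0.90}
source_weights = {"semantic_filters": 1.0, "network_neighboring": 0.5, "human_scoring": 2.0}
rating = combine_evidence(page_scores, source_weights)
```

Pooling in log-odds space is one common design choice among several; a production system might instead learn the weights from labeled pages or use a trained classifier over all sources.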
Other tools leveraged by our rating platform include:
- Site traffic volume data
- Third-party malware databases
- Third-party bots and robots databases
- Proxy detection heuristics
- Third-party click fraud database