Overview

IntuView’s core technology is incorporated in IntuScan™, a platform for real-time semantic text analytics in a variety of supported languages. The purpose of IntuScan™ is to extract all relevant information from a large quantity of unstructured texts and to generate both a structured representation and a human-readable report. Together these provide multi-parameter categorisation of the document (genre, topic, political/ideological leanings), as well as the entities, sentiments, ideas and other information implicit in the document. Although IntuScan™ is a generic platform, it uses domain-specific ontologies to process texts on the semantic level. The current implementation contains ontologies for the relevant verticals, including defense-related packages (e.g. terror and radical ideologies, explosives and more).

Language Detection
IntuScan™ identifies more than 60 languages and can distinguish between languages that share a character set (Arabic, Farsi, Urdu etc.). In addition, IntuScan™ can identify multi-lingual texts and “hybrid language” texts (e.g. “Spanglish”, a mix of Spanish and English, or “Frarabic”, a mix of French and Arabic), which are created when a writer inserts words from their native tongue into a primary language.
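
To illustrate why languages that share a script can still be told apart, the sketch below uses a deliberately simple heuristic: some letters occur in Urdu or Farsi but not in standard Arabic. This is not IntuScan™'s method, which the source does not describe. The function name and the letter sets are assumptions made for the example, and a production system would more likely rely on statistical models trained on large corpora.

```python
# Rough illustration only: letter-based guess for three Arabic-script languages.
# The letter sets below are a simplification chosen for this example.
URDU_ONLY = set("\u0679\u0688\u0691\u06BA\u06D2")  # tteh, ddal, rreh, noon ghunna, yeh barree
PERSO_URDU = set("\u067E\u0686\u0698\u06AF")       # peh, tcheh, jeh, gaf


def guess_arabic_script_language(text):
    """Return 'Urdu', 'Farsi', 'Arabic', or None if no Arabic-script letter is present."""
    chars = set(text)
    # U+0600..U+06FF is the basic Unicode Arabic block.
    if not any("\u0600" <= ch <= "\u06FF" for ch in chars):
        return None
    if chars & URDU_ONLY:
        return "Urdu"
    if chars & PERSO_URDU:
        return "Farsi"
    return "Arabic"
```

The heuristic fails on short texts that happen not to contain a distinctive letter, and on Arabic texts that borrow letters such as peh or gaf for loanwords, which is one reason statistical approaches are generally preferred.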

Entity Extraction 
IntuScan™ performs extraction, resolution and matching of entities (persons, places, organizations, events etc.) by applying statistical algorithms combined with semantic information. IntuScan™ not only finds the words in the text that represent such entities but also resolves occurrences that represent the same entity (including those written in different writing systems such as Latin, Arabic and Cyrillic) into a single representation, enhances the identification with implicit and contextual information in the text, and performs name analysis based on culture-specific naming conventions.
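
The idea of resolving mentions in different writing systems into a single representation can be sketched with a small alias table. This is a minimal illustration, not IntuScan™'s algorithm: the table, the canonical name and the function names are all hypothetical, and a real system would combine statistical matching and transliteration rules rather than a fixed lookup.

```python
import unicodedata

# Hypothetical alias table: normalized surface form -> canonical entity name.
ALIASES = {
    "moscow": "Moscow",   # Latin script
    "москва": "Moscow",   # Cyrillic script
    "موسكو": "Moscow",    # Arabic script
}


def normalize(mention):
    """Apply Unicode NFKC normalization, case folding and whitespace trimming."""
    return unicodedata.normalize("NFKC", mention).casefold().strip()


def resolve(mentions, aliases=ALIASES):
    """Group raw mentions under their canonical entity; unknown mentions are skipped."""
    resolved = {}
    for mention in mentions:
        canonical = aliases.get(normalize(mention))
        if canonical is not None:
            resolved.setdefault(canonical, []).append(mention)
    return resolved
```

Calling `resolve(["MOSCOW", "Paris"])` groups the first mention under "Moscow" and ignores "Paris", because the example table contains no entry for it.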

Topic Detection 
IntuScan™ identifies the key topics of the given input. The topic identification algorithm is based on statistical machine-learning models adapted for domain-specific topics. Topic identification allows the user to automatically understand the main topics of the processed text (e.g. Terror, Explosives, Insurance, Legal etc.).

Sentiment Analysis 
IntuScan™ identifies the attitude of the author of the given text towards different entities, drills down into sentiment towards specific attributes or features of those entities, and aggregates sentiment towards a set of linked entities that share a common “parent” entity. For example, negative sentiment towards several entities that belong to one parent (senior officials of a country, who all have in common their affinity to that country, or a number of products that all belong to a certain company) may indicate a general negative sentiment towards the parent entity.
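
The aggregation towards a parent entity can be sketched as follows. The source does not say how IntuScan™ combines scores, so the simple average used here is an assumption, as are the score range of -1.0 to 1.0, the entity names and the function name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from each entity to its parent entity.
PARENT_OF = {
    "Phone X": "Acme Corp",
    "Tablet Y": "Acme Corp",
    "Laptop Z": "Globex",
}


def aggregate_by_parent(entity_scores, parent_of):
    """Average per-entity sentiment scores under each parent entity.

    Entities without a known parent are left out of the result.
    """
    grouped = defaultdict(list)
    for entity, score in entity_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    return {parent: mean(scores) for parent, scores in grouped.items()}


scores = {"Phone X": -0.5, "Tablet Y": -0.25, "Laptop Z": 0.5}
parent_sentiment = aggregate_by_parent(scores, PARENT_OF)
```

Here two negatively rated products pull the score of their shared parent below zero, which mirrors the example of products that all belong to one company.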

Idea Mining  
IntuScan™ uses sophisticated NLP techniques to extract ideas from the text. Ideas (sometimes referred to as events) are specific concepts that represent not a named entity but an action or event that may be important for the user (e.g. suicide bombing preparation, preparation of improvised explosives, identification of a sales opportunity and more).
 
Summarization
IntuScan™ generates a natural language report for any given document. The report characterizes the document and presents the key ideas in their appropriate context. The report is currently available in English and French. This information is also generated as an RDF structure that is stored in a triple-store semantic database for later querying.
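
As an illustration of what an RDF structure for a document might look like, the sketch below serializes a few subject, predicate, object statements in the standard N-Triples text format. The namespace, the predicate names and the document identifiers are hypothetical; the source does not specify IntuScan™'s actual vocabulary or triple store.

```python
BASE = "http://example.org/intuscan/"  # hypothetical namespace, not a real IntuScan URI


def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples, base=BASE):
    """Serialize (subject, predicate, object, is_literal) tuples, one triple per line."""
    lines = []
    for subject, predicate, obj, is_literal in triples:
        if is_literal:
            obj_term = '"' + escape_literal(obj) + '"'
        else:
            obj_term = "<" + base + obj + ">"
        lines.append("<" + base + subject + "> <" + base + predicate + "> " + obj_term + " .")
    return "\n".join(lines)


document_triples = [
    ("doc1", "hasTopic", "Explosives", True),   # literal object
    ("doc1", "mentions", "entity42", False),    # resource object
]
ntriples_text = to_ntriples(document_triples)
```

Text in this format can be loaded by most triple stores, after which the statements can be queried, for example with SPARQL.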

Social Media Processing
IntuScan™ is now being applied to the realm of social media (Facebook, Twitter, blogs etc.) with the goal of enabling comprehensive analysis of social networks for defense and security needs as well as commercial ones. Processing social media with IntuScan™ provides users with situational awareness of their market and in-depth information on attitudes, trends and sentiment towards issues or products in different social sectors. In this way, IntuView offers a robust and responsive tool that can serve as a low-price alternative to active surveys.
This includes:
  • Language identification, including in-depth identification of the “register” or sub-language of the text. This covers identifying text written in “tweet register” or as a Romanized version of a non-Latin-script language (e.g. Arabic, Russian and Urdu). This enables identification of social frames of reference in the language, such as local Arabic dialects or groups in a country whose dialect can be identified (Hispanics in America, North Africans in France etc.).
  • Clustering of social media associated with subgroups based on their linguistic and interest affinities.
  • Identification of areas of focus of the different groups.
  • Analysis of the sentiment of group members towards different entities, products and issues, with the ability to drill down to sentiment towards specific attributes or features of the entities and to aggregate sentiment towards the parent entities of a number of entities.
  • Domain (topic) detection to identify the relevance of the input.
  • Entity extraction, resolution, disambiguation, and aggregation.