Our Difference

IntuScan™ is the only tool on the market that goes beyond “data mining” to offer “meaning mining”: a comprehensive solution for domain-specific intelligence extraction and document exploitation.

Most of the tools in this field are generic Natural Language Processing (NLP) technologies that perform tasks such as morphological analysis, part-of-speech (POS) analysis at diverse levels, named entity recognition (NER) or “entity extraction”, generic topic classification and document-level or entity-level sentiment identification. Some offer automated translation of foreign-language texts. Some make use of a generic ontology for text classification. However, these tools are generic by nature. As a result, they offer low levels of detail and accuracy when analyzing and classifying documents, and they cannot “read between the lines” of a text or provide comprehension beyond entity extraction.

IntuScan™, on the other hand, was designed to emulate as closely as possible the intuition and analytical processes of subject matter experts. IntuScan™ is therefore not merely a “platform” but a comprehensive solution that includes an expert knowledge base and culture- and domain-specific algorithms, which enable in-depth understanding of information that is implicit in the document.
The following table summarizes the differences between IntuView and other NLP technology providers:

GENERAL
 

IntuScan

Competitors

Employs semantic, ontology-based technology, which provides a higher level of accuracy.

Most other platforms do not use ontology-based knowledge bases.

Integrates cultural and subject-matter knowledge for deep analysis, categorization and summarization of texts.

Most other platforms perform general categorization and do not drill down to sub-categories. Other platforms do not provide a natural language summary in English of the foreign language text.


LANGUAGE IDENTIFICATION

IntuScan

Competitors

IntuScan™ identifies more than 60 languages. It distinguishes between different languages that use the same character set (Arabic, Farsi, Urdu etc.) and can identify the language of short strings of text.

Most tools can identify languages, but not all achieve high precision in identifying “sister languages” in short or multilingual texts.
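To make the “sister languages” problem concrete, here is a minimal sketch of a purely script-based heuristic for Arabic, Farsi and Urdu. The letter sets and the function name `guess_arabic_script_language` are assumptions made for this illustration and are not IntuScan's actual method. The sketch also shows why short texts are hard: a short string may contain none of the distinguishing letters.

```python
# Illustrative heuristic only. The letter sets are assumptions chosen for
# this sketch; they are not taken from IntuScan.

# Letters typical of Urdu orthography (tteh, ddal, rreh, noon ghunna,
# heh doachashmee, heh goal, yeh barree).
URDU_ONLY = set("\u0679\u0688\u0691\u06BA\u06BE\u06C1\u06D2")
# Letters shared by Farsi and Urdu but absent from standard Arabic
# (peh, tcheh, jeh, gaf).
PERSO_ONLY = set("\u067E\u0686\u0698\u06AF")


def guess_arabic_script_language(text: str) -> str:
    """Return a rough guess: 'urdu', 'farsi', 'arabic' or 'unknown'."""
    chars = set(text)
    # No character from the basic Arabic Unicode block: not our problem.
    if not any("\u0600" <= ch <= "\u06FF" for ch in chars):
        return "unknown"
    if chars & URDU_ONLY:
        return "urdu"
    if chars & PERSO_ONLY:
        return "farsi"
    # Fallback: a Farsi or Urdu text without any distinguishing letter
    # is misread as Arabic, which is the weakness on short strings.
    return "arabic"
```

A heuristic like this fails whenever the distinguishing letters are absent, which is one reason short or mixed-language texts call for models that also weigh vocabulary and context.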

IntuScan™ can identify “hybrid language texts” (“Spanglish”, “Frarabic” etc.), which are created by inserting words from a person’s native tongue into a primary-language text. The spelling of a foreign word in a “host” language depends on a number of variables: the original dialect or accent of the speaker, from which he draws the pronunciation; the Latin-script language into which he transliterates (French, English, Spanish); his level of literacy; and more. This makes back-transliteration very difficult. IntuScan™ discovers the source language of such a word (e.g. Arabic written in Latin letters), converts it back to the native script and then performs all the tasks of NLP, entity extraction and content analysis.

No other tool can “read” borrowed words or short phrases from “background languages” (e.g. Arabic) within the text of a “host language” (e.g. English). Such tools therefore cannot properly analyze the role of these words in the text.


ENTITY EXTRACTION

IntuScan

Competitors

IntuScan™ performs statistics-based entity extraction in six languages: English, Arabic, French, Spanish, Urdu and Indonesian, including sub-registers of those languages and “hybrid” languages. The level of accuracy in these languages is high.

Other tools offer various suites of languages at different levels of accuracy. Some tools offer rule-based entity extraction with relatively low recall.

IntuScan™ performs all the analysis in the source language of the name. Hence, if the name is identified as coming from a non-Latin script, the matching is done on the unique spelling of the name in that language.

Other platforms aggregate names across languages on the basis of phonetic models, transliterating a foreign source-language name into a standard Romanized form.

IntuScan™ extracts not only the “raw” entities but also links them to implicit (gender, ethnicity, social) and contextual (titles, status) information. IntuView also uses information from the categories and parameters of the document in which the entities appear to add information to the entity. Entities that represent locations or institutions are enriched with information regarding their possible theater or affiliation (e.g. Atatürk International Airport implies a location in Turkey; Ronald Reagan Airport implies a location in the US) or are resolved on the basis of the general theater of the document (e.g. Nazareth in Israel or in Pennsylvania).

Other platforms only extract named entities and categorize them as persons, locations etc. and do not enrich them with contextual or implicit information from the names. Extraction of locations from a text in these platforms does not include identification of the likely theater.
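The idea of resolving an ambiguous place name by the general theater of the document can be sketched as follows. This is a minimal illustration under stated assumptions: the gazetteer contents, the region labels and the function `resolve_location` are hypothetical and do not describe IntuScan's knowledge base.

```python
# Hypothetical mini-gazetteer: each place name maps to candidate
# (country, region) pairs. The region labels are assumptions.
GAZETTEER = {
    "nazareth": [("Israel", "Middle East"), ("United States", "North America")],
    "ronald reagan airport": [("United States", "North America")],
}


def resolve_location(name, document_theater):
    """Pick the country for a place name, using the document's theater
    to break ties. Returns None if the name is unknown or still ambiguous."""
    candidates = GAZETTEER.get(name.lower(), [])
    # Prefer a candidate whose region matches the document's theater.
    for country, region in candidates:
        if region == document_theater:
            return country
    # An unambiguous name implies its own location regardless of theater.
    if len(candidates) == 1:
        return candidates[0][0]
    return None


print(resolve_location("Nazareth", "Middle East"))
print(resolve_location("Nazareth", "North America"))
```

With this toy data, the same string “Nazareth” resolves to a different country depending on the theater assigned to the surrounding document, while an unambiguous name such as an airport carries its location with it.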

IntuScan™ integrates culture-specific rules that parse each name and identify the role of each component, possible ethnicity, gender and other implicit information and then aggregates named entities that occur in different scripts or in different forms deriving from cultural naming conventions (e.g. Richard James Smith may be called Rick Smith, Ricky Smith, R.J. Smith, or even Jim Smith).

Other platforms do not extract implicit information or parse the extracted names into components (given name, patronymic, etc.) or aggregate names in different forms.
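The Richard James Smith example can be sketched as a small name-matching routine that parses a name into components and accepts nicknames and initials. The nickname table and the functions `name_tokens` and `may_refer_to` are assumptions made for illustration; real culture-specific rules would be far richer.

```python
import re

# Hypothetical nickname table covering only the example in the text.
NICKNAMES = {"rick": "richard", "ricky": "richard", "jim": "james"}


def name_tokens(name):
    """Lowercase a name and split it into alphabetic tokens,
    so that 'R.J. Smith' becomes ['r', 'j', 'smith']."""
    return re.findall(r"[a-z]+", name.lower())


def may_refer_to(variant, full_name):
    """Return True if `variant` could be a form of `full_name`:
    same surname, and every given-name token is a nickname,
    an initial, or an exact match of a distinct given name."""
    full, var = name_tokens(full_name), name_tokens(variant)
    if not full or not var:
        return False
    *full_given, full_surname = full
    *var_given, var_surname = var
    if var_surname != full_surname:
        return False
    remaining = list(full_given)
    for token in var_given:
        canonical = NICKNAMES.get(token, token)
        match = next(
            (g for g in remaining
             if g == canonical or (len(token) == 1 and g.startswith(token))),
            None,
        )
        if match is None:
            return False
        remaining.remove(match)  # each given name can be used only once
    return True


for variant in ["Rick Smith", "Ricky Smith", "R.J. Smith", "Jim Smith", "Tom Smith"]:
    print(variant, may_refer_to(variant, "Richard James Smith"))
```

The sketch handles only one Anglo-American convention. Patronymics, tribal names or honorifics in other cultures would need their own parsing rules, which is the point of making the rules culture-specific.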

IntuScan™ uses naming conventions to identify family relations between entities (father-son, sibling, cousin relationships).

Other platforms do not discover family relationships or other affinities between entities.


IDEA MINING

IntuScan

Competitors

IntuScan™ uses sophisticated NLP techniques to extract ideas from the text. Ideas (sometimes referred to as events) are specific concepts that represent not a named entity but an action or event that may be important for the user (e.g. suicide bombing preparation, preparation of improvised explosives, identification of a sales opportunity and more).

Most platforms do not identify “idea entities” or “action entities” and do not extract “between the lines” implications of quotations or sentences that carry non-explicit meanings.


SENTIMENT ANALYSIS

IntuScan

Competitors

IntuScan™ identifies the attitude of the author of a given text towards different entities and drills down into sentiment towards specific attributes or features of the entities. The domain-dependent approach of semantic analysis enables drill-down to these nuances. At the same time, it aggregates sentiment towards a set of linked entities to identify sentiment towards the “parent” entity (e.g. negative sentiment towards entities linked to the United States may indicate a general negative sentiment towards the United States). The sentiment analysis algorithms are based on statistical models composed of semantic features.

Tools that analyze “sentiment” are primarily based on words that represent generic positive or negative attitudes. These can give an idea of very strong sentiment (terrible, awful, marvelous, good, bad) but cannot drill down to “sentiment” that is conditional on the domain or object of the word. Such lexical instances are domain- and culture-specific and do not hold the same connotation in all texts.
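The aggregation of sentiment from linked entities to a “parent” entity can be sketched as below. This is a simplified illustration: it uses a plain average rather than the statistical models mentioned above, and the entity links, score range and function name `aggregate_to_parents` are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical links from an entity to its "parent" entity.
PARENT_OF = {
    "US Army": "United States",
    "White House": "United States",
}


def aggregate_to_parents(entity_scores):
    """Average per-entity sentiment scores (assumed to lie in [-1, 1])
    into one score per parent entity. Entities without a known parent
    are ignored."""
    collected = defaultdict(list)
    for entity, scores in entity_scores.items():
        parent = PARENT_OF.get(entity)
        if parent is not None:
            collected[parent].extend(scores)
    return {p: mean(s) for p, s in collected.items() if s}


scores = {
    "US Army": [-0.5, -1.0],
    "White House": [-0.75, -0.75],
    "Eiffel Tower": [1.0],  # no parent in the table, so it is skipped
}
print(aggregate_to_parents(scores))  # {'United States': -0.75}
```

A production system would presumably weight the linked entities differently and account for context, but the roll-up structure stays the same: scores attach to specific entities first and are combined at the parent level afterwards.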


SOCIAL MEDIA ANALYTICS

IntuScan

Competitors

IntuScan™ is now being applied to the realm of social media (Facebook, Twitter, blogs etc.) with the goal of enabling comprehensive analysis of social networks for both defense and security needs and commercial needs. Processing social media with IntuScan™ provides the user with situational awareness: in-depth information on attitudes, trends and sentiment towards issues or products in different social sectors.
       Language identification, including in-depth identification of the “register” or sub-language register. This includes identifying a text as written in “tweet register” or as a Romanized version of a non-Latin-script language (e.g. Arabic, Russian, Urdu). This enables identification of social frames of reference in the language.
       Clustering of social media associated with subgroups based on their linguistic and interest affinities.
       Identification of areas of focus of the different groups.
       Analysis of the sentiment of group members towards different entities, products and issues, with drill-down to sentiment towards specific attributes or features of the entities and aggregation of sentiment towards the parent entities of linked entities.
       Domain (topic) detection to identify the relevance of the input.
       Entity extraction, resolution, disambiguation and aggregation.
This capability enables the user to perform:
       Evaluation of market size, brand prominence and effect of campaigns by references in social media.
       Breakdown of the information to age groups and social sectors.
       Evaluation of sentiment towards products, entities and issues.
       Breakdown of what is liked and not liked in the target product or entity.
       Early warning on events that can become catalysts for either crisis or opportunity.

Other platforms use key words and basic sentiment indicators. They do not differentiate between different linguistic registers (that characterize different social strata) and do not use domain-specific ontology-based knowledge bases to aggregate different ways of referring to the same entity or idea.



DOCUMENT EXPLOITATION

IntuScan

Competitors

IntuScan™ offers a comprehensive document exploitation system: categorization, sentiment analysis, identification of authorship, political leanings etc.

Other platforms do not offer an integrated system for document exploitation but only the above-mentioned tools of entity extraction and name matching as building blocks to be integrated into other systems.

 

IntuScan

Competitors

IntuScan™ provides a semantic representation of all the extracted and inferred information in a flexible RDF triple store, a semantic database that supports sophisticated SPARQL queries.

Other platforms' technology is purely lexical. Their output is not presented in formats that can be flexibly manipulated with semantic queries.
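To show what a triple representation makes possible, here is a minimal in-memory sketch of subject-predicate-object triples with wildcard pattern matching, which is the basic operation behind a SPARQL query. The `ex:` identifiers, the sample triples and the `match` function are hypothetical; an actual deployment would use an RDF triple store and a SPARQL endpoint rather than this toy code.

```python
# Hypothetical triples of the kind an extraction pipeline might emit.
TRIPLES = [
    ("ex:person1", "ex:name", "Richard James Smith"),
    ("ex:person1", "ex:gender", "male"),
    ("ex:person2", "ex:name", "Thomas Smith"),
    ("ex:person1", "ex:siblingOf", "ex:person2"),
    ("ex:doc42", "ex:mentions", "ex:person1"),
]


def match(triples, pattern):
    """Yield triples matching a (subject, predicate, object) pattern.
    None acts as a wildcard, like a variable in a SPARQL pattern."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# Roughly the result of the SPARQL query
#   SELECT ?s ?o WHERE { ?s ex:siblingOf ?o }
siblings = [(s, o) for s, _, o in match(TRIPLES, (None, "ex:siblingOf", None))]
print(siblings)  # [('ex:person1', 'ex:person2')]
```

Because inferred facts (such as a sibling relation) sit in the same structure as extracted ones (such as a name), a single query can combine them, which purely lexical output does not allow.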