"They spell it Vinci and pronounce it Vinchy; foreigners always spell better than they pronounce"
Mark Twain

"Everybody has a right to pronounce foreign names as he chooses"  
Winston Churchill 

The Name Matcher allows the user to rapidly vet, validate, disambiguate, aggregate and match person names using culture-based rules. It can be applied to any task that calls for vetting names, alerting the user when there may be cause for further investigation. Examples include:
  • Accurate and thorough matching of names against watch lists to reduce false positives and false negatives.
  • Identifying anomalies where people’s apparent nationality does not match their passport (note that in many countries “naturalization” is not as common as in the United States).
  • Flagging of potential familial or other affiliations between persons whose names are being checked and persons in a watch list.
  • Flagging names of persons as potentially affiliated with high-risk places, tribes, clans or other non-obvious affiliations.
  • Vetting watch lists to discover redundancies and duplicate entries for the same individual, which, once merged, may yield important information.
  • Identifying watch-list names when they appear in large volumes of documents, and flagging the documents that may contain information on those people.
The Name Matcher accepts lists of names in various formats. A list may contain additional identifying information about the people it represents (gender, address, passport number, age, ethnicity, etc.).
Upon input of new names, the Name Matcher provides the following:
  • Identification of the ethnicity of the input name (Anglo-Saxon, Hispanic, Arabic, Afghan, Iranian, etc.).
  • If the name is written in Latin script but is a Romanized form of an Arabic-script name (Arabic, Farsi, Urdu, etc.), the system performs “back-transliteration” to the identified source language and identifies the original components of the name. Support for additional languages (Chinese, Korean) is planned.
  • Name components are analyzed in order to extract implicit information such as national origin, gender, religious or sect identity, etc. The name components are then linked to potential variants (e.g. a given name Al may be a short form of Albert, Alan or Alistair) in order to facilitate matching of the same entity represented by those variants.
  • Each name is analyzed to identify its constituent parts (given name, patronymic, family name, tribal name, nickname, etc.). In many cases, the name may be written with the family name first. In language systems that the user is familiar with, this may be trivial (if we see Smith John, we understand that the more likely name is John Smith). In foreign languages, however, the order is not transparent. In such cases, the system identifies the correct order of the components.
  • All culturally acceptable variants of the parsed name are generated and validated with algorithms based on the naming conventions of the origin-culture of the name.
  • Names and the information input with them are examined to identify possible anomalies, inconsistencies, omissions or other errors arising from contradictions between attributes of the name and the other data associated with it. For example, a name that is identified with high confidence as Moroccan but is associated with a Saudi passport will be flagged, since Arab states rarely naturalize citizens of other Arab countries.
  • All extracted name entities along with their generated information and all data that was originally associated with that name are stored in the database.
  • The names are then matched against the names in the watch list and classified as potentially the same person, a father-son relationship, a son-father relationship, a brother, a cousin or another family member. Each relationship receives a confidence score (e.g. if the watch list contains the name “Muhammad Abdallah”, then even when an input name “Muhammad Abdallah” matches it exactly, the system calculates the statistical probability that this is the same person, based on the ubiquity of the name).
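Three of the steps above (variant linking, anomaly flagging and ubiquity-based confidence) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the lookup tables, the frequencies, the country codes, the function names and the scoring heuristic 1 / (1 + expected number of other bearers of the name) are hypothetical and are not taken from the Name Matcher itself.

```python
# Hypothetical lookup tables; a real system would derive these from
# culture-specific naming rules and population statistics.
EXPANSIONS = {
    "al": {"albert", "alan", "alistair"},  # short form -> possible full forms
    "mohammed": {"muhammad"},              # spelling variant -> reference form
    "abdullah": {"abdallah"},
}
NAME_FREQUENCY = {                         # made-up share of the population
    ("muhammad", "abdallah"): 0.02,
    ("alistair", "ferguson"): 0.00001,
}
DEFAULT_FREQUENCY = 0.0001                 # assumed fallback for unknown names
RARELY_NATURALIZES = {"SA"}                # illustrative assumption only


def components_compatible(a: str, b: str) -> bool:
    """True if two name components are equal or linked as variants."""
    a, b = a.lower(), b.lower()
    return a == b or b in EXPANSIONS.get(a, set()) or a in EXPANSIONS.get(b, set())


def names_compatible(name_a, name_b) -> bool:
    """Compare two parsed names component by component."""
    return len(name_a) == len(name_b) and all(
        components_compatible(x, y) for x, y in zip(name_a, name_b)
    )


def same_person_confidence(name, population: int = 1_000_000) -> float:
    """Heuristic: the more people share a name, the weaker an exact match."""
    key = tuple(part.lower() for part in name)
    expected_other_bearers = NAME_FREQUENCY.get(key, DEFAULT_FREQUENCY) * population
    return 1.0 / (1.0 + expected_other_bearers)


def passport_anomaly(name_origin: str, origin_confidence: float,
                     passport_country: str, threshold: float = 0.9) -> bool:
    """Flag a confident name origin that contradicts the passport country."""
    return (
        origin_confidence >= threshold
        and name_origin != passport_country
        and passport_country in RARELY_NATURALIZES
    )


if __name__ == "__main__":
    print(names_compatible(("Mohammed", "Abdullah"), ("Muhammad", "Abdallah")))
    print(same_person_confidence(("Muhammad", "Abdallah")))
    print(same_person_confidence(("Alistair", "Ferguson")))
    print(passport_anomaly("MA", 0.95, "SA"))
```

Under these assumptions, an exact match on a very common name scores far lower than an exact match on a rare one, which is the behaviour the “Muhammad Abdallah” example describes. Note that Al is compatible with Albert and with Alan, while Albert and Alan are not compatible with each other.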

IntuScan Name Matcher Modules

  • Named Entity Recognizer (NER) distinguishes named entities in unstructured text.
  • Named Entity Analyzer (NEA) performs a culturally and linguistically sensitive analysis of names. It parses each name into its components, identifies all possible name variants, extracts implicit information such as gender, ethnicity or religion, validates names against that implicit information, and detects anomalies between name parts.
  • Named Entity Transliterator (NET) validates transliterated names by restoring them to the source language.
  • Named Entity Combiner (NEC) aggregates all name variants and aliases into a single entity.
  • Named Entity Matcher (NEM) identifies the relations between identified entities (siblings, father-son, grandfather, etc.).
IntuScan Name Matcher can also be applied to databases of names and to watch lists in order to validate their content, merge duplicates, and point out potentially invalid entries that impair the integrity of a database and render it cumbersome and ineffective.
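The idea of merging duplicate entries into a single entity can be sketched as follows. This is only an illustration under stated assumptions: the entry fields ("name", "passport"), the function names and the crude token-sorting normalization are hypothetical stand-ins for the culture-aware analysis described above, and the grouping uses a generic union-find rather than any method confirmed for IntuScan.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Crude normalization: lowercase, drop punctuation, sort the tokens."""
    tokens = "".join(ch if ch.isalnum() else " " for ch in name.lower()).split()
    return " ".join(sorted(tokens))


def merge_duplicates(entries):
    """Group entries that share a normalized name or a passport number."""
    parent = list(range(len(entries)))

    def find(i):
        # Follow parent links to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    seen = {}  # identifying key -> index of the first entry that carried it
    for idx, entry in enumerate(entries):
        keys = [("name", normalize(entry["name"]))]
        if entry.get("passport"):
            keys.append(("passport", entry["passport"]))
        for key in keys:
            if key in seen:
                union(idx, seen[key])
            else:
                seen[key] = idx

    groups = defaultdict(list)
    for idx, entry in enumerate(entries):
        groups[find(idx)].append(entry["name"])
    return sorted(groups.values())


if __name__ == "__main__":
    watch_list = [
        {"name": "Smith, John", "passport": "X123"},
        {"name": "John Smith"},
        {"name": "J. Smith", "passport": "X123"},
        {"name": "Jane Doe"},
    ]
    for group in merge_duplicates(watch_list):
        print(group)
```

In this sketch the first three entries end up in one group, because two of them share a normalized name and two share a passport number, while "Jane Doe" stays on its own. Sorting tokens ignores component order entirely, so a production system would replace normalize with proper name parsing.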