confusable_homoglyphs package

Submodules

confusable_homoglyphs.categories module

confusable_homoglyphs.categories.alias(chr)

Retrieves the script block alias for a unicode character.

>>> categories.alias('A')
'LATIN'
>>> categories.alias('τ')
'GREEK'
>>> categories.alias('-')
'COMMON'

Parameters

chr (str) – A unicode character

Returns

The script block alias.

Return type

str

confusable_homoglyphs.categories.aliases_categories(chr)

Retrieves the script block alias and unicode category for a unicode character.

>>> categories.aliases_categories('A')
('LATIN', 'L')
>>> categories.aliases_categories('τ')
('GREEK', 'L')
>>> categories.aliases_categories('-')
('COMMON', 'Pd')

Parameters

chr (str) – A unicode character

Returns

The script block alias and unicode category for a unicode character.

Return type

(str, str)

confusable_homoglyphs.categories.category(chr)

Retrieves the unicode category for a unicode character.

>>> categories.category('A')
'L'
>>> categories.category('τ')
'L'
>>> categories.category('-')
'Pd'

Parameters

chr (str) – A unicode character

Returns

The unicode category for a unicode character.

Return type

str

confusable_homoglyphs.categories.unique_aliases(string)

Retrieves all unique script block aliases used in a unicode string.

>>> categories.unique_aliases('ABC')
{'LATIN'}
>>> categories.unique_aliases('ρAτ-')
{'GREEK', 'LATIN', 'COMMON'}

Parameters

string (str) – A unicode string

Returns

A set of the script block aliases used in a unicode string.

Return type

set(str)
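
As a rough illustration of what a script block alias is, the sketch below approximates one from the first word of a character's Unicode name, using only the standard library. It is not how confusable_homoglyphs computes aliases (the package presumably reads the categories data file produced by cli.generate_categories). The names approx_alias, approx_unique_aliases and KNOWN_SCRIPTS are hypothetical, and the script list is deliberately incomplete.

```python
import unicodedata

# Hypothetical, deliberately incomplete set of script names.
KNOWN_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "ARMENIAN", "HEBREW", "ARABIC"}


def approx_alias(char):
    """Guess a script alias from the first word of the Unicode name."""
    name = unicodedata.name(char, "")  # "" for unnamed characters
    first_word = name.split(" ", 1)[0]
    # Anything outside the small list above is lumped into COMMON.
    return first_word if first_word in KNOWN_SCRIPTS else "COMMON"


def approx_unique_aliases(string):
    """Collect the approximate aliases of every character in a string."""
    return {approx_alias(char) for char in string}


print(sorted(approx_unique_aliases("ρAτ-")))  # ['COMMON', 'GREEK', 'LATIN']
```

For the handful of characters used in the examples above, this approximation agrees with the documented results. It will misclassify characters from any script missing from KNOWN_SCRIPTS.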

confusable_homoglyphs.cli module

confusable_homoglyphs.cli.generate_categories()

Generates the categories JSON data file from the unicode specification.

Returns

True on success; raises an exception otherwise.

Return type

bool

confusable_homoglyphs.cli.generate_confusables()

Generates the confusables JSON data file from the unicode specification.

Returns

True on success; raises an exception otherwise.

Return type

bool

confusable_homoglyphs.confusables module

exception confusable_homoglyphs.confusables.Found

Bases: Exception

confusable_homoglyphs.confusables.is_confusable(string, greedy=False, preferred_aliases=[])

Checks if string contains characters which might be confusable with characters from preferred_aliases.

If greedy=False, it returns only the first confusable character found, without looking at the rest of the string. With greedy=True, it returns all of them.

preferred_aliases takes a list of script block aliases to be treated as your ‘base’ script blocks:

  • Consider the string paρa:

    • with preferred_aliases=['latin'], the 3rd character ρ would be returned because this greek letter can be confused with latin p.

    • with preferred_aliases=['greek'], the 1st character p would be returned because this latin letter can be confused with greek ρ.

    • with preferred_aliases=[] and greedy=True, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).

>>> confusables.is_confusable('paρa', preferred_aliases=['latin'])[0]['character']
'ρ'
>>> confusables.is_confusable('paρa', preferred_aliases=['greek'])[0]['character']
'p'
>>> confusables.is_confusable('Abç', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('AlloΓ', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('ρττ', preferred_aliases=['greek'])
False
>>> confusables.is_confusable('ρτ.τ', preferred_aliases=['greek', 'common'])
False
>>> confusables.is_confusable('ρττp')
[{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}], 'alias': 'GREEK', 'character': 'ρ'}]

Parameters

  • string (str) – A unicode string

  • greedy (bool) – Don’t stop at the first confusable character; find all of them.

  • preferred_aliases (list(str)) – Script block aliases that the characters of string should not be confusable with.

Returns

False if the string is not confusable; otherwise a list of the confusable characters, each with the characters it can be confused with.

Return type

bool or list
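
Because the return value is either False or a list of dictionaries, callers need to handle both cases. The sketch below shows one way to turn such a result into readable lines. It relies only on the dictionary keys visible in the last doctest above ('character', 'alias', 'homoglyphs', 'c', 'n'); the helper name describe_confusables is hypothetical, and the sample data is copied from that doctest rather than produced by a live call.

```python
def describe_confusables(result):
    """Format an is_confusable-style result as a list of text lines."""
    if not result:  # False means nothing confusable was found
        return []
    lines = []
    for entry in result:
        for glyph in entry["homoglyphs"]:
            lines.append(
                "{!r} ({}) resembles {!r} ({})".format(
                    entry["character"], entry["alias"], glyph["c"], glyph["n"]
                )
            )
    return lines


# Sample copied from the documented output of is_confusable('ρττp').
sample = [
    {
        "homoglyphs": [{"c": "p", "n": "LATIN SMALL LETTER P"}],
        "alias": "GREEK",
        "character": "ρ",
    }
]
for line in describe_confusables(sample):
    print(line)  # 'ρ' (GREEK) resembles 'p' (LATIN SMALL LETTER P)
```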

confusable_homoglyphs.confusables.is_dangerous(string, preferred_aliases=[])

Checks if string can be dangerous, i.e. whether it is mixed-script and also contains characters from scripts outside preferred_aliases that might be confusable with characters from scripts in preferred_aliases.

For preferred_aliases examples, see the is_confusable docstring.

>>> bool(confusables.is_dangerous('Allo'))
False
>>> bool(confusables.is_dangerous('AlloΓ', preferred_aliases=['latin']))
False
>>> bool(confusables.is_dangerous('Alloρ'))
True
>>> bool(confusables.is_dangerous('AlaskaJazz'))
False
>>> bool(confusables.is_dangerous('ΑlaskaJazz'))
True

Parameters

  • string (str) – A unicode string

  • preferred_aliases (list(str)) – Script block aliases that the characters of string should not be confusable with.

Returns

Whether the string is dangerous.

Return type

bool
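
A typical use is rejecting risky identifiers such as usernames. The sketch below is one possible wrapper, not part of the package: validate_username is a hypothetical helper that receives the checker as an argument, so in real code you would pass confusables.is_dangerous. The stand-in fake_is_dangerous exists only to keep the example self-contained. The result is treated as truthy or falsy, mirroring the bool(...) wrapping in the doctests above.

```python
def validate_username(name, is_dangerous, preferred_aliases=()):
    """Return name unchanged, or raise ValueError if the checker flags it."""
    if is_dangerous(name, preferred_aliases=list(preferred_aliases)):
        raise ValueError("potentially confusable username: {!r}".format(name))
    return name


# Stand-in checker for demonstration only (an assumption, not the library):
# it flags any string containing GREEK SMALL LETTER RHO.
def fake_is_dangerous(string, preferred_aliases=[]):
    return "ρ" in string


print(validate_username("Allo", fake_is_dangerous))  # Allo
try:
    validate_username("Alloρ", fake_is_dangerous)
except ValueError as exc:
    print("rejected:", exc)
```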

confusable_homoglyphs.confusables.is_mixed_script(string, allowed_aliases=['COMMON'])

Checks if string contains mixed-script content, ignoring the script block aliases listed in allowed_aliases.

For example, the string 'B. C' is not considered mixed-script by default: it contains characters from Latin and Common, but Common is excluded by default.

>>> confusables.is_mixed_script('Abç')
False
>>> confusables.is_mixed_script('ρτ.τ')
False
>>> confusables.is_mixed_script('ρτ.τ', allowed_aliases=[])
True
>>> confusables.is_mixed_script('Alloτ')
True

Parameters

  • string (str) – A unicode string

  • allowed_aliases (list(str)) – Script block aliases to ignore.

Returns

Whether string is considered mixed-scripts or not.

Return type

bool
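
The rule described for is_mixed_script can be pictured as a set difference: take the aliases found in the string, drop the allowed ones, and see whether more than one remains. The sketch below works on a ready-made set of aliases, such as the one categories.unique_aliases returns. The function name mixed_from_aliases is hypothetical, and the upper-casing of allowed aliases is an assumption; this is an illustration of the idea, not the library's code.

```python
def mixed_from_aliases(aliases, allowed_aliases=("COMMON",)):
    """True if more than one alias remains after removing the allowed ones."""
    # Assumption: aliases are compared in upper case.
    allowed = {alias.upper() for alias in allowed_aliases}
    return len(set(aliases) - allowed) > 1


# 'ρτ.τ' uses GREEK and COMMON; 'Alloτ' uses LATIN and GREEK.
print(mixed_from_aliases({"GREEK", "COMMON"}))                      # False
print(mixed_from_aliases({"GREEK", "COMMON"}, allowed_aliases=[]))  # True
print(mixed_from_aliases({"LATIN", "GREEK"}))                       # True
```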

confusable_homoglyphs.utils module

confusable_homoglyphs.utils.delete(filename)

Deletes a JSON data file if it exists.

confusable_homoglyphs.utils.dump(filename, data)
confusable_homoglyphs.utils.get(url, timeout=None)
confusable_homoglyphs.utils.load(filename)

Loads a JSON data file.

Returns

A dict.

Return type

dict

confusable_homoglyphs.utils.path(filename)

Returns a file path relative to this package directory.

Returns

A file path string.

Return type

str

confusable_homoglyphs.utils.u(x)

Module contents