What's the difference between str.isdigit, isnumeric and isdecimal in python?
When I run these methods
s.isdigit()
s.isnumeric()
s.isdecimal()
I always got as output or all True
, or all False
for each value of s
(which is of course a string).
What's the difference between the three? Can you provide an example that gives two True
s and one False
(or vice versa)?
Solution 1:
It's mostly about unicode classifications. Here's some examples to show discrepancies:
>>> def spam(s):
... for attr in 'isnumeric', 'isdecimal', 'isdigit':
... print(attr, getattr(s, attr)())
...
>>> spam('½')
isnumeric True
isdecimal False
isdigit False
>>> spam('³')
isnumeric True
isdecimal False
isdigit True
Specific behaviour is in the official docs here.
Script to find all of them:
import sys
import unicodedata
from collections import defaultdict
d = defaultdict(list)
for i in range(sys.maxunicode + 1):
s = chr(i)
t = s.isnumeric(), s.isdecimal(), s.isdigit()
if len(set(t)) == 2:
try:
name = unicodedata.name(s)
except ValueError:
name = f'codepoint{i}'
print(s, name)
d[t].append(s)
Solution 2:
By definition, isdecimal()
⊆ isdigit()
⊆ isnumeric()
. That is, if a string is decimal
, then it'll also be digit
and numeric
.
Therefore, given a string s
and test it with those three methods, there'll only be 4 types of results.
+-------------+-----------+-------------+----------------------------------+
| isdecimal() | isdigit() | isnumeric() | Example |
+-------------+-----------+-------------+----------------------------------+
| True | True | True | "038", "੦੩੮", "038" |
| False | True | True | "⁰³⁸", "🄀⒊⒏", "⓪③⑧" |
| False | False | True | "↉⅛⅘", "ⅠⅢⅧ", "⑩⑬㊿", "壹貳參" |
| False | False | False | "abc", "38.0", "-38" |
+-------------+-----------+-------------+----------------------------------+
1. Some examples of characters isdecimal()==True
(thus isdigit()==True
and isnumeric()==True
)
"0123456789" DIGIT ZERO~NINE
"٠١٢٣٤٥٦٧٨٩" ARABIC-INDIC DIGIT ZERO~NINE
"०१२३४५६७८९" DEVANAGARI DIGIT ZERO~NINE
"০১২৩৪৫৬৭৮৯" BENGALI DIGIT ZERO~NINE
"੦੧੨੩੪੫੬੭੮੯" GURMUKHI DIGIT ZERO~NINE
"૦૧૨૩૪૫૬૭૮૯" GUJARATI DIGIT ZERO~NINE
"୦୧୨୩୪୫୬୭୮୯" ORIYA DIGIT ZERO~NINE
"௦௧௨௩௪௫௬௭௮௯" TAMIL DIGIT ZERO~NINE
"౦౧౨౩౪౫౬౭౮౯" TELUGU DIGIT ZERO~NINE
"೦೧೨೩೪೫೬೭೮೯" KANNADA DIGIT ZERO~NINE
"൦൧൨൩൪൫൬൭൮൯" MALAYALAM DIGIT ZERO~NINE
"๐๑๒๓๔๕๖๗๘๙" THAI DIGIT ZERO~NINE
"໐໑໒໓໔໕໖໗໘໙" LAO DIGIT ZERO~NINE
"༠༡༢༣༤༥༦༧༨༩" TIBETAN DIGIT ZERO~NINE
"၀၁၂၃၄၅၆၇၈၉" MYANMAR DIGIT ZERO~NINE
"០១២៣៤៥៦៧៨៩" KHMER DIGIT ZERO~NINE
"0123456789" FULLWIDTH DIGIT ZERO~NINE
"𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗" MATHEMATICAL BOLD DIGIT ZERO~NINE
"𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡" MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO~NINE
"𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫" MATHEMATICAL SANS-SERIF DIGIT ZERO~NINE
"𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵" MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO~NINE
"𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿" MATHEMATICAL MONOSPACE DIGIT ZERO~NINE
2. Some examples of characters isdecimal()==False
but isdigit()==True
(thus isnumeric()==True
)
"⁰¹²³⁴⁵⁶⁷⁸⁹" SUPERSCRIPT ZERO~NINE
"₀₁₂₃₄₅₆₇₈₉" SUBSCRIPT ZERO~NINE
"🄀⒈⒉⒊⒋⒌⒍⒎⒏⒐" DIGIT ZERO~NINE FULL STOP
"🄁🄂🄃🄄🄅🄆🄇🄈🄉🄊" DIGIT ZERO~NINE COMMA
"⓪①②③④⑤⑥⑦⑧⑨" CIRCLED DIGIT ZERO~NINE
"⓿❶❷❸❹❺❻❼❽❾" NEGATIVE CIRCLED DIGIT ZERO~NINE
"⑴⑵⑶⑷⑸⑹⑺⑻⑼" PARENTHESIZED DIGIT ONE~NINE
"➀➁➂➃➄➅➆➇➈" DINGBAT CIRCLED SANS-SERIF DIGIT ONE~NINE
"⓵⓶⓷⓸⓹⓺⓻⓼⓽" DOUBLE CIRCLED DIGIT ONE~NINE
"➊➋➌➍➎➏➐➑➒" DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE~NINE
"፩፪፫፬፭፮፯፰፱" ETHIOPIC DIGIT ONE~NINE
3. Some examples of characters isdecimal()==False
and isdigit()==False
but isnumeric()==True
"½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉" VULGAR FRACTION
"৴৵৶৷৸৹" BENGALI CURRENCY NUMERATOR
"௰௱௲" TAMIL NUMBER TEN, ONE HUNDRED, ONE THOUSAND
"౸౹౺౻౼౽౾" TELUGU FRACTION DIGIT
"൰൱൲൳൴൵" MALAYALAM NUMBER, MALAYALAM FRACTION
"༳༪༫༬༭༮༯༰༱༲" TIBETAN DIGIT HALF ZERO~NINE
"፲፳፴፵፶፷፸፹፺፻፼" ETHIOPIC NUMBER TEN~NINETY, HUNDRED, TEN THOUSAND
"៰៱៲៳៴៵៶៷៸៹" KHMER SYMBOL LEK ATTAK
"ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯ" ROMAN NUMERAL
"ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ" SMALL ROMAN NUMERAL
"ↀↁↂↅↆ" ROMAN NUMERAL
"⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟㊱㊲㊳㊴㊵㊶㊷㊸㊹㊺㊻㊼㊽㊾㊿" CIRCLED NUMBER TEN~FIFTY
"㉈㉉㉊㉋㉌㉍㉎㉏" CIRCLED NUMBER TEN~EIGHTY ON BLACK SQUARE
"⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇" PARENTHESIZED NUMBER TEN~TWENTY
"⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛" NUMBER TEN~TWENTY FULL STOP
"⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴" NEGATIVE CIRCLED NUMBER ELEVEN
"⓾➉❿➓" various styles of CIRCLED NUMBER TEN
"🄌" DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
"〇" IDEOGRAPHIC NUMBER ZERO
"〡〢〣〤〥〦〧〨〩〸〹〺" HANGZHOU NUMERAL ONE~TEN, TWENTY, THIRTY
"㆒㆓㆔㆕" IDEOGRAPHIC ANNOTATION ONE~FOUR MARK
"㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩" PARENTHESIZED IDEOGRAPH ONE~TEN
"㊀㊁㊂㊃㊄㊅㊆㊇㊈㊉" CIRCLED IDEOGRAPH ONE~TEN
"一二三四五六七八九十壹貳參肆伍陸柒捌玖拾零百千萬億兆弐貮贰㒃㭍漆什㐅陌阡佰仟万亿幺兩㠪亖卄卅卌廾廿" CJK UNIFIED IDEOGRAPH
"參拾兩零六陸什" CJK COMPATIBILITY IDEOGRAPH
"𐄇𐄈𐄉𐄊𐄋𐄌𐄍𐄎𐄏𐄐𐄑𐄒𐄓𐄔𐄕𐄖𐄗𐄘" AEGEAN NUMBER ONE~NINE, TEN~NINETY
"𐄙𐄚𐄛𐄜𐄝𐄞𐄟𐄠𐄡𐄢𐄣𐄤𐄥𐄦𐄧𐄨𐄩𐄪" AEGEAN NUMBER ONE~NINE HUNDRED, ONE~NINE THOUSAND
"𐄬𐄭𐄮𐄯𐄰𐄱𐄲𐄳" AEGEAN NUMBER TEN~NINETY THOUSAND
"𐅀𐅁𐅂𐅃𐅆𐅇𐅈𐅉𐅊𐅋𐅌𐅍𐅎𐅏𐅐𐅑𐅒𐅓𐅔𐅕𐅖𐅗𐅘𐅙𐅚𐅛𐅜𐅝𐅞𐅟𐅠𐅡𐅢𐅣𐅤𐅥𐅦𐅧𐅨𐅩𐅪𐅫𐅬𐅭𐅮𐅯𐅰𐅱𐅲𐅳𐅴" GREEK ACROPHONIC ATTIC
"𝍠𝍡𝍢𝍣𝍤𝍥𝍦𝍧𝍨" COUNTING ROD UNIT DIGIT ONE~NINE
"𝍩𝍪𝍫𝍬𝍭𝍮𝍯𝍰𝍱" COUNTING ROD TENS DIGIT ONE~NINE