Counterintuitive behaviour of int() in Python

It's clearly stated in the docs that int(number) is a truncating type conversion:

>>> int(1.23)
1

and int(string) returns an int if and only if the string is an integer literal.

>>> int('1.23')
ValueError: invalid literal for int() with base 10: '1.23'

>>> int('1')
1

Is there any special reason for that? I find it counterintuitive that the function truncates in one case but refuses to convert in the other.


Solution 1:

There is no special reason. Python is simply applying its general principle of not performing implicit conversions, which are a well-known source of problems, particularly for newcomers, in languages such as Perl and JavaScript.

int(some_string) is an explicit request to convert a string to an integer; the rules for this conversion specify that the string must contain a valid integer literal. int(some_float) is an explicit request to convert a float to an integer; the rules for this conversion specify that the float's fractional portion will be truncated.

In order for int("3.1459") to return 3, the interpreter would have to implicitly convert the string to a float. Since Python doesn't support implicit conversions, it chooses to raise an exception instead.
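
Both conversions can, of course, be requested explicitly by the caller, one step at a time:

>>> int(float("3.1459"))   # explicit str -> float, then explicit float -> int
3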

Solution 2:

This is almost certainly a case of applying three of the principles from the Zen of Python:

Explicit is better than implicit.

[...] practicality beats purity.

Errors should never pass silently.

Some percentage of the time, someone doing int('1.23') is calling the wrong conversion for their use case, and wants something like float or decimal.Decimal instead. In these cases, it's clearly better for them to get an immediate error that they can fix than to silently get a wrong value.
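
For instance, the conversions such a caller most likely wanted:

>>> float('1.23')
1.23
>>> from decimal import Decimal
>>> Decimal('1.23')
Decimal('1.23')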

If you do want to truncate the value to an int, it is trivial to do so explicitly: pass it through float first, then call one of int, round, math.trunc, math.floor or math.ceil as appropriate, as the sketch below shows. This also makes your code more self-documenting: it guards against a later modification "correcting" a hypothetical silently-truncating int call to float, by making it clear that the truncated (or rounded) value is what you want.
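
A minimal sketch of those explicit two-step conversions (trunc, floor and ceil live in the math module):

>>> from math import trunc, floor, ceil
>>> s = '1.23'
>>> int(float(s))      # truncates toward zero
1
>>> round(float(s))    # rounds to the nearest integer
1
>>> trunc(float(s)), floor(float(s)), ceil(float(s))
(1, 1, 2)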

Solution 3:

Sometimes a thought experiment can be useful.

  • Behavior A: int('1.23') fails with an error. This is the existing behavior.
  • Behavior B: int('1.23') produces 1 without error. This is what you're proposing.

With behavior A, it's straightforward and trivial to get the effect of behavior B: use int(float('1.23')) instead.

On the other hand, with behavior B, getting the effect of behavior A is significantly more complicated:

def parse_pure_int(s):
    # Under hypothetical behavior B, int() itself would silently accept
    # strings like '1.23', so the decimal point must be rejected by hand.
    if "." in s:
        raise ValueError("invalid literal for int() with base 10: " + s)
    return int(s)

(and even with the code above, I don't have complete confidence that there isn't some corner case that it mishandles.)

Behavior A is therefore more expressive than behavior B.

Another thing to consider: '1.23' is a string representation of a floating-point value. Converting '1.23' to an integer conceptually involves two conversions (string to float to integer), but int(1.23) and int('1') each involve only one conversion.
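
Spelled out, the asymmetry looks like this:

>>> int(float('1.23'))   # two conversions: str -> float -> int
1
>>> int(1.23)            # one conversion: float -> int
1
>>> int('1')             # one conversion: str -> int
1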


Edit:

And indeed, there are corner cases that the above code would not handle: '1e-2' and '1E-2' are both floating-point representations too, even though neither contains a '.'.
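
A sketch of how far the guard might have to go under that hypothetical behavior B, still with no guarantee of completeness (the checks here are illustrative, not exhaustive):

def parse_pure_int(s):
    # Hypothetical behavior B: int() would quietly accept float literals,
    # so every float-only spelling has to be rejected by hand.
    lowered = s.strip().lower()
    if "." in lowered or "e" in lowered or "inf" in lowered or "nan" in lowered:
        raise ValueError("invalid literal for int() with base 10: " + repr(s))
    return int(s)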

Solution 4:

In simple words: they're not the same function.

  • int(float) behaves as 'truncate, i.e. knock off the fractional portion and return it as an int' (see the sketch below).
  • int(string) behaves as 'this text describes an integer; convert it and return it as an int'.
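
A quick REPL sketch of the two behaviors; note that the float conversion truncates toward zero rather than flooring:

>>> int(1.99)     # float -> int: fractional part discarded
1
>>> int(-1.99)    # truncates toward zero, so -1 rather than -2
-1
>>> int('42')     # str -> int: parses an integer literal
42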

They are two different functions with the same name: both return an integer, but they do different jobs.

'int' is short and easy to remember, and its meaning as applied to each type is intuitive to most programmers, which is why it was chosen.

There's no implication that they provide the same or combined functionality; they simply share a name and a return type. They could as easily have been called 'floorDecimalAsInt' and 'convertStringToInt', but they went with 'int' because it's easy to remember, (99% of the time) intuitive, and rarely a source of confusion.

Parsing text that includes a decimal point, such as "4.5", as an integer would throw an error in the majority of programming languages, and the majority of programmers would expect it to, since the text does not represent an integer and suggests the caller is providing erroneous data.