Truncating floats in Python


Question

I want to truncate a float to a fixed number of digits after the decimal point, like:

1.923328437452 -> 1.923

I need to pass the result as a string to another function, not print it.

Also, I want to discard the extra digits, not round them.

4/20/2016 6:51:38 PM

Accepted Answer

First, the function, for those who just want some copy-and-paste code:

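def truncate(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '{}'.format(f)  # "nicest" decimal string for the stored value
    if 'e' in s or 'E' in s:
        # exponential notation: fall back to fixed-point formatting
        # (note that this path rounds rather than truncates)
        return '{0:.{1}f}'.format(f, n)
    i, p, d = s.partition('.')
    # pad the fractional digits with zeros, then keep only the first n
    return '.'.join([i, (d+'0'*n)[:n]])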

This is valid in Python 2.7 and 3.1+. For older versions, it's not possible to get the same "intelligent rounding" effect (at least, not without a lot of complicated code), but rounding to 12 decimal places before truncation will work much of the time:

def truncate(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '%.12f' % f
    i, p, d = s.partition('.')
    return '.'.join([i, (d+'0'*n)[:n]])
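
As a quick sanity check, here is a small usage sketch of the first (Python 2.7/3.1+) version. The function is repeated so the block runs on its own; the sample values other than the one from the question are my own picks for illustration.

```python
def truncate(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '{}'.format(f)
    if 'e' in s or 'E' in s:
        return '{0:.{1}f}'.format(f, n)
    i, p, d = s.partition('.')
    return '.'.join([i, (d+'0'*n)[:n]])

# (value, number of decimal places) pairs chosen for illustration
examples = [
    (1.923328437452, 3),  # the value from the question
    (2.999, 2),           # would become 3.0 if it were rounded
    (1.5, 3),             # too few digits: padded with zeros
    (-2.789, 1),          # sign is kept, digits are simply cut off
    (1e-7, 3),            # takes the exponential-notation branch
]
results = [truncate(f, n) for f, n in examples]
print(results)  # ['1.923', '2.99', '1.500', '-2.7', '0.000']
```

The return value is always a string, which is what the question asks for.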

Explanation

The core of the underlying method is to convert the value to a string at full precision and then just chop off everything beyond the desired number of characters. The latter step is easy; it can be done either with string manipulation

i, p, d = s.partition('.')
'.'.join([i, (d+'0'*n)[:n]])

or the decimal module

str(Decimal(s).quantize(Decimal((0, (1,), -n)), rounding=ROUND_DOWN))
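
For completeness, the decimal variant could be wrapped up roughly like this. This is only a sketch: the name truncate_decimal is made up for the example, and it assumes the default decimal context (28 digits of precision), so very large values or very large n may raise InvalidOperation or come back in exponent notation.

```python
from decimal import Decimal, ROUND_DOWN

def truncate_decimal(f, n):
    '''Hypothetical helper: truncate float f to n decimal places via decimal.'''
    s = '{}'.format(f)                 # same "nice" string as in the main function
    step = Decimal((0, (1,), -n))      # sign 0, digits (1,), exponent -n, i.e. 10**-n
    # ROUND_DOWN rounds toward zero, which is exactly truncation
    return str(Decimal(s).quantize(step, rounding=ROUND_DOWN))

print(truncate_decimal(1.923328437452, 3))  # 1.923
print(truncate_decimal(2.999, 2))           # 2.99
print(truncate_decimal(-2.789, 1))          # -2.7
print(truncate_decimal(1.5, 3))             # 1.500
```

One nice property is that Decimal parses strings such as '1e-07' directly, so the exponential-notation case needs no special branch here.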

The first step, converting to a string, is quite difficult because there are some pairs of floating point literals (i.e. what you write in the source code) which both produce the same binary representation and yet should be truncated differently. For example, consider 0.3 and 0.29999999999999998. If you write 0.3 in a Python program, the compiler encodes it using the IEEE floating-point format into the sequence of bits (assuming a 64-bit float)

0011111111010011001100110011001100110011001100110011001100110011

This is the closest value to 0.3 that can accurately be represented as an IEEE float. But if you write 0.29999999999999998 in a Python program, the compiler translates it into exactly the same value. In one case, you meant it to be truncated (to one digit) as 0.3, whereas in the other case you meant it to be truncated as 0.2, but Python can only give one answer. This is a fundamental limitation of Python, or indeed any programming language without lazy evaluation. The truncation function only has access to the binary value stored in the computer's memory, not the string you actually typed into the source code.[1]

If you decode the sequence of bits back into a decimal number, again using the IEEE 64-bit floating-point format, you get

0.2999999999999999888977697537484345957637...

so a naive implementation would come up with 0.2 even though that's probably not what you want. For more on floating-point representation error, see the Python tutorial.

It's very rare to be working with a floating-point value that is so close to a round number and yet is intentionally not equal to that round number. So when truncating, it probably makes sense to choose the "nicest" decimal representation out of all that could correspond to the value in memory. Python 2.7 and up (but not 3.0) includes a sophisticated algorithm to do just that, which we can access through the default string formatting operation.

'{}'.format(f)

The only caveat is that this acts like a g format specification, in the sense that it uses exponential notation (1.23e+4) if the number is large or small enough. So the method has to catch this case and handle it differently. There are a few cases where using an f format specification instead causes a problem, such as trying to truncate 3e-10 to 28 digits of precision (it produces 0.0000000002999999999999999980), and I'm not yet sure how best to handle those.
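
A few concrete cases of that behaviour, as a small sketch on a current Python 3 (the variable names are arbitrary):

```python
# Default formatting picks the shortest string that maps back to the same float
nice = '{}'.format(0.29999999999999998)
print(nice)           # 0.3

# Small and large magnitudes switch to exponential notation
small = '{}'.format(0.00001)
large = '{}'.format(1e16)
print(small, large)   # 1e-05 1e+16

# An explicit f specification never uses an exponent
fixed_small = '{0:.8f}'.format(0.00001)
print(fixed_small)    # 0.00001000
```

This is why the function checks for 'e' or 'E' in the string before splitting on the decimal point.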

If you actually are working with floats that are very close to round numbers but intentionally not equal to them (like 0.29999999999999998 or 99.959999999999994), this will produce some false positives, i.e. it'll round numbers that you didn't want rounded. In that case the solution is to specify a fixed precision.

'{0:.{1}f}'.format(f, sys.float_info.dig + n + 2)

The number of digits of precision to use here doesn't really matter; it only needs to be large enough to ensure that any rounding performed in the string conversion doesn't "bump up" the value to its nice decimal representation. I think sys.float_info.dig + n + 2 may be enough in all cases, but if not, that 2 might have to be increased, and it doesn't hurt to do so.
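
Put together, a fixed-precision variant might look like the sketch below. The name truncate_fixed is hypothetical, and the + 2 margin is the same guess as above rather than a proven bound.

```python
import sys

def truncate_fixed(f, n):
    '''Hypothetical helper: truncate the value actually stored in the float.'''
    # Ask for more digits than we keep, so rounding in the conversion
    # cannot bump the value up to its "nice" representation
    s = '{0:.{1}f}'.format(f, sys.float_info.dig + n + 2)
    i, p, d = s.partition('.')
    return '.'.join([i, d[:n]])

print(truncate_fixed(0.29999999999999998, 1))  # 0.2
print(truncate_fixed(1.923328437452, 3))       # 1.923
print(truncate_fixed(0.5, 2))                  # 0.50
```

Note that this also returns 0.2 for a plain 0.3, since both literals are the same float; that is the trade-off of working from the stored value.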

In earlier versions of Python (up to 2.6, or 3.0), the floating point number formatting was a lot more crude, and would regularly produce things like

>>> 1.1
1.1000000000000001

If this is your situation and you do want to use "nice" decimal representations for truncation, all you can do (as far as I know) is pick some number of digits, less than the full precision representable by a float, and round the number to that many digits before truncating it. A typical choice is 12,

'%.12f' % f

but you can adjust this to suit the numbers you're using.
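
To see the effect on a modern interpreter, the old output can be imitated with a 17-significant-digit format. The sketch below does that and also shows one value where the 12-digit workaround gives a surprising answer; truncate_legacy is just an illustrative name for the second function from the top of this answer.

```python
def truncate_legacy(f, n):
    '''Round to 12 decimal places first, then cut the string (as in the fallback above).'''
    s = '%.12f' % f
    i, p, d = s.partition('.')
    return '.'.join([i, (d + '0' * n)[:n]])

# 17 significant digits is what the old interpreters effectively displayed
old_style = '%.17g' % 1.1
print(old_style)                            # 1.1000000000000001

# Rounding to 12 places hides the representation error
print('%.12f' % 1.1)                        # 1.100000000000
print(truncate_legacy(1.1, 3))              # 1.100

# Limitation: anything within 12 places of a round number gets bumped up
print(truncate_legacy(0.9999999999999, 3))  # 1.000
```

The last line is the price of the fixed cutoff, which is why this only works "much of the time".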


[1] Well... I lied. Technically, you can instruct Python to re-parse its own source code and extract the part corresponding to the first argument you pass to the truncation function. If that argument is a floating-point literal, you can just cut it off a certain number of places after the decimal point and return that. However, this strategy doesn't work if the argument is a variable, which makes it fairly useless. The following is presented for entertainment value only:

import inspect
import io
import tokenize

def trunc_introspect(f, n):
    '''Truncates/pads the float f to n decimal places by looking at the caller's source code'''
    current_frame = None
    caller_frame = None
    s = inspect.stack()
    try:
        current_frame = s[0]
        caller_frame = s[1]
        gen = tokenize.tokenize(io.BytesIO(caller_frame[4][caller_frame[5]].encode('utf-8')).readline)
        for token_type, token_string, _, _, _ in gen:
            if token_type == tokenize.NAME and token_string == current_frame[3]:
                next(gen) # left parenthesis
                token_type, token_string, _, _, _ = next(gen) # float literal
                if token_type == tokenize.NUMBER:
                    try:
                        cut_point = token_string.index('.') + n + 1
                    except ValueError: # no decimal in string
                        return token_string + '.' + '0' * n
                    else:
                        if len(token_string) < cut_point:
                            token_string += '0' * (cut_point - len(token_string))
                        return token_string[:cut_point]
                else:
                    raise ValueError('Unable to find floating-point literal (this probably means you called {} with a variable)'.format(current_frame[3]))
                break
    finally:
        del s, current_frame, caller_frame

Generalizing this to handle the case where you pass in a variable seems like a lost cause, since you'd have to trace backwards through the program's execution until you find the floating-point literal which gave the variable its value. If there even is one. Most variables will be initialized from user input or mathematical expressions, in which case the binary representation is all there is.

2/5/2015 10:15:32 AM

round(1.923328437452, 3)

See Python's documentation on the standard types. You'll need to scroll down a bit to get to the round function. Essentially the second number says how many decimal places to round it to.
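
For comparison with the question's requirements, here is a small sketch of what round does. It returns a float rather than a string, and it rounds to the nearest value instead of discarding digits; the second input is made up for illustration.

```python
rounded = round(1.923328437452, 3)
print(rounded)        # 1.923 (the dropped digits happen to round down)
print(type(rounded))  # <class 'float'>

rounded_up = round(1.9239, 3)
print(rounded_up)     # 1.924, whereas truncation would give 1.923
```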


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow