There’s a simple Java program FloatCalc.java:

public class FloatCalc {

    public static void main(String[] args) {
        System.out.println(1 - 0.56);
    }
}

What will be the output of this program?

Bonus question: Does the computation occur in run-time or compile-time?


17 Responses to “FloatCalc”

  1. Jhon Doo Says:

    Why is it named Float Calc instead of Double Calc?

  2. raulrw Says:

    I think that since Java is interpreted (e.g. by a web browser), the result should be computed at run-time, but since no variables are involved, the expression might (quite probably) already be resolved in FloatCalc.class. There is no way to answer for certain, because it depends on the program that compiles FloatCalc.class. The result should be exactly 0.44 but, once again, it could differ depending on the compiler, the compiler options and the machine.

    I found this site, which is very interesting: http://introcs.cs.princeton.edu/java/91float/

    FloatCalc (for “a simple floating point calculation”) is the appropriate name.

    • Alexey Ivanov Says:

      Raul, thank you very much for your comment.

      It’s true that Java is an interpreted language, but it is not interpreted directly: first the Java source code is compiled into Java byte code, and it is the byte code that is interpreted by a Java Virtual Machine (JVM). (A web browser never interprets Java code; Java is not to be confused with JavaScript.)

      So Java is both a compiled and an interpreted language. You noticed that 1 - 0.56 is a constant expression, so it can be evaluated at compile-time. You can examine FloatCalc.class to make sure there is no calculation in the .class file. Strictly speaking it depends on the compiler used, but I’m pretty sure every Java compiler will evaluate this expression at compile time.

      The result… you expect it to be 0.44, but it does not have to be. The thing is that floating point calculations are not exact. In this particular case, the program prints 0.43999999999999995, which is close to 0.44. The site you linked to also explains that it is incorrect to expect floating point calculations to be exact: there are roundoff errors.
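To see what is actually stored, the exact binary values can be printed with java.math.BigDecimal, whose BigDecimal(double) constructor converts a double without any rounding. This is only a sketch; the class name ExactValues and the helper exact are made up for illustration and are not part of the original program.

```java
import java.math.BigDecimal;

// Hypothetical helper class, not part of the original FloatCalc program.
final class ExactValues {

    // new BigDecimal(double) reproduces the stored binary value digit for digit.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        System.out.println("0.56 is stored as     " + exact(0.56));
        System.out.println("1 - 0.56 is stored as " + exact(1 - 0.56));
        System.out.println("0.44 is stored as     " + exact(0.44));
        // println uses Double.toString, which prints just enough digits
        // to tell this double apart from its neighbours.
        System.out.println("println shows         " + (1 - 0.56));
    }
}
```

None of the three stored values is exactly the decimal fraction written in the source, which is why the subtraction lands slightly below 0.44.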

      • raulrw Says:

        It might sound funny, but I agree with you on everything. I’m just trying to say that it is sort of unpredictable. On the website above, several tests have been run, and some cases are rounded while others are not.
        Regarding the Java Virtual Machine, even though the browser does not interpret the Java binary class by itself, it is still interpreted.
        The latest computers do not have a floating point processor because they perform floating point operations in software, on one of their cores.
        Also, in 2008 the IEEE released a revision of the 754 standard that allows binary and decimal floating point numbers (base 2 and base 10). You can see it at http://grouper.ieee.org/groups/754/, and you can download a PDF at http://www.math.fsu.edu/~gallivan/courses/FCM1/IEEE-fpstandard-2008.pdf.gz (I just searched on Google).
        But, once again, I totally agree with you, and I think it is very good to debate this. It’s a very important subject, and not only for Java.
        Another thing I would like to remark, as a comment, is that double precision floating point numbers hold at most 16 significant digits, but mostly 14, no matter how many zeros you have before or after the significant digits. That is not the case for 0.44, but it is still interesting.
        There are also many libraries that implement quadruple precision floating point (128 bits, 16 bytes) in software.

        • Alexey Ivanov Says:

          Raul, you’re right, it’s sort of unpredictable… Well, it is predictable if you know the underlying floating point data structures, and you’ll get the same result every time. But the result can be counter-intuitive: humans usually deal with numbers where all calculations are exact. You expect the following statement to be true:

          (100 – 56) / 100 = 1 – 0.56.

          But it does not hold in computer software and/or hardware.
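The mismatch can be checked directly. The sketch below (class and method names are hypothetical) assumes the integer form is evaluated as 44 / 100.0, so that only one rounding happens, and compares it with 1 - 0.56. On IEEE 754 doubles the two results should be neighbouring values, one unit in the last place apart.

```java
// Hypothetical class comparing the two sides of the identity above.
final class IdentityCheck {

    static double viaDivision() {
        return (100 - 56) / 100.0;   // exact integers, one rounded division
    }

    static double viaSubtraction() {
        return 1 - 0.56;             // 0.56 is already rounded before subtracting
    }

    public static void main(String[] args) {
        double d = viaDivision();
        double s = viaSubtraction();
        System.out.println("(100 - 56) / 100.0 = " + d);
        System.out.println("1 - 0.56           = " + s);
        System.out.println("equal?               " + (d == s));
        // Distance between the two results, measured in units in the last place.
        System.out.println("gap in ulps:         " + (d - s) / Math.ulp(d));
    }
}
```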

          Yes, I agree Java is an interpreted language. Yet you shouldn’t forget about the Java compiler, which performs many optimizations on the Java source code. The bonus question was a hint at this: the computation takes place at compile time. The byte code interpreted by the JVM contains the result of the calculation: it loads the value and passes it to the System.out.println(double) method.

          The latest computers do have a floating point co-processor; it is just integrated into the main CPU, whereas long ago the floating point co-processor was a separate unit. It’s similar to dual-core, where two processor cores are integrated into one chip.
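The most direct check is to disassemble the class with javap -c FloatCalc and look at what is loaded before the println call. A way to observe the same thing from inside a program is sketched below; it leans on the language rule that compile-time constant strings are interned, so the deliberate use of == on strings here is the point of the test, not a style to copy. The class and method names are hypothetical.

```java
// Hypothetical demo: a constant expression is folded by the compiler,
// while the same arithmetic on a run-time value is not.
final class ConstantFolding {

    // Only literals are involved, so this whole concatenation is a constant expression.
    static final String FOLDED = "v=" + (1 - 0.56);

    static boolean foldedIsInterned() {
        // Both sides are compile-time constants with equal content,
        // so they refer to the same interned String object.
        return FOLDED == "v=0.43999999999999995";
    }

    static boolean runtimeIsInterned() {
        double one = Double.parseDouble("1");       // known only at run time
        String computed = "v=" + (one - 0.56);      // built while the program runs
        return computed == "v=0.43999999999999995"; // equal text, different object
    }

    public static void main(String[] args) {
        System.out.println("folded at compile time: " + foldedIsInterned());
        System.out.println("computed at run time:   " + !runtimeIsInterned());
    }
}
```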

          • Raul Riesco W Says:

            Hi Alexey,

            I really appreciate your answer.

            I don’t want to be rude in any way, but if you try the problem above in C, C#, Pascal, Fortran, Visual Basic, or even in Excel, you’ll get the right value, as long as you request enough decimals in the display function.

            I’m quite sure that the problem is in Java’s System.out.println function. There are some websites that discuss this.

            • Alexey Ivanov Says:

              Hi Raul,

              I appreciate your interest. You’re not rude. 😉

              I did test it in C/C++ code, and yes, that sample program prints 0.440000. But it proves nothing. Well, it proves that Java has a different default precision when printing.

              Of course, Java does not handle numbers as strings. (This would have been highly inefficient.) Primitive types in Java are roughly the same as in C or C++.

              Excel is a different kind. I’m pretty sure it employs special tweaks to keep calculations as precise as possible.

  3. Raul Riesco W Says:

    Hi Alexey,

    I have been working on your problem.
    I ran it in both Borland C 5.0 and Visual C.
    Borland C gave me 0.43999999999999995 while Visual C gave me 0.44.
    I wrote this code, which does not solve the problem but illustrates how IEEE 754 works:

    • Alexey Ivanov Says:

      Hi Raul,

      Well, it’s not my problem actually. This example is taken from a real bug.

      As I said above, I ran this code in C++ using Microsoft Visual Studio 2010. The sample program gives the expected result: 0.440000. So far so good. Java prints this numeral with 17-digit precision: 0.43999999999999995. If you make C/C++ print 17 digits after the decimal point, you’ll get the same result: 0.43999999999999995.

      Further, the underlying double bits are the same both in Java and in C++. I’ll post the updated Java code as well as C++ code later. I planned to do it this way from the start but then I had different tasks and different bugs on my mind, and I haven’t completed this work. Now your comments prompted me to complete it.

  4. Raul Riesco W Says:

    O.k. Alexey, you are right, I was wrong about handling numbers as strings. I read something about it a long time ago, but it was probably about PHP, not Java.

    The following code, written in C, gives the results below. Compared with the results of the Java code I posted above, they show that the internal representation of the number is the same, but the human-readable (decimal) representation is different. It’s quite possible that some “tweaks” are adjusting the result. Or maybe not. It also works for me in Pascal, Fortran and Visual Basic. You can easily try it with a VBS script.

    Some compilers have options that select whether the FPU or a software library is used for floating point operations. I still think the problem might be in the println function. I guess I read about it somewhere before.

    Also, some compilers internally use an extended 80-bit floating point format while performing an operation, to avoid overflow. Visual C stopped using it. You can read about it at https://msdn.microsoft.com/en-us/library/9cx8xs15.aspx

    It doesn’t matter if computers have FPU or not. The question is: Are they using the FPU?

    C code compiled with “cl qq.c”:

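    #include <stdio.h>
    #include <string.h>

    typedef unsigned char byte;
    typedef unsigned short word;
    typedef unsigned long dword;
    typedef unsigned __int64 qword;   /* 64-bit unsigned integer (MSVC-specific) */

    /* Render the 64 bits of q as text, most significant bit first. */
    char *longToBin(qword q)
    {
        int i;
        qword t;
        static char r[65];            /* writable buffer; a string literal must not be modified */

        for (i = 0, t = 1; i < 64; t *= 2, i++)
            r[63 - i] = ((t & q) > 0) ? '1' : '0';
        r[64] = '\0';
        return r;
    }

    int main(void)
    {
        char *b;
        qword bits;
        double a = 1.0 - 0.56;

        printf("a = %20.16lf\n", a);
        memcpy(&bits, &a, sizeof bits);   /* copy the raw bits of the double */
        b = longToBin(bits);
        printf("b = %s\n", b);
        return 0;
    }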

    Output of qq.c:
    C:\qq (VisualC code)
    a = 0.4400000000000000
    b = 0011111111011100001010001111010111000010100011110101110000101000

    Output of ieee754.java (Posted in the previous comment)
    C:\javac ieee754.java

    C:\java ieee754
    b = 0011111111011100001010001111010111000010100011110101110000101000
    l = 4601597955262077992
    s = 1 (Sign, 1 positive [bit 63 = 0], -1 negative [bit 63 = 1])
    e = 1021 (Exponent, biased)
    m = 3422735716801576 (Mantissa)
    r = 0.43999999999999995

    a = 0.43999999999999995
    b = 44.0
    c = 0.43999999999999995
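The fields in the listing above can be pulled out of a double with a few shifts and masks. The following is a minimal sketch, not the ieee754.java program itself (its source is not shown here); the class and method names are assumptions. It relies on the standard binary64 layout: 1 sign bit, 11 exponent bits biased by 1023, and 52 mantissa bits.

```java
// Hypothetical sketch: split a double into its IEEE 754 binary64 fields.
final class Ieee754Fields {

    // All 64 bits as text, most significant bit first.
    static String toBinary(long bits) {
        return String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
    }

    static int sign(long bits) {
        return (bits >>> 63) == 0 ? 1 : -1;
    }

    static int biasedExponent(long bits) {
        return (int) ((bits >>> 52) & 0x7FFL);
    }

    static long mantissa(long bits) {
        return bits & ((1L << 52) - 1);
    }

    // Rebuild the value from its fields; valid for normal numbers only
    // (zeros, subnormals, infinities and NaN need separate handling).
    static double rebuild(long bits) {
        long significand = mantissa(bits) | (1L << 52);   // restore the implicit leading 1
        return sign(bits) * Math.scalb((double) significand, biasedExponent(bits) - 1075);
    }

    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(1 - 0.56);
        System.out.println("b = " + toBinary(bits));
        System.out.println("l = " + bits);
        System.out.println("s = " + sign(bits));
        System.out.println("e = " + biasedExponent(bits));
        System.out.println("m = " + mantissa(bits));
        System.out.println("r = " + rebuild(bits));
    }
}
```

The exponent offset 1075 is the bias 1023 plus the 52 mantissa bits, because the significand is handled here as an integer.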

    • Alexey Ivanov Says:

      Well, this is what I’m talking about: the binary representation is the same. The difference is in output only. If you ask your C program to print the double value with 17 digits after the decimal point:

      printf("a = %.17lf\n", a);

      you’ll get the same result that Java prints.

      The point of this post was that you should not expect the result to be as exact or precise as it is with ordinary finite decimal fractions. And the output may differ because different programming languages can apply different tweaks in their output algorithms. Usually some tweaks are applied to get the result that one expects.
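The same experiment works in Java itself, since String.format accepts C-style precision specifiers. A minimal sketch (the class PrintPrecision and the helper fixed are hypothetical names):

```java
import java.util.Locale;

// Hypothetical demo: one double, printed with different precisions.
final class PrintPrecision {

    // Fixed-point formatting with the given number of digits after the decimal point.
    static String fixed(double value, int digits) {
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        return String.format(Locale.ROOT, "%." + digits + "f", value);
    }

    public static void main(String[] args) {
        double a = 1 - 0.56;
        System.out.println(fixed(a, 6));    // 0.440000, like printf's default %f in C
        System.out.println(fixed(a, 17));   // 0.43999999999999995
        System.out.println(a);              // 0.43999999999999995, the println default
    }
}
```

The value never changes between the three lines; only the number of digits requested from the formatter does.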

  5. Raul Riesco W Says:

    The “for” loop in the longToBin function in the post above is truncated. It should say:

    for (i = 0, t = 1; i 0) r[63-i] = ‘1’;

  6. Raul Riesco W Says:

    O.K. good.

    It’s good to know that I can use < and >. It didn’t come to my mind, and I didn’t want to add another reply. Please remove any unnecessary chit-chat (or jibber-jabber). In the future, maybe you could write to me directly at my e-mail address.

    Regarding floating point numbers, I think that everything revolves around these “tweaks”. It’s a very esoteric and dark region. I mean, what exactly are those tweaks, and how can I control them with compiler options?

    Even further, today it is possible to choose between several libraries for floating point operations, and the only way to know which one is best is to test them all, not only for precision but also for efficiency. And against the FPU.

    Some languages like C# (.NET Framework), which is not my favorite, allow you to define “operator” functions in a class, so you can add or multiply instances of your class simply by using the ‘+’ and ‘*’ operators. That lets you create your own RealNumber class. I did it. It works very well. The problem is that you can’t use the Math functions with this class unless you write all of those functions yourself.

    I hope (or dream) that the future will bring us a primitive real type (handled by the machine) packed in decimal format (base 10), according to IEEE 754-2008 Section 3, because with the mantissa stored as a decimal number you just need to find where to put the decimal separator after each operation.

    Something like real32, real64 and real128, with assembly functions like RMUL and RDIV. And with a full set of Math functions.
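Java has no primitive decimal type, but java.math.BigDecimal already does base-10 arithmetic in software, and MathContext.DECIMAL64 and MathContext.DECIMAL128 give it the precision of the corresponding IEEE 754 decimal formats. A small sketch of the calculation from this post in decimal (the class DecimalCalc and its method names are hypothetical):

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical demo of decimal arithmetic with java.math.BigDecimal.
final class DecimalCalc {

    // Build the operands from strings so that no binary rounding ever happens.
    static BigDecimal subtract(String a, String b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    // Division needs an explicit precision; DECIMAL64 keeps 16 significant digits.
    static BigDecimal divide64(String a, String b) {
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        System.out.println(subtract("1", "0.56"));   // 0.44
        System.out.println(divide64("1", "3"));      // 0.3333333333333333
    }
}
```

The trade-off is speed: every operation is a method call on an object instead of a single hardware instruction.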

    I ran a quick search on Google and found this page, which I think is very interesting: http://stackoverflow.com/questions/10784951/do-any-jvms-jit-compilers-generate-code-that-uses-vectorized-floating-point-ins

    Finally, the line in the longToBin function is: for (i = 0, t = 1; i < 64; t*=2, i++) if ((t & q) > 0) r[63-i] = '1';
