Comparing Floating-Point Numbers Is Tricky (2017)

jqpabc123 · on Oct 22, 2022

My guess is that at least half of the cases where floats are applied could be better served using scaled integers.

Real world practicalities limit the the need for extended precision in a lot of (if not most) cases.

For example, most monetary transactions are settled to the nearest penny. Most time clocks are only accurate to the nearest second. Most milling machines are only accurate to a single digit fraction of a millimeter.

Applying extended decimals to these may work --- but in engineering terms, this is really "false accuracy" that is unnecessary and can be counter productive.

tialaramex · on Oct 22, 2022

Floats are really nice for audio (PCM) which may not be intuitive.

It's much nicer during mixing, editing and other production steps to work with floats and so essentially all modern software does that, even though the A2D conversion is integer and the end results will be integer data (e.g. CD audio is Linear PCM 16-bits at 44.1kHz)

pclmulqdq · on Oct 22, 2022

That's not why audio processing software uses floats. Floats are used because they effectively let you decouple scale from precision. With scaled integers, soft sounds only have a few bits of precision, while loud sounds are very precise. With floats, every signal is within 1 bit of precision of each other, regardless of volume.

That means you can raise and lower the volume on signals over and over without significant precision loss.

tialaramex · on Oct 22, 2022

What part of what I wrote do you think you're disagreeing with ?

pclmulqdq · on Oct 22, 2022

The part where you said that floats were used because it was easier. It has definite benefits for audio quality to have higher dynamic range in your numerics.

Edit: You said it was nicer, not that it was necessarily easier. They are synonyms.

tialaramex · on Oct 22, 2022

I'm sure you believe I said "floats were used because it was easier" but I didn't, which is why if you go look at the comment you replied to that's not what it says.

tialaramex · on Oct 22, 2022

OK, now you've edited your reply to say you believe "nicer" is a synonym for "easier". But, it's just not.

jqpabc123 · on Oct 22, 2022

It's much nicer

Said every developer who has ever applied floats where scaled integers would be just as effective.

I attribute this to pocket calculators which have shaped most people's thinking.

I've even had developers try to tell me it can't be done any other way --- until I show them.

All that said, if floats are needed to interface with other software, you may not have much choice in the matter.

tialaramex · on Oct 22, 2022

Not for developers, it's much nicer for the users. There's a phrase "infinite headroom" which might make this a bit sharper.

With a conventional analogue recording medium, the medium has physical properties which impact the precision with which it can record audio. For example maybe the reel-to-reel tape recorders in a studio can do about 12-bit precision. At the bottom, the precision is cut off by noise, mechanical noise in the equipment and eventually even quantum mechanical noise in our universe which is unavoidable. At the top it's cut off by practical engineering limits - the magnetic field strengths are limited. A good analogue engineer ensures that there's "headroom" where if the recording is a little louder than expected it doesn't bump into that upper limit, when Aretha hits that note 6dB louder than anticipated, you had 10dB of headroom so it's captured OK. But to do this you're sacrificing that precision at the other end of the scale, listeners can't hear her turning the lyrics sheet in the background of the song even though you'd have heard that if you were in the room.

In digital integer space, we can have lots of precision, 14-bit happened early, 16-bit was soon common place, and today 24-bit is widely available (although you need to squint to see 24 actual bits, don't think about it too hard). But, the integers are brick walls at the edges of that range. If you set things up to use all your (say) 16-bits and it comes out hotter than expected, your data is trashed because it saturates † producing a harsh clipping. So you need to carefully set the headroom to give you as much space at the top as you need, but without wasting too much. Stressful.

With floats you get infinite headroom. Conventionally PCM audio is represented as -1.0 to 1.0, but in float space if you're "too hot" by a factor of literally 1000, it's no problem at all, floating point numbers can go from -1000.0 to 1000.0 just fine, and when you're ready to actually listen to it, just multiply them by 0.001 (this is called "gain" in audio).

† Or, if the programmer doesn't understand what they're doing, it wraps and sounds like complete nonsense. This is common on 1980s home computer audio setups.

jqpabc123 · on Oct 22, 2022

Don't get me wrong --- there are perfectly valid use cases for floats --- and it sounds like you may have one.

But these valid uses are a lot less than what I see in a lot of software.

ttoinou · on Oct 22, 2022

   Said every developer who has ever applied floats where scaled integers would be just as effective.

Floats are great to prototype quickly and get a good result. Once the algorithm is found, you can spend more time fine tuning with improvements such as others kinds of ways to represent fractional numbers

constantcrying · on Oct 23, 2022

This is a complete fallacy. Just because output and input accuracy are low, that does not imply that intermediate accuracy can be low too.

You gave the example of a milling machine, of course in most cases you wouldn't expect it to be accurate to more then a 10th of milliliter. So you could believe that e.g. 100th of a millimeter should be more then enough as your scaled integer (meaning an integer with a value of "100" represents one millimeter). The maximum value of a 32bit integer would represent around 43km. Who would ever want to use more?

But what if now you are trying to calculate the 3D distance between two points which are about a meter (= 1000mm = (int)100000) apart in one direction. To calculate this you need a third power. 100000^3 is pretty large, much larger than a 32 bit int can hold...oops

With scaled integers you have to constantly think about whether each intermediate result is still within your range. It is a terrible, headache inducing format for many applications. The magic of floats is that you do not need to worry about so many things. Yes, they have their pitfalls, but they are used for a very good reason and "you don't need the precision" simply is false in many cases.

Merad · on Oct 22, 2022

> For example, most monetary transactions are settled to the nearest penny.

Settled, yes, but all kinds of financial operations require higher precision. IME with payment processing the backend calculations and account balances were done to a thousandth of a cent. It was a common point of confusion because when clients were billed for their processing fees they usually ended up with a fraction of a cent balance that couldn’t be billed and stayed on their account. Every couple of months those fractions of a cent would add up to a full cent and their detailed billing breakdown would include a 1 cent charge in the “other” category that would drive them crazy. But of course financial calculations usually don’t use floating point anyway.

jqpabc123 · on Oct 22, 2022

When the bill is issued, round the amount due and recorded to the nearest penny (integer) since this is what is being paid.

Do this continuously and consistently and you can easily avoid the "mystery" penny problem.

Merad · on Oct 22, 2022

Yes, obviously that's an option, but it's one that gives up revenue. At that company most clients were billed monthly, and it was a small-ish company, so the loss wouldn't be huge (I'd guess mid to high 5 figures). However many payment processors do daily billing, meaning you'd be giving up a penny every few days instead of every few months.

I also overstated the customer impact... perhaps 1% of customers would inspect every detail of their statements and they tended to be annoyed by it. The overwhelming majority didn't notice or care.

jqpabc123 · on Oct 22, 2022

but it's one that gives up revenue.

Done consistently and properly, it is just as likely to *gain* revenue.

Over the course of time and probability, it should be revenue neutral.

I suggest looking into "banker's rounding".

gw99 · on Oct 22, 2022

I tend to use “HP” style decimal floating point types if accuracy is required. The mantissa is stored as one base 10 digit per 4 bits (BCD encoded). A 64 bit int gives 12 digits of precision and 3 digit mantissa plus flags. This can be scaled to 128 bits if needed easily.

I spent several years maintaining a C library that handled these for some embedded instrumentation. This included most key mathematical functions as well. Most interesting thing I ever had to look after.

hnews_account_1 · on Oct 22, 2022

Money settling is only a small aspect of finance. In most cases, more precision is required. Not like 15 significant digits or something but more than 2 for sure.

jqpabc123 · on Oct 22, 2022

4 decimals can adequately address most interest rate calculations.

One 3 line function can handle applying an interest rate to monetary amounts held as scaled integers. Instead of taking the time to write this one function, the typical developer will build the entire software around floats which causes endless issues.

SideQuark · on Oct 23, 2022

Di you try it? Have you written financial software? This is terrible advice.

Even the most naïve calculation, such as computing total amount paid at 8% compounded monthly over a 30 year mortgage, ends up wrong by over $100,000 compared to the correct answer. This is a pretty common mortgage calculation.

Here's the expression to evaluate: 1000000 * (1 + (8/100)/12) ^ (12*30). Do it, rounding to d digits of accuracy at each basic expression, and here are the errors you get:

    2 -> 25 million
    3 -> 1.3 million
    4 -> $131,170
    5 -> $13,040
    6 -> $1304
    7 -> $130
    8 -> $13
    9 -> $1.3
    10 -> $0.13
    11 -> $0.013
    ....

Thus, using anything less than a double, over large portfolios, is a terrible, terrible idea, even for such a simple calculation.

Anything actually useful, such as summing such calculations over some portfolio, or handling sum over variable interest rate calculations, or, well, pretty much anything, completely pukes at 4 decimals.

This is why people that don't develop floating point with a solid understanding of how to compute and check things like condition numbers for calculations should not give advice on how much precision to use for things.

In fact, I'm hard pressed to find any reasonable financial computation that is not crap at 4 decimals.

This is perhaps why your "guess .. that at least half of the cases where floats are applied could be better served using scaled integers" is not a very good guess.

The proper answer is do not pick either unless one really understands the numerics, so if someone chooses "scaled numerics" over floating point because they find floating point complicated, then they are most certainly made an uninformed choice and likely an error.

Choosing floating point calculations over scaled numerics is much less likely to bite someone doing numerical math, since it can scale properly and dynamically to help hide their ignorance of numerical issues.

hnews_account_1 · on Oct 24, 2022

I don’t think anyone means to put ALL calculations through a reduced representation. That’s a terrible idea. But if you are discounting cash flows or something, it won’t make that much of a difference if you leave the discounted version in full float / double representation and then add them or reduce them to 2 decimals before adding*

High finance uses float / double in general and doesn’t bother with the nuances of decimal representation or IEEE 754 unless they’re working too close to the silicon rather than doing model work in higher abstractions.

* software that allows only 2 decimal representation must account for these discrepancies by default even if it’s only off by a few cents

jqpabc123 · on Oct 24, 2022

You'll probably want to use 64 bit integers here just in case you need to deal with the national budget.

Assume Principal (P) is provided in pennies. In pseudo code:

    P := P * 100              //temporarily scale out 2 extra digits for accuracy (4 decimals total) 
    for I:=1 to 360 do
        P := P + ((P * 8)/1200)
    return (P+50) div 100     //round back to nearest penny

By my calculations, this returns 1093572966 pennies or $10,935,729.66

How did I do?

Thanks for playing.

SideQuark · on Oct 24, 2022

Congratulations - you're 0.3 cents off, despite careful ad-hoc tuning you did to solve only the one set of numbers you set out to. Too bad your method still fails for a whole range of valid mortgages :)

Like what happens when you use your algorithm for a 10M mortgage? Oops! Just add some more scaling, ad hoc fixes.... Then $100M? Oops! More scaling and fixes. And so on. Enough ad-hoc fixes and you'll re-invent float-point, but badly.

This is why such ad hoc goofiness is a bad idea. Despite you trying real hard to make a good algorithm, you still failed.

Try using modern high decimal fractional mortgage rates. Oops.

Try using a longer term, like long term rentals of commercial land for 100+ years. Oops again! Yet my one line floating point code handles them all, no fiddling, no errors, no chasing down bugs every time someone puts other numbers into the code.

Are you going to hand fiddle everytime? Try other rates, some times rates are many more digits than you assume - are you now going to hand analyze each single problem and hand code a single problem solver?

Floating point does them all. Of course, you will eventually stumble onto more and more precision, then you will fail to handle large numbers. Eventually you might try using two 64 bit integers, more or less trying to accomodate what a double simply does out of the box. And then you will have reinvented floating point numbers, just much slower and more error prone.

You're replacing floating point which just works, by having to be super careful of order of operations, thereby making it more likely a programmer without knowing the nuances of numerics will fail. Prime example: you decided to replace a rate, converted to percent with a division, then to a per-time step with another a division, by a multiplication. Why? because if you didn't you puke as I demonstrated.

You also needed to do some prescaling, leading to bugs. Not all prices are in pennies.

Then you needed to "assume" principal was in pennies. Why? to make your ad-hoc method work. Mine worked in dollars, yen, lira, pesos... all sorts of scales, and yours would need tweaking and hackery to handle all the scaling you will need for wide ranging currencies (or dollar amounts) - each leading to bugs.

>Thanks for playing.

You're welcome.

If you really want to play, let's each put forth a single code snippet, and see if the other one can propose a simple mortgage representative of one in the real world that breaks the other's code. Care to play?

So write your best method to calculate mortgages, across any type of simple mortgage I will pick (principal P (in an arbitrary currency), annual rate r (expressed as percent), compounding rate t per annum, term in years n), and I will submit my floating point code against yours, and I will demonstrate yet again why yours fails.

My code:

    total = principal * (1+((rate/100)/t)) ^ (n*t)

No many places to make a mistake there.

And your code? So far your method seems more ad-hoc and error prone than pretty much anything any programmer simply copying a formula out of a book would end up with.

This is exactly why your advice is bad. It will lead to people making mistakes :)

>Thanks for playing.

You're welcome. It has been enlightening. I look forward to your next algorithm.

jqpabc123 · on Oct 24, 2022

Congratulations - you're 0.3 cents off

So within acceptable legal tolerance. Good luck trying to collect 0.3 cents. Even the IRS accepts rounding to the nearest dollar. 0.3 cents is quite a bit different than your earlier claim --- "completely pukes at 4 decimals".

Too bad your method still fails for a whole range of valid mortgages :)

Wrong! You gave a specific example so I followed suit. My standard, generalized, library routine is equally brief and works for amounts up to $100 billion with any interest rate expressed out to 3 decimals --- with nary a float in sight.

No many places to make a mistake there.

But lots of places in the surrounding code for non-obvious mistakes due to floating point math. I know because I've been paid to find and fix them. Some of them may have been yours.

SideQuark · on Oct 24, 2022

>0.3 cents is quite a bit different than your earlier claim --- "completely pukes at 4 decimals".

Yep - plug 100M into your just posted code. Was it correct? Nope?

>So within acceptable legal tolerance.

Yes, when you hand craft to solve one specific instance by carefully tuning. I noticed you ignored posting code you claim will handle mortgages in general.

Care to cite a law you think gives "legal tolerance"? I suspect you're making that up. You must mean 'within my understanding that being within a cent on a single transaction is ok" which is simply not true.

Not when you process thousands of loans (I develop the algorithm used to price tranches for mortgage bundling for a large mortgage company when I was in grad school - I do know a bit about this space, and I certainly know a lot about numerics - floating-point, fixed point, adaptive, unums, the whole lot - you're simply compounding your errors).

>My standard, generalized, library routine is equally brief and works for amounts up to $100 billion with any interest rate expressed out to 3 decimals --- with nary a float in sight.

Post it :) Even tell me what numbers you think it handles. I bet I still break it, and my naive floating-point one above handles it.

I don't think you understand floating-point. Do you ever check condition numbers on your code? Do you know what condition numbers are? I'll take you inability to post this simple magic algorithm you claim you have as evidence you don't have it.

For anyone following this thread, this example pretty clearly shows why naive replacement is going to bite you.

jqpabc123 · on Oct 25, 2022

The generalized routine below provides compound interest Future Value using scaled integer math (no floats) to 14-15 digits of accuracy --- roughly comparable to a double float while avoiding comparison decoherence. It aims for results accurate to the penny while a negative result indicates an overflow failure.

In comparison, floating point math provides no warning when the accuracy of the mantissa is exceeded --- out of sight, out of mind --- by design.

    scaleFV(P,R,C,Y) int64

        SP := 10000              //temporary principal scale factor, 4 decimals
        if (P>SP*1000) SP *= 10  //extend accuracy for larger amounts 
        SR := 1000               //annual rate scale factor, 3 decimals (pre- 
                                   applied to R)
        N := C * Y               //total number of compounding periods
        D := C * 100 * SR        //period rate divisor 
        P := P * SP              //scale the principal (int64)
        while N>0 do
           P := P + ((P * R) div D)   //compound principal with period interest  
           decr N                     //count the period
        return (P + SP div 2) div SP  //unscale result, round back to nearest 
                                        penny

Example:

    P = 10000000000   $100 million in pennies
    R = 8000          8 percent scaled to 3 decimals
    C = 12            12 compounding periods per year
    Y = 30            30 years

    Result = scaleFV(P,R,C,Y) = 109357296578 pennies or $1,093,572,965.78

hnews_account_1 · on Oct 26, 2022

Now implement this for an annualizing a forward rate on 5y5y forward. It’s a simple enough calculation.

((1+10Yrate)^10/(1+5Yrate)^5)^(1/5)

jqpabc123 · on Oct 26, 2022

It’s a simple enough calculation.

Yes, you're right --- that is relatively simple. Nothing there that can't be easily reduced to simple 4 function *integer* arithmetic. The only complication is using appropriate range and scale to meet real world requirements.

By why should I bother? I've already demonstrated how to achieve the impossible.

hnews_account_1 · on Oct 26, 2022

You will reduce finding the 5th root of a number to a 4 function arithmetic? This should be a nice trick. Are you suggesting people do a bottom up search from 0? Or some other mind blowing mathematical property of exponents and logarithms that the top minds in the field have somehow not realized?

jqpabc123 · on Oct 26, 2022

You will reduce finding the 5th root of a number to a 4 function arithmetic?

Yes, magic isn't it. The 8 bit integer only processor in your handheld calculator does this all the time.

https://en.wikipedia.org/wiki/Newton%27s_method

hnews_account_1 · on Oct 27, 2022

Ah man, using Newton raphson to find a numerical root of a single number. Even Taylor series would’ve been an acceptable answer but this just looks like you searched “root finding” and pasted the first result here. The mistakes are just compounding the more you talk.

Not to mention you now have to do your nice little penny wise adjustment on each iteration of a root finding algo to keep it in the confines of your imaginary system. I can’t even.

jqpabc123 · on Oct 27, 2022

Ah man. There is a recurring pattern here ---- first you tell me it can't be done. Then you tell me you don't like how I would do it.

You handheld calculator with it's 8 bit processor is proof that almost any mathematically function can be reduced to basic 4 function integer arithmetic.

hnews_account_1 · on Oct 27, 2022

I’m telling you you have no idea how anything works or why design choices are made in fields where you’re holding forth with an authority inversely proportional to your ignorance. Your very basic and ignorant idea of mathematics or modern finance isn’t worth my time. Especially if you think newton raphson should be any sort of standard for finding numerical roots in 2022. I shudder to think of how you’d approach a seemingly irrational exponent or something. Or how you’d incorporate something as simple as e^x. Would you use a rainbow table? The possibilities for abuse are truly endless.

jqpabc123 · on Oct 27, 2022

Especially if you think newton raphson should be any sort of standard for finding numerical roots in 2022.

Nice straw man. I never proposed a standard --- only that it is possible.

But if you look too closely at whatever method you are using now --- you will likely find an algorithm that you will scoff at. There are only a handful of options and "magic" isn't one of them.

*Anything* a computer does is ultimately reduced down to the most basic mathematical functions using some sort of algorithm. The fact that you don't know it or see it doesn't mean it's not there.

hnews_account_1 · on Oct 27, 2022

I mean this is now an absurd conversation. You’re claiming you’ll break down all mathematical functions and convert them to work exclusively with integers instead of real numbers. So you’re proposing writing an entirely new mathematical standard and acting as though it can be achieved without any investment of time and effort because you’re able to demonstrate some simple calculation that has been done for several centuries by hand.

I’m not even sure how I’m still conversing on these absurdities despite noting them.

SideQuark · on Oct 26, 2022

Don't worry - I just started looking at his code - it fails on tons of common examples, completely demonstrating what we pointed out. There's no need to even have him try to compute things past what he claimed his ever changing claims can do.

I will post results soon.

hnews_account_1 · on Oct 26, 2022

I mean, even not knowing the nuances of modern implementations of floating point arithmetic (which even I don’t fully grasp since I work very very far from the silicon on a daily basis), the whole concept of “I can reduce finance math to 4 operations” is absurd beyond reason. Like what will you do? Write a new root finding algo? Create a method that directly interfaces with the highly optimized logarithm calculating chips on modern microprocessors? Create your own new silicon for what is essentially just a special case of all modern usage that can be perfectly achieved with off the shelf hardware?

This is like crypto bro level of ignorance.

SideQuark · on Oct 26, 2022

1) I posted my code as ` total = principal * (1+((rate/100)/t)) ^ (n*t)`, and you claimed your 10 line "My standard, generalized, library routine is equally brief". By "equally brief" did you mean an order of magnitude larger?

2) Your code and example does not work using 64 bit integers. Using only 64 bit integer math, your code gives 11605682736, off by 90% (easily checked in C/C++/C#). Your math overflows, so you're not using scaled integers; in order to work, your code requires arbitrary sized integers for the loop. Did you not realize this? So you're not using scaled integers as int64s. You're using unbounded sized integers. Your (P*R) term repeatedly overflows a 64 bit int when calculating your example.

3) If you're going to use arbitrary sized integers, then simply compute the exact numerator and denominator, then divide. It's actually accurate, as opposed to your mess. And it's simple to code.

4) You claim "a negative result indicates an overflow failure", which is wrong, since it can overflow twice (or more) in the same calculation, and, since you're using arbitrary sized integers internally, the conversion depends on non-portable behavior. Both of these can be demonstrated quite easily.

5) You claim "floating point math provides no warning when the accuracy of the mantissa is exceeded," which is wrong - it marks it with an Infinty using IEEE 754 math, required by a significant amount of languages, and provided on all major pieces of hardware), which will never revert as your error code can. And for someone understanding numerics (and condition numbers), the calculation is easy to extend with the accuracy of the result so the used will know error bounds.

6) Your code is amazingly slow: your int64 version (which fails almost all tests) is ~6000x slower than the pure double version, and the arbitrary sized int one is 45,000x slower (C#, Release build, tests run over the examples below).

7) Examples where your algorithm fails. Inputs are numbers that occur in real mortgages: interest 18.45% happened in the 1980s, 50, 75, 100 year terms exist (50 is getting common in CA), 24 payments represents bi-weekly payoffs (and some places compound daily, or 365 times a year [1]). To test accuracy of even your int64 routine, lower principal amounts are below to show both your routines (the honest int64 and the arbitratry integer one) still fail. "My routines" are the double one above and, since you require arbitrary sized integers, I'll test one that simply computes exact numerator and denominator then divides:

10000,18450,365,50 -> both your int64 and arbitrary precision are off by $10.08, my routines are both perfect.

Want a lower interest rate? 10000,8000,365,50 -> both of yours are off by $0.12, both mine are still perfect.

Let's push them with a longer term, up the principal: 100000,8000,365,100 -> both yours are off by $6.72, both mine are perfect

Now, since your int64 version pukes immediately on larger numbers, let's only look at the arbitrary sized versions.

500000,8000,365,100 -> yours off by $0.68, mine perfect

Maybe the problem is the daily compounding? 100000000,18450,12,50 -> yours off by $0.03, mine perfect 100000000,18450,12,100 -> yours off by $0.34, mine perfect

And for fun, let's look at a $100k loan, 15%, compounded hourly (say 8760 times a year) like a bad credit loan, for 25 years.... (Note your routine is stupidly slow for this one):

100000,15000,8760,25 - both yours are off by $1.21, both mine are correct.

I can keep producing errors all over this frontier.

8) So, to avoid "comparison decoherence" (!?) because you don't understand floating point (which can be made bit exact, as I've done on many projects for binary file format interop), you instead produce demonstrably buggy, numerically faulty, slow, memory unbounded code?

This is why people should not take advice on numerics from someone that does not understand numerics.

[1] https://www.rocketmortgage.com/learn/compound-interest

jqpabc123 · on Oct 26, 2022

Post your code and I'll fix it for you.

SideQuark · on Oct 26, 2022

Oh good, that will be fun.

Here's C++ code

    #include <iostream>
    #include <cstdint>

    int64_t scaleFV(int64_t P, int64_t R, int64_t C, int64_t Y)
    {
        int64_t SP = 10000;              // temporary principal scale factor, 4 decimals
        if (P > SP * 1000) SP *= 10;     // extend accuracy for larger amounts 
        int64_t SR = 1000;               // annual rate scale factor, 3 decimals (pre-applied to R)
        int64_t N = C * Y;               // total number of compounding periods
        int64_t D = C * 100 * SR;        // period rate divisor 
        P = P * SP;                      // scale the principal (int64_t)
        while (N > 0)
        {
            std::cout << P * R << std::endl; // watch the errors fly!
            P = P + ((P * R) / D);    // compound principal with period interest  
            N--;                      // count the period
        }
        return (P + SP / 2) / SP;     // unscale result, round back to nearest penny  
    }

    int main()
    {
        std::cout << scaleFV(10000000000L, 8000, 12, 30) << " = 11605682736 != 109357296578\n" << std::endl;
    }

I added a print in your loop to show you the overflow. Good luck. To help you, P * R for this calculation gets about 4x as big as an int 64 can hold.

Another way to see it is your answer 109357296578, times your scaling 100000, requires P to hold > 53 bits. But your rate of 8000 is almost 13 bits. So P * R cannot fit in a 64 (well 63 for signed) int.

If you cannot fix it, care to explain how your code works only with int64 as you claim here and in other threads on this page? Maybe then you can address the errors I listed above where your routine failed?

jqpabc123 · on Oct 27, 2022

If you cannot fix it, care to explain how your code works only with int64 as you claim here and in other threads on this page?

Yes, your code pukes all over itself. And mine doesn't. Why?

For more than 2 decades now, Intel processors have included SSE "extensions" with a whole bank of 128 bit registers (XMM0 thru XMM15) with specialized math instructions for integer and floating point.

https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

The compiler I use emits SSE opcodes by default for operations on 64 bit integers when building 64 bit executables. In other words, 128 bit processor registers are being used for the calculations. Overflow occurs when the final resultant is too large for int64.

So there you have it.

SideQuark · on Oct 27, 2022

>For more than 2 decades now, Intel processors have included SSE "extensions" with a whole bank of 128 bit registers (XMM0 thru XMM15) with specialized math instructions for integer and floating point.

That's interesting, since Intel did not add 128 bit wide integer math in SSE. Those 128 bit registers were SIMD, meaning multiple data, meaning at most 64 bit integers. Later extensions (not two decades back) added larger registers. I wrote the article for Intel for the release for AVX in 2011 [2], where Intel expanded the registers to 256 bit (still no 128 bit integer math). But there has certainly not been 128 bit wide integer operations for two decades. There has been 128 bit registers split into at most 64 bit sized components, the M in SIMD. I also wrote decompilers for Intel binaries, and just looked through that code, and again, no 128 bit integers. Are you confusing 128 (or 256 or 512) bit wide registers that are split into components as actually 128 or 256 or 512 bit integer registers? Are you making stuff up?

Intel intrinsics denote register size and SIMD size. For example, _mm256_add_epi16 means use 256 bit registers, and add packed integers of 16 bit size. There are no _epi128 intrinsics, only size 8,16,32,64 [1]. Another interesting place to look for these 128 bit integer instructions is [3]. Lots of IntN for N = 8,16,32,64, none for N=128. Here's [4] the Intel Software Development Manual from April 2022.... Also not seeing them. Section 4.6.2 lists supported data types - not a single 128 bit integer on that page. I don't see them in the AVX and other extension sections either.

So I'm super interested in your compiler and language that automatically emits 128 bit wide integer SIMD instructions for Intel, since they are not in the opcode or intrinsic lists. Please name the language and compiler, and even post some working code to demonstrate this auto extension to 128 bit math.

And, if you're using 128 bit registers, why would you pick 64 bit math, which fails for all the cases above? You still have not addressed that any size register fails on the examples I posted above, including your auto-extending non-portable compiler tricks.

For example, $100000,8000,365,100 fails even on 128 bit, 256 bit, even infinite bit length registers. Because your algorithm itself is bad.

So, care to post your compiler, language, and code? Also, why did you keep telling us it was 64 bit when it wasn't?

[1] https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

[2] https://hpc.llnl.gov/sites/default/files/intelAVXintro.pdf

[3] https://www.felixcloutier.com/x86/index.html

[4] https://www.intel.com/content/www/us/en/developer/articles/t...

jqpabc123 · on Oct 27, 2022

For example, $100000,8000,365,100 fails even on 128 bit, 256 bit, even infinite bit length registers. Because your algorithm itself is bad.

Really? So now we've progressed from $100 thousand to $100 million to the national budget?

Anything that can't handle the national budget is "bad"?

Every algorithm "fails" when pushed beyond it's limits --- even the ones you use based on double precision floats but they do so silently by losing precision in the mantissa which is only 52 bits.

Out of sight, out of mind don't mean it's always "right". By the standard you're applying, your own algorithm itself is equally "bad".

SideQuark · on Oct 27, 2022

So, where are your 128 bit SSE instructions? What compiler? What language?

Interestingly, Intel's own compiler, when operating on int128, does not emit these instructions you claim exist (you can check it on Godbolt.org and look at disassembly). Maybe you should tell them about these instructions.

Why does your routine fail for simple cases that the floating point does not?

Please stop deflecting. Can you post code, compiler, and language or not?

>Really? So now we've progressed from $100 thousand to $100 million to the national budget?

That example is for a $100,000 future value where your algorithm fails. It is not national budget.

Did you even try the examples I demonstrated where your algorithm fails?

>By the standard you're applying, your own algorithm itself is equally "bad".

Yet it's incredibly faster, does not rely on lying about mythical instruction sets, and handles simple cases yours didn't, even cases you claimed yours did handle.

Oh, and it uses honest 64 bit hardware.

So, code and compiler to demonstrate your SSE claims, or this thread has demonstrated what I expected it to.

SideQuark · on Oct 29, 2022

Ah, so no reply on your compiler and language that makes 128 bit SSE code? Makes sense, since the instructions you claimed to use don't exist.

I wanted to test to see if I can even find cases where your algorithm works but the normal floating point one doesn't, and made a neat discovery.

*Your algorithm fails significantly in every range I test it.*

Here's a simple example: pick a normal loan, say 5%, 5 years, compounded monthly, and check your algorithm for every loan value in $1000 to $2000. Such small numbers, you'd think even your algorithm would work. No int64 overflows in sight.

It fails for 333 values in this range. The first is few are $1000.34, $1006.41, $1007.01; the last few are $1993.71, $1999.18, $1999.78.

Test these :)

In fact, no matter what reasonable rates, compoundings, and time lengths I put, for any range of principals, your routine fails. Try it: pick R,C,Y, a starting P value, then test, add 1 to P, test again, and you will fail over and over and over. The double based method works. Amazing.

Another example, try 7%, monthly, 8 years, $10,000 to $20,000, and you get 6337 errors. Largest failures are at $19,998.43, 19,999.58. Smallest at 10,013.60, 10,014.75.

No failures for the double based code.

Every single test I try like this, yours has a spectacular number of failures, the double has none.

So you can try to add more scaling, which breaks other parts. If you carefully analyze, you can prove that your method will fail for reasonable values no matter what scaling tricks you try and play. For fixed precision you simply will lose too much off the top or from the bottom for rates used in mortgages. You honestly need a sliding accuracy to make it work in 64 bits.

None of these values fail for the double based routine.

On the front of trying to find cases where one routine fails but the other doesn't, I set the random ranges large enough to make both routines fail from time to time to get errors, then checked to see where yours might work and the floating point fails.

Here is what I get from 10,000 runs :)

    total 10000 runs, 
        double failed 4861 = 48.6%, 
        scaled failed 8455 = 84.5%, 
        double failed & scaled didn't 2 = 0.0%, 
        scaled failed & double didn't 3596 = 36.0%

I guess that puts the nail in the coffin, right? Yours fails on every range, and out of this 10,000 random value test, yours failed 3596 times the double one didn't. The double one failed only 2 times that yours didn't. Both failed a lot overall. This test is how I discovered that yours actually fails in places it seemingly should not, like everywhere.

Did you ever test yours?

"My standard, generalized, library routine is equally brief and works for amounts up to $100 billion with any interest rate expressed out to 3 decimals"..... I hope you're not using this for anything real world!

This thread is my new go to for an example when I teach numerical methods stuff, to show people that naïve trying to beat floating point pretty much always fails.

Now it's completely transparent to anyone reading this far why using fixed point is almost always a terrible idea, even for simple things like naïve future value calculations, even when an absolutely certain master of fixed point like yourself claims it and even provides an "algorithm."

I take it you have no code to share? :)

This concludes the testing of the code.

SideQuark · on Oct 27, 2022

Oh, another useful fact for you - you claimed your routine is good up to $100 billion, that it takes input in pennies, and interest scaled by 1000. Your temp scale factor starts as 10,000, and then is multiplied by 10, so your principal in pennies is scaled by 100,000 before the loop.

The first operation in the loop requires computing P * R. For a rate, say of 8.000%, your rate is 8000, so computing P * R is ($100B * 100) * (100,000) * 8000 = 8 * 10^21, which is a 73 bit number (74 bit for signed).

How exactly do you fit this 74 bit computation in a 64 bit signed int again?

Hopefully this helps you understand your scaled integer solution better.

gilbetron · on Oct 24, 2022

In fact, you want to use something like Java's BigDecimal for financial calculations: https://dzone.com/articles/never-use-float-and-double-for-mo...

jqpabc123 · on Oct 24, 2022

Yes, that works --- but you can pay a significant price in terms of efficiency and storage headaches. It all depends on your needs.

64 bit integers fit natively within modern processor registers (read max. possible speed) and are still reasonably efficient even when implemented in 32 bit processors.

My simple suggestions to avoid loss of of significance, comparison decoherence, "mystery penny" problems and storage convenience are these.

- Use 64 bit integers for monetary amounts expressed in pennies. Format as desired for display/printing.

- Use 32 bit integers for annual interest rates with amounts scaled to 3 decimals (x1000). For example, 8.000% would be stored as 8000. 8.259% would be 8259. If you think you need more decimals (personally, I doubt it), go for it --- just be sure your math is scaled appropriately.

- Round and store amounts paid/due to the nearest penny using "banker's rounding". This in itself can cause some minor loss of accuracy (piecewise rounding) which (if any) should be reflected in the "final" payment.

- Temporarily use 4 additional decimals for any multiplication or division intensive calculation. Round and return a final resultant to the nearest penny.

- This generally works just fine for routine calculations involving amounts below $100 billion or so which is probably 99.9% of financial apps. Simple totals can go up to $ trillion range. "BigDecimal" (which is really infinite range integer math with library routines available in many different languages) might be a candidate for the remaining 0.1% but personally; all things considered, I would investigate 96 or 128 bit integer math first if more range is needed. C/C++ and Rust have an int128 type.

Your mileage may vary.

hnews_account_1 · on Oct 24, 2022

You’re misunderstanding the problem. It isn’t storage, it is calculation. You cannot do away with decimals because calculations are not always linear. There are several logarithms, exponents and root finding algorithms used in finance which you’d have to write several additional rules for if you can’t represent decimals. It’s not like mathematicians thought decimals were just a convenience so you don’t have to show large integers. They’re fundamentally not integers.

jqpabc123 · on Oct 24, 2022

As stated below in this thread:

    Perfectly valid use cases for floats do exist --- but 
    these are a lot fewer than what I see being done in the 
    real world.

    Most simple accounting applications involving time and 
    money can be better addressed without them.

hnews_account_1 · on Oct 24, 2022

Your arguments all over this thread are not “simple accounting applications”. Mortgages are ridiculously complicated. Even if they aren’t, the simplest of prepayment models requires exponents and fractions all of which will fail spectacularly if your system used just a fixed decimal range. When you first mentioned it in this thread, I thought you meant like simple surface level applications or accounting things, not heavily complex stuff like mortgages or derivatives. The latter require precise math in their (very non linear) modeling. You cannot eschew floats in that case.

jqpabc123 · on Oct 25, 2022

Mortgages are ridiculously complicated.

All simple addition, subtraction, multiplication and division. This stuff was invented long before computers --- like way back in ancient Babylon.

will fail spectacularly if your system used just a fixed decimal range.

Really? Wonder how the US economy gets by on a fixed decimal range? Have you ever paid for anything in fractions of a penny?

You cannot eschew floats in that case.

I'm sure it's possible to craft a "derivative" that no one understands using higher level math. But accounting and mortgages are all built around simple 4 function arithmetic. All you really need to do is insure you have enough range for penny accuracy and you can stumble by somehow.

See the routine I posted above in this thread.

NOTE: Nothing prevents you from using floating point math in an isolated routine if you think you really need it.

My real objection is using FP everywhere, including storage. This creates lots of potential problems that can be avoided by simply using integers. And with 64 bit processors, the required math is very efficient.

hnews_account_1 · on Oct 26, 2022

Mortgages are not simple mathematical operations. Don’t betray your ignorance of modern finance. What the other guy gave you was a simple calculation of a mortgage rate. Actual financial modeling (which is done day in and day out and uses more compute cycles than you can imagine) cannot be done by reducing it to 4 operations. If you think modern mortgages are what the Babylonians worked with, you might as well call a computer a clay tablet.

> paid anything in fractions of a penny?

Yes. Fx operations routinely require sub penny operations. You’re claiming, as someone on a moderately technical (if heavily ignorant) forum, that you’re going to divorce your system for settling with mom and pop outfits who deal with nothing less than a penny - from the system that the same bank may need to use to run highly advanced operations (on the settlement layer)? If so, I don’t know what to tell you except ask you to understand how money works.

You cannot stumble by on low end systems. There are incredibly complicated operations a bank does every single day in volumes of thousands that are not “complicated derivatives” which require much much more than the 4 basic operators. If finance was that fucking easy, any moron could do it. I’m not saying it is particularly difficult but you seem to think finance is accounting. It is not. It is almost anything but accounting.

You’re still missing the point. Sure in storage it may make sense to cut it off at 2 decimals. But you seem to have a bad understanding of financial math in general where you think most of the operations are just the 4 basic operators. While it is strictly true, this is like saying most coding it just loops, so we can destroy the rest of the scaffolding and just put loops everywhere. No need for abstractions. That just sounds ignorant.

SideQuark · on Oct 26, 2022

>See the routine I posted above in this thread.

See my reply to it showing an incredible amount of errors.... I bet you work in crypto pricing, right? From your claims it is clear you do not even understand how your code works (claiming it's using int64, when I clearly showed it's using arbitrary sized integers since your intermediate results don't fit in int64).

>All you really need to do is insure you have enough range for penny accuracy

Which your code woefully does not do...

SideQuark · on Oct 26, 2022

>but you can pay a significant price in terms of efficiency and storage headaches

Yet the code you posted above as using int64 does not in fact work using int64, it requires arbitrary sized integers as I demonstrated above.

Did you even check your code? Your example above has the (P*R) term overflowing leading to nonsense results.

Man this is funny - you somehow thought you were using int64 in your calcs but you were not :)

hnews_account_1 · on Oct 22, 2022

Yep. 4 decimals is plenty for every application and as you said, just multiply it by a million if not. I was more addressing the issue that on some online forums people tend to think finance needs specialized 2 decimal software. Only a fraction of finance does. Most others just use the usual IEEE 754 specification that is standard in modern programming languages and OSes for their daily use. Heck, excel uses IEEE 754 for most cases and like 95% of finance runs on excel.

di4na · on Oct 22, 2022

That sound nice until you start dealing with all kind of algorithm like hyperloglog.

jqpabc123 · on Oct 22, 2022

Perfectly valid use cases for floats do exist --- but these are a lot fewer than what I see being done in the real world.

Most simple accounting applications involving time and money can be better addressed without them.

conaclos · on Oct 22, 2022

It is why we decided [2] to exclude floats as map key in bare [1].

Go follows the standard. However, this creates an issue [3]. Indeed, NaN can map to an infinite number of values. These values cannot be accessed. +0 and -0 designates a unique key.

JavaScript normalizes -0 to 0, and NaNs are normalized to a canonical NaN. This makes possible to use NaN as a key in a map.

Other implementations could rely on binary comparison.

This is a lot of pain to deal with. It is better to avoid float comparison or to restrict floats in some way.

[1] https://baremessages.org

[2] https://lists.sr.ht/~sircmpwn/public-inbox/%3CCAFFTG-a-Vci%2...

[3] https://github.com/golang/go/issues/20660

danybittel · on Oct 22, 2022

What helped me better understand floats, was playing around with them in their bit representation.

For example, if you go to https://www.h-schmidt.net/FloatConverter/IEEE754.html and enter 1.0. Then use +1 and -1 to go through ULPs.

stabbles · on Oct 22, 2022

Another nice tool is the Julia repl:

    julia> 1.4 + eps(1.4) === nextfloat(1.4)
    true

    julia> bitstring(1.4)
    "0011111111110110011001100110011001100110011001100110011001100110"

    julia> bitstring(nextfloat(1.4))
    "0011111111110110011001100110011001100110011001100110011001100111"

OscarCunningham · on Oct 22, 2022

The site I really like for exploring the binary representation of floats is https://float.exposed.

tomxor · on Oct 22, 2022

I think this is the first time I've seen a simple, approachable article that makes correct use of machine epsilon (specifically treating it as a _relative_ error)...

It's unfortunately very common for people to misguidedly treat it as an absolute error which makes no sense if you pause to think how it's derived and what it really means (the limit of the significand window, aka mantissa, i.e 2^-52 for a double)... that and the fact that if it's a constant it cannot possibly be an absolute error when the exponent is separate from the significand. There are libraries that have implemented conditions with macheps as an absolute, it's on my todo list to try convince the maintainers of such libraries that's it wrong, but it's a tricky subject to explain while also being massively underestimated by most, so I don't relish it - combined with the fact that checking against an arbitrarily very small absolute does happen to fix a lot of comparison errors - just not for the reason they think, and not reliably.

magicalhippo · on Oct 22, 2022

Floating-point numbers are tricky[1].

One point which often bites people which isn't highlighted in this article, is that due to having an explicit sign bit 0 and -0 have different encodings, but in most cases you'll want them to compare equal. So just comparing the bits won't do.

IIRC even the .Net CLR had this issue way back in the 1.1 days.

For some problems, it can often make sense to switch to arbitrary precision numbers, at least for some parts of the calculations.

[1]: https://floating-point-gui.de/

maweki · on Oct 22, 2022

You mean back in the 1.099999904632568359375 days?

arunc · on Oct 22, 2022

Probably back in 1.100000004632568359375 days.

Wait, when did HN become reddit?!

Agingcoder · on Oct 22, 2022

I do enjoy the occasional joke on hn - for reasons I don't understand they seem to be frowned upon.

vbezhenar · on Oct 22, 2022

> One point which often bites people which isn't highlighted in this article, is that due to having an explicit sign bit 0 and -0 have different encodings, but in most cases you'll want them to compare equal. So just comparing the bits won't do.

How does JS do it? (0 / -1) === (0 / 1) is true.

TAForObvReasons · on Oct 22, 2022

    const n = 0/-1, p = 0/1;
    n === p; // true
    1/n === 1/p; // false
    1/n === -Infinity; // true
    1/p === Infinity; // true
    Infinity === -Infinity; // false

IshKebab · on Oct 22, 2022

And because of that `x + 0` is not always equal to `x`.

whycombinetor · on Oct 22, 2022

Am I the only person who thinks of IEEE 754 as a completely well-defined data structure with well-defined arithmetic operations, where two numbers are equal iff their bits are equal? Tiny discrepancies don't randomly appear during float operations, and two different black box systems written from the same order of operations spec sheet (given proper compiler strict math settings) should be able to output the exact same float value given the same input.

tomxor · on Oct 22, 2022

> Am I the only person who thinks of IEEE 754 as a completely well-defined data structure

Not at all... but I think there are a large number of people who think they can imagine a better format and yet don't fully understand or appreciate much of the design of IEEE754, and they tend to be quite noisy.

I think the lack of appreciation is ironically due to how damn good a format it is. It manages to hide the myriad deficiencies through such a good balance of compromises that it casts this illusion that perfect fp math is possible in a generalisable enough way. So on the occasions it surfaces the true reality, people assume it must be some kind of defect that is easily addressable with a new format; rather than an intentional decision to maximise correctness and utility by carefully choosing where and how to be incorrect. FP math is the most underestimated areas by programmers IMO, it's just taken for granted so much, which is probably a testament to how well IEEE754 works.

constantcrying · on Oct 23, 2022

You are absolutely right. So many times I have seen people complain about this or that "flaw" in floating point arithmetic and proposing this or that alternative system. It always makes me question whether they have actually tried using these alternatives...

Giving people the illusion that they can work with real numbers on a computer by itself is pretty magical. But the fact that 32bits of data are not enough to hold information about all real numbers and that floats aren't actually real numbers should not be surprising. To me it seems most people don't even understand what floating point numbers are, if you know a little bit all the "weird" behavior starts to make sense and you understand why it has to be that way and how much thought and genius has been put into this datatype.

racingmars · on Oct 23, 2022

I'm not sure what you mean by this. It is true that two compliant systems will produce the same output for the same sequence of inputs, but that isn't the entire scope of when the concept of "equals" might matter.

> Tiny discrepancies don't randomly appear during float operations

They don't "randomly" appear, because as you say IEEE 754 is well-defined, but discrepancies _do_ appear, and if you care about the concept of equality, they certainly matter.

Take a trivial example in C:

  #include <stdio.h>
  
  int main(void) {
      volatile double a, b, c, d, e;
  
      a = 5.0;
      b = a/3.0;
      c = b/3.0;
      d = c/3.0;
      e = d*27.0;
  
      if (a==e)
          printf("EQUAL\n");
      else
          printf("NOT EQUAL\n");
  
      return 0;
  }

I think everyone would agree that 5 / 3 / 3 / 3 * 27 = 5, or in general that a / 3 / 3 / 3 * 27 = a. But this program, when executed on a system properly implementing IEEE 754, will report that a and e are not equal. Their bits are different.

Hence why when you're talking about comparing equality of IEEE 754 values, you can't just do a comparison of the bits in storage.

You're right that all (correctly implemented) systems will accumulate errors in the same way (assuming you've selected the same rounding mode, etc.), but I don't see how that negates the need to handle the fact that the representation in memory of e will not equal a even though they _should_ represent the same value. Given the sequence of operations performed, how much "slop" should you allow when comparing a to e? Certainly more than 0 which a bit-by-bit comparison implies.

whycombinetor · on Oct 23, 2022

It's no "error" (ha) that you can't exactly represent 5/3 in base 2 number system - it can only represent numbers of the form (-1)^a*2^b*c for integer a,b,c. If you are trying to perform repeated calculations that accumulate some arithmetic error at every step, then ieee754 probably isn't the best data type for the task at hand. Fractional calculators (which allow exact representation of fractions like 5/3 without being limited by base or precision) or "constructive reals" - or simply keeping track of the numerator and comparing if the sum of the numerators is equal to the denominator - would avoid such issues. As another reply to my comment said, it is about using the right data type for the task, and float is a pretty good compromise between many different needs but isn't appropriate for literally everything.

Someone · on Oct 22, 2022

> where two numbers are equal iff their bits are equal?

That’s not true. One of the features of IEEE floats is the existence of NaNs that aren’t (according to the standard) equal to themselves.

adgjlsfhk1 · on Oct 22, 2022

There are 2 problems with this. The first is that CPUs want to do out of order execution and vectorization, so not allowing reordering can come at a significant cost. The second problem is that different LIBMs and cpus will change the result of transcendental functions like exp.

macrolocal · on Oct 22, 2022

Not at all, eg. Knuth's TAOCP, Vol. 2, Section 4.2.2 A.

bicsi · on Oct 22, 2022

The story is much worse than what is presented in the article, especially when talking about floating point errors that add (or rather multiply) up.

More often than not, the error is relative wrt the greatest magnitudes in the intermediary calculations. In essence, if you subtract two floating point numbers, you’re kinda screwed, because you cannot ever handle with good precision cases like A - B, where A and B are big enough numbers. Not to mention more complicated operations like trig functions.

In my opinion, one should avoid floating point as much as possible. And not only when testing for equality (all comparisons suffer from this).

Or, of course, ignore FPEs and proceed at your own risk.

constantcrying · on Oct 23, 2022

This is not the issue. Floating point numbers have problems when A and B differ greatly in magnitude, then A-B might easily be equal to A and so ((A-B)-A)LARGENUMBER can still be equal to zero, even if B was e.g. 1.

This is not a problem with floats. This is an inherent* result of the design constraints. You can not fix it and there are no alternatives without this flaw which offer any of the same benefits as floats.

Any operation between two floats is the floating point number which is closest to the result calculated as real numbers, that is the magic of floats.

Floating point numbers are a near magical type, which for many applications are not only a good choice, but the only one which makes any sense. It is near impossible to imagine modern engineering without them.

tomxor · on Oct 27, 2022

Indeed. The only way to accommodate numbers where the difference in order of magnitude exceeds the type's significand length is to not combine them, keep them separate and operate on them separately, combining them only for a sample.

One good common example is numerical integration. In any sufficient fine grained posteriori simulation, even with modest limits on position - delta velocity can be too small to preserve when adding to position, i.e when delta velocity is more than 2^52 smaller than position. Keeping a separate accumulator is the only way to handle this without arbitrarily increasing precision with software FP.

throwamon · on Oct 22, 2022

Pyret has a pragmatic way of dealing with what they call "roughnums":

https://www.pyret.org/docs/latest/equality.html

everybodyknows · on Oct 22, 2022

Also posted today, What Every Computer Scientist Should Know About Floating-Point Arithmetic:

https://news.ycombinator.com/item?id=33299829

Links to an excruciatingly thorough treatment of the subject by David Goldberg:

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...

colinsane · on Oct 22, 2022

previous discussion: https://news.ycombinator.com/item?id=13998564

should have the 2017 tag.

hi Matt :)

archi42 · on Oct 22, 2022

Came to say "when (not) to use floats is actually the tricky part", read the article, and still say it. Not that the article is wrong, but I think that's just one of the minor issues when people use floats. If engineers would only use floats once they properly understood the concepts, they'll naturally start to wonder how to properly do comparisons (especially for equality).

To illustrate, here is some gold a friend recently showed to me: `float i = 1.F; while(i <= 10.f) { do_something_important(array[(int)i]); i+= 1.f; }`. We both died a little bit (though maybe only I did, because he sees stuff like that too often).

OscarCunningham · on Oct 22, 2022

Can someone give an example of a time you would actually want to compare floats for equality? Every time I've come across it there's always been a better way.

jrootabega · on Oct 22, 2022

As I understand the principles in the article:

- The representable values are so dense near zero that there will be less practical difference between strict equality and approximate equality. Strict equality would be theoretically significant if you found it, but not really necessary to check for. You'll see mostly false negatives.

- The representable values are so sparse far away from zero that strict equality isn't really valuable theoretically or practically. You'll see mostly false positives.

So perhaps there is a region (the location and size of it depending on what you're modelling), where strict equality is just likely enough that you need to check for it, AND that the ULP at that point is meaningful for your application. But philosophically, in that situation, are you really comparing them for equality, or does your epsilon just happen to be zero? Would it be better to interpret your question as: are there situations where you want to check for strict equality no matter what either of the values are?

VBprogrammer · on Oct 22, 2022

I think the most common case, though it's wrong in a multitude of ways, is when you accidentally end up with a float representing a money amount.

Things which are genuinely representable as a float value testing for equality is very rarely useful.

cm2187 · on Oct 22, 2022

I suspect this is what was causing some linear optimisation libraries (simplex style) to fail to find a solution when I had a problem with inputs in range 0.0-1.0, but outputs expressed in hundreds of billions. If I scaled down my problem by dividing all the coefficients by one million, I am solving the same problem mathematically, but now those solvers are then stable and successful. I suspect they must be using some fixed epsilon to determine if the solved solution is close enough to the objective.

jillesvangurp · on Oct 22, 2022

I have some geospatial algorithms in my geogeometry library. Unit testing those, you often need to compare coordinates. A good way to do that is by calculating their distance. Equality is really not that relevant with many of these algorithms. Almost everything that you do to those numbers has rounding errors.

If you think about it, people use 64 bit doubles for representing latitude and longitude. They don't have a lot of precision. A second of a degree is about 25-30 meters (depending on where you are. that's 1/3600th of a degree. So five decimal points gets you there. With 64 bit numbers, you can probably get to cm/mm level accuracy. Which is plenty considering GPS is not accurate to more than about 5 meter.

And distance algorithms are tricky too; they are not that exact. The earth isn't a perfect sphere.

So, you are dealing with an inexact representation of doubles that weren't very exact to begin with and the algorithms add more errors. The solution is to not compare equality but distance with some margin of error. So, subtract your expected and actual values and assert that their difference should be less than whatever margin of error is acceptable to you.

adgjlsfhk1 · on Oct 22, 2022

note that this is a perfect example where you should use scaled integers. you'd get 4000x more precision.

jillesvangurp · on Oct 23, 2022

Data comes as doubles. No choice in the matter there. Conversions just add to the problem.

philh · on Oct 22, 2022

SQL queries sometimes put floats in GROUP BY. E.g. if you have a many-to-one relationship you might do a query like

    SELECT foo_id, foo.some_float, SUM(bar.some_thing)
    FROM foo JOIN bar USING (foo_id)
    GROUP BY foo_id, foo.some_float

I feel kinda dirty whenever I do this.

Though, I would guess the optimizer (at least in postgres) is smart enough to ensure no float equality checks actually happen under-the-hood. They could be necessary, if the schema was different than I'm imagining; but maybe in that case, it would almost always be a bad idea.

cm2187 · on Oct 22, 2022

Why would you feel dirty? In this case it is solving for exact equality, ie the same bits, it doesn't matter that the value is a float.

Though I have seen some people using a double as a primary key (no idea why) and some database engine (internal, not major vendor) failing to do equality comparisons in certain statements, I suspect because they must be switching to "close enough" which is not what you expect when you write col1 = col2.

blep_ · on Oct 22, 2022

This is also really kind of an artifact of how GROUP BY works in most database engines.

I've always liked the way MySQL/MariaDB let you omit things from the GROUP BY if they're provably unique in each group (here, if foo_id is a primary key of foo, and you're grouping by it, there can only ever be one foo.some_float for each foo_id).

I suspect in practice this would get rid of approximately all occurrences of group-by-float.

jbay808 · on Oct 22, 2022

Perhaps when you are unit testing a math function on a known input, for a specific platform, and there is a specific value that you are expecting it to return; anything else is an error.

enriquto · on Oct 22, 2022

> Can someone give an example of a time you would actually want to compare floats for equality?

When they are integers. By construction, floats provide exact arithmetic for small ints. It makes sense to compare those for equality.

Also, when defining a ``sign'' function, you may want to treat the equality case separately, to avoid introducing biases when the input data is quantized.

cm2187 · on Oct 22, 2022

Also should a negative zero be strictly equal to zero?

enriquto · on Oct 22, 2022

They have to be different if you want to have positive and negative infinity.

cm2187 · on Oct 22, 2022

yes, but if you want to check that a-b = 0.0, you want 0+ to be equal to 0-.

kps · on Oct 22, 2022

In IEEE754, yes.

cypress66 · on Oct 22, 2022

In gamedev many things are floats, and exactly 0.0f can mean basically "null". It's not uncommon to see x != 0.0f. Other than 0.f, basically any other value if for some reason has to be compared, it's done with some margin of error like abs(x - y) < 0.0001f

NovemberWhiskey · on Oct 22, 2022

e.g. I'm writing a regression test for a derivative pricing function.

cygaril · on Oct 22, 2022

I've done that many times and never used exact equality comparisons.

If you do exact comparisons for any non-trivial cases, you'll find different compilers, optimization settings, runtimes, and processors give different results.

andreygrehov · on Oct 22, 2022

I'm not familiar with all the floating-point number gotchas, but would like to ask why an obvious solution is not used (or maybe it is?): convert both numbers to a string and compare the result? This might be a naive question, but still.

Someone · on Oct 22, 2022

Converting a IEEE float to a string is tricky, too, IMO more tricky.

It would be a lot slower, too, and wouldn’t work. How do you define “compare the result”? The strings “0.0” and “-0.0” would have to be equal to each other, and the string “NaN” not equal to itself, “1000” larger than “0.999E3”, etc.

constantcrying · on Oct 23, 2022

It is slow and doesn't really solve the problem. Representing the numbers in base 10 does not make it any clearer whether the numbers are equal.

travisjungroth · on Oct 22, 2022

I try to not worry about if my method of comparing floating point numbers is perfect. I figure close enough is close enough.