Oh good, that will be fun. Here's C++ code #include <iostream> #include <cstdint...

jqpabc123 · on Oct 27, 2022

If you cannot fix it, care to explain how your code works only with int64 as you claim here and in other threads on this page?

Yes, your code pukes all over itself. And mine doesn't. Why?

For more than 2 decades now, Intel processors have included SSE "extensions" with a whole bank of 128 bit registers (XMM0 thru XMM15) with specialized math instructions for integer and floating point.

https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

The compiler I use emits SSE opcodes by default for operations on 64 bit integers when building 64 bit executables. In other words, 128 bit processor registers are being used for the calculations. Overflow occurs when the final resultant is too large for int64.

So there you have it.

SideQuark · on Oct 27, 2022

>For more than 2 decades now, Intel processors have included SSE "extensions" with a whole bank of 128 bit registers (XMM0 thru XMM15) with specialized math instructions for integer and floating point.

That's interesting, since Intel did not add 128 bit wide integer math in SSE. Those 128 bit registers were SIMD, meaning multiple data, meaning at most 64 bit integers. Later extensions (not two decades back) added larger registers. I wrote the article for Intel for the release for AVX in 2011 [2], where Intel expanded the registers to 256 bit (still no 128 bit integer math). But there has certainly not been 128 bit wide integer operations for two decades. There has been 128 bit registers split into at most 64 bit sized components, the M in SIMD. I also wrote decompilers for Intel binaries, and just looked through that code, and again, no 128 bit integers. Are you confusing 128 (or 256 or 512) bit wide registers that are split into components as actually 128 or 256 or 512 bit integer registers? Are you making stuff up?

Intel intrinsics denote register size and SIMD size. For example, _mm256_add_epi16 means use 256 bit registers, and add packed integers of 16 bit size. There are no _epi128 intrinsics, only size 8,16,32,64 [1]. Another interesting place to look for these 128 bit integer instructions is [3]. Lots of IntN for N = 8,16,32,64, none for N=128. Here's [4] the Intel Software Development Manual from April 2022.... Also not seeing them. Section 4.6.2 lists supported data types - not a single 128 bit integer on that page. I don't see them in the AVX and other extension sections either.

So I'm super interested in your compiler and language that automatically emits 128 bit wide integer SIMD instructions for Intel, since they are not in the opcode or intrinsic lists. Please name the language and compiler, and even post some working code to demonstrate this auto extension to 128 bit math.

And, if you're using 128 bit registers, why would you pick 64 bit math, which fails for all the cases above? You still have not addressed that any size register fails on the examples I posted above, including your auto-extending non-portable compiler tricks.

For example, $100000,8000,365,100 fails even on 128 bit, 256 bit, even infinite bit length registers. Because your algorithm itself is bad.

So, care to post your compiler, language, and code? Also, why did you keep telling us it was 64 bit when it wasn't?

[1] https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

[2] https://hpc.llnl.gov/sites/default/files/intelAVXintro.pdf

[3] https://www.felixcloutier.com/x86/index.html

[4] https://www.intel.com/content/www/us/en/developer/articles/t...

jqpabc123 · on Oct 27, 2022

For example, $100000,8000,365,100 fails even on 128 bit, 256 bit, even infinite bit length registers. Because your algorithm itself is bad.

Really? So now we've progressed from $100 thousand to $100 million to the national budget?

Anything that can't handle the national budget is "bad"?

Every algorithm "fails" when pushed beyond it's limits --- even the ones you use based on double precision floats but they do so silently by losing precision in the mantissa which is only 52 bits.

Out of sight, out of mind don't mean it's always "right". By the standard you're applying, your own algorithm itself is equally "bad".

SideQuark · on Oct 27, 2022

So, where are your 128 bit SSE instructions? What compiler? What language?

Interestingly, Intel's own compiler, when operating on int128, does not emit these instructions you claim exist (you can check it on Godbolt.org and look at disassembly). Maybe you should tell them about these instructions.

Why does your routine fail for simple cases that the floating point does not?

Please stop deflecting. Can you post code, compiler, and language or not?

>Really? So now we've progressed from $100 thousand to $100 million to the national budget?

That example is for a $100,000 future value where your algorithm fails. It is not national budget.

Did you even try the examples I demonstrated where your algorithm fails?

>By the standard you're applying, your own algorithm itself is equally "bad".

Yet it's incredibly faster, does not rely on lying about mythical instruction sets, and handles simple cases yours didn't, even cases you claimed yours did handle.

Oh, and it uses honest 64 bit hardware.

So, code and compiler to demonstrate your SSE claims, or this thread has demonstrated what I expected it to.

SideQuark · on Oct 29, 2022

Ah, so no reply on your compiler and language that makes 128 bit SSE code? Makes sense, since the instructions you claimed to use don't exist.

I wanted to test to see if I can even find cases where your algorithm works but the normal floating point one doesn't, and made a neat discovery.

*Your algorithm fails significantly in every range I test it.*

Here's a simple example: pick a normal loan, say 5%, 5 years, compounded monthly, and check your algorithm for every loan value in $1000 to $2000. Such small numbers, you'd think even your algorithm would work. No int64 overflows in sight.

It fails for 333 values in this range. The first is few are $1000.34, $1006.41, $1007.01; the last few are $1993.71, $1999.18, $1999.78.

Test these :)

In fact, no matter what reasonable rates, compoundings, and time lengths I put, for any range of principals, your routine fails. Try it: pick R,C,Y, a starting P value, then test, add 1 to P, test again, and you will fail over and over and over. The double based method works. Amazing.

Another example, try 7%, monthly, 8 years, $10,000 to $20,000, and you get 6337 errors. Largest failures are at $19,998.43, 19,999.58. Smallest at 10,013.60, 10,014.75.

No failures for the double based code.

Every single test I try like this, yours has a spectacular number of failures, the double has none.

So you can try to add more scaling, which breaks other parts. If you carefully analyze, you can prove that your method will fail for reasonable values no matter what scaling tricks you try and play. For fixed precision you simply will lose too much off the top or from the bottom for rates used in mortgages. You honestly need a sliding accuracy to make it work in 64 bits.

None of these values fail for the double based routine.

On the front of trying to find cases where one routine fails but the other doesn't, I set the random ranges large enough to make both routines fail from time to time to get errors, then checked to see where yours might work and the floating point fails.

Here is what I get from 10,000 runs :)

    total 10000 runs, 
        double failed 4861 = 48.6%, 
        scaled failed 8455 = 84.5%, 
        double failed & scaled didn't 2 = 0.0%, 
        scaled failed & double didn't 3596 = 36.0%

I guess that puts the nail in the coffin, right? Yours fails on every range, and out of this 10,000 random value test, yours failed 3596 times the double one didn't. The double one failed only 2 times that yours didn't. Both failed a lot overall. This test is how I discovered that yours actually fails in places it seemingly should not, like everywhere.

Did you ever test yours?

"My standard, generalized, library routine is equally brief and works for amounts up to $100 billion with any interest rate expressed out to 3 decimals"..... I hope you're not using this for anything real world!

This thread is my new go to for an example when I teach numerical methods stuff, to show people that naïve trying to beat floating point pretty much always fails.

Now it's completely transparent to anyone reading this far why using fixed point is almost always a terrible idea, even for simple things like naïve future value calculations, even when an absolutely certain master of fixed point like yourself claims it and even provides an "algorithm."

I take it you have no code to share? :)

This concludes the testing of the code.