This is not directly relevant, but Power ISA 3.1 (from May 2020) adds 128-bit division instructions (vdivuq and vdivsq) with a maximum latency of 61 cycles. I don't have access to a Power10 machine to see how that compares with what's presented here, but I thought it was an interesting addition to the ISA.
https://wiki.raptorcs.com/w/images/f/f5/PowerISA_public.v3.1... https://files.openpower.foundation/s/EgCy7C43p2NSRfR