
Hi! I can’t help but see a big similarity to the JVM, both in terms of design goals and bytecode semantics, with the lack of dynamic linking being the only exception.

Could you expand on what you find insufficient in JVM bytecode, when it is arguably one of the best platforms for backwards compatibility, makes it easy to bring up a canvas and start painting, etc.?



For one, the JVM is a huge piece of software. Large enough that only a large corporation could realistically reimplement or maintain it. It also exposes many APIs with a large surface area. Then there's the issue of Oracle and how you feel about them as a company.

UVM obviously has nowhere near the ecosystem, but you can draw pixels to a frame buffer with two function calls, and your UI is guaranteed to look the same everywhere.


The JVM is a specification that can be implemented (and has been, plenty of times, completely independently) by a single developer in half a year, tops. It is a simple stack machine with 100+ basic instructions, a simple exception mechanism, and a heap. A GC is not even necessary if we are talking about minimalistic approaches, but a basic tracing GC is not hard either.
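To make the "simple stack machine" concrete, here is a minimal Python sketch of such a dispatch loop. The opcode names are only loosely modeled on the JVM's iconst/iadd family; a real implementation has ~200 opcodes plus a class loader and heap:

```python
# Minimal sketch of a JVM-style stack-machine interpreter core.
# Opcode names are illustrative, loosely modeled on the real JVM's
# iconst/iadd/ireturn family.

def execute(code):
    """Run a list of (opcode, operand) pairs and return the top of stack."""
    stack = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "iconst":      # push an int constant
            stack.append(arg)
        elif op == "iadd":      # pop two ints, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "imul":      # pop two ints, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ireturn":   # return the top of the operand stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
        pc += 1

# (2 + 3) * 4
program = [("iconst", 2), ("iconst", 3), ("iadd", None),
           ("iconst", 4), ("imul", None), ("ireturn", None)]
```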

It is only a (huge) plus that it can run everywhere with top-of-the-line performance thanks to OpenJDK (which is mostly developed by Oracle, but is big enough that an insane number of companies critically depend on it, and several of them could single-handedly finance the future of the platform if anything were to happen, which won’t, because it has the same license as Linux).


It's not nearly as easy as you make it sound. To start with, there are 204 instructions, some of them far more complex than what you term "basic," such as invokedynamic. The exception mechanism is also far from "simple": it's simple conceptually, but extremely difficult to get exactly right when finally clauses are involved both in the exception handler and in the original excepting code. There are many subtleties that can lead to completely wrong results if the design isn't done very, very carefully.


Out of those 204, though, plenty are the exact same functionality for different types.

Sure, invokedynamic/invokestatic/invokevirtual are a bit more complicated (they basically do runtime linking on first execution), but I have implemented them, and they are not harder than other pieces of a runtime.
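A rough sketch of that "runtime linking on first execution" idea, with hypothetical names (not the JVM's actual internal structures): the call site does one expensive lookup, then caches the resolved target for every later call.

```python
# Sketch of link-on-first-run: a call site resolves its target once,
# then caches it. Class and structure names here are hypothetical.

class CallSite:
    def __init__(self, class_name, method_name):
        self.class_name = class_name
        self.method_name = method_name
        self.target = None          # filled in on first execution

    def invoke(self, runtime, *args):
        if self.target is None:     # first run: do the (expensive) lookup
            self.target = runtime.resolve(self.class_name, self.method_name)
        return self.target(*args)   # later calls use the cached target

class Runtime:
    def __init__(self):
        self.methods = {}
        self.resolutions = 0        # count lookups to show caching works

    def resolve(self, cls, name):
        self.resolutions += 1
        return self.methods[(cls, name)]

rt = Runtime()
rt.methods[("Math", "abs")] = abs   # host function stands in for bytecode
site = CallSite("Math", "abs")
first = site.invoke(rt, -5)
second = site.invoke(rt, 7)         # no second resolution happens
```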

Every method has an exception handler table, which is basically a series of instruction address ranges: if the thrown exception came from one of them, execution jumps to the handler specified by the first match. If not, the exception propagates up the call stack. The "finally" clause is just syntactic sugar.
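The lookup described above can be sketched like this (the entry layout is simplified; the real class-file format stores ranges and a catch-type index in the method's exception table):

```python
# Sketch of an exception-table lookup. Each entry covers an instruction
# address range [start, end) and names a handler address plus the
# exception class it catches (None = catch-all, which is how "finally"
# blocks are typically compiled).

def find_handler(table, pc, exc_class):
    """Return the handler address of the first matching entry, or None
    to signal that the exception propagates to the caller."""
    for start, end, caught, handler_pc in table:
        if start <= pc < end and (caught is None or
                                  issubclass(exc_class, caught)):
            return handler_pc
    return None

table = [
    (0, 10, ValueError, 40),   # catches ValueError raised in [0, 10)
    (0, 20, None, 50),         # catch-all ("finally") for [0, 20)
]
```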

Sure, these are hard to get right, but that is inherent in the domain to a degree. You need many, many "integration" tests; I wrote a test runner that ran the same program with OpenJDK and with my implementation and compared their outputs.


Implementing a good GC is incredibly hard. The JVM may have just "100+ basic instructions", but it also has classes, objects, arrays, and a whole set of APIs it provides. Your JVM is kind of useless if it doesn't ship with all of the user interface primitives (and other APIs/classes) people expect, for instance. Otherwise what you have is not what people expect to find in a JVM.

I'm also under the impression that building a good JIT for a JVM would be a massive undertaking. It literally took over a decade for the Sun/Oracle JVM's JIT to become mature enough.

I've designed UVM in a way that I believe will make it possible to build a good JIT with relatively little effort.


> Implementing a good GC is incredibly hard

If you are interpreting instructions, a simple one is more than enough. Classes are its primitives: you just create a basic runtime representation for them with the name, superclass, implemented interfaces, and the methods’ bytecode. An object can then be as simple as a header containing a pointer to the class’s representation, followed by a listing of its fields, which can all be 64 bits. An array can have the exact same representation, with the first element being its size.
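A sketch of that uniform layout, assuming everything fits in 64-bit slots (Python lists stand in for raw heap words; the names are illustrative):

```python
# Sketch of a uniform heap layout: every object is a header word
# (pointer to the class's runtime representation) followed by 64-bit
# slots. Arrays reuse the same layout, with slot 1 holding the length.

class ClassInfo:
    def __init__(self, name, superclass, field_names):
        self.name = name
        self.superclass = superclass
        self.field_names = field_names   # fixed order -> fixed slot index

def new_object(cls):
    # [header, field0, field1, ...] -- each entry stands in for 64 bits
    return [cls] + [0] * len(cls.field_names)

def new_array(array_cls, length):
    # [header, length, elem0, elem1, ...]
    return [array_cls, length] + [0] * length

point = ClassInfo("Point", None, ["x", "y"])
p = new_object(point)
p[1 + point.field_names.index("x")] = 3   # write field "x"

int_array = ClassInfo("[I", None, [])
arr = new_array(int_array, 4)
```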

All APIs are just classes with some methods that may be "native", i.e. linked to a native implementation (basically just a function pointer). This is how file access and the like become possible.
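A minimal sketch of such a native-method table, with illustrative class and method names (the real JNI machinery is considerably more involved):

```python
# Sketch of native-method linking: a table maps a (class, method) pair
# to a host function, which plays the role of the "function pointer".
import os
import time

native_table = {
    ("java/io/File", "exists"): os.path.exists,
    ("java/lang/System", "currentTimeMillis"):
        lambda: int(time.time() * 1000),
}

def call_native(cls, name, *args):
    """Look up the host function for a native method and call it."""
    return native_table[(cls, name)](*args)
```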

> Otherwise what you have is not what people expect to find in a JVM

You can just say that it is a partial implementation that doesn’t support the whole Java standard library. That’s not unheard of (e.g. Java ME is a subset that runs on every SIM and bank card).

> I'm also under the impression that building a good JIT for a JVM would be a massive undertaking

Well, then just go with an okay-ish JIT. With all due respect, you ain’t going to beat the JVM with your UVM’s JIT compiler, not even close. Why do you think creating a similarly good JIT compiler for a very similar design would be any easier in the case of UVM?

But don’t get me wrong: I only ask these questions because I dislike NIH syndrome and believe there are useful lessons to be learned from the past. You should be able to answer why the thing you’re building is different (unless it is for learning). And the JVM spec is a surprisingly good read; you can certainly take great ideas from it.


> With all due respect, you ain’t going to beat the JVM with your UVM’s JIT compiler, not even close.

I think I may be able to get very close to native performance. I don't want to sound like an asshole by appealing to authority, but you aren't talking to a teenager writing an interpreter from their parents' basement. I have 21 years of programming experience, a PhD in compiler design, and multiple published papers. I have some idea of what I'm talking about.

> Why do you think creating a similarly good JIT compiler to a very similar design would be any easier in case of UVM?

The design is superficially similar to the JVM's, but it's also quite different. UVM's bytecode is untyped, and it maps fairly directly to the x86-64 and ARMv8 instruction sets. If you want an idea of how a simple JIT compiler for a bytecode like that can perform, look at the performance of Apple's Rosetta. But I actually think I can build something that yields better performance than that :)


You say this stuff is incredibly hard. Kaba says it's impossible and don't try. I say it's easy. I whipped up a JIT to make my virtual machine go 50x faster. I never expected Blinkenlights to go faster than Bochs. Now all of a sudden it's outperforming QEMU for many of my use cases. People are doing stuff with it I never expected, like booting the Linux kernel and running Alpine Linux on Cygwin.

Garbage collection is easy too. I wrote a GC using the NSA POPCNT instruction for an experimental LISP dialect I wrote last winter called Plinko. It ran faster than any other LISP interpreter I've seen, as measured by the GC-intensive binary-trees benchmark game. The only thing faster was SBCL with JIT, which was only a hair faster than Plinko using just an interpreter.

NIH is awesome because the truth is, when you're focused on your own needs, outperforming the big official things is like shooting fish in a barrel. Technologies like the JVM aren't great because they're better. They're great because they've carefully crafted the long tail of edge cases and compromises that enables them to be good enough for the largest group of people. Generalized software is at a huge disadvantage because bloat fills caches and it can't use special-case algorithms. For example, people publish papers all the time bragging about how they beat the performance of the C++ STL at some given thing, and that impresses people who never tried, but it honestly isn't that hard if you consider the burdens the STL is required to carry.


I eagerly await your results, and I didn’t want to sound condescending at all; sorry if it came across that way. I was just genuinely interested in a difference that is more understandable to me.

Also, what does “native performance” even mean here? Only removing the interpreter overhead?


Thanks for clarifying. Tone is sometimes ambiguous via text.

At the moment I'm in no rush to actually write the JIT compiler because I think it's faster to iterate with an interpreter. I want to flesh out the VM and its APIs, test the hell out of everything and develop the system a bit more first.

The interpreter runs at something like 400 million instructions per second on my laptop, which is probably close to the performance of an old-school Pentium II chip, so it's actually fast enough to run a lot of non-trivial software. With even a really basic JIT I should be able to hit 10x that throughput. I've benchmarked against code compiled with GCC, and that runs about 27 times faster (on a microbenchmark).
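For anyone curious how an instructions-per-second figure like that is obtained, here is a toy version of the measurement (a Python dispatch loop is of course orders of magnitude slower than a C interpreter, so the absolute numbers are not comparable):

```python
import time

# Toy throughput measurement: run a small instruction sequence many
# times, count executed instructions, and divide by wall-clock time.

def run(code, iterations):
    executed = 0
    acc = 0
    for _ in range(iterations):
        for op, arg in code:
            if op == "add":
                acc += arg
            elif op == "sub":
                acc -= arg
            executed += 1
    return executed, acc

code = [("add", 3), ("sub", 1), ("add", 2)]
start = time.perf_counter()
executed, acc = run(code, 100_000)
elapsed = time.perf_counter() - start
ips = executed / elapsed   # instructions per second
```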


The folks at the Jacobin JVM [0] project (a JVM written in Go) are working on the issue of size and the ability to have a fully functional JVM maintained by a small group of developers. Right now, per the latest post [1], they can run simple classes and expect to complete the interpreter in the next few months.

[0] jacobin.org [1] http://binstock.blogspot.com/2023/02/jacobin-jvm-at-18-month...




