winwang's comments | Hacker News

I'm building another Spark-based option right now with ParaQuery (GPU-accelerated Spark): https://news.ycombinator.com/item?id=43964505


Not a new paradigm like Ray, but a new way to run Spark super-efficiently (GPU acceleration): https://news.ycombinator.com/item?id=43964505


Do you guide the LLM to do this specifically, so it doesn't "waste" time on what static analysis can take care of? It would be interesting if you could also integrate traditional analysis tools pre-LLM.


Not the OP, but -- I would immediately believe that finding bugs is a valuable problem to solve. Your questions should probably be answered in an FAQ though, since they're pretty good.


One possibility is crafting (somewhat-)minimal reproductions. There's some work in the FP community to do this via traditional techniques, but those techniques seem quite limited.
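By "traditional techniques" I mean things like delta-debugging-style shrinking. A toy sketch in Python -- the predicate and names are illustrative, not from any particular tool:

  # Greedy shrinker: repeatedly try deleting chunks of the input
  # while the failure still reproduces. `fails` runs the program
  # under test and returns True when the bug shows up.
  def shrink(input_list, fails):
      chunk = len(input_list) // 2
      while chunk > 0:
          i = 0
          while i < len(input_list):
              candidate = input_list[:i] + input_list[i + chunk:]
              if candidate and fails(candidate):
                  input_list = candidate  # keep the smaller repro
              else:
                  i += chunk
          chunk //= 2
      return input_list

  # e.g. shrink(list(range(100)), lambda xs: 17 in xs) -> [17]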


Well, I was rejected 4 times (once for a completely unrelated idea though).

The reason I got in was because I actually got traction, but I'm sure my successive failed applications helped a teeny bit at least.

So... pretty straightforward... I think?


Indeed I have, though I have some reservations about its focus on interaction nets (or rather, its marketing).

I ended up making a CUDA-based, data-parallel STLC typechecker (Hindley-Milner)... I want to formally prove its correctness first, but maybe a blog post would be okay either way.


Depends on the workload. Spark can persist dataframes to the cluster as normal. GPUs can also load certain datasets significantly faster.
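For example, with plain PySpark (nothing ParaQuery-specific here, and the bucket path is made up):

  from pyspark.sql import SparkSession
  from pyspark import StorageLevel

  spark = SparkSession.builder.appName("persist-demo").getOrCreate()

  df = spark.read.parquet("s3://my-bucket/events/")  # illustrative path
  df.persist(StorageLevel.MEMORY_AND_DISK)  # cache across the cluster
  df.count()  # first action materializes the cache; later ones reuse it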


If you stand up your compute cluster in the same region as your bucket, there are no egress fees. Otherwise, yes, in general. There are some clouds that don't charge egress fees though, e.g. Cloudflare R2.


1. Yes! Would love to contribute back to these projects, since I am already using RAPIDS under the hood. My general goal is to bring GPU acceleration to more workloads. Though, as a solo founder, I'm finding it difficult to carve out any time for this at the moment, haha.

2. Hmm, maybe I should mention that we're not "accelerating all operations" -- merely staying compatible. Spark-RAPIDS aims to be byte-for-byte compatible with CPU Spark, falling back to the CPU unless incompatible ops are specifically enabled. But... you might be right about that kind of quirk. It would not be surprising, and reminds me of checking behavior between compilers.
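For reference, enabling the plugin looks roughly like this -- config keys are from the Spark-RAPIDS docs, values are illustrative, and it assumes the rapids-4-spark jar is already on the classpath:

  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
      .config("spark.rapids.sql.enabled", "true")
      # off by default: ops whose results can differ slightly from CPU Spark
      .config("spark.rapids.sql.incompatibleOps.enabled", "false")
      .getOrCreate()
  )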

I'd say the default should be to focus on compatibility, and work through any extra perf stuff with our customers. Maybe a good quick way to contribute back to open source is to first upstream some tests?

Thanks for your great questions :)

