winwang's comments | Hacker News

I'm building another Spark-based option right now with ParaQuery (GPU-accelerated Spark): https://news.ycombinator.com/item?id=43964505


Not a new paradigm like Ray, but a new way to run Spark super-efficiently (GPU acceleration): https://news.ycombinator.com/item?id=43964505


Do you guide the LLM to do this specifically, so it doesn't "waste" time on what static analysis can take care of? It would be interesting if you could also integrate traditional analysis tools pre-LLM.


Not the OP, but -- I would immediately believe that finding bugs is a valuable problem to solve. Your questions should probably be answered in an FAQ though, since they're pretty good.


One possibility is crafting (somewhat-)minimal reproductions. There's some work in the FP community to do this via traditional techniques, but those techniques seem quite limited.
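By "traditional techniques" I mean things like delta-debugging-style shrinking. A toy sketch in Python -- the predicate and names are illustrative, not from any particular tool:

  # Greedy shrinker: repeatedly try deleting chunks of the input
  # while the failure still reproduces. `fails` runs the program
  # under test and returns True when the bug shows up.
  def shrink(input_list, fails):
      chunk = len(input_list) // 2
      while chunk > 0:
          i = 0
          while i < len(input_list):
              candidate = input_list[:i] + input_list[i + chunk:]
              if candidate and fails(candidate):
                  input_list = candidate  # keep the smaller repro
              else:
                  i += chunk
          chunk //= 2
      return input_list

  # e.g. shrink(list(range(100)), lambda xs: 17 in xs) -> [17]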


Well, I was rejected 4 times (once for a completely unrelated idea though).

The reason I got in was because I actually got traction, but I'm sure my successive failed applications helped a teeny bit at least.

So... pretty straightforward... I think?


Indeed I have, though I have some reservations about its focus on interaction nets (or rather, its marketing).

I ended up making a CUDA-based, data-parallel STLC typechecker (Hindley-Milner)... I want to formally prove its correctness first, but maybe a blog post would be okay either way.


Depends on the workload. Spark can persist dataframes to the cluster as normal. GPUs can also load certain datasets significantly faster.
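For example, with plain PySpark (nothing ParaQuery-specific here, and the bucket path is made up):

  from pyspark.sql import SparkSession
  from pyspark import StorageLevel

  spark = SparkSession.builder.appName("persist-demo").getOrCreate()

  df = spark.read.parquet("s3://my-bucket/events/")  # illustrative path
  df.persist(StorageLevel.MEMORY_AND_DISK)  # cache across the cluster
  df.count()  # first action materializes the cache; later ones reuse it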


If you stand up your compute cluster in the same region as your bucket, there are no egress fees. Otherwise, yes, in general. There are some clouds that don't charge egress fees though, e.g. Cloudflare R2.


1. Yes! Would love to contribute back to these projects, since I am already using RAPIDS under the hood. My general goal is to bring GPU acceleration to more workloads. Though, as a solo founder, I'm finding it difficult to carve out any time for this at the moment, haha.

2. Hmm, maybe I should mention that we're not "accelerating all operations" -- merely staying compatible. Spark-RAPIDS aims to be byte-for-byte compatible with CPU Spark, falling back to the CPU unless incompatible ops are specifically enabled. But... you might be right about that kind of quirk. It would not be surprising, and reminds me of checking behavior between compilers.
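For reference, enabling the plugin looks roughly like this -- config keys are from the Spark-RAPIDS docs, values are illustrative, and it assumes the rapids-4-spark jar is already on the classpath:

  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
      .config("spark.rapids.sql.enabled", "true")
      # off by default: ops whose results can differ slightly from CPU Spark
      .config("spark.rapids.sql.incompatibleOps.enabled", "false")
      .getOrCreate()
  )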

I'd say the default should be to focus on compatibility, and work through any extra perf stuff with our customers. Maybe a good quick way to contribute back to open source is to first upstream some tests?

Thanks for your great questions :)

