
Recently, I encountered a situation where I needed to contribute to Triton without access to any GPU.
The only device I had was my M3 Mac. After some experimentation, it turned out that Triton has a well-designed abstraction layer: you don’t need a GPU if you only want to work at the compiler level (i.e., MLIR and LLVM).

The method is already present in the codebase, but it isn’t officially documented. I hope this guide saves someone else some time.

Read more »

TL;DR

When you install PyTorch for an AMD GPU on the ROCm platform, a pytorch-triton-rocm package is installed alongside it. If you then install Triton separately, you may run into a version conflict that can be tricky to resolve.
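
As a quick sanity check, here is a minimal sketch (assuming standard pip metadata, and assuming both distributions ship a top-level triton package) that lists every installed distribution providing the triton import, so you can see when triton and pytorch-triton-rocm are both present:

```python
# Minimal sketch (assumes standard pip metadata): list every installed
# distribution that provides the top-level `triton` package, to spot the
# case where `triton` and `pytorch-triton-rocm` are installed side by side.
from importlib import metadata


def distributions_providing(import_name: str):
    owners = []
    for dist in metadata.distributions():
        top_level = dist.read_text("top_level.txt") or ""
        provided = set(top_level.split())
        if not provided and dist.files:
            # Fall back to the file list when top_level.txt is missing.
            provided = {p.parts[0] for p in dist.files if p.parts}
        if import_name in provided:
            owners.append((dist.metadata["Name"], dist.version))
    return owners


if __name__ == "__main__":
    for name, version in distributions_providing("triton"):
        print(f"{name}=={version} provides the `triton` package")
```

If more than one distribution shows up, they are most likely writing into the same triton package directory, which is the sort of clash this post is about.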

Read more »

This article is a translation by ChatGPT-4o; check this out if you read Chinese.

TL;DR

When you spawn processes with multiprocessing.Process using the fork start method, more happens than a plain os.fork: among other things, it invokes after-fork hooks registered by other objects. Calling os.fork directly skips these hooks, which can lead to errors.
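
The difference is easy to demonstrate. Below is a rough sketch (POSIX only) using the internal multiprocessing.util.register_after_fork helper, which is one way such hooks get registered: the hook runs in a child started via multiprocessing.Process with the fork start method, but is skipped when the child is created by a bare os.fork:

```python
# Rough sketch (POSIX only): an after-fork hook registered through the
# internal multiprocessing.util.register_after_fork helper runs in children
# started by multiprocessing.Process with the "fork" start method, but is
# skipped entirely when the child is created with a bare os.fork().
import os
import multiprocessing
from multiprocessing import util


class Resource:
    def __init__(self):
        # Ask multiprocessing to call _after_fork(self) in every child it forks.
        util.register_after_fork(self, Resource._after_fork)

    @staticmethod
    def _after_fork(obj):
        print(f"after-fork hook ran in child pid {os.getpid()}")


def child_work():
    print(f"child pid {os.getpid()} running")


if __name__ == "__main__":
    res = Resource()

    # Case 1: multiprocessing with the fork start method -> the hook fires.
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=child_work)
    p.start()
    p.join()

    # Case 2: bare os.fork() -> multiprocessing never runs the hook.
    pid = os.fork()
    if pid == 0:
        child_work()
        os._exit(0)
    os.waitpid(pid, 0)
```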

Read more »

This article is a translation by ChatGPT-4o; check this out if you read Chinese.

A few days ago, while reading about JARM, a novel TLS server fingerprinting tool proposed by Salesforce, I noticed that it uses a choose_grease() function when constructing the TLS ClientHello record, which led me to look into the GREASE mechanism.
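
For context, RFC 8701 reserves sixteen GREASE values that all follow the 0x?A?A pattern, and a client randomly injects one of them into fields such as the cipher-suite and extension lists. The sketch below (my own illustration of that idea, not JARM's actual choose_grease() code) generates the reserved values and picks one:

```python
# Sketch of the GREASE idea from RFC 8701 (not JARM's actual choose_grease()):
# the sixteen reserved values all follow the 0x?A?A pattern, and a client picks
# one at random to mix into the ClientHello's cipher-suite/extension lists.
import random

# [0x0A0A, 0x1A1A, 0x2A2A, ..., 0xFAFA]
GREASE_VALUES = [((i << 4) | 0x0A) * 0x0101 for i in range(16)]


def choose_grease() -> bytes:
    """Return one randomly chosen GREASE value as two network-order bytes."""
    return random.choice(GREASE_VALUES).to_bytes(2, "big")


if __name__ == "__main__":
    print(", ".join(hex(v) for v in GREASE_VALUES))
    print("picked:", choose_grease().hex())
```

Servers that reject these unknown-but-harmless values get flushed out early, which is the whole point of sprinkling them into the handshake.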

Read more »

My name is Qian Wang (pronounced “Chien Wong”), but you can call me Kyle. I’m a software engineer focusing on high-performance deep learning infrastructure.

After earning my bachelor’s degree from Tsinghua University, I spent several years working as a software engineer in Beijing.

I then moved to the US for a graduate program at UCLA, from which I will graduate in 2024.