
Recently, I encountered a situation where I needed to contribute to Triton without access to any GPU.
The only device I had was my M3 Mac. After some experimentation, it turned out that Triton has a well-designed abstraction layer: you don’t need a GPU if you only want to work at the compiler level (i.e., MLIR and LLVM).

The method is already present in the codebase, but it isn’t officially documented. I hope this guide saves someone else some time.

Read more »

TL;DR

When you install PyTorch for an AMD GPU on the ROCm platform, a pytorch-triton-rocm package is installed alongside it. If you then install Triton separately, you may run into a version conflict that can be tricky to resolve.
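
As a quick sanity check, here is a minimal sketch (assuming standard pip metadata, and assuming both distributions ship a top-level triton package) that lists every installed distribution providing the triton import, so you can see when triton and pytorch-triton-rocm are both present:

```python
# Minimal sketch (assumes standard pip metadata): list every installed
# distribution that provides the top-level `triton` package, to spot the
# case where `triton` and `pytorch-triton-rocm` are installed side by side.
from importlib import metadata


def distributions_providing(import_name: str):
    owners = []
    for dist in metadata.distributions():
        top_level = dist.read_text("top_level.txt") or ""
        provided = set(top_level.split())
        if not provided and dist.files:
            # Fall back to the file list when top_level.txt is missing.
            provided = {p.parts[0] for p in dist.files if p.parts}
        if import_name in provided:
            owners.append((dist.metadata["Name"], dist.version))
    return owners


if __name__ == "__main__":
    for name, version in distributions_providing("triton"):
        print(f"{name}=={version} provides the `triton` package")
```

If more than one distribution shows up, they are most likely writing into the same triton package directory, which is the sort of clash this post is about.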

Read more »

This article is a translation by ChatGPT-4o; check this out if you read Chinese.

TL;DR

When you spawn processes with multiprocessing.Process using the fork start method, more happens than a plain os.fork: among other things, it invokes after-fork hooks registered by other objects. Calling os.fork directly skips these hooks, which can lead to errors.
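
The difference is easy to demonstrate. Below is a rough sketch (POSIX only) using the internal multiprocessing.util.register_after_fork helper, which is one way such hooks get registered: the hook runs in a child started via multiprocessing.Process with the fork start method, but is skipped when the child is created by a bare os.fork:

```python
# Rough sketch (POSIX only): an after-fork hook registered through the
# internal multiprocessing.util.register_after_fork helper runs in children
# started by multiprocessing.Process with the "fork" start method, but is
# skipped entirely when the child is created with a bare os.fork().
import os
import multiprocessing
from multiprocessing import util


class Resource:
    def __init__(self):
        # Ask multiprocessing to call _after_fork(self) in every child it forks.
        util.register_after_fork(self, Resource._after_fork)

    @staticmethod
    def _after_fork(obj):
        print(f"after-fork hook ran in child pid {os.getpid()}")


def child_work():
    print(f"child pid {os.getpid()} running")


if __name__ == "__main__":
    res = Resource()

    # Case 1: multiprocessing with the fork start method -> the hook fires.
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=child_work)
    p.start()
    p.join()

    # Case 2: bare os.fork() -> multiprocessing never runs the hook.
    pid = os.fork()
    if pid == 0:
        child_work()
        os._exit(0)
    os.waitpid(pid, 0)
```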

Read more »

This article is a translation by ChatGPT-4o; check this out if you read Chinese.

A few days ago, while reading about JARM, a novel TLS server fingerprinting tool proposed by Salesforce, I noticed that it uses a choose_grease() function when constructing the TLS ClientHello record, which led me to look into the GREASE mechanism.
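
For context, RFC 8701 reserves sixteen GREASE values that all follow the 0x?A?A pattern, and a client randomly injects one of them into fields such as the cipher-suite and extension lists. The sketch below (my own illustration of that idea, not JARM's actual choose_grease() code) generates the reserved values and picks one:

```python
# Sketch of the GREASE idea from RFC 8701 (not JARM's actual choose_grease()):
# the sixteen reserved values all follow the 0x?A?A pattern, and a client picks
# one at random to mix into the ClientHello's cipher-suite/extension lists.
import random

# [0x0A0A, 0x1A1A, 0x2A2A, ..., 0xFAFA]
GREASE_VALUES = [((i << 4) | 0x0A) * 0x0101 for i in range(16)]


def choose_grease() -> bytes:
    """Return one randomly chosen GREASE value as two network-order bytes."""
    return random.choice(GREASE_VALUES).to_bytes(2, "big")


if __name__ == "__main__":
    print(", ".join(hex(v) for v in GREASE_VALUES))
    print("picked:", choose_grease().hex())
```

Servers that reject these unknown-but-harmless values get flushed out early, which is the whole point of sprinkling them into the handshake.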

Read more »

My name is Qian Wang (pronounced “Chien Wong”), but you can call me Kyle. I’m a software engineer focusing on high-performance deep learning infrastructure.

After earning my bachelor’s degree from Tsinghua University, I spent several years working as a software engineer in Beijing.

I then moved to the US for a graduate program at UCLA, from which I will graduate in 2024.