-
Stockfish Trainer
- Doubled GPU training throughput by porting Mixture of Experts CUDA kernels originally written for LLMs, writing custom sparse forward and backward kernels in TileLang, overlapping host-to-device (H2D) transfers with compute, and fusing optimizer steps
- Enabled multi-GPU and multi-node training on SLURM clusters using Ray, PyTorch Lightning, and NCCL
- Increased CPU data loading and input encoding throughput 10x with a lock-free Rust rewrite
- Ran experiments applying state-of-the-art ideas such as Mixture of Experts to Stockfish and chess NNUEs
-
Brainrot.mov
- Grew an educational, developer-focused account (@aws_peter) to 70k+ followers, 4.5M+ views, and 1.6M unique viewers by teaching concepts such as Docker, Kubernetes, AWS Lambda, and Infrastructure as Code
- Enabled 1,500+ users to generate 25M+ views, 200k+ followers, and thousands in revenue
- Built a React app, deployed to AWS, that generates short-form videos from AI voices and custom images
-
AlphaPaint
- Implemented a best-first minimax agent in Rust, placing 3rd out of 100+ teams in Bytefight 2026
- Built a custom AlphaZero-style training system using Rust, custom lock-free data structures, CUDA Graphs, PyTorch, and TileLang CUDA kernels, achieving a 100x speedup over a naive implementation
- Ran self-play across 300 parallel CPU cores on a SLURM cluster to generate data for a genetic tuner