  1. Stockfish Trainer

    • Doubled GPU throughput by porting Mixture of Experts CUDA kernels from LLMs, writing custom sparse forward and backward TileLang kernels, overlapping H2D transfers, and fusing optimizer steps
    • Enabled multi-GPU and multi-node training on SLURM clusters using Ray, PyTorch Lightning, and NCCL
    • Increased CPU data loading and input encoding throughput by 10x through a lock-free Rust rewrite
    • Ran experiments applying state-of-the-art ideas such as Mixture of Experts to Stockfish and chess NNUEs
  2. Brainrot.mov

    • Grew an educational, developer-focused account (@aws_peter) to 70k+ followers, 4.5M+ views, and 1.6M unique viewers by teaching concepts like Docker, Kubernetes, AWS Lambda, and Infrastructure as Code
    • Enabled 1500+ users to generate 25M+ views and 200k+ followers, with thousands in revenue
    • Created a React app for generating short-form videos using AI voices and custom images, deployed to AWS
  3. Open Source Contributions

  4. AlphaPaint

    • Implemented a best-first minimax agent in Rust and placed 3rd out of 100+ teams in Bytefight 2026
    • Built a custom AlphaZero-style training system using Rust, custom lock-free data structures, CUDA Graphs, PyTorch, and CUDA kernels via TileLang to achieve a 100x speedup over a naive implementation
    • Ran self-play across 300 parallel CPU cores on a SLURM cluster to generate data for a genetic tuner