Foundation Models

Analyzing and optimizing the core architectures of Large Language Models (LLMs) and multimodal models.

Rewiring Transformers

Investigated methods to "rewire" the internal self-attention mechanisms of transformers for specialized regression tasks. In this research, we adapted attention masks so that the model prioritizes technical terminology over generic syntactic words in cybersecurity texts, with the goal of improving vulnerability analysis.

  • Implemented custom attention heads in PyTorch.
  • Modified context weighting to prevent performance degradation on long-context sequences.
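One common way to make attention favour certain tokens is to add a bias to their attention logits before the softmax. PyTorch's float `attn_mask` works in the same additive way. The sketch below shows this with single-head scaled dot-product attention in plain Python. It is an illustrative assumption, not the exact masking scheme used in this research. The per-token `is_technical` flags and the `boost` value are hypothetical inputs. They might come from a domain glossary or a tagger.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(
    query: Vector, keys: List[Vector], is_technical: List[bool], boost: float
) -> Vector:
    # Scaled dot-product logits, plus a hypothetical additive bias `boost`
    # on every key position flagged as a technical term.
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + (boost if flag else 0.0)
        for key, flag in zip(keys, is_technical)
    ]
    return softmax(logits)


def biased_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    is_technical: List[bool],
    boost: float = 1.0,
) -> List[Vector]:
    # For each query, return the attention-weighted sum of the value vectors.
    outputs = []
    for query in queries:
        weights = attention_weights(query, keys, is_technical, boost)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    # Toy sequence "the buffer overflow in", with the two domain terms flagged.
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    flags = [False, True, True, False]
    print(attention_weights([1.0, 1.0], keys, flags, boost=0.0))
    print(attention_weights([1.0, 1.0], keys, flags, boost=2.0))
```

The boost multiplies each flagged key's pre-normalization weight by `exp(boost)`. With `boost = log(3)`, a flagged token gets three times the weight it would otherwise have. Making the bias additive in logit space keeps the output a proper probability distribution.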

Multimodal Infrastructure & Serving

Deployed and managed production LLM/VLLM systems for automated tasks at scale:

  • shoppin' AI stack: Integrated CLIP/VLLM models for semantic matching across catalogs of 30–40 million items and deployed them on AWS SageMaker.
  • endorphind Video Generation: Built pipelines using Wan 2.2/2.6, LatentSync, and InfiniteTalk (GGUF) for generative avatar lip-syncing and video rendering.
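CLIP-style semantic matching usually means comparing L2-normalized embeddings by cosine similarity and returning the top-k catalog items. The sketch below shows that step in plain Python. It assumes the item and query embeddings were already produced by a CLIP image or text encoder. The catalog IDs and vectors are hypothetical. The brute-force scan is for illustration only. At 30–40 million items, a production system would typically swap `top_k_matches` for an approximate nearest-neighbour index. The source does not say which index this stack used.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(vec: Vector) -> Vector:
    # Unit-length vectors make the dot product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(catalog: Dict[str, Vector]) -> Dict[str, Vector]:
    # Normalize every (hypothetical, precomputed) item embedding once, up front.
    return {item_id: l2_normalize(emb) for item_id, emb in catalog.items()}


def top_k_matches(
    query: Vector, index: Dict[str, Vector], k: int = 5
) -> List[Tuple[str, float]]:
    # Exhaustive cosine search; heapq.nlargest keeps only the k best candidates.
    q = l2_normalize(query)
    scored = (
        (item_id, sum(a * b for a, b in zip(q, emb))) for item_id, emb in index.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    index = build_index(
        {
            "red-sneaker": [0.9, 0.1, 0.0],
            "blue-jeans": [0.0, 0.2, 0.9],
            "red-boot": [0.8, 0.3, 0.1],
        }
    )
    # Query embedding, e.g. from the text "red shoes" (values are made up).
    print(top_k_matches([1.0, 0.2, 0.0], index, k=2))
```

Normalizing the catalog once at index time means each query costs one normalization plus a dot product per item. This is the same quantity an inner-product ANN index would approximate.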