Summary
This is a DPA performance internship on a project to improve SoC-level performance per watt through memory management innovations. The position requires knowledge of computer architecture, performance modeling, or analytical modeling, as well as proficiency in C or C++ and a scripting language such as Python.
Requirements
- Knowledge of one or more of the following areas: computer architecture, performance modeling, or analytical modeling
- Knowledge and experience with common LLM (Large Language Model) workloads
- Proficiency in C or C++, and scripting languages such as Python
- Current master's or Ph.D. student in EE or CS with a computer architecture background
Responsibilities
- Implement an analytical model of memory usage for LLM inference and training
- Run performance simulations to extract workload characteristics such as memory footprint and bandwidth requirements
- Evaluate ideas for performance improvement
Preferred Qualifications
- Experience with high-level simulators for performance or power estimation
- Knowledge of server-class GPU/ML architectures