FlexGen - Running large language models like OPT-175B/GPT-3 on a single GPU (github.com)