Motivation. State-of-the-art large language models (LLMs) and multimodal models continue to grow rapidly in size, which makes them difficult to deploy efficiently on hardware with limited resources.