Background and Related Work

Quantization Techniques for Efficient LLM Inference and Training

The rapid increase in the size of large language models (LLMs) has necessitated the development of efficient quantization techniques to reduce memory and computational costs while preserving performance.