Abstract: This paper presents Adelia, an efficient inference chip for large language models (LLMs) featuring a streamlined dataflow and dual-mode parallelization. The streamlined dataflow directly ...