8bit.tr Journal

Efficiency

4 articles tagged with Efficiency.

December 31, 2025

Mixture of Attention Routing: Smarter Context Allocation at Scale

A technical exploration of attention routing strategies that allocate context budget to the most relevant tokens.

December 27, 2025

A practical guide to compressing LLMs with quantization, pruning, and distillation while preserving quality.

December 25, 2025

A technical guide to sequence parallelism and how it improves training efficiency for long-context models.

December 15, 2025

A technical guide to reducing energy use and carbon impact in LLM training and inference.