Edge AI Inference Optimization: Quantization and Pruning on Resource-Constrained Platforms
The deployment of sophisticated artificial intelligence models on resource-constrained embedded systems presents fundamental challenges in balancing computational efficiency with accuracy preservation. Contemporary edge devices, including ARM Cortex-A processors and automotive electronic control units, operate under severe limitations on memory capacity, computational throughput, and power budget that preclude direct deployment of standard floating-point neural networks. Quantization techniques systematically reduce numerical precision from 32-bit floating point to 8-bit integer or binary representations, achieving compression ratios exceeding 50× while keeping accuracy within acceptable degradation thresholds. Post-training quantization converts pre-trained models directly, without retraining, while quantization-aware training adapts network representations to accommodate extreme precision reduction. Pruning methodologies exploit overparameterization in neural architectures through selective parameter elimination, with magnitude-based approaches achieving sparsity levels of 80-90% and structured pruning variants enabling hardware acceleration on conventional processors. Hardware-aware optimization strategies align sparsity patterns with SIMD execution units and memory access characteristics, maximizing inference throughput on embedded platforms. Empirical validation on ARM Cortex-A processors and Raspberry Pi systems demonstrates practical deployment of vision and language models within tight resource envelopes, achieving previously unattainable real-time inference performance on cost-effective embedded hardware. The convergence of efficient neural architectures, aggressive model compression, and platform-specific optimization democratizes artificial intelligence capabilities in cost-sensitive automotive, industrial, and consumer applications. [ABSTRACT FROM AUTHOR]
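The post-training quantization the abstract describes can be illustrated with a minimal sketch of symmetric, per-tensor int8 conversion. This is not the paper's implementation; the function names and NumPy-based approach are illustrative assumptions, showing only the core idea: a float32 tensor is mapped to int8 with a single scale factor, cutting storage 4× and bounding the round-trip error by half the quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8 (illustrative sketch)."""
    scale = np.max(np.abs(w)) / 127.0  # one scale maps the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.max(np.abs(w - w_hat))  # rounding error is at most scale / 2
```

Quantization-aware training differs only in when this mapping is applied: the same round-to-scale operation is simulated in the forward pass during training, so the network learns weights that tolerate the precision loss.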
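Magnitude-based pruning, cited above as reaching 80-90% sparsity, can likewise be sketched in a few lines. The helper below is a hypothetical illustration (not the paper's method): it keeps only the largest-magnitude weights until a target fraction is zeroed, which is the unstructured variant; structured pruning would instead remove whole rows, channels, or blocks so that dense hardware kernels benefit.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` of them are zero."""
    k = int(w.size * sparsity)  # number of weights to eliminate
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh  # keep only weights strictly above the threshold
    return w * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))
w_sparse = magnitude_prune(w, sparsity=0.9)
achieved = 1.0 - np.count_nonzero(w_sparse) / w.size  # fraction of zeroed weights
```

In practice the pruned model is fine-tuned afterwards to recover accuracy, and the sparsity pattern is chosen to match the platform's SIMD width when hardware speedup, not just compression, is the goal.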