Results 1 - 20 of 689
GPU Array Access Auto-Tuning
Weber, Nicolas
A fast integral image generation algorithm on GPUs
Dang, Qingqing ; Yan, Shengen ; Wu, Ren
2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), p624-631, Dec 2014
A highly efficient I/O-based out-of-core stencil algorithm with globally optimized temporal blocking
Midorikawa, Hiroko ; Tan, Hideyuki
2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), p1-6, Aug 2017
(De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms.
Rasch, Ari
ACM Transactions on Programming Languages & Systems. Sep2024, Vol. 46 Issue 3, p1-74. 74p.
Optimization Techniques for GPU Programming.
Hijma, Pieter ; Heldens, Stijn ; Sclocco, Alessio ; et al.
ACM Computing Surveys. Nov2023, Vol. 55 Issue 11, p1-81. 81p.
Umpalumpa: a framework for efficient execution of complex image processing workloads on heterogeneous nodes.
Střelák, David ; Myška, David ; Petrovič, Filip ; et al.
Computing. Nov2023, Vol. 105 Issue 11, p2389-2417. 29p.
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations.
Wen, Mei ; Huang, Da-fei ; Xun, Chang-qing ; et al.
Frontiers of Information Technology & Electronic Engineering; Nov2015, Vol. 16 Issue 11, p899-916, 18p
Hardware Design of DRAM Memory Prefetching Engine for General-Purpose GPUs.
Gabbay, Freddy ; Salomon, Benjamin ; Golan, Idan ; et al.
Technologies (2227-7080); Oct2025, Vol. 13 Issue 10, p455, 31p
A Low-latency On-chip Cache Hierarchy for Load-to-use Stall Reduction in GPUs.
Mahani, Negin (Sadat) (Nematollahi) ; Falahati, Hajar ; Darabi, Sina ; et al.
ACM Transactions on Architecture & Code Optimization; Sep2025, Vol. 22 Issue 3, p1-27, 27p
Preserving provability over GPU program optimizations with annotation-aware transformations.
Şakar, Ömer ; Safari, Mohsen ; Huisman, Marieke ; et al.
Formal Methods in System Design; Dec2025, Vol. 67 Issue 3, p316-372, 57p
Optimizing OpenCL Barrier Synchronization and Memory Efficiency on Multi-Core DSPs.
Gao, Wanrong ; Fang, Jianbin ; Zhang, Peng ; et al.
ACM Transactions on Architecture & Code Optimization; Dec2025, Vol. 22 Issue 4, p1-26, 26p
On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures.
Morgan, Nathaniel ; Yenusah, Caleb ; Diaz, Adrian ; et al.
Information; Nov2024, Vol. 15 Issue 11, p673, 24p
Optimizing General Sparse Matrix-Matrix Multiplication on the GPU.
Wang, Yizhuo ; Lin, Hongpeng ; Wei, Bingxin ; et al.
ACM Transactions on Architecture & Code Optimization; Dec2025, Vol. 22 Issue 4, p1-25, 25p
Deep learning data handling: exploring file formats and access strategies.
Parraga, Edixon ; Leon, Betzabeth ; Mendez, Sandra ; et al.
Cluster Computing; Oct2025, Vol. 28 Issue 9, p1-23, 23p
Cross-core Data Sharing for Energy-efficient GPUs.
Falahati, Hajar ; Sadrosadati, Mohammad ; Qiumin Xu ; et al.
ACM Transactions on Architecture & Code Optimization; Sep2024, Vol. 21 Issue 3, p1-32, 32p
DCSolver: Accelerating Sparse Iterative Solvers via Divide-and-Conquer on GPUs.
Haozhong Qiu ; Chuanfu Xu ; Jianbin Fang ; et al.
ACM Transactions on Architecture & Code Optimization; Sep2025, Vol. 22 Issue 3, p1-25, 25p
Research on Malodor Component Identification Based on Sensor Array.
Xie, Jiaxing ; Chen, Wen ; Chen, Shiyun ; et al.
Sensors (14248220); Jul2025, Vol. 25 Issue 13, p3857, 20p
I/O Access Patterns in HPC Applications: A 360-Degree Survey.
Bez, Jean Luca ; Byna, Suren ; Ibrahim, Shadi
ACM Computing Surveys. Feb2024, Vol. 56 Issue 2, p1-41. 41p.
SNCL: a supernode OpenCL implementation for hybrid computing arrays.
Tang, Tao ; Lu, Kai ; Peng, Lin ; et al.
Journal of Supercomputing; May2024, Vol. 80 Issue 7, p9471-9493, 23p