-
Y. Chen et al., Optimizing NVMe Storage for Large-scale Deployment: Key Technologies and Strategies in Alibaba Cloud, in IEEE Micro, doi: 10.1109/MM.2024.3426514.
-
Nan Wu, Yingjie Li, Hang Yang, Hanqiu Chen, Steve Dai, Cong Hao, Cunxi Yu, and Yuan Xie. 2024. Survey of Machine Learning for Software-assisted Hardware Design Verification: Past, Present, and Prospect. ACM Trans. Des. Autom. Electron. Syst. 29, 4, Article 59 (July 2024), 42 pages. https://doi.org/10.1145/3661308
-
Zhaodong Chen, Weiqin Zhao, Lei Deng, Yufei Ding, Qinghao Wen, Guoqi Li, Yuan Xie, Large-scale self-normalizing neural networks, Journal of Automation and Intelligence, Volume 3, Issue 2, 2024, Pages 101-110, ISSN 2949-8554, https://doi.org/10.1016/j.jai.2024.05.001.
-
Zhaodong Chen, Andrew Kerr, Richard Cai, Jack Kosaian, Haicheng Wu, Yufei Ding, and Yuan Xie. 2024. EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS ’24), Vol. 3. Association for Computing Machinery, New York, NY, USA, 301–316. https://doi.org/10.1145/3620666.3651369
-
H. Lin et al., A Comprehensive Survey on Distributed Training of Graph Neural Networks, in Proceedings of the IEEE, vol. 111, no. 12, pp. 1572-1606, Dec. 2023, doi: 10.1109/JPROC.2023.3337442.
-
Guyue Huang, Zhengyang Wang, Po-An Tsai, Chen Zhang, Yufei Ding, and Yuan Xie. 2023. RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’23). Association for Computing Machinery, New York, NY, USA, 338–352. https://doi.org/10.1145/3613424.3623775
-
Zheng Qu, Dimin Niu, Shuangchen Li, Hongzhong Zheng, and Yuan Xie. 2023. TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’23). Association for Computing Machinery, New York, NY, USA, 452–464. https://doi.org/10.1145/3613424.3614305
-
C. Bai et al., Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators, 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 2023, pp. 1-9, doi: 10.1109/ICCAD57390.2023.10323893.
-
Chen Bai, Jiayi Huang, Xuechao Wei, Yuzhe Ma, Sicheng Li, Hongzhong Zheng, Bei Yu, and Yuan Xie. 2023. ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’23). Association for Computing Machinery, New York, NY, USA, 268–282. https://doi.org/10.1145/3613424.3614289
-
Xinfeng Xie, Peng Gu, Yufei Ding, Dimin Niu, Hongzhong Zheng, and Yuan Xie. 2023. MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing. ACM Trans. Archit. Code Optim. 20, 3, Article 40 (September 2023), 26 pages. https://doi.org/10.1145/3603113
-
Y. Wang et al., Accelerating Distributed GNN Training by Codes, in IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 9, pp. 2598-2614, Sept. 2023, doi: 10.1109/TPDS.2023.3295184.
-
G. Wu et al., E-Booster: A Field-Programmable Gate Array-Based Accelerator for Secure Tree Boosting Using Additively Homomorphic Encryption, in IEEE Micro, vol. 43, no. 5, pp. 88-96, Sept.-Oct. 2023, doi: 10.1109/MM.2023.3293845.
-
N. Wu, Y. Li, C. Hao, S. Dai, C. Yu and Y. Xie, Gamora: Graph Learning based Symbolic Reasoning for Large-Scale Boolean Networks, 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6, doi: 10.1109/DAC56929.2023.10247828.
-
A. Ren et al., HBP: Hierarchically Balanced Pruning and Accelerator Co-Design for Efficient DNN Inference, 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6, doi: 10.1109/DAC56929.2023.10247785.
-
X. Ren et al., CHAM: A Customized Homomorphic Encryption Accelerator for Fast Matrix-Vector Product, 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6, doi: 10.1109/DAC56929.2023.10247696.
-
Siqi Li, Fengbin Tu, Liu Liu, Jilan Lin, Zheng Wang, Yangwook Kang, Yufei Ding, and Yuan Xie. 2023. ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification. In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA ’23). Association for Computing Machinery, New York, NY, USA, Article 58, 1–14. https://doi.org/10.1145/3579371.3589093
-
Huang, G., Bai, Y., Liu, L., Wang, Y., Yu, B., Ding, Y., & Xie, Y. (2023). ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs. Proceedings of Machine Learning and Systems, 5 (MLSys 2023).
-
Z. Zhu et al., MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 11, pp. 4112-4125, Nov. 2023, doi: 10.1109/TCAD.2023.3251696.
-
Liang, L., Lin, J., Qu, Z., Ahmad, I., Tu, F., Gupta, T., Ding, Y., & Xie, Y. (2023). SPG: Structure-Private Graph Database via SqueezePIR. Proceedings of the VLDB Endowment, 16(7), 1615–1628. https://doi.org/10.14778/3587136.3587158
-
Zhaodong Chen, Zheng Qu, Yuying Quan, Liu Liu, Yufei Ding, and Yuan Xie. 2023. Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism. In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP ’23). Association for Computing Machinery, New York, NY, USA, 369–379. https://doi.org/10.1145/3572848.3577500
-
B. Shi et al., Efficient Super-Resolution System With Block-Wise Hybridization and Quantized Winograd on FPGA, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 11, pp. 3910-3924, Nov. 2023, doi: 10.1109/TCAD.2023.3247621.
-
Zhiyao Li, Jiaxiang Li, Taijie Chen, Dimin Niu, Hongzhong Zheng, Yuan Xie, and Mingyu Gao. 2023. Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 747–761. https://doi.org/10.1145/3575693.3575706
-
Z. Du et al., Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio, 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), Nanjing, China, 2023, pp. 483-490, doi: 10.1109/ICPADS56603.2022.00069.
-
Jin Lin, Xiaotong Luo, Ming Hong, Yanyun Qu, Yuan Xie, Zongze Wu, Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 14398-14407.