AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking

Gao, Wanling; Luo, Chunjie; Wang, Lei; Xiong, Xingwang; Chen, Jianan; Hao, Tianshu; Jiang, Zihan; Fan, Fanda; Du, Mengjia; Huang, Yunyou; Zhang, Fan; Wen, Xu; Zheng, Chen; He, Xiwen; Dai, Jiahui; Ye, Hainan; Cao, Zheng; Jia, Zhen; Zhan, Kent; Tang, Haoning; Zheng, Daoyi; Xie, Biwei; Li, Wei; Wang, Xiaoyu; Zhan, Jianfeng

doi:10.1007/978-3-030-32813-9_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11459)

Included in the following conference series:

International Symposium on Benchmarking, Measuring and Optimization


Abstract

AI benchmarking provides yardsticks for benchmarking, measuring, and evaluating innovative AI algorithms, architectures, and systems. Coordinated by BenchCouncil, this paper presents our joint research and engineering efforts with several academic and industrial partners on AIBench, a suite of datacenter AI benchmarks. The benchmarks are publicly available from http://www.benchcouncil.org/AIBench/index.html. At present, AIBench covers 16 problem domains (image classification, image generation, text-to-text translation, image-to-text, image-to-image, speech-to-text, face embedding, 3D face recognition, object detection, video prediction, image compression, recommendation, 3D object reconstruction, text summarization, spatial transformer, and learning to rank) and provides two end-to-end application AI benchmarks. In addition, AI benchmark suites for high performance computing (HPC), IoT, and edge computing are released on the BenchCouncil web site. To the best of our knowledge, this is the most comprehensive AI benchmarking research and engineering effort to date.


1 Introduction

AIBench provides a scalable and comprehensive datacenter AI benchmark suite. In total, it includes 12 micro benchmarks and 16 component benchmarks, covering 16 AI problem domains (image classification, image generation, text-to-text translation, image-to-text, image-to-image, speech-to-text, face embedding, 3D face recognition, object detection, video prediction, image compression, recommendation, 3D object reconstruction, text summarization, spatial transformer, and learning to rank), as well as two end-to-end application AI benchmarks: DCMix [1], which combines datacenter applications with AI workloads, and E-commerce AI, an end-to-end business AI benchmark. The details of AIBench are described in our technical report [2].

We provide both training and inference benchmarks. The training metrics are the wall clock time to train a specified number of epochs, the wall clock time to train a model to a target accuracy [3], and the energy consumed to train a model to a target accuracy [3]. The inference metrics are wall clock time, accuracy, and energy consumption. In addition, performance numbers are reported on the BenchCouncil web site (http://www.benchcouncil.org/numbers.html), measuring the training and inference speeds of different hardware platforms, including multiple types of NVIDIA GPUs, Intel CPUs, and AI accelerator chips, as well as the performance of different software stacks, such as TensorFlow and PyTorch.
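The two target-accuracy training metrics can be illustrated with a minimal Python sketch. It assumes hypothetical `train_one_epoch` and `evaluate` callables supplied by a benchmark harness, and power readings sampled as `(timestamp_s, power_w)` pairs by some external meter; none of these names come from AIBench, and this is not its measurement code.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def time_to_accuracy(
    train_one_epoch: Callable[[], None],  # hypothetical: runs one training epoch
    evaluate: Callable[[], float],        # hypothetical: returns validation accuracy
    target_accuracy: float,
    max_epochs: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (wall-clock seconds to reach the target, epochs run).

    The seconds value is None if the target is not reached within max_epochs.
    In this sketch, evaluation time is included in the measurement.
    """
    start = clock()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= target_accuracy:
            return clock() - start, epoch
    return None, max_epochs


def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (timestamp_s, power_w) samples with the trapezoidal rule."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


if __name__ == "__main__":
    accuracies = iter([0.50, 0.70, 0.76, 0.80])  # fake per-epoch accuracies
    seconds, epochs = time_to_accuracy(lambda: None, lambda: next(accuracies), 0.75, 10)
    print(epochs)  # 3
    print(energy_joules([(0.0, 100.0), (1.0, 200.0), (2.0, 200.0)]))  # 350.0
```

Whether evaluation time counts toward time-to-accuracy is a rule choice that a real harness has to fix explicitly; the injectable `clock` parameter makes the measurement easy to test deterministically.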

Using the benchmarks from AIBench, BenchCouncil is organizing the 2019 BenchCouncil International AI System and Algorithm Competition, which has four tracks: AI System Competitions on RISC-V (an open-source chip), Cambricon (an AI accelerator chip), and X86 processors, and a 3D Face Recognition Algorithm Competition sponsored by Intellifusion.

2 Related Work

Much previous work focuses on datacenter AI benchmarking. Table 1 summarizes the differences between AIBench and state-of-the-art and state-of-the-practice datacenter AI benchmarks. Previous work such as MLPerf [4], Fathom [5], DAWNBench [3], and the TBD suite [6] targets only component benchmarks and lacks micro and application benchmarks. Conversely, DeepBench [7] and DNNMark [8] provide only a few micro benchmarks and lack component and application benchmarks. Thus, previous work takes a narrow view of datacenter AI scenarios and does not provide a comprehensive AI benchmark suite.

AIBench includes a series of micro, component, and application benchmarks for benchmarking AI systems, architectures, and algorithms. It also covers a wide variety of data types and data sources, including text, images, street scenes, audio, and videos. The workloads are implemented not only on mainstream deep learning frameworks such as TensorFlow and PyTorch, but also on traditional programming models such as Pthreads, enabling apples-to-apples comparisons. Meanwhile, the HPC AI benchmarks [9], IoT AI benchmarks [10], Edge AI benchmarks [11], and big data benchmarks [12,13,14] are also released on the BenchCouncil web site.

Table 1. Summary of different AI benchmarks.


3 Datacenter AI Benchmark Suite—AIBench

In total, AIBench covers 16 representative real-world data sets widely used in AI scenarios and provides 12 AI micro benchmarks and 16 AI component benchmarks. Each micro benchmark provides a neural network kernel implementation consisting of a single unit of computation [15]. Each component benchmark provides a full neural network model that solves a task by combining multiple units of computation. Each application benchmark provides an end-to-end application scenario.

3.1 Datacenter AI Micro Benchmarks

Micro benchmarks in AIBench abstract the units of computation common to most AI algorithms and cover 12 units of computation in total: convolution, fully connected, ReLU, sigmoid, tanh, max pooling, average pooling, cosine normalization, batch normalization, dropout, element-wise operation, and softmax.
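To make the notion of a "unit of computation" concrete, here is a minimal pure-Python sketch of eight of the twelve units plus a small timing harness. It is only an illustration under simplifying assumptions: the function names are hypothetical, pooling is 1-D with stride equal to the window, batch normalization is applied to a single vector without learned scale and shift, and dropout uses the inverted-dropout convention. AIBench's own micro benchmarks are framework-based and Pthreads implementations, not this code.

```python
import math
import random
import statistics
import time
from typing import Callable, Dict, List

Vector = List[float]


def relu(x: Vector) -> Vector:
    return [v if v > 0.0 else 0.0 for v in x]


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def sigmoid(x: Vector) -> Vector:
    return [_sigmoid(v) for v in x]


def tanh(x: Vector) -> Vector:
    return [math.tanh(v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]


def max_pool1d(x: Vector, k: int) -> Vector:
    # Non-overlapping windows of size k; a trailing partial window is dropped.
    return [max(x[i:i + k]) for i in range(0, len(x) - k + 1, k)]


def avg_pool1d(x: Vector, k: int) -> Vector:
    return [sum(x[i:i + k]) / k for i in range(0, len(x) - k + 1, k)]


def batch_norm(x: Vector, eps: float = 1e-5) -> Vector:
    # Treats x as a batch of one feature; learned scale/shift omitted.
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x: Vector, p: float, rng: random.Random) -> Vector:
    # Inverted dropout: zero with probability p, rescale survivors by 1/(1-p).
    scale = 1.0 / (1.0 - p)
    return [v * scale if rng.random() >= p else 0.0 for v in x]


def bench(kernel: Callable[[Vector], Vector], x: Vector, repeats: int = 5) -> float:
    """Median wall-clock seconds over `repeats` runs of kernel(x)."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel(x)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-3.0, 3.0) for _ in range(50_000)]
    kernels: Dict[str, Callable[[Vector], Vector]] = {
        "relu": relu,
        "sigmoid": sigmoid,
        "tanh": tanh,
        "softmax": softmax,
        "max_pool": lambda v: max_pool1d(v, 2),
        "avg_pool": lambda v: avg_pool1d(v, 2),
        "batch_norm": batch_norm,
        "dropout": lambda v: dropout(v, 0.5, random.Random(1)),
    }
    for name, kernel in kernels.items():
        print(f"{name:<10} {bench(kernel, x) * 1e3:8.2f} ms")
```

Reporting the median of several repeats is one simple way to damp timer noise; a production micro benchmark would also warm up, pin threads, and call optimized kernels.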

3.2 Datacenter AI Component Benchmarks

Component benchmarks in AIBench cover 16 problem domains and include both training and inference, each with TensorFlow and PyTorch implementations.

Image classification uses the ResNet neural network [16] with ImageNet [17] as the input dataset.
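The defining idea of ResNet [16] is the residual (shortcut) connection, y = ReLU(F(x) + x), where F is a small stack of layers. The sketch below is a simplification for illustration only: F uses two fully connected layers instead of convolutions, batch normalization is omitted, and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: returns W @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [v if v > 0.0 else 0.0 for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """y = ReLU(F(x) + x) with F(x) = W2 @ ReLU(W1 @ x).

    W2 must map back to len(x) so the identity shortcut can be added.
    """
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(residual_block([1.0, -2.0], identity, identity))  # [2.0, 0.0]
```

With all-zero weights, F(x) = 0 and the block simply passes ReLU(x) through, which illustrates why the residual formulation makes near-identity mappings easy to represent.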

Image generation uses the WGAN algorithm [18] with the LSUN dataset [19] as input to generate images.
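WGAN [18] trains a "critic" whose output difference between real and generated samples estimates the Wasserstein distance, and keeps the critic approximately Lipschitz by clipping its weights to a small box (c = 0.01 in the original paper). The loss terms can be sketched as below; representing samples as scalars and the critic as a plain Python function are simplifying assumptions.

```python
from typing import Callable, List, Sequence

Critic = Callable[[float], float]


def critic_loss(critic: Critic, real: Sequence[float], fake: Sequence[float]) -> float:
    """Loss minimized by the critic: -(E[D(real)] - E[D(fake)])."""
    mean_real = sum(critic(x) for x in real) / len(real)
    mean_fake = sum(critic(x) for x in fake) / len(fake)
    return -(mean_real - mean_fake)


def generator_loss(critic: Critic, fake: Sequence[float]) -> float:
    """Loss minimized by the generator: -E[D(G(z))]."""
    return -sum(critic(x) for x in fake) / len(fake)


def clip_weights(weights: Sequence[float], c: float = 0.01) -> List[float]:
    """Clamp every critic weight to [-c, c]; applied after each critic update."""
    return [max(-c, min(c, w)) for w in weights]


if __name__ == "__main__":
    critic = lambda x: 2.0 * x  # toy linear critic
    print(critic_loss(critic, [1.0, 2.0, 3.0], [0.0, 1.0]))  # -3.0
    print(clip_weights([0.5, -0.5, 0.005]))  # [0.01, -0.01, 0.005]
```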

Text-to-text translation uses the Transformer model [20] with the WMT English-German dataset [21] as input to translate text.

Image-to-text uses the Neural Image Caption model [22] with the Microsoft COCO dataset [23] as input to describe images in text.

Image-to-image uses the CycleGAN algorithm [24] with the Cityscapes dataset [25] as input to translate images from one domain to another.
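CycleGAN [24] learns two mappings, G from domain X to domain Y and F from Y back to X, and adds a cycle-consistency term so that translating an image and translating it back recovers the original. A minimal sketch of that term on flattened images follows; the adversarial losses and the weight on this term are omitted, and measuring the L1 distance as a per-pixel mean is an assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Mapping = Callable[[Vector], Vector]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equally sized vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(g: Mapping, f: Mapping,
                           xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """E_x |F(G(x)) - x|_1 + E_y |G(F(y)) - y|_1 for G: X -> Y and F: Y -> X."""
    forward = sum(l1(f(g(x)), x) for x in xs) / len(xs)
    backward = sum(l1(g(f(y)), y) for y in ys) / len(ys)
    return forward + backward


if __name__ == "__main__":
    g = lambda v: [p + 1.0 for p in v]  # toy "X -> Y" mapping
    f = lambda v: [p - 1.0 for p in v]  # its exact inverse
    print(cycle_consistency_loss(g, f, [[0.0, 0.0]], [[1.0, 1.0]]))  # 0.0
```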

Speech-to-text uses the DeepSpeech2 algorithm [26] with the LibriSpeech dataset [27] as input to transcribe speech.
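DeepSpeech2 [26] is trained with the connectionist temporal classification (CTC) loss, so its per-frame outputs must be collapsed into a transcript. The simplest decoder is greedy (best-path) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. DeepSpeech2 itself decodes with beam search and a language model, so this sketch, and the choice of index 0 as the blank, are illustrative assumptions.

```python
from itertools import groupby
from typing import Sequence


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]], alphabet: str,
                      blank: int = 0) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the probability of symbol k at frame t;
    alphabet[k] is the character for symbol k (alphabet[blank] is unused).
    """
    # 1. Most likely symbol per frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of the same symbol.
    merged = [k for k, _ in groupby(best)]
    # 3. Remove blanks; a blank between two equal symbols keeps them distinct.
    return "".join(alphabet[k] for k in merged if k != blank)


if __name__ == "__main__":
    probs = [
        [0.1, 0.8, 0.1],  # a
        [0.1, 0.8, 0.1],  # a (repeat, merged)
        [0.8, 0.1, 0.1],  # blank
        [0.1, 0.8, 0.1],  # a (new symbol after the blank)
        [0.1, 0.1, 0.8],  # b
        [0.1, 0.1, 0.8],  # b (repeat, merged)
    ]
    print(greedy_ctc_decode(probs, "_ab"))  # aab
```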

Face embedding uses the FaceNet algorithm [28] with the LFW (Labeled Faces in the Wild) dataset [29] or VGGFace2 [30] as input to map each face image to an embedding vector.
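FaceNet [28] learns its embedding with a triplet loss: an anchor face should be closer to another image of the same person (the positive) than to an image of a different person (the negative) by at least a margin, measured as squared L2 distance between L2-normalized embeddings. A minimal sketch of the per-triplet loss follows; the margin default of 0.2 follows the FaceNet paper, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(||a - p||^2 - ||a - n||^2 + margin, 0) on normalized embeddings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


if __name__ == "__main__":
    print(triplet_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0]))  # 0.0 (margin satisfied)
```

Only triplets that violate the margin produce a nonzero loss, which is why FaceNet mines hard or semi-hard triplets during training.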

3D face recognition uses 3D face models to recognize 3D information within images. The input dataset, published on the BenchCouncil web site, contains 77,715 samples from 253 face IDs.

Object detection uses the Faster R-CNN algorithm [31] with the Microsoft COCO dataset [23] as input to detect objects in images.
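Intersection over union (IoU) between bounding boxes is central both to Faster R-CNN [31], whose region proposal network labels anchors by their IoU with ground-truth boxes, and to COCO-style evaluation [23]. A sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates; the 0.7/0.3 thresholds follow the Faster R-CNN paper, but its additional rule that the best-matching anchor for each ground-truth box is also positive is omitted.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor: Box, gt_boxes: Sequence[Box],
                 pos: float = 0.7, neg: float = 0.3) -> int:
    """1 = foreground, 0 = background, -1 = ignored during RPN training."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos:
        return 1
    if best < neg:
        return 0
    return -1


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```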

Recommendation uses a collaborative filtering algorithm with the MovieLens dataset [32] as input to provide recommendations.
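The description above does not specify which collaborative filtering model is used, so the following is only a sketch of one classic form, matrix factorization trained with stochastic gradient descent, run on a tiny made-up rating set rather than MovieLens [32]. All function names and hyperparameters are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Rating = Tuple[int, int, float]  # (user, item, rating)
Factors = List[List[float]]


def predict(P: Factors, Q: Factors, u: int, i: int) -> float:
    return sum(a * b for a, b in zip(P[u], Q[i]))


def train_mf(ratings: Sequence[Rating], n_users: int, n_items: int, k: int = 2,
             lr: float = 0.05, reg: float = 0.01, epochs: int = 200,
             seed: int = 0) -> Tuple[Factors, Factors]:
    """Fit user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P: Factors, Q: Factors, ratings: Sequence[Rating]) -> float:
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


def recommend(P: Factors, Q: Factors, u: int, rated: Set[int], n: int = 1) -> List[int]:
    """Top-n items by predicted score that user u has not rated yet."""
    candidates = [i for i in range(len(Q)) if i not in rated]
    return sorted(candidates, key=lambda i: predict(P, Q, u, i), reverse=True)[:n]


if __name__ == "__main__":
    ratings = [(0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0), (1, 0, 4.0), (1, 2, 1.0),
               (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0), (3, 0, 1.0), (3, 3, 5.0)]
    P, Q = train_mf(ratings, n_users=4, n_items=4)
    print(f"training RMSE: {rmse(P, Q, ratings):.3f}")
    print("next item for user 1:", recommend(P, Q, 1, rated={0, 2}))
```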

Video prediction uses motion-focused predictive models [33] with the robot pushing dataset [33] as input to predict future video frames.

Image compression uses recurrent neural networks with the ImageNet dataset as input to compress images.

3D object reconstruction uses a convolutional encoder-decoder network with the ShapeNet dataset [34] as input to reconstruct 3D objects.

Text summarization uses a sequence-to-sequence model [35] with the Gigaword dataset [36] as input to generate summaries of text.

Spatial transformer uses spatial transformer networks with the MNIST dataset [37] as input to learn spatial transformations of images.
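A spatial transformer applies a transformation predicted by a small localization network: for each output pixel it computes a sampling location in the input through an affine matrix θ and reads the input there with bilinear interpolation, which keeps the operation differentiable. The sketch below covers only the grid generation and sampling steps, with θ supplied directly rather than predicted; the [-1, 1] normalized coordinate convention and zero padding outside the image are assumptions.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly interpolate img at pixel coordinates (x, y); zero outside."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                total += weight * img[yy][xx]
    return total


def affine_sample(img: Image, theta: Sequence[Sequence[float]]) -> Image:
    """Warp img with a 2x3 affine matrix theta acting on normalized coordinates.

    Output pixel (r, c) maps to normalized target coordinates (x_t, y_t) in [-1, 1];
    the source location is theta @ [x_t, y_t, 1], which is then sampled bilinearly.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            xt = -1.0 + 2.0 * c / (w - 1) if w > 1 else 0.0
            yt = -1.0 + 2.0 * r / (h - 1) if h > 1 else 0.0
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Map normalized source coordinates back to pixel indices.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            out[r][c] = bilinear(img, px, py)
    return out


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    flip = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # mirror horizontally
    for row in affine_sample(img, flip):
        print(row)
```

In a full spatial transformer network, θ would be the output of the localization network, and gradients would flow through the bilinear weights back to it.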

Learning to rank uses the ranking distillation algorithm [38] with the Gowalla dataset [39] as input to generate ranking scores.

3.3 Application Benchmarks

The suite also provides two end-to-end application benchmarks: DCMix [1], which generates mixed datacenter workloads, and E-commerce AI, an end-to-end business AI benchmark. DCMix models datacenter application scenarios and generates mixed workloads with different latency requirements, including AI workloads (e.g., image recognition and speech recognition) and online services (e.g., online search).

E-commerce AI, developed jointly with Alibaba, mimics the workloads of complex modern Internet services and includes an AI-based recommendation module.

3.4 AI Competition

Using the benchmark implementations from AIBench as baselines, BenchCouncil is organizing the International AI System and Algorithm Competition to advance state-of-the-art and state-of-the-practice algorithms on different systems and architectures, such as X86, Cambricon, RISC-V, and GPU. This year there are four tracks: AI System Competitions based on RISC-V, Cambricon, and X86 chips, and the Intellifusion 3D Face Recognition Algorithm Competition. Competition information is publicly available from http://www.benchcouncil.org/competition/index.html. Companies and research institutes are welcome to join and organize a competition track each year.

Among the four tracks, the RISC-V and Cambricon-based AI System Competitions focus on implementing and optimizing image classification on RISC-V and Cambricon, respectively. The X86-based AI System Competition focuses on implementing and optimizing the recommendation algorithm. The algorithm competition aims to develop innovative algorithms for 3D face recognition.

4 Conclusion

This paper presents AIBench, a comprehensive datacenter AI benchmark suite covering 12 micro benchmarks, 16 component benchmarks, and 2 end-to-end application benchmarks. The benchmark suite is publicly available from http://www.benchcouncil.org/AIBench/index.html.

References

1. Xiong, X., et al.: DCMIX: generating mixed workloads for the cloud data center. In: BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18) (2018)
2. Gao, W., et al.: An industry standard internet service AI benchmark suite. Technical report, AIBench (2019)
3. Coleman, C., et al.: DAWNBench: an end-to-end deep learning benchmark and competition. In: NIPS ML Systems Workshop (2017)
4. MLPerf. https://mlperf.org
5. Adolf, R., Rama, S., Reagen, B., Wei, G.-Y., Brooks, D.: Fathom: reference workloads for modern deep learning methods. In: IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10 (2016)
6. Zhu, H., et al.: TBD: benchmarking and analyzing deep neural network training. arXiv preprint arXiv:1803.06905 (2018)
7. DeepBench. https://svail.github.io/DeepBench/
8. Dong, S., Kaeli, D.: DNNMark: a deep neural network benchmark suite for GPUs. In: Proceedings of the General Purpose GPUs, pp. 63–72. ACM (2017)
9. Jiang, Z., et al.: HPC AI500: a benchmark suite for HPC AI systems. In: 2018 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18) (2018)
10. Luo, C., et al.: AIoT Bench: towards comprehensive benchmarking mobile and embedded device intelligence. In: 2018 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18) (2018)
11. Hao, T., et al.: Edge AIBench: towards comprehensive end-to-end edge computing benchmarking. In: 2018 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18) (2018)
12. Gao, W., et al.: BigDataBench: a scalable and unified big data and AI benchmark suite. arXiv preprint arXiv:1802.08254 (2018)
13. Wang, L., et al.: BigDataBench: a big data benchmark suite from internet services. In: IEEE International Symposium on High Performance Computer Architecture (HPCA) (2014)
14. Jia, Z., Wang, L., Zhan, J., Zhang, L., Luo, C.: Characterizing data analysis workloads in data centers. In: 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 66–76. IEEE (2013)
15. Gao, W., et al.: Data Motifs: a lens towards fully understanding big data and AI workloads. In: 2018 27th International Conference on Parallel Architectures and Compilation Techniques (PACT) (2018)
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
17. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)
18. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017)
19. Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015)
20. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
21. https://nlp.stanford.edu/projects/nmt/
22. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 652–663 (2017)
23. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
24. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
25. Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223 (2016)
26. Amodei, D., et al.: Deep Speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182 (2016)
27. Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: Librispeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)
28. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
29. Huang, G.B., Mattar, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition (2008)
30. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 67–74. IEEE (2018)
31. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
32. Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. (TiiS) 5(4), 19 (2016)
33. Finn, C., Goodfellow, I., Levine, S.: Unsupervised learning for physical interaction through video prediction. In: Advances in Neural Information Processing Systems, pp. 64–72 (2016)
34. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
35. Nallapati, R., Zhou, B., Gulcehre, C., Xiang, B., et al.: Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv preprint arXiv:1602.06023 (2016)
36. Rush, A.M., Chopra, S., Weston, J.: A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2015)
37. LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. AT&T Labs, vol. 2, p. 18 (2010). http://yann.lecun.com/exdb/mnist
38. Tang, J., Wang, K.: Ranking distillation: learning compact ranking models with high performance for recommender system. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2289–2298. ACM (2018)
39. Gowalla dataset. https://snap.stanford.edu/data/loc-gowalla.html


Acknowledgment

This work is supported by the Standardization Research Project of the Chinese Academy of Sciences No. BZ201800001.

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Wanling Gao, Chunjie Luo, Lei Wang, Xingwang Xiong, Jianan Chen, Tianshu Hao, Zihan Jiang, Fanda Fan, Mengjia Du, Yunyou Huang, Fan Zhang, Xu Wen, Chen Zheng, Xiwen He & Jianfeng Zhan
BenchCouncil (International Open Benchmark Council), Dover, Delaware, USA
Wanling Gao, Chunjie Luo, Lei Wang, Chen Zheng, Jiahui Dai, Hainan Ye & Jianfeng Zhan
Beijing Academy of Frontier Sciences and Technology, Beijing, China
Jiahui Dai & Hainan Ye
University of Chinese Academy of Sciences, Beijing, China
Wanling Gao, Chunjie Luo, Lei Wang, Xingwang Xiong, Jianan Chen, Tianshu Hao, Zihan Jiang, Fanda Fan, Mengjia Du, Yunyou Huang, Xu Wen, Chen Zheng & Jianfeng Zhan
Alibaba, Hangzhou, China
Zheng Cao
Princeton University, Princeton, USA
Zhen Jia
Wuba, Zhuxi, China
Kent Zhan
Tencent, Shenzhen, China
Haoning Tang
Baidu, Beijing, China
Daoyi Zheng
China RISC-V Alliance, Beijing, China
Biwei Xie
Cambricon, Shenzhen, China
Wei Li
Intellifusion, Shenzhen, China
Xiaoyu Wang

Corresponding author

Correspondence to Jianfeng Zhan.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Chen Zheng
Chinese Academy of Sciences, Beijing, China
Jianfeng Zhan

About this paper

Cite this paper

Gao, W. et al. (2019). AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking. In: Zheng, C., Zhan, J. (eds) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science(), vol 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_1

DOI: https://doi.org/10.1007/978-3-030-32813-9_1
Published: 08 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32812-2
Online ISBN: 978-3-030-32813-9
eBook Packages: Computer Science, Computer Science (R0)
