To achieve better bandwidth efficiency and strong error-correcting capability, storage devices and optical networks demand extremely high-performance forward error correction (FEC). Low-density parity-check (LDPC) decoding on a Graphics Processing Unit (GPU) was first introduced by Falcao et al. (2008), and GPU-based LDPC decoders were later argued to be capable of outperforming application-specific integrated circuit (ASIC) based hardware (Falcao et al., 2009). Consequently, researchers have tried to improve GPU-based LDPC decoding with more efficient parallel algorithms and optimization techniques (Falcao et al., 2008, 2009; Ji et al., 2011). About ten years ago, GPUs were designed specifically for graphics processing and rendering. However, they are now used far more widely. Computers equipped with several GPUs, or with very large numbers of them, have become a powerful architecture capable of general-purpose computation and even supercomputing.
The range of GPU applications is broad because GPUs handle a wide variety of tasks with high productivity. GPUs, with their many parallel processing elements, are often used for computationally intensive science and engineering applications. Given this productivity and breadth of use, it is essential to study the energy consumed during computation and, where possible, to identify algorithms that reduce its cost. Huzmiev and Chipirov (2016) measured the energy consumed by a GPU using a current sensor and a microcontroller unit (ST).
One of the major hurdles expected in the coming years on the way to exascale performance is keeping power consumption within a reasonable power budget. GPUs have been identified as an essential element in realizing exascale computing because of their fine-grained, highly parallel architecture and their advances in power and performance efficiency (Al-Hashimi et al., 2017).
In this study, we consider different architectures in order to investigate power consumption and energy consumption. We focus on the Bitonic Mergesort (BM) algorithm. Of key interest are the various operating modes of the GPU and how the energy consumed depends on the number of operating computing units.
Research Summary
Several researchers have studied GPU performance and energy consumption. For instance, Ukidave et al. (2014) analyzed the power efficiency of several optimization techniques commonly used on heterogeneous platforms. They focused mainly on discrete GPUs, shared-memory GPUs, and low-power system-on-chip (SoC) devices. Their study established that both architectural and algorithmic factors affect energy consumption. Coplin and Burtscher studied the same question. They investigated and compared the power profiles of irregular and regular programs running on a K20 GPU. They extended their research by examining how the power profile changes when the GPU's core and memory frequencies are varied, when alternative implementations of the same algorithm are used, and when the program's inputs are varied. From their study, they deduced that power must be considered as a function of time and that it has to be examined again for each input and after each change to the code.
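The point that power has to be treated as a function of time can be made concrete with a short sketch. It assumes that power is sampled at known timestamps and approximates energy as the area under the power curve with the trapezoidal rule. The function name `energy_joules` and the sample values are hypothetical and are not taken from the cited studies.

```python
def energy_joules(timestamps_s, power_w):
    """Approximate energy (J) as the integral of power (W) over time (s).

    Uses the trapezoidal rule between consecutive samples.
    Both sequences must have the same length.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between sample i-1 and sample i.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical power trace of one kernel run: ramp up, plateau, ramp down.
timestamps = [0.0, 1.0, 2.0, 3.0]      # seconds
power = [50.0, 100.0, 100.0, 50.0]     # watts
kernel_energy = energy_joules(timestamps, power)
```

Two runs with the same peak power can differ in energy if one of them finishes sooner, which is one reason a single power reading is not enough to compare implementations.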
First, we consider the relationship between the number of operating computing units and the energy consumed by the GPU. Huzmiev and Chipirov (2016) sought to establish the relationship between these two variables. They found that energy consumption depended on the number of blocks and video chips. They also reported a significant decrease in energy consumption, to 17 percent, when GPU resources were used to the maximum during their experiment.
Sheaffer et al. (2005) studied thermal management for GPU architectures. They used Qsilver to explore possible thermal management methods. With the GPU in mind, they noted that it relies on a streaming infrastructure and on more conventional intra-block communication. They exploited techniques that take advantage of the parallelism of graphics workloads, mainly multiple clock domains and temperature-aware floor plans, alongside classical techniques such as dynamic voltage scaling (DVS) and clock gating. On top of the Qsilver framework, they implemented several thermal management techniques, including fetch gating, dynamic voltage scaling, thermal-aware floor planning, global clock gating, and multiple clock domains, and evaluated the efficiency of each. They established that the voltage scaling techniques, such as multiple clock domains and dynamic voltage scaling, were more efficient than the more primitive gating techniques such as global clock gating.
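One way to see why voltage scaling can save more than simply slowing or gating the clock is the standard first-order model of CMOS dynamic power, in which power is proportional to capacitance times voltage squared times frequency. The sketch below uses that textbook approximation, not a model from the cited study, and the function name and scale factors are hypothetical.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to nominal under P ~ C * V^2 * f.

    Switched capacitance and activity are assumed constant, so only the
    voltage and frequency scale factors (1.0 = nominal) matter.
    """
    return voltage_scale ** 2 * frequency_scale


# Reducing only the effective clock rate to 80% lowers power linearly.
clock_only = relative_dynamic_power(1.0, 0.8)

# Reducing voltage and clock together to 80% compounds the saving,
# because voltage enters the model squared.
voltage_and_clock = relative_dynamic_power(0.8, 0.8)
```

Under these assumptions the combined reduction leaves roughly half of the nominal dynamic power, compared with 80 percent for the clock-only case. Real designs also have static power and limits on how far voltage can drop, which this sketch ignores.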
Al-Hashimi et al. (2017) investigated the power consumption, energy consumption, and kernel runtime of Bitonic Mergesort under various workloads on an NVIDIA K40 GPU. Bitonic Mergesort is a data-independent algorithm: its sequence of comparisons is fixed by the input size rather than by the input values, which is why it can be used to construct a sorting network. Bitonic Mergesort is well suited to generic parallel architectures because it can operate in place, requires minimal interprocess communication, and is a natural fit for single instruction, multiple data (SIMD) architectures (Kipfer & Westermann, 2005). It is expected to be more power and energy efficient than a highly optimized data-driven algorithm. In their experiment, they adopted an approach that can be used to assess the power consumption and energy consumption of any kernel running on Kepler GPUs.
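The structure of Bitonic Mergesort can be sketched in a few lines. The version below is a sequential Python illustration of the bitonic sorting network, not the CUDA kernel evaluated by Al-Hashimi et al. (2017); the function name `bitonic_sort` is hypothetical, and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Sort a list in place, ascending, with an iterative bitonic network.

    The length must be a power of two. Which pairs are compared depends
    only on the indices, never on the values being sorted.
    """
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                  # size of the sequences being merged
        j = k // 2
        while j >= 1:              # distance between compared elements
            # All iterations of this loop touch disjoint pairs, so on a
            # GPU each index i could be handled by its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (values[i] > values[partner]) == ascending:
                        values[i], values[partner] = values[partner], values[i]
            j //= 2
        k *= 2
    return values


data = [7, 3, 6, 2, 8, 1, 5, 4]
bitonic_sort(data)
```

Because every compare-and-swap works on the original array and the inner loop has no dependencies between pairs, the sketch shows the two properties noted above: in-place operation and a good fit for SIMD execution.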
Conclusion
This study concerns the energy consumed by Graphics Processing Units (GPUs). There is growing concern that, in the coming years, rising power consumption will be a stumbling block to achieving exascale performance within a reasonable power budget. The issue of power and energy consumption has therefore been addressed by evaluating important software building blocks. For instance, the study has examined how an algorithm such as Bitonic Mergesort, which is data-independent, affects energy consumption. The study has also examined the effect of the number of operating computing units on energy consumption.
Other studies have also established that energy consumption depends on the number of operating computing units. Moreover, Sheaffer et al. (2005) compared several techniques for thermal management. They established that voltage scaling techniques such as multiple clock domains and dynamic voltage scaling were more efficient for thermal management, producing better results than more primitive techniques such as global clock gating.
References
Al-Hashimi, M. A., Abulnaja, O. A., Saleh, M. E., & Ikram, M. J. (2017). Evaluating Power and Energy Efficiency of Bitonic Mergesort on Graphics Processing Unit. IEEE Access, 5, 16429-16440.
Burtscher, M., & Coplin, J. Power characteristics of irregular GPGPU programs.
Falcao, G., Sousa, L., & Silva, V. (2008, February). Massive parallel LDPC decoding on GPU. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming (pp. 83-90). ACM.
Falcao, G., Silva, V., & Sousa, L. (2009, June). How GPUs can outperform ASICs for fast LDPC decoding. In Proceedings of the 23rd international conference on Supercomputing (pp. 390-399). ACM.
Huzmiev, I. K., & Chipirov, Z. A. (2016, May). Energy consumption powered by graphics processing units (GPU) in response to the number of operating computing unit. In Industrial Engineering, Applications and Manufacturing (ICIEAM), International Conference on (pp. 1-4). IEEE.
Ji, H., Cho, J., & Sung, W. (2011). Memory access optimized implementation of cyclic and quasi-cyclic LDPC codes on a GPGPU. Journal of Signal Processing Systems, 64(1), 149-159.
Sheaffer, J. W., Skadron, K., & Luebke, D. P. (2005, March). Studying thermal management for graphics-processor architectures. In Performance Analysis of Systems and Software, 2005. ISPASS 2005. IEEE International Symposium on (pp. 54-65). IEEE.
Ukidave, Y., Ziabari, A. K., Mistry, P., Schirner, G., & Kaeli, D. (2014). Analyzing power efficiency of optimization techniques and algorithm design methods for applications on heterogeneous platforms. The International Journal of High-Performance Computing Applications, 28(3), 319-334.