# References

Adhianto, Laksono, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R. Tallent. 2010. "HPCToolkit: Tools for Performance Analysis of Optimized Parallel Programs." *Concurrency and Computation: Practice and Experience* 22 (6): 685--701.

Adhianto, Laksono, John Mellor-Crummey, and Nathan R. Tallent. 2010. "Effectively Presenting Call Path Profiles of Application Performance." In *PSTI 2010: Workshop on Parallel Software Tools and Tool Infrastructures, in Conjunction with the 2010 International Conference on Parallel Processing*.

Advanced Micro Devices. n.d. "ROCm Tracer Callback/Activity Library for Performance tracing AMD GPU's."

Anderson, Jonathon, Yumeng Liu, and John Mellor-Crummey. 2022. "Preparing for Performance Analysis at Exascale." In *Proceedings of the 36th ACM International Conference on Supercomputing*. ICS '22. New York, NY, USA: Association for Computing Machinery.

Coarfa, Cristian, John Mellor-Crummey, Nathan Froyd, and Yuri Dotsenko. 2007. "Scalability Analysis of SPMD Codes Using Expectations." In *ICS '07: Proc. of the 21st International Conference on Supercomputing*, 13--22. New York, NY, USA: ACM.

Corporation, NVIDIA. 2019. "PC Sampling."

Froyd, Nathan, John Mellor-Crummey, and Rob Fowler. 2005. "Low-Overhead Call Path Profiling of Unmodified, Optimized Code." In *Proc. of the 19th International Conference on Supercomputing*, 81--90. New York, NY, USA: ACM.

Lawrence Livermore National Laboratory. n.d.a. "Laghos: High-order Lagrangian Hydrodynamics Miniapp."

Lawrence Livermore National Laboratory. n.d.b. "Quicksilver: A Proxy App for the Monte Carlo Transport Code, Mercury."

Libpfm4. 2008. "Libpfm4: A Helper Library for Performance Tools Using Hardware Counters."

McKenney, Paul E. 1999. "Differential Profiling." *Software: Practice and Experience* 29 (3): 219--34. https://doi.org/10.1002/(SICI)1097-024X(199903)29:3\<219::AID-SPE230>3.0.CO;2-0.

Mytkowicz, Todd, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. 2009. "Producing Wrong Data Without Doing Anything Obviously Wrong!" *SIGARCH Comput. Archit. News* 37 (1): 265--76.

NVIDIA Corporation. 2019.

Rice University. n.d. "HPCToolkit Performance Tools."

Tallent, Nathan R., Laksono Adhianto, and John M. Mellor-Crummey. 2010. "Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles." In *SC '10: Proc. of the 2010 ACM/IEEE Conference on Supercomputing*, 1--11. Washington, DC, USA: IEEE Computer Society.

Tallent, Nathan R., and John Mellor-Crummey. 2009. "Effective Performance Measurement and Analysis of Multithreaded Applications." In *PPoPP '09: Proc. of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming*, 229--40. New York, NY, USA: ACM.

Tallent, Nathan R., John M. Mellor-Crummey, Laksono Adhianto, Michael W. Fagan, and Mark Krentel. 2009. "Diagnosing Performance Bottlenecks in Emerging Petascale Applications." In *SC '09: Proc. of the 2009 ACM/IEEE Conference on Supercomputing*, 1--11. New York, NY, USA: ACM.

Tallent, Nathan R., John M. Mellor-Crummey, Michael Franco, Reed Landrum, and Laksono Adhianto. 2011. "Scalable Fine-Grained Call Path Tracing." In *ICS '11: Proc. of the 25th International Conference on Supercomputing*, 63--74. New York, NY, USA: ACM.

Tallent, Nathan R., John M. Mellor-Crummey, and Allan Porterfield. 2010. "Analyzing Lock Contention in Multithreaded Applications." In *PPoPP '10: Proc. of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming*, 269--80. New York, NY, USA: ACM.

Tallent, Nathan R., John Mellor-Crummey, and Michael W. Fagan. 2009. "Binary Analysis for Measurement and Attribution of Program Performance." In *PLDI '09: Proc. of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation*, 441--52. New York, NY, USA: ACM.

Tallent, Nathan, John Mellor-Crummey, Laksono Adhianto, Mike Fagan, and Mark Krentel. 2008. "HPCToolkit: Performance Tools for Scientific Computing." *Journal of Physics: Conference Series* 125: 012088 (5pp).