The Computer Language Benchmarks Game, previously known as the Great Computer Language Shootout, attempts to compare the performance of roughly 30 languages using several benchmarks. Users can contribute better-performing implementations to improve the score of a particular language.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/
The test platform uses an Intel i5-3330 quad-core 3.0 GHz processor with 15.8 GB of RAM and a 2 TB SATA hard drive running Ubuntu 24.04 x86_64 GNU/Linux 6.8.0-35-generic.
Below are the updated links comparing Intel Fortran to other key HPC languages on the current platform:
The site’s methodology for measuring elapsed time, CPU time, and memory usage is detailed here:
Note that these figures compare particular implementations of flawed benchmarks, so the numbers reflect programmer skill as well as intrinsic language performance. More popular languages such as C enjoy higher scores in large part because their implementations have been highly tuned and take advantage of multiple threads.
With some effort, Fortran’s scores could be greatly improved. Particular benchmarks to focus on are binary-trees, fasta, and reverse-complement.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html
I think the Fortran binary-trees implementation can be improved to better compete with the GCC version by using something along the lines of the mempool module in FLIBS. The C version quite successfully uses the memory pool functions of the Apache Portable Runtime Library. Jason Blevins 30 Mar 2010 22:53 EDT
To maximize Fortran performance on Intel Ivy Bridge architectures, implementations must mitigate L3 cache latency and branch-prediction bottlenecks. For pointer-intensive benchmarks like binary-trees, standard allocation should be replaced with region-based memory management (using libraries like APR for compliance) so that each allocation costs O(1). Fortran’s distinct advantage lies in its rule that dummy arguments may not alias and in its column-major array semantics, which enable aggressive compiler vectorization without the non-standard __restrict__ keyword often required in C++. Data locality is improved by compressing pointers to 32-bit indices and adopting a column-major Array of Structures (AoS) layout, which lets the hardware prefetcher load sibling nodes in a single cache transaction. Furthermore, recursive traversal with manual unrolling (depths 0-3) takes advantage of Fortran’s low call overhead to achieve higher IPC. Parallel scalability is secured by placing memory arenas on the thread stack via Fortran’s BLOCK construct, which eliminates false sharing, and by using OpenMP’s schedule(static) for low-overhead work distribution. These techniques demonstrated a >50% execution time reduction, outperforming highly optimized C++ solutions. Eduardo Furlan 3 Dec 2025
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/fasta.html
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/revcomp.html