In-depth timing of entire applications is best suited to profiling tools such as the commonly available gprof(1) command and vendor-supplied tooling and utilities designed for the purpose of timing code.
However, such tools and their required infrastructure can be inappropriate or overkill for many situations.
So for those times where do-it-yourself timing is the better choice you can start with the module presented here which calls the following standard Fortran intrinsics to provide basic macro-level timing:
These routines are implementation-dependent and often may not be a good choice for testing timing of short events. On the other hand:
During development timing information helps identify which regions of code create the greatest benefit if optimized further.
Timing can be used portably in testing frameworks to capture and identify large changes in performance between releases and different compilers and help identify compiler switch effects on efficiency.
Timing-related intrinsics are available on virtually all platforms, unlike most higher-level profiling packages.
Simple timing procedures are a great tool for timing different versions of a procedure you already have identified as being important to optimize.
These simple calls can generally be left in the code to provide general run-time information when used judiciously, (IE. at a macro level) as part of the program output without requiring the use of external run-time timers such as the GNU time(1) command.
Conversely, using pre-processors can let you easily include the timing calls during development and in debug versions; selectively excluding them in production releases to assist optimization.
Note that calls to any procedures that call the underlying operating system such as timing routines (as well as I/O and practically any call that queries the system) can have a major detrimental impact on optimization when placed in loops and can cause parallel threads to lock or sync while waiting on a system response (see “mutex” (mutually exclusive lock) calls, for example); so place timing calls sparsely and avoid calling timing calls in frequently called procedures and inner loops.
Profiling tools often impact performance in general. Side-effects of instrumenting the code and compiler options required for profiling along with portability issues can complicate using high-level tools as well.
The resolution (or even availability) of the Fortran timing procedures is unfortunately totally dependent on the hardware they are running on, so take note the resolution available for timing varies widely. On some hardware (typically GPUs) there may not even be a user-callable clock procedure, in which case all that may be available is the date and time from the main code. Even that might not be on some specialized hardware!
But the timing routines are usually available where Fortran applications are typically executed.
A basic module that creates an abstraction layer one level removed from the underlying intrinsics and easily changed to accommodate platform variations as well as an example program calling it illustrate basic loop timing using Fortran intrinsics …
module M_tictoc
use,intrinsic :: iso_fortran_env, only : int32,int64,real32,dp=>real64
use,intrinsic :: iso_fortran_env, only : stdout=>OUTPUT_UNIT
implicit none
private
type timer
real(kind=dp) :: cpu_start
real(kind=dp) :: cpu_end
integer(kind=int64) :: clock_start
integer(kind=int64) :: clock_end
integer :: wall_start(8)
integer :: wall_end(8)
contains
procedure :: tic => clock_tic
procedure :: toc => clock_toc
procedure :: print => clock_print
procedure :: walltime => clock_walltime
procedure :: cputime => clock_cputime
procedure :: dattime => clock_dattime
end type
interface timer
procedure :: clock_new
end interface timer
! type for unix epoch time and julian days
integer,parameter,public :: realtime=dp
public :: timer
public :: identify
character(len=*),parameter :: gen='(*(g0))'
character(len=*),parameter :: all='(*(g0,1x))'
contains
! initialization constructor
type(timer) function clock_new(this)
type(timer),intent(in),optional :: this
call cpu_time(clock_new%cpu_start)
call system_clock(clock_new%clock_start)
call date_and_time(values=clock_new%wall_start)
clock_new%cpu_end= clock_new%cpu_start
clock_new%clock_end= clock_new%clock_start
clock_new%wall_end= clock_new%wall_start
end function clock_new
subroutine clock_tic(this)
class(timer) :: this
call cpu_time(this%cpu_start)
call system_clock(this%clock_start)
call date_and_time(values=this%wall_start)
this%cpu_end = this%cpu_start
this%clock_end = this%clock_start
this%wall_end = this%wall_start
end subroutine clock_tic
subroutine clock_toc(this)
class(timer) :: this
call cpu_time(this%cpu_end)
call system_clock(this%clock_end)
call date_and_time(values=this%wall_end)
end subroutine clock_toc
subroutine clock_print(this,string,lun)
class(timer),intent(in) :: this
character(len=*),intent(in),optional :: string
integer(kind=int32),intent(in),optional :: lun
integer(kind=int32) :: lun_
real(kind=dp) :: elapsed_time
real(kind=realtime) :: elapsed_date_and_time
real(kind=dp) :: cpu_time
character(len=105) :: biggest
integer(kind=int64) :: count_rate
if(present(lun))then
lun_=lun
else
lun_=stdout
endif
elapsed_time = this%walltime()
elapsed_date_and_time = this%dattime()
cpu_time = this%cputime()
if(present(string)) write( lun_,gen ) string
if(elapsed_date_and_time >= 0)then
write( lun_,'(a,f0.3)') 'Elapsed dat (sec) ::',elapsed_date_and_time
else
write( lun_,'(a)') 'Elapsed dat (sec) :: N/A'
endif
! try to make a reasonable format for the number of digits of precision
call system_clock(count_rate=count_rate) ! Find the time rate
write(biggest,'("(a,f0.",i0,")")')ceiling(log10(real(count_rate,kind=dp)))
write( lun_,biggest) 'Elapsed time (sec) ::',elapsed_time
write( lun_,gen) 'CPU time (sec) ::',cpu_time
write( lun_,'(a,1x,f0.2)') 'Percentage ::',(cpu_time/elapsed_time)*100
end subroutine clock_print
function clock_walltime(this) result(elapsed_time)
class(timer) :: this
integer(kind=int64) :: count_rate
real(kind=dp) :: elapsed_time
real(kind=dp) :: cpu_time
call system_clock(count_rate=count_rate)
elapsed_time = real(this%clock_end-this%clock_start,kind=dp)/real(count_rate,kind=dp)
end function clock_walltime
function clock_cputime(this) result(cpu_time)
class(timer) :: this
real(kind=dp) :: cpu_time
cpu_time = real(this%cpu_end-this%cpu_start,kind=dp)
end function clock_cputime
function clock_dattime(this) result(cpu_time)
class(timer) :: this
real(kind=dp) :: cpu_time
real(kind=realtime) :: endit,startit
integer :: ierr
call date_to_julian(this%wall_end,endit,ierr)
call date_to_julian(this%wall_start,startit,ierr)
if(ierr == 0)then
cpu_time = real((endit-startit)*86400,kind=dp)
else
cpu_time = -huge(cpu_time)
endif
end function clock_dattime
subroutine date_to_julian(dat,julian,ierr)
! @(#)M_time::date_to_julian(3f): Converts proleptic Gregorian DAT date-time array to Julian Date
! REFERENCE: From Wikipedia, the free encyclopedia 2015-12-19
! correction for time zone should or should not be included?
integer,intent(in) :: dat(8)! array like returned by DATE_AND_TIME(3f)
real(kind=realtime),intent(out) :: julian
integer,intent(out) :: ierr ! 0 =successful, -1=bad year, -4=bad date 29 Feb, non leap-year, -6 negative value -9 bad input
integer :: a , y , m , jdn
integer :: utc
utc=dat(4)*60
julian = -huge(99999) ! this is the date if an error occurs and IERR is < 0
if(any(dat == -huge(dat)))then
ierr=-9
return
endif
associate&
&(year=>dat(1),month=>dat(2),day=>dat(3),utc=>utc,hour=>dat(5),minute=>dat(6),second=>dat(7)-utc+dat(8)/1000.0d0)
if ( year==0 .or. year<-4713 ) then
ierr = -1
else
ierr=0
! You must compute first the number of years (Y) and months (M) since March 1st -4800 (March 1, 4801 BC)
a = (14-month)/12 ! A will be 1 for January or February, and 0 for other months, with integer truncation
y = year + 4800 - a
m = month + 12*a - 3 ! M will be 0 for March and 11 for February
! All years in the BC era must be converted to astronomical years, so that 1BC is year 0, 2 BC is year "-1", etc.
! Convert to a negative number, then increment towards zero
! Staring from a Gregorian calendar date
jdn = day + (153*m+2)/5 + 365*y + y/4 - y/100 + y/400 - 32045 ! with integer truncation
! Finding the Julian Calendar date given the JDN (Julian day number) and time of day
julian = jdn + (hour-12)/24.0_realtime + (minute)/1440.0_realtime + second/86400.0_realtime
ierr=merge(-6,ierr, julian<0.0_realtime ) ! Julian Day must be non-negative
endif
end associate
end subroutine date_to_julian
subroutine identify()
use, intrinsic :: iso_fortran_env, only : compiler_version
use, intrinsic :: iso_fortran_env, only : compiler_options
character(len=*),parameter :: all='(*(g0,1x))'
character(len=*),parameter :: chs='(*(g0))'
character(len=2) :: ch, split
integer :: argument_length, istat, posix, dos, i
character(len=:),allocatable :: progname, options
call get_command_argument(number=0,length=argument_length)
if(allocated(progname))deallocate(progname)
allocate(character(len=argument_length) :: progname)
call get_command_argument (0, progname, status=istat)
print all, 'run date.....:',iso_8601()
if (istat == 0) then
print all, "program name.:" // trim (progname)
else
print all, "Could not get the program name " // trim (progname)
endif
print all, 'compiled by..:', compiler_version()
options=' '//compiler_options()
if(options /= '')then
print all, 'using options:'
! guess which one
posix=0
dos=0
do i=2,len(options)
ch=options(i-1:i)
select case(ch)
case(' -'); posix=posix+1
case(' /'); dos=dos+1
end select
enddo
split=merge(' -',' /',posix > 0)
do i=2,len(options)
ch=options(i-1:i)
if(ch == split)then
write(*,chs,advance='no')char(10),ch
else
write(*,chs,advance='no')ch(2:2)
endif
enddo
print all
endif
print all
end subroutine identify
function iso_8601()
! return date using ISO 8601 format at a resolution of seconds
character(len=8) :: dt
character(len=10) :: tm
character(len=5) :: zone
character(len=25) :: iso_8601
call date_and_time(dt, tm, zone)
ISO_8601 = dt(1:4)//'-'//dt(5:6)//'-'//dt(7:8) &
& //'T'// &
& tm(1:2)//':'//tm(3:4)//':'//tm(5:6) &
& //zone(1:3)//':'//zone(4:5)
end function iso_8601
end module M_tictoc
As an example use, would you expect there to be any difference in timing between sqrt(x) and x**0.5? On a lot of platforms there will be a rather large one for 32-bit values in particular. For example:
program testit
use M_tictoc,only : timer, identify
!
! Sometimes a compiler optimizes away simplistic tests.
! using a volatile variable usually keeps compilers from
! optimizing away a loop which it might otherwise identify
! as dead code and remove in a simple example program.
real,volatile :: sum
!
! The resolution of the Fortran intrinsic timing procedures
! may be too coarse to use to time just a few operations.
! A more accurate measure comes from running repeated
! calls on the code to be timed till at least a few seconds have passed.
!
! A sufficient number would vary depending on the platform.
integer,parameter :: how_many_times=100000000
! create an instance of a timer
type(timer) :: clock
integer :: i
character(len=*),parameter :: all='(*(g0,1x))'
! Describe the platform being used for record keeping
call identify()
! initialize the clock
clock=timer()
print all, 'time sqrt(r) versus r**0.5 ...'
print all, 'time calls to sqrt:'
sum=0.0
call clock%tic() ! <= start a timer
do i=1,how_many_times
sum=sum+sqrt(real(i))
enddo
print all, sum
call clock%toc() ! <= end the timer
call clock%print() ! <= print the timing information
print all, 'time calls to x**0.5:'
sum=0.0
call clock%tic()
do i=1,how_many_times
sum=sum+i**0.5
enddo
print all, sum
call clock%toc()
call clock%print()
end program testit
run date.....: 2024-09-18T21:09:12-04:00
program name.:./testit
compiled by..: GCC version 13.2.1 20240426
using options:
-mtune=generic
-march=x86-64
time sqrt(r) versus r**0.5
time calls to sqrt:
0.274877907E+12
Elapsed dat (sec) ::.405
Elapsed time (sec) ::.405112200
CPU time (sec) ::0.31200000000000000
Percentage :: 77.02
time calls to x**0.5:
0.274877907E+12
Elapsed dat (sec) ::2.586
Elapsed time (sec) ::2.585699600
CPU time (sec) ::1.7660000000000000
Percentage :: 68.30
On this particular platform there is a significant difference between these two mathematically equivalent computations! So maybe there is a reason there is a redundant-looking procedure for square roots (Note some compilers will recognize the mathematical equivalency and call an optimal solution.)!
Where available it is usually a good idea to run strace(1) where available on performance-critical codes that should be floating-point intensive to see if a lot of system calls are being made that can be eliminated. They very often cause locks and interrupts that slow performance.
and there are many many other aspects to optimizing code … memory thrashing and high-water marks, excessive system calls, I/O layout and caching, context switching .,, .
benchmark.f- A benchmarking library for Fortran code
ftime - Basic Fortran 2003 timing routines for Fortran programs
M_stopwatch - Fortran OOP clocks that may be interrupted and resumed, named, and gathered into groups. This is a descendant of the original StopWatch package that is accessible as an fpm(1) package.
StopWatch - StopWatch is a NIST Fortran 90 module for measuring execution time of program segments. It is designed to be a portable, easy-to-use means of measuring execution time. It supports the wall clock, CPU clock, a breakdown of the CPU clock into user and system times, and returns all times in seconds. It provides a simple means of determining which clocks are available, and the precision of those clocks. It is written in a style that allows it to be used with the subset languages Elf90 and F, as well as full Fortran 90 and Fortran 95 compilers.
timer - To just measure the wallclock time a command uses without instrumenting it one can use tools like the GNU/Linux time(1) command or the Linux /proc/$PID/stat file. timer is the most basic of this class of tool. Given a command it runs it and reports how long it took.
MPI_Wtime - The MPI library comes with a set of timing procedures for timing parallel code
Beliavsky Lists - A very extensive list of Fortran code on github that can be searched for “timing”, “time”, … . Several large packages contain timing subpackages you can find here.
gprof - For large codes it can be a major task just to find where the code is spending time. Profiling tools like gprof(1) are indispensable tools for tackling large project and may job be one compiler switch away from being in your hands.
Some of the most powerful profiling tools are supplied by vendors.
Check your compiler-specific documentation or think about browsing for commercial optimization tools and assistance for the big jobs. A discussion on Fortran Discourse lists a good number of them.