In-depth timing of entire applications is best suited to profiling tools such as the commonly available gprof(1) command and vendor-supplied tooling and utilities designed for that purpose.
However, instrumentation affects and compiler options required for profiling along with portability issues can make tools like gprof(1) inappropriate or overkill for some simple situations.
So for those times where do-it-yourself timing is the better choice you will find a few of the most useful standard Fortran intrinsics for basic macro-level timing are
It is not much of an overstatement to say these routines are implementation-dependent and often generally not a good choice for testing timing of short events.
On the other hand,
the calls can generally be left in the code to provide general run-time timing information when used judiciously, (IE. at a macro level).
If already using pre-processors it is usually very easy to make the calls only in development and debug versions and to selectively exclude them in production releases to assist optimization.
they are usually a great tool for timing throw-away development programs used to time different versions of a procedure you already have identified as being important to optimize.
they can be used port ably in testing frameworks to capture and identify large changes in performance between releases and different compilers and compiler switch affects.
Since they are intrinsics a big advantage is that they are available on virtually all platforms, unlike most higher-level profiling packages.
Note that calls to any procedures that call the underlying operating system such as timing routines (and even I/O and practically any call that queries the system) can have a major detrimental impact on optimization when placed in loops and can cause parallel threads to lock or sync while waiting on a system response (see “mutex” (mutually exclusive lock) calls, for example).
It is usually a good idea to run strace(1) where available on performance-critical codes that should be floating-point intensive to see if a lot of system calls are being made that can be eliminated. They very often cause locks and interrupts that slow performance.
The resolution (or even availability) of the procedures is totally dependent on the hardware they are running on, so take note the resolution available for timing varies widely. On some hardware (typically GPUs) there may not even be a user-callable clock procedure, in which case all that may be available is the date and time from the main code. Even that might not be on some specialized hardware!
But the timing routines are usually available and functional. It is still useful to package and wrap them in a module so they can easily be changed to suite different platforms. An example of such a wrapper is included here.
A simple example program calling the module illustrates using it to time a loop of calls.
program testit
use M_tictoc,only : timer, say_hello
!
! using a volatile variable usually keeps compilers from
! optimizing away a loop
real,volatile :: sum
!
! a more accurate measure comes from running repeated
! calls till at least a few seconds have passed.
! therefore this number would vary depending on the platform.
integer,parameter :: how_many_times=100000000
! create an instance of a timer
type(timer) :: clock
integer :: i
character(len=*),parameter :: all='(*(g0,1x))'
! some useful info for record keeping
call say_hello()
! initialize the clock
clock=timer()
print all, 'time sqrt(r) versus r**0.5'
print all, 'time calls to sqrt:'
sum=0.0
call clock%tic() ! <= start a timer
do i=1,how_many_times
sum=sum+sqrt(real(i))
enddo
print all, sum
call clock%toc() ! <= end the timer
call clock%print() ! <= print the timing information
print all, 'time calls to x**0.5:'
sum=0.0
call clock%tic()
do i=1,how_many_times
sum=sum+i**0.5
enddo
print all, sum
call clock%toc()
call clock%print()
end program testit
Expecting there to be no difference in timing? On a lot of platforms there will be a rather large one for 32-bit in particular. For example:
run date.....: 2024-09-18T21:09:12-04:00
program name.:./testit
compiled by..: GCC version 13.2.1 20240426
using options:
-mtune=generic
-march=x86-64
time sqrt(r) versus r**0.5
time calls to sqrt:
0.274877907E+12
Elapsed dat (sec) ::.405
Elapsed time (sec) ::.405112200
CPU time (sec) ::0.31200000000000000
Percentage :: 77.02
time calls to x**0.5:
0.274877907E+12
Elapsed dat (sec) ::2.586
Elapsed time (sec) ::2.585699600
CPU time (sec) ::1.7660000000000000
Percentage :: 68.30
So maybe there is a reason there is a redundant-looking procedure for square roots (Note some compilers will recognize the mathematical equivalency and call an optimal solution!).
Timing can be more complicated for parallel code; there is the issue of System time versus User time and Idle time; sometimes call counting might be useful in lieu of timing; and there are many many other aspects to optimizing code…. memory thrashing and high-water marks, excessive system calls, I/O layout and caching, context switching .,, .
But, a little module might be handy for the many simpler cases …
module M_tictoc
use,intrinsic :: iso_fortran_env, only : int32,int64,real32,dp=>real64
use,intrinsic :: iso_fortran_env, only : stdout=>OUTPUT_UNIT
implicit none
private
type timer
real(kind=dp) :: cpu_start
real(kind=dp) :: cpu_end
integer(kind=int64) :: clock_start
integer(kind=int64) :: clock_end
integer :: wall_start(8)
integer :: wall_end(8)
contains
procedure :: tic => clock_tic
procedure :: toc => clock_toc
procedure :: print => clock_print
procedure :: walltime => clock_walltime
procedure :: cputime => clock_cputime
procedure :: dattime => clock_dattime
end type
interface timer
procedure :: clock_new
end interface timer
! type for unix epoch time and julian days
integer,parameter,public :: realtime=dp
public :: timer
public :: say_hello
character(len=*),parameter :: gen='(*(g0))'
character(len=*),parameter :: all='(*(g0,1x))'
contains
! initialization constructor
type(timer) function clock_new(this)
type(timer),intent(in),optional :: this
call cpu_time(clock_new%cpu_start)
call system_clock(clock_new%clock_start)
call date_and_time(values=clock_new%wall_start)
clock_new%cpu_end= clock_new%cpu_start
clock_new%clock_end= clock_new%clock_start
clock_new%wall_end= clock_new%wall_start
end function clock_new
subroutine clock_tic(this)
class(timer) :: this
call cpu_time(this%cpu_start)
call system_clock(this%clock_start)
call date_and_time(values=this%wall_start)
this%cpu_end = this%cpu_start
this%clock_end = this%clock_start
this%wall_end = this%wall_start
end subroutine clock_tic
subroutine clock_toc(this)
class(timer) :: this
call cpu_time(this%cpu_end)
call system_clock(this%clock_end)
call date_and_time(values=this%wall_end)
end subroutine clock_toc
subroutine clock_print(this,string,lun)
class(timer),intent(in) :: this
character(len=*),intent(in),optional :: string
integer(kind=int32),intent(in),optional :: lun
integer(kind=int32) :: lun_
real(kind=dp) :: elapsed_time
real(kind=realtime) :: elapsed_date_and_time
real(kind=dp) :: cpu_time
character(len=105) :: biggest
integer(kind=int64) :: count_rate
if(present(lun))then
lun_=lun
else
lun_=stdout
endif
elapsed_time = this%walltime()
elapsed_date_and_time = this%dattime()
cpu_time = this%cputime()
if(present(string)) write( lun_,gen ) string
if(elapsed_date_and_time >= 0)then
write( lun_,'(a,f0.3)') 'Elapsed dat (sec) ::',elapsed_date_and_time
else
write( lun_,'(a)') 'Elapsed dat (sec) :: N/A'
endif
! try to make a reasonable format for the number of digits of precision
call system_clock(count_rate=count_rate) ! Find the time rate
write(biggest,'("(a,f0.",i0,")")')ceiling(log10(real(count_rate,kind=dp)))
write( lun_,biggest) 'Elapsed time (sec) ::',elapsed_time
write( lun_,gen) 'CPU time (sec) ::',cpu_time
write( lun_,'(a,1x,f0.2)') 'Percentage ::',(cpu_time/elapsed_time)*100
end subroutine clock_print
function clock_walltime(this) result(elapsed_time)
class(timer) :: this
integer(kind=int64) :: count_rate
real(kind=dp) :: elapsed_time
real(kind=dp) :: cpu_time
call system_clock(count_rate=count_rate)
elapsed_time = real(this%clock_end-this%clock_start,kind=dp)/real(count_rate,kind=dp)
end function clock_walltime
function clock_cputime(this) result(cpu_time)
class(timer) :: this
real(kind=dp) :: cpu_time
cpu_time = real(this%cpu_end-this%cpu_start,kind=dp)
end function clock_cputime
function clock_dattime(this) result(cpu_time)
class(timer) :: this
real(kind=dp) :: cpu_time
real(kind=realtime) :: endit,startit
integer :: ierr
call date_to_julian(this%wall_end,endit,ierr)
call date_to_julian(this%wall_start,startit,ierr)
if(ierr == 0)then
cpu_time = real((endit-startit)*86400,kind=dp)
else
cpu_time = -huge(cpu_time)
endif
end function clock_dattime
subroutine date_to_julian(dat,julian,ierr)
! @(#)M_time::date_to_julian(3f): Converts proleptic Gregorian DAT date-time array to Julian Date
! REFERENCE: From Wikipedia, the free encyclopedia 2015-12-19
! correction for time zone should or should not be included?
integer,intent(in) :: dat(8)! array like returned by DATE_AND_TIME(3f)
real(kind=realtime),intent(out) :: julian
integer,intent(out) :: ierr ! 0 =successful, -1=bad year, -4=bad date 29 Feb, non leap-year, -6 negative value -9 bad input
integer :: a , y , m , jdn
integer :: utc
utc=dat(4)*60
julian = -huge(99999) ! this is the date if an error occurs and IERR is < 0
if(any(dat == -huge(dat)))then
ierr=-9
return
endif
associate&
&(year=>dat(1),month=>dat(2),day=>dat(3),utc=>utc,hour=>dat(5),minute=>dat(6),second=>dat(7)-utc+dat(8)/1000.0d0)
if ( year==0 .or. year<-4713 ) then
ierr = -1
else
ierr=0
! You must compute first the number of years (Y) and months (M) since March 1st -4800 (March 1, 4801 BC)
a = (14-month)/12 ! A will be 1 for January or February, and 0 for other months, with integer truncation
y = year + 4800 - a
m = month + 12*a - 3 ! M will be 0 for March and 11 for February
! All years in the BC era must be converted to astronomical years, so that 1BC is year 0, 2 BC is year "-1", etc.
! Convert to a negative number, then increment towards zero
! Staring from a Gregorian calendar date
jdn = day + (153*m+2)/5 + 365*y + y/4 - y/100 + y/400 - 32045 ! with integer truncation
! Finding the Julian Calendar date given the JDN (Julian day number) and time of day
julian = jdn + (hour-12)/24.0_realtime + (minute)/1440.0_realtime + second/86400.0_realtime
ierr=merge(-6,ierr, julian<0.0_realtime ) ! Julian Day must be non-negative
endif
end associate
end subroutine date_to_julian
subroutine say_hello()
use, intrinsic :: iso_fortran_env, only : compiler_version
use, intrinsic :: iso_fortran_env, only : compiler_options
character(len=*),parameter :: all='(*(g0,1x))'
character(len=*),parameter :: chs='(*(g0))'
character(len=2) :: ch, split
integer :: argument_length, istat, posix, dos, i
character(len=:),allocatable :: progname, options
call get_command_argument(number=0,length=argument_length)
if(allocated(progname))deallocate(progname)
allocate(character(len=argument_length) :: progname)
call get_command_argument (0, progname, status=istat)
print all, 'run date.....:',iso_8601()
if (istat == 0) then
print all, "program name.:" // trim (progname)
else
print all, "Could not get the program name " // trim (progname)
endif
print all, 'compiled by..:', compiler_version()
options=' '//compiler_options()
if(options /= '')then
print all, 'using options:'
! guess which one
posix=0
dos=0
do i=2,len(options)
ch=options(i-1:i)
select case(ch)
case(' -'); posix=posix+1
case(' /'); dos=dos+1
end select
enddo
split=merge(' -',' /',posix > 0)
do i=2,len(options)
ch=options(i-1:i)
if(ch == split)then
write(*,chs,advance='no')char(10),ch
else
write(*,chs,advance='no')ch(2:2)
endif
enddo
print all
endif
print all
end subroutine say_hello
function iso_8601()
! return date using ISO 8601 format at a resolution of seconds
character(len=8) :: dt
character(len=10) :: tm
character(len=5) :: zone
character(len=25) :: iso_8601
call date_and_time(dt, tm, zone)
ISO_8601 = dt(1:4)//'-'//dt(5:6)//'-'//dt(7:8) &
& //'T'// &
& tm(1:2)//':'//tm(3:4)//':'//tm(5:6) &
& //zone(1:3)//':'//zone(4:5)
end function iso_8601
end module M_tictoc
benchmark.f- A benchmarking library for Fortran code
ftime - Basic Fortran 2003 timing routines for Fortran programs
M_stopwatch - Fortran OOP clocks that may be interrupted and resumed, named, and gathered into groups. This is a descendent of the original StopWatch package that is accessible as an fpm(1) package.
StopWatch - StopWatch is a NIST Fortran 90 module for measuring execution time of program segments. It is designed to be a portable, easy-to-use means of measuring execution time. It supports the wall clock, CPU clock, a breakdown of the CPU clock into user and system times, and returns all times in seconds. It provides a simple means of determining which clocks are available, and the precision of those clocks. It is written in a style that allows it to be used with the subset languages Elf90 and F, as well as full Fortran 90 and Fortran 95 compilers.
timer - To just measure the wallclock time a command uses without instrumenting it one can use tools like the GNU/Linux time(1) command or the Linux /proc/$PID/stat file. timer is the most basic of this class of tool. Given a command it runs it and reports how long it took.
MPI_Wtime - The MPI library comes with a set of timing procedures for timing parallel code
Beliavsky Lists - A very extensive list of Fortran code on github that can be searched for “timing”, “time”, … . Several large packages contain timing subpackages you can find here.
gprof - For large codes it can be a major task just to find where the code is spending time. Profiling tools like gprof(1) are indispensible tools for tackling large project and may job be one compiler switch away from being in your hands.
Some of the most powerful profiling tools are supplied by vendors. Check your compiler-specific documentation or think about browsing for commercial optimization tools and assistance for the big jobs. A discussion on Fortran Discourse lists a good number of them.