Clicky

Fortran Wiki
Scratch Files

Fortran Scratch Files

Fortran lets you open a “scratch” file with OPEN(STATUS='SCRATCH',...) with the restriction that FILE="filename" is not allowed on the OPEN(3f). What actually occurs is very system-dependent.

So what exactly is the difference between a scratch file and a named external file? For example, what are the potential advantages of opening a scratch file versus doing a normal named OPEN(3f) followed by closing the file later with STATUS='DELETE'? What are the disadvantages?

There are many extensions and differences between programming environments regarding scratch files. What is really standards-conforming? Why are there so many extensions?

What will you do differently if you know exactly how scratch files are implemented?

What is a scratch file?

A scratch file is a special type of external file. It is an unnamed temporary file intended to exist only while being used by a single program execution (It does not need named as the intent is that it will not be permanently saved to disk).

In all other respects, a scratch file behaves like other external files. So the only unique things required of a scratch file are

  • a scratch file cannot be given a pathname on an OPEN(3f) with FILE='some_filename'

  • when closed explicitly or at normal program termination a scratch file is required to be deleted. That implies you cannot do

      CLOSE(LUN,STATUS='KEEP',...) ! **not allowed**

    to retain a scratch file using standard Fortran.

This simple definition gives some freedom in implementation that does not apply to typical external files. A scratch file easily might not even be a disk file – it might be transparently implemented in memory or in a database, for example. That being said scratch files are commonly implemented as system files too.

Note the Fortran standard does not say other files have to be regular system files either, but that is generally expected outside of highly specialized environments.

controlling the location of a scratch file

So one thing special about a scratch file is the programmer is not permitted to specify the pathname in the OPEN(3f) of the file. This means the user can be provided other ways external to Fortran (or as an extension) to specify where the scratch data resides.

So many Fortran implementations use the tempnam(3c) system routine or an equivalent procedure to initially name the scratch file when opening it at the system level.

Among other things this generally allows the directory name where the file is to be created to be controlled external to the program.

Typically a capability to specify where scratch files are created is provided via the first defined environment variable from the set TMPDIR, TMP, and TEMP (almost always in that order) without requiring access to the program code. $TMPDIR is the preferred variable name to use. If no scratch directory is specified GNU/Linux and Unix systems usually place the file in /tmp/; although some systems default to the current directory.

Being able to control where the scratch files are generated is very important

  • A user can make sure scratch files are generated in a filesystem with sufficient space, preventing filesystems from filling.
  • performance can be optimized by selecting a local disk, a memory-resident filesystem, or a high-speed parallel file server (eg. a Lustre server), as appropriate.
  • you can spread out concurrent program executions onto different resources, preventing a fileserver from being overloaded.
  • security can be improved by placing files in private directories. Not every implementation will provide unlinked files, which are inherently more secure than visible files; so assume the file will be a conventional file and make sure your file permissions for newly created files are as limited as possible. On GNU/Linux and Unix machines remember it is not only your umask(1), but kernel and OS defaults and fileserver options that can affect file permissions.
  • other processes can be protected by using quotaed file space so any unexpected file system usage does not crash other programs.
  • if scratch files are implemented as named files it will be a lot easier to clean up scratch files when a program fails if you can place all the scratch files in a scratch directory.

Having unique names generated by the system avoids a common problem

Since the program cannot specify the file pathname a Fortran implementation is freed to use methods to make sure the scratch pathnames are unique, which is another aspect of such procedures as tempnam(3c).

If the Fortran implementation automatically uniquely names scratch files pathname collisions caused by multiple programs accessing the same filename is eliminated. Then no one has to worry about scratch file pathnames being duplicated and colliding even when multiple commands are running.

If your compiler does not have an option to automatically generate unique pathnames for scratch files it should.

automatic cleanup for terminated processes

Since there is no requirement for a scratch file to be accessible by other processes or to exist even after abnormal program termination the implementation can also unlink(3c) the files as soon as they are created.

If a system supports unlinking a file, then a pathname becomes unavailable to normal file system commands (usually immediately after it is created) – e.g. it cannot be seen by the (GNU/Linux and Unix) ls(1) command and cannot be opened by any other normal process.

On most systems, unlinking the file has the advantage that the file will disappear when the program terminates even if the process terminates abnormally.

Disadvantages

So lets say your compiler takes advantage of the special attributes of a scratch file to make them go away even after abnormal termination, have their location controlled by external means such as the $TMPDIR environment variable, and automatically be created with unique names that do not collide with other files.

  • remember these features are all implementation-dependent. All those behaviors are allowed by the standard, but not required. On the other hand if your compiler does not do these things, “OPEN(STATUS=‘SCRATCH’…)” is really no different than opening a non-scratch file and closing it at program termination with “STATUS=‘DELETE’”.

  • if a file is unlinked so that other system processes can no longer open it, you cannot easily access the data for debugging.

  • if a file is unlinked so it goes away when the process does it is harder for anyone else such as a system administrator to tell who is using and possibly filling up a file system. A file system can be full but look like it has no files in it because unlinked files are not visible to other system commands. Depending on your system you may find commands like pfiles(1), lsof(1), “netstat -vatupn” or “find /proc/$PID/fs” on most GNU/Linux and Unix file systems will let you find what files a process has open and where they reside even if they are unlinked.

  • Some implementations may generate a “unique” filename that is only unique to a particular compute node and in addition may not unlink the files. That means if scratch files are going into the same directory on a shared file system such as an NFS file server from different compute nodes the generated names may not be unique and could collide with scratch files being generated on other compute nodes.

Extensions

Sadly, some compilers just name scratch files something like fort.LUN in the current directory, nullifying almost all the potentially automatic advantages of opening a scratch file. That is still standard-conforming. So if you build with multiple compilers make sure your application instructions let users know where scratch files will be generated and how they will be named for each environment.

Some compilers like the IBM compiler default to acting as described above but have extensions to allow for naming scratch files with an environment variable when desired (XLSFSCRATCH_unit and the runtime option “scratchvars”). These extensions are particularly handy for debugging, as you have a pathname to a regular file you can see with system commands.

There are many other common extensions. Some PEs allow you to name the file when opened. Some let you do an INQUIRE(3f) and get the filename being used as a scratch file, and some allow you to close a scratch file with STATUS='KEEP' and to close the file with a non-standard NAME= or FILE= option on CLOSE(3f)`. This is very non-standard and non-portable.

On most systems if you do not specify a filename on an OPEN(3f)

    `OPEN(UNIT=7)`

A regular system file is created, often named “fort.NNN” where NNN is the unit LUN number. The filename is system-dependent. The standard says

  "If the filename is omitted and the unit is not connected to a file,
  the STATUS= specifier shall be specified with a value of SCRATCH;
  in this case, the connection is made to a processor-dependent file."

That sounds like it means if you do an OPEN(3f) on a file that is not preconnected without a filename that it will be a scratch file and be removed at program termination. I have not seen a compiler do that yet. Regular files like “ftn.NNN” or “fort.NNN” are usually created either in the current directory or where the environment variable $TMPNAM points to.

Check your documentation and make sure you understand how scratch files are treated in your PE.

Here is a skeleton program that gives some ideas on how to see possibly significant environment variables, test if your system lets you use INQUIRE(3f) to get a scratch file name, and pause while a scratch file is open so you can examine your system and see if you can see scratch files and what their location and permissions are:


    program demo_scratch
    implicit none
    logical             :: ifnamed
    character           :: paws
    character(len=4096) :: filename,buffer
    integer             :: ios,lun,lun2

    write(*,*)'try to see where scratch files are open'
    write(*,*)'This is very implementation-dependent'
    write(*,*)'See your compiler documentation!'
    write(*,*)

    write(*,*)'Likely suspects if an environment variable'
    write(*,*)'is used to determine what directory a scratch'
    write(*,*)'file is written in (probably in this order):'
    write(*,*)

    buffer=' '
    call get_environment_variable('TMPDIR',buffer)
    write(*,*)'TMPDIR=',trim(buffer)
    call get_environment_variable('TMP',buffer)
    write(*,*)'TMP=',trim(buffer)
    call get_environment_variable('TEMP',buffer)
    write(*,*)'TEMP=',trim(buffer)

    ! see if you can query the pathname of a scratch file
    open(newunit=lun2) 
    inquire(unit=lun2,named=ifnamed)
    if(ifnamed)then
       inquire(unit=lun2,name=filename)
       write(*,*)'If you can see the pathname of a scratch file'
       write(*,*)'the standard does not require it ...'
       write(*,*)'filename=',trim(filename)
    endif

    ! NOTE: NOT standard, but some compilers let you specify a name 
    !       to create a named scratch file...
    !!open(newunit=lun,status='scratch',file='where_I_want')

    ! assuming you cannot see a scratch file name pause your program
    ! so you can examine your system to learn how to find where
    ! applications are putting scratch files and how to detect them
    open(newunit=lun,status='scratch')
    inquire(unit=lun,named=ifnamed)
    if(ifnamed)then
       inquire(unit=lun,name=filename)
       write(*,*)'filename=',trim(filename)
    else
       write(*,*)
       write(*,*)'file not named, so while the program is paused' 
       write(*,*)'look at what files it has open (lsof, pfiles, ....'
       write(*,'("Enter return to end program...")',advance='no')
       read(*,'(a)',iostat=ios)paws
    endif

    end program demo_scratch

Footnotes

Big scratch files should be unformatted

By definition scratch files are files the program interacts with as opposed to a person or other system commands. Therefore a large scratch file should almost always be a binary file to improve performance (versus a formatted text file). For simplicity that is not the case in some of the examples in this article.

LUNs can be negative values even if you cannot specify a negative UNIT= value

If cleaning up files like “fort.LUN”, or using the LUN to build filenames yourself remember the LUN number is often negative when using “OPEN(NEWUNIT=LUN,…)”. So something like OPEN(NEWUNIT=LUN) may very well create a file such as “fort.-10”.

Make sure scratch files are documented

A user of a program may have no idea the program is using scratch files, especially if they are not implemented as normal files in the current directory of the process. So if scratch files are not unlinked make sure they are being cleaned up and not left to become filesystem clutter and make sure the user understands the system demands made by the scratch files.

That is, since scratch file names are by definition system-dependent, it is hard to have automated system clean-up utilities or wrappers generically clean them up when programs abnormally terminate if scratch files are not unlinked files. That is why it can be handy to make a subdirectory for all your scratch files that is easily identified as scratch files, such as “/tmp/scratch.123/”.


Definitions

PE
PE stands for Programming Environment. This includes the constraints imposed by your compiler, loader, operating system and hardware.

urbanjost 20171118

category: discussion