Clicky

Fortran Wiki
lesson5_ucs4

Introduction to Fortran Unicode support

Lesson V: processing Unicode file names on OPEN() statements

If your OS supports utf-8 as the default encoding it is likely you will at some point encounter a filename containing multi-byte Unicode characters.

The definition of the OPEN() statement specifies that the filename expression is a “scalar-default-char-expr”. But it also states

A file may have a name; a file that has a name is called a named
file. The name of a named file is represented by a character
string value. The set of allowable names for a file is processor
dependent.

And the description of the FILE= specifier states

12.5.6.10 FILE= specifier in the OPEN statement

The value of the FILE= specifier is the name of the file to
be connected to the specified unit. Any trailing blanks are
ignored. The file-name-expr shall be a name that is allowed by
the processor. The interpretation of case is processor dependent.

So what filenames are allowed is processor-dependent – but probably is restricted to a default character expression, which currently is typically ASCII or extended ASCII.

So if your Fortran compiler allows Unicode filenames the filename is likely to require being specified as a stream of bytes of the default CHARACTER kind representing utf-8 characters.

But what if you have the filename in UCS-4 internal representation? Fortran does not currently provide an intrinsic procedure for converting ucs-4 to utf-8 Unicode.

Fortran does the conversion needed when writing ucs-4 internal data to utf-8-encoded files. We can use that functionality to create a simple conversion routine.

program read_filename
! @(#) convert ucs-4 filename to utc-8 for OPEN() statement
use, intrinsic :: iso_fortran_env, only : output_unit
implicit none
integer, parameter                     :: ucs4 = selected_char_kind ('ISO_10646')
character(len=:),allocatable           :: afilename
character(len=:,kind=ucs4),allocatable :: ufilename
integer                                :: lun

   ! we have a UCS-4 filename from somewhere ...
   ufilename = & ! ENCODING:môj_obľúbený_súbor "my_favorite_file"
   char(int(z'6D'),kind=ucs4)  // char(int(z'F4'),kind=ucs4) // char(int(z'6A'),kind=ucs4)// &
   char(int(z'5F'),kind=ucs4)  // char(int(z'6F'),kind=ucs4) // char(int(z'62'),kind=ucs4)// &
   char(int(z'13E'),kind=ucs4) // char(int(z'FA'),kind=ucs4) // char(int(z'62'),kind=ucs4)// &
   char(int(z'65'),kind=ucs4)  // char(int(z'6E'),kind=ucs4) // char(int(z'FD'),kind=ucs4)// &
   char(int(z'5F'),kind=ucs4)  // char(int(z'73'),kind=ucs4) // char(int(z'FA'),kind=ucs4)// &
   char(int(z'62'),kind=ucs4)  // char(int(z'6F'),kind=ucs4) // char(int(z'72'),kind=ucs4)

   open (output_unit, encoding='utf-8')
   write(output_unit,*)'FILENAME:',ufilename

   afilename=ucs4_to_utf8(ufilename)

   open (newunit=lun, file=afilename, encoding='utf-8')
   !CLOSE(unit=lun, status='delete')

contains

function ucs4_to_utf8(ucs4_string) result(ascii_string)
character(len=*,kind=ucs4),intent(in) :: ucs4_string
character(len=:),allocatable          :: ascii_string
character(len=(len(ucs4_string)*4))   :: line
integer                               :: lun
   open(newunit=lun,encoding='UTF-8',status='scratch')
   write(lun,'(A)')ucs4_string
   rewind(lun)
   open(unit=lun,encoding='default')
   read(lun,'(A)')line
   close(lun)
   ascii_string=trim(line)
end function ucs4_to_utf8

end program read_filename

Summary

If your processor supports Unicode filenames you probably need to convert any filename in UCS-4 encoding to UTF-8 encoding to use the name on an OPEN() statement.

Using Fortran’s ability to encode UCS-4 data as UTF-8 when writing external files it is easy to create a function for converting between the two encodings.

Note that modules of related functions can be found at github.com/urbanjost/M_unicode and github.com/urbanjost/M_unicode that use methods more efficient than using scratch files.

If your processor does not support Unicode filenames your operating system may support links. So an alternative might be to make an ASCII filename that is an alias for the unusable filename. This can typically be done with system commands using the intrinsic EXECUTE_COMMAND_LINE(3) if you do not have procedures for creating (and removing) links.