Fortran Wiki
backslash_ext

off the beaten path: common Unicode extensions

Extension II: backslash extension

A common Fortran extension is support for Unicode escape sequences, which specify characters by their hexadecimal code points. This makes building UCS-4 strings easier than combining BOZ literals with the CHAR() function. The usual forms are:

\xnn:        8-bit hexadecimal code nn
\unnnn:     16-bit hexadecimal code nnnn
\Unnnnnnnn: 32-bit hexadecimal code nnnnnnnn 

Enabling this generally requires a compiler switch such as -fbackslash or -Mbackslash. Without such an option, backslashes within string literals are typically treated as literal backslash characters. However, in at least one case backslash escape sequences are enabled by default and a switch is required to get standard-conforming behavior. Other C-style escape sequences, such as “\n” for a newline and “\t” for a tab character, are typically supported as well.

The following example prints the Unicode symbol ☻ (black smiling face), code point U+263B. The compiled binary must be executed in a terminal with Unicode support, such as XTerm or sakura.

program backslash_escape 
use,intrinsic :: iso_fortran_env, only: output_unit
implicit none
integer,parameter :: ucs4 = selected_char_kind('ISO_10646')
character(kind=ucs4,len=:),allocatable :: str

   ! EXTENSION:
   str = ucs4_'Unicode character: \u263B'

   open (output_unit, encoding='utf-8')
   print '(a)', str
   print '(a)', ucs4_'Unicode character: \U0000263B'
end program backslash_escape 

When using gfortran(1), build and run the executable with:

$ gfortran -fbackslash -o unicode unicode.f90
$ ./unicode
Unicode character: ☻

This is equivalent to using a BOZ literal with the CHAR() function, for instance:

str = ucs4_'Unicode character: ' // char(int(z'263B'), kind=ucs4)

Or simply use the decimal code point:

str = ucs4_'Unicode character: ' // char(9787,ucs4)
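
When several code points are needed, the standard CHAR() approach can be wrapped in a small helper. The following is only a sketch: the module name ucs4_build and the function from_code_points are hypothetical names chosen for illustration, and it assumes a compiler that supports the ISO_10646 character kind.

```fortran
module ucs4_build
   implicit none
   private
   public :: ucs4, from_code_points

   ! Assumption: the compiler supports ISO_10646. If it does not,
   ! selected_char_kind returns -1 and this module will not compile.
   integer, parameter :: ucs4 = selected_char_kind('ISO_10646')

contains

   ! Build a UCS-4 string holding one character per code point.
   ! Uses only the standard CHAR() intrinsic, no backslash extension.
   pure function from_code_points(code_points) result(str)
      integer, intent(in) :: code_points(:)
      character(kind=ucs4, len=size(code_points)) :: str
      integer :: i

      do i = 1, size(code_points)
         str(i:i) = char(code_points(i), kind=ucs4)
      end do
   end function from_code_points

end module ucs4_build
```

With this module in scope, the string from the example could be written as ucs4_'Unicode character: ' // from_code_points([int(z'263B')]), which needs no compiler switch.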

Since these strings rely on an extension and may require specific compiler options, a standard method is preferred. It is still important to be aware that existing code might use C-like escape sequences: building such code without the extension active can produce incorrect strings that initially go unnoticed.
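
One possible way to catch such a mismatch early is to check how the compiler interpreted a known escape sequence. The sketch below relies on the literal '\n' having length 1 when backslash escapes are translated and length 2 when the backslash is kept as a literal character. The module name backslash_check and the function backslash_escapes_active are hypothetical names used for illustration only.

```fortran
module backslash_check
   implicit none
   private
   public :: backslash_escapes_active

contains

   ! True when the compiler translated '\n' into a single newline
   ! character; false when it kept the two literal characters
   ! backslash and 'n'.
   pure logical function backslash_escapes_active()
      backslash_escapes_active = (len('\n') == 1)
   end function backslash_escapes_active

end module backslash_check
```

A program that depends on escape sequences could call this once at startup, for example with if (.not. backslash_escapes_active()) error stop 'backslash escapes are not enabled', so that a build with the wrong options fails loudly instead of printing wrong strings.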

Summary

Several compilers allow for quoted strings to contain code point escape sequences. This is not standard and the syntax may therefore vary from processor to processor.

Note that if the code point values are above 255 decimal, the string being created must be of the ISO_10646 character kind, not ASCII.

Compiler support

gfortran

gfortran(1) has the -fbackslash compiler option:

    -fbackslash

        \xnn, \unnnn and \Unnnnnnnn (where each n is a hexadecimal
        digit) are translated into the Unicode characters corresponding to
        the specified code points.

flang new (the LLVM version)

Supports C-style backslash escape sequences in quoted CHARACTER literals (but not Hollerith) when enabled with -fbackslash, including Unicode escapes with \U.

NAG Fortran

The compiler supports UCS-4 beginning in release 5.3 (as well as UCS-2 and JIS X 0213) but does not support Unicode escape sequences.

ifx

Intel Fortran does not support ISO_10646.