Clicky

Fortran Wiki
hash table example

A module which is an example implementation of object oriented hash tables in Fortran 2003. The module is licensed under the LGPL. This module could be extended to be generic by use of the transfer function, or specialization through type extension. The methods are attached to the objects via type-bound procedures. Additional PRIVATE statements should be inserted or uncommented for the production code; the raw objects and methods have been left exposed here for testing purposes. This library and the corresponding test program compile with the Intel Fortran compiler, version 11.1.046. The program and library seem to be correct, but they have not been thoroughly tested.

Code

Library

This library was fashioned after the description of hash table data structures in The Practice of Programming.

! Module implementing an OO hash table (dictionary) in Fortran 2003.
! Compiles and runs with accompanying test program under the Intel 
! Fortran Compiler, version 11.1.046

! Copyright (c) Izaak Beekman 2010

    ! This program is free software: you can redistribute it and/or modify
    ! it under the terms of the GNU Lesser General Public License as published by
    ! the Free Software Foundation, either version 3 of the License, or
    ! (at your option) any later version.

    ! This program is distributed in the hope that it will be useful,
    ! but WITHOUT ANY WARRANTY; without even the implied warranty of
    ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    ! GNU Lesser General Public License for more details.

    ! You should have received a copy of the GNU Lesser General Public License
    ! along with this program.  If not, see <http://www.gnu.org/licenses/>.

MODULE hashtbl
  IMPLICIT NONE ! Use strong typing
  INTEGER, PARAMETER :: tbl_size = 50

  TYPE sllist
     TYPE(sllist), POINTER :: child => NULL()
     CHARACTER(len=:), ALLOCATABLE :: key, val
   CONTAINS
     PROCEDURE :: put  => put_sll
     PROCEDURE :: get  => get_sll
     PROCEDURE :: free => free_sll
  END TYPE sllist

  TYPE hash_tbl_sll
     TYPE(sllist), DIMENSION(:), ALLOCATABLE :: vec
     INTEGER                                 :: vec_len = 0
     LOGICAL                                 :: is_init = .FALSE.
   CONTAINS
     PROCEDURE :: init => init_hash_tbl_sll
     PROCEDURE :: put  => put_hash_tbl_sll
     PROCEDURE :: get  => get_hash_tbl_sll
     PROCEDURE :: free => free_hash_tbl_sll
  END TYPE hash_tbl_sll

  PUBLIC :: hash_tbl_sll

Here we see the two derived types the hash table is built from, and the type-bound procedures (methods) each of these data objects has1.

The first object, sllist is a singly linked list, with allocatable scalar CHARACTER type key and value components. (The string lengths may be allocated at runtime.) This type has 3 methods: 1. put to put (and create if necessary) the key-value pair into the appropriate list element. 1. get to retrieve the string (value) corresponding to the key, if the key-value pair exist. 1. free to completely destroy the linked list.

The second object is the hash table. It contains elements to store some meta-data like the vector length (important for hashing the keys) and the state of the object, as well as the array of singly linked lists, vec. The methods associated with this object are: 1. init to initialize the object by allocating vec to be a certain length, and keeping track of the object state and meta-data. 1. put to hash the key and store the value in the proper element of the appropriate linked list, creating it if necessary. 1. get to hash the key and retrieve the associated value, if the key-value pair exists. 1. free to destroy the hash table, freeing up memory and keeping track of meta-data and object state.

Below is a schematic of an instantiated hash table object, as implemented in this library: Hash table with singly linked list elements

The rest of the module, including all the procedures follows below.

CONTAINS

  RECURSIVE SUBROUTINE put_sll(list,key,val)
    CLASS(sllist),    INTENT(inout) :: list
    CHARACTER(len=*), INTENT(in)    :: key, val
    INTEGER                         :: keylen, vallen

    keylen = LEN(key)
    vallen = LEN(val)
    IF (ALLOCATED(list%key)) THEN
       IF (list%key /= key) THEN
          IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
          CALL put_sll(list%child,key,val)
       END IF
    ELSE
       IF (.NOT. ALLOCATED(list%key)) &
            ALLOCATE(CHARACTER(len=keylen) :: list%key)
       list%key = key
       IF (ALLOCATED(list%val)) DEALLOCATE(list%val)
       ALLOCATE(CHARACTER(len=vallen) :: list%val)
       list%val = val
    END IF
  END SUBROUTINE put_sll


  RECURSIVE SUBROUTINE get_sll(list,key,val)
    CLASS(sllist),                 INTENT(in)    :: list
    CHARACTER(len=*),              INTENT(in)    :: key
    CHARACTER(len=:), ALLOCATABLE, INTENT(out)   :: val
    INTEGER                                      :: vallen

    vallen = 0
    IF (ALLOCATED(list%key) .AND. (list%key == key)) THEN
       vallen = LEN(list%val)
       IF (ALLOCATED(val)) DEALLOCATE(val)
       ALLOCATE(CHARACTER(len=vallen) :: val)
       val = list%val
    ELSE IF(ASSOCIATED(list%child)) THEN ! keep going
       CALL get_sll(list%child,key,val)
    ELSE ! At the end of the list, no key found
       IF (ALLOCATED(val)) DEALLOCATE(val) ! Exit indication
       RETURN
    END IF
  END SUBROUTINE get_sll


  RECURSIVE SUBROUTINE free_sll(list)
    CLASS(sllist), INTENT(inout) :: list
    IF (ASSOCIATED(list%child)) THEN
       CALL free_sll(list%child)
       DEALLOCATE(list%child)
    END IF
    list%child => NULL()
    IF (ALLOCATED(list%key)) DEALLOCATE(list%key)
    IF (ALLOCATED(list%val)) DEALLOCATE(list%val)
  END SUBROUTINE free_sll

The above procedures are bound to the singly linked list objects. The first dummy argument is the object it is bound to, and is passed automatically whenever the procedure is invoked via the object. Since this is a singly linked list we can only traverse the list in one direction, and we can define the methods associated with the list relatively tersely using direct recursion.

  SUBROUTINE init_hash_tbl_sll(tbl,tbl_len)
    CLASS(hash_tbl_sll),   INTENT(inout) :: tbl
    INTEGER,     OPTIONAL, INTENT(in)    :: tbl_len

    IF (ALLOCATED(tbl%vec)) DEALLOCATE(tbl%vec)
    IF (PRESENT(tbl_len)) THEN
       ALLOCATE(tbl%vec(0:tbl_len-1))
       tbl%vec_len = tbl_len
    ELSE
       ALLOCATE(tbl%vec(0:tbl_size-1))
       tbl%vec_len = tbl_size
    END IF
    tbl%is_init = .TRUE.
  END SUBROUTINE init_hash_tbl_sll

  ! The first part of the hashing procedure using the string
  ! collating sequence
  ELEMENTAL FUNCTION sum_string(str) RESULT(sig)
    CHARACTER(len=*), INTENT(in)   :: str
    INTEGER                        :: sig
    CHARACTER, DIMENSION(LEN(str)) :: tmp
    INTEGER :: i

    FORALL (i=1:LEN(str))
       tmp(i) = str(i:i)
    END FORALL
    sig = SUM(ICHAR(tmp))
  END FUNCTION sum_string


  SUBROUTINE put_hash_tbl_sll(tbl,key,val)
    CLASS(hash_tbl_sll), INTENT(inout) :: tbl
    CHARACTER(len=*),    INTENT(in)    :: key, val
    INTEGER                            :: hash

    hash = MOD(sum_string(key),tbl%vec_len)
    CALL tbl%vec(hash)%put(key=key,val=val)
  END SUBROUTINE put_hash_tbl_sll


  SUBROUTINE get_hash_tbl_sll(tbl,key,val)
    CLASS(hash_tbl_sll),           INTENT(in)    :: tbl
    CHARACTER(len=*),              INTENT(in)    :: key
    CHARACTER(len=:), ALLOCATABLE, INTENT(out)   :: val
    INTEGER                                      :: hash

    hash = MOD(sum_string(key),tbl%vec_len)
    CALL tbl%vec(hash)%get(key=key,val=val)
  END SUBROUTINE get_hash_tbl_sll


  SUBROUTINE free_hash_tbl_sll(tbl)
    CLASS(hash_tbl_sll), INTENT(inout) :: tbl    
    INTEGER     :: i, low, high

    low  = LBOUND(tbl%vec,dim=1)
    high = UBOUND(tbl%vec,dim=1) 
    IF (ALLOCATED(tbl%vec)) THEN
       DO i=low,high
          CALL tbl%vec(i)%free()
       END DO
       DEALLOCATE(tbl%vec)
    END IF
    tbl%is_init = .FALSE.
  END SUBROUTINE free_hash_tbl_sll

END MODULE hashtbl

The procedures defined above operate on the hash table object. Because of the high degree of encapsulation and abstraction implementing these procedures is relatively easy; we can use the methods defined for the singly linked list.

Test Program

This test program exercises the library defined above, and tests it for correctness. This is not very thorough, but it is a decent start.

! Test program for module hashtbl implementing an OO hash table (dictionary) 
! in Fortran 2003. Compiles and runs with accompanying test program under 
! the Intel Fortran Compiler, version 11.1.046

! Copyright (c) Izaak Beekman 2010

    ! This program is free software: you can redistribute it and/or modify
    ! it under the terms of the GNU Lesser General Public License as published by
    ! the Free Software Foundation, either version 3 of the License, or
    ! (at your option) any later version.

    ! This program is distributed in the hope that it will be useful,
    ! but WITHOUT ANY WARRANTY; without even the implied warranty of
    ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    ! GNU Lesser General Public License for more details.

    ! You should have received a copy of the GNU Lesser General Public License
    ! along with this program.  If not, see <http://www.gnu.org/licenses/>.

PROGRAM test_hashtbl
  USE hashtbl
  IMPLICIT NONE
  TYPE(hash_tbl_sll)            :: table
  CHARACTER(len=:), ALLOCATABLE :: out
  INTEGER, parameter :: tbl_length = 100
  INTEGER            :: sum, i, rand_int1, rand_int2 ! 4 byte integer, hopefully
  REAL               :: rand
  CHARACTER(len=4)   :: rand_str1, rand_str2 ! each char should be 1 byte

  PRINT*, ' '
  PRINT*, 'This program is free software: you can redistribute it and/or &
       &modify it under the terms of the GNU Lesser General Public License&
       & as published by the Free Software Foundation, either version 3 of&
       & the License, or (at your option) any later version.'
  PRINT*, ' '
  PRINT*, 'This program is distributed in the hope that it will be useful,&
       & but WITHOUT ANY WARRANTY; without even the implied warranty of&
       & MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the &
       &GNU Lesser General Public License for more details.'
  PRINT*, ' '
  PRINT*, 'You should have received a copy of the GNU Lesser General Public &
       &License along with this program.  If not, see &
       &<http://www.gnu.org/licenses/>.'
  PRINT*, ' '

  CALL table%init(tbl_length)
  CALL table%put(key='first_name', val='John')
  PRINT*, 'Hash: ', MOD(sum_string('first_name'),tbl_length)
  CALL table%put(key='last_name', val='Smith')
  PRINT*, 'Hash: ', MOD(sum_string('last_name'),tbl_length)
  CALL table%put(key='birthday', val='July 30, 1964')
  PRINT*, 'Hash: ', MOD(sum_string('birthday'),tbl_length)
  CALL table%put(key='hair_color', val='brown')
  PRINT*, 'Hash: ', MOD(sum_string('hair_color'),tbl_length)
  CALL table%put(key='eye_color', val='brown')
  PRINT*, 'Hash: ', MOD(sum_string('eye_color'),tbl_length)
  CALL table%put(key='weight', val='213 lbs')
  PRINT*, 'Hash: ', MOD(sum_string('weight'),tbl_length)
  CALL table%put(key='height', val='6''3"')
  PRINT*, 'Hash: ', MOD(sum_string('height'),tbl_length)

  PRINT*, ' ' 

  CALL table%get(key='first_name',val=out)
  PRINT*, out
  CALL table%get('last_name',out)
  PRINT*, out
  CALL table%get('birthday',out)
  PRINT*, out
  CALL table%get('hair_color',out)
  PRINT*, out
  CALL table%get('eye_color',out)
  PRINT*, out
  CALL table%get('weight',out)
  PRINT*, out
  CALL table%get('height',out)
  PRINT*, out

  ! INCLUDE 'stress_test.f90'

  PRINT*, ' '

  sum = 0
  PRINT*, 'Indices of the hash table with content:'
  DO i = LBOUND(table%vec,dim=1), UBOUND(table%vec,dim=1)
     IF (ALLOCATED(table%vec(i)%key)) THEN 
        PRINT*, i
        sum = sum + 1
     END IF
  END DO
  PRINT*, 'Total used elements:', sum
  CALL table%free
  PRINT*, ' '
  STOP 0
END PROGRAM test_hashtbl

The stress test from the INCLUDE statement above, stress_test.f90, can be seen below. It should be noted that these tests fail. We can check that the reason of this failure comes from the RANDOM_NUMBER(.) function, which repeats a value of rand_int1 coupled with two different values of rand_int2. This fact can be seen by printing the list of nodes of table%vec(idx) when out/=rand_str2, where idx is a hash index of rand_str1.

  ! Included file for test_hashtbl.f90
  PRINT*, ' '

  sum = 0
  PRINT*, 'Mild stress test.'
  DO i = 1,2000 ! 4byte integers default on most systems
     CALL RANDOM_NUMBER(rand)
     rand_int1 = NINT(rand*1000)
     rand_str1 = TRANSFER(rand_int1,rand_str1)
     CALL RANDOM_NUMBER(rand)
     rand_int2 = NINT(rand*1000)
     rand_str2 = TRANSFER(rand_int2,rand_str2)
     CALL table%put(key=rand_str1, val=rand_str2)
     CALL table%get(key=rand_str1,val=out)
     IF (TRANSFER(out,rand_int2) /= rand_int2) THEN
        PRINT*, 'Error, i=',i,' key=',rand_int1,' Val=',rand_int2,' &
             &Out=',TRANSFER(out,rand_int2)
        sum = sum + 1
     END IF     
  END DO
  PRINT*, 'Number of errors:',sum

A call graph with profiling information can be generated. This can be seen below, but the profiling information is useless because the test program is not long enough (i.e. it doesn’t call the various subroutines as many times as is needed for meaningful profiling information to be produced).

Test program and library call graph

This image can be made using the build instructions in the following section, using gprof and gprof2dot.py.

Build Instructions

The following is a GNU Make makefile for building the library, building the test program, profiling the library and test program, building the call graph, and building the hash table data structure schematic.

# Make file for hash table (dictionary) example.  Makes a call graph with profiling information and
# a schematic of the data structure.

# Copyright (c) 2010 Izaak Beekman

    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU Lesser General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.

    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU Lesser General Public License for more details.

    # You should have received a copy of the GNU Lesser General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.


# Compilers and flags
FC = ifort
FCFLAGS = -g -traceback -fno-omit-frame-pointer -p -fno-inline -fno-inline-functions #-ipo -O3
LDFLAGS = $(FCFLAGS) #-fast

FCSYNCHK = -syntax-only -warn

# Build rules
COMPILE.f90 = $(FC) $(FCFLAGS) -c $<

LINK.f90    = $(FC) $(LDFLAGS) -o $@ $^


# Portability macros
RM = rm
DOT = dot
PROFILER = gprof
GPROF2DOT = gprof2dot.py -n0 -e0 #Can download from: http://code.google.com/p/jrfonseca/wiki/Gprof2Dot

# Compilation pattern rules, may need to override builtins
# For GNU Make
MAKE = $(MAKE) -R
# to override

%.o: %.f90
	$(COMPILE.f90)

%: %.o
	$(LINK.f90)

%.mod: %.f90 # Just lowercase module names
	$(FC) $(FCSYNCHK) $<


# Patern rule for making PNGs from graphviz dot files
%.png: %.dot
	$(DOT) -Tpng < $< > $@

%.dot: %
	./$<  $(ARGS)
	$(PROFILER) ./$< | $(GPROF2DOT) > $@


# Default target, call graph of test_hashtbl
test_hashtbl.png:
.PRECIOUS: test_hashtbl.dot

.PRECIOUS:test_hashtbl
test_hashtbl: modhashtbl.o

call_graph.dot: test_hashtbl


.PHONY: test_hashtbl.mod
test_hashtbl.mod: modhashtbl.mod

# Build the image of our dictionary/hash-table data type
data_struct.png: 88x31-CC-by-sa.png


.PHONY: clean
clean:
	-$(RM) -f *.o *.mod

Comments

In the stress test case, there are two possible problems:

  1. Keys are generated by RANDOM_NUMBER(), so same keys could be put into the hash table. Repetivie keys are not well-treated in this module. In the subroutine put_sll(list, key, val), there is not code to treat the value with the same key appeared before.

  2. Another risk is the use of transfer function. Although the problem may not appear in this test case, it is important to carefully choose the size of the transferred type. Also, my personal experience suggests that it might be better to use transfer(key, your original key type) of the module in certian if statements. For example, in the module, we have:

    IF (list%key /= key) THEN
     IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
     CALL put_sll(list%child,key,val)
    END IF

    If your original key is an integer, it might be better to use following code:

    IF (transfer(list%key, integer) /= transfer(key, integer)) THEN
     IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
     CALL put_sll(list%child,key,val)
    END IF

  1. Finalization should also be added to these objects, but I am unsure of compiler support for this feature as of June 5th, 2010.