A module which is an example implementation of object oriented hash tables in Fortran 2003. The module is licensed under the LGPL. This module could be extended to be generic by use of the transfer function, or specialization through type extension. The methods are attached to the objects via type-bound procedures. Additional PRIVATE
statements should be inserted or uncommented for the production code; the raw objects and methods have been left exposed here for testing purposes. This library and the corresponding test program compile with the Intel Fortran compiler, version 11.1.046. The program and library seem to be correct, but they have not been thoroughly tested.
This library was fashioned after the description of hash table data structures in The Practice of Programming.
! Module implementing an OO hash table (dictionary) in Fortran 2003.
! Compiles and runs with accompanying test program under the Intel
! Fortran Compiler, version 11.1.046
! Copyright (c) Izaak Beekman 2010
! This program is free software: you can redistribute it and/or modify
! it under the terms of the GNU Lesser General Public License as published by
! the Free Software Foundation, either version 3 of the License, or
! (at your option) any later version.
! This program is distributed in the hope that it will be useful,
! but WITHOUT ANY WARRANTY; without even the implied warranty of
! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
! GNU Lesser General Public License for more details.
! You should have received a copy of the GNU Lesser General Public License
! along with this program. If not, see <http://www.gnu.org/licenses/>.
MODULE hashtbl
IMPLICIT NONE ! Use strong typing
INTEGER, PARAMETER :: tbl_size = 50
TYPE sllist
TYPE(sllist), POINTER :: child => NULL()
CHARACTER(len=:), ALLOCATABLE :: key, val
CONTAINS
PROCEDURE :: put => put_sll
PROCEDURE :: get => get_sll
PROCEDURE :: free => free_sll
END TYPE sllist
TYPE hash_tbl_sll
TYPE(sllist), DIMENSION(:), ALLOCATABLE :: vec
INTEGER :: vec_len = 0
LOGICAL :: is_init = .FALSE.
CONTAINS
PROCEDURE :: init => init_hash_tbl_sll
PROCEDURE :: put => put_hash_tbl_sll
PROCEDURE :: get => get_hash_tbl_sll
PROCEDURE :: free => free_hash_tbl_sll
END TYPE hash_tbl_sll
PUBLIC :: hash_tbl_sll
Here we see the two derived types the hash table is built from, and the type-bound procedures (methods) each of these data objects has1.
The first object, sllist
is a singly linked list, with allocatable scalar CHARACTER
type key and value components. (The string lengths may be allocated at runtime.) This type has 3 methods: 1. put
to put (and create if necessary) the key-value pair into the appropriate list element. 1. get
to retrieve the string (value) corresponding to the key, if the key-value pair exist. 1. free
to completely destroy the linked list.
The second object is the hash table. It contains elements to store some meta-data like the vector length (important for hashing the keys) and the state of the object, as well as the array of singly linked lists, vec
. The methods associated with this object are: 1. init
to initialize the object by allocating vec
to be a certain length, and keeping track of the object state and meta-data. 1. put
to hash the key
and store the value in the proper element of the appropriate linked list, creating it if necessary. 1. get
to hash the key
and retrieve the associated value, if the key-value pair exists. 1. free
to destroy the hash table, freeing up memory and keeping track of meta-data and object state.
Below is a schematic of an instantiated hash table object, as implemented in this library:
The rest of the module, including all the procedures follows below.
CONTAINS
RECURSIVE SUBROUTINE put_sll(list,key,val)
CLASS(sllist), INTENT(inout) :: list
CHARACTER(len=*), INTENT(in) :: key, val
INTEGER :: keylen, vallen
keylen = LEN(key)
vallen = LEN(val)
IF (ALLOCATED(list%key)) THEN
IF (list%key /= key) THEN
IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
CALL put_sll(list%child,key,val)
END IF
ELSE
IF (.NOT. ALLOCATED(list%key)) &
ALLOCATE(CHARACTER(len=keylen) :: list%key)
list%key = key
IF (ALLOCATED(list%val)) DEALLOCATE(list%val)
ALLOCATE(CHARACTER(len=vallen) :: list%val)
list%val = val
END IF
END SUBROUTINE put_sll
RECURSIVE SUBROUTINE get_sll(list,key,val)
CLASS(sllist), INTENT(in) :: list
CHARACTER(len=*), INTENT(in) :: key
CHARACTER(len=:), ALLOCATABLE, INTENT(out) :: val
INTEGER :: vallen
vallen = 0
IF (ALLOCATED(list%key) .AND. (list%key == key)) THEN
vallen = LEN(list%val)
IF (ALLOCATED(val)) DEALLOCATE(val)
ALLOCATE(CHARACTER(len=vallen) :: val)
val = list%val
ELSE IF(ASSOCIATED(list%child)) THEN ! keep going
CALL get_sll(list%child,key,val)
ELSE ! At the end of the list, no key found
IF (ALLOCATED(val)) DEALLOCATE(val) ! Exit indication
RETURN
END IF
END SUBROUTINE get_sll
RECURSIVE SUBROUTINE free_sll(list)
CLASS(sllist), INTENT(inout) :: list
IF (ASSOCIATED(list%child)) THEN
CALL free_sll(list%child)
DEALLOCATE(list%child)
END IF
list%child => NULL()
IF (ALLOCATED(list%key)) DEALLOCATE(list%key)
IF (ALLOCATED(list%val)) DEALLOCATE(list%val)
END SUBROUTINE free_sll
The above procedures are bound to the singly linked list objects. The first dummy argument is the object it is bound to, and is passed automatically whenever the procedure is invoked via the object. Since this is a singly linked list we can only traverse the list in one direction, and we can define the methods associated with the list relatively tersely using direct recursion.
SUBROUTINE init_hash_tbl_sll(tbl,tbl_len)
CLASS(hash_tbl_sll), INTENT(inout) :: tbl
INTEGER, OPTIONAL, INTENT(in) :: tbl_len
IF (ALLOCATED(tbl%vec)) DEALLOCATE(tbl%vec)
IF (PRESENT(tbl_len)) THEN
ALLOCATE(tbl%vec(0:tbl_len-1))
tbl%vec_len = tbl_len
ELSE
ALLOCATE(tbl%vec(0:tbl_size-1))
tbl%vec_len = tbl_size
END IF
tbl%is_init = .TRUE.
END SUBROUTINE init_hash_tbl_sll
! The first part of the hashing procedure using the string
! collating sequence
ELEMENTAL FUNCTION sum_string(str) RESULT(sig)
CHARACTER(len=*), INTENT(in) :: str
INTEGER :: sig
CHARACTER, DIMENSION(LEN(str)) :: tmp
INTEGER :: i
FORALL (i=1:LEN(str))
tmp(i) = str(i:i)
END FORALL
sig = SUM(ICHAR(tmp))
END FUNCTION sum_string
SUBROUTINE put_hash_tbl_sll(tbl,key,val)
CLASS(hash_tbl_sll), INTENT(inout) :: tbl
CHARACTER(len=*), INTENT(in) :: key, val
INTEGER :: hash
hash = MOD(sum_string(key),tbl%vec_len)
CALL tbl%vec(hash)%put(key=key,val=val)
END SUBROUTINE put_hash_tbl_sll
SUBROUTINE get_hash_tbl_sll(tbl,key,val)
CLASS(hash_tbl_sll), INTENT(in) :: tbl
CHARACTER(len=*), INTENT(in) :: key
CHARACTER(len=:), ALLOCATABLE, INTENT(out) :: val
INTEGER :: hash
hash = MOD(sum_string(key),tbl%vec_len)
CALL tbl%vec(hash)%get(key=key,val=val)
END SUBROUTINE get_hash_tbl_sll
SUBROUTINE free_hash_tbl_sll(tbl)
CLASS(hash_tbl_sll), INTENT(inout) :: tbl
INTEGER :: i, low, high
low = LBOUND(tbl%vec,dim=1)
high = UBOUND(tbl%vec,dim=1)
IF (ALLOCATED(tbl%vec)) THEN
DO i=low,high
CALL tbl%vec(i)%free()
END DO
DEALLOCATE(tbl%vec)
END IF
tbl%is_init = .FALSE.
END SUBROUTINE free_hash_tbl_sll
END MODULE hashtbl
The procedures defined above operate on the hash table object. Because of the high degree of encapsulation and abstraction implementing these procedures is relatively easy; we can use the methods defined for the singly linked list.
This test program exercises the library defined above, and tests it for correctness. This is not very thorough, but it is a decent start.
! Test program for module hashtbl implementing an OO hash table (dictionary)
! in Fortran 2003. Compiles and runs with accompanying test program under
! the Intel Fortran Compiler, version 11.1.046
! Copyright (c) Izaak Beekman 2010
! This program is free software: you can redistribute it and/or modify
! it under the terms of the GNU Lesser General Public License as published by
! the Free Software Foundation, either version 3 of the License, or
! (at your option) any later version.
! This program is distributed in the hope that it will be useful,
! but WITHOUT ANY WARRANTY; without even the implied warranty of
! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
! GNU Lesser General Public License for more details.
! You should have received a copy of the GNU Lesser General Public License
! along with this program. If not, see <http://www.gnu.org/licenses/>.
PROGRAM test_hashtbl
USE hashtbl
IMPLICIT NONE
TYPE(hash_tbl_sll) :: table
CHARACTER(len=:), ALLOCATABLE :: out
INTEGER, parameter :: tbl_length = 100
INTEGER :: sum, i, rand_int1, rand_int2 ! 4 byte integer, hopefully
REAL :: rand
CHARACTER(len=4) :: rand_str1, rand_str2 ! each char should be 1 byte
PRINT*, ' '
PRINT*, 'This program is free software: you can redistribute it and/or &
&modify it under the terms of the GNU Lesser General Public License&
& as published by the Free Software Foundation, either version 3 of&
& the License, or (at your option) any later version.'
PRINT*, ' '
PRINT*, 'This program is distributed in the hope that it will be useful,&
& but WITHOUT ANY WARRANTY; without even the implied warranty of&
& MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the &
&GNU Lesser General Public License for more details.'
PRINT*, ' '
PRINT*, 'You should have received a copy of the GNU Lesser General Public &
&License along with this program. If not, see &
&<http://www.gnu.org/licenses/>.'
PRINT*, ' '
CALL table%init(tbl_length)
CALL table%put(key='first_name', val='John')
PRINT*, 'Hash: ', MOD(sum_string('first_name'),tbl_length)
CALL table%put(key='last_name', val='Smith')
PRINT*, 'Hash: ', MOD(sum_string('last_name'),tbl_length)
CALL table%put(key='birthday', val='July 30, 1964')
PRINT*, 'Hash: ', MOD(sum_string('birthday'),tbl_length)
CALL table%put(key='hair_color', val='brown')
PRINT*, 'Hash: ', MOD(sum_string('hair_color'),tbl_length)
CALL table%put(key='eye_color', val='brown')
PRINT*, 'Hash: ', MOD(sum_string('eye_color'),tbl_length)
CALL table%put(key='weight', val='213 lbs')
PRINT*, 'Hash: ', MOD(sum_string('weight'),tbl_length)
CALL table%put(key='height', val='6''3"')
PRINT*, 'Hash: ', MOD(sum_string('height'),tbl_length)
PRINT*, ' '
CALL table%get(key='first_name',val=out)
PRINT*, out
CALL table%get('last_name',out)
PRINT*, out
CALL table%get('birthday',out)
PRINT*, out
CALL table%get('hair_color',out)
PRINT*, out
CALL table%get('eye_color',out)
PRINT*, out
CALL table%get('weight',out)
PRINT*, out
CALL table%get('height',out)
PRINT*, out
! INCLUDE 'stress_test.f90'
PRINT*, ' '
sum = 0
PRINT*, 'Indices of the hash table with content:'
DO i = LBOUND(table%vec,dim=1), UBOUND(table%vec,dim=1)
IF (ALLOCATED(table%vec(i)%key)) THEN
PRINT*, i
sum = sum + 1
END IF
END DO
PRINT*, 'Total used elements:', sum
CALL table%free
PRINT*, ' '
STOP 0
END PROGRAM test_hashtbl
The stress test from the INCLUDE
statement above, stress_test.f90, can be seen below. It should be noted that these tests fail. We can check that the reason of this failure comes from the RANDOM_NUMBER(.)
function, which repeats a value of rand_int1
coupled with two different values of rand_int2
. This fact can be seen by printing the list of nodes of table%vec(idx)
when out/=rand_str2
, where idx
is a hash index of rand_str1
.
! Included file for test_hashtbl.f90
PRINT*, ' '
sum = 0
PRINT*, 'Mild stress test.'
DO i = 1,2000 ! 4byte integers default on most systems
CALL RANDOM_NUMBER(rand)
rand_int1 = NINT(rand*1000)
rand_str1 = TRANSFER(rand_int1,rand_str1)
CALL RANDOM_NUMBER(rand)
rand_int2 = NINT(rand*1000)
rand_str2 = TRANSFER(rand_int2,rand_str2)
CALL table%put(key=rand_str1, val=rand_str2)
CALL table%get(key=rand_str1,val=out)
IF (TRANSFER(out,rand_int2) /= rand_int2) THEN
PRINT*, 'Error, i=',i,' key=',rand_int1,' Val=',rand_int2,' &
&Out=',TRANSFER(out,rand_int2)
sum = sum + 1
END IF
END DO
PRINT*, 'Number of errors:',sum
A call graph with profiling information can be generated. This can be seen below, but the profiling information is useless because the test program is not long enough (i.e. it doesn’t call the various subroutines as many times as is needed for meaningful profiling information to be produced).
This image can be made using the build instructions in the following section, using gprof and gprof2dot.py.
The following is a GNU Make makefile for building the library, building the test program, profiling the library and test program, building the call graph, and building the hash table data structure schematic.
# Make file for hash table (dictionary) example. Makes a call graph with profiling information and
# a schematic of the data structure.
# Copyright (c) 2010 Izaak Beekman
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
# You should have received a copy of the GNU Lesser General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# Compilers and flags
FC = ifort
FCFLAGS = -g -traceback -fno-omit-frame-pointer -p -fno-inline -fno-inline-functions #-ipo -O3
LDFLAGS = $(FCFLAGS) #-fast
FCSYNCHK = -syntax-only -warn
# Build rules
COMPILE.f90 = $(FC) $(FCFLAGS) -c $<
LINK.f90 = $(FC) $(LDFLAGS) -o $@ $^
# Portability macros
RM = rm
DOT = dot
PROFILER = gprof
GPROF2DOT = gprof2dot.py -n0 -e0 #Can download from: http://code.google.com/p/jrfonseca/wiki/Gprof2Dot
# Compilation pattern rules, may need to override builtins
# For GNU Make
MAKE = $(MAKE) -R
# to override
%.o: %.f90
$(COMPILE.f90)
%: %.o
$(LINK.f90)
%.mod: %.f90 # Just lowercase module names
$(FC) $(FCSYNCHK) $<
# Patern rule for making PNGs from graphviz dot files
%.png: %.dot
$(DOT) -Tpng < $< > $@
%.dot: %
./$< $(ARGS)
$(PROFILER) ./$< | $(GPROF2DOT) > $@
# Default target, call graph of test_hashtbl
test_hashtbl.png:
.PRECIOUS: test_hashtbl.dot
.PRECIOUS:test_hashtbl
test_hashtbl: modhashtbl.o
call_graph.dot: test_hashtbl
.PHONY: test_hashtbl.mod
test_hashtbl.mod: modhashtbl.mod
# Build the image of our dictionary/hash-table data type
data_struct.png: 88x31-CC-by-sa.png
.PHONY: clean
clean:
-$(RM) -f *.o *.mod
In the stress test case, there are two possible problems:
Keys are generated by RANDOM_NUMBER(), so same keys could be put
into the hash table. Repetivie keys are not well-treated in this module. In the subroutine put_sll(list, key, val)
, there is not code to treat the value with the same key appeared before.
Another risk is the use of transfer function. Although the problem may not appear in this test case, it is important to carefully choose the size of the transferred type. Also, my personal experience suggests that it might be better to use transfer(key, your original key type) of the module in certian if
statements. For example, in the module, we have:
IF (list%key /= key) THEN
IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
CALL put_sll(list%child,key,val)
END IF
If your original key is an integer, it might be better to use following code:
IF (transfer(list%key, integer) /= transfer(key, integer)) THEN
IF ( .NOT. ASSOCIATED(list%child) ) ALLOCATE(list%child)
CALL put_sll(list%child,key,val)
END IF
Finalization should also be added to these objects, but I am unsure of compiler support for this feature as of June 5th, 2010. ↩