LibDocument

image-alt

Document and Trie Processing.


Version: 0.1.1

Functions and data structures for CRUD operations on tries (prefix trees). More…

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>
#include <unistd.h>

Go to the source code of this file.

Classes

struct  ldoc_trie_anno_t
 Trie node annotation. More...
 
union  ldoc_trie_char_t
 Character pointer types. More...
 
struct  ldoc_trie_nde_t
 Trie node. More...
 
struct  ldoc_trie_nde_arr_t
 Trie-node array. More...
 
struct  ldoc_trie_t
 A trie (prefix tree). More...
 

Macros

#define LDOC_TRIE_NDS_ARR_INIT   64
 Initial size of a trie-node array. More…
 
#define LDOC_TRIE_NDS_ARR_INC   16
 Allocation increment for trie-node arrays. More…
 
#define LDOC_TRIE_RES_CHR   '^'
 Reserved character that cannot appear in a trie. More…
 

Typedefs

typedef struct ldoc_trie_anno_t ldoc_trie_anno_t
 Trie node annotation. More…
 
typedef struct ldoc_trie_nde_t ldoc_trie_nde_t
 Trie node. More…
 
typedef struct ldoc_trie_nde_arr_t ldoc_trie_nde_arr_t
 Trie-node array. More…
 
typedef struct ldoc_trie_t ldoc_trie_t
 A trie (prefix tree). More…
 

Enumerations

ldoc_trie_ptr_t {
  EN_ALPH, EN_ALPHNUM, ASCII, UTF16,
  UTF32
}
 Type of pointer used to access descendant trie nodes. More...
 
ldoc_trie_alloc_t { NDE_ROOT, NDE_EMPTY, NDE_ANNO }
 Trie-node type. More...
 

Functions

ldoc_trie_tldoc_trie_new ()
 Creates a new trie object (including trie root node). More…
 
void ldoc_trie_free (ldoc_trie_t *trie)
 Frees the memory of a trie object. More…
 
ldoc_trie_nde_arr_tldoc_trie_nde_arr_new ()
 Creates a new trie-node array. More…
 
void ldoc_trie_nde_arr_free (ldoc_trie_nde_arr_t *arr)
 Frees a trie-node array. More…
 
void ldoc_trie_nde_arr_appnd (ldoc_trie_nde_arr_t *arr, ldoc_trie_nde_t *nde)
 Appends a node at the end of a trie-node array. More…
 
void ldoc_trie_add (ldoc_trie_t *trie, const char *str, ldoc_trie_ptr_t tpe, ldoc_trie_anno_t anno)
 Adds a string to a trie. More…
 
ldoc_trie_nde_tldoc_trie_remove (ldoc_trie_t *trie, const char *str)
 Remove a string (prefix) from a trie. More…
 
ldoc_trie_nde_tldoc_trie_lookup (ldoc_trie_t *trie, const char *string, bool prefixes)
 Lookup a string in a trie. More…
 
char * ldoc_trie_collect (ldoc_trie_t *trie, const char *sep)
 Create a list of all strings in a trie. More…
 
void ldoc_trie_dmp (ldoc_trie_t *trie)
 Produce debugging output of a trie on STDOUT. More…
 

Variables

ldoc_trie_anno_t LDOC_TRIE_NDE_NULL
 Null pointer for trie nodes. More…
 
ldoc_trie_anno_t LDOC_TRIE_ANNO_NULL
 Null pointer for trie annotations. More…
 

Detailed Description

Functions and data structures for CRUD operations on tries (prefix trees).

Macro Definition Documentation

#define LDOC_TRIE_NDS_ARR_INC   16

Allocation increment for trie-node arrays.

Used to increase the memory in search results.

#define LDOC_TRIE_NDS_ARR_INIT   64

Initial size of a trie-node array.

Used to allocate memory for search results.

#define LDOC_TRIE_RES_CHR   '^'

Reserved character that cannot appear in a trie.

Typedef Documentation

Trie node annotation.

Trie-node array.

A trie-node array is filled up to the index denoted by wptr and its allocated memory is sufficient to hold up to max entries. Indexes from wptr to max are unassigned and should not be accessed.

Trie node.

typedef struct ldoc_trie_t ldoc_trie_t

A trie (prefix tree).

Enumeration Type Documentation

Trie-node type.

A trie has a single unique root (NDE_ROOT) and its descendants are either empty nodes (NDE_EMPTY) or annotated nodes (NDE_ANNO). Empty nodes can occur on a path from the root to an annotated node; they basically denote that a traversed prefix is not a prefix that was added to a trie. Annotated nodes denote prefixes that have been explicitly added to a trie.

Enumerator

NDE_ROOT

Root node.

NDE_EMPTY

An empty node which is on a path to an annotated node.

NDE_ANNO

An annotated node.

Type of pointer used to access descendant trie nodes.

A trie can be composed of nodes with multiple kinds of pointer types. Mixing of trie nodes with different pointer types can be useful when optimizing for lookup or insertion performance and/or memory usage.

Enumerator

EN_ALPH

Characters of the English alphabet ("a" to "z", lower case; " ", space).

EN_ALPHNUM

Characters of the English alphabet ("a" to "z", lower case; " ", space), followed by numerical characters ("0" to "9").

ASCII

ASCII characters, except LDOC_TRIE_RES_CHR.

UTF16

UTF16, except LDOC_TRIE_RES_CHR.

Not implemented yet!

UTF32

UTF32, except LDOC_TRIE_RES_CHR.

Not implemented yet!

Function Documentation

void ldoc_trie_add (ldoc_trie_t *trie, const char *str, ldoc_trie_ptr_t tpe, ldoc_trie_anno_t anno)

Adds a string to a trie.

Parameters and Return Value
trie
Trie object to which the string str is being added.
str
String that is being added to the trie object trie.
tpe
If nodes need to be allocated to store str in trie, then they will be of type tpe.
anno
Annotation that is being stored along with the string str.

char* ldoc_trie_collect (ldoc_trie_t *trie, const char *sep)

Create a list of all strings in a trie.

Parameters and Return Value
trie
Trie whose strings should be listed.
sep
Separator to use between strings in the list.
Returns
List of all strings in trie separated by sep.

void ldoc_trie_dmp (ldoc_trie_t *trie)

Produce debugging output of a trie on STDOUT.

Parameters and Return Value
trie
Trie that should be output on STDOUT.

void ldoc_trie_free (ldoc_trie_t *trie)

Frees the memory of a trie object.

Releases the memory of the trie object, including all nodes, but it does not release the memory of annotations.

Parameters and Return Value
trie
Trie whose memory is being released.

ldoc_trie_nde_t* ldoc_trie_lookup (ldoc_trie_t *trie, const char *string, bool prefixes)

Lookup a string in a trie.

Parameters and Return Value
trie
Trie object to search for the presence of string.
string
String to search for in trie trie.
prefixes
When true, then return the node in trie at which the search for string ended. This indicates whether string is a prefix of another string in trie, but string itself was not added to trie.
Returns
Trie node that matches string (see also prefixes description), or LDOC_TRIE_NDE_NULL otherwise.

void ldoc_trie_nde_arr_appnd (ldoc_trie_nde_arr_t *arr, ldoc_trie_nde_t *nde)

Appends a node at the end of a trie-node array.

If there is insufficient space available in the array, then the array's size will be increased by LDOC_TRIE_NDS_ARR_INC.

Parameters and Return Value
arr
Array to which the node nde is appended.
nde
Node that is being appended to array arr.

void ldoc_trie_nde_arr_free (ldoc_trie_nde_arr_t *arr)

Frees a trie-node array.

Note: This will node free the memory allocated for the individual nodes in the array.

Parameters and Return Value
arr
Trie-node array whose memory is being released.

ldoc_trie_nde_arr_t* ldoc_trie_nde_arr_new ()

Creates a new trie-node array.

The initial size of the array will be LDOC_TRIE_NDS_ARR_INIT.

Parameters and Return Value
Returns
New trie-node array.

ldoc_trie_t* ldoc_trie_new ()

Creates a new trie object (including trie root node).

Parameters and Return Value
Returns
A new trie object.

ldoc_trie_nde_t* ldoc_trie_remove (ldoc_trie_t *trie, const char *str)

Remove a string (prefix) from a trie.

Parameters and Return Value
trie
Trie from which the string str is to be removed.
str
String that is being removed from trie.
Returns
Trie node that was removed, or LDOC_TRIE_NDE_NULL if str was not in trie.

Variable Documentation

ldoc_trie_anno_t LDOC_TRIE_ANNO_NULL

Null pointer for trie annotations.

ldoc_trie_anno_t LDOC_TRIE_NDE_NULL

Null pointer for trie nodes.