rdf_litindex.pl -- Search literals
This module finds literals of the RDF database based on words, stemming and sounds like (metaphone). The normal user-level predicate is
- rdf_set_literal_index_option(+Options:list)
- Set options for the literal package. Currently defined options
- verbose(Bool)
- If
true
, print progress messages while building the index tables. - index_threads(+Count)
- Number of threads to use for initial indexing of literals
- index(+How)
- How to deal with indexing new literals. How is one of
self
(execute in the same thread),thread(N)
(execute in N concurrent threads) ordefault
(depends on number of cores). - stopgap_threshold(+Count)
- Add a token to the dynamic stopgap set if it appears in more than Count literals. The default is 50,000.
- rdf_find_literal(+Spec, -Literal) is nondet
- rdf_find_literals(+Spec, -Literals) is det
- Find literals in the RDF database matching Spec. Spec is defined
as:
Spec ::= and(Spec,Spec) Spec ::= or(Spec,Spec) Spec ::= not(Spec) Spec ::= sounds(Like) Spec ::= stem(Like) % same as stem(Like, en) Spec ::= stem(Like, Lang) Spec ::= prefix(Prefix) Spec ::= between(Low, High) % Numerical between Spec ::= ge(High) % Numerical greater-equal Spec ::= le(Low) % Numerical less-equal Spec ::= Token
sounds(Like)
andstem(Like)
both map to a disjunction. First we compile the spec to normal form: a disjunction of conjunctions on elementary tokens. Then we execute all the conjunctions and generate the union using ordered-set algorithms.Stopgaps are ignored. If the final result is only a stopgap, the predicate fails.
- rdf_token_expansions(+Spec, -Extensions)
- Determine which extensions of a token contribute to finding literals.
- rdf_delete_literal_index(+Type)
- Fully delete a literal index
- rdf_tokenize_literal(+Literal, -Tokens) is semidet
- Tokenize a literal. We make this hookable as tokenization is generally domain dependent.
- rdf_stopgap_token(-Token) is nondet
- True when Token is a stopgap token. Currently, this implies one
of:
exclude_from_index(token, Token)
is truedefault_stopgap(Token)
is true- Token is an atom of length 1
- Token was added to the dynamic stopgap token set because it appeared in more than stopgap_threshold literals.
- rdf_literal_index(+Type, -Index) is det
- True when Index is a literal map containing the index of Type.
Type is one of:
- token
- Tokens are basically words of literal values. See
rdf_tokenize_literal/2. The
token
map maps tokens to full literal texts. - stem
- Index of stemmed tokens. If the language is available, the
tokens are stemmed using the matching snowball stemmer.
The
stem
map maps stemmed to full tokens. - metaphone
- Phonetic index of tokens. The
metaphone
map maps phonetic keys to tokens.