http_open.pl -- HTTP client library
This library defines http_open/3, which opens a URL as a Prolog stream. The functionality of the library can be extended by loading two additional modules that act as plugins:
- library(http/http_ssl_plugin)
- Loading this library causes http_open/3 to handle HTTPS connections. Relevant options for SSL certificate handling are handed to ssl_context/3. This plugin is loaded automatically if the scheme https is requested using a default SSL context. See the plugin for additional information regarding security.
- library(http/http_cookie)
- Loading this library adds tracking cookies to http_open/3. Returned cookies are collected in the Prolog database and supplied for subsequent requests.
Here is a simple example to fetch a web-page:
?- http_open('http://www.google.com/search?q=prolog', In, []),
   copy_stream_data(In, user_output),
   close(In).
<!doctype html><head><title>prolog - Google Search</title><script> ...
The example below fetches the modification time of a web-page. Note that Modified is '' (the empty atom) if the web-server does not provide a time-stamp for the resource. See also parse_time/2.
modified(URL, Stamp) :-
    http_open(URL, In,
              [ method(head),
                header(last_modified, Modified)
              ]),
    close(In),
    Modified \== '',
    parse_time(Modified, Stamp).
The next example uses Google search. It exploits library(uri) to manage URIs, library(sgml) to load an HTML document and library(xpath) to navigate the parsed HTML. Note that you may need to adjust the XPath queries if the data returned by Google changes.
:- use_module(library(http/http_open)).
:- use_module(library(xpath)).
:- use_module(library(sgml)).
:- use_module(library(uri)).

google(For, Title, HREF) :-
    uri_encoded(query_value, For, Encoded),
    atom_concat('http://www.google.com/search?q=', Encoded, URL),
    http_open(URL, In, []),
    call_cleanup(
        load_html(In, DOM, []),
        close(In)),
    xpath(DOM, //h3(@class=r), Result),
    xpath(Result, //a(@href=HREF0, text), Title),
    uri_components(HREF0, Components),
    uri_data(search, Components, Query),
    uri_query_components(Query, Parts),
    memberchk(q=HREF, Parts).
An example query is below:
?- google(prolog, Title, HREF).
Title = 'SWI-Prolog',
HREF = 'http://www.swi-prolog.org/' ;
Title = 'Prolog - Wikipedia',
HREF = 'https://nl.wikipedia.org/wiki/Prolog' ;
Title = 'Prolog - Wikipedia, the free encyclopedia',
HREF = 'https://en.wikipedia.org/wiki/Prolog' ;
Title = 'Pro-Log is logistiek dienstverlener m.b.t. vervoer over water.',
HREF = 'http://www.pro-log.nl/' ;
Title = 'Learn Prolog Now!',
HREF = 'http://www.learnprolognow.org/' ;
Title = 'Free Online Version - Learn Prolog ...
- user_agent(-Agent) is det[private]
- Default value for User-Agent, can be overruled using the option user_agent(Agent) of http_open/3.
- http_open(+URL, -Stream, +Options) is det
- Open the data at the HTTP server as a Prolog stream. URL is
either an atom specifying a URL or a list representing a
broken-down URL as specified below. After this predicate
succeeds the data can be read from Stream. After completion this
stream must be closed using the built-in Prolog predicate
close/1. Options provides additional options:
- authenticate(+Boolean)
- If false (default true), do not try to automatically authenticate the client if a 401 (Unauthorized) status code is received.
- authorization(+Term)
- Send authorization. See also http_set_authorization/2. Supported
schemes:
- basic(+User, +Password)
- HTTP Basic authentication.
- bearer(+Token)
- HTTP Bearer authentication.
- digest(+User, +Password)
- HTTP Digest authentication. This option is only provided if the plugin library(http/http_digest) is also loaded.
- connection(+Connection)
- Specify the Connection header. Default is close. The alternative is Keep-alive. This maintains a pool of available connections as determined by keep_connection/1. The library(http/websockets) uses Keep-alive, Upgrade. Keep-alive connections can be closed explicitly using http_close_keep_alive/1. Keep-alive connections may significantly improve repetitive requests on the same server, especially if the IP route is long, HTTPS is used or the connection uses a proxy.
- final_url(-FinalURL)
- Unify FinalURL with the final destination. This differs from the original URL if the returned head of the original indicates an HTTP redirect (codes 301, 302 or 303). Without a redirect, FinalURL is the same as URL if URL is an atom, or a URL constructed from the parts.
- header(Name, -AtomValue)
- If provided, AtomValue is unified with the value of the indicated field in the reply header. Name is matched case-insensitively and the underscore (_) matches the hyphen (-). This option may be given multiple times to extract multiple header fields. If the header is not available, AtomValue is unified with the empty atom ('').
- headers(-List)
- If provided, List is unified with a list of Name(Value) pairs corresponding to fields in the reply header. Name and Value follow the same conventions used by the header(Name,Value) option.
- method(+Method)
- One of get (default), head, delete, post, put or patch. The head message can be used in combination with the header(Name, Value) option to access information on the resource without actually fetching the resource itself. The returned stream must be closed immediately. If post(Data) is provided, the default is post.
- size(-Size)
- Size is unified with the integer value of Content-Length in the reply header.
- version(-Version)
- Version is a pair Major-Minor, where Major and Minor are integers representing the HTTP version in the reply header.
- range(+Range)
- Ask for partial content. Range is a term Unit(From,To), where From is an integer and To is either an integer or the atom end. HTTP 1.1 only supports Unit = bytes. E.g., to ask for bytes 1000-1999, use the option range(bytes(1000,1999)).
- redirect(+Boolean)
- If false (default true), do not automatically redirect if a 3XX code is received. Must be combined with status_code(Code) and one of the header options to read the redirect reply. In particular, without status_code(Code) a redirect is mapped to an exception.
- status_code(-Code)
- If this option is present and Code unifies with the HTTP status code, do not translate errors (4xx, 5xx) into an exception. Instead, http_open/3 behaves as if 200 (success) is returned, allowing the application to read the error document from the returned stream.
- output(-Out)
- Unify the output stream with Out and do not close it. This can be used to upgrade a connection.
- timeout(+Timeout)
- If provided, set a timeout on the stream using set_stream/2. With this option, if no new data arrives within Timeout seconds the stream raises an exception. Default is to wait forever (infinite).
- post(+Data)
- Issue a POST request on the HTTP server. Data is handed to http_post_data/3.
- proxy(+Host:Port)
- Use an HTTP proxy to connect to the outside world. See also socket:proxy_for_url/3. This option overrules the proxy specification defined by socket:proxy_for_url/3.
- proxy(+Host, +Port)
- Synonym for proxy(+Host:Port). Deprecated.
- proxy_authorization(+Authorization)
- Send authorization to the proxy. Otherwise the same as the authorization option.
- bypass_proxy(+Boolean)
- If true, bypass proxy hooks. Default is false.
- request_header(Name=Value)
- Additional name-value parts are added in the order of appearance to the HTTP request header. No interpretation is done.
- max_redirect(+Max)
- Sets the maximum length of a redirection chain. This is needed for some IRIs that redirect indefinitely to other IRIs without looping (e.g., redirecting to IRIs with a random element in them). Max must be either a non-negative integer or the atom infinite. The default value is 10.
- user_agent(+Agent)
- Defines the value of the User-Agent field of the HTTP header. Default is SWI-Prolog.
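As an illustration of how several of these options combine, the following is a minimal sketch rather than part of the library. The predicates fetch/4 and status_class/2 are hypothetical helpers, and the 10 second timeout is an arbitrary choice. Passing an unbound status_code(Code) means that 4xx and 5xx replies do not raise an exception, so the body of an error document can be read like any other reply.

```prolog
:- use_module(library(http/http_open)).

%!  fetch(+URL, -Code, -ContentType, -Body) is det.
%
%   Hypothetical helper, not part of library(http/http_open).
%   Reads the reply body as a string whatever the HTTP status is.
fetch(URL, Code, ContentType, Body) :-
    setup_call_cleanup(
        http_open(URL, In,
                  [ status_code(Code),                 % do not map 4xx/5xx to exceptions
                    header(content_type, ContentType), % '' if the header is absent
                    timeout(10)                        % assumed value, in seconds
                  ]),
        read_string(In, _Length, Body),
        close(In)).

%!  status_class(+Code, -Class) is det.
%
%   Hypothetical helper: coarse classification of an HTTP status code.
status_class(Code, Class) :-
    (   between(200, 299, Code) -> Class = success
    ;   between(300, 399, Code) -> Class = redirect
    ;   between(400, 499, Code) -> Class = client_error
    ;   between(500, 599, Code) -> Class = server_error
    ;   Class = other
    ).
```

A caller would typically run fetch/4 first and then branch on status_class(Code, Class), for example decoding Body only when Class is success.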
The hook http:open_options/2 can be used to provide default options based on the broken-down URL. The option status_code(-Code) is particularly useful to query REST interfaces that commonly return status codes other than 200 that need to be processed by the client code.
- autoload_https(+Parts) is det[private]
- If the requested scheme is https or wss, load the HTTPS plugin.
- send_rec_header(+StreamPair, -Stream, +Host, +RequestURI, +Parts, +Options) is det[private]
- Send header to Out and process reply. If there is an error or failure, close In and Out and return the error or failure.
- http_version(-Version:atom) is det[private]
- HTTP version we publish. We can only use 1.1 if we support chunked encoding.
- x_headers(+Options, +URI, +Out) is det[private]
- Emit extra headers from request_header(Name=Value) options in Options.
- auth_header(+AuthOption, +Options, +HeaderName, +Out)[private]
- do_open(+HTTPVersion, +HTTPStatusCode, +HTTPStatusComment, +Header, +Options, +Parts, +Host, +In, -FinalIn) is det[private]
- Handle the HTTP status. If 200, we are ok. If a redirect, redo the open, returning a new stream. Else issue an error.
- redirect_limit_exceeded(+Options:list(compound), -Max:nonneg) is semidet[private]
- True if we have exceeded the maximum redirection length (default 10).
- redirect_loop(+Parts, +Options) is semidet[private]
- True if we are in a redirection loop. Note that some sites redirect once to the same place using cookies or similar, so we allow for two tries. In fact, we should probably test whether authorization or cookie headers have changed.
- redirect_options(+Options0, -Options) is det[private]
- A redirect from a POST should do a GET on the returned URI. This means we must remove the method(post) and post(Data) options from the original option-list.
- map_error_code(+HTTPCode, -PrologError) is semidet[private]
- Map HTTP error codes to Prolog errors.
- open_socket(+Address, -StreamPair, +Options) is det[private]
- Create and connect a client socket to Address. Options
- timeout(+Timeout)
- Sets timeout on the stream, after connecting the socket.
- parse_headers(+Lines, -Headers:list(compound)) is det[private]
- Parse the header lines for the headers(-List) option. Invalid header lines are skipped, printing a warning using print_message/2.
- return_final_url(+Options) is semidet[private]
- If Options contains final_url(URL), unify URL with the final URL after redirections.
- transfer_encoding_filter(+Lines, +In0, -In) is det[private]
- Install filters depending on the transfer encoding. If In0 is a stream-pair, we close the output side. If transfer-encoding is not specified, the content-encoding is interpreted as a synonym for transfer-encoding, because many servers incorrectly depend on this. Exceptions to this are content-types for which disable_encoding_filter/1 holds.
- http:disable_encoding_filter(+ContentType) is semidet[multifile]
- Do not interpret Content-encoding as Transfer-encoding for specific values of ContentType. This predicate is multifile and can thus be extended by the user.
- transfer_encoding(+Lines, -Encoding) is semidet[private]
- True if Encoding is the value of the Transfer-encoding header.
- content_encoding(+Lines, -Encoding) is semidet[private]
- True if Encoding is the value of the Content-encoding header.
- read_header(+In:stream, +Parts, -Version, -Code:int, -Comment:atom, -Lines:list) is det[private]
- Read the HTTP reply-header. If the reply is completely empty an existence error is thrown. If the replied header is otherwise invalid a 500 HTTP error is simulated, having the comment Invalid reply header.
- content_length(+Header, -Length:int) is semidet[private]
- Find the Content-Length in an HTTP reply-header.
- integer(-Int)//[private]
- Read 1 or more digits and return as integer.
- rest(-Atom:atom)//[private]
- Get rest of input as an atom.
- http_set_authorization(+URL, +Authorization) is det
- Set user/password to supply with URLs that have URL as prefix. If Authorization is the atom -, any previously defined authorization is cleared. For example:
?- http_set_authorization('http://www.example.com/private/',
                          basic('John', 'Secret')).
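Because the atom - clears a registration, setting and clearing can be paired so that credentials are only registered while a goal runs. The following is a minimal sketch under stated assumptions: with_authorization/3 is a hypothetical helper that is not part of the library, and it assumes that no other authorization for the same prefix needs to be preserved, since the cleanup step clears whatever is stored for Prefix.

```prolog
:- use_module(library(http/http_open)).

:- meta_predicate with_authorization(+, +, 0).

%!  with_authorization(+Prefix, +Authorization, :Goal) is semidet.
%
%   Hypothetical helper, not part of library(http/http_open).
%   Registers Authorization for URLs starting with Prefix, runs Goal
%   once and clears the registration again, also if Goal fails or
%   raises an exception.
with_authorization(Prefix, Authorization, Goal) :-
    setup_call_cleanup(
        http_set_authorization(Prefix, Authorization),
        once(Goal),
        http_set_authorization(Prefix, -)).
```

Goal would normally contain one or more http_open/3 calls on URLs below Prefix.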
- authorization(+URL, -Authorization) is semidet[private]
- True if Authorization must be supplied for URL.
- parse_url_ex(+URL, -Parts)[private]
- Parts: Scheme, Host, Port, User:Password, RequestURI (no fragment).
- parts_scheme(+Parts, -Scheme) is det[private]
- parts_uri(+Parts, -URI) is det[private]
- parts_request_uri(+Parts, -RequestURI) is det[private]
- parts_search(+Parts, -Search) is det[private]
- parts_authority(+Parts, -Authority) is semidet[private]
- iostream:open_hook(+Spec, +Mode, -Stream, -Close, +Options0, -Options) is semidet[multifile]
- Hook implementation that makes open_any/5 support http and https URLs for Mode == read.
- consider_keep_alive(+HeaderLines, +Parts, +Host, +Stream0, -Stream, +Options) is det[private]
- read_incomplete(+In, +Left) is semidet[private]
- If we have not read all input from a Keep-alive connection, read the remainder if it is short. Otherwise, we fail and close the stream.
- keep_connection(+Address) is semidet[private]
- Succeeds if we want to keep the connection open. We currently keep a maximum of 10 connections waiting and a maximum of 2 waiting for the same address. Connections older than 2 seconds are closed.
- http_close_keep_alive(+Address) is det
- Close all keep-alive connections matching Address. Address is of the form Host:Port. In particular, http_close_keep_alive(_) closes all currently known keep-alive connections.
- keep_alive_error(+Error)[private]
- Deal with an error from reusing a keep-alive connection. If the error is due to an I/O error or end-of-file, fail to backtrack over get_from_pool/2. Otherwise it is a real error and we thus re-raise it.
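The keep-alive pool described in the entries above is used from client code through the connection option of http_open/3 together with http_close_keep_alive/1. The sketch below is illustrative only: fetch_keep_alive/2, fetch_all/2 and close_pool/0 are hypothetical helpers. Each reply is read completely before the stream is closed, on the assumption that this gives the library the best chance of returning the connection to the pool.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(apply)).      % maplist/3

%!  fetch_keep_alive(+URL, -Body) is det.
%
%   Hypothetical helper: fetch URL while asking the server to keep
%   the connection open for later requests.
fetch_keep_alive(URL, Body) :-
    setup_call_cleanup(
        http_open(URL, In, [connection('Keep-alive')]),
        read_string(In, _Length, Body),   % read the entire reply
        close(In)).

%!  fetch_all(+URLs, -Bodies) is det.
%
%   Hypothetical helper: fetch several URLs in sequence.  Requests to
%   the same server may reuse a pooled connection.
fetch_all(URLs, Bodies) :-
    maplist(fetch_keep_alive, URLs, Bodies).

%!  close_pool is det.
%
%   Close every pooled keep-alive connection, whatever its address.
close_pool :-
    http_close_keep_alive(_).
```

Calling close_pool/0 when a batch of requests is finished releases idle connections immediately instead of waiting for them to expire.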
- http:open_options(+Parts, -Options) is nondet[multifile]
- This hook is used by the HTTP client library to define default options based on the broken-down request-URL. The following example routes all traffic, except for localhost, over a proxy:
:- multifile http:open_options/2.

http:open_options(Parts, Options) :-
    option(host(Host), Parts),
    Host \== localhost,
    Options = [proxy('proxy.local', 3128)].
This hook may return multiple solutions. The returned options are combined using merge_options/3 where earlier solutions overrule later solutions.
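The precedence rule for multiple solutions can be tried offline with merge_options/3 from library(option). The following is a sketch of the rule as stated above, not the library's actual code; combine_hook_solutions/2 and add_defaults/3 are hypothetical names.

```prolog
:- use_module(library(option)).   % merge_options/3
:- use_module(library(apply)).    % foldl/4

%!  combine_hook_solutions(+Solutions, -Options) is det.
%
%   Hypothetical helper, not part of library(http/http_open).
%   Solutions is a list of option lists in the order the hook produced
%   them.  Options collected from earlier lists win over later lists.
combine_hook_solutions(Solutions, Options) :-
    foldl(add_defaults, Solutions, [], Options).

% merge_options(+New, +Old, -Merged) lets New win on a conflict, so
% the options collected so far are passed as New.
add_defaults(Later, Earlier, Merged) :-
    merge_options(Earlier, Later, Merged).
```

For example, with two solutions that both define user_agent, the value from the first solution is kept:

```prolog
?- combine_hook_solutions([ [timeout(30), user_agent(a)],
                            [user_agent(b), max_redirect(3)]
                          ], Options).
Options = [max_redirect(3), timeout(30), user_agent(a)].
```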
- http:write_cookies(+Out, +Parts, +Options) is semidet[multifile]
- Emit a Cookie: header for the current connection. Out is an open stream to the HTTP server, Parts is the broken-down request (see uri_components/2) and Options is the list of options passed to http_open. The predicate is called as if using ignore/1.
- http:update_cookies(+CookieData, +Parts, +Options) is semidet[multifile]
- Update the cookie database. CookieData is the value of the Set-Cookie field, Parts is the broken-down request (see uri_components/2) and Options is the list of options passed to http_open.