Rationale ========= Since introduction of the WAL-based replication into the PostgreSQL, it is possible to create high-availability and load-balancing clusters. However, there is no support for failover in the client libraries. So, only way to provide transparent for client application failover is IP address migration. This approach has some limitation, i.e. it requires that master and backup servers reside in the same subnet or may not be feasible for other reasons. It may seem that dynamic DNS update allows to overcome this limitation. But DNS information is typically heavily cached, and it is scarcely possible that DNS changes would propagate in the period of few seconds. Moreover, it might be quite difficult to allow dynamic DNS updates by failover servers due to political reasons. Commercial RDBMS, such as Oracle, employ more flexible approach. They allow to specify multiple servers in the connect string, so if primary server is not available, client library tries to connect to other ones. PostgreSQL JDBC driver already implements such feature and allows to specify multiple servers in the URL. This approach allows to use geographically distributed failover clusters and also is a cheap way to implement load-balancing (which is not possible with IP address migration). Implementation of this feature at the client library level would provide uniform interface, usable by all applications and save much work compared with the implementation on the application level. Proposed change =============== Allow to specify multiple hosts in the libpq connect string. If multiple hosts are specified, connection could be attempted to these server either in sequential order, in random order or simultaneously, then choosing one connection to use and closing other. Each variant has its own advantages and drawbacks and each can be preferred in some situations. Attempt to connect servers in order and trying the second one only if first fails minimizes load on the server, but can result in quite long connect times (if server is powered off completely and connection attempt fails only when connect\_timeout is reached). Random order allows to balance load to multiple servers. Unfortunately, load-balancing has very limited applications on Postgresql, because multimaster replication is not supported yet. But there can be situations where multiple readonly clients exists, and they can be loadbalanced among pool of hot standby servers. Also there are some proprietary multimaster solutions, which users can benefit from loadbalancing on the client library level. Simultaneous connection allows to minimize connect time in case master server fails, and one of standby servers takes over. But it places extra load on standby server, if it does accept connections. In this case each client connects to all servers at once, then checks read-only status of the connections and closes connections to all standbys, leaving connection to the master operable. It is evident that this load should be bearable by standby servers, because it is exactly same amount of the connections, as master accepts, and standby servers should be able to take over master role at any time. If cluster administrator don't want standby server to carry this load, one can make standby reject connection either by postgresql configuration or using OS firewall. This approach has evident drawback that standby to master promotion procedure have to change this configuration too and make server accept client connection. Syntax ------ Libpq connect string can be either set of the keyword=value pairs or an URL. In the first form it can be just allowed to specify keyword host multiple times. "host=main-server host=standby1 host=standby2 port=5432 dbname=database" But this approach has drawback, that it doesn't allow to specify different ports for different servers. In order to allow different port on each cluster node, it is allowed to specify port in the host parameter, using colon as separator (as it is done in the URL) "host=main-server:5432 host=standby1 host=standby2 port=6453" In this example main-server use port 5432, and all standbys use port, specified in the `port' parameter syntax: In this form connect procedure should walk over the lists of hosts and ports in parallel. In the second form host can be specified either in the first part of URL or in the query parameters. postgresql://user@host/database postgresql:///database?host=hostname&user=username If host is specified as a parameter, it is also possible to allow multiple host parameters without breaking existing syntax. But we propose syntax, compatible with postgresql JDBC: postgresql://host1:port1,host2:port2/database In order to implement three modes of operations described above, additional parameters should be added: *. hostorder=keyword, where keyword is one of sequential, parallel and random. Default is sequential. Sequential is try to connect first server in the connect string, if it fails, use second and so on. Parallel - try to connect all servers simultaneously and use first one which would be able to serve (by default, we need read-write connection, so after successful authentication we execute `pg_in_recovery()' sql function operator and consider connection successful only if it returns false) *. loadBalanceHosts = boolean (as in JDBC and with same semantics). If true, than connection to multiple hosts is made in random order. For the compatibility with other parameters, as connect\_timout, this parameter should probably have an alias load\_balalce\_hosts. Equivalent of hostorder=random *. readonly = boolean. If false, then connection is assumed to be successful only after `select pg_in_recovery()` statement is executed and returned False. Default false (i.e. any server which accepts connection assumed to be usable). *. failover\_timeout = seconds. Number of seconds to retry connection if no usable master server found. Must be sufficient to allow cluster watchdog software to complete promotion of one of standbys to master. Support for this syntax extensions is added to the PQconnectdb, PQconnectdbParams, PQConnectStart and PQConnectStartParams, but not PQsetdb/PQsetdblogin functions. Behavior -------- If PQconnectdb encounters connect string with multiple hosts specified, it attempts to establish connection with all these hosts according the hostorder parameter value. If hostorder is sequential (the default) it tries hosts in order they are specified in the command line, and moves to next one only if connection to previous is failed. Be sure to specify sensible connect_timeout. If hostorder is random, tries hosts in random order. If hostorder is parallel, then tries to connect all host simultaneously and uses that connection which is fully established first. By default. If the load\_balance\_hosts parameter is true, it is equvalent of hostorder=random. If the parameter readonly is false, after authenticating with server it executes show transaction\_read\_only, to find out whether current connection is to the master or to the hot standby, and connection is considered successful only if server allows read write transactions. This allows to have clients which write to the database and clients which perform read-only access. Read-only clients would be load-balanced between the master and slave servers, and read-write clients connect only to the master (whichever server has this role at the moment of connection). If no usable server found, waits for failover\_delay seconds and then repeats the procedure until failover\_timeout seconds passed. Implementation notes ==================== Information of the alternate servers should be stored in the PGconn structure. There is already room for storing several server IPs if DNS name resolves into multiple IPs. Function PQreset should be able to take advantage of new syntax and possibly open connection to the new master, if failover occurred during lifetime of the connection. Most of changes should reside in the PQconnectPoll state machine. This function is used by all public connect functions. Probably couple of new states should be added to ConnStatusType enum. There also should be changes in the option parsing and in the pg_conn structure to incorporate multiple hosts data. There should be new interface function, to allow asynchroneous applications to get all the currently open sockets to integrate in the event loop. If we can manage to avoid simultaneous connections by the applications, which don't call this function, we probably can achieve 100% compatibility with existing applications, as it is already documented feature of PGsocket, that socket descriptor can change during connection setup phase. Possible drawbacks ================== Compatibility ------------- Proposed patch requires no modifications to the server or protocol, and modification of synchronous function (PQconnectdb, PQconnectdbParams) doesn't introduce incompatible changes to the client library. Even if connect string with multiple host would be erroneously used with version of libpq, which do not support this feature, it is not an error. It just use last host specified in the connect string. There could be some compatibility problems with asynchronous connections created with PQConnectStart functions. Problem is that we are trying to establish several connections at once, and there are several sockets which should be integrated into application event loop. Even if we would try servers in some particular order (such as randomized order during load balancing), file descriptor of socket can change during execution PQConnectPoll, and existing applications are not prepared to it. Performance impact ------------------ Performance impact seems to be negligible. 1. If connect string contain only one host, the only complication is the maintenance of the data structure, which possible can hold more than one host name. Connection process itself would not be affected. 2. If there is pure high-availability cluster, i.e. standby servers do not accept client connections on the specified port, there is no extra load on standby servers, and almost no (only several unsuccessful connect calls) on client. 3. If there is high-availability cluster with standby servers accepting connection, then these standby servers have to process same amount of connection as master does, and client has to handle as many connections as there are servers in the connect string, executing show transaction\_read\_only on all of them. It would insufficiently increase connect time. 4. If there is load balancing cluster, there is no performance impacts for read-only client, but each read-write client causes standby servers to process extra connection to the point where server can report read-only state of transaction (i.e. including SSL handshake and postgresql authentication). Typically, in the situation where read-only clients should be load-balanced using this feature, there are much more read-only clients, than read-write ones. So some extra load related with read-write connection seems to be justified by simplification of client configuration. -- Victor Wagner <vitus@wagner.pp.ru>