
Installing and running HerdDB

Enrico Olivelli edited this page Sep 3, 2020 · 7 revisions

This page describes the HerdDB deployment modes and how to configure the JDBC client for each one. If you want to try out HerdDB, please refer to the Getting Started guide.

Server deployment modes

HerdDB was primarily designed to run server and client inside the same JVM application (embedded mode), but it also supports a "standalone mode" in which external clients connect to a separate server. Standalone mode was introduced mainly to support standard reference benchmarks (like YCSB), but it is also a supported configuration for production.

This means that you can run the server as you would with MySQL, PostgreSQL or any other database, but you can also deploy the server inside the same process as the client application, as with SQLite, RocksDB or BerkeleyDB.

Note that HerdDB can be deployed in a fully replicated environment in both "embedded mode" and "standalone mode".

Notes about replication

HerdDB supports replication for these two main use-cases:

  • high-availability: replicate data between nodes to prevent the loss of data in case of machine failure
  • scalability: increase performance by adding nodes

Beware that the "scalability" point holds only if you have many tablespaces: HerdDB scales out evenly only in that case, because all of the data of a single tablespace must reside on a single node. A single tablespace with ten replicas won't buy you more performance, only more copies of the data. Conversely, the more tablespaces you have, the better the load can be spread across the nodes.
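The point can be illustrated with a toy model. The sketch below is not HerdDB code: it assumes exactly one serving node per tablespace and a hypothetical round-robin placement (the class and method names are made up for illustration), and it only counts how many nodes end up with work.

```java
import java.util.Map;
import java.util.TreeMap;

public class TablespaceSpread {

    // Toy model, not HerdDB's placement logic: every tablespace is served by
    // exactly one node, picked round-robin (an assumption for illustration).
    static Map<Integer, Integer> tablespacesPerNode(int tablespaces, int nodes) {
        Map<Integer, Integer> load = new TreeMap<>();
        for (int node = 0; node < nodes; node++) {
            load.put(node, 0);
        }
        for (int ts = 0; ts < tablespaces; ts++) {
            load.merge(ts % nodes, 1, Integer::sum);
        }
        return load;
    }

    // Number of nodes that serve at least one tablespace.
    static long busyNodes(int tablespaces, int nodes) {
        return tablespacesPerNode(tablespaces, nodes).values().stream()
                .filter(count -> count > 0)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("1 tablespace on 10 nodes: busy nodes = " + busyNodes(1, 10));
        System.out.println("20 tablespaces on 10 nodes: busy nodes = " + busyNodes(20, 10));
    }
}
```

With one tablespace only one of the ten nodes does any work in this model, whatever the number of replicas; with twenty tablespaces every node gets a share.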

Recipes

Running server for tests

You can run HerdDB for the unit tests of your Java application. In this case data won't be persisted, and you won't be able to handle databases bigger than the available heap memory.

You are going to use a JDBC URL like jdbc:herddb:local:dbname or HerdDBEmbeddedDataSource.
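A minimal sketch of opening such a connection through plain JDBC follows. It assumes that the HerdDB JDBC driver jar is on the classpath and that the local URL can be opened without explicit credentials. The class name and the localUrl helper are hypothetical and exist only for this example; HerdDBEmbeddedDataSource is not used here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class LocalHerdDbExample {

    // Builds the in-JVM URL form quoted above: jdbc:herddb:local:<dbname>
    static String localUrl(String dbName) {
        return "jdbc:herddb:local:" + dbName;
    }

    public static void main(String[] args) {
        String url = localUrl("dbname");
        // Assumption: the HerdDB JDBC driver is on the classpath and
        // registers itself with DriverManager.
        try (Connection con = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + url);
        } catch (SQLException e) {
            // Without the driver on the classpath this reports "No suitable driver".
            System.err.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

Because the data lives on the heap and is not persisted, a setup like this suits per-test databases that are created and thrown away.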

Standalone mode

You can run a single-instance server as a standalone process. In this mode data, metadata and the journal reside on a disk local to the server. You can use different directories for metadata, data and journal.

The JDBC URL will look like: `jdbc:herddb:server:localhost:7000`. For the configuration of security features (SASL, TLS...) please refer to the documentation.

In order to start and manage the standalone service, just unzip the distribution package, configure conf/server.properties and run bin/service server start.

The default port is 7000 and the CLI (bin/herddb-cli.sh) will by default connect to that port, without encryption and with default simple authentication with default username/password.
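As a sketch, a client for a standalone server could look like the following. The class name and the serverUrl helper are hypothetical, and the credentials are taken from the command line because this page does not state the default values. The HerdDB JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class StandaloneHerdDbExample {

    // Builds the standalone URL form quoted above: jdbc:herddb:server:<host>:<port>
    static String serverUrl(String host, int port) {
        return "jdbc:herddb:server:" + host + ":" + port;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: StandaloneHerdDbExample <username> <password>");
            return;
        }
        // 7000 is the default port mentioned in the text.
        String url = serverUrl("localhost", 7000);
        try (Connection con = DriverManager.getConnection(url, args[0], args[1])) {
            System.out.println("Connected to " + url);
        } catch (SQLException e) {
            System.err.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```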

Cluster mode - classic

In order to run in replicated mode you have to set up a ZooKeeper cluster and provide the ZooKeeper connection string.

In replicated mode you have:

  • metadata stored on ZooKeeper
  • service discovery data on ZooKeeper
  • journal on BookKeeper
  • data replicated to local disks of HerdDB servers

For each tablespace you have a set of nodes (replicas) that keep a local copy of the data. Each write from the client flows to the 'leader' node of the tablespace, which writes the change to the journal using BookKeeper. BookKeeper replicates the entry to a number of nodes (bookies).
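On the client side, a replicated setup is reached through ZooKeeper rather than through a fixed server address. The sketch below assumes a URL of the form jdbc:herddb:zookeeper:<connection string><path>. This form is not stated on this page, so treat it, the /herd path in the usage line and the helper names as assumptions to check against the JDBC driver documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ClusterHerdDbExample {

    // ASSUMPTION: the URL form jdbc:herddb:zookeeper:<zkConnect><zkPath> is not
    // given on this page; verify it against the driver documentation.
    static String clusterUrl(String zkConnect, String zkPath) {
        return "jdbc:herddb:zookeeper:" + zkConnect + zkPath;
    }

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println(
                "usage: ClusterHerdDbExample <zkConnect> <zkPath> <username> <password>");
            System.err.println(
                "e.g.   ClusterHerdDbExample zk1:2181,zk2:2181 /herd <username> <password>");
            return;
        }
        String url = clusterUrl(args[0], args[1]);
        try (Connection con = DriverManager.getConnection(url, args[2], args[3])) {
            System.out.println("Connected through ZooKeeper: " + url);
        } catch (SQLException e) {
            System.err.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```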

Cluster mode - diskless flavour

In order to run in replicated mode you have to set up a ZooKeeper cluster and provide the ZooKeeper connection string. In diskless-cluster mode data is persisted entirely on Bookies and metadata on ZooKeeper; the servers are totally stateless containers for the runtime.

In replicated diskless mode you have:

  • metadata stored on ZooKeeper
  • service discovery data on ZooKeeper
  • journal on BookKeeper
  • data written to Bookies

For each tablespace you have a set of nodes (replicas) that are allowed to serve traffic. Each write from the client flows to the 'leader' node of the tablespace, which writes the change to the journal using BookKeeper. BookKeeper replicates the entry to a number of nodes (bookies). Data pages are written to BookKeeper as well, with a replication factor equal to the 'expectedreplicacount' configuration of the tablespace.

Embedding in the client application

HerdDB supports embedding server and client inside the same JVM application: this is the primary purpose it was designed for, and it is known to be used in production in this mode.

Running on Docker

Currently we do not have any reported usage of HerdDB in production on Docker. Its core components (ZooKeeper and BookKeeper) are ready for the cloud and are known to be used in production by a large number of companies.

(describe point of attention to run herddb on fire and forget machine, eg. which components/data of herd that need to be persisted)

WORK-IN-PROGRESS
