Multi-Tenancy through Database per Client #749

jeremydmiller · 2017-05-06T14:10:48Z

I wanted to split up #435.

The idea here would be to keep each tenant in a completely separate database. We've already made some significant changes for 2.0 to make this a lot more efficient inside of Marten's internals. Here's what I think still needs to happen:

- A tenanting strategy that can look up the database connection string per tenant
- At development time (AutoCreate != AutoCreate.None), be able to spin up a new database on the fly for a tenant
- "Know" what all the existing tenants are
- Be able to apply all document schema operations on all tenants
- Might be an extension to the command line project to apply changes to all tenant databases at a time

jeremydmiller · 2017-06-08T14:54:43Z

I'm dropping this off of the 2.0 release. When we do this, we'll need to change quite a few things in QueryHandlerFactory and the async daemon to not depend on the store's default tenant.

tonykaralis · 2019-10-05T21:54:02Z

Is there currently a way to achieve schema per client? So all your points above but instead of separate db we have separate schema. Would the only way be to spin up separate document stores at runtime?

jeremydmiller · 2019-10-06T12:08:38Z

@tonykaralis If you look, there should be an open issue about multi-tenancy through schemas. That turned out to be extremely complicated to pull off, and was dropped out of the 2.0 release and never done.

You could do it with a DocumentStore per schema/tenant, yes.

tonykaralis · 2019-10-06T23:22:56Z

@jeremydmiller thanks, I spotted that issue shortly after. It's a shame, as schema per client would be a very powerful feature, but I realise it's a nightmare to achieve. I am thinking schema per client is going to be too costly to manage with separate document stores per client.

jeremydmiller · 2021-11-24T12:57:15Z

This is coming up much more often, maybe we play this in the next big release

tonykaralis · 2021-11-24T13:53:58Z

Definitely interested in this, phase 2 of our project involves exposing the same multi tenanted data but via an api. The caveat being every call to the api will need to be tenanted. All calls from the api to the db go via Marten to the same database and schema. So for now I just need to figure out how to spin up a custom tenanted session per http request as the http request will have the tenantid to use.

I realise my edge case is not multi tenancy by schema or client, but regardless it would be a huge feature for us.

jokokko · 2021-11-24T13:54:23Z

Would be a very attractive feature (or at least I'd have strong use cases for it). Technical implementation will be interesting though and not able to make the same guarantees as schema or column level tenancy. E.g. "Be able to apply all document schema operations on all tenants" cannot be transactionally done OOTB, with PG connections being per database.
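
To illustrate the non-transactional point above, here is a minimal sketch (not Marten code; `apply_to_all` and `apply_change` are hypothetical names). With one connection per database, a change can only be applied database by database, so the caller has to expect partial success and keep track of which databases failed:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def apply_to_all(
    databases: Iterable[str],
    apply_change: Callable[[str], None],
) -> Tuple[List[str], Dict[str, Exception]]:
    """Apply one schema change to each database in turn.

    There is no cross-database transaction here: a failure in one
    database does not roll back the databases already changed.
    """
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    for database in databases:
        try:
            apply_change(database)  # hypothetical callback that runs the change
            succeeded.append(database)
        except Exception as exc:
            # Record the failure and keep going so the caller gets a full report.
            failed[database] = exc
    return succeeded, failed
```

Whether to continue after the first failure or stop immediately is a design choice; this sketch continues so that a retry can target only the databases listed in `failed`.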

jaredwri · 2021-11-24T14:24:26Z

I have many services that resolve a connection string at runtime. That is to say that I do NOT have knowledge of the target database until the moment the service is asked to interact with it. For now, a custom ISessionFactory lazy loads a new document store and everything works. However, having one store that allowed me to vary the connection would be ideal. We do use bulk operations off the store object at times, so it became necessary to load and maintain multiple, basically duplicated (aside from the connection string), stores in memory.
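
The lazy-loading workaround described above can be sketched roughly as follows. This is an illustration only, not Marten's API: `LazyStoreCache` and `build_store` are hypothetical names, and the "store" is whatever object the supplied factory returns.

```python
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")


class LazyStoreCache(Generic[S]):
    """Builds one store per connection string on first use, then reuses it."""

    def __init__(self, build_store: Callable[[str], S]) -> None:
        self._build_store = build_store  # hypothetical factory for a store
        self._stores: Dict[str, S] = {}

    def store_for(self, connection_string: str) -> S:
        # Pay the construction cost only the first time a database is seen.
        if connection_string not in self._stores:
            self._stores[connection_string] = self._build_store(connection_string)
        return self._stores[connection_string]

    def __len__(self) -> int:
        # Number of stores currently held in memory.
        return len(self._stores)
```

The sketch is not thread-safe and never evicts entries, so memory grows with the number of distinct databases, which is the cost this comment is pointing at.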

jokokko · 2021-11-24T15:05:48Z

@jaredwri I assume you don't have operations that would span multiple databases, i.e. requiring multiple stores at a specific call site? Just thinking your workaround sounds reasonable. If tenancy by DB was supported, it would only move code around a bit (so how you configure store vs. sessions).

What I think needs to be considered in supporting tenancy by DB is if/how it alters the outputs of Marten. So at the very least a matter of documentation - the same API call can yield very different exceptions or results based on initial configuration.

jaredwri · 2021-11-24T15:35:24Z

@jokokko
You are correct. I do not have operations that span multiple databases. One service invocation/interaction only affects one database.

PhilipRieck · 2021-12-01T00:14:49Z

Glad to see the post on your blog - this proposal would be great to see - I'm currently stuck on version 3.x because of the overhead in multiple document stores in a containerized environment.

The situation I have is service(s) running in a multitenant environment. The service determines which tenant database at the 'last minute', just before performing a query or operation. Currently the workaround I'm using is to have multiple DocumentStores and look them up based on the tenant, but this is painful in Marten due to DocumentStore size and startup time. It's unusable in Marten v4 due to memory usage and generation times.

Some notes on my use case:

- All tenant databases have identical schemas
- I never have operations that span tenants or databases
- The tenant databases vary only by connection string (host, username, etc)
- I can supply the connection string, or a key. Ideally I could add a tenant 'dynamically', but if I have to restart my services to add a tenant I can work around that - would really rather not, though.
- Each service is small, and without Marten can run in minimal memory containers hosted in kubernetes (under 20mb in many cases). Increasing this costs us money and running above the limit immediately terminates the service, so the memory usage must be at least predictable, ideally minimal.
- On upgrade operations I can enumerate the connection strings/tenant keys. I have flexibility here - I can call the cli equivalent once per tenant if need be, or give a list of databases to upgrade. The existing CLI works fine for me as I already intercept schema operations to determine tenancy and then pass through.
- I do not require Marten to 'know' what tenants exist - I have the list, I have the resolution. Giving the list to Marten and keeping them in sync just for 'all tenant operation' convenience methods doesn't really interest me, since it's just one more piece of data in two places.

Basically, I just want to use one DocumentStore and have the connection to the database be provided by me and held at the session level (And have memory usage be tiny). However, I'm open to any solution that fits my use case without me also needing to up my pod memory limit to hundreds of megabytes

jeremydmiller · 2021-12-01T14:03:40Z

@PhilipRieck Thank you for taking the time to write that up!

The "generate ahead" model is meant to deal with the memory and cold start issue. I think I'm going to stick something in Marten 5 to make that much easier to use before I stock up on a lot more bourbon and attempt to move to IL generation instead.

"I do not require Marten to 'know' what tenants exist - I have the list, I have the resolution. Giving the list to Marten and keeping them in sync just for 'all tenant operation' convenience methods doesn't really interest me, since it's just one more piece of data in two places." -- Think database migrations. In that case, Marten absolutely has to know what all the databases are in order to do the schema migrations. I want this built in somehow to Marten so that more people can use this. What you've apparently built for yourself, I'd like to have in the box for other folks.

"if I have to restart my services to add a tenant I can work around that - would really rather not, though." -- if we make the tenancy discovery model a little bit pluggable, you could have custom -- or some in the box -- options to automatically spin up a new database for a valid new tenant

VilleHakli · 2021-12-01T17:01:35Z

Support for multitenancy with database per tenant is something that we are really interested in.

Our use case seems to be pretty close to the one described by @PhilipRieck

- All databases will have the same schema
- No need to do operations into multiple databases in the same session
- No need to automatically create new databases by Marten. We have a separate tool to create and migrate databases which is also used in development.
- For the migrations it would be really nice to have the possibility to use DocumentStore.Schema per tenant/database. For example, having a method DocumentStore.Schema.ForTenant("tenant") which would return IDocumentSchema for the given tenant.
- We have connection strings stored in a separate configuration database. For us, having pluggable tenancy would be nice as we have the need to spin up new tenants while the app is running. As long as we can use custom tenant identifier => connection string resolution, we should be fine.

These are just some points off the top of my head. I think that our use case is quite simple (famous last words?) and being able to pass a connection string when opening a new session might be enough for us. At the moment we open database connections and pass those to Marten, so being able to pass the connection string would already simplify our use case.

jeremydmiller · 2021-12-02T16:55:00Z

Hey everyone, I started jotting down notes yesterday about implementing this. I think I want to say that right now we support these models:

1. One database, no multi-tenancy. Duh.
2. One database, some subset or all of the document types are "conjoined" in the same database & schema. Just like we do today
3. Each tenant is in its own database
4. A hybrid conjoined/separate database model where a single database could contain multiple tenants, and each tenant would belong to exactly one database.

Tenant per schema is still out of scope and really doesn't fit well w/ Marten internals anyway

I think everybody is going to be on board for 1-3, so let's talk more about 4. In the notes I took on potential design, it's not actually going to be any more complicated to assume the hybrid model is a possibility. I also think we have to have some knowledge of what tenants are valid for a given database to do runtime assertions when we add the new separated database model. Furthermore, 4 is something that would conceivably be valuable for my company where we have clients with individual locations/sub-organizations. I'm not suggesting we try for a full blown tree structure model of tenancy here, but setting the foundation might help.

At a minimum, I want to treat the concept of "Database" and "Tenant" as not necessarily locked together as we do this work regardless.
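
As a rough illustration of keeping "Database" and "Tenant" as separate concepts in the hybrid model (a sketch only; `TenantDatabaseMap` and `UnknownTenantError` are hypothetical names, not Marten types): a database may hold several tenants, each tenant maps to exactly one database, and unknown tenants are rejected at runtime.

```python
from typing import Dict, Set


class UnknownTenantError(Exception):
    """Raised when a tenant id has not been assigned to any database."""


class TenantDatabaseMap:
    """Many tenants per database, but exactly one database per tenant."""

    def __init__(self) -> None:
        self._database_of: Dict[str, str] = {}

    def assign(self, tenant_id: str, database: str) -> None:
        existing = self._database_of.get(tenant_id)
        if existing is not None and existing != database:
            # Enforce "each tenant belongs to exactly one database".
            raise ValueError(
                f"tenant {tenant_id!r} is already assigned to {existing!r}"
            )
        self._database_of[tenant_id] = database

    def database_for(self, tenant_id: str) -> str:
        # Runtime assertion: only known tenants may be resolved.
        if tenant_id not in self._database_of:
            raise UnknownTenantError(tenant_id)
        return self._database_of[tenant_id]

    def tenants_in(self, database: str) -> Set[str]:
        return {t for t, d in self._database_of.items() if d == database}
```

With one tenant per database this collapses to model 3, and with every tenant in a single database it resembles model 2, which is one way to read the claim that assuming the hybrid model adds little complexity.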

PhilipRieck · 2021-12-22T19:28:17Z

@jeremydmiller Thanks for continuing to look at this.

I have opinions on 1 and 3 - both models would be used by me. 2 and 4 are not something I or my team would need at all. I don't see a lot of use for them, but others may disagree.

Just to be clear on usage as far as I'm concerned: In a theoretical world where you had a Marten library that allowed me to use a different connection string / selector per Session without rebuilding the DocumentStore, and then you built first-class tenancy notions in a separate library on top of that capability, I would only use the base library. Any way you can give me to separate the ideas of data shape and query building from the physical connection to the database, will be a win for me. If you solve other users tenancy needs at the same time, well that is a huge bonus. In fact, this is basically how I'm trying to work around it now, but it's much harder in the new versions.

Looking at this as a whole, I think you'll require at least some minor architectural changes. Once you have a design you like and find yourself just needing someone's time coding it up, please let me know how I can assist - glad to send PRs your way.

jeremydmiller · 2021-12-22T19:47:48Z

@PhilipRieck In what way is "it's much harder in the new versions"? You can still push a connection string into a session, and that's probably the most efficient way to do database per tenant with the existing Marten V4. And I definitely don't agree with having separate libraries for the multi-tenancy. At this point I think this is going to be a fair amount of work to enhance the database migrations capabilities in Marten for multiple databases, but hardly any code after that for the 3 & 4 models above. When we introduced 2. in Marten way back when, I thought that we'd also be doing 3., so the internal hooks are actually kinda set for database per client already.

jeremydmiller · 2021-12-22T20:52:08Z

@PhilipRieck And I'm shutting down until at least next week, but I'll get back to you on the PR help. Definitely take you up on that if you're game. Think it's gonna be way more about writing tests than actual code.

PhilipRieck · 2021-12-23T16:30:18Z

@jeremydmiller

> In what way is "it's much harder in the new versions"? You can still push a connection string into a session, and that's probably the most efficient way to do database per tenant with the existing Marten V4

It's quite possible (likely even) that the way I'm working around this currently is not the best way. I'll look more into that and create other threads or use gitter to track that down.

> And I definitely don't agree with having separate libraries for the multi-tenancy.

I'm sorry - I wasn't suggesting this approach but trying to use it as an illustration. Re-reading it, I think it's more confusing than helpful so please disregard. If you can get the other approaches with minimal work that's great.

jeremydmiller · 2022-01-10T15:49:40Z

Jotting down some implementation notes on what is going to be variable:

Static database to tenant mapping

Where is this information stored? Thinking we support multiple options:

- Some sort of IConfiguration based handler
- Lookup from a master database. And if we do this, do we need to do migrations for just that database? Punt, and make that pluggable?

Dynamic creation of databases

Marten already has some functionality for spinning up databases with configuration (likely moving to Weasel very soon). In development mode, we could spin these up on the fly based on expected or even new tenants.

elexisvenator · 2022-02-23T22:19:01Z

> Static database to tenant mapping
>
> Where is this information stored? Thinking we support multiple options:
>
> Some sort of IConfiguration based handler
>
> Lookup from a master database. And if we do this, do we need to do migrations for just that database? Punt, and make that pluggable?

Would be great if the "master" database didn't have to be a database, e.g. making things like dynamodb/azure tables/cosmosdb be pluggable options here. There shouldn't need to be something as heavy as a full postgres instance just so marten can map to tenants.

> Dynamic creation of databases
>
> Marten already has some functionality for spinning up databases with configuration (likely moving to Weasel very soon). In development mode, we could spin these up on the fly based on expected or even new tenants.

Assuming that databases are not being spun up on the fly by marten, having a way to register a new tenant with marten without needing to restart the application would be invaluable

BradleyBarnett · 2022-02-24T00:38:56Z

> There shouldn't need to be something as heavy as a full postgres instance needed so marten can map to tenants.

Well, to use marten you likely already have "at least" one postgres instance to stick a central small DB on (it's going to be pretty small, like < 10M likely even for large scale systems) ... So it's not a horrible idea; why would we bring Dynamo into this architecture if it's already postgres based?

Having said that, providing an interface to override this default master config might be valuable to some.

jeremydmiller · 2022-02-24T01:05:47Z

"Having said that, providing an interface to override this default master config might be valuable to some." -- which is exactly what's already in place. And what you're talking about is maybe an extra schema & one table that isn't accessed very much riding on one of the databases that you're using. From my perspective, it makes no sense to use some other kind of storage.

elexisvenator · 2022-02-24T01:13:00Z

Happy to pull this into a separate issue as it's a bit of a rabbit hole.

To take a current real-world example I manage (that doesn't use marten): using AWS infrastructure, we have around 300 tenants in a single region. We put multiple databases on the same RDS servers, which is by far the most cost effective option. One challenge is we have to put a cap on how many databases per server due to connections draining server memory (50 per server seems ok). On top of that, some tenants are "hotter" than others with much more activity, which means either moving them around to load balance or scaling each RDS instance based on demand. Moving individual databases across servers is not fun to do in RDS.

There is a balance between cost and trying to ensure the performance/stability of one tenant doesn't affect others.
Having an additional very small database on one of the servers - or making a tenant database pull double duty as a config db - suddenly makes one server much higher risk than all others. Conversely, putting the same config in dynamodb would be much cheaper, much more reliable to access, not use any of your precious connections from your pool, and is not affected by things such as automated db maintenance.

PhilipRieck · 2022-02-28T14:28:54Z

> Happy to pull this into a separate issue as it's a bit of a rabbit hole.

It is indeed a rabbit hole!

> Conversely putting the same config in dynamodb would be much cheaper, much more reliable to access, not use any of your precious connections from your pool, and is not affected by things such as automated db maintenance.

We're not on AWS, so dynamodb wouldn't be our choice, but you have it right - Anything we add that holds state adds risk and management effort. As we are fully kubernetes, our current tenancy storage is a custom resource (CRD) we apply to the etcd database and have a controller managing.

> "Having said that, providing an interface to override this default master config might be valuable to some." -- which is exactly what's already in place.

This is perfect. Personally, I'd make the "master config" for tenancy have to be an affirmative choice (as in, you must select the implementation Marten will use to get tenants, translate tenant->connection, etc), rather than having one be the 'blessed' default. But @jeremydmiller, I know you will need a default implementation to reduce friction for many users and may want to bless one - as long as overriding is clear and performant I'm happy.

One note - I would guess most people will quickly outstrip any default you provide. Perhaps a default provider based on IConfiguration would give you best bang for your buck?

I know my opinion is very much colored by my single use case, so thanks for taking it into consideration on this. Also, thanks so much for the progress on this! (And on MartenDB in general, in case you don't hear it enough).

jeremydmiller · 2022-02-28T14:29:27Z

jeremydmiller · 2022-02-28T14:45:08Z

@PhilipRieck @elexisvenator To all the points:

- ITenancy will be 100% pluggable, knock yourself out with whatever creative ideas you come up with -- but there's an outstanding issue of how that will work with the async daemon if client databases can be spun up on the fly. And the answer for right now is to kick that can down the road for just the moment
- For 5.0, the only two out of the box options are a model where we assume that the tenant id is the database name within a single Postgresql server instance, and a "static" model where users at configuration time can tell Marten which tenants are stored in which connection string. In both cases, Marten can still perform all the normal database migration functionality, but only in the former model will Marten try to create databases on the fly. The "static" model is not directly tied to IConfiguration in any way
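
The two out of the box options described above can be sketched as two implementations of one pluggable lookup. This is an illustration under stated assumptions, not the actual ITenancy interface: `Tenancy`, `DatabaseNameTenancy`, `StaticTenancy` and `connection_string_for` are hypothetical names, and the connection string format is simplified.

```python
from typing import Dict, List, Protocol


class Tenancy(Protocol):
    """Hypothetical pluggable lookup from tenant id to connection string."""

    def connection_string_for(self, tenant_id: str) -> str: ...


class DatabaseNameTenancy:
    """Assumes the tenant id is the database name on a single server."""

    def __init__(self, server_connection_string: str) -> None:
        self._server = server_connection_string.rstrip(";")

    def connection_string_for(self, tenant_id: str) -> str:
        # Reject characters that would alter the connection string.
        if not tenant_id or ";" in tenant_id or "=" in tenant_id:
            raise ValueError(f"invalid tenant id: {tenant_id!r}")
        return f"{self._server};Database={tenant_id}"


class StaticTenancy:
    """Tenants are assigned to connection strings at configuration time."""

    def __init__(self, tenants_by_connection: Dict[str, List[str]]) -> None:
        self._connection_of: Dict[str, str] = {}
        for connection_string, tenant_ids in tenants_by_connection.items():
            for tenant_id in tenant_ids:
                if tenant_id in self._connection_of:
                    raise ValueError(f"tenant {tenant_id!r} is mapped twice")
                self._connection_of[tenant_id] = connection_string

    def connection_string_for(self, tenant_id: str) -> str:
        if tenant_id not in self._connection_of:
            raise KeyError(f"unknown tenant: {tenant_id!r}")
        return self._connection_of[tenant_id]
```

In this sketch only the first strategy can derive a connection string for a tenant it has never seen, which mirrors the point that only that model could create databases on the fly; the static strategy knows its full tenant list up front.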


jeremydmiller added this to the 2.0 milestone May 6, 2017

jeremydmiller mentioned this issue May 6, 2017

Support for multi tenancy at documentStore level #435

Closed

jeremydmiller added the multi-tenancy label May 6, 2017

jeremydmiller added the breaking change label May 20, 2017

jeremydmiller modified the milestones: 2.1, 2.0 Jun 8, 2017

jeremydmiller modified the milestone: 2.1 Aug 11, 2017

jeremydmiller added the enhancement label Oct 7, 2017

jeremydmiller added this to the 5.0.0 milestone Jan 10, 2022

jeremydmiller mentioned this issue Feb 27, 2022

Multi Tenancy through Database per Client #2096

Merged

jeremydmiller pushed a commit that referenced this issue Mar 2, 2022

able to use multiple databases within the command line support. Closes …

186bf30

…GH-749

jeremydmiller closed this as completed in db9a783 Mar 2, 2022
