Multi-Tenancy through Database per Client #749

Closed
jeremydmiller opened this issue May 6, 2017 · 26 comments

Comments

@jeremydmiller
Member

I wanted to split up #435.

The idea here would be to keep each tenant in a completely separate database. We've already made some significant changes for 2.0 to make this a lot more efficient inside of Marten's internals. Here's what I think still needs to happen:

  • A tenanting strategy that can look up the database connection string per tenant
  • At development time (AutoCreate != AutoCreate.None), be able to spin up a new database on the fly for a tenant
  • "Know" what all the existing tenants are
  • Be able to apply all document schema operations on all tenants
  • Might be an extension to the command line project to apply changes to all tenant databases at a time
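
As a rough illustration of the first, third, and fourth bullets, here is a minimal language-neutral sketch in Python. It is not Marten's API: `TenantRegistry`, its method names, and the connection strings are all hypothetical, and the "schema operation" is just a callback.

```python
from typing import Callable, Dict, List


class TenantRegistry:
    """Hypothetical registry mapping tenant ids to connection strings."""

    def __init__(self, connection_strings: Dict[str, str]) -> None:
        self._connection_strings = dict(connection_strings)

    def connection_string_for(self, tenant_id: str) -> str:
        # Look up the connection string for a single tenant.
        try:
            return self._connection_strings[tenant_id]
        except KeyError:
            raise KeyError(f"Unknown tenant: {tenant_id}") from None

    def known_tenants(self) -> List[str]:
        # "Know" what all the existing tenants are.
        return sorted(self._connection_strings)

    def apply_to_all(self, operation: Callable[[str, str], None]) -> None:
        # Run one operation (e.g. a schema migration) against every tenant.
        # Each database is handled separately, so this is not transactional
        # across tenants.
        for tenant_id in self.known_tenants():
            operation(tenant_id, self._connection_strings[tenant_id])


# Made-up tenants and connection strings, for illustration only.
registry = TenantRegistry({
    "acme": "Host=localhost;Database=acme",
    "globex": "Host=localhost;Database=globex",
})

migrated: List[str] = []
registry.apply_to_all(lambda tenant_id, conn: migrated.append(tenant_id))
```

A command line extension could, in principle, be a thin wrapper that calls something like `apply_to_all` with the real migration step.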
@jeremydmiller
Member Author

I'm dropping this off of the 2.0 release. When we do this, we'll need to change quite a few things in QueryHandlerFactory and the async daemon to not depend on the store's default tenant.

@jeremydmiller jeremydmiller modified the milestones: 2.1, 2.0 Jun 8, 2017
@jeremydmiller jeremydmiller modified the milestone: 2.1 Aug 11, 2017
@tonykaralis
Contributor

Is there currently a way to achieve schema per client? So all your points above but instead of separate db we have separate schema. Would the only way be to spin up separate document stores at runtime?

@jeremydmiller
Member Author

@tonykaralis If you look, there should be an open issue about multi-tenancy through schemas. That turned out to be extremely complicated to pull off, and was dropped out of the 2.0 release and never done.

You could do it with a DocumentStore per schema/tenant, yes.

@tonykaralis
Contributor

@jeremydmiller thanks, I spotted that issue shortly after. It's a shame, as schema per client would be a very powerful feature, but I realise it's a nightmare to achieve. I am thinking schema per client is going to be too costly to manage with separate document stores per client.

@jeremydmiller
Member Author

This is coming up much more often, maybe we play this in the next big release

@tonykaralis
Contributor

Definitely interested in this, phase 2 of our project involves exposing the same multi tenanted data but via an api. The caveat being every call to the api will need to be tenanted. All calls from the api to the db go via Marten to the same database and schema. So for now I just need to figure out how to spin up a custom tenanted session per http request as the http request will have the tenantid to use.

I realise my edge case is not multi tenancy by schema or client, but regardless it would be a huge feature for us.

@jokokko
Collaborator

jokokko commented Nov 24, 2021

Would be a very attractive feature (or at least I'd have strong use cases for it). The technical implementation will be interesting, though, and it won't be able to make the same guarantees as schema or column level tenancy. E.g. "Be able to apply all document schema operations on all tenants" cannot be transactionally done OOTB, with PG connections being per database.

@jaredwri

I have many services that resolve a connection string at runtime. That is to say that I do NOT have knowledge of the target database until the moment the service is asked to interact with it. For now, a custom ISessionFactory lazy loads a new document store and everything works. However, having one store, that allowed me to vary the connection would be ideal. We do use bulk operations off the store object at times so it became necessary to load and maintain multiple, basically duplicated (aside from the connection string), stores in memory.
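
The workaround described above (lazily building one store per connection string and keeping it in memory) can be sketched roughly as follows. This is a language-neutral Python illustration, not Marten code: `LazyStoreCache` and `fake_build_store` are hypothetical names, and the "store" is a plain dictionary standing in for an expensive-to-build document store.

```python
import threading
from typing import Callable, Dict, Generic, List, TypeVar

S = TypeVar("S")


class LazyStoreCache(Generic[S]):
    """Hypothetical cache: builds one store per connection string, on demand."""

    def __init__(self, build_store: Callable[[str], S]) -> None:
        self._build_store = build_store
        self._stores: Dict[str, S] = {}
        self._lock = threading.Lock()

    def store_for(self, connection_string: str) -> S:
        # The lock ensures an expensive store is built at most once per
        # connection string, even under concurrent requests.
        with self._lock:
            store = self._stores.get(connection_string)
            if store is None:
                store = self._build_store(connection_string)
                self._stores[connection_string] = store
            return store


built: List[str] = []


def fake_build_store(connection_string: str) -> dict:
    # Stand-in for constructing a real store; records each build.
    built.append(connection_string)
    return {"connection": connection_string}


cache = LazyStoreCache(fake_build_store)
first = cache.store_for("Host=db1;Database=tenant_a")
second = cache.store_for("Host=db1;Database=tenant_a")
```

The cost the comment points at is visible here: every distinct connection string keeps a full, otherwise identical store alive in `_stores`.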

@jokokko
Collaborator

jokokko commented Nov 24, 2021

@jaredwri I assume you don't have operations that would span multiple databases, i.e. requiring multiple stores at a specific call site? Just thinking your workaround sounds reasonable. If tenancy by DB was supported, it would only move code around a bit (so how you configure store vs. sessions).

What I think needs to be considered in supporting tenancy by DB is if/how it alters the outputs of Marten. So at the very least a matter of documentation - the same API call can yield very different exceptions or results based on initial configuration.

@jaredwri

@jokokko
You are correct. I do not have operations that span multiple databases. One service invocation/interaction only affects one database.

@PhilipRieck

PhilipRieck commented Dec 1, 2021

Glad to see the post on your blog - this proposal would be great to see - I'm currently stuck on version 3.x because of the overhead in multiple document stores in a containerized environment.

The situation I have is service(s) running in a multitenant environment. The service determines which tenant database at the 'last minute', just before performing a query or operation. Currently the workaround I'm using is to have multiple DocumentStores and look them up based on the tenant, but this is painful in Marten due to DocumentStore size and startup time. It's unusable in Marten v4 due to memory usage and generation times.

Some notes on my use case:

  • All tenant databases have identical schemas
  • I never have operations that span tenants or databases
  • The tenant databases vary only by connection string (host, username, etc)
  • I can supply the connection string, or a key. Ideally I could add a tenant 'dynamically', but if I have to restart my services to add a tenant I can work around that - would really rather not, though.
  • Each service is small, and without Marten can run in minimal memory containers hosted in kubernetes (under 20mb in many cases). Increasing this costs us money and running above the limit immediately terminates the service, so the memory usage must be at least predictable, ideally minimal.
  • On upgrade operations I can enumerate the connection strings/tenant keys. I have flexibility here - I can call the cli equivalent once per tenant if need be, or give a list of databases to upgrade. The existing CLI works fine for me as I already intercept schema operations to determine tenancy and then pass through.
  • I do not require Marten to 'know' what tenants exist - I have the list, I have the resolution. Giving the list to Marten and keeping them in sync just for 'all tenant operation' convenience methods doesn't really interest me, since it's just one more piece of data in two places.

Basically, I just want to use one DocumentStore and have the connection to the database be provided by me and held at the session level (And have memory usage be tiny). However, I'm open to any solution that fits my use case without me also needing to up my pod memory limit to hundreds of megabytes

@jeremydmiller
Member Author

@PhilipRieck Thank you for taking the time to write that up!

The "generate ahead" model is meant to deal with the memory and cold start issue. I think I'm going to stick something in Marten 5 to make that much easier to use before I stock up on a lot more bourbon and attempt to move to IL generation instead.

"I do not require Marten to 'know' what tenants exist - I have the list, I have the resolution. Giving the list to Marten and keeping them in sync just for 'all tenant operation' convenience methods doesn't really interest me, since it's just one more piece of data in two places." -- Think database migrations. In that case, Marten absolutely has to know what all the databases are in order to do the schema migrations. I want this built in somehow to Marten so that more people can use this. What you've apparently built for yourself, I'd like to have in the box for other folks.

"if I have to restart my services to add a tenant I can work around that - would really rather not, though." -- if we make the tenancy discovery model a little bit pluggable, you could have custom -- or some in the box -- options to automatically spin up a new database for a valid new tenant.

@VilleHakli
Contributor

Support for multitenancy with database per tenant is something that we are really interested in.

Our use case seems to be pretty close to the one described by @PhilipRieck

  • All databases will have same schema
  • No need to do operations into multiple databases in same session
  • No need to automatically create new databases by Marten. We have a separate tool to create and migrate databases which is also used in development.
  • For the migrations it would be really nice to have possibility to use DocumentStore.Schema per tenant/database. For example having method DocumentStore.Schema.ForTenant("tenant") which would return IDocumentSchema for given tenant.
  • We have connection strings stored in separate configuration database. For us having pluggable tenancy would be nice as we have the need to spin up new tenants while the app is running. As long as we can use custom tenant identifier => connection string resolution, we should be fine.

These are just some points from the top of my head. I think that our use case is quite simple (famous last words?) and being able to pass connection string when opening new session might be enough for us. At the moment we open database connections and pass those to Marten, so being able to pass the connection string would already simplify our use case.

@jeremydmiller
Member Author

Hey everyone, I started jotting down notes yesterday about implementing this. I think I want to say that right now we support these models:

  1. One database, no multi-tenancy. Duh.
  2. One database, some subset or all of the document types are "conjoined" in the same database & schema. Just like we do today
  3. Each tenant is in its own database
  4. A hybrid conjoined/separate database model where a single database could contain multiple tenants, and each tenant would belong to exactly one database.

Tenant per schema is still out of scope and really doesn't fit well w/ Marten internals anyway

I think everybody is going to be on board for 1-3, so let's talk more about 4. In the notes I took on potential design, it's not actually going to be any more complicated to assume the hybrid model is a possibility. I also think we have to have some knowledge of what tenants are valid for a given database to do runtime assertions when we add the new separated database model. Furthermore, 4 is something that would conceivably be valuable for my company where we have clients with individual locations/sub-organizations. I'm not suggesting we try for a full blown tree structure model of tenancy here, but setting the foundation might help.

At a minimum, I want to treat the concept of "Database" and "Tenant" as not necessarily locked together as we do this work regardless.
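
To make model 4 concrete, here is a minimal sketch in Python of a mapping where one database can hold several tenants while each tenant belongs to exactly one database, including the kind of runtime assertion mentioned above. This is only an illustration under assumed names (`DatabaseTenantMap`, `add_database`, and the database and tenant ids are all hypothetical), not Marten's internal design.

```python
from typing import Dict, Iterable, List


class DatabaseTenantMap:
    """Hypothetical mapping: many tenants per database, one database per tenant."""

    def __init__(self) -> None:
        self._database_for_tenant: Dict[str, str] = {}

    def add_database(self, database_id: str, tenant_ids: Iterable[str]) -> None:
        for tenant_id in tenant_ids:
            existing = self._database_for_tenant.get(tenant_id)
            # Enforce "each tenant belongs to exactly one database".
            if existing is not None and existing != database_id:
                raise ValueError(
                    f"Tenant '{tenant_id}' already belongs to '{existing}'"
                )
            self._database_for_tenant[tenant_id] = database_id

    def database_for(self, tenant_id: str) -> str:
        # Runtime assertion: reject tenants that no database is known to hold.
        if tenant_id not in self._database_for_tenant:
            raise KeyError(f"Unknown tenant: {tenant_id}")
        return self._database_for_tenant[tenant_id]

    def tenants_in(self, database_id: str) -> List[str]:
        return sorted(
            tenant
            for tenant, database in self._database_for_tenant.items()
            if database == database_id
        )


mapping = DatabaseTenantMap()
mapping.add_database("db_large_client", ["acme"])      # model 3: one tenant
mapping.add_database("db_shared", ["north", "south"])  # model 4: several tenants
```

Model 3 falls out as the special case where every database lists a single tenant, which is one way to read the point that the hybrid model adds little extra complexity.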

@PhilipRieck

@jeremydmiller Thanks for continuing to look at this.

I have opinions on 1 and 3 - both models would be used by me. 2 and 4 are not something I or my team would need at all. I don't see a lot of use for them, but others may disagree.

Just to be clear on usage as far as I'm concerned: In a theoretical world where you had a Marten library that allowed me to use a different connection string / selector per Session without rebuilding the DocumentStore, and then you built first-class tenancy notions in a separate library on top of that capability, I would only use the base library. Any way you can give me to separate the ideas of data shape and query building from the physical connection to the database, will be a win for me. If you solve other users tenancy needs at the same time, well that is a huge bonus. In fact, this is basically how I'm trying to work around it now, but it's much harder in the new versions.

Looking at this as a whole, I think you'll require at least some minor architectural changes. Once you have a design you like and find yourself just needing someone's time coding it up, please let me know how I can assist - glad to send PRs your way.

@jeremydmiller
Member Author

@PhilipRieck In what way is "it's much harder in the new versions"? You can still push a connection string into a session, and that's probably the most efficient way to do database per tenant with the existing Marten V4. And I definitely don't agree with having separate libraries for the multi-tenancy. At this point I think this is going to be a fair amount of work to enhance the database migrations capabilities in Marten for multiple databases, but hardly any code after that for the 3 & 4 models above. When we introduced 2. in Marten way back when, I thought that we'd also be doing 3., so the internal hooks are actually kinda set for database per client already.

@jeremydmiller
Member Author

@PhilipRieck And I'm shutting down until at least next week, but I'll get back to you on the PR help. Definitely take you up on that if you're game. Think it's gonna be way more about writing tests than actual code.

@PhilipRieck

@jeremydmiller

In what way is "it's much harder in the new versions"? You can still push a connection string into a session, and that's probably the most efficient way to do database per tenant with the existing Marten V4

It's quite possible (likely even) that the way I'm working around this currently is not the best way. I'll look more into that and create other threads or use gitter to track that down.

And I definitely don't agree with having separate libraries for the multi-tenancy.

I'm sorry - I wasn't suggesting this approach but trying to use it as an illustration. Re-reading it, I think it's more confusing than helpful so please disregard. If you can get the other approaches with minimal work that's great.

@jeremydmiller jeremydmiller added this to the 5.0.0 milestone Jan 10, 2022
@jeremydmiller
Member Author

jeremydmiller commented Jan 10, 2022

Jotting down some implementation notes on what is going to be variable:

Static database to tenant mapping

Where is this information stored? Thinking we support multiple options:

  1. Some sort of IConfiguration based handler
  2. Lookup from a master database. And if we do this, do we need to do migrations for just that database? Punt, and make that pluggable?

Dynamic creation of databases

Marten already has some functionality for spinning up databases with configuration (likely moving to Weasel very soon). In development mode, we could spin these up on the fly based on expected or even new tenants.

@elexisvenator
Contributor

elexisvenator commented Feb 23, 2022

Static database to tenant mapping

Where is this information stored? Thinking we support multiple options:

  1. Some sort of IConfiguration based handler
  2. Lookup from a master database. And if we do this, do we need to do migrations for just that database? Punt, and make that pluggable?

Would be great if the "master" database didn't have to be a database, e.g. making things like dynamodb/azure tables/cosmosdb be pluggable options here. There shouldn't need to be something as heavy as a full postgres instance just so marten can map to tenants.

Dynamic creation of databases

Marten already has some functionality for spinning up databases with configuration (likely moving to Weasel very soon). In development mode, we could spin these up on the fly based on expected or even new tenants.

Assuming that databases are not being spun up on the fly by marten, having a way to register a new tenant with marten without needing to restart the application would be invaluable

@BradleyBarnett

There shouldn't need to be something as heavy as a full postgres instance just so marten can map to tenants.

Well, to use marten you likely already have "at least" one postgres instance to stick a small central DB on (it's going to be pretty small, like < 10M likely even for large scale systems) ... So it's not a horrible idea. Why would we bring Dynamo into this architecture if it's already postgres based?

Having said that, providing an interface to override this default master config might be valuable to some.

@jeremydmiller
Member Author

"Having said that, providing an interface to override this default master config might be valuable to some." -- which is exactly what's already in place. And what you're talking about is maybe an extra schema & one table that isn't accessed very much riding on one of the databases that you're using. From my perspective, it makes no sense to use some other kind of storage.

@elexisvenator
Contributor

Happy to pull this into a separate issue as it's a bit of a rabbit hole.

To take a current real world example I manage (that doesn't use marten): using AWS infrastructure, we have around 300 tenants in a single region. We put multiple databases on the same RDS servers, which is by far the most cost effective option. One challenge is that we have to put a cap on how many databases go on each server due to connections draining server memory (50 per server seems ok). On top of that, some tenants are "hotter" than others with much more activity, which means either moving them around to load balance or scaling each RDS instance based on demand. Moving individual databases across servers is not fun to do in RDS.

There is a balance between cost and trying to ensure the performance/stability of one tenant doesn't affect others.
Having an additional very small database on one of the servers - or making a tenant database pull double duty as a config db - suddenly makes one server much higher risk than all the others. Conversely, putting the same config in dynamodb would be much cheaper, much more reliable to access, not use any of your precious connections from your pool, and is not affected by things such as automated db maintenance.

@PhilipRieck

Happy to pull this into a separate issue as it's a bit of a rabbit hole.

It is indeed a rabbit hole!

Conversely putting the same config in dynamodb would be much cheaper, much more reliable to access, not use any of your precious connections from your pool, and is not affected by things such as automated db maintenance.

We're not on AWS, so dynamodb wouldn't be our choice, but you have it right - Anything we add that holds state adds risk and management effort. As we are fully kubernetes, our current tenancy storage is a custom resource (CRD) we apply to the etcd database and have a controller managing.

"Having said that, providing an interface to override this default master config might be valuable to some." -- which is exactly what's already in place.

This is perfect. Personally, I'd make the "master config" for tenancy have to be an affirmative choice (as in, you must select the implementation Marten will use to get tenants, translate tenant->connection, etc), rather than having one be the 'blessed' default. But: @jeremydmiller , I know you will need a default implementation to reduce friction for many users and may want to bless one - as long as overriding is clear and performant I'm happy.

One note - I would guess most people will quickly outstrip any default you provide. Perhaps a default provider based on IConfiguration would give you best bang for your buck?

I know my opinion is very much colored by my single use case, so thanks for taking it into consideration on this. Also, thanks so much for the progress on this! (And on MartenDB in general, in case you don't hear it enough).

@jeremydmiller
Member Author

jeremydmiller commented Feb 28, 2022

Punchlist

  • PLV8 Transform() will need a model where you can specify the tenant. That or hang it off of IMartenDatabase
  • HotColdCoordinator should take in Tenant as a constructor argument
  • ProjectionDaemon should take in Tenant as a constructor argument
  • ShardAgent should take in a Tenant as a constructor argument
  • IEventSlicer methods should accept IMartenDatabase
  • Deal with AdvancedOperations
  • Mark IDocumentStore.Schema as [Obsolete].
  • DocumentStore.BuildProjectionDaemon() needs an overload for tenant id
  • Need an aggregate IDatabaseCleaner
  • ShardAgent cannot use ProjectionProgressFor() out of the box against a supplied tenant. Might be helpful to have a lightweight IMartenSession that can be created for a database. Lift a base class out of QuerySession? Or a new constructor on QuerySession
  • Need a way to access a single database from store for Clean operations -- Expose some version of ITenancy off of IDocumentStore
  • Move anything related to event sourcing in the AdvancedOperations to IMartenDatabase.
  • Review IMartenDatabase. Segregate internal vs public facing things

Development Tasks

  • Test the dotnet marten commands with multiple databases
  • Enable async daemon per database
  • Need an async version of FindTenant() that would be called within IDocumentStore.OpenSessionAsync()

@jeremydmiller
Member Author

@PhilipRieck @elexisvenator To all the points:

  • ITenancy will be 100% pluggable, knock yourself out with whatever creative ideas you come up with -- but there's an outstanding issue of how that will work with the async daemon if client databases can be spun up on the fly. And the answer for right now is to kick that can down the road for just the moment
  • For 5.0, the only two out of the box options are a model where we assume that the tenant id is the database name within a single PostgreSQL server instance, and a "static" model where users at configuration time can tell Marten which tenants are stored in which connection string. In both cases, Marten can still perform all the normal database migration functionality, but only in the former model will Marten try to create databases on the fly. The "static" model is not directly tied to IConfiguration in any way
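
The difference between the two out of the box options can be sketched as two interchangeable resolvers. This is a language-neutral Python illustration and does not show Marten's actual `ITenancy` interface: `SingleServerTenancy`, `StaticTenancy`, `resolve`, and the connection strings are hypothetical, and the connection string handling assumes simple `Key=Value;` pairs with trusted tenant ids.

```python
from typing import Dict


def with_database(base_connection_string: str, database: str) -> str:
    """Return the connection string with its Database entry set to `database`."""
    parts = [p for p in base_connection_string.split(";") if p.strip()]
    # Drop any existing Database=... entry, then append the new one.
    kept = [p for p in parts if p.split("=", 1)[0].strip().lower() != "database"]
    kept.append(f"Database={database}")
    return ";".join(kept)


class SingleServerTenancy:
    """Assumes the tenant id is the database name on one server."""

    def __init__(self, server_connection_string: str) -> None:
        self._server = server_connection_string

    def resolve(self, tenant_id: str) -> str:
        # Any tenant id maps to a database of the same name, which is why
        # this model can create databases on the fly.
        return with_database(self._server, tenant_id)


class StaticTenancy:
    """Tenants are assigned to connection strings at configuration time."""

    def __init__(self, connection_string_for_tenant: Dict[str, str]) -> None:
        self._map = dict(connection_string_for_tenant)

    def resolve(self, tenant_id: str) -> str:
        # Only tenants registered up front are valid.
        if tenant_id not in self._map:
            raise KeyError(f"Unknown tenant: {tenant_id}")
        return self._map[tenant_id]


single = SingleServerTenancy("Host=localhost;Port=5432;Database=postgres")
static = StaticTenancy({
    "acme": "Host=db1;Database=clients_a",
    "globex": "Host=db1;Database=clients_a",  # two tenants, one database
    "initech": "Host=db2;Database=clients_b",
})
```

Because both classes expose the same `resolve` method, calling code does not need to know which model is configured, which is roughly what a pluggable tenancy abstraction buys you.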
