A tutorial for teaching basic messaging patterns
Clean exercises can be found here: Exercices branch
My solutions can be found here : Solutions branch
COM, SOAP, ESB, REST, microservices (throwable SOA (original) - Fred George, actual SOA - James Lewis), ...
Enterprise integration pattern
https://github.com/iancooper/Practical-Messaging-Sharp
An n tier system is distributed but not integrated
A distributed system tends to use synchronous communication because the parts are not independent
An integrated system can often use asynchronous communication because the applications are independent
Two processes communicate via the producer writing to a file, and the consumer reading from it
A common data transfer mechanism that can be used by a variety of languages and platforms and feels neutral towards each
Requires agreement on file names, locations, who manages the files...
Will create eventual consistency between systems due to periodic nature of publication
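A minimal sketch of file transfer, assuming both sides have agreed on a directory, a file name and JSON as the neutral format. The names `publish` and `consume` are hypothetical, not taken from the tutorial repository. Writing to a temporary file and renaming it is one common way to stop the consumer from reading a half-written export.

```python
import json
import os
import tempfile


def publish(directory: str, name: str, records: list) -> str:
    """Producer: write the export to a temp file, then rename it into place."""
    final_path = os.path.join(directory, name)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    # os.replace is atomic on the same filesystem, so the consumer never
    # observes a partially written file under the agreed name.
    os.replace(tmp_path, final_path)
    return final_path


def consume(directory: str, name: str) -> list:
    """Consumer: read the agreed file if the producer has published it."""
    path = os.path.join(directory, name)
    if not os.path.exists(path):
        return []  # nothing published yet: the consumer simply sees stale data
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

The consumer only ever sees the last published snapshot, which is where the eventual consistency comes from.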
Two processes communicate by the producer writing to a database and the consumer reading from it
Creating a unified schema that can meet the needs of all applications is a challenge
Breaks encapsulation and causes change to ripple across all applications
Often the Db supporting many enterprise-wide apps becomes the bottleneck
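A minimal sketch of a shared database, using an in-memory SQLite database as a stand-in for the enterprise one. The `orders` table and the function names are assumptions made for illustration. Note that both applications depend directly on the same column names, which is the encapsulation problem described above.

```python
import sqlite3


def create_shared_schema(conn: sqlite3.Connection) -> None:
    """The unified schema every application has to agree on."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
    )


def record_order(conn: sqlite3.Connection, customer: str, total: float) -> int:
    """Producer application: writes a row and returns its id."""
    cursor = conn.execute(
        "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total)
    )
    conn.commit()
    return cursor.lastrowid


def orders_for(conn: sqlite3.Connection, customer: str) -> list:
    """Consumer application: reads the same table directly."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE customer = ? ORDER BY id", (customer,)
    )
    return rows.fetchall()
```

Renaming `total` would break both functions at once, even though they belong to different applications.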
Two processes communicate by the client causing a procedure to execute in another address space belonging to the server, coded as if it were a local function
Integrates functionality, not data
Behavioral coupling can tie the systems together in a knot, particularly in sequencing ...
Two processes communicate by the producer sending a packet of data to a channel and the consumer reading that packet of data from the channel
Async communication does not require both systems to be up and ready at the same time
Messages can be transformed in transit without sender or receiver knowing
Sending small messages frequently allows behavioral as well as data collaboration
James White 1976 (RFC 707) - "A High-Level Framework for Network-Based Resource Sharing" describes Remote Procedure Calls (RPC).
White identified that many of the protocols in use for inter-process communication (IPC), such as File Transfer Protocol (FTP) or Remote Job Execution (RJE), had a common command-response pattern.
command-name parameter
response-number text
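A minimal sketch of the command-response shape above, assuming a single space separates the two fields on each line. The helper names are hypothetical, and the FTP-style values in the usage note are only examples of the shape.

```python
def format_command(name: str, parameter: str = "") -> str:
    """Build a 'command-name parameter' line."""
    return f"{name} {parameter}".strip()


def parse_response(line: str) -> tuple:
    """Split a 'response-number text' line into (number, text)."""
    number, _, text = line.partition(" ")
    return int(number), text
```

For example, `format_command("RETR", "notes.txt")` builds a request line and `parse_response("226 Transfer complete")` splits the numeric code from the human-readable text.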
The key principle is that a remote procedure call should be as close as possible to a local (single machine) procedure call
A client should not know the location of the server (location transparency)
RPC frameworks use these modules to generate stubs for the caller and the receiver
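A minimal sketch of what the two stubs do, written by hand rather than generated, and with an in-process function standing in for the network. `Calculator`, `ClientStub`, `server_skeleton` and the JSON wire format are all assumptions made for illustration.

```python
import json


class Calculator:
    """The real implementation, living on the 'server' side."""

    def add(self, a, b):
        return a + b


def server_skeleton(service, request_bytes: bytes) -> bytes:
    """Server stub: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes)
    method = getattr(service, request["method"])
    result = method(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Client stub: looks like a local object but forwards every call."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)})
            reply = self._transport(request.encode("utf-8"))  # blocks until the reply arrives
            return json.loads(reply)["result"]

        return remote_call


def make_in_process_transport(service):
    """Stand-in for a network connection to the server."""
    return lambda request_bytes: server_skeleton(service, request_bytes)
```

The caller writes `calc.add(2, 3)` as if `calc` were local, which is exactly the transparency RPC aims for and also why the blocking behavior is easy to forget.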
If the call looks like a local procedure call, how do we know where the server is that will service the request?
You need to register services via a distributed key-value store.
The server runtime exports its interface names, along with server name and network location. i.e. service registration
A client requests from the runtime servers that provide the interface and then routes the request (in this case to the closest) i.e. service discovery
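A minimal sketch of registration and discovery, with a plain dictionary standing in for the distributed key-value store. The class name, the `distance` metric and the addresses are hypothetical.

```python
class ServiceRegistry:
    """Maps an interface name to the servers that export it."""

    def __init__(self):
        self._providers = {}

    def export(self, interface: str, server_name: str, address: str, distance: int) -> None:
        """Service registration: a server announces an interface it provides."""
        self._providers.setdefault(interface, []).append((distance, server_name, address))

    def lookup(self, interface: str) -> tuple:
        """Service discovery: return (server_name, address) of the closest provider."""
        providers = self._providers.get(interface)
        if not providers:
            raise LookupError(f"no server exports {interface!r}")
        _, server_name, address = min(providers)  # smallest distance wins
        return server_name, address
```

The client stub would call `lookup` before each request, so the calling code never hard-codes a server location.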
/!\ The call is blocking: what happens if the work fails? Does the caller keep waiting?
High-speed, asynchronous, interprocess communication with reliable delivery
Processes communicate by sending packets of data called messages
Channels are logical pathways between processes
A message is a data structure
Messaging depends on Message Oriented Middleware (MOM)
- Routes messages between applications
- Co-ordinates sending and receiving of messages
- Sender and receiver have the same availability
- Asynchronous: Send and Forget
- Store and Forward
Without it we are reinventing the wheel in ad-hoc network programming
- Location independence
- Platform coupling
- Temporal coupling
- Behavioral coupling
- Synchronous communication
- Data format
- Connection-oriented
For David Parnas, the criterion for decomposition is information hiding, not application flow
Good:
- Platform independent data format
- Sent to an addressable channel
- Support data transformations
- Remote communication
- Platform/Language integration
- Asynchronous Communication
- Throttling
- Reliable communication
Bad:
- Complex programming model (compensation...)
- Sequencing
- Eventual consistency
- Many moving parts
- Performance
- Lock-In
Difference Command/Event
Command => Knowing of a system executing => expectation
Event => no expectation
Loose coupling
RPC => much higher coupling
Do not send data to a specific machine (IP), but to an addressable channel
The channel should queue requests to remove temporal coupling
A producer puts a message onto a queue. A consumer takes the message and consumes it
A common distributed system (as opposed to integrated system) pattern
Use decoupled invocation to put the work on a queue, offloading the long-running tasks and allowing the web server to respond in time
If the work takes too long, return 202 (Accepted) and provide a location (URI) to monitor completion
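A minimal sketch of decoupled invocation, with plain functions standing in for web handlers and an in-memory queue standing in for the broker. The `/jobs/...` URI layout, the status dictionary and the function names are assumptions made for illustration.

```python
import queue
import uuid

work_queue = queue.Queue()  # stand-in for a durable message queue
job_status = {}             # stand-in for a status store the web server can read


def handle_request(payload: dict) -> tuple:
    """Web handler: enqueue the long-running work and answer immediately."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = {"state": "pending"}
    work_queue.put((job_id, payload))
    return 202, {"Location": f"/jobs/{job_id}"}


def run_worker_once(task) -> None:
    """Background worker: take one job off the queue and record its result."""
    job_id, payload = work_queue.get()
    job_status[job_id] = {"state": "done", "result": task(payload)}
    work_queue.task_done()


def handle_status(location: str) -> tuple:
    """Web handler for the monitoring URI returned with the 202."""
    job_id = location.rsplit("/", 1)[-1]
    if job_id not in job_status:
        return 404, {}
    return 200, job_status[job_id]
```

The client polls the returned location until the state changes from `pending` to `done`.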
Analogy:
RPC => phone call
Messaging => mail or email
- A message has a header and a body
- Message intent: Command, Document (contains data, not just an event), Event (just a tiny notification)
- Request-Reply: needs return channel and correlation identifier
- Break a large message into pieces as a message sequence
- Slow messages: one way to deal with eventual consistency is to create a message expiry
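A minimal sketch of a message envelope covering the points above: header fields around a body, an intent, a return channel with a correlation identifier, and an expiry. The field names are hypothetical and do not come from any particular broker or framework.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    body: dict                             # the payload
    message_type: str                      # intent: "command", "document" or "event"
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    correlation_id: Optional[str] = None   # id of the request this message answers
    reply_to: Optional[str] = None         # name of the return channel
    expires_at: Optional[float] = None     # absolute time in seconds, None = never

    def is_expired(self, now: Optional[float] = None) -> bool:
        """A consumer can drop the message instead of acting on stale data."""
        if self.expires_at is None:
            return False
        current = time.time() if now is None else now
        return current >= self.expires_at


def make_reply(request: Message, body: dict) -> Message:
    """Build a reply the requestor can match to its request."""
    return Message(body=body, message_type="document", correlation_id=request.message_id)
```

The replier sends the result of `make_reply` to the channel named in `request.reply_to`, and the requestor matches it through `correlation_id`.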
Use a command message to reliably invoke a procedure in another application => expectation
Uses the well-established pattern for encapsulating a request as an object
Use a Document message to reliably transfer a data structure between applications
No Business Data
Where does a consumer of an event message get data it needs to process the message ?
- Push model: the message is a combined document/event message
- Pull model: a sequence of three messages
  - update
  - state request
  - state reply
- Reference data: apps exchange out-of-band messages about operand/collection data. This may be a document message, a file transfer, etc. (like a cache listening to messages to stay up to date)
RPC : Command message-Document Message
Query : Command message- document message/sequence message
Contains a message id and a sequence part id
Message sequencing is not compatible with competing consumers => only one consumer ! (if not partitioned)
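A minimal sketch of a message sequence, assuming each part carries a sequence id, its position and the total size so that a single consumer can reassemble the payload. The dictionary keys and function names are hypothetical.

```python
def split_into_sequence(sequence_id: str, payload: str, part_size: int) -> list:
    """Break a large payload into parts that each identify their place."""
    chunks = [payload[i:i + part_size] for i in range(0, len(payload), part_size)] or [""]
    return [
        {"sequence_id": sequence_id, "position": index, "size": len(chunks), "body": chunk}
        for index, chunk in enumerate(chunks)
    ]


def reassemble(parts: list) -> str:
    """Rebuild the payload, whatever order the parts arrived in."""
    if not parts:
        raise ValueError("no parts received")
    size = parts[0]["size"]
    by_position = {part["position"]: part["body"] for part in parts}
    if len(by_position) != size:
        raise ValueError("sequence is incomplete")
    return "".join(by_position[index] for index in range(size))
```

If the parts were spread across competing consumers, no single consumer would hold the whole set, which is why the sequence needs one consumer or a partition per sequence id.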
Expiration
Format indicator : version ...
A virtual pipe that connects producer and consumer
Messaging is not a 'bucket'. A consumer can filter according to the type of information it wants.
Logical address
Unidirectional
One-to-one or one-to-many
- Point to point
- Datatype channel
- Publish/Subscribe
- Invalid message channel
- Dead letter channel
- Quality of service channel
- Message Endpoint
- Messaging Gateway
- Messaging Mapper
- Polling Consumer
Consume every X units of time
Connection can be idle
Clients are using more CPU
Server is using more CPU
- Event Driven Consumer
Keep connections open
Less CPU on client side
More CPU on the server side
Ability to use back pressure and throttling
- Service Activator
Requestor / Replier (like MassTransit)
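A minimal sketch contrasting the polling consumer and the event-driven consumer from the list above, using in-process stand-ins rather than a real broker client. `poll_once` and `EventDrivenChannel` are hypothetical names.

```python
import queue


def poll_once(channel: queue.Queue, handler, max_messages: int = 10) -> int:
    """Polling consumer: the client asks for work and caps how much it takes."""
    handled = 0
    while handled < max_messages:
        try:
            message = channel.get_nowait()
        except queue.Empty:
            break  # nothing waiting, try again at the next interval
        handler(message)
        handled += 1
    return handled


class EventDrivenChannel:
    """Event-driven consumer: the channel pushes each message to subscribers."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler) -> None:
        self._handlers.append(handler)

    def send(self, message) -> None:
        for handler in self._handlers:
            handler(message)
```

In the polling version the client decides when and how much to consume, a scheduler would call `poll_once` every X units of time. In the event-driven version the channel drives the consumer as soon as a message arrives.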
We cannot process fast enough and we have high latency
Eventual consistency at high latency looks like network partition
A single consumer is bad - if it fails, that failure cascades to callers
Ordering is an issue as consumers run at different rates. Two main solutions:
- Sorting into order, using the queue, not processing out of order. This can be less performant than having no competing consumers.
- Ensure messages can run when out-of-order i.e. idempotency
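A minimal sketch of competing consumers, with threads standing in for separate consumer processes and an in-memory queue standing in for the broker. The function name and the default of three consumers are assumptions.

```python
import queue
import threading


def run_competing_consumers(messages, handler, consumer_count: int = 3) -> list:
    """Several consumers read from one queue; each message goes to exactly one."""
    channel = queue.Queue()
    for message in messages:
        channel.put(message)

    results = []
    lock = threading.Lock()

    def consume():
        while True:
            try:
                message = channel.get_nowait()
            except queue.Empty:
                return  # queue drained, this consumer stops
            outcome = handler(message)
            with lock:
                results.append(outcome)  # completion order is not guaranteed

    workers = [threading.Thread(target=consume) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Every message is handled once, but `results` can come back in any order, which is the ordering issue described above.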
If the time taken for a consumer to process a given task is long, there are negative consequences.
The queue backs up, leading to high latency, which appears as a partition
It is difficult to understand progress.
Scaling becomes ‘all or nothing.’
The Parallel Pipelines Pattern uses a Pipes and Filters approach to Decoupled Invocation.
- The task is broken into sub-tasks or filter steps. Usually by the first consumer.
- Communication between filter steps is by a message queue, each filter step runs as an independent process.
- A filter step may use the competing consumers pattern – so resources can be tailored to the needs of each step.
- Monitoring the queue length indicates the performance of relative steps, helping to identify bottlenecks.
- Note that overall latency is increased, you have multiple queues, but you benefit in throughput.
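A minimal sketch of the pipes and filters approach described above, with one thread per filter step standing in for an independent process and in-memory queues standing in for the message queues between steps. `STOP`, `filter_step` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel that tells each step to shut down


def filter_step(inbox: queue.Queue, outbox: queue.Queue, transform) -> threading.Thread:
    """Run one filter: read from its inbox, transform, write to the next queue."""

    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # pass the shutdown signal downstream
                return
            outbox.put(transform(item))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(items, transforms) -> list:
    """Chain the filters with one queue between each pair of steps."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        filter_step(queues[index], queues[index + 1], transform)
        for index, transform in enumerate(transforms)
    ]
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Calling `qsize()` on each intermediate queue while the pipeline runs would give the queue-length signal mentioned above for spotting the slow step.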
The router decides which routee should receive the message according to its content
Routees inform the router about what they should receive through a control channel
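A minimal sketch of a router whose rules are supplied by the routees, with Python lists standing in for channels. The `control` method plays the role of the control channel. All names are hypothetical.

```python
class DynamicRouter:
    """Routes each message to the first routee whose rule matches its content."""

    def __init__(self):
        self._rules = []       # (predicate, channel) pairs in registration order
        self.unroutable = []   # messages that no routee asked for

    def control(self, predicate, channel: list) -> None:
        """Control channel: a routee announces what it wants to receive."""
        self._rules.append((predicate, channel))

    def route(self, message: dict) -> bool:
        for predicate, channel in self._rules:
            if predicate(message):
                channel.append(message)
                return True
        self.unroutable.append(message)
        return False
```

Keeping the unmatched messages aside, rather than dropping them, leaves room to forward them to an invalid message channel.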
Split a message into many and send each of them to a routee
Avoid big payload, store big content externally and pass reference if necessary (blob storage...)
Opposite of a splitter => aggregate messages
Needs an internal buffer
Reorder correlated messages
Kinda like a pub sub but with multiple channels (like topics)
Route the message according to its routing slip
Each message knows its steps (sequence of routes)
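A minimal sketch of a routing slip, assuming the slip is a list of step names carried in the message and that each step is a plain function. The `routing_slip` and `body` keys are hypothetical.

```python
def process_with_routing_slip(message: dict, steps: dict) -> dict:
    """Run the message through each step named on its slip, in order."""
    message = dict(message)                  # do not mutate the caller's copy
    slip = list(message["routing_slip"])
    while slip:
        step_name = slip.pop(0)              # next stop on the slip
        message["body"] = steps[step_name](message["body"])
        message["routing_slip"] = list(slip)  # remaining steps travel with the message
    return message
```

There is no central router here: the sequence of routes lives in the message itself, so two messages of the same type can follow different paths.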
Use a central processing unit
Define a workflow on a central point
Low testability
Can use heterogeneous tech to implement the control bus (monitoring, for example)
- Write-off (continue)
- Retry
- Compensating
- Transaction coordinator (Prepare/Commit/Feedback)
Master/Mirror
Only communicate with the master (producer or consumer)
Mirrors are only duplicates
Consistency / Availability / Partition tolerance
Only two of them ! => Forfeit one !
Durable
Mirrored queues
Durable, non-mirrored queues do not make sense
Pause minority is C and P
Non-durable, non-mirrored queues are A and P
A sends message to B
No Ack => 2 possibilities
- message not received by B
- message received by B but ack not received by A
Store outbox messages in A
Store inbox in B
2 strategies :
- Inbox / Outbox
- Change data capture
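A minimal sketch of the inbox/outbox strategy for the lost-ack case above, with in-memory dictionaries standing in for the tables that A and B would persist. The class and function names are hypothetical.

```python
class Outbox:
    """Lives in A: keeps every message until B has acknowledged it."""

    def __init__(self):
        self.pending = {}

    def store(self, message_id: str, body) -> None:
        self.pending[message_id] = body

    def mark_acknowledged(self, message_id: str) -> None:
        self.pending.pop(message_id, None)


class Inbox:
    """Lives in B: remembers handled ids so a resend is not processed twice."""

    def __init__(self, handler):
        self._handler = handler
        self.seen = set()

    def receive(self, message_id: str, body) -> bool:
        if message_id not in self.seen:
            self._handler(body)
            self.seen.add(message_id)
        return True  # acknowledge, also for duplicates


def flush(outbox: Outbox, deliver) -> None:
    """Resend everything still pending; deliver returns True when an ack came back."""
    for message_id, body in list(outbox.pending.items()):
        if deliver(message_id, body):
            outbox.mark_acknowledged(message_id)
```

A cannot tell the two no-ack cases apart, so it always resends. The inbox is what makes the resend harmless when B had in fact received the message.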
Breaking changes =>
- Send both versions
- Use message translator
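A minimal sketch of a message translator for a breaking change, assuming a hypothetical version 1 message with a single `name` field and a version 2 message that splits it in two. Neither schema comes from the tutorial.

```python
def translate_v1_to_v2(message: dict) -> dict:
    """Upgrade an old-format message so consumers only handle version 2."""
    if message.get("version") == 2:
        return message  # already in the new format
    first_name, _, last_name = message["name"].partition(" ")
    return {"version": 2, "first_name": first_name, "last_name": last_name}
```

Placed between the channel and the consumer, the translator lets old producers keep sending version 1 while the consumer code only knows version 2.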
- MassTransit
- NServiceBus
- Brighter
- Celery
- Redis Queue
- Brighter
- Redis
- Sql Server
- Kafka
- RabbitMQ
- Amazon SNS / SQS
- Azure Service Bus
Book: Enterprise Integration Patterns
Book: Manning: Microservices Patterns
Book: Manning: SOA Patterns