Traffic Server Layer 7 Working Group
************************************

.. |upstreamer| replace:: strategizer

.. |SM| replace:: :code:`HttpSM`

My view on the Layer 7 Routing working group meetings in Denver, 25-27 Jul 2018.

Configuration
=============

The YAML schema will be updated to be more generic. The "primary" and "secondary" rings will be removed and
replaced with generic "rings", which will be defined in a specific section. The "strategy" section will be used
to create policy that uses the rings. In essence, the "ring" section will describe the elements of the CDN and the
"strategy" section will describe the CDN policy for a specific node.

To make this easier to deploy, some version of John's include support for YAML (which YAML does not provide
natively) will be used.

|SM| Interaction
==================

The L7R configuration files will provide a set of "strategies", each marked with a unique tag. There is a globally
configured default strategy tag, which can be overridden with a different strategy tag during remap.

When the |SM| decides it needs to send the request to an upstream target, it instantiates a "|upstreamer|" [#]_,
which is a run time stateful object based on a specific strategy. That strategy is selected by the strategy tag.
The |upstreamer| is responsible for providing transaction ready sessions to upstream targets. The selection of the
target and any handling of layer 3 or 4 connection errors is the responsibility of the |upstreamer|.

When the |SM| receives an open session event (:code:`SESSION_READY`) it will send the HTTP request to the upstream.
The |SM| will handle the result or response header for the transaction, then report it to the |upstreamer|. The
next action is then decided by the |upstreamer|, which informs the |SM| synchronously what that action is. The
|SM| performs its required tasks and, when it has finished, reports that to the |upstreamer|.
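The tag-based selection described above can be sketched as a lookup table keyed by strategy tag. This is a
minimal sketch under stated assumptions, not the Traffic Server implementation: :code:`Strategy`,
:code:`StrategyTable` and :code:`select` are hypothetical names, rings are referenced only by name, and the remap
override is modelled simply as an optional tag.

.. code-block:: cpp

   #include <map>
   #include <optional>
   #include <stdexcept>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical model of one "strategy" entry: a unique tag plus the names of
   // the rings (defined in their own section) that this policy uses.
   struct Strategy {
     std::string tag;
     std::vector<std::string> ring_names;
   };

   class StrategyTable {
   public:
     explicit StrategyTable(std::string default_tag) : default_tag_(std::move(default_tag)) {}

     void add(Strategy s) {
       std::string tag = s.tag;  // copy the key before moving the value
       strategies_[tag] = std::move(s);
     }

     // A tag set during remap overrides the globally configured default tag.
     const Strategy &select(const std::optional<std::string> &remap_tag) const {
       const std::string &tag = remap_tag ? *remap_tag : default_tag_;
       auto it = strategies_.find(tag);
       if (it == strategies_.end()) {
         throw std::out_of_range("unknown strategy tag: " + tag);
       }
       return it->second;
     }

   private:
     std::string default_tag_;
     std::map<std::string, Strategy> strategies_;
   };

Throwing on an unknown tag is only one choice; a configuration loader could instead reject an undefined tag when
the remap rule is loaded, so that the lookup during a transaction never fails.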
Note that all decisions about how to proceed from the upstream response are made by the |upstreamer|. This
includes not only HTTP responses but also network errors. Even though the session is open at the time the |SM|
receives it, that does not guarantee the absence of further network problems.

.. uml::
   :align: center
   :caption: Generic Transaction

   entity L7Policy
   actor Strategizer
   actor HttpSM
   participant Upstream
   hide footbox

   HttpSM -\ L7Policy : create "tag"
   L7Policy -/ HttpSM : return: Strategizer
   activate Strategizer
   Strategizer -> HttpSM : SESSION_READY
   note right : Contains session
   HttpSM -> Upstream : HTTP Request
   Upstream -> HttpSM : Upstream Response
   HttpSM -> HttpSM : Read response.
   HttpSM -\ Strategizer : result
   note right : Contains response
   Strategizer -/ HttpSM : return: ACTION
   HttpSM -> HttpSM : process response
   HttpSM -> Strategizer : finish

The interaction is split out in this manner to accommodate the variety of synchronous and asynchronous operations.
In particular, if there is an HTTP response from the upstream, the |SM| needs to be able to handle it
appropriately. However, the appropriate handling depends on the next action decided by the |upstreamer|. Because
that next action can depend on potentially long-lived asynchronous operations, waiting for those operations to
complete is not feasible. Instead, the |upstreamer| must respond synchronously to the response with an action code
that describes what the |upstreamer| will do when the |SM| has finished its operations. It is assumed that while
the next action may take a long time to perform, *determining* the next action should be a fast computation.

The current valid actions are:

``DONE``
   The result is a final result that should be sent on to the user agent.

``RETRY``
   The transaction was a failure of some sort but not a permanent one. The |upstreamer| will prepare another
   session after the |SM| has finished cleaning up the current transaction.
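The split between *deciding* an action in :code:`result` and *carrying it out* in :code:`finish` can be sketched
as a small synchronous model. This is a minimal sketch under stated assumptions: :code:`Strategizer`,
:code:`Result`, :code:`next_session` and :code:`run_transaction` are hypothetical names, :code:`next_session`
stands in for the asynchronous :code:`SESSION_READY` event, and the failover policy (retry on a network error or a
``404`` while targets remain) borrows the ``404`` probing example from this document.

.. code-block:: cpp

   #include <cstddef>
   #include <optional>
   #include <string>
   #include <utility>
   #include <vector>

   // Next action, returned synchronously to the HttpSM.
   enum class Action { DONE, RETRY };

   // What the HttpSM reports after reading the response header or failing.
   struct Result {
     bool network_error = false;
     int status = 0;  // HTTP status; meaningful only if !network_error
   };

   // Hypothetical strategizer that walks an ordered list of upstream targets.
   class Strategizer {
   public:
     explicit Strategizer(std::vector<std::string> targets) : targets_(std::move(targets)) {}

     // Stand-in for SESSION_READY: the target of the next session, if any.
     std::optional<std::string> next_session() const {
       if (idx_ >= targets_.size()) {
         return std::nullopt;
       }
       return targets_[idx_];
     }

     // Only decides the next action; cheap and free of I/O.
     Action result(const Result &r) {
       bool more_targets = idx_ + 1 < targets_.size();
       bool retryable = r.network_error || r.status == 404;
       pending_ = (more_targets && retryable) ? Action::RETRY : Action::DONE;
       return pending_;
     }

     // Called once the HttpSM has forwarded or drained the response; the
     // decided action is carried out only now.
     void finish() {
       if (pending_ == Action::RETRY) {
         ++idx_;
       }
     }

   private:
     std::vector<std::string> targets_;
     std::size_t idx_ = 0;
     Action pending_ = Action::DONE;
   };

   // Simulated HttpSM side; `send` stands in for sending the request to
   // `target` and reading the response header.
   template <typename Send>
   std::optional<std::string> run_transaction(Strategizer &s, Send send) {
     while (auto target = s.next_session()) {
       Action action = s.result(send(*target));
       // DONE: forward the response to the user agent. RETRY: drain the body.
       s.finish();
       if (action == Action::DONE) {
         return target;  // the upstream whose response reached the user agent
       }
     }
     return std::nullopt;  // no targets configured
   }

Because :code:`result` only records a decision, it returns immediately even when carrying out that decision
(connecting to the next target) could take a long time, which is the property the design relies on.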
The interaction is clearer with specific examples. First consider the case when the upstream request is
successful. It is unacceptable to wait for the strategy to complete before sending data to the user agent; that
flow needs to start as soon as possible. Because the |SM| can no longer know whether forwarding is correct without
consulting the |upstreamer|, that consultation must be a synchronous call.

.. uml::
   :align: center
   :caption: Successful Transaction

   entity L7Policy
   actor Strategizer
   actor HttpSM
   participant Upstream
   hide footbox

   HttpSM -\ L7Policy : create "tag"
   L7Policy -/ HttpSM : return: Strategizer
   activate Strategizer
   Strategizer -> HttpSM : SESSION_READY
   note right : Contains session
   HttpSM -> Upstream : HTTP Request
   Upstream -> HttpSM : ""200 OK""
   HttpSM -> HttpSM : Read response header.
   HttpSM -\ Strategizer : result
   note right : HTTP 200 Response
   Strategizer -/ HttpSM : return: DONE
   HttpSM -> HttpSM : Read upstream response,\nsend to user agent.
   ...
   HttpSM -> Strategizer : finish

Even in an HTTP failure there is still work for the |SM|. In this example, a ``404`` response status causes the
|upstreamer| to fail over to a different upstream. This behavior is used to probe multiple upstream pods for
specific content, with a ``404`` indicating that the next target should be tried rather than returning the
response to the user agent. The |SM| must drain any body from the response, but needs to know while draining that
the body will not be returned to the user agent.

.. uml::
   :align: center
   :caption: Fail / Retry Transaction

   entity L7Policy
   actor Strategizer
   actor HttpSM
   participant Upstream
   hide footbox

   HttpSM -\ L7Policy : create "tag"
   L7Policy -/ HttpSM : return: Strategizer
   activate Strategizer
   Strategizer -> HttpSM : SESSION_READY
   note right : Contains session
   HttpSM -> Upstream : HTTP Request
   Upstream -> HttpSM : ""404 NOT FOUND""
   HttpSM -> HttpSM : Read response header.
   HttpSM -\ Strategizer : result
   note right : HTTP 404 Response
   Strategizer -/ HttpSM : return: RETRY
   HttpSM -> HttpSM : Drain upstream response body
   ...
   HttpSM -> Strategizer : finish
   ...
   Strategizer -> HttpSM : SESSION_READY
   == As 200 OK case ==

Another view of the activity demonstrates the co-routine-like nature of the interaction. Both the |SM| and the
|upstreamer| perform blocking operations, during which the other side must wait for an event or signal indicating
that the operation has completed. The basic logic is:

.. uml::
   :align: center
   :caption: Upstream session logic

   partition HttpSM {
   (*) --> "Determine strategy tag"
   }
   partition Strategy {
   "Determine strategy tag" --> "Create strategizer instance"
   "Create strategizer instance" -u-> "Connect upstream"
   "Connect upstream" --> "Event: SESSION_READY"
   }
   partition HttpSM {
   "Event: SESSION_READY" -l-> if "send request" then
     -->[net error] ===RESULT===
   else
     -->[success] "Read Response Header"
     -> ===RESULT===
   endif
   }
   partition Strategy {
   --> "Compute next action"
   }
   partition HttpSM {
   if "ACTION" then
     -->[DONE] "Forward to User Agent"
   else
     -->[RETRY] "Drain session"
   endif
   }
   partition Strategy {
   "Drain session" -->[finish()] "Connect upstream"
   "Forward to User Agent" -down->[finish()] "Destruct"
   }
   partition HttpSM {
   "Destruct" --> (*)
   }

The state sequence in the |SM| is much simplified by this work. In particular, the nemesis of redirection or other
upstream connection failures that require rolling back the |SM| state will be avoided. Instead, there is a simpler
loop that spans a much smaller set of states.

.. uml::
   :align: center
   :caption: State Machine Upstream States

   [*] -> ServerOpen
   ServerOpen -> WaitForTransaction
   WaitForTransaction --> HandleServerTransaction : SESSION_READY
   HandleServerTransaction --> ReadResponseHeader
   ReadResponseHeader --> HandleResponseHeader : HTTP response
   ReadResponseHeader -> ReportServerResponse : network error
   ReportServerResponse -> DrainResponse : ACTION: RETRY
   DrainResponse --> WaitForTransaction
   HandleResponseHeader --> ReportServerResponse
   ReportServerResponse --> HandleServerResponse : ACTION: DONE

   state ServerOpen : Create Strategizer
   HandleServerTransaction : Send request
   ReadResponseHeader : Wait for upstream response
   HandleResponseHeader : Read and validate header
   ReportServerResponse : Report to strategizer\nGet next action
   HandleServerResponse : Send response to User Agent\nSignal strategizer
   DrainResponse : Drain response\nSignal strategizer

Next Steps
==========

In my view, the first key next step is to get :code:`Extendible` committed to master. After that, it was agreed
that the next deliverable would be a plugin or plugins that emulate the current manual host status marking using
:code:`Extendible`. This would enable:

* Validating the :code:`Extendible` API.
* Experimenting with mechanisms for external tools to supply :code:`Extendible` data. That is, how can external
  data be pushed into host and IP address records?
* Validating that :code:`Extendible` data can be used to manipulate upstream selection.

I think this will also be the keystone of moving forward with :code:`Extendible`, as doing things similar to
existing code is much easier than building things ex nihilo. It might even be reasonable, as a temporary
expedient, to have the core use :code:`Extendible` to create the data for these purposes and leave the plugin
(which will require much more in the way of infrastructure changes) until later.
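The third goal, driving upstream selection from externally pushed host data, can be illustrated without
:code:`Extendible` itself. In this minimal sketch the :code:`down` flag in the hypothetical :code:`HostRecord`
stands in for a field that a plugin would add through :code:`Extendible`; :code:`HostTable`, :code:`mark` and
:code:`select_upstream` are likewise hypothetical and do not reflect the real :code:`Extendible` API.

.. code-block:: cpp

   #include <map>
   #include <optional>
   #include <string>
   #include <vector>

   // Stand-in for a host record. In the proposal the `down` flag would be
   // added to the record by a plugin via Extendible, not declared here.
   struct HostRecord {
     std::string name;
     bool down = false;  // hypothetical extended field: manual host status
   };

   class HostTable {
   public:
     void add(const std::string &name) { hosts_[name] = HostRecord{name}; }

     // Entry point for an external tool pushing a status change.
     bool mark(const std::string &name, bool down) {
       auto it = hosts_.find(name);
       if (it == hosts_.end()) {
         return false;  // unknown host: nothing to update
       }
       it->second.down = down;
       return true;
     }

     const HostRecord *find(const std::string &name) const {
       auto it = hosts_.find(name);
       return it == hosts_.end() ? nullptr : &it->second;
     }

   private:
     std::map<std::string, HostRecord> hosts_;
   };

   // Upstream selection consults the extended data: the first known
   // candidate that is not marked down wins.
   std::optional<std::string> select_upstream(const HostTable &table,
                                              const std::vector<std::string> &candidates) {
     for (const auto &candidate : candidates) {
       const HostRecord *host = table.find(candidate);
       if (host != nullptr && !host->down) {
         return candidate;
       }
     }
     return std::nullopt;
   }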
There is a reasonable chance that the core will end up using :code:`Extendible` internally, because it makes the
overall code base much more modular.

Open Issues
===========

There are still a few open issues which aren't fully resolved.

Setting upstream address
   It is a requirement, particularly for transparent networking, to be able to explicitly set the IP address of
   the upstream, both in the core and using the plugin API call :code:`TSHttpTxnSetTargetAddr`.

Self Marking
   There must be a descriptor that says "the remapped upstream" so that no explicit upstream hosts need to be
   defined. This is needed for the default routing situation and for forward proxying.

Visions
=======

:code:`Extendible` was quite a hit and there was a push to use it in other situations, in particular with the
|SM|. I pushed back on that because I think updating the |SM| would be a major change, and therefore we should
bake :code:`Extendible` a bit more to make sure the API is clean and the code reliable.

With regard to the use of :code:`Extendible` in the |SM| (something Leif and Vijay were giddy about), I have
always thought this was a good idea in the long run, but changing the |SM| is always fraught. I prefer to get some
experience with :code:`Extendible` before making such a major change.

Beyond that, the purpose of this would be to replace the current transaction arguments with :code:`Extendible`.
An issue with this is remap plugins. These have two properties that make it more difficult to use
:code:`Extendible`:

* Different plugins per remap rule, using different sets of extended data.
* Reloading, which is another ongoing project.

For various reasons (including the nature of the proxy allocators) it is not feasible to have instances of a
class with different extendible schemas. As a result, it will probably be necessary to store a schema per remap
rule and dynamically allocate the storage block.
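One way to read "a schema per remap rule" is sketched below: each plugin on the rule reserves space when the rule
is loaded, and each transaction then makes a single allocation sized by that schema. :code:`RemapSchema` and
:code:`TxnStorage` are hypothetical names, this is not the :code:`Extendible` API, and the sketch only handles
trivially destructible types with at most fundamental alignment.

.. code-block:: cpp

   #include <cstddef>
   #include <new>
   #include <type_traits>
   #include <utility>
   #include <vector>

   // Hypothetical per-remap-rule schema: records the offset of each
   // plugin's reservation inside one contiguous block.
   class RemapSchema {
   public:
     std::size_t reserve(std::size_t bytes, std::size_t align) {
       std::size_t offset = (size_ + align - 1) / align * align;  // round up
       size_ = offset + bytes;
       return offset;
     }
     template <typename T> std::size_t reserve() { return reserve(sizeof(T), alignof(T)); }
     std::size_t size() const { return size_; }

   private:
     std::size_t size_ = 0;
   };

   // Per-transaction storage: one allocation for every plugin on the rule,
   // instead of one allocation per plugin. No bounds checking in this sketch.
   class TxnStorage {
   public:
     explicit TxnStorage(const RemapSchema &schema)
         : block_((schema.size() + sizeof(std::max_align_t) - 1) / sizeof(std::max_align_t)) {}

     template <typename T> T *emplace(std::size_t offset, T value) {
       static_assert(std::is_trivially_destructible<T>::value, "sketch never runs destructors");
       return new (base() + offset) T(std::move(value));
     }

     template <typename T> T *get(std::size_t offset) {
       return std::launder(reinterpret_cast<T *>(base() + offset));
     }

   private:
     unsigned char *base() { return reinterpret_cast<unsigned char *>(block_.data()); }
     std::vector<std::max_align_t> block_;  // aligned for any fundamental type
   };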
This could be justified in that it is better to allocate once per |SM| than once per remap plugin that needs per
transaction storage. If this isn't sufficient it might be needful to use an :code:`IOBufferBlock`. Given all the
other allocations that go on in the |SM|, I suspect reducing those will more than compensate for using
:code:`Extendible` in the |SM|, especially if combined with :code:`MemArena` for general allocation.

Such a change to the |SM| will almost certainly require similar changes to :code:`HttpClientSession` and
:code:`AnnotatedConnection`. The reasons for this are the same as for the |SM|, in that each supports an array of
:code:`void *` which is used by plugins. In these cases the use is for global plugins and therefore can be done
statically rather than dynamically.

.. rubric:: Footnotes

.. [#] This is a stupid name but no one else would provide a better one, which frankly shouldn't be challenging.