Traffic Server Layer 7 Working Group

My view on the Layer 7 Routing working group meetings in Denver, 25-27 Jul 2018.

Configuration

The YAML schema will be updated to be more generic. The “primary” and “secondary” rings will be removed and replaced with generic “rings”, which will be defined in a specific section. The “strategy” section will be used to create policy that uses the rings. In essence the “ring” section will describe the elements of the CDN and the “strategy” section will describe the CDN policy for a specific node. To make this easier to deploy, some version of John’s include support for YAML (which does not have this natively) will be needed.

HttpSM Interaction

The L7R configuration files will provide a set of “strategies”, each marked with a unique tag. There is a globally configured default strategy tag. During remap, this can be overridden with a different strategy tag.
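The tag lookup described above can be sketched as follows. This is only an illustration of the selection rule (remap override first, global default otherwise); the names `Strategy`, `L7Policy` and `select` are hypothetical and not taken from the Traffic Server code.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical names throughout; the notes do not define a concrete API.
@dataclass(frozen=True)
class Strategy:
    tag: str


class L7Policy:
    """Holds the configured strategies, keyed by their unique tags."""

    def __init__(self, strategies, default_tag):
        self._by_tag = {}
        for strategy in strategies:
            if strategy.tag in self._by_tag:
                raise ValueError(f"duplicate strategy tag: {strategy.tag}")
            self._by_tag[strategy.tag] = strategy
        if default_tag not in self._by_tag:
            raise ValueError(f"unknown default strategy tag: {default_tag}")
        self._default_tag = default_tag

    def select(self, remap_tag: Optional[str] = None) -> Strategy:
        """Return the strategy for the remap override, else the global default."""
        tag = remap_tag if remap_tag is not None else self._default_tag
        return self._by_tag[tag]
```

A transaction whose remap rule sets no tag would call `select()` and get the default; a rule that sets one would call `select("some-tag")`.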

When the HttpSM decides it needs to send the request to an upstream target, it instantiates a “strategizer” [1], which is a stateful run-time object based on a specific strategy. That strategy is selected by the strategy tag.

The strategizer is responsible for providing transaction-ready sessions to upstream targets. The selection of the target and any handling of layer 3 or 4 connection errors is the responsibility of the strategizer. When the HttpSM receives an open session event (SESSION_READY) it will send the HTTP request to the upstream. The HttpSM will handle the result or response header for the transaction, then report it to the strategizer. The strategizer then decides on the appropriate next action and synchronously informs the HttpSM what that action is. The HttpSM performs its required tasks and, when it has finished, reports that to the strategizer. Note that all decisions about how to proceed from the upstream response are made by the strategizer. This includes not only HTTP responses but also network errors. Even though the session is open at the time the HttpSM receives it, that does not guarantee the absence of further network problems.

entity L7Policy
actor Strategizer
actor HttpSM
participant Upstream

hide footbox

HttpSM -\ L7Policy : create "tag"
L7Policy -/ HttpSM : return: Strategizer
activate Strategizer
Strategizer -> HttpSM : SESSION_READY
note right : Contains session
HttpSM -> Upstream : HTTP Request
Upstream -> HttpSM : Upstream Response
HttpSM -> HttpSM : Read response.
HttpSM -\ Strategizer : result
note right : Contains response
Strategizer -/ HttpSM : return: ACTION
HttpSM -> HttpSM : process response
HttpSM -> Strategizer : finish

Generic Transaction

The interaction is split out in this manner to accommodate the variety of synchronous and asynchronous operations. In particular, if there is an HTTP response from the upstream then the HttpSM needs to be able to handle it in an appropriate way. However, what is appropriate depends on the next action as decided by the strategizer. Because the strategizer’s next action can depend on potentially long-lived asynchronous operations, waiting for such operations to complete is not feasible. Instead the strategizer must respond synchronously to the response with an action code that describes what the strategizer will do when the HttpSM has finished its operations. It is assumed that while the next action may take a long time to perform, determining the next action should be a fast computation.

The current valid actions are

DONE
The result is a final result that should be sent on to the user agent.
RETRY
The transaction was a failure of some sort but not a permanent one. The strategizer will prepare another session after the HttpSM has finished cleaning up the current transaction.
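As a rough sketch of why the decision can be fast even when the follow-up work is slow, the action code can be computed from data already in hand. The function `next_action` and its parameters are hypothetical, and the rule it encodes (retry on a 404 or a network error while targets remain) is an assumption about one possible strategy, not agreed behavior.

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    DONE = auto()   # final result, send on to the user agent
    RETRY = auto()  # non-permanent failure, another session will be prepared


def next_action(status: Optional[int], targets_left: int) -> Action:
    """Fast, synchronous decision; status is None for a network error.

    Assumption: a 404 or a network error is retried while untried
    targets remain. Everything else is treated as final.
    """
    failed = status is None or status == 404
    if failed and targets_left > 0:
        return Action.RETRY
    return Action.DONE
```

Opening the next upstream session, which may take a long time, happens only after the HttpSM has reported that it has finished; nothing in `next_action` blocks.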

This is clearer with specific examples.

First consider the case when the upstream request is successful. It is unacceptable to wait for strategy completion before sending data to the user agent; that flow needs to start as soon as possible. The HttpSM can no longer know whether that is correct without consulting the strategizer, therefore that must be a synchronous call.

entity L7Policy
actor Strategizer
actor HttpSM
participant Upstream

hide footbox

HttpSM -\ L7Policy : create "tag"
L7Policy -/ HttpSM : return: Strategizer
activate Strategizer
Strategizer -> HttpSM : SESSION_READY
note right : Contains session
HttpSM -> Upstream : HTTP Request
Upstream -> HttpSM : ""200 OK""
HttpSM -> HttpSM : Read response header.
HttpSM -\ Strategizer : result
note right : HTTP 200 Response
Strategizer -/ HttpSM : return: DONE
HttpSM -> HttpSM : Read upstream response,\nsend to user agent.
...
HttpSM -> Strategizer : finish

Successful Transaction

Even for an HTTP failure there is still work for the HttpSM. In this case, for a 404 response status, the strategizer will fail over to a different upstream. This behavior is used to probe multiple upstream pods for specific content, with a 404 indicating that the next target should be tried rather than returning the response to the user agent. The HttpSM must drain any body from the response but needs to know, while draining, that the body will not be returned to the user agent.

entity L7Policy
actor Strategizer
actor HttpSM
participant Upstream

hide footbox

HttpSM -\ L7Policy : create "tag"
L7Policy -/ HttpSM : return: Strategizer
activate Strategizer
Strategizer -> HttpSM : SESSION_READY
note right : Contains session
HttpSM -> Upstream : HTTP Request
Upstream -> HttpSM : ""404 NOT FOUND""
HttpSM -> HttpSM : Read response header.
HttpSM -\ Strategizer : result
note right : HTTP 404 Response
Strategizer -/ HttpSM : return: RETRY
HttpSM -> HttpSM : Drain upstream response body
...
HttpSM -> Strategizer : finish
...
Strategizer -> HttpSM : SESSION_READY
== As 200 OK case ==

Fail / Retry Transaction

Another view of the activity demonstrates the “co-routine”-like nature of the interaction. Both the HttpSM and strategizer perform blocking operations where the other side must wait for an event or signal indicating the operation has completed. The basic logic is

partition HttpSM {
   (*) --> "Determine strategy tag"
}

partition Strategy {
   "Determine strategy tag" --> "Create strategizer instance"
   "Create strategizer instance" -u-> "Connect upstream"
   "Connect upstream" --> "Event: SESSION_READY"
}

partition HttpSM {
   "Event: SESSION_READY" -l->  if "send request" then
      -->[net error] ===RESULT===
   else
      -->[success] "Read Response Header"
      -> ===RESULT===
   endif
}

partition Strategy {
   --> "Compute next action"
}

partition HttpSM {
   if "ACTION" then
      -->[DONE] "Forward to User Agent"
   else
      -->[RETRY] "Drain session"
   endif
}

partition Strategy {
   "Drain session" -->[finish()] "Connect upstream"
   "Forward to User Agent" -down->[finish()] "Destruct"
}

partition HttpSM {
   "Destruct" --> (*)
}

Upstream session logic
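The same loop can be written out as a small simulation. This is a sketch under stated assumptions: `Strategizer`, `next_session`, `result`, `finish` and `run_transaction` are hypothetical names, the SESSION_READY event is modeled as a plain method call, and the strategy simply walks a fixed list of targets, retrying on 404.

```python
from enum import Enum, auto


class Action(Enum):
    DONE = auto()
    RETRY = auto()


class Strategizer:
    """Hypothetical stand-in that walks a fixed list of upstream targets."""

    def __init__(self, targets):
        self._targets = list(targets)
        self._index = -1

    def next_session(self):
        """Stands in for the SESSION_READY event; returns the chosen target."""
        self._index += 1
        return self._targets[self._index]

    def result(self, status):
        """Synchronous call: decide the next action from the response status."""
        more = self._index + 1 < len(self._targets)
        return Action.RETRY if status == 404 and more else Action.DONE

    def finish(self):
        """The HttpSM reports it has finished with the current transaction."""


def run_transaction(strategizer, send_request, log):
    """HttpSM side of the loop; send_request(target) returns an HTTP status."""
    while True:
        target = strategizer.next_session()
        status = send_request(target)
        action = strategizer.result(status)
        if action is Action.RETRY:
            log.append(f"drain {target}")  # body is discarded, not forwarded
            strategizer.finish()
            continue
        log.append(f"forward {target} {status}")
        strategizer.finish()
        return status
```

With two targets where the first answers 404 and the second 200, the log would read `["drain pod-a", "forward pod-b 200"]`, which matches the fail / retry sequence followed by the 200 OK case.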

The state sequence in the HttpSM is much simplified by this work. In particular, the nemesis of redirection or other upstream connection failures requiring rolling back the HttpSM state will be avoided. Instead there is a simpler loop which spans a much smaller set of states.

[*] -> ServerOpen
ServerOpen -> WaitForTransaction
WaitForTransaction --> HandleServerTransaction : SESSION_READY
HandleServerTransaction --> ReadResponseHeader
ReadResponseHeader --> HandleResponseHeader : HTTP response
ReadResponseHeader -> ReportServerResponse : network error
ReportServerResponse -> DrainResponse : ACTION: RETRY
DrainResponse --> WaitForTransaction
HandleResponseHeader --> ReportServerResponse
ReportServerResponse --> HandleServerResponse : ACTION: DONE

state ServerOpen : Create Strategizer
HandleServerTransaction : Send request
ReadResponseHeader : Wait for upstream response
HandleResponseHeader : Read and validate header
ReportServerResponse: Report to strategizer\nGet next action
HandleServerResponse : Send response to User Agent\nSignal strategizer
DrainResponse : Drain response\nSignal strategizer

State Machine Upstream States
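To show how small the loop is, the transitions in the diagram can be transcribed into a table and walked. The state and event names come from the diagram; the table representation and the `walk` helper are illustrative assumptions, not how the HttpSM is implemented.

```python
# Transition table transcribed from the state diagram above.
# Keys are (state, event); None marks an unconditional transition.
TRANSITIONS = {
    ("ServerOpen", None): "WaitForTransaction",
    ("WaitForTransaction", "SESSION_READY"): "HandleServerTransaction",
    ("HandleServerTransaction", None): "ReadResponseHeader",
    ("ReadResponseHeader", "HTTP response"): "HandleResponseHeader",
    ("ReadResponseHeader", "network error"): "ReportServerResponse",
    ("HandleResponseHeader", None): "ReportServerResponse",
    ("ReportServerResponse", "ACTION: RETRY"): "DrainResponse",
    ("ReportServerResponse", "ACTION: DONE"): "HandleServerResponse",
    ("DrainResponse", None): "WaitForTransaction",
}


def walk(events, start="ServerOpen"):
    """Return the states visited while consuming events in order.

    Unconditional transitions are followed automatically between events.
    """
    state, path = start, [start]
    pending = list(events)
    while True:
        if (state, None) in TRANSITIONS:
            state = TRANSITIONS[(state, None)]
        elif pending and (state, pending[0]) in TRANSITIONS:
            state = TRANSITIONS[(state, pending.pop(0))]
        else:
            break
        path.append(state)
    if pending:
        raise ValueError(f"event {pending[0]!r} not valid in state {state}")
    return path
```

A network error followed by a successful retry passes through WaitForTransaction twice and ends in HandleServerResponse, without any state being rolled back.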

Next Steps

In my view the key next step is, first, to get Extendible committed to master. After that it was agreed the next deliverable would be a plugin or plugins that would emulate the current manual host status marking using Extendible. This would enable

  • Validating the Extendible API.
  • Experimenting with mechanisms for moving data from external tools into Extendible. That is, how can external data be pushed into host and IP address records?
  • Validating that Extendible data can be used to manipulate upstream selection.

I think this will also be the keystone of moving forward with Extendible, as doing things similar to existing code is much easier than building things ex nihilo. It might even be reasonable, as a temporary expedient, to have the core use Extendible to create the data for these purposes and leave the plugin (which will require much more in the way of infrastructure changes) until later. There is a reasonable chance that the core will end up using Extendible internally, because it makes the overall code base much more modular.

Open Issues

There are still a few open issues which aren’t fully resolved.

Setting upstream address
It is a requirement, particularly for transparent networking, to be able to explicitly set the IP address of the upstream, both in the core and using the plugin API call TSHttpTxnSetTargetAddr.
Self Marking
There must be a descriptor that says “the remapped upstream” so that no explicit upstream hosts need to be defined. This is needed for the default routing situation and for forward proxying.

Visions

Extendible was quite a hit and there was a push to use it in other situations, in particular with the HttpSM. I pushed back on that because I think updating the HttpSM would be a major change and therefore we should bake Extendible a bit more to make sure the API is clean and the code reliable.

With regard to the use of Extendible in HttpSM (something Leif and Vijay were giddy about) I have always thought this was in the long run a good idea, but changing the HttpSM is always fraught. I prefer to get some experience with Extendible before making such a major change. Past that the purpose of this would be to replace the current transaction arguments with Extendible. An issue with this is remap plugins. These have two properties that make it more difficult to use Extendible.

  • Different plugins per remap rule using different sets of extended data.
  • Reloading, which is another ongoing project.

For various reasons (including the nature of the proxy allocators) it is not feasible to have instances of a class with different extendible schemas. As a result it will probably be necessary to store a schema per remap rule and dynamically allocate the storage block. This could be justified in that it is better to allocate once per HttpSM than once per remap plugin that needs per transaction storage. If this isn’t sufficient it might be needful to use an IOBufferBlock. Given all the other allocations that go on in the HttpSM I suspect reducing that will more than compensate for using Extendible in the HttpSM, especially if combined with MemArena for general allocation.
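The "schema per remap rule, one allocation per transaction" idea might look roughly like the following. This is a toy model only: it is not the Extendible API, and `Schema`, `add_field`, `allocate`, `write` and `read` are hypothetical names chosen for the illustration.

```python
import struct


class Schema:
    """Hypothetical per-remap-rule schema of fixed-size fields."""

    def __init__(self):
        self._fields = {}  # name -> (offset, struct format)
        self.size = 0

    def add_field(self, name, fmt):
        """Reserve space for one field; fmt is a struct format such as "<q"."""
        if name in self._fields:
            raise ValueError(f"duplicate field: {name}")
        self._fields[name] = (self.size, fmt)
        self.size += struct.calcsize(fmt)

    def allocate(self):
        """Single allocation covering every field the rule's plugins need."""
        return bytearray(self.size)

    def write(self, block, name, value):
        """Store value in the named field of an allocated block."""
        offset, fmt = self._fields[name]
        struct.pack_into(fmt, block, offset, value)

    def read(self, block, name):
        """Load the named field from an allocated block."""
        offset, fmt = self._fields[name]
        return struct.unpack_from(fmt, block, offset)[0]
```

Each plugin on a remap rule would register its fields when the rule is loaded; at transaction time a single `allocate()` call replaces one allocation per plugin.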

Such a change to HttpSM will almost certainly require similar changes to HttpClientSession and AnnotatedConnection. The reasons for this are the same as for the HttpSM, in that each supports an array of void * which is used by plugins. In these cases the use is for global plugins and therefore can be done statically rather than dynamically.

Footnotes

[1] This is a stupid name but no one else would provide a better one, which frankly shouldn’t be challenging.