2019 Spring Summit¶
These are my notes from the 2019 Spring Summit in Beijing.
Opening remarks¶
The primary message was that we, as a community, need to do better about testing at production levels. Bringing ATS 8.0.x to production quality has been a long struggle and 8.1.x is not going well either. A problem I see is that as production testing takes longer, the difference between the LTS release and the next release gets larger, which makes the whole process take even longer. This is a degenerative spiral that will end badly.
We need to do better about reviewing pull requests, in particular to get reviews across company silos. I think people should do this also because it’s good for the reviewer.
Plugin API and WebAssembly¶
WebAssembly is the new fad: a byte code executor which can run in a web browser. The ostensible point is to subsume the variety of languages currently used in active web pages into a single basic feature [1]. Developers would be able to write code in a variety of languages, which is compiled to byte code and then executed in an engine-based run time environment.
Kit is concerned about the long term viability of the Lua community, given that luajit and other libraries have been abandoned. If ATS has a WebAssembly engine, then plugins could be done the same way.
All of this is still very wide open - many different incompatible tool chains floating around.
Commentary
It occurs to me as I write this that there may be some performance implications. It’s hard to believe the byte code machine can be as fast as native C code. We have some plugins where plugin performance is a significant issue.
QUIC Updates¶
HTTP over QUIC is now “HTTP/3”.
Multipath is coming, but the final document is not due until May 2020.
Still targeting ATS 9.0.
OpenSSL doesn’t support the API required for QUIC (in any supported release at this time). Masaori said there are no current plans to ever support this API. This would require a custom OpenSSL build for ATS to use OpenSSL and QUIC.
BoringSSL has QUIC API.
Results from IETF 103 and 104: the QUIC WG implementation was tested against 16 other versions.
Depends on HTTP/2 outbound for certain client side features.
Working on performance improvements, implementation was very slow.
Cleanup of loading certificates from “ssl_multicert.config”.
Review is going to be hard - already 43K lines changed.
There was a long discussion on how and when to bring this into master.
A key issue is how many changes are expected in the future. Moving to master will mean a large slowdown in commit speed, because reviews will be required for each PR. Currently the QUIC branch has 1300 commits over two years, which is a very high velocity for this project.
One problem is that if QUIC doesn’t go into ATS 9, it will be another 12-18 months before it is released in a supported version.
QUIC Deployment¶
This was about actually deploying a variant of the ATS QUIC implementation (draft 12) in production.
Reasons for QUIC
RTT costs.
Head of Line blocking
Mobile connection switching vs. QUIC routing. Possibly load balancers can route by the DCID embedded in QUIC packets. Unclear whether this is commercially available without local changes. The DCID contents encode the ET_NET thread in this deployment; UDP_NET threads get packets at random and reschedule to the appropriate ET_NET thread.
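As a minimal sketch of that dispatch, assuming the deployment encodes the owning ET_NET index in the first DCID byte (the actual encoding was not covered); the names here are illustrative, not the real ATS types:

    // Illustrative only: a UDP_NET thread picks the ET_NET thread for a
    // QUIC packet from the Destination Connection ID it carries.
    #include <cstdint>
    #include <vector>

    struct QUICPacket {
      std::vector<uint8_t> dcid; // Destination Connection ID from the header
      // ... payload ...
    };

    // Assumed encoding: the owning ET_NET index is stored in the first DCID
    // byte when the connection is created.
    int target_net_thread(QUICPacket const &pkt, int n_net_threads) {
      if (pkt.dcid.empty()) {
        return -1; // zero-length CID: caller falls back to hashing the 4-tuple
      }
      return pkt.dcid[0] % n_net_threads;
    }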
Commentary
I’m not sure how much of a QUIC advantage these things are any more. Most (all?) of them have been migrated back into TCP stacks, e.g. MPTCP.
Using “BBR” congestion control vs. “New Reno”, which is the IETF QUIC recommendation. Claims a 10x speed improvement for transferring 1MB in a 5% packet loss environment, which approximates some third world networks.
Using SSL_OP_NO_ANTI_REPLAY. Getting 79% ~ 98% session reuse for 0-RTT.
ATS Internals¶
A discussion of ATS internal architecture (based on 6.0.x).
AIO¶
Asynchronous I/O. This is a thread based module that does blocking I/O internally and provides an asynchronous interface using mutex locking and lockless queues.
Commentary
Should look at using atomics instead of an insert mutex for the AIO queue. That might be problematic for removes.
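A minimal sketch of what the lock-free insert could look like, assuming a simple intrusive singly linked list; this ignores the sorted priority behavior of the real AIO queue and, as noted, removal is where it gets awkward:

    #include <atomic>

    struct AIORequest {
      AIORequest *next = nullptr;
      // ... fd, offset, buffer, priority ...
    };

    // Lock-free LIFO push: many producers can insert concurrently because the
    // only contended word is the head pointer.
    void enqueue(std::atomic<AIORequest *> &head, AIORequest *req) {
      AIORequest *old = head.load(std::memory_order_relaxed);
      do {
        req->next = old;
      } while (!head.compare_exchange_weak(old, req, std::memory_order_release,
                                           std::memory_order_relaxed));
    }

    // The consumer can take the whole list privately with
    // head.exchange(nullptr); removing an arbitrary element in place, or
    // keeping the list sorted, still wants a lock.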
Commentary
As far as oknet can tell, the AIO priority isn’t used. This implies we may want to look at removing the internal sorted priority queue, since it’s not doing anything useful.
There is also the native Linux AIO system. Structured similarly to epoll.
DNS¶
Table of in-flight DNS queries, with de-duplication. Each entry can have a list of continuations waiting on the results for that query. Can have a dedicated DNS thread or share with an ET_NET thread.
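A minimal sketch of that table as I understand it, a map keyed by the query where each entry carries the waiting continuations; the names are illustrative, not the actual ATS types:

    #include <list>
    #include <string>
    #include <unordered_map>

    struct Continuation; // stand-in for the ATS Continuation type

    struct InFlightQuery {
      std::list<Continuation *> waiters; // continuations to wake with the result
    };

    // One entry per distinct query in flight; a second request for the same
    // query appends a waiter instead of sending another DNS packet.
    std::unordered_map<std::string, InFlightQuery> in_flight;

    void resolve(std::string const &query, Continuation *cont) {
      auto &entry = in_flight[query];
      bool first  = entry.waiters.empty();
      entry.waiters.push_back(cont);
      if (first) {
        // send_dns_query(query); // only the first requester does network I/O
      }
    }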
Transform Connections¶
This is a chain of continuations that process data serially. There is a special TransformTerminus created by the core which anchors the chain.
Commentary
I need to get back to providing some more control in this chain. The idea at one point was to be able to pass hints along about how the transform was modifying the content, particularly with regard to length. There would be potential performance improvements if a transform could mark itself as read only or length preserving.
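As a sketch of the kind of hint I have in mind (purely hypothetical, this is not an existing API), a transform could advertise properties the core and downstream transforms could exploit:

    // Hypothetical flags a transform could set on itself so the core can skip
    // content-length recalculation or buffering. Not part of the current API.
    enum TransformHints : unsigned {
      TRANSFORM_HINT_NONE              = 0,
      TRANSFORM_HINT_READ_ONLY         = 1u << 0, // observes bytes, never modifies
      TRANSFORM_HINT_LENGTH_PRESERVING = 1u << 1, // may modify bytes, never the length
    };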
The diagrams were done in OmniGraffle.
Thread Scheduling¶
The real point of all of this is to support thread affinity. The API is designed to make this behave in an orderly and expected way. Conceptually each Continuation can have an affine thread which belongs to a specific thread pool. This makes it possible to easily schedule a continuation on the same thread every time, particularly when switching threads, e.g. something does a DNS request and needs to be scheduled back to ET_NET from ET_DNS.
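A minimal sketch of the pattern, using internal-style stand-ins (this_ethread(), EThread::schedule_imm); the exact affinity API was still settling at the time, so treat the names as illustrative:

    // Remember the ET_NET thread a continuation is affine to, hand the work to
    // ET_DNS, and schedule the continuation back onto that exact thread.
    class Continuation;
    class EThread {
    public:
      void schedule_imm(Continuation *cont); // run cont on this specific thread
    };
    EThread *this_ethread(); // the thread currently executing

    struct DNSRequest {
      Continuation *cont = nullptr;
      EThread      *home = nullptr; // affine ET_NET thread captured at submit time
    };

    void submit_dns(DNSRequest &req, Continuation *cont) {
      req.cont = cont;
      req.home = this_ethread(); // we are on an ET_NET thread right now
      // ... hand req to an ET_DNS thread ...
    }

    void dns_done(DNSRequest &req) {
      // Called on ET_DNS: go back to the same ET_NET thread, not a random one.
      req.home->schedule_imm(req.cont);
    }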
Talk about clean shutdown, primarily to avoid crashes from out of order clean up of static data.
JEMalloc differences are small - order of 1-2% more for JEMalloc. However, having some locking issues under stress, still investigating.
CWiki Changes¶
Generally agreed. The consensus was:
If it’s about the software, it goes in Sphinx in the code base.
If it’s about the community, it goes in the Wiki.
Projects go to github projects.
Plugin Reload¶
Goal is to change plugin code by loading new versions of dynamic libraries, cleaning up the old ones. Old code needs to be removed to do cleanup and to release resources.
One issue is that the internal dynamic library loader tracks the loaded library files and will not reload one that is already loaded; replacing the dynamic library file on disk does not change this.
This change does Continuation context tracking, using the mechanism from the Plugin Coordination project. That is, each Continuation has a member that is a pointer to a context. When a Continuation is created the context is passed from the executing Continuation to the Continuation being created.
This is only for remap plugins and remap configuration reload.
The context objects are reference counted so that the context is preserved as long as a Continuation is using it. This leads to a problem with plugins that schedule events after a transaction or do a persistent periodic task: in that case it would be possible to accumulate a large number of old plugin context objects. There is also the problem that the plugin could have multiple instances running concurrently. To me this seems to overlap with Fei’s work on clean shutdown. It is fundamentally the same problem, where a version of a plugin needs to shut down in a clean manner. In this particular case it’s less of an issue because there is already a hook for shutting down remap plugin instances.
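A minimal sketch of the context tracking described above, using std::shared_ptr for the reference counting; the real implementation has its own types, so the names are illustrative:

    #include <memory>

    struct PluginContext {
      // identity of a loaded plugin version (library handle, instance data, ...)
    };

    struct Continuation {
      std::shared_ptr<PluginContext> context; // which plugin version created this
    };

    // Stand-in for "the continuation currently being executed on this thread".
    thread_local Continuation *executing = nullptr;

    // A new continuation inherits the context of the continuation that created
    // it, which keeps the old plugin version alive for as long as anything
    // still references it.
    Continuation *create_continuation() {
      auto *cont = new Continuation;
      if (executing != nullptr) {
        cont->context = executing->context;
      }
      return cont;
    }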
Leif went on at great length about optimizing the reloading of the dynamic libraries. It would be worthwhile to keep a modification time stamp for each dynamic library and only actually reload if that time has changed, because it’s quite unusual for the dynamic library to change; it’s a rare but critical case.
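A minimal sketch of that check, assuming POSIX stat(); error handling and the actual reload call are elided:

    #include <sys/stat.h>
    #include <ctime>
    #include <map>
    #include <string>

    // Last observed modification time per plugin shared object.
    std::map<std::string, time_t> last_mtime;

    // Return true only if the file on disk is newer than what was loaded.
    bool needs_reload(std::string const &so_path) {
      struct stat st;
      if (stat(so_path.c_str(), &st) != 0) {
        return false; // missing or unreadable: keep the currently loaded copy
      }
      auto it = last_mtime.find(so_path);
      if (it != last_mtime.end() && it->second == st.st_mtime) {
        return false; // unchanged: skip the dlclose/dlopen churn
      }
      last_mtime[so_path] = st.st_mtime;
      return true;
    }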
In the long run we want to unify global and remap plugins, so there are only plugins. This means being able to reload any plugin, and having hooks for global plugins to indicate load, reload, and shutdown. The plugin context structures would also be unified. In the not so long term, we can look at providing hooks for global plugins to be informed of dynamic library reloads.
Discussion on TSRemapOSResponse - what is the use of this? Kit said it was because it is always called, in contrast to ReadResponseHdrHook which is only called after a successful connection to the upstream. Using both enables detecting upstream connection failures.
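A minimal remap plugin sketch of that combination; the required TSRemapInit / TSRemapNewInstance entry points and the per-transaction bookkeeping (including destroying the continuation at TXN_CLOSE) are omitted so only the shape is visible:

    #include <ts/ts.h>
    #include <ts/remap.h>

    // Fires only after a successful upstream connection produced a response.
    static int
    read_response(TSCont contp, TSEvent event, void *edata)
    {
      TSHttpTxn txn = static_cast<TSHttpTxn>(edata);
      TSDebug("os_probe", "upstream response header seen");
      TSHttpTxnReenable(txn, TS_EVENT_HTTP_CONTINUE);
      return 0;
    }

    TSRemapStatus
    TSRemapDoRemap(void *ih, TSHttpTxn txn, TSRemapRequestInfo *rri)
    {
      // Per-transaction continuation for the response header hook. A real
      // plugin would also hook TXN_CLOSE to destroy it.
      TSCont contp = TSContCreate(read_response, nullptr);
      TSHttpTxnHookAdd(txn, TS_HTTP_READ_RESPONSE_HDR_HOOK, contp);
      return TSREMAP_NO_REMAP;
    }

    // Always called, even when the upstream connection failed; comparing
    // "OS response seen" with "response header hook fired" flags failures.
    void
    TSRemapOSResponse(void *ih, TSHttpTxn txn, int os_response_type)
    {
      TSDebug("os_probe", "origin response type %d", os_response_type);
    }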
Discussion on whether this should be optional - is that useful? One argument is that it would ease transition. Is it sufficient to just make the “only reload if the timestamp changed” update?
Parent Selection and traffic_manager¶
Project goal is to be able to have more complex selection criteria for upstream selection.
Added the ability to mark upstream status, via traffic_ctl or the plugin API. This includes reasons for the status.
A major problem has been the only data that flows from traffic_server to traffic_manager is statistics, and so that has been used. However this approach does not scale well. This is another aspect of how the internal RPC is messed up.
Working on:
updating to YAML for “parent.config” and better integration with “remap.config”.
API support for tweaking selection strategies.
The only thing that traffic_manager provides that is not easily put into traffic_server is the external process RPC. To me, this is once again wrapped up with the overall brokenness of the RPC.
TLS State of the Onion¶
Susan’s slides, I couldn’t take notes.
Masaori talked about refactoring “ssl_multicert.config” based on work with QUIC.
- Require OpenSSL 1.0.2+ for ATS 9
This means dropping support for CentOS 6 and Ubuntu 14.04. Currently ATS 8 only requires OpenSSL 0.9.4+, which is far too old.
- Common Mistakes for Configuration
Frequent use of #ifdef instead of #if (see the sketch at the end of this section).
Use of AC_CHECK_HEADER definitions without actually calling AC_CHECK_HEADER.
- Cleanup
Replace tokenizer with TextView.
Drop NPN support for ATS 9?
Agreed that “I_” and “P_” prefixes are obsolete and should be removed.
Need to talk about how to organize include files in order to deprecate the “I_” and “P_” prefixes.
We want to move the “ssl_” prefixed functions from globals to a namespace.
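A minimal illustration of the #ifdef vs. #if mistake mentioned above: with feature macros that are always defined to 0 or 1 (the usual autoconf style), #ifdef is always true and silently enables the disabled branch (TS_USE_SOME_FEATURE is a made-up name):

    // Suppose configure always emits the macro, with value 0 or 1:
    #define TS_USE_SOME_FEATURE 0

    #ifdef TS_USE_SOME_FEATURE // WRONG: defined-to-0 still counts as defined
    //  feature_code();        // would be compiled in even though the feature is off
    #endif

    #if TS_USE_SOME_FEATURE    // RIGHT: tests the value, not the definition
    //  feature_code();        // compiled only when the value is 1
    #endif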
Plugin Review¶
Concern about the process of promoting plugins from experimental to stable. Therefore this is a walk-through of current plugins and what should be done with them.
- access_control
Remain experimental.
- acme
Initially unclear what this does and what support it has: it supports “Let’s Encrypt” use. ACME is a protocol of which “Let’s Encrypt” is an implementation. Remain experimental.
- balancer
Deprecate? This is being moved into the core.
- buffer_upload
Ancient and unchanging. Should be deprecated or promoted.
- cache_key_genid
What does this do? Look at deprecating, move functionality into another plugin.
- cache_promote
Promote to stable.
- cache_range_requests
Used by traffic_ctl. Promote when documented.
- cachekey
Already promoted.
- certifier
Promote to stable or move to examples. Much of this is being moved into the core.
- collapsed_forwarding
Check with Comcast. Used by Kit.
- cookie_remap
Keep in experimental.
- custom_redirect
Investigate - does anyone use this?
- fastcgi
Remain experimental.
- fq_pacing
Remain experimental.
- geoip_acl
Remain experimental.
- header_freq
Remain experimental.
- header_normalize
Deprecate. Obsolete functionality.
- hipes
Deprecate. Obsolete functionality.
- hook-trace
Experimental
- inliner
Consider deprecating? Functionality covered by pagespeed?
- ja3_fingerprint
Experimental.
- magick
Experimental.
- memcache
Deprecate. Move to examples as a protocol plugin example?
- memcached_remap
Deprecate.
- metalink
Remain experimental.
- money_trace
JvD will ask if this is still viable.
- mp4
Remain experimental.
- multiplexer
Promote to stable.
- mysql_remap
Deprecate.
- prefetch
Promote to stable.
- remap_purge
Promote to stable.
- remap_stats
Promote when documented.
- server_push_preload
Remain experimental.
- slicer
Remain experimental.
- webp_transform
Functionality replaced by magick. Deprecate when the latter is promoted.
- ssl_session_reuse
Remain experimental.
- sslheaders
Remain experimental. Try to move functionality to header_rewrite.
- stale_while_revalidate
Deprecated already. Should remove code.
- stream_editor
Remain experimental.
- system_stats
Remain experimental.
- tls_bridge
Remain experimental.
- traffic_dump
Remain experimental.
- uri_signing
Remain experimental.
- url_sig
Remain experimental.
Kubernetes¶
Project goal is to support Kubernetes as a general corporate infrastructure. ATS would be used as a routing engine.
Using tmgr_plugin because the frequency of updates is on the same scale as the time delay for traffic_ctl, making the latter unusable in production.
ATS is put in a docker container image, using an image built from a local build along with plugins and configuration. This is then deployed as part of a Kubernetes pod.
There is a caching ATS in the Kubernetes infrastructure, and “parent.config” is updated automatically to forward requests to the caching instance(s). This means that updating “parent.config” requires a full configuration reload. It would be nice to be able to do more incremental updates to the upstream selection.
To integrate ATS into Kubernetes we would need something to translate Kubernetes YAML into ATS configurations. This may already exist, done by unixwitch, but it hasn’t been updated since July 2017.
Remap¶
A descent into madness. Leif pushed for doing a complete redesign of URL rewriting and other configurations. The idea is that remap is restructured along different lines in order to support more complex configurations. We would also incorporate other configurations that are URL dependent, such as “hosting.config”, into the URL rewrite. In essence URL rewrite becomes something like what was envisioned for Layer 7 Routing, where a request is passed into a feature extractor and then a decision tree, which yields a data record. That record determines behavior for the request for various modules (e.g. upstream selection, cache properties, etc.).
There was no real resolution, except Miles seems to have gotten stuck with leading a working group to do the design. I managed to limit myself to just doing the coding and YAML support.
Although the current project for YAML for “remap.config” is not going to fly, some general comments were:
Get rid of named filters entirely. All filters must be defined in line per rule. To save typing, if that’s desired, YAML anchors and references could be used.
Rule 3 [2] for filters was grudgingly accepted. It wasn’t liked, but it was agreed it matched the “intuitive” behavior.
Filters should not affect matching.
It’s fine to unify all the rules and use extra tags / options to map to existing types of rules.
The implementation of YAML should be as independent as possible. I want to try to do this as a plugin first. Use txn-box as the platform to experiment.
General Notes¶
Leif wants it to be noted that the rule about string manipulation should be: use std::string_view unless methods from ts::TextView are explicitly needed.
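A short illustration of how I read that rule; TextView is our std::string_view subclass with parsing helpers, and the header path and helper names below (take_prefix_at, trim) are from memory, so check them against the actual header:

    #include <string_view>
    #include "tscpp/util/TextView.h" // path as of recent releases; adjust if it moved

    // Plain comparison and pass-through work: std::string_view is enough.
    bool is_gzip(std::string_view encoding) {
      return encoding == "gzip";
    }

    // Tokenizing is where ts::TextView is explicitly needed.
    void each_token(ts::TextView list) {
      while (!list.empty()) {
        ts::TextView token = list.take_prefix_at(',').trim(' ');
        // ... use token ...
      }
    }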
There’s general agreement the current internal RPC is messed up and/or broken. It’s also the consensus that a JSON-RPC compatible protocol should be used.
Discussion on how to handle external connectivity for the RPC: should it be done via the normal proxy ports, or kept as a specialized communication channel (e.g. via UNIX domain sockets)? There was a long discussion about this. I think there is more concern about the complexity of the transport than is warranted. I also think that even if the overall implementations of a custom channel and an HTTP based channel are different, the core implementation is identical: the request parsing, dispatch, and response generation.
Commentary
I have no doubt a proxy port based approach would be quite nice in many ways. My concern is primarily that the additional complexity due to security concerns will be more than the effort to make it work on a custom channel. I understand what is needed for the latter case but we don’t seem to have any designs for how to deal with the security issues for the former.
Kit brought up that traffic_manager keeping the proxy ports open means the listen queue for the proxy ports is also maintained across traffic_server restarts. It would be possible to write a custom wrapper that does this in very little code. Miles pointed out this can also be handled by fronting load balancers, even though they are on different hardware; from the point of view of the inbound client this is irrelevant.
Commentary
I think I should make an attempt at doing this quickly, to try to get the scope better in mind and determine who is correct about the level of effort. At a minimum I can get the parse, dispatch, response logic working which will be required for any eventuality. Also there have been numerous instances where the poor RPC implementation has been a blocker.
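A minimal sketch of that transport independent core, assuming JSON-RPC style envelopes; the method names are made up and the JSON handling is reduced to pre-parsed strings, so the dispatch shape is the point, not the parser:

    #include <functional>
    #include <map>
    #include <string>

    // The transport (proxy port, UNIX domain socket, ...) only has to deliver a
    // request string and carry back a response string; everything below is the
    // same no matter which channel is chosen.
    using Handler = std::function<std::string(std::string const &)>;

    std::map<std::string, Handler> methods = {
      {"config.reload", [](std::string const &) { return std::string{R"({"result":"ok"})"}; }},
      {"metrics.get",   [](std::string const &) { return std::string{R"({"result":{}})"}; }},
    };

    // 'method' and 'params' would come out of the parsed JSON-RPC envelope.
    std::string dispatch(std::string const &method, std::string const &params) {
      auto it = methods.find(method);
      if (it == methods.end()) {
        return R"({"error":{"code":-32601,"message":"Method not found"}})";
      }
      return it->second(params);
    }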
Wrapup¶
Discussion on next summits, for Fall 2019 and Spring 2020. Consensus is to aim for the first half of October. For Spring 2020, considering London (Apple, VZM), St. Croix (Go Daddy), Cork (Apple), or Trondheim (Vespa).
For ATS 9:
Merge QUIC branch to master. Claim “draft compliance” as an experimental feature. Do it the same way as with SPDY and HTTP/2.
YAML cleanup so everything is composable.
Clean up logging, IP Allow, parent config, SSL server name.
Make sure the YAML is consistent with current best practices.
Incompatible changes
Remove file name configuration variables and rely on layout / runroot.
“ssl_multicert.config” cleanup. Hopefully remove this file.
Leif needs to make a list of things to remove.
We should have a real plan for ATS 10 instead of everyone just going wild. Probably branching at ATS 9 in June 2019, release in Oct 2019.
- ATS 9.0
QUIC plumbing / draft
- ATS 9.1
JSON-RPC
- ATS 9.2
Add remap YAML on the side.
Overall we need to focus on reliability over features and reduce the feature velocity on master until we can get stable releases out in a reasonable amount of time.
Threads¶
There is an issue where it’s possible to get Continuations skewed across threads. The primary example in the discussion is a plugin which has its own thread A and calls (for example) TSNetConnect, which creates a NetVC. But that NetVC needs to run on an ET_NET thread. The problem is this is done by requiring the Continuation to be locked and extracting the affine thread from the mutex. It would be better to use thread affinity to get this information. This unfortunately will need to be done on an ad hoc basis because the various functions are idiosyncratic about this. It might also be necessary to have a mechanism to set thread affinity to a pool before scheduling, so that things such as per thread proxy allocators can be used correctly (because a thing can’t be scheduled before it’s allocated).
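A sketch of the two ways of picking the target thread, using internal-style stand-ins (thread_holding is what I recall the mutex field being called); purely illustrative:

    class EThread;

    struct ProxyMutex {
      EThread *thread_holding = nullptr; // set while the mutex is locked
    };

    struct Continuation {
      ProxyMutex *mutex    = nullptr;
      EThread    *affinity = nullptr; // explicit affine thread, if one is set
    };

    // Current pattern: the continuation must be locked so the thread can be
    // pulled out of its mutex; fragile when the caller is on its own thread.
    EThread *target_via_mutex(Continuation *cont) {
      return cont->mutex->thread_holding;
    }

    // Preferred pattern: use the affine thread directly, falling back to
    // picking one from the ET_NET pool when none has been assigned yet.
    EThread *pick_from_net_pool(); // stand-in for pool selection
    EThread *target_via_affinity(Continuation *cont) {
      return cont->affinity != nullptr ? cont->affinity : pick_from_net_pool();
    }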
I’m still not sure I really understand what the problem is, or why it’s a new problem. As far as I can tell it either works, or has been problematic for a long time. It’s been on my list for a while to do better documentation on how threads, mutexes, and Continuations should interact inside plugins.
Footnotes¶
[1]
This reminds me very much of the Zend Project. I’m not sure why this wasn’t used, but maybe it was.
[2]
“There is an implicit trailing match all with an action that is the opposite of the last explicit action”