Dynamic Sampling Context (Experimental)
This page is under active development. Specifications are not final and subject to change. Anything that sounds fishy probably is.
Until now, trace sampling was only done through a sample_rate option in the SDKs.
This has quite a few drawbacks for users of Sentry SDKs:
- Changing the sampling rate involved either redeploying applications (which is problematic for applications that are not updated automatically, e.g., mobile apps or physically distributed software) or building complex systems to dynamically fetch a sampling rate.
- Sampling was based purely on randomness. Employing sampling rules, for example, based on event parameters, is currently very complex. While writing rules for individual transactions is possible, enforcing them on an entire trace is infeasible.
The solution for these problems is Dynamic Sampling. Dynamic Sampling allows users to configure sampling rules directly in the Sentry interface. Important: Sampling rules may also be applied to entire traces.
High-Level Problem Statement
Ingest
Implementing Dynamic Sampling comes with challenges, especially on the ingestion side of things. For Dynamic Sampling, we want to make sampling decisions for entire traces. However, to keep ingestion speedy, Relay only looks at singular transactions in isolation (as opposed to looking at whole traces). This means that we need the exact same decision basis for all transactions belonging to a trace. In other words, all transactions of a trace need to hold all of the information to make a sampling decision, and that information needs to be the same across all transactions of the trace. We call the information we base sampling decisions on "Dynamic Sampling Context" or "DSC".
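To illustrate why an identical decision basis matters, here is a minimal sketch of a sampling decision that looks at one transaction's DSC in isolation. The rule format, the hash-based decision, and the names RULES, sample_rate_for, and keep_transaction are all assumptions made for this example; this is not Relay's actual algorithm.

```python
import hashlib

# Hypothetical rules, first match wins. Not Relay's real rule format.
RULES = [
    ({"environment": "production", "release": "1.0.0"}, 0.5),
    ({"environment": "production"}, 0.1),
]
DEFAULT_RATE = 1.0


def sample_rate_for(dsc):
    """Return the rate of the first rule whose conditions all match the DSC."""
    for conditions, rate in RULES:
        if all(dsc.get(key) == value for key, value in conditions.items()):
            return rate
    return DEFAULT_RATE


def keep_transaction(dsc):
    """Decide from the DSC alone, without looking at other transactions."""
    digest = hashlib.sha256(dsc["trace_id"].encode("ascii")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    return position < sample_rate_for(dsc)


dsc = {
    "trace_id": "771a43a4192642f0b136d5159a501700",
    "environment": "production",
    "release": "1.0.0",
}
# Two transactions of the same trace carry the same DSC,
# so they receive the same decision even when evaluated separately.
frontend_decision = keep_transaction(dict(dsc))
backend_decision = keep_transaction(dict(dsc))
```

If one of the two transactions carried a different release or environment, it could match a different rule and the trace would be sampled only partially, which is exactly what an identical DSC across the trace prevents.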
SDKs
SDKs are responsible for propagating Dynamic Sampling Context across all applications that are part of a trace. This involves:
- Either collecting the information that makes up the DSC or extracting the DSC from incoming requests (one or the other, never both).
- Propagating DSC to downstream SDKs.
- Sending the DSC to Sentry via an envelope.
Because there are quite a few things to keep in mind for DSC propagation and to avoid every SDK running into the same problems, we defined a unified propagation mechanism (step-by-step instructions) that all SDK implementations should be able to follow.
Baggage
We chose baggage as the propagation mechanism for DSC (see the W3C Baggage specification).
Baggage is a standard HTTP header with URI-encoded key-value pairs.
For the propagation of DSC, SDKs first read the DSC from the baggage header of incoming requests/messages. To propagate DSC to downstream SDKs/services, we create a baggage header (or modify an existing one) through HTTP request instrumentation.
Other vendors might also be using the baggage header.
If a baggage header already exists on an outgoing request, SDKs should aim to be good citizens by only appending Sentry values to the header.
In the case that another vendor added Sentry values to an outgoing request, SDKs may overwrite those values.
SDKs must not add other vendors' baggage from incoming requests to outgoing requests. Sentry SDKs only concern themselves with Sentry baggage.
The following is an example of what a baggage header containing Dynamic Sampling Context may look like:
baggage: other-vendor-value-1=foo;bar;baz, sentry-traceid=771a43a4192642f0b136d5159a501700, sentry-publickey=49d0f7386ad645858ae85020e393bef3, sentry-userid=Am%C3%A9lie, other-vendor-value-2=foo;bar;
See the Payloads section for a complete list of key-value pairs that SDKs should propagate.
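Reading the DSC from an incoming header could look roughly like the sketch below. It borrows the name baggage_header_to_dict from the pseudo-code in this document, but the body is only an illustration: it assumes list members are separated by commas, ignores any ";property" suffix, and keeps only keys that start with "sentry-". The sample header is likewise made up for this example.

```python
from urllib.parse import unquote

SENTRY_PREFIX = "sentry-"


def baggage_header_to_dict(header):
    """Extract only the Sentry entries from a baggage header value."""
    sentry_items = {}
    for member in header.split(","):
        # Drop any ";property" suffix; only the key=value part is used here.
        key_value = member.split(";", 1)[0]
        if "=" not in key_value:
            continue
        key, value = key_value.split("=", 1)
        key = key.strip()
        if key.startswith(SENTRY_PREFIX):
            # Values are URI-encoded on the wire.
            sentry_items[key] = unquote(value.strip())
    return sentry_items


header = (
    "other-vendor-value-1=foo;bar;baz, "
    "sentry-traceid=771a43a4192642f0b136d5159a501700, "
    "sentry-publickey=49d0f7386ad645858ae85020e393bef3, "
    "sentry-userid=Am%C3%A9lie, "
    "other-vendor-value-2=foo;bar;"
)
incoming = baggage_header_to_dict(header)
```

Because third-party entries are dropped at parse time, they cannot leak from an incoming request into outgoing requests.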
Payloads
Dynamic Sampling Context is propagated via a baggage header and sent to Sentry via transaction envelope headers.
Baggage-Header
SDKs may set the following key-value pairs on baggage headers. While all of these values are optional, SDKs should make their best effort to add as many of them to the baggage header as possible when starting a trace.
- sentry-traceid - The original trace ID as generated by the SDK
- sentry-publickey - Public key as defined by the user via the DSN in the SDK options
- sentry-release - The release as defined by the user in the SDK options
- sentry-environment - The environment as defined by the user in the SDK options
- sentry-transaction - The name of the trace's origin transaction in unparameterized (raw) format
- sentry-userid - User ID as set by the user with scope.set_user
- sentry-usersegment - User segment as set by the user with scope.set_user
- sentry-samplerate - Sample rate as defined by the user in the SDK options
SDKs must set all of the keys in the form of "sentry-[name]".
The prefix "sentry-" acts to identify key-value pairs set by Sentry SDKs.
Additionally, we chose [name] to be written in "snake case" without any underscore (_) characters, that is, all lowercase with words joined directly (traceid rather than trace_id). This naming convention is the most language agnostic.
Envelope Header
Dynamic Sampling Context is transferred to Sentry through the transaction envelope headers, keyed by trace.
It corresponds directly to the definition of Trace Context.
When a transaction is reported to Sentry, the Dynamic Sampling Context must be mapped to Trace Context in the following way:
- sentry-release ➝ release
- sentry-environment ➝ environment
- sentry-transaction ➝ transaction
- sentry-userid ➝ user.id
- sentry-usersegment ➝ user.segment
- sentry-samplerate ➝ sample_rate
- sentry-traceid ➝ trace_id
- sentry-publickey ➝ public_key
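As a rough sketch, the mapping above can be applied with a lookup table. The function name baggage_to_trace_context is hypothetical, and reading dotted paths such as user.id as nested objects is an assumption of this example, not something this page specifies. Values are passed through as the strings found in the baggage.

```python
# Mapping from baggage keys to Trace Context paths, as listed above.
BAGGAGE_TO_TRACE_CONTEXT = {
    "sentry-release": "release",
    "sentry-environment": "environment",
    "sentry-transaction": "transaction",
    "sentry-userid": "user.id",
    "sentry-usersegment": "user.segment",
    "sentry-samplerate": "sample_rate",
    "sentry-traceid": "trace_id",
    "sentry-publickey": "public_key",
}


def baggage_to_trace_context(baggage):
    """Map the available DSC values to a Trace Context dict."""
    trace_context = {}
    for baggage_key, path in BAGGAGE_TO_TRACE_CONTEXT.items():
        if baggage_key not in baggage:
            continue  # all DSC values are optional
        *parents, leaf = path.split(".")
        target = trace_context
        for parent in parents:
            # Assumption: "user.id" means {"user": {"id": ...}}.
            target = target.setdefault(parent, {})
        target[leaf] = baggage[baggage_key]
    return trace_context


envelope_headers = {
    "trace": baggage_to_trace_context(
        {
            "sentry-traceid": "771a43a4192642f0b136d5159a501700",
            "sentry-publickey": "49d0f7386ad645858ae85020e393bef3",
            "sentry-userid": "Amélie",
            "sentry-usersegment": "vip",
        }
    )
}
```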
Unified Propagation Mechanism
SDKs should follow these steps for all incoming and outgoing requests (shown in Python pseudo-code for illustrative purposes):
    def collect_dynamic_sampling_context():
        # Placeholder function that collects as many values for Dynamic Sampling Context
        # as possible and returns a dict
        return {}

    def on_incoming_request(request):
        if has_header(request, "sentry-trace") and (not has_header(request, "baggage") or not has_sentry_value_in_baggage_header(request)):
            # Request comes from an old SDK which doesn't support Dynamic Sampling Context yet
            # --> we don't propagate baggage for this trace
            transaction.baggage_locked = True
            transaction.baggage = {}
        elif has_header(request, "baggage") and has_sentry_value_in_baggage_header(request):
            # An upstream SDK already populated the DSC: adopt it unchanged
            transaction.baggage_locked = True
            transaction.baggage = baggage_header_to_dict(request.headers.baggage)

    def on_outgoing_request(request):
        if not transaction.baggage_locked:
            # This SDK is the head of the trace: populate the DSC once, then freeze it
            transaction.baggage_locked = True
            if not transaction.baggage:
                transaction.baggage = {}
            transaction.baggage = merge_dicts(collect_dynamic_sampling_context(), transaction.baggage)
        if has_header(request, "baggage"):
            # Keep what is already on the outgoing header and merge our values in
            outgoing_baggage_dict = baggage_header_to_dict(request.headers.baggage)
            merged_baggage_dict = merge_dicts(outgoing_baggage_dict, transaction.baggage)
            merged_baggage_header = dict_to_baggage_header(merged_baggage_dict)
            set_header(request, "baggage", merged_baggage_header)
        else:
            baggage_header = dict_to_baggage_header(transaction.baggage)
            set_header(request, "baggage", baggage_header)
While there is no strict necessity for the transaction.baggage_locked flag yet, there is a future use case where we need it:
We might want users to be able to set Dynamic Sampling Context values themselves.
The flag becomes relevant after the first propagation, where Dynamic Sampling Context becomes immutable.
When users attempt to set DSC afterwards, our SDKs should make this operation a noop.
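A minimal sketch of how the flag could make later writes a no-op is shown below. Since this use case is only a possibility, everything here is hypothetical: the Transaction class, the set_dsc_value setter, and the lock_baggage hook are invented for illustration and do not describe an existing SDK API.

```python
class Transaction:
    """Toy stand-in for an SDK transaction; only the baggage fields are modeled."""

    def __init__(self):
        self.baggage = {}
        self.baggage_locked = False

    def set_dsc_value(self, key, value):
        """Hypothetical user-facing setter; returns whether the value was stored."""
        if self.baggage_locked:
            # DSC is immutable after the first propagation: silently do nothing.
            return False
        self.baggage[key] = value
        return True

    def lock_baggage(self):
        """Would be called by the SDK on the first propagation."""
        self.baggage_locked = True


transaction = Transaction()
stored_before = transaction.set_dsc_value("sentry-usersegment", "vip")
transaction.lock_baggage()  # first outgoing request happened
stored_after = transaction.set_dsc_value("sentry-usersegment", "free")
```

After the lock, the second call leaves transaction.baggage untouched, so every downstream service keeps seeing the values that were propagated first.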
Considerations
Todo:
- Why baggage and not Trace Context (https://www.w3.org/TR/trace-context/)?
- Why must baggage be immutable before the second transaction has been started?
- Why can't we just make the decision for the whole trace in Relay after the trace is complete?