This repository contains Orchestrator, which performs two main functions:
- Orchestration of monitor runs.
- Forwarding of in-process monitoring data it receives.
The agent is written in Elixir and distributed both as a stand-alone executable and as a container.
Based on monitor configurations maintained on the Metrist backend, Orchestrator will schedule runs of checks for certain monitors. It is configured with an instance id, which allows the backend to keep track of when something was last run, and with one or more "run groups", which allow the backend to decide which monitors and checks are configured to run on that particular instance of Orchestrator.
When a monitor is up for its run, it is downloaded from a Metrist-managed S3 bucket so that the latest version of the monitor is always executed. It is then started, and the monitor is expected to participate in a simple protocol to exchange configuration data and have Orchestrator drive the monitoring code through the configured scenario. For every step, a timing is obtained and Orchestrator sends that back to the Metrist backend.
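The per-step timing described above can be illustrated with a minimal Python sketch. It is not the protocol Orchestrator actually uses with monitors, which this document does not specify. The function name `run_scenario` and the representation of steps as `(name, callable)` pairs are assumptions made for the example.

```python
import time


def run_scenario(steps, clock=time.monotonic):
    """Run each (name, callable) step in order and return [(name, seconds)].

    Hypothetical helper: it only shows the idea of obtaining one timing
    per step; it does not talk to a real monitor or to the backend.
    """
    timings = []
    for name, action in steps:
        start = clock()
        action()  # drive one step of the scenario
        timings.append((name, clock() - start))
    return timings
```

The `clock` parameter is injectable so that the sketch can be exercised with a fake clock; a monotonic clock is used by default because it is not affected by system time adjustments.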
Metrist Orchestrator comes with a handler for in-process monitoring. For every Metrist In-Process Agent (IPA) message it receives, it will try to match the message against its configuration to see whether it needs to be forwarded to the Metrist backend.
IPA messages consist of four fields: the HTTP method, the url, the path, and the time it took for the HTTP transaction to complete. A configuration file can be specified by pointing the environment variable METRIST_CMA_CONFIG to a YAML file with contents similar to this snippet:

    patterns:
      braintree.Transaction:
        method: any
        host: api.*.braintreegateway.com
        url: /transaction$

You can specify as many patterns as you like. The key is in the format "monitor-name.check-name", both of which you can obtain from our web UI.
url can either be left out or, for clarity, be specified as "any", in which case everything matches. All three fields are regular expressions that are matched against the corresponding fields in the IPA message. If they match, the measured value will be sent to the Metrist backend.
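The matching rules above can be sketched in Python as follows. This is an illustration only, not Orchestrator's implementation (which is written in Elixir). It assumes that the message fields are named after the pattern keys (`method`, `host`, `url`), that a missing field behaves like "any" for all three keys, and that the regular expressions are applied as unanchored searches. The names `PATTERNS`, `field_matches` and `match_message` are hypothetical.

```python
import re

# Hypothetical in-memory form of the YAML "patterns" section shown above.
PATTERNS = {
    "braintree.Transaction": {
        "method": "any",
        "host": r"api.*.braintreegateway.com",
        "url": r"/transaction$",
    },
}


def field_matches(pattern, value):
    """A missing pattern or the literal "any" matches everything."""
    if pattern is None or pattern == "any":
        return True
    # Assumption: unanchored search; the real matcher may anchor differently.
    return re.search(pattern, value) is not None


def match_message(patterns, method, host, url):
    """Return the first "monitor-name.check-name" key whose fields all match."""
    fields = (("method", method), ("host", host), ("url", url))
    for key, spec in patterns.items():
        if all(field_matches(spec.get(name), value) for name, value in fields):
            return key
    return None  # no pattern matched, so nothing would be forwarded
```

With this sketch, a `POST` to host `api.sandbox.braintreegateway.com` with a path ending in `/transaction` resolves to `braintree.Transaction`, while a path such as `/transaction/123` does not match because the pattern is anchored at the end with `$`.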
Metrist Orchestrator is configured through environment variables:
- METRIST_INSTANCE_ID - the instance id used for reporting. It can be any logical name, but it should be unique and consistent between runs, as the backend will use it to supply the instance with the timings of the last monitoring runs.
- METRIST_RUN_GROUPS - one or more "run groups" this instance will schedule. When there is more than one, use a comma-separated list. This can be used to have several instances run the same set of monitors.
- METRIST_CLEANUP_ENABLED - if set, a flag that determines whether to run cleanup actions. Monitors can have a "Cleanup" action that usually is there to remove artefacts of previous runs which these runs could not remove themselves (because of a crash or a provider outage, for example). Because these operations can be expensive, it is best to only schedule them on a subset of instances.
- METRIST_SECRETS_SOURCE - when monitors need secrets like API keys, a pointer to the secrets source. Currently only "aws" is supported, which is also the default; it will try to retrieve secrets from AWS Secrets Manager.
- METRIST_CMA_CONFIG - the agent configuration file, currently only used for in-process forwarding patterns as described above.
- METRIST_LOGGING_LEVEL - the level to log at. Usually the "Info" default is fine, but sometimes "Debug" makes sense, and "Error" can be used to make the process less talkative. "Notice", "Warning", "Critical", "Alert" and "Emergency" are also accepted but will usually not make much of a difference and might not be supported by all monitors that also interpret this variable.
- METRIST_IPA_LOOPBACK_ONLY - whether to open the UDP socket for in-process data only on the loopback/localhost address. This can be used to restrict this sort of traffic to the local machine. Off by default, which means that the "wildcard" address is bound, making the UDP socket accessible to all machines that can route to the instance.
- METRIST_IPA_SERVER_PORT - sets a port through which telemetry data is to be sent. The default is port 51712.
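To make the defaults above concrete, here is a minimal Python sketch that reads these variables and derives the UDP bind address. It is not Orchestrator's code. The `Settings` class and `load_settings` function are hypothetical, and several details are assumptions: any non-empty value is treated as "set" for the flag variables, a missing instance id becomes an empty string, and the loopback and wildcard addresses are taken to be the IPv4 ones.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    instance_id: str
    run_groups: list
    cleanup_enabled: bool
    secrets_source: str
    logging_level: str
    ipa_bind_address: str
    ipa_port: int


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables."""

    def flag(name):
        # Assumption: any non-empty value counts as "set".
        return bool(env.get(name, ""))

    raw_groups = env.get("METRIST_RUN_GROUPS", "")
    groups = [g.strip() for g in raw_groups.split(",") if g.strip()]

    return Settings(
        instance_id=env.get("METRIST_INSTANCE_ID", ""),  # assumed fallback
        run_groups=groups,
        cleanup_enabled=flag("METRIST_CLEANUP_ENABLED"),
        secrets_source=env.get("METRIST_SECRETS_SOURCE", "aws"),
        logging_level=env.get("METRIST_LOGGING_LEVEL", "Info"),
        # Loopback only when requested; otherwise the wildcard address.
        ipa_bind_address="127.0.0.1" if flag("METRIST_IPA_LOOPBACK_ONLY") else "0.0.0.0",
        ipa_port=int(env.get("METRIST_IPA_SERVER_PORT", "51712")),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out combinations, for example `load_settings({"METRIST_RUN_GROUPS": "aws, gcp"})`.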