vector
"A lightweight, ultra-fast tool for building observability pipelines" - https://vector.dev
You can think of vector as a replacement for fluentd or fluentbit. It is great for reading inputs, transforming those inputs, and sending them elsewhere, for example reading logs and shipping them.
Examples
Show the supported sources, transforms, sinks
I'm not going to paste them here because the list is long and likely would be different depending on your version, but you can view them via:
vector list
The list as of vector 0.22.0 includes things from aws, gcp, splunk, prometheus, kafka, influxdb, elasticsearch, azure, and more.
Spawn a process and handle its stdout and stderr
One problem with reading stdout and stderr on Linux is that they are two different file handles, so you have to handle them as such. Having a tool that aggregates them back into a single stream, with annotations about which stream each line came from, is great. This example shows how to use vector to spawn a subprocess, remove some fields, and print to stdout:
#!/bin/bash
# Filename: /tmp/stream-test.sh
for _ in {1..5} ; do
  echo "This is stdout"
  echo "This is stderr" >&2
  sleep 0.$(( RANDOM ))
done
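The separate-handles point can be seen without vector at all. Here is a minimal sketch, using a hypothetical emit function as a one-iteration stand-in for the script above. It captures each file descriptor on its own and then annotates the lines by hand. Note that the original interleaving is lost as soon as the streams are captured separately.

```shell
#!/bin/bash
# Hypothetical one-iteration stand-in for /tmp/stream-test.sh.
emit() {
  echo "This is stdout"
  echo "This is stderr" >&2
}

workdir=$(mktemp -d)

# stdout (fd 1) and stderr (fd 2) are separate, so each needs its own redirect.
emit >"$workdir/out.log" 2>"$workdir/err.log"

# Prefix every line with the stream it came from, then concatenate.
annotated=$(
  sed 's/^/[stdout] /' "$workdir/out.log"
  sed 's/^/[stderr] /' "$workdir/err.log"
)
echo "$annotated"

rm -rf "$workdir"
```

The vector config below does the same annotation through the stream field, with the added benefit of handling both streams as they arrive.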
The default config file format is toml, but the example below uses yaml because it is my preference. You can convert between them with dasel.
# Filename: vector.yaml
---
# https://vector.dev/docs/reference/configuration/sources/exec
sources:
  exec:
    command:
      - /tmp/stream-test.sh
    decoding:
      codec: bytes
    mode: streaming
    streaming:
      respawn_on_exit: false
    type: exec
# https://vector.dev/docs/reference/configuration/transforms
transforms:
  remove_exec_fields:
    inputs:
      - exec
    # https://vector.dev/docs/reference/vrl/
    source: |-
      del(.command)
      del(.host)
      del(.source_type)
    type: remap
# https://vector.dev/docs/reference/configuration/sinks/console
sinks:
  print:
    encoding:
      codec: json
    inputs:
      - remove_exec_fields
    type: console
$ vector --config vector.yaml
2022-06-01T21:29:35.914895Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,kube=info"
2022-06-01T21:29:35.915019Z INFO vector::app: Loading configs. paths=["vector.yaml"]
2022-06-01T21:29:35.916968Z INFO vector::topology::running: Running healthchecks.
2022-06-01T21:29:35.917095Z INFO vector: Vector has started. debug="false" version="0.22.0" arch="x86_64" build_id="5e937e3 2022-06-01"
2022-06-01T21:29:35.917138Z INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2022-06-01T21:29:35.917152Z INFO vector::topology::builder: Healthcheck: Passed.
{"message":"This is stderr","pid":2470931,"stream":"stderr","timestamp":"2022-06-01T21:29:35.918778044Z"}
{"message":"This is stdout","pid":2470931,"stream":"stdout","timestamp":"2022-06-01T21:29:35.918821210Z"}
{"message":"This is stderr","pid":2470931,"stream":"stderr","timestamp":"2022-06-01T21:29:36.679150968Z"}
{"message":"This is stdout","pid":2470931,"stream":"stdout","timestamp":"2022-06-01T21:29:36.679193905Z"}
{"message":"This is stderr","pid":2470931,"stream":"stderr","timestamp":"2022-06-01T21:29:36.959284295Z"}
{"message":"This is stdout","pid":2470931,"stream":"stdout","timestamp":"2022-06-01T21:29:36.959315187Z"}
{"message":"This is stdout","pid":2470931,"stream":"stdout","timestamp":"2022-06-01T21:29:37.124459926Z"}
{"message":"This is stderr","pid":2470931,"stream":"stderr","timestamp":"2022-06-01T21:29:37.124598441Z"}
{"message":"This is stderr","pid":2470931,"stream":"stderr","timestamp":"2022-06-01T21:29:37.241035793Z"}
{"message":"This is stdout","pid":2470931,"stream":"stdout","timestamp":"2022-06-01T21:29:37.241074381Z"}
2022-06-01T21:29:37.484711Z INFO vector::shutdown: All sources have finished.
2022-06-01T21:29:37.484751Z INFO vector: Vector has stopped.
Even in the above example you can see how difficult it is to aggregate stdout and stderr in accurate order. In the script, stderr always comes second, but in all but one of these iterations, stderr was handled before stdout. This is not a problem with vector; it is a fundamental POSIX limitation, because stdout and stderr are separate streams. However, vector seems to have a method for handling this when a timestamp shows up in the stream. If I replace echo with date "+%FT%T%z.%N foo" in both streams, they are consistently in order. Of course, another way to handle this is to output logs as structured data with the timestamp right from the source, but you will not always have control over the source log format.
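The timestamp-at-the-source idea can be sketched in plain shell, independent of vector. This is a minimal illustration, not a description of what vector does internally. The sample lines, their timestamps, and the merge_by_timestamp helper are all made up for the example. It relies on the fact that ISO 8601 timestamps with the same UTC offset sort correctly as plain strings.

```shell
#!/bin/bash
# Illustrative sample data: what two separately captured streams might hold
# when every line starts with a timestamp written by the source itself.
stdout_lines='2022-06-01T21:29:35.100000000+0000 stdout first
2022-06-01T21:29:35.300000000+0000 stdout third'
stderr_lines='2022-06-01T21:29:35.200000000+0000 stderr second
2022-06-01T21:29:35.400000000+0000 stderr fourth'

# Hypothetical helper: merge any number of captured streams and restore
# emission order by sorting on the first whitespace-separated field.
merge_by_timestamp() {
  printf '%s\n' "$@" | LC_ALL=C sort -s -k1,1
}

merged=$(merge_by_timestamp "$stdout_lines" "$stderr_lines")
echo "$merged"
```

The merged output interleaves the two streams in the order first, second, third, fourth, even though each stream was captured on its own.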
Another aspect of this setup is that you can use vector as a kind of init system, because you can set sources.exec.streaming.respawn_on_exit = true, which will re-launch the process if it dies for some reason.
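Here is a minimal sketch of that init-style setup, written as a shell snippet that generates the config. The command path /usr/local/bin/my-daemon and the output filename are hypothetical. respawn_interval_secs is an option I believe the exec source supports for spacing out restarts, so verify it against the exec source reference for your version.

```shell
#!/bin/bash
# Write a minimal vector config that keeps a child process running.
# The daemon path and the config filename are hypothetical placeholders.
config_path="${TMPDIR:-/tmp}/vector-respawn.yaml"

cat > "$config_path" <<'EOF'
---
sources:
  exec:
    type: exec
    mode: streaming
    command:
      - /usr/local/bin/my-daemon
    streaming:
      # Re-launch the command whenever it exits.
      respawn_on_exit: true
      # Assumed option: seconds to wait before each re-launch.
      respawn_interval_secs: 5
sinks:
  print:
    type: console
    inputs:
      - exec
    encoding:
      codec: json
EOF

echo "Wrote $config_path"
```

The generated file can then be passed to vector --config in the same way as the earlier example.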
Tap a running vector instance
https://vector.dev/guides/level-up/vector-tap-guide/
Vector has a feature called tap that lets you hook into a running instance and see what is coming through. You can enable this in your vector config via:
# Filename: vector.toml
[api]
enabled = true
Then simply run:
vector tap
This shows pre-transform inputs and outputs, which is useful when you are not seeing the output you expect, because you can see the before and after right next to each other. There are also some further arguments you can pass to vector tap that let you filter so you can see specific inputs or outputs. See vector tap --help for the syntax.
Debug syntax using a repl
https://vector.dev/docs/reference/vrl/
Vector has a REPL feature that can be used for developing configs and debugging. Launch it with vector vrl. Once inside, type help to get guidance on how to proceed.