Logs
CubeAPM supports aggregating logs from a wide range of agents - Elastic (Logstash, Fluent, etc.), Loki, OpenTelemetry, Vector, and more. It can ingest structured as well as unstructured logs.
Each log entry in CubeAPM is stored as an object with string keys and string values, e.g.,
{
"_time": "2024-11-06T13:42:05.234Z",
"service": "order-service",
"host.name": "ip-10-0-129-151",
"host.ip": "10.0.129.151",
"trace_id": "22cba212a1b34d49b009a22fb13b82ee",
"_msg": "order rejected as some items are out of stock"
}
If an incoming log has a nested structure, it is flattened. For example, the incoming object below will be converted to the one above:
{
"_time": "2024-11-06T13:42:05.234Z",
"service": "order-service",
"host": {
"name": "ip-10-0-129-151",
"ip": "10.0.129.151"
},
"trace_id": "22cba212a1b34d49b009a22fb13b82ee",
"_msg": "order rejected as some items are out of stock"
}
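For reference, this flattening behaves like the following Python sketch (a minimal illustration of the concept, not CubeAPM's actual implementation):

def flatten(obj, prefix=""):
    # Recursively flatten nested objects into dot-separated string keys.
    flat = {}
    for key, value in obj.items():
        full_key = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=full_key + "."))
        else:
            flat[full_key] = str(value)  # values are stored as strings
    return flat

# flatten({"host": {"name": "ip-10-0-129-151", "ip": "10.0.129.151"}})
# => {"host.name": "ip-10-0-129-151", "host.ip": "10.0.129.151"}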
The keys in the object are called the fields of the log. A log entry can contain arbitrary fields.
CubeAPM indexes each field in the logs to provide full-text search.
Special fields
Some field names have special significance in CubeAPM. These are described below:
_msg
This field provides support for unstructured logs. For example, an unstructured log entry (or some unstructured part within an otherwise structured log) can be ingested as:
{ "_msg": "this is an unstructured log message" }
If the message field has a name other than _msg, it can be specified via the _msg_field URL query parameter or the Cube-Msg-Field HTTP header in the data ingestion request. For example, if the log message is located in the event.description field, specify the _msg_field=event.description URL query parameter.
The _msg field is shown as the Text column on the CubeAPM Logs search page.
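As an illustration, the Python sketch below sends such a log with the _msg_field query parameter set. The ingestion URL and body format here are placeholders, not CubeAPM's actual endpoint; refer to the Logs Ingestion section for the exact API:

import json
import requests

log = {
    "service": "billing",
    "event": {"description": "payment gateway timed out"},
}

# _msg_field tells CubeAPM to read the log message from event.description.
# Replace the URL below with your actual CubeAPM logs ingestion endpoint.
requests.post(
    "http://cubeapm.example.com/logs/ingest",  # placeholder endpoint
    params={"_msg_field": "event.description"},
    data=json.dumps(log),
)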
_time
If the ingested log contains a field with this name, its value is taken as the timestamp of the log entry; otherwise, the ingestion timestamp is used. The value must be an ISO 8601 or RFC 3339 timestamp, or a Unix timestamp (in seconds or milliseconds).
If the time field has a name other than _time, it can be specified via the _time_field URL query parameter or the Cube-Time-Field HTTP header in the data ingestion request.
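Similarly, a custom time field can be indicated via the HTTP header, as in this sketch (same placeholder endpoint as above):

import json
import requests

log = {
    "event": {"timestamp": "2024-11-06T13:42:05.234Z"},
    "_msg": "user signed in",
}

# Cube-Time-Field tells CubeAPM which field holds the log timestamp.
requests.post(
    "http://cubeapm.example.com/logs/ingest",  # placeholder endpoint
    headers={"Cube-Time-Field": "event.timestamp"},
    data=json.dumps(log),
)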
Partitions
Internally, CubeAPM organizes logs into partitions. Taking the analogy of filesystems, partitions can be thought of as folders. CubeAPM automatically creates partitions based on calendar dates, e.g., all logs of date 2026-01-24 go in one partition, while all logs of date 2026-01-25 go in a different partition.
While querying logs over a specified time range, CubeAPM automatically determines the corresponding partition(s) and looks for data only in those partitions. Hence, querying one hour of data from six months ago is as fast as querying one hour of data from today.
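Conceptually, the mapping from a log entry to its partition is as simple as this sketch (illustrative only; the actual partition naming and layout are internal to CubeAPM):

from datetime import datetime, timezone

def partition_for(epoch_ms):
    # Map a log timestamp to its calendar-date partition.
    day = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).date()
    return day.isoformat()  # e.g., "2026-01-24"

# A query over 2026-01-24 18:00-19:00 only touches the "2026-01-24"
# partition, regardless of how much data exists on other days.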
Streams
Within each partition, data is further organized into multiple streams. Taking the filesystem analogy further: partitions can be thought of as folders, streams as files, and individual log entries as lines in those files.
Like partitions, streams are also created automatically by CubeAPM. But unlike partitions, what constitutes a stream is decided solely by the user. The general guideline is to use streams to group related logs together. For example, all logs with k8s.namespace.name: production AND service.name: order-service AND log.level: ERROR can be grouped together as one stream.
Stream fields
CubeAPM organizes logs into streams based on the values of the log fields that are designated as stream fields. Fields can be designated as stream fields by specifying them in the _stream_fields URL query parameter or the Cube-Stream-Fields HTTP header while sending logs to CubeAPM (see the Logs Ingestion section for more details).
For each log entry, the combination of values of the designated stream fields uniquely identifies the stream to which the log entry belongs. For example, if service.name and log.level are designated as stream fields, then {service.name="order-service", log.level="ERROR"} and {service.name="order-service", log.level="DEBUG"} are different streams.
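As an illustration, the sketch below designates service.name and log.level as stream fields at ingestion time. The comma-separated parameter format, the newline-delimited body, and the ingestion URL are assumptions here; see the Logs Ingestion section for the exact API:

import json
import requests

logs = [
    {"service.name": "order-service", "log.level": "ERROR", "_msg": "out of stock"},
    {"service.name": "order-service", "log.level": "DEBUG", "_msg": "cart updated"},
]

# The two entries share service.name but differ in log.level,
# so they land in two different streams.
requests.post(
    "http://cubeapm.example.com/logs/ingest",  # placeholder endpoint
    params={"_stream_fields": "service.name,log.level"},
    data="\n".join(json.dumps(log) for log in logs),
)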
Organizing logs by streams is not mandatory, but it is highly recommended as it greatly improves both search experience and performance. For example:
- Streams bring logs with similar content together. For example, logs generated by one application will be more similar to each other than logs coming from different applications. Similar content compresses significantly better on disk, and better compression directly translates to better performance and lower cost.
- Referring to the filesystem analogy again, if a query refers to certain streams, CubeAPM can limit the search to the corresponding files only, which speeds up queries significantly.
- CubeAPM also shows stream fields and their corresponding values in the Filters section on the Logs search page, making it very convenient to select streams.
env is always considered a stream field in CubeAPM. So, when using CubeAPM's multi-environment capability, logs belonging to different environments always go into different streams.
Choosing stream fields
The choice of stream fields can have a huge impact on log search experience as well as performance, so it is important to choose them well. Fortunately, this is easy. Here are the guidelines:
- Stream fields should meaningfully group related logs together, so that they are useful in search queries. For example, deployment environment (staging, prod, etc.), application name, and log level are good candidates for stream fields, because they organize logs in a manner that people naturally relate to.
- There should ideally be 3-10 stream fields. Having too few causes CubeAPM to scan lots of data for each query, and having too many clutters the Filters section on the Logs search page. These limits are not rigid though: CubeAPM will work perfectly fine with even 15 stream fields, but 100 is going to cause serious inconvenience.
- Each stream field should have 5-1000 unique values. Having too many (10k+) unique values in any stream field degrades CubeAPM's performance, as it ends up spending too many resources on stream management alone (e.g., keeping indexes up to date). Note that it is fine for a stream to have millions (or even billions) of log entries, so streams should be broad rather than very granular.
Fields with low cardinality (i.e., a small number of distinct values) are good candidates for stream fields, e.g., service.name, service.version, etc. Fields with high cardinality (i.e., a large number of unique values) are bad candidates, e.g., trace_id, ip_address, timestamp, etc. Using high-cardinality fields as stream fields can significantly degrade CubeAPM's performance.
Note that CubeAPM is fast at searching over non-stream fields as well, so high-cardinality fields can be searched even without designating them as stream fields.
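When in doubt, the cardinality of a candidate field can be estimated from a sample of recent logs before designating it as a stream field. A minimal sketch:

def field_cardinality(logs, field):
    # Count distinct values of a field across a sample of log entries.
    return len({log[field] for log in logs if field in log})

# service.name with ~20 distinct values: a good stream field.
# trace_id with millions of distinct values: keep it as a regular
# (still fully indexed and searchable) field.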