Storing Logs
- Setting up a datastore
- Importing a dataset
- Processing multiple log folders
- Listing and querying datasets
By convention, all data generated during a Syskit run is written to that instance's log directory, so everything that needs to be stored about a given execution lives in a single folder.
The simplest way to save a successful mission's data is therefore to copy the whole Syskit log folder.
However, the tools/syskit-log package also offers a way to normalize data in a
datastore. In a datastore, the common data of a Syskit run (i.e. the Syskit event
log, component properties and output ports) is converted into a normalized form,
creating a dataset. Datasets are immutable, are given an immutable ID, and can
safely be copied across machines.
All commands related to stores are under the syskit ds command. See syskit help ds
for a list.
Data export and analysis functionality from syskit-log relies on data being converted to a normalized dataset.
Setting up a datastore
A syskit-log datastore is a plain local folder; creating it is all the setup
required. syskit ds subcommands can be given a datastore explicitly with the
--store option or, preferably, pick up a global datastore from the
SYSKIT_LOG_STORE environment variable.
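As a minimal sketch of the setup described above, the following creates a store folder and makes it the default for the current shell. The STORE_DIR variable and the folder location are hypothetical; only SYSKIT_LOG_STORE comes from syskit-log.

```shell
# Hypothetical location: any local folder can serve as a datastore.
STORE_DIR="${STORE_DIR:-$HOME/syskit-store}"

# A datastore is a plain folder, so creating it is the whole setup.
mkdir -p "$STORE_DIR"

# Make it the default store for later `syskit ds` calls in this shell.
export SYSKIT_LOG_STORE="$STORE_DIR"
```

Adding the export line to your shell profile would make the setting persistent. A store can still be selected for a single call by passing the folder with --store instead.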
Importing a dataset
To import a dataset, copy the data from your system and process it using
syskit ds import. Using rsync, it would look like:
rsync -r --compress REMOTE_URL:/path/to/logs/current .
syskit ds import current "Description of this dataset" \
--tags a list of tags to refer to the dataset later
Within the store, datasets themselves are stored in the core/ folder, under
their full ID. Each dataset has a syskit-dataset.yml file that contains the
identity information for that set (i.e. the hashes of the files that are used to
create the set ID) as well as the Syskit event log. A pocolog folder contains the
output log files, normalized to a single file per stream, named as
TASK_NAME::PORT_NAME.0.log.
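To make the stream naming scheme concrete, here is a small sketch that builds the expected file name for one stream. The stream_log_name helper and the task and port names are hypothetical and only illustrate the TASK_NAME::PORT_NAME.0.log pattern.

```shell
# Hypothetical helper: build the normalized file name for one stream
# from a task name ($1) and a port name ($2).
stream_log_name() {
    printf '%s::%s.0.log\n' "$1" "$2"
}

# Hypothetical task and port names, for illustration only.
stream_log_name imu_driver orientation_samples
```

Such a helper can be handy when checking by hand whether a given stream made it into a dataset's pocolog folder.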
All other files that were contained in the original folder(s) are stored either
in the text/ folder (if they are text files) or in the ignored/ folder.
Processing multiple log folders
Each Syskit run creates a new log folder, so a day of operation often produces several of them. Let's assume you have copied them all into a single (originally empty) local folder with:
rsync -r --compress REMOTE_URL:/path/to/logs/ .
You may decide to import them all separately in a single run using the --auto
option. This will create one dataset per subfolder.
syskit ds import --auto .
Alternatively, all the folders created by the same Syskit app can be imported
and processed together using the --merge option of import. This creates a
single dataset, which can later be analyzed as one.
syskit ds import --merge .
Listing and querying datasets
The syskit ds list command lists all datasets currently present in the store,
ordered by increasing date (oldest first). The command also accepts a QUERY
parameter that restricts which datasets are listed.
The query is a list of keyOPvalue arguments, where key is one of the metadata
keys (as shown by list without arguments) and OP is either = for strict
equality or ~ for matching (in which case value is interpreted as a regular
expression).
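The two operators can be illustrated with a small shell sketch that evaluates one query term against one metadata value. This is not how syskit-log implements queries: query_term_matches is a hypothetical helper, and it assumes that ~ searches anywhere in the value using extended regular expressions (via grep -E), which may differ from the tool's actual regular expression dialect.

```shell
# Hypothetical helper: does the metadata value $2 satisfy the query term $1?
# $1 has the form key=value (strict equality) or key~value (regex match).
query_term_matches() {
    term=$1
    actual=$2
    rest=${term#*[~=]}      # text after the first operator
    key=${term%%[~=]*}      # text before the first operator
    op=${term#"$key"}       # operator followed by the value...
    op=${op%"$rest"}        # ...reduced to the operator alone
    case $op in
        '=') [ "$actual" = "$rest" ] ;;
        '~') printf '%s\n' "$actual" | grep -Eq -- "$rest" ;;
        *) return 2 ;;      # no operator found in the term
    esac
}

# Illustrative timestamp value in the YYYYMMDD-HHMM form.
query_term_matches 'roby:time~202010' '20201015-1030' && echo "match"
```

With strict equality the whole value has to be given, so roby:time=202010 would not match the timestamp above, while the ~ form does.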
Metadata of note is roby:time, which is a timestamp of the form
YYYYMMDD-HHMM. One can for instance show all datasets from October 2020 with:
syskit ds list roby:time~202010
Once you have narrowed down the list of datasets to show, the --pocolog
argument will display all the data streams available within the dataset.
As an alternative to list, find-streams lets you look for specific data streams.
For instance, to look for all /base/Time streams, run:
syskit ds find-streams type=/base/Time
The --ds-filter argument filters datasets the same way list does. For example,
to see all /base/samples/RigidBodyState streams generated during October 2020:
syskit ds find-streams type~/base/samples/RigidBodyState --ds-filter roby:time~202010
Opaque types, and types that are derived from them (e.g. structures that have
opaques as fields), are stored under the type name of the intermediate type and
not the original type name. For instance, /base/samples/RigidBodyState is
actually stored as /base/samples/RigidBodyState_m in the log files. If a call to
find-streams does not return any result, check a single dataset to find out
whether you are referring to the right type.
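One way to sidestep the naming difference is to query with a pattern that accepts both names. The sketch below builds such a query term; type_query is a hypothetical helper, and it assumes that the intermediate type is named by appending _m, as in the example above, and that the ~ operator accepts grouping, ? and $ in its regular expressions.

```shell
# Hypothetical helper: build a find-streams query term that matches a type
# name ($1) with or without the assumed `_m` suffix of its intermediate type.
type_query() {
    printf 'type~%s(_m)?$\n' "$1"
}

# Prints the query term for the example type used above.
type_query /base/samples/RigidBodyState
```

Under those assumptions, the printed term could be passed as the argument of syskit ds find-streams, quoted so that the shell does not interpret the parentheses.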
See syskit ds help find-streams for more details.