Infrastructure Integration
Configuration
- Create a read-only administrator user for Epoch. Administrator privileges are required to collect complete server statistics. In the mongo shell, run:
use admin
db.auth("admin", "admin-password")
db.addUser("epoch", "<Generate Password>", true)
To verify that the user was created, run the following command from the system shell:
echo "db.auth('epoch', '<Generate Password>')" | mongo admin | grep -E "(Authentication failed)|(auth fails)" && echo -e "\033[0;31mepoch user - Missing\033[0m" || echo -e "\033[0;32mepoch user - OK\033[0m"
Refer to the MongoDB documentation if you need to create and manage MongoDB users.
- Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/tokumx.yaml in the collectors. Example:
init_config:

instances:
  # Specify the MongoDB URI, with database to use for reporting (defaults to "admin")
  - server: mongodb://localhost:27017
    # tags:
    #   - optional_tag1
    #   - optional_tag2

    # Optional SSL parameters, see https://github.com/mongodb/mongo-python-driver/blob/2.6.3/pymongo/mongo_client.py#L193-L212
    # for more details
    #
    # ssl: False # Optional (default to False)
    # ssl_keyfile: # Path to the private keyfile used to identify the local
    # ssl_certfile: # Path to the certificate file used to identify the local connection against mongod.
    # ssl_cert_reqs: # Specifies whether a certificate is required from the other side of the connection, and whether it will be validated if provided.
    # ssl_ca_certs: # Path to the ca_certs file
- Verify that all YAML files are valid with the following command:
/etc/init.d/epoch-collectors configcheck
- Restart the Agent using the following command:
/etc/init.d/epoch-collectors restart
- Execute the info command to verify that the integration check has passed:
/etc/init.d/epoch-collectors info
The output of the command should contain a section similar to the following:
Checks
======
[...]
tokumx
------
- instance #0 [OK]
- Collected 8 metrics & 0 events
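When this verification step is scripted (for example, in a provisioning job), the info output can be checked programmatically. The following is a minimal Python sketch, not part of the Epoch tooling: the function name `check_passed` and its parsing rules are assumptions derived only from the sample output above, and the real output may be formatted differently in other versions.

```python
import re

# Sample text copied from the expected output shown above.
SAMPLE_INFO = """\
Checks
======
[...]
tokumx
------
- instance #0 [OK]
- Collected 8 metrics & 0 events
"""


def check_passed(info_text, check_name="tokumx"):
    """Return True if every instance listed under `check_name` is marked [OK]."""
    lines = [line.strip() for line in info_text.splitlines()]
    statuses = []
    in_section = False
    for i, line in enumerate(lines):
        # Assumption: a check section starts with its name followed by a dashed underline.
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if re.fullmatch(r"-{3,}", next_line):
            in_section = (line == check_name)
            continue
        if in_section:
            match = re.match(r"-\s*instance #\d+ \[(\w+)\]", line)
            if match:
                statuses.append(match.group(1))
    # No instances found counts as a failure.
    return bool(statuses) and all(status == "OK" for status in statuses)


print(check_passed(SAMPLE_INFO))
```

In practice, the text would come from the captured standard output of `/etc/init.d/epoch-collectors info` rather than from a hard-coded sample.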
Infrastructure Datasources
Datasource | Available Aggregations | Unit | Description |
---|---|---|---|
tokumx.asserts.msgps | avg max min sum | assertion/second | The number of message assertions raised per second. |
tokumx.asserts.regularps | avg max min sum | assertion/second | The number of regular assertions raised per second. |
tokumx.asserts.rolloversps | avg max min sum | assertion/second | The number of times per second that the rollover counters roll over. The counters roll over to zero every 2^30 assertions. |
tokumx.asserts.userps | avg max min sum | assertion/second | The number of user assertions raised per second. |
tokumx.asserts.warningps | avg max min sum | assertion/second | The number of warnings raised per second. |
tokumx.connections.available | avg max min sum | connection | The number of unused available incoming connections the database can provide. |
tokumx.connections.current | avg max min sum | connection | The number of connections to the database server from clients. |
tokumx.cursors.timedOut | avg max min sum | cursor | The total number of cursors that have timed out since the server process started. |
tokumx.cursors.totalOpen | avg max min sum | cursor | The number of cursors that TokuMX is maintaining for clients. |
tokumx.ft.alerts.checkpointFailures | avg max min sum | event | The number of checkpoints that have failed for any reason. |
tokumx.ft.alerts.locktreeRequestsPending | avg max min sum | request | The number of requests for document-level locks in the locktree that are waiting for other requests to release their locks. |
tokumx.ft.alerts.longWaitEvents.cachePressure.countps | avg max min sum | event/second | Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. |
tokumx.ft.alerts.longWaitEvents.cachePressure.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. |
tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps | avg max min sum | event/second | Rate at which the begin checkpoint phase of a checkpoint has run (this phase should be fairly quick). |
tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. |
tokumx.ft.alerts.longWaitEvents.fsync.countps | avg max min sum | event/second | Rate at which fsync operations took more than 1 second. |
tokumx.ft.alerts.longWaitEvents.fsync.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second. |
tokumx.ft.alerts.longWaitEvents.locktreeWait.countps | avg max min sum | event/second | Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree. |
tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree. |
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps | avg max min sum | event/second | Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. |
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. |
tokumx.ft.alerts.longWaitEvents.logBufferWaitps | avg max min sum | event/second | Rate at which a writing client had to wait more than 100 ms for access to the log buffer. |
tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps | avg max min sum | byte/second | Rate of full evictions of leaf nodes. |
tokumx.ft.cachetable.evictions.full.leaf.clean.countps | avg max min sum | event/second | Rate of full evictions of leaf nodes. |
tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps | avg max min sum | byte/second | Rate of full evictions of leaf nodes that need to be written back to disk. |
tokumx.ft.cachetable.evictions.full.leaf.dirty.countps | avg max min sum | event/second | Rate of full evictions of leaf nodes that need to be written back to disk. |
tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) spent performing full evictions of leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. |
tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps | avg max min sum | byte/second | Rate of full evictions of nonleaf nodes. |
tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps | avg max min sum | event/second | Rate of full evictions of nonleaf nodes. |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps | avg max min sum | byte/second | Rate of full evictions of nonleaf nodes that need to be written back to disk. |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps | avg max min sum | event/second | Rate of full evictions of nonleaf nodes that need to be written back to disk. |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) spent performing full evictions of nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. |
tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps | avg max min sum | byte/second | Rate of partial evictions of leaf nodes. |
tokumx.ft.cachetable.evictions.partial.leaf.clean.countps | avg max min sum | event/second | Rate of partial evictions of leaf nodes. |
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps | avg max min sum | byte/second | Rate of partial evictions of nonleaf nodes. |
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps | avg max min sum | event/second | Rate of partial evictions of nonleaf nodes. |
tokumx.ft.cachetable.miss.countps | avg max min sum | miss/second | Rate of internal cache misses. This metric is similar to MongoDB’s btree misses and page faults. |
tokumx.ft.cachetable.miss.full.countps | avg max min sum | miss/second | Rate of full internal cache misses. |
tokumx.ft.cachetable.miss.full.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss. |
tokumx.ft.cachetable.miss.partial.countps | avg max min sum | miss/second | Rate of partial internal cache misses. |
tokumx.ft.cachetable.miss.partial.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss. |
tokumx.ft.cachetable.miss.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses. |
tokumx.ft.cachetable.size.current | avg max min sum | byte | Total amount of uncompressed data currently in the database's internal cache. |
tokumx.ft.cachetable.size.limit | avg max min sum | byte | Total amount of uncompressed data that will fit in TokuMX’s internal cache. |
tokumx.ft.cachetable.size.writing | avg max min sum | byte | Total size of nodes that are currently queued up to be written to disk for eviction. |
tokumx.ft.checkpoint.begin.timeps | avg max min sum | fraction | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. |
tokumx.ft.checkpoint.countps | avg max min sum | event/second | Rate at which checkpoints are completed. |
tokumx.ft.checkpoint.lastComplete.time | avg max min sum | second | The time spent, in seconds, by the most recently completed checkpoint. |
tokumx.ft.checkpoint.timeps | avg max min sum | fraction | Fraction of time (seconds/second) spent doing checkpoints. |
tokumx.ft.checkpoint.write.leaf.bytes.compressedps | avg max min sum | byte/second | The rate at which leaf nodes are written to disk during checkpoints, after compression. |
tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps | avg max min sum | byte/second | The rate at which leaf nodes are written to disk during checkpoints, before compression. |
tokumx.ft.checkpoint.write.leaf.countps | avg max min sum | write/second | The rate at which leaf nodes are written to disk during checkpoints. |
tokumx.ft.checkpoint.write.leaf.timeps | avg max min sum | fraction | The fraction of time spent writing leaf nodes to disk during checkpoints. |
tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps | avg max min sum | byte/second | The rate at which nonleaf nodes are written to disk during checkpoints, after compression. |
tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps | avg max min sum | byte/second | The rate at which nonleaf nodes are written to disk during checkpoints, before compression. |
tokumx.ft.checkpoint.write.nonleaf.countps | avg max min sum | write/second | The rate at which nonleaf nodes are written to disk during checkpoints. |
tokumx.ft.checkpoint.write.nonleaf.timeps | avg max min sum | fraction | The fraction of time spent writing nonleaf nodes to disk during checkpoints. |
tokumx.ft.compressionRatio.leaf | avg max min sum | fraction | The size ratio of leaf nodes before and after compression. |
tokumx.ft.compressionRatio.nonleaf | avg max min sum | fraction | The size ratio of nonleaf nodes before and after compression. |
tokumx.ft.compressionRatio.overall | avg max min sum | fraction | The size ratio of nodes before and after compression. |
tokumx.ft.fsync.countps | avg max min sum | operation/second | The rate at which the database flushed the operating system’s file buffers to disk. |
tokumx.ft.fsync.timeps | avg max min sum | fraction | The fraction of time (microseconds/second) used to fsync to disk. |
tokumx.ft.locktree.size.current | avg max min sum | byte | Total memory the locktree is currently using. |
tokumx.ft.locktree.size.limit | avg max min sum | byte | Maximum number of bytes that the locktree is allowed to use. |
tokumx.ft.log.bytesps | avg max min sum | byte/second | The rate at which the logger writes to disk. |
tokumx.ft.log.countps | avg max min sum | write/second | The rate of individual log writes. |
tokumx.ft.log.timeps | avg max min sum | fraction | The fraction of time spent performing log writes. |
tokumx.ft.serializeTime.leaf.compressps | avg max min sum | fraction | Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty). |
tokumx.ft.serializeTime.leaf.decompressps | avg max min sum | fraction | Fraction of time spent decompressing leaf nodes after reading them off disk. |
tokumx.ft.serializeTime.leaf.deserializeps | avg max min sum | fraction | Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk. |
tokumx.ft.serializeTime.leaf.serializeps | avg max min sum | fraction | Fraction of time spent serializing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty). |
tokumx.ft.serializeTime.nonleaf.compressps | avg max min sum | fraction | Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty). |
tokumx.ft.serializeTime.nonleaf.decompressps | avg max min sum | fraction | Fraction of time spent decompressing nonleaf nodes after reading them off disk. |
tokumx.ft.serializeTime.nonleaf.deserializeps | avg max min sum | fraction | Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk. |
tokumx.ft.serializeTime.nonleaf.serializeps | avg max min sum | fraction | Fraction of time spent serializing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty). |
tokumx.mem.resident | avg max min sum | mebibyte | The amount of memory currently used by the database process. |
tokumx.mem.virtual | avg max min sum | mebibyte | The amount of virtual memory used by the database process. |
tokumx.metrics.document.deletedps | avg max min sum | document/second | The number of documents deleted per second. |
tokumx.metrics.document.insertedps | avg max min sum | document/second | The number of documents inserted per second. |
tokumx.metrics.document.returnedps | avg max min sum | document/second | The number of documents returned by queries per second. |
tokumx.metrics.document.updatedps | avg max min sum | document/second | The number of documents updated per second. |
tokumx.metrics.getLastError.wtime.numps | avg max min sum | operation/second | The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. |
tokumx.metrics.getLastError.wtime.totalMillisps | avg max min sum | fraction | The fraction of time (ms/s) spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. |
tokumx.metrics.getLastError.wtimeoutsps | avg max min sum | event/second | The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError. |
tokumx.metrics.operation.idhackps | avg max min sum | query/second | The rate of queries that contain the _id field. |
tokumx.metrics.operation.scanAndOrderps | avg max min sum | query/second | The rate of queries that return sorted results but cannot perform the sort operation using an index. |
tokumx.metrics.queryExecutor.scannedps | avg max min sum | operation/second | The rate of index items scanned during queries and query-plan evaluation. |
tokumx.metrics.repl.apply.batches.numps | avg max min sum | operation/second | The number of batches applied across all databases per second. |
tokumx.metrics.repl.apply.batches.totalMillisps | avg max min sum | fraction | The fraction of time (ms/s) spent applying operations from the oplog. |
tokumx.metrics.repl.apply.opsps | avg max min sum | operation/second | The rate at which oplog operations are applied. |
tokumx.metrics.repl.buffer.count | avg max min sum | operation | The number of operations in the oplog buffer. |
tokumx.metrics.repl.buffer.sizeBytes | avg max min sum | byte | The current size of the contents of the oplog buffer. |
tokumx.metrics.repl.network.bytesps | avg max min sum | byte/second | The rate at which data is read from the replication sync source. |
tokumx.metrics.repl.network.getmores.numps | avg max min sum | operation/second | The rate of getmore operations. |
tokumx.metrics.repl.network.getmores.totalMillisps | avg max min sum | fraction | The fraction of time (ms/s) spent collecting data from getmore operations. |
tokumx.metrics.repl.network.opsps | avg max min sum | operation/second | The rate of operations read from the replication source. |
tokumx.metrics.repl.network.readersCreatedps | avg max min sum | process/second | The rate at which oplog query processes are created. |
tokumx.metrics.repl.oplog.insert.numps | avg max min sum | operation/second | The rate at which operations are inserted into the oplog. |
tokumx.metrics.repl.oplog.insert.totalMillisps | avg max min sum | fraction | The fraction of time (ms/s) spent inserting operations into the oplog. |
tokumx.metrics.repl.oplog.insertBytesps | avg max min sum | byte/second | The rate (in bytes) at which data is inserted into the oplog. |
tokumx.metrics.ttl.deletedDocumentsps | avg max min sum | document/second | The rate at which documents are deleted from collections with a TTL index. |
tokumx.metrics.ttl.passesps | avg max min sum | event/second | The number of times per second the background process removes documents from collections with a TTL index. |
tokumx.opcounters.commandps | avg max min sum | command/second | The total number of commands per second issued to the database. |
tokumx.opcounters.deleteps | avg max min sum | operation/second | The number of delete operations per second. |
tokumx.opcounters.getmoreps | avg max min sum | operation/second | The number of getmore operations per second. |
tokumx.opcounters.insertps | avg max min sum | operation/second | The number of insert operations per second. |
tokumx.opcounters.queryps | avg max min sum | query/second | The total number of queries per second. |
tokumx.opcounters.updateps | avg max min sum | operation/second | The number of update operations per second. |
tokumx.opcountersRepl.commandps | avg max min sum | command/second | The total number of replicated commands issued to the database per second. |
tokumx.opcountersRepl.deleteps | avg max min sum | operation/second | The number of replicated delete operations per second. |
tokumx.opcountersRepl.getmoreps | avg max min sum | operation/second | The number of replicated getmore operations per second. |
tokumx.opcountersRepl.insertps | avg max min sum | operation/second | The number of replicated insert operations per second. |
tokumx.opcountersRepl.queryps | avg max min sum | query/second | The total number of replicated queries per second. |
tokumx.opcountersRepl.updateps | avg max min sum | operation/second | The number of replicated update operations per second. |
tokumx.stats.coll.count | avg max min sum | document | The number of objects or documents in this collection. |
tokumx.stats.coll.nindexes | avg max min sum | index | The number of indexes on this collection. |
tokumx.stats.coll.nindexesbeingbuilt | avg max min sum | index | The number of indexes currently being built. |
tokumx.stats.coll.size | avg max min sum | byte | The total size in memory of all records in a collection. Does not include the record header, but does include the record’s padding. Does not include the size of any indexes associated with the collection. |
tokumx.stats.coll.storageSize | avg max min sum | byte | The total amount of storage allocated to this collection for document storage. |
tokumx.stats.coll.totalIndexSize | avg max min sum | byte | The total size of all indexes on this collection. |
tokumx.stats.coll.totalIndexStorageSize | avg max min sum | byte | The total size on disk of all indexes on this collection (after compression). |
tokumx.stats.dataSize | avg max min sum | byte | The total size of the data held in this database including the padding factor. |
tokumx.stats.db.avgObjSize | avg max min sum | byte | The average size of each document. |
tokumx.stats.db.collections | avg max min sum | | The number of collections in the database. |
tokumx.stats.db.dataSize | avg max min sum | byte | The total size of the data held in this database including the padding factor. |
tokumx.stats.db.indexes | avg max min sum | index | The total number of indexes across all collections in the database. |
tokumx.stats.db.indexSize | avg max min sum | byte | The total size of all indexes created on this database. |
tokumx.stats.db.indexStorageSize | avg max min sum | byte | The total size on disk of all indexes created on this database (after compression). |
tokumx.stats.db.objects | avg max min sum | document | The number of documents in the database across all collections. |
tokumx.stats.db.storageSize | avg max min sum | byte | The total amount of space allocated to collections in this database for document storage. |
tokumx.stats.idx.avgObjSize | avg max min sum | byte | The average size of each index entry. |
tokumx.stats.idx.count | avg max min sum | index | The number of documents in this index. |
tokumx.stats.idx.deletes | avg max min sum | operation | The number of delete operations performed on this index. |
tokumx.stats.idx.inserts | avg max min sum | operation | The number of insert operations performed on this index. |
tokumx.stats.idx.nscanned | avg max min sum | index | The number of index entries scanned for queries using this index. |
tokumx.stats.idx.nscannedObjects | avg max min sum | object | The number of collection objects examined after scanning an index entry for a query using this index. |
tokumx.stats.idx.queries | avg max min sum | query | The number of query operations performed using this index. |
tokumx.stats.idx.size | avg max min sum | byte | The total size of this index. |
tokumx.stats.idx.storageSize | avg max min sum | byte | The total size on disk of this index (after compression). |
tokumx.stats.indexes | avg max min sum | index | The total number of indexes across all collections in the database. |
tokumx.stats.indexSize | avg max min sum | byte | The total size of all indexes created on this database. |
tokumx.stats.objects | avg max min sum | document | The number of documents in the database across all collections. |
tokumx.stats.storageSize | avg max min sum | byte | The total amount of space allocated to collections in this database for document storage. |
tokumx.uptime | avg max min sum | second | The time that the TokuMX process has been active. |
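
Several datasources in the table report a fraction of time expressed as microseconds/second, ms/s, or seconds/second. As a worked example of reading such values, the sketch below converts a time-per-second rate into a unitless fraction of wall-clock time. The helper name `busy_fraction` and the sample reading are hypothetical and chosen for illustration only; whether Epoch already normalizes these values before displaying them is not stated in this document.

```python
# Number of each time unit contained in one second.
UNITS_PER_SECOND = {
    "microseconds": 1_000_000,
    "ms": 1_000,
    "seconds": 1,
}


def busy_fraction(rate, unit="microseconds"):
    """Convert a time-per-second rate into a fraction of wall-clock time."""
    return rate / UNITS_PER_SECOND[unit]


# Hypothetical reading: 250,000 microseconds of fsync time per second.
fraction = busy_fraction(250_000, "microseconds")
print(f"{fraction:.2%}")  # prints "25.00%"
```

Read this way, a value of 250,000 microseconds/second for a datasource such as tokumx.ft.fsync.timeps would mean that about a quarter of each second was spent in fsync.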