Infrastructure Integration

Configuration

  1. Create a read-only administrator user for Epoch. Administrator privileges are required to collect complete server statistics. In the mongo shell, run:

    use admin
    db.auth("admin", "admin-password")
    db.addUser("epoch", "<Generate Password>", true)

Then, from a system shell, verify that the user was created:

    echo "db.auth('epoch', '<Generate Password>')" | mongo admin | grep -E "(Authentication failed)|(auth fails)" &&
    echo -e "\033[0;31mepoch user - Missing\033[0m" || echo -e "\033[0;32mepoch user - OK\033[0m"
    

Refer to the MongoDB documentation if you need to create and manage MongoDB users.

  1. Configure the agent by editing /etc/nutanix/epoch-dd-agent/conf.d/tokumx.yaml in the collectors. Example:

    init_config:
    
    instances:
      # Specify the MongoDB URI, with database to use for reporting (defaults to "admin")
      - server: mongodb://localhost:27017
        # tags:
        #   - optional_tag1
        #   - optional_tag2
    
        # Optional SSL parameters, see https://github.com/mongodb/mongo-python-driver/blob/2.6.3/pymongo/mongo_client.py#L193-L212
        # for more details
        #
        # ssl: False # Optional (default to False)
        # ssl_keyfile: # Path to the private keyfile used to identify the local connection against mongod.
        # ssl_certfile: # Path to the certificate file used to identify the local connection against mongod.
        # ssl_cert_reqs: # Specifies whether a certificate is required from the other side of the connection, and whether it will be validated if provided.
        # ssl_ca_certs: # Path to the ca_certs file
    
  2. Check that all YAML files are valid with the following command:

    /etc/init.d/epoch-collectors configcheck
    
  3. Restart the Agent using the following command:

    /etc/init.d/epoch-collectors restart
    
  4. Execute the info command to verify that the integration check has passed:

    /etc/init.d/epoch-collectors info
    

The output of the command should contain a section similar to the following:

    Checks
    ======

      [...]

      tokumx
      ------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events
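
The verification in the last step can also be scripted. The following is a minimal sketch, assuming the info output keeps the layout shown above (a `tokumx` line followed by `- instance #N [STATUS]` lines); the function name `tokumx_check_ok` is hypothetical and not part of the Epoch tooling.

```shell
# tokumx_check_ok (hypothetical helper): reads info output on stdin and
# succeeds only if the tokumx section lists at least one instance and
# every listed instance ends with [OK].
tokumx_check_ok() {
  awk '
    { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }   # trim whitespace
    line == "tokumx" { in_section = 1; next }          # section header
    in_section && line ~ /^- instance #/ {
      seen++
      if (line !~ /\[OK\]$/) bad++                     # any other status
      next
    }
    # A non-empty line that is not a "-" item starts another section.
    in_section && line != "" && line !~ /^-/ { in_section = 0 }
    END { exit !(seen > 0 && bad == 0) }
  '
}

# Usage, on a host where the collectors are installed:
#   /etc/init.d/epoch-collectors info | tokumx_check_ok && echo "tokumx check OK"
```

The function only inspects text, so it can be tried against saved output before being wired into a deployment script.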

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
| --- | --- | --- | --- |
| tokumx.asserts.msgps | avg, max, min, sum | assertion/second | The number of message assertions raised per second. |
| tokumx.asserts.regularps | avg, max, min, sum | assertion/second | The number of regular assertions raised per second. |
| tokumx.asserts.rolloversps | avg, max, min, sum | assertion/second | The number of times that the rollover counters roll over per second. The counters roll over to zero every 2^30 assertions. |
| tokumx.asserts.userps | avg, max, min, sum | assertion/second | The number of user assertions raised per second. |
| tokumx.asserts.warningps | avg, max, min, sum | assertion/second | The number of warnings raised per second. |
| tokumx.connections.available | avg, max, min, sum | connection | The number of unused available incoming connections the database can provide. |
| tokumx.connections.current | avg, max, min, sum | connection | The number of connections to the database server from clients. |
| tokumx.cursors.timedOut | avg, max, min, sum | cursor | The total number of cursors that have timed out since the server process started. |
| tokumx.cursors.totalOpen | avg, max, min, sum | cursor | The number of cursors that TokuMX is maintaining for clients. |
| tokumx.ft.alerts.checkpointFailures | avg, max, min, sum | event | The number of checkpoints that have failed for any reason. |
| tokumx.ft.alerts.locktreeRequestsPending | avg, max, min, sum | request | The number of requests for document-level locks in the locktree that are waiting for other requests to release their locks. |
| tokumx.ft.alerts.longWaitEvents.cachePressure.countps | avg, max, min, sum | event/second | Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. |
| tokumx.ft.alerts.longWaitEvents.cachePressure.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. |
| tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps | avg, max, min, sum | event/second | Rate at which the begin checkpoint phase of checkpoint has run (these should be fairly quick). |
| tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. |
| tokumx.ft.alerts.longWaitEvents.fsync.countps | avg, max, min, sum | event/second | Rate at which fsync operations took more than 1 second. |
| tokumx.ft.alerts.longWaitEvents.fsync.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second. |
| tokumx.ft.alerts.longWaitEvents.locktreeWait.countps | avg, max, min, sum | event/second | Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree. |
| tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree. |
| tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps | avg, max, min, sum | event/second | Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. |
| tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. |
| tokumx.ft.alerts.longWaitEvents.logBufferWaitps | avg, max, min, sum | event/second | Rate at which a writing client had to wait more than 100ms for access to the log buffer. |
| tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps | avg, max, min, sum | byte/second | Rate of full evictions of leaf nodes. |
| tokumx.ft.cachetable.evictions.full.leaf.clean.countps | avg, max, min, sum | event/second | Rate of full evictions of leaf nodes. |
| tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps | avg, max, min, sum | byte/second | Rate of full evictions of leaf nodes that need to be written back to disk. |
| tokumx.ft.cachetable.evictions.full.leaf.dirty.countps | avg, max, min, sum | event/second | Rate of full evictions of leaf nodes that need to be written back to disk. |
| tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) spent performing full evictions of leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. |
| tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps | avg, max, min, sum | byte/second | Rate of full evictions of nonleaf nodes. |
| tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps | avg, max, min, sum | event/second | Rate of full evictions of nonleaf nodes. |
| tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps | avg, max, min, sum | byte/second | Rate of full evictions of nonleaf nodes that need to be written back to disk. |
| tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps | avg, max, min, sum | event/second | Rate of full evictions of nonleaf nodes that need to be written back to disk. |
| tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) spent performing full evictions of nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. |
| tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps | avg, max, min, sum | byte/second | Rate of partial evictions of leaf nodes. |
| tokumx.ft.cachetable.evictions.partial.leaf.clean.countps | avg, max, min, sum | event/second | Rate of partial evictions of leaf nodes. |
| tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps | avg, max, min, sum | byte/second | Rate of partial evictions of nonleaf nodes. |
| tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps | avg, max, min, sum | event/second | Rate of partial evictions of nonleaf nodes. |
| tokumx.ft.cachetable.miss.countps | avg, max, min, sum | miss/second | Rate of internal cache misses. This metric is similar to MongoDB’s btree misses and page faults. |
| tokumx.ft.cachetable.miss.full.countps | avg, max, min, sum | miss/second | Rate of full internal cache misses. |
| tokumx.ft.cachetable.miss.full.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss. |
| tokumx.ft.cachetable.miss.partial.countps | avg, max, min, sum | miss/second | Rate of partial internal cache misses. |
| tokumx.ft.cachetable.miss.partial.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss. |
| tokumx.ft.cachetable.miss.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses. |
| tokumx.ft.cachetable.size.current | avg, max, min, sum | byte | Total amount of uncompressed data currently in the database's internal cache. |
| tokumx.ft.cachetable.size.limit | avg, max, min, sum | byte | Total amount of uncompressed data that will fit in TokuMX’s internal cache. |
| tokumx.ft.cachetable.size.writing | avg, max, min, sum | byte | Total size of nodes that are currently queued up to be written to disk for eviction. |
| tokumx.ft.checkpoint.begin.timeps | avg, max, min, sum | fraction | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. |
| tokumx.ft.checkpoint.countps | avg, max, min, sum | event/second | Rate at which checkpoints are completed. |
| tokumx.ft.checkpoint.lastComplete.time | avg, max, min, sum | second | The time spent, in seconds, by the most recently completed checkpoint. |
| tokumx.ft.checkpoint.timeps | avg, max, min, sum | fraction | Fraction of time (seconds/second) spent doing checkpoints. |
| tokumx.ft.checkpoint.write.leaf.bytes.compressedps | avg, max, min, sum | byte/second | The rate at which leaf nodes are written to disk during checkpoints, after compression. |
| tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps | avg, max, min, sum | byte/second | The rate at which leaf nodes are written to disk during checkpoints, before compression. |
| tokumx.ft.checkpoint.write.leaf.countps | avg, max, min, sum | write/second | The rate at which leaf nodes are written to disk during checkpoints. |
| tokumx.ft.checkpoint.write.leaf.timeps | avg, max, min, sum | fraction | The fraction of time spent writing leaf nodes to disk during checkpoints. |
| tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps | avg, max, min, sum | byte/second | The rate at which nonleaf nodes are written to disk during checkpoints, after compression. |
| tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps | avg, max, min, sum | byte/second | The rate at which nonleaf nodes are written to disk during checkpoints, before compression. |
| tokumx.ft.checkpoint.write.nonleaf.countps | avg, max, min, sum | write/second | The rate at which nonleaf nodes are written to disk during checkpoints. |
| tokumx.ft.checkpoint.write.nonleaf.timeps | avg, max, min, sum | fraction | The fraction of time spent writing nonleaf nodes to disk during checkpoints. |
| tokumx.ft.compressionRatio.leaf | avg, max, min, sum | fraction | The size ratio of leaf nodes before and after compression. |
| tokumx.ft.compressionRatio.nonleaf | avg, max, min, sum | fraction | The size ratio of nonleaf nodes before and after compression. |
| tokumx.ft.compressionRatio.overall | avg, max, min, sum | fraction | The size ratio of nodes before and after compression. |
| tokumx.ft.fsync.countps | avg, max, min, sum | operation/second | The rate at which the database flushed the operating system’s file buffers to disk. |
| tokumx.ft.fsync.timeps | avg, max, min, sum | fraction | The fraction of time (microseconds/second) used to fsync to disk. |
| tokumx.ft.locktree.size.current | avg, max, min, sum | byte | Total memory the locktree is currently using. |
| tokumx.ft.locktree.size.limit | avg, max, min, sum | byte | Maximum number of bytes that the locktree is allowed to use. |
| tokumx.ft.log.bytesps | avg, max, min, sum | byte/second | The rate at which the logger writes to disk. |
| tokumx.ft.log.countps | avg, max, min, sum | write/second | The rate of individual log writes. |
| tokumx.ft.log.timeps | avg, max, min, sum | fraction | The fraction of time spent performing log writes. |
| tokumx.ft.serializeTime.leaf.compressps | avg, max, min, sum | fraction | Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty). |
| tokumx.ft.serializeTime.leaf.decompressps | avg, max, min, sum | fraction | Fraction of time spent decompressing leaf nodes after reading them off disk. |
| tokumx.ft.serializeTime.leaf.deserializeps | avg, max, min, sum | fraction | Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk. |
| tokumx.ft.serializeTime.leaf.serializeps | avg, max, min, sum | fraction | Fraction of time spent serializing leaf nodes and their partitions before writing them to disk (for checkpoint or when evicted while dirty). |
| tokumx.ft.serializeTime.nonleaf.compressps | avg, max, min, sum | fraction | Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty). |
| tokumx.ft.serializeTime.nonleaf.decompressps | avg, max, min, sum | fraction | Fraction of time spent decompressing nonleaf nodes after reading them off disk. |
| tokumx.ft.serializeTime.nonleaf.deserializeps | avg, max, min, sum | fraction | Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk. |
| tokumx.ft.serializeTime.nonleaf.serializeps | avg, max, min, sum | fraction | Fraction of time spent serializing nonleaf nodes and their partitions before writing them to disk (for checkpoint or when evicted while dirty). |
| tokumx.mem.resident | avg, max, min, sum | mebibyte | The amount of memory currently used by the database process. |
| tokumx.mem.virtual | avg, max, min, sum | mebibyte | The amount of virtual memory used by the database process. |
| tokumx.metrics.document.deletedps | avg, max, min, sum | document/second | The number of documents deleted per second. |
| tokumx.metrics.document.insertedps | avg, max, min, sum | document/second | The number of documents inserted per second. |
| tokumx.metrics.document.returnedps | avg, max, min, sum | document/second | The number of documents returned by queries per second. |
| tokumx.metrics.document.updatedps | avg, max, min, sum | document/second | The number of documents updated per second. |
| tokumx.metrics.getLastError.wtime.numps | avg, max, min, sum | operation/second | The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. |
| tokumx.metrics.getLastError.wtime.totalMillisps | avg, max, min, sum | fraction | The fraction of time (ms/s) spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. |
| tokumx.metrics.getLastError.wtimeoutsps | avg, max, min, sum | event/second | The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError. |
| tokumx.metrics.operation.idhackps | avg, max, min, sum | query/second | The rate of queries that contain the _id field. |
| tokumx.metrics.operation.scanAndOrderps | avg, max, min, sum | query/second | The rate of queries that return sorted results but cannot perform the sort operation using an index. |
| tokumx.metrics.queryExecutor.scannedps | avg, max, min, sum | operation/second | The rate of index items scanned during queries and query-plan evaluation. |
| tokumx.metrics.repl.apply.batches.numps | avg, max, min, sum | operation/second | The number of batches applied across all databases per second. |
| tokumx.metrics.repl.apply.batches.totalMillisps | avg, max, min, sum | fraction | The fraction of time (ms/s) spent applying operations from the oplog. |
| tokumx.metrics.repl.apply.opsps | avg, max, min, sum | operation/second | The rate of oplog operations. |
| tokumx.metrics.repl.buffer.count | avg, max, min, sum | operation | The number of operations in the oplog buffer. |
| tokumx.metrics.repl.buffer.sizeBytes | avg, max, min, sum | byte | The current size of the contents of the oplog buffer. |
| tokumx.metrics.repl.network.bytesps | avg, max, min, sum | byte/second | The rate at which data is read from the replication sync source. |
| tokumx.metrics.repl.network.getmores.numps | avg, max, min, sum | operation/second | The rate of getmore operations. |
| tokumx.metrics.repl.network.getmores.totalMillisps | avg, max, min, sum | fraction | The fraction of time (ms/s) spent collecting data from getmore operations. |
| tokumx.metrics.repl.network.opsps | avg, max, min, sum | operation/second | The rate of operations read from the replication source. |
| tokumx.metrics.repl.network.readersCreatedps | avg, max, min, sum | process/second | The rate at which oplog query processes are created. |
| tokumx.metrics.repl.oplog.insert.numps | avg, max, min, sum | operation/second | The rate at which operations are inserted into the oplog. |
| tokumx.metrics.repl.oplog.insert.totalMillisps | avg, max, min, sum | fraction | The fraction of time (ms/s) spent inserting operations into the oplog. |
| tokumx.metrics.repl.oplog.insertBytesps | avg, max, min, sum | byte/second | The rate (in bytes) at which data is inserted into the oplog. |
| tokumx.metrics.ttl.deletedDocumentsps | avg, max, min, sum | document/second | The rate at which documents are deleted from collections with a ttl index. |
| tokumx.metrics.ttl.passesps | avg, max, min, sum | event/second | The number of times per second the background process removes documents from collections with a ttl index. |
| tokumx.opcounters.commandps | avg, max, min, sum | command/second | The total number of commands per second issued to the database. |
| tokumx.opcounters.deleteps | avg, max, min, sum | operation/second | The number of delete operations per second. |
| tokumx.opcounters.getmoreps | avg, max, min, sum | operation/second | The number of getmore operations per second. |
| tokumx.opcounters.insertps | avg, max, min, sum | operation/second | The number of insert operations per second. |
| tokumx.opcounters.queryps | avg, max, min, sum | query/second | The total number of queries per second. |
| tokumx.opcounters.updateps | avg, max, min, sum | operation/second | The number of update operations per second. |
| tokumx.opcountersRepl.commandps | avg, max, min, sum | command/second | The total number of replicated commands issued to the database per second. |
| tokumx.opcountersRepl.deleteps | avg, max, min, sum | operation/second | The number of replicated delete operations per second. |
| tokumx.opcountersRepl.getmoreps | avg, max, min, sum | operation/second | The number of replicated getmore operations per second. |
| tokumx.opcountersRepl.insertps | avg, max, min, sum | operation/second | The number of replicated insert operations per second. |
| tokumx.opcountersRepl.queryps | avg, max, min, sum | query/second | The total number of replicated queries per second. |
| tokumx.opcountersRepl.updateps | avg, max, min, sum | operation/second | The number of replicated update operations per second. |
| tokumx.stats.coll.count | avg, max, min, sum | document | The number of objects or documents in this collection. |
| tokumx.stats.coll.nindexes | avg, max, min, sum | index | The number of indexes on this collection. |
| tokumx.stats.coll.nindexesbeingbuilt | avg, max, min, sum | index | The number of indexes currently being built. |
| tokumx.stats.coll.size | avg, max, min, sum | byte | The total size in memory of all records in a collection. Does not include the record header, but does include the record’s padding. Does not include the size of any indexes associated with the collection. |
| tokumx.stats.coll.storageSize | avg, max, min, sum | byte | The total amount of storage allocated to this collection for document storage. |
| tokumx.stats.coll.totalIndexSize | avg, max, min, sum | byte | The total size of all indexes on this collection. |
| tokumx.stats.coll.totalIndexStorageSize | avg, max, min, sum | byte | The total size on disk of all indexes on this collection (after compression). |
| tokumx.stats.dataSize | avg, max, min, sum | byte | The total size of the data held in this database including the padding factor. |
| tokumx.stats.db.avgObjSize | avg, max, min, sum | byte | The average size of each document. |
| tokumx.stats.db.collections | avg, max, min, sum | | The number of collections in the database. |
| tokumx.stats.db.dataSize | avg, max, min, sum | byte | The total size of the data held in this database including the padding factor. |
| tokumx.stats.db.indexes | avg, max, min, sum | index | The total number of indexes across all collections in the database. |
| tokumx.stats.db.indexSize | avg, max, min, sum | byte | The total size of all indexes created on this database. |
| tokumx.stats.db.indexStorageSize | avg, max, min, sum | byte | The total size on disk of all indexes created on this database (after compression). |
| tokumx.stats.db.objects | avg, max, min, sum | document | The number of documents in the database across all collections. |
| tokumx.stats.db.storageSize | avg, max, min, sum | byte | The total amount of space allocated to collections in this database for document storage. |
| tokumx.stats.idx.avgObjSize | avg, max, min, sum | byte | The average size of each index entry. |
| tokumx.stats.idx.count | avg, max, min, sum | index | The number of documents in this index. |
| tokumx.stats.idx.deletes | avg, max, min, sum | operation | The number of delete operations performed on this index. |
| tokumx.stats.idx.inserts | avg, max, min, sum | operation | The number of insert operations performed on this index. |
| tokumx.stats.idx.nscanned | avg, max, min, sum | index | The number of index entries scanned for queries using this index. |
| tokumx.stats.idx.nscannedObjects | avg, max, min, sum | object | The number of collection objects examined after scanning an index entry for a query using this index. |
| tokumx.stats.idx.queries | avg, max, min, sum | query | The number of query operations performed using this index. |
| tokumx.stats.idx.size | avg, max, min, sum | byte | The total size of this index. |
| tokumx.stats.idx.storageSize | avg, max, min, sum | byte | The total size on disk of this index (after compression). |
| tokumx.stats.indexes | avg, max, min, sum | index | The total number of indexes across all collections in the database. |
| tokumx.stats.indexSize | avg, max, min, sum | byte | The total size of all indexes created on this database. |
| tokumx.stats.objects | avg, max, min, sum | document | The number of documents in the database across all collections. |
| tokumx.stats.storageSize | avg, max, min, sum | byte | The total amount of space allocated to collections in this database for document storage. |
| tokumx.uptime | avg, max, min, sum | second | The time that the TokuMX process has been active. |
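
Some of the datasources listed above are easier to read when combined. As an illustration only, the sketch below derives cache usage as a percentage from tokumx.ft.cachetable.size.current and tokumx.ft.cachetable.size.limit. The function name `cache_fill_percent` and the sample byte values are hypothetical; this is not a metric that the integration reports itself.

```shell
# cache_fill_percent CURRENT_BYTES LIMIT_BYTES (hypothetical helper)
# Prints cache usage as a percentage with one decimal place,
# or "n/a" when the limit is zero or negative.
cache_fill_percent() {
  awk -v cur="$1" -v lim="$2" 'BEGIN {
    if (lim <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * cur / lim
  }'
}

# Hypothetical sample values: 6 GiB cached against an 8 GiB limit.
cache_fill_percent 6442450944 8589934592   # prints 75.0
```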