Term
|
Definition
| Smallest accessible unit in MapR-FS. Volumes are logical partitions of the filesystem. Each volume comprises one or more data containers, plus a metadata container that describes the data in all the other containers |
|
|
Term
| MetaData Container (Logic of location) |
|
Definition
| Container held in each volume that provides information about the data in the other containers in the volume. Keeping this container in the volume ensures the metadata is replicated across all nodes in the volume |
|
|
Term
|
Definition
| A service that runs on all nodes to manage, monitor, and report on the services on each node |
|
|
Term
|
Definition
| A service used to coordinate services running across multiple nodes. ZooKeeper prevents service conflicts by enforcing rules and conditions that determine which instance of a service is the master |
|
|
Term
|
Definition
| Warden will not start a process on ZooKeeper until a quorum of ZooKeeper instances is active |
|
|
Term
| Does MapR require HBase master or region servers? |
|
Definition
| No, if MapR-DB is using only relational files |
|
|
Term
|
Definition
| "Container Location Database" service that runs across multiple cluster nodes and provides a directory of container locations |
|
|
Term
| Can MapR-DB be used for structured and unstructured data? |
|
Definition
| Yes, and it allows both in a single cluster. MapR-DB is available in both the Community and Enterprise editions |
|
|
Term
|
Definition
| MapReduce service. The Hadoop TaskTracker starts and tracks MapReduce tasks on a node. The TaskTracker service receives task assignments from the JobTracker service and manages task execution |
|
|
Term
|
Definition
| YARN Hadoop MapReduce service. Manages node resources and monitors the health of the node. Works with the ResourceManager to manage the YARN containers that run on the node |
|
|
Term
|
Definition
| MapR service. Manages disk storage for MapR-FS and MapR-DB on each node. |
|
|
Term
|
Definition
| Coordinates data storage services among MapR-FS fileserver nodes, MapR NFS Gateways, and MapR clients |
|
|
Term
|
Definition
| MapR service. Provides read-write MapR Direct Access NFS access to the cluster |
|
|
Term
|
Definition
| Provides access to MapR-DB tables via HBase APIs. Required on all nodes that will access table data in MapR-FS. Typically installed on all TaskTracker nodes and edge nodes that access table data. |
|
|
Term
|
Definition
| Hadoop MapReduce management service. Coordinates execution of MapReduce jobs by assigning tasks to TaskTracker nodes and monitoring execution |
|
|
Term
|
Definition
| Hadoop YARN management service. Manages cluster resources. Tracks resource usage and node health |
|
|
Term
|
Definition
| Enables HA and Fault Tolerance for MapR clusters by providing coordination |
|
|
Term
|
Definition
| Manages the region servers that make up HBase table storage. Only required for native Apache HBase applications; not required for MapR-DB |
|
|
Term
|
Definition
| Runs the MapR Control System (MCS) |
|
|
Term
| Metrics (service required on which nodes) |
|
Definition
| Optional. Provides real-time analytics data on cluster and job performance through the Analyzing Job Metrics interface. If used, the Metrics service is required on all JobTracker and WebServer nodes. |
|
|
Term
| Services Typically Running on a MapR Data Node |
|
Definition
| FileServer, TaskTracker, NodeManager |
|
|
Term
| When running multiple NICs on node, is it necessary to bond or trunk together? |
|
Definition
| No. MapR handles multiple NICs transparently |
|
|
Term
|
Definition
| Minimum of 2 instances, master/slave for failover |
|
|
Term
|
Definition
| A majority of nodes (quorum) must be up. Minimum of 3 instances (2 of 3 must be up to function). Should run an odd number of instances; setting up more than 5 is not recommended. |
|
|
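The quorum rule in the card above is simple majority arithmetic, and a short sketch may make the "odd number of instances" advice easier to remember. This is only an illustration: the function names are hypothetical and are not part of any MapR or ZooKeeper API.

```python
def quorum_size(instances: int) -> int:
    """Smallest number of instances that forms a strict majority."""
    return instances // 2 + 1


def has_quorum(instances: int, up: int) -> bool:
    """True when enough instances are up to form a majority."""
    return up >= quorum_size(instances)


def tolerated_failures(instances: int) -> int:
    """How many instances can be down while a majority is still up."""
    return instances - quorum_size(instances)


if __name__ == "__main__":
    # Compare ensemble sizes: instances, majority needed, failures tolerated.
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

With 3 instances, 2 must be up, which matches the "2 of 3" figure in the card. A 4-instance ensemble tolerates the same single failure as a 3-instance one, which is one way to see why an even count adds no fault tolerance.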
Term
| HA for JobTracker, ResourceManager, and HBaseMaster |
|
Definition
| All Active/Standby. If active instance fails, standby takes over. |
|
|
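The Active/Standby behavior in the card above can be pictured with a toy model. This sketch assumes a fixed preference order among instances and made-up node names; it is not how MapR or ZooKeeper actually selects the active instance, only a way to visualize "standby takes over".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ServiceGroup:
    """Toy Active/Standby group: the first healthy instance is active."""
    instances: List[str]                      # hypothetical node names, in preference order
    failed: Set[str] = field(default_factory=set)

    def active(self) -> Optional[str]:
        """Return the current active instance, or None if all have failed."""
        for node in self.instances:
            if node not in self.failed:
                return node
        return None

    def fail(self, node: str) -> None:
        """Mark an instance as failed so the next standby takes over."""
        self.failed.add(node)


if __name__ == "__main__":
    group = ServiceGroup(["node-a", "node-b"])
    print(group.active())   # active instance before any failure
    group.fail("node-a")
    print(group.active())   # standby has taken over
```

In a real cluster the coordination service decides which instance is the master; the fixed ordering here is only a simplification for study purposes.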
Term
|
Definition
| VIPs (virtual IP addresses) can be used for load balancing and HA in NFS, and for providing access to the NFS gateway through a firewall via a load balancer |
|
|