
Introduction

Osmius proposes a distributed architecture that spans the network connecting the systems to be monitored. It is based on collecting events at different points and centralizing them in a single database, so that the operators in charge of monitoring can view and work with them through the GUI.

This section introduces the infrastructure elements. Agents specialize in gathering events for a particular Type of Instance. A Master Agent runs on a given workstation, controls a group of these Agents and sends their events to the Central Server, which stores the information in the database. The events finally reach the Control Panel, which not only displays events and configures the instances to monitor, but also handles the maintenance, configuration and supervision of every element of the infrastructure by sending the specific commands for each task to the Master Agents.

Architecture

This distributed scheme starts when Osmius is installed with a basic configuration: a single Master Agent and its corresponding Agents, all running on one workstation. As many additional Master Agents as needed can be created later. The original Master Agent is required to monitor the performance of the Osmius system itself by means of OsmiusSV, although it can also be used to monitor many other Instances.

Only the root user can carry out all infrastructure maintenance functions (Agents, Master Agents, Instances and Tasks), while any user with sufficient permissions over a set of instances can supervise their correct operation.

Concepts

  • Events: An event is a message returned by an instance in response to an event type. It consists mainly of a value and a text. E.g.: in response to the ‘number of users’ event type, an instance returns an event consisting of a value (112) and a text (112 connected users). More info available in: Events
  • Type of Event: It can be understood as a generic question that a given instance type is able to answer. E.g.: for Oracle type instances (Oracle databases) there is an event type that asks the instance for the number of existing unsaved redo logs. More info available in: Event Types
  • Instances: An instance is a particular element connected to and accessible through the network that Osmius is able to monitor. E.g.: the Apache server of the official Apache website. For more information: Instances
  • Type of Instance: Defines the shared characteristics that an instance must have to belong to that type. E.g.: the Apache instance type includes all Apache servers from version 1.3 onwards that have the mod_info and mod_status modules enabled. More info available in: Instance Types
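
The relationship between these four concepts can be sketched as a small data model. This is only an illustration, not the Osmius schema: the class names, the field names, the `make_event` helper and the `web01` instance are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str  # e.g. "Apache" or "Oracle"


@dataclass(frozen=True)
class EventType:
    name: str                    # the generic "question", e.g. "number of users"
    instance_type: InstanceType  # the type of instance able to answer it


@dataclass(frozen=True)
class Instance:
    name: str                    # hypothetical identifier of the monitored element
    instance_type: InstanceType


@dataclass(frozen=True)
class Event:
    instance: Instance
    event_type: EventType
    value: float                 # numeric part of the answer
    text: str                    # human-readable part of the answer


def make_event(instance: Instance, event_type: EventType,
               value: float, text: str) -> Event:
    """Build an event, rejecting event types that do not apply to the instance."""
    if event_type.instance_type != instance.instance_type:
        raise ValueError(
            f"event type {event_type.name!r} does not apply to "
            f"{instance.instance_type.name} instances"
        )
    return Event(instance, event_type, value, text)


# Usage, following the 'number of users' example from the list above.
apache = InstanceType("Apache")
users = EventType("number of users", apache)
web01 = Instance("web01", apache)  # hypothetical instance name
event = make_event(web01, users, 112, "112 connected users")
```

Tying each event type to one instance type mirrors the idea that an event type is a question a given instance type can answer.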


Osmius Architecture

  • Agents: An Agent is a software process that runs on a machine and obtains Events from a certain Type of Instance in order to monitor its status. The type of an Agent is the same as the Type of Instance it is capable of monitoring. There are therefore Linux, HP-UX, Windows, MySQL, MSQL, HTTP, etc. Agents. An up-to-date list of the Agents offered with the different Osmius distributions can be consulted in the Osmius Official Documentation.

It is important to highlight that an Agent can only be executed through the Master Agent in charge of it. In some cases an Agent must run on the same machine as the instance it monitors, for example the Agents that monitor the operating system. In general, though, an Agent can monitor anything that is reachable on the same network.

An Agent can be in one of several statuses (colors): Green (Started), Grey (Stopped) and Red (Error). Error means that the real status of the Agent is not the desired one; this can occur because of an error (it has stopped, or it has been started, without a direct order from the console) or because the status of the Master Agent it depends on is not the desired one.
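
The status rule above can be read as a comparison between a desired state and a real state. The following is a minimal sketch under that reading; the enum names, the `agent_color` function and the `master_ok` flag are hypothetical and are not taken from the Osmius code.

```python
from enum import Enum


class State(Enum):
    STARTED = "started"
    STOPPED = "stopped"


class Color(Enum):
    GREEN = "green"  # Started
    GREY = "grey"    # Stopped
    RED = "red"      # Error


def agent_color(desired: State, actual: State, master_ok: bool) -> Color:
    """Return the color shown for an Agent.

    desired   -- state last requested from the console
    actual    -- state the Agent is really in
    master_ok -- True when the Master Agent is itself in its desired state
    """
    # Error: the real state differs from the requested one, or the
    # Master Agent the Agent depends on is not in its desired state.
    if actual != desired or not master_ok:
        return Color.RED
    return Color.GREEN if actual is State.STARTED else Color.GREY
```

For example, an Agent that was asked to run but has stopped on its own yields `Color.RED`, and so does a running Agent whose Master Agent is in error.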


When the Master Agent starts or restarts an Agent, the Agent reads a setup file containing its execution parameters and the information about the instances to be monitored.


  • Master Agents: A Master Agent is a software process that runs on a machine and is capable of managing (that is: starting, stopping, changing the setup of, etc.) the Agents that depend on it, which always run on the same machine, and of receiving all the events they obtain and sending them to the Central Server. Master Agents are the only elements that communicate with the Central Server, both to send it events and to receive orders, in the form of tasks, for their own setup and the management of their Agents.

A Master Agent is defined by the host name and IP address of the machine where it runs, so only one Master Agent can run on a given machine. Each Master Agent also has a unique Master Agent code, which identifies it internally in the system and is assigned automatically when the Deployment of a New Master Agent takes place. This code stays the same even if the Master Agent is restarted manually (see Restart the Master Agent).
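
The identity rules above (one Master Agent per machine, a code assigned at deployment that survives restarts) can be sketched as a small registry. Everything here is an assumption made for illustration: the `MasterAgentRegistry` class, its methods, and the `MA0001` code format are not part of Osmius.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class MasterAgent:
    code: str  # unique internal code, assigned once at deployment
    host: str
    ip: str


class MasterAgentRegistry:
    def __init__(self) -> None:
        self._by_machine = {}   # (host, ip) -> MasterAgent
        self._next = count(1)   # source of hypothetical sequential codes

    def deploy(self, host: str, ip: str) -> MasterAgent:
        """Register a new Master Agent; only one is allowed per machine."""
        key = (host, ip)
        if key in self._by_machine:
            raise ValueError(f"a Master Agent already runs on {host} ({ip})")
        agent = MasterAgent(f"MA{next(self._next):04d}", host, ip)
        self._by_machine[key] = agent
        return agent

    def restart(self, host: str, ip: str) -> MasterAgent:
        """A restart returns the existing record, so the code is unchanged."""
        return self._by_machine[(host, ip)]
```

Keying the registry on the host name and IP address pair is what enforces the one-per-machine rule in this sketch.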

A Master Agent can be in one of several statuses (colors): Green (Started), Orange (Paused), Grey (Stopped) and Red (Error). Error means that the real status of the Master Agent is not the desired one; this can occur because of an error (it has been stopped or started without a direct order from the control panel) or because it has pending tasks to execute. Each Master Agent has an associated setup file, which it reads every time it is started or restarted and which specifies its execution parameters and the Agents that must be started or stopped.

  • Master Agent Proxy: A Master Agent Proxy is a normal Master Agent with all its features that can additionally act as a proxy for other Master Agents, which send their events through it. For those Master Agents, the Master Agent Proxy is the “Central Server”: they send their events to it and receive their configuration from it, so from their point of view it makes no difference whether they are communicating with a Master Agent Proxy or with the real Central Server.


  • Tasks: Any operation carried out on the Osmius infrastructure generates a Task in the Server. Each Task is always associated with a single Master Agent. The Server’s Task Administrator processes tasks periodically and sends the appropriate command to the Master Agent that must execute it. The Master Agent executes it locally (on the machine where it is running), producing the expected result on itself or on its Agents. All tasks are based on updating setup files and restarting processes, which makes it possible to cover all system functions with a small set of task types: Update of the Agent Set Up, Update of the Master Agent Set Up, Restart of the Master Agent, Pause a Master Agent, Stop a Master Agent, Consult the Status of the Master Agent and Consult the Status of the Agent.
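
As a rough illustration of this task model, the sketch below lists the seven task types named above and a queue that is drained on each periodic pass. The `TaskAdministrator` class, the `send` callable and the result strings are assumptions; the real Osmius protocol between the Server and a Master Agent is not described here.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional


class TaskType(Enum):
    UPDATE_AGENT_SETUP = auto()
    UPDATE_MASTER_AGENT_SETUP = auto()
    RESTART_MASTER_AGENT = auto()
    PAUSE_MASTER_AGENT = auto()
    STOP_MASTER_AGENT = auto()
    QUERY_MASTER_AGENT_STATUS = auto()
    QUERY_AGENT_STATUS = auto()


@dataclass
class Task:
    task_type: TaskType
    master_agent_code: str        # every task belongs to exactly one Master Agent
    result: Optional[str] = None  # filled in once the Master Agent has answered


class TaskAdministrator:
    def __init__(self, send: Callable[[str, TaskType], str]) -> None:
        # send(master_agent_code, task_type) is a hypothetical transport that
        # delivers the command and returns the Master Agent's response.
        self._send = send
        self._pending = deque()

    def submit(self, task: Task) -> None:
        """Queue a task produced by an order from the control panel."""
        self._pending.append(task)

    def pending_for(self, master_agent_code: str) -> int:
        """Count the tasks still waiting for a given Master Agent."""
        return sum(t.master_agent_code == master_agent_code for t in self._pending)

    def process_pending(self) -> List[Task]:
        """One periodic pass: send each queued command and record its result."""
        done = []
        while self._pending:
            task = self._pending.popleft()
            task.result = self._send(task.master_agent_code, task.task_type)
            done.append(task)
        return done
```

Because tasks wait in the queue until the next pass, the delay between an order and its execution depends on how often `process_pending` is called.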


  • Central Server: The Central Server is formed by a series of processes that run together on one machine and centralize all the actions and all the monitored data:
    Events Administrator: It receives all the events from each of the Master Agents distributed across the infrastructure, correlating them and storing the information in a central database.
    Tasks Administrator: It processes the tasks periodically, sending the pertinent command to the Master Agent and waiting for a response, which is used to update the result of the task. The time that passes between a user giving an order through the control panel (which produces the task) and the associated Master Agent executing it depends, among other things, on the setup of the Tasks Administrator.
    Infrastructure Status Administrator: It periodically verifies the status of each Master Agent to check that it matches the status requested from the control panel, that is, whether it is stopped when it should be running or vice versa. If there is an error, it reports it and asks the Master Agent to correct it if possible. While there are pending tasks for a Master Agent, its status will not be OK until they have been processed and executed by that Master Agent.
    Data Mining Administrator: It periodically calculates the global mark of the system and all the data aggregations needed for later analysis.
    Notifications: It periodically processes the changes in services, instances, SLAs, marks, etc., and sends notifications to the subscribed users by email, SMS, etc.


  • Robustness: The infrastructure allows Osmius to recover from failures that occur in any layer of its distributed architecture:
    Agents: Each Master Agent is responsible for the Agents that depend on it. Master Agents start and stop their Agents, send them new instances to monitor, and restart them if they have problems or have crashed.
    Master Agents: If a Master Agent loses its connection to the Central Server, it works locally for a time specified by a parameter, keeping all the events produced by its Agents. These events are sent when the Central Server is available again. If that time is exceeded without a connection, the Master Agent is paused.
    Central Server: The Central Server can keep functioning without a connection to a Master Agent and, when the connection is recovered, it restarts all Master Agents that were put on hold because of its loss.
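
The Master Agent behaviour during a connection loss can be sketched as a store-and-forward buffer with an offline time limit. This is an assumption-laden illustration: the `MasterAgentBuffer` class, the `max_offline_seconds` parameter name, the `send` callable, and the choice to refuse new events while paused are all hypothetical.

```python
from collections import deque
from typing import Callable, Optional


class MasterAgentBuffer:
    def __init__(self, max_offline_seconds: float) -> None:
        self.max_offline_seconds = max_offline_seconds  # hypothetical parameter name
        self.buffered = deque()                         # events not yet delivered
        self.offline_since: Optional[float] = None      # time of first failed send
        self.paused = False

    def handle(self, event, now: float, send: Callable[[object], bool]) -> None:
        """Accept an event from an Agent and try to forward it.

        send(event) is a hypothetical transport that returns True when the
        Central Server accepted the event and False when it is unreachable.
        """
        if self.paused:
            return  # assumption: a paused Master Agent accepts no new events
        self.buffered.append(event)
        self.flush(now, send)

    def flush(self, now: float, send: Callable[[object], bool]) -> None:
        """Deliver buffered events in order; pause if offline for too long."""
        while self.buffered:
            if send(self.buffered[0]):
                self.buffered.popleft()
                self.offline_since = None
                continue
            if self.offline_since is None:
                self.offline_since = now
            elif now - self.offline_since > self.max_offline_seconds:
                self.paused = True
            return

    def resume(self) -> None:
        """Called when the Central Server restarts the Master Agent."""
        self.paused = False
        self.offline_since = None
```

Events are removed from the buffer only after a successful send, so nothing gathered during the outage is lost before the time limit is reached.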


  • Control Panel: The graphical interface from which all actions in Osmius can be carried out and the entire infrastructure can be monitored through the OsmiusAG Instance and the OsmiusSV Service.

Functionality

 
en/usuario/infraestructura/indice.txt · Last modified: 2010/12/22 13:22 (external edit)
 