
Osmius Architecture

Figure: Osmius architecture

As we now know, Osmius monitors the Instances of our installation by periodically asking for Types of Events. Once the events reach the Central Server, they are correlated and the status of the Instances and Services is updated.

Let’s look at the elements that take part in this process.

Agents

Agents are in charge of periodically executing the actions associated with each Type of Event. They know how to collect, for example, the CPU usage percentage on a Linux machine or the number of users connected to a database.

In Osmius, each agent collects events for only one type of Instance. Therefore, we will have Agents for Oracle or Windows instances, or for Stock Market values, which would normally use the API provided by the manufacturer of each type of instance. Osmius Agents are written in C++ and use the ACE Framework (ADAPTIVE Communication Environment) as well as Osmius's own framework, which makes it possible to reuse nearly all the code when we create a new agent, so that we can concentrate on collecting the new types of events needed.

When an agent starts, it reads its own setup file, parses it and starts monitoring the instances defined there. For each instance defined in the setup file, it reads the configured events with their values and execution periods, and starts monitoring them. Every time it reads a new value and text for a type of event on a specific instance, it creates an event that will be sent to the master agent.
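The startup sequence described above can be illustrated with a small scheduler. The following is a minimal Python sketch, not the agents' actual C++/ACE code: it assumes each event is defined only by its code and its execution period in seconds (the -t value in the setup file), simulates time instead of sleeping, and uses a hypothetical collect callback in place of the real collection logic.

```python
import heapq
from typing import Callable, Dict, List, Tuple

def run_schedule(periods: Dict[str, int], horizon: int,
                 collect: Callable[[str, int], None]) -> None:
    """Call collect(event_code, t) each time an event is due before horizon.

    Time is simulated (no sleeping): every event first runs at t = 0, when
    the agent starts monitoring, and then again every `period` seconds.
    """
    queue: List[Tuple[int, str]] = [(0, code) for code in periods]
    heapq.heapify(queue)  # earliest due event always at queue[0]
    while queue and queue[0][0] < horizon:
        t, code = heapq.heappop(queue)
        collect(code, t)  # hypothetical hook: read a value, build an event
        heapq.heappush(queue, (t + periods[code], code))

fired: List[Tuple[int, str]] = []
run_schedule({"OSPRCCPU": 300, "OSPRCMEM": 600, "OSUPTIME": 300}, 900,
             lambda code, t: fired.append((t, code)))
print(fired)
```

With these periods and a 900-second horizon, OSPRCCPU and OSUPTIME are collected at 0, 300 and 600 seconds and OSPRCMEM at 0 and 600 seconds. A priority queue keeps the next due event at the front, so the loop does not need to scan every event on each tick.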

Figure: Architecture of an Agent and its relationship with the other processes

A Linux Agent setup file looks like this:

[OSMIUS_AGENT]

#Use Error event criticity when no connection occurs instead of Critical. [0-Use 
ERRCON = 0
#Local Port for listening to commands from Master Agent
PORTCM = 11982
#Timeout in seconds for network operations. Don't change
TIMOUT = 130

[OSMIUS_INSTANCES\OSM_Host]

# Instance Type
TYPE = LINUX001
# Connection string  used by the agent to connect to the instance
CONNECTION_INFO = No
[OSMIUS_INSTANCES\OSM_Host\EVENTS]
	OSNUMPRC = -t 3600 -c 0 -w 300 -a 500 -T "" 
	OSPRCCPU = -t 300 -c 0 -w 80 -a 95 -T "" 
	OSPRCMEM = -t 600 -c 0 -w 80 -a 90 -T "" 
	OSPRCSWP = -t 3600 -c 0 -w 10 -a 30 -T ""  -s 
	OSPRCUFS = -t 3600 -c 0 -w 80 -a 95 -T ""  -s -L "/"
	OSUPTIME = -t 300 -c 1 -w 500 -a 300 -T ""

The first section contains the agent's generic parameters (for example, TIMOUT = 130, the timeout in seconds for network operations), and the second section contains the setup for the instance called OSM_Host, of instance type LINUX001.

It is followed by the setup of the events to be monitored. For example, the CPU usage percentage (OSPRCCPU) is checked every 5 minutes (-t 300); we will receive a warning if it is over 80 (-w 80) and a critical alarm if it is over 95 (-a 95).
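The event definitions can be read with a short parser. This Python sketch is an illustration under assumptions, not Osmius's own parser: it treats each definition as a list of command-line style switches and uses -w and -a as the warning and critical thresholds described above. It also assumes that -c selects the comparison direction (0: higher values are worse, 1: lower values are worse); that reading of -c is only inferred from OSUPTIME, whose warning threshold (500) is above its critical one (300). The meaning of -s and -L is not covered here.

```python
import shlex
from typing import Dict, Optional

def parse_event_options(spec: str) -> Dict[str, Optional[str]]:
    """Split an event definition such as '-t 300 -w 80 -a 95' into switches.

    A switch followed by a token that does not start with '-' takes that token
    as its value; otherwise it is a bare flag with value None (e.g. -s).
    """
    tokens = shlex.split(spec)  # honours the quotes in -T "" and -L "/"
    opts: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and not nxt.startswith("-"):
            opts[tokens[i]] = nxt
            i += 2
        else:
            opts[tokens[i]] = None
            i += 1
    return opts

def classify(value: float, opts: Dict[str, Optional[str]]) -> str:
    """Return OK, WARNING or CRITICAL for a collected value.

    ASSUMPTION: -c 1 means lower values are worse (as OSUPTIME's thresholds
    suggest); any other -c value means higher values are worse.
    """
    warn = float(opts["-w"])
    alarm = float(opts["-a"])
    if opts.get("-c") == "1":
        if value < alarm:
            return "CRITICAL"
        if value < warn:
            return "WARNING"
    else:
        if value > alarm:
            return "CRITICAL"
        if value > warn:
            return "WARNING"
    return "OK"

cpu = parse_event_options('-t 300 -c 0 -w 80 -a 95 -T ""')
print(cpu["-t"], classify(85, cpu), classify(97, cpu))  # 300 WARNING CRITICAL
uptime = parse_event_options('-t 300 -c 1 -w 500 -a 300 -T ""')
print(classify(200, uptime), classify(400, uptime))  # CRITICAL WARNING
```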

Osmius agents can also work separately from the rest of Osmius, that is, with no master agent supervising them. In “Stand Alone” mode, they send their events through a script, which can in turn send an email or connect to another monitoring system.

You can find the complete Osmius agents handbook here.

Master Agents

Each Master Agent controls and manages a set of Osmius Agents on its own server. Master Agents are in charge of:

  • Receiving events from the Agents and forwarding them over the network (using a secure protocol) to the Central Server for processing.
  • Starting and stopping the Osmius Agents at the request of the Central Server.
  • Processing the tasks they receive from the Central Server: starting and stopping agents, changing the events or instances to be monitored, and updating an agent's setup.
  • Monitoring the status of the Agents and reporting any failure or malfunction, in the same way as they monitor the rest of the instances.
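The task processing listed above can be pictured as a dispatch table that maps each kind of task to a handler. In this minimal Python sketch the task names (START, STOP, UPDATE_SETUP) and the MasterAgent class are hypothetical, since this page does not describe the real task format, and starting or stopping an agent is only recorded in a log instead of launching processes.

```python
from typing import Callable, Dict, List, Set

class MasterAgent:
    """Toy master agent that applies tasks received from the Central Server.

    Starting and stopping are only recorded in self.log; a real master agent
    would launch or stop the agent processes on its server.
    """

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self.setups: Dict[str, str] = {}
        self.log: List[str] = []
        # Hypothetical task names mapped to their handlers.
        self._handlers: Dict[str, Callable[[str, str], None]] = {
            "START": self._start,
            "STOP": self._stop,
            "UPDATE_SETUP": self._update_setup,
        }

    def process(self, action: str, agent: str, payload: str = "") -> bool:
        """Apply one task; return False for an unknown action."""
        handler = self._handlers.get(action)
        if handler is None:
            self.log.append(f"rejected {action} for {agent}")
            return False
        handler(agent, payload)
        return True

    def _start(self, agent: str, _payload: str) -> None:
        if agent not in self.running:
            self.running.add(agent)
            self.log.append(f"start {agent}")

    def _stop(self, agent: str, _payload: str) -> None:
        if agent in self.running:
            self.running.remove(agent)
            self.log.append(f"stop {agent}")

    def _update_setup(self, agent: str, payload: str) -> None:
        self.setups[agent] = payload
        self.log.append(f"setup {agent}")
        if agent in self.running:  # restart so the agent re-reads its setup file
            self._stop(agent, "")
            self._start(agent, "")

master = MasterAgent()
master.process("START", "APACHE01")
master.process("UPDATE_SETUP", "APACHE01", "[OSMIUS_AGENT]\nTIMOUT = 130\n")
master.process("REBOOT", "APACHE01")  # unknown task, rejected
print(master.log)
```

Keeping the handlers in a table means a new kind of task only needs a new entry and method, while unknown tasks are rejected explicitly instead of being silently ignored.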

In short, Master Agents gather all the management tasks needed to maintain the monitoring infrastructure. This way, the Agents only do what they have to do, “monitor instances and send events”, and large infrastructures become easier and more flexible to manage.

With a single Master Agent and a selection of Osmius Agents we can monitor many instances remotely, although in some cases the agents will need to be installed locally on every server.

Figure: Architecture of a Master Agent and its relationship with the other Osmius processes

The setup file of a Master Agent looks like the following:

[OSMIUS_MASTER_AGENT]

#Master Agent Unique Code
CODMST = MASTER01
#Osmius server IP
IPSRV1 = 127.0.0.1
#Osmius Server Message receiver Port
PORTS1 = 2000
#Master Host IP used to accept commands
IPADCM = 127.0.0.1
#Master Port used to accept commands from the Central Server
PORTCM = 1970
#Server IP Address to send Master Agent commands
IPSDCM = 127.0.0.1
#Server Port Address to send Master Agent commands
PORSCM = 1971
#Master Port used to receive messages and events from our agents
PORTAG = 1950
#Timeout in seconds for network operations. Don't change
TIMOUT = 160
#Debug all commands sent to the master agent
DEBUGA = 0

[OSMIUS_AGENTS\APACHE01]

# Show debug info 0-No 1-Yes
DEBINF = 1
# Start this agent when the master starts working 0-No 1-Yes
STARTA = 1
# Additional command line if needed
ACMDLN = 

The first section holds the general parameters of the Master Agent; the second holds the parameters of a specific agent (APACHE01 here), such as whether it should be started when the Master Agent starts (STARTA) and whether it runs in debug mode (DEBINF).
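To decide which agents to launch at startup, a master agent only needs the per-agent sections of its setup file. The following Python sketch reads a shortened copy of that file with the standard configparser module and returns the agents marked STARTA = 1. It is an illustration rather than the Master Agent's actual implementation, and the ORACLE01 section is a hypothetical second agent added only to show the filtering.

```python
import configparser
from typing import List, Tuple

MASTER_SETUP = r"""
[OSMIUS_MASTER_AGENT]
#Master Agent Unique Code
CODMST = MASTER01
#Osmius server IP
IPSRV1 = 127.0.0.1

[OSMIUS_AGENTS\APACHE01]
# Show debug info 0-No 1-Yes
DEBINF = 1
# Start this agent when the master starts working 0-No 1-Yes
STARTA = 1
# Additional command line if needed
ACMDLN =

[OSMIUS_AGENTS\ORACLE01]
# Hypothetical second agent, not started automatically
DEBINF = 0
STARTA = 0
ACMDLN =
"""

AGENT_PREFIX = "OSMIUS_AGENTS\\"

def agents_to_start(text: str) -> List[Tuple[str, bool, str]]:
    """Return (agent code, debug mode?, extra command line) for STARTA = 1."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the upper-case keys exactly as written
    parser.read_string(text)
    result = []
    for section in parser.sections():
        if not section.startswith(AGENT_PREFIX):
            continue  # skip [OSMIUS_MASTER_AGENT] and any other section
        agent = parser[section]
        if agent.getint("STARTA", fallback=0) == 1:
            result.append((section[len(AGENT_PREFIX):],
                           agent.getint("DEBINF", fallback=0) == 1,
                           agent.get("ACMDLN", fallback="")))
    return result

print(agents_to_start(MASTER_SETUP))  # [('APACHE01', True, '')]
```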

Central Server

The Central Server receives the events from every Agent in the monitoring infrastructure we have set up in our installations, and it also sends the Master Agents the tasks they need to process.

In fact, the Central Server is composed of a set of processes:

  • Message Manager: receives and correlates the events, then writes them to the Database, updating the status of Instances and Events.
  • Task Manager: ensures that the information in the Database is propagated to the different setup files, so that the instances are monitored as defined by the Osmius Administrators through the Control Panel.
  • State Manager: checks the status of each Master Agent.
  • Data Mining Manager: performs periodic data mining calculations.
  • Notification Manager: keeps track of who is subscribed and to what, and sends them notifications by the most convenient channel (e-mail, SMS, etc.).
  • Discovery Manager: discovers new instances of any type.
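This page does not spell out the correlation rules the Message Manager applies. As a hedged illustration only, the following Python sketch uses one common rule, "worst state wins": an instance takes the most severe state among its latest events, and a service takes the most severe state among its instances. The three severity levels, the DB01 instance and the "Web shop" service are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple

class Severity(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2

def worst(states: Iterable[Severity]) -> Severity:
    """Most severe state; OK when there is nothing to aggregate."""
    return max(states, default=Severity.OK)

def correlate(latest: Dict[str, Dict[str, Severity]],
              services: Dict[str, List[str]]
              ) -> Tuple[Dict[str, Severity], Dict[str, Severity]]:
    """latest[instance][event_code] is the last state received for that event;
    services[name] lists the instances that make up the service."""
    instances = {name: worst(events.values()) for name, events in latest.items()}
    # Instances with no events yet are treated as OK in this sketch.
    svc = {name: worst(instances.get(i, Severity.OK) for i in members)
           for name, members in services.items()}
    return instances, svc

latest = {
    "OSM_Host": {"OSPRCCPU": Severity.WARNING, "OSPRCMEM": Severity.OK},
    "DB01": {"OSUPTIME": Severity.OK},  # hypothetical instance
}
instances, svc = correlate(latest, {"Web shop": ["OSM_Host", "DB01"]})
print(instances["OSM_Host"].name, svc["Web shop"].name)  # WARNING WARNING
```

Real correlation can be richer (for example, weighting instances by their role in a service's SLA), but a worst-of rule is the simplest aggregation that never hides a problem.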

Figure: Architecture of the Osmius Central Server

These processes act as an interface between the Database, which is updated by users' actions in the Control Panel, and the whole infrastructure of Master Agents and Osmius Agents.

The configuration of these processes looks like this:

[OSMIUS_CENTRAL_SERVER]

# IP Address in which listen for incoming messages from Master Agents
# If 0.0.0.0 server will listen in all available interfaces.
IPMAMS  = 192.168.1.2
# Local port in which listen for incoming messages from Master Agents. Default 2001.
PORTMS  = 2001
# Time out for network and queue operations in seconds.
TIMOUT  = 120
# Maximum number of retries to connect to server.
RECONN  = 4
# Osmius Repository Database Parameters.
DBUSER  = osmius
DBPASS  = osmius
DBPORT  = 3306
DBNAME  = osmius
DBHOST  = localhost

[OSMIUS_TASK_MANAGER]

# IP Address used to accept commands.
IPSDCM  = 192.168.1.2
PORSCM  = 2002
# Maximum number of retries to connect to server. 
RECONN  = 4
# Osmius Repository Database Parameters.
DBNAME  = osmius
DBHOST  = localhost 
DBPORT  = 3306
DBUSER  = osmius
DBPASS  = osmius
# Number of retries to process one task.
RETRIS  = 50
# Interval Time to process tasks
TIMTSK = 10
# Number of tasks to process in one interval
NUMTSK = 10
# Print all commands sent and received or not
DEBUGA = 0

[OSMIUS_STATE_MANAGER]

# IP Address of the state manager.
IPSTMG  = 192.168.1.2
#Osmius server IP
IPSRV1 =  192.168.1.2 
#Osmius Server Message receiver Port Default 2001
PORTS1 =  2001
# Maximum number of retries to connect to server.
RECONN  = 4
# Osmius Repository Database Parameters.
DBNAME  = osmius
DBHOST  = localhost
DBPORT  = 3306
DBUSER  = osmius
DBPASS  = osmius
# Number of retries to process one task.
RETRIS  = 50
# Time to Data Warehouse calculations
TIMDWH = 300
# Interval Time to calculate global note
TIMGLN = 10
# Interval Time to test infrastructure state
TIMLIF = 300
# Number of masters to process in one test interval
NUMLIF = 100
# Print all commands sent and received or not
DEBUGA = 0

[OSMIUS_NOTIFI_MANAGER]

# Maximum number of retries to connect to server.
RECONN  = 4
# Osmius Repository Database Parameters.
DBNAME  = osmius
DBHOST  = localhost
DBPORT  = 3306
DBUSER  = osmius
DBPASS  = osmius
# Number of hours in which a notification is valid.
NHOURS  =  8
# Interval Time to process notifications
TIMNTF  = 60
DEBUGA  = 0

[OSMIUS_DISCOVERY_MANAGER]

#Server Unique Code
CODSVR = SERVER01
# Osmius Repository Database Parameters.
DBNAME  = osmius
DBHOST  = localhost
DBPORT  = 3306
DBUSER  = osmius
DBPASS  = osmius
# Timeout to abort the discovery process
TIMOUT = 1800
# Script to search the net finding alive hosts
SCANNT = nmap.sh
# Print all commands sent and received or not
DEBUGA = 0
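The NHOURS parameter of the Notification Manager states how many hours a notification remains valid. The following minimal Python sketch applies that rule, assuming that notifications older than NHOURS are simply not sent (the page does not say what happens to them) and that the manager would run this filter every TIMNTF seconds. The notification texts and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

NHOURS = 8  # [OSMIUS_NOTIFI_MANAGER]: hours a notification stays valid

def valid_notifications(pending: List[Tuple[str, datetime]], now: datetime,
                        nhours: int = NHOURS) -> List[str]:
    """Return the texts of notifications created less than nhours before now.

    ASSUMPTION: expired notifications are simply dropped instead of being sent.
    """
    limit = timedelta(hours=nhours)
    return [text for text, created in pending if now - created < limit]

now = datetime(2024, 1, 1, 12, 0)
pending = [
    ("OSPRCCPU warning on OSM_Host", now - timedelta(hours=2)),
    ("OSPRCUFS alarm on OSM_Host", now - timedelta(hours=9)),
]
print(valid_notifications(pending, now))  # ['OSPRCCPU warning on OSM_Host']
```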

Database

The Osmius Database stores all kinds of information:

  • Events received from the Agents. This is very valuable information: properly exploited, it tells us how the instances behave, which events are the most active, when we tend to have more or fewer problems, on which servers we have agents, which instances have resource problems, etc.
  • Inventory of Instances. Which instances we monitor, sorted by type, together with their connection configuration and the setup of every event we want to monitor.
  • Service Inventory. How the instances are grouped into Services with their SLAs, making it possible to take full advantage of all the information from a business-oriented point of view.
  • Infrastructure. Which agents we have and where. How the notifications are defined and in what ways the system can warn us.
  • Data Warehouse. Historical information about status changes and availability which, together with the information above, makes it possible to build Decision Support processes.
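Availability figures of this kind can be derived from the history of status changes. The sketch below assumes a hypothetical history format, a time-sorted list of (timestamp in seconds, is_up) pairs, and computes the percentage of a time window during which the instance was up; it is not the actual Osmius Data Warehouse calculation.

```python
from typing import List, Tuple

def availability(changes: List[Tuple[int, bool]], start: int, end: int) -> float:
    """Percentage of [start, end) during which the instance was available.

    `changes` holds (timestamp, is_up) status changes sorted by time; the
    first entry must be at or before `start` so the initial state is known.
    """
    up_time = 0
    for i, (t, is_up) in enumerate(changes):
        # Each state lasts until the next change (or the end of the window),
        # clipped to [start, end).
        seg_start = max(t, start)
        seg_end = changes[i + 1][0] if i + 1 < len(changes) else end
        seg_end = min(seg_end, end)
        if is_up and seg_end > seg_start:
            up_time += seg_end - seg_start
    return 100.0 * up_time / (end - start)

# One day in which the instance was down from 3600 s to 4464 s (864 s).
day = [(0, True), (3600, False), (4464, True)]
print(availability(day, 0, 86400))  # 99.0
```

Here 864 seconds of downtime out of 86,400 is exactly 1%, which gives 99.0% availability for the day.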

Management Control Panel

Any user with a browser (preferably Firefox) and the appropriate permissions can connect to the Control Panel and start monitoring and managing the entire infrastructure.

The Osmius Control Panel is a Java web application, built on several frameworks, that runs on a Tomcat server connected to the database. When a user changes the setup of an element, the change is stored in the database and, if necessary, tasks are generated for the Central Server processes; these tasks may change the behavior of an agent on a remote server.
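The [OSMIUS_TASK_MANAGER] parameters suggest how these generated tasks are consumed: every TIMTSK seconds, up to NUMTSK tasks are processed, and each task may be retried RETRIS times. The following Python sketch illustrates one processing interval under those assumptions. The Task fields, the deliver callback and the decision to move a task to a failed list after its last attempt are hypothetical, since the page does not describe them.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

NUMTSK = 10  # tasks processed per interval ([OSMIUS_TASK_MANAGER])
RETRIS = 50  # attempts allowed per task before giving up

@dataclass
class Task:
    task_id: int
    master: str  # hypothetical: code of the target Master Agent
    attempts: int = 0

def process_interval(queue: Deque[Task], deliver: Callable[[Task], bool],
                     failed: List[Task], numtsk: int = NUMTSK,
                     retris: int = RETRIS) -> int:
    """Run one TIMTSK interval: try up to numtsk queued tasks once each.

    Failed tasks go back to the end of the queue until they have been tried
    `retris` times, then move to `failed`. Returns the number delivered.
    """
    delivered = 0
    for _ in range(min(numtsk, len(queue))):
        task = queue.popleft()
        task.attempts += 1
        if deliver(task):
            delivered += 1
        elif task.attempts >= retris:
            failed.append(task)
        else:
            queue.append(task)  # retry in a later interval
    return delivered

queue = deque([Task(1, "MASTER01"), Task(2, "MASTER02"), Task(3, "MASTER01")])
failed: List[Task] = []
reachable = lambda task: task.master == "MASTER01"  # MASTER02 is "down"
print(process_interval(queue, reachable, failed, retris=2))  # 2
print(process_interval(queue, reachable, failed, retris=2))  # 0
print([t.task_id for t in failed])  # [2]
```

Because the number of iterations is fixed at the start of the interval, a task that fails is not retried again within the same interval, which spreads its RETRIS attempts over time.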

 
en/usuario/arquitectura.txt · Last modified: 2010/12/22 13:22 (external edit)