The Managed Resource Interface
This chapter introduces the Managed Resource Interface (MRI). It looks at the different aspects involved in the information, function, organization and communication models, and then gives an example covering parts of the information and organization models for a typical telecom application. Readers not familiar with Erlang are advised to read Appendix A before continuing. An example of the MRI from a real project is included in Appendix B.

The MRI is formed by a set of functions that are generic to all applications. These functions take and return proprietary data structures that describe a specific managed entity within the application. Together, the data structures and the generic functions provide the infrastructure needed to manage the application.

The MRI portrays the network element (NE) as a relational database in which each table contains a set of record types. Each record type has a set of fields that together form a unique key within that table. All other fields store data specific to that instance. The record type, together with the key, identifies a unique physical or logical managed entity.

A client using the MRI can retrieve information in two ways. It can either retrieve the complete list of keys associated with a specific table, or use a specific key to retrieve a single record instance. The client is only allowed to write elements that do not affect the logical state of the NE. All state changes of a logical element are achieved through a set of actions specific to that managed element. These actions act on the network element, resulting in the creation, update or deletion of the record describing the managed entity.

The MRI also allows spontaneous messages originating in the network element to be forwarded to clients. To receive these notifications, a client has to subscribe through the MRI, providing a callback function.

The communication model

If the MRI is used as a proprietary protocol directly towards the element manager, the Erlang distribution can serve as a simple communication model. When a standardized management protocol is placed as an extra layer on top of the MRI, no separate communication model is needed, since the protocol provides its own. The model based on Erlang distribution is a connection-oriented protocol running over TCP/IP [1].
The only security provided is based on Erlang cookies: an Erlang node must know the cookies of other nodes in order to communicate with them. This might be sufficient for networks behind firewalls [2], but open networks need security extensions. One possibility is to incorporate encryption in the Erlang distribution model; the crypto library module provides such support. If the distribution model is not extended, security has to be implemented on top of the MRI as an independent agent [3], which could handle user sessions, passwords and access rights for different types of users. Another possibility is to use the Secure Socket Layer provided with OTP. Socket and port communication would then be used between the management system, often called the element manager (EM), and the processor controlling the network element, often called the Central Processor (CP). Communication servers would convert the Erlang terms to binaries, transfer them between the nodes, and convert them back to the original terms. Built-in functions for these conversions are provided. If the properties and security provided by the Erlang
distribution are sufficient, the most common means for an element manager to access the network element would be Erlang remote procedure calls. These allow the element manager to call the exported functions of the MRI almost transparently and to receive the results as if the functions had been executed on the Erlang node running the element manager [4]. The MRI data types can then be used directly in the element manager, avoiding conversions and unnecessary layering.
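A minimal sketch of the element manager side, assuming the MRI functions are exported from a module named mri on the CP node. Both that module name and the em_client helper are hypothetical. rpc:call/4 returns {badrpc, Reason} when the call cannot be completed, for example when the CP node is down or the remote function crashes.

```erlang
-module(em_client).
-export([mri_call/3]).

%% Hypothetical helper running on the element manager (EM) node. It calls an
%% exported MRI function on the CP node through an Erlang remote procedure
%% call. The remote module name 'mri' is an assumption for illustration.
mri_call(CpNode, Function, Args) ->
    case rpc:call(CpNode, mri, Function, Args) of
        {badrpc, Reason} ->
            %% The CP node is unreachable, or the call failed on the CP.
            {error, {badrpc, Reason}};
        Result ->
            %% MRI records come back as ordinary Erlang terms, so the EM
            %% can use them directly without a conversion layer.
            Result
    end.
```

For example, em_client:mri_call('cp@host', get_keys, [Record]) would return the key lists as if get_keys/1 had been called locally. The node name is made up.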
Message passing between servers could also be used. It would, however, require an internal communication system (ICS) between the nodes. The ICS could be implemented with generic server or finite state machine behaviors on the CP and EM sides, forwarding messages and keeping track of which links are up. The Erlang distribution allows the EM Erlang node to monitor the CP Erlang nodes it controls through the monitor_node/2 BIF, which delivers a {nodedown, Node} message as soon as communication with a monitored node is interrupted. The notification is generated regardless of whether the TCP/IP connection goes down, the Erlang runtime system stops because of a power failure or a software crash [5], or some other unforeseen cause occurs.

Location transparency is another key feature, hiding the location of the CP from the EM. As long as the CP is connected to the same network partition as the EM, no high-level knowledge of its location or IP address is necessary [6]. The CP can operate regardless of the underlying hardware and operating system, allowing hybrid systems running different operating systems to be connected together. The system can also be scaled at runtime: new CPs and EMs can be added without restarting the system, and no additions are needed to allow several EMs to access one CP.

The Information model – Function definition

The functions provided within the information model of the MRI provide mechanisms for inspecting and changing the state of the managed entities. These functions are the most basic in the whole system, and it should not be possible to achieve the same state change through different sets of operations. Complex operations, and functionality specific to standardized protocols, are built on top of the MRI based on this model.

Getting Data

Data is accessed through the get_info/1 function call.
The only parameter to this function is a record of the type representing the table of the managed entity from which the data is to be retrieved. In this record, only the keys need to be instantiated; all other fields are ignored. The CP extracts the keys and uses them to retrieve the values that make up the MRI record, either from the CP databases or by reading them from a hardware device. These values are formatted in accordance with the information model, and an MRI record with all its fields set is returned. If the requested instance is not represented in the CP, an error tuple is returned.
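The lookup performed by get_info/1 can be sketched as follows. The board record, its fields, the {error, instance} error tuple and the map that stands in for the CP's internal tables are all assumptions. To stay self-contained, the sketch takes that map as a second argument instead of reading CP databases or hardware.

```erlang
-module(mri_get_info).
-export([get_info/2]).

%% Hypothetical MRI record: rack and slot are the keys (default value 'key'),
%% type and oper_state hold data, and mgt holds management-only data.
-record(board, {rack = key, slot = key, type, oper_state, mgt}).

%% Only the key fields of the incoming record are used; all other fields are
%% ignored. CpData stands in for the CP tables and maps {Rack, Slot} to
%% {Type, OperState, Mgt}.
get_info(#board{rack = Rack, slot = Slot}, CpData) ->
    case maps:find({Rack, Slot}, CpData) of
        {ok, {Type, OperState, Mgt}} ->
            %% Instantiate the MRI record on demand, with every field set.
            #board{rack = Rack, slot = Slot, type = Type,
                   oper_state = OperState, mgt = Mgt};
        error ->
            %% The instance is not represented in the CP.
            {error, instance}
    end.
```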
To read a specific record, its keys first have to be known. They are retrieved through the get_keys/1 function call. This call also takes an MRI record as its parameter, but only the record type is taken into account; all fields in the record are ignored. The function returns a list of lists, one per instance. Each inner list contains the key values of that instance, in the same order as the key fields are defined in the record.
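A corresponding sketch of get_keys/1, under the same kind of assumptions: a hypothetical board record whose keys are rack and slot, and a map standing in for the CP tables. Sorting the result is an extra choice made here to get a stable order.

```erlang
-module(mri_get_keys).
-export([get_keys/2]).

%% Hypothetical MRI record with the two key fields rack and slot.
-record(board, {rack = key, slot = key, type, oper_state, mgt}).

%% Only the record type of the first argument matters; its field values are
%% ignored. CpData stands in for the CP tables and maps {Rack, Slot} to
%% board data. Each instance yields one list with its keys in
%% record-definition order (rack before slot).
get_keys(#board{}, CpData) ->
    [[Rack, Slot] || {Rack, Slot} <- lists:sort(maps:keys(CpData))].
```

A client would typically call get_keys/1 first and then build one record per key list, for example {board, Rack, Slot, undefined, undefined, undefined}, to pass to get_info/1.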
Setting Data

Data that can be set through the MRI is divided into two groups: management data and logical CP data. Management data has no meaning to the CP and does not affect its logical behavior. The CP just stores it in a table, mapped to an existing MRI record. Logical CP data, on the other hand, affects the logical state of the CP when set through an action. Setting such data often involves an operation or query on a hardware device, or a logical check of the data being passed, and can therefore fail with error codes other than instance, the error returned when the instance does not exist.

Setting Management Data

Management systems are supposed to be stateless and are thus not allowed to maintain their own tables. For this reason, every record defined in the information model of the MRI has a field called mgt. This field stores data that is specific to the managed entity denoted by the record and is only needed and used by the management system. It might include strings displayed to the operator, identifiers denoting the physical location of the hardware, or standardized protocol information needed to send or retrieve other data.
In the set operation, an MRI record in which only the keys and the mgt field are set is passed to the call. These fields are extracted and stored by a subsystem in the CP, and the contents of mgt are never inspected. The function returns the MRI record with all its fields set. If the record instance does not exist, an error tuple is returned. Whenever an action results in the record being deleted, the contents of its mgt field are deleted as well.

There are cases where a restart [7] of the system is necessary, either due to software or hardware updates, or because of a bug or hardware malfunction. The information model describes which managed entities are persistent and need to survive such a restart. The contents of their management fields should also survive these restarts and be included in any later retrieval of the MRI records, without the data having to be set again. Management fields associated with volatile MRI records do not need to survive a restart, as these entities are recreated after it. If the mgt field is never set, it has the default value undefined (an atom).

Actions

Actions in the MRI are the only means to create, delete or manipulate record instances. They have to be atomic transactions, either succeeding and updating the internal CP state, or failing and leaving it untouched. Actions often involve operations on existing hardware, and ensuring their validity requires extensive logical checks on the current state of the system.
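The all-or-nothing requirement on actions can be illustrated with a sketch in which the CP state is an immutable Erlang map. An action either returns a new state or hands back the old one unchanged. The board record, the initial oper_state of a new board, the error atoms and the explicit State argument are assumptions. A real CP would more likely keep its state in a server process or use database transactions, for example mnesia:transaction/1.

```erlang
-module(mri_action_atomic).
-export([action/3]).

%% Hypothetical MRI record; rack and slot are the keys.
-record(board, {rack = key, slot = key, oper_state, mgt}).

%% State maps {Rack, Slot} to the board's operational state. Because Erlang
%% terms are immutable, a failed check simply returns the old State, which
%% gives the required succeed-or-leave-untouched behavior.
action(create, #board{rack = R, slot = S}, State) ->
    case maps:is_key({R, S}, State) of
        true ->
            %% Logical check failed: the CP state is left untouched.
            {{error, instance_exists}, State};
        false ->
            %% Success: return the fully instantiated record and new state.
            {#board{rack = R, slot = S, oper_state = disabled},
             State#{{R, S} => disabled}}
    end;
action(delete, #board{rack = R, slot = S}, State) ->
    case maps:is_key({R, S}, State) of
        %% A deleted instance no longer exists, so only 'ok' is returned.
        true  -> {ok, maps:remove({R, S}, State)};
        false -> {{error, instance}, State}
    end.
```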
The action/2 function call takes two parameters. The first is an atom denoting the action, which can be anything from create or delete to configure or disable. The second, an MRI record, denotes the managed entity the action is applied to. The keys always have to be set. If the action needs one or more other fields of the record, they are specified in the information model and must be set in the record being passed.

Notification of events

So far, the functionality described has only dealt with the flow of information from the management system towards the CP. The MRI is also able to send notifications originating in the CP to the management system. These notifications usually concern alarms or events raised in the system, but can be used for other purposes as well. Any application in the Erlang node can subscribe to these notifications by giving the name of a callback module, a function, and a list of arguments. Whenever a notification is generated, all subscriber functions are called, with the notification passed as one of the arguments.
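Subscription and dispatch can be sketched as below. The registry is kept as a plain list to keep the example self-contained. Placing the notification first in the argument list is an assumption, since the text only says it is passed as one of the arguments. The module name mri_notify is hypothetical.

```erlang
-module(mri_notify).
-export([subscribe/4, notify/2]).

%% Add a {Module, Function, Args} subscriber to a hypothetical registry kept
%% as a plain list.
subscribe(Module, Function, Args, Subscribers)
  when is_atom(Module), is_atom(Function), is_list(Args) ->
    [{Module, Function, Args} | Subscribers].

%% Call every subscriber with the notification prepended to its own
%% arguments, and return the results in subscriber order.
notify(Notification, Subscribers) ->
    [apply(M, F, [Notification | A]) || {M, F, A} <- Subscribers].
```

In a running CP the subscriber list would more likely live in a registered process, for example a gen_server, so that any application in the node can subscribe.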
The functions defined above are the same in all systems using the MRI, regardless of the functionality supported. What differentiates the various subsystems are the MRI definitions, in which the semantics of all system-specific data models are defined. The MRI model requires a detailed description of a set of records, each representing a managed entity or part of one. These records are then associated with a set of actions that change the state of the managed entities when executed on them.

Defining records

MRI records contain the data needed by the management system for a specific managed entity. All possible types that may be assigned to a field must be specified. Fields may contain any valid Erlang term, allowing related data to be grouped together in tuples, lists or other records. For values consisting of integers, real numbers or strings, it is usually sufficient to state the type, unless upper and lower bounds defined by some other standard exist. For atoms, all possible values have to be specified. Process identifiers and references, although allowed, should be avoided and whenever possible replaced with counters and static process names. The type definitions of the record fields, along with any upper and lower bounds, are usually given as Erlang comments in the include file where the records are defined.

A key in an MRI record may consist of one or more fields. Key fields are identified by setting their default value to the atom key in the record definition. The set of keys of an MRI record type must be unique and identify exactly one instance. Records without keys may exist, resulting in exactly one such record instance on every CP. It is common to describe the managed entities hierarchically: a record has a set of keys, and the records on the next level contain the same set of keys together with an extra one.
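The convention of marking key fields with the default value key makes it possible to find the keys of a record at run time. The sketch below uses two hypothetical records, shelf and slot, where slot repeats the shelf keys and adds one more, as in the hierarchical description above. record_info/2 is resolved at compile time, so each record name must appear literally in the code.

```erlang
-module(mri_keys).
-export([key_fields/1, keys_of/1]).

%% Hypothetical hierarchical records: a slot is identified by the keys of
%% its shelf plus one extra key. Key fields have the default value 'key'.
-record(shelf, {rack = key, shelf = key, mgt}).
-record(slot,  {rack = key, shelf = key, slot = key,
                admin_state = unmanaged, mgt}).

%% Names of the key fields, found by inspecting the record defaults.
key_fields(shelf) -> marked(record_info(fields, shelf), #shelf{});
key_fields(slot)  -> marked(record_info(fields, slot), #slot{}).

%% Key values of an instantiated record, in definition order.
keys_of(#shelf{} = Record) -> values(Record, #shelf{});
keys_of(#slot{} = Record)  -> values(Record, #slot{}).

%% Element 1 of a record tuple is its name, so fields start at position 2.
marked(Fields, Default) ->
    [Field || {Field, Pos} <- lists:zip(Fields, lists:seq(2, tuple_size(Default))),
              element(Pos, Default) =:= key].

values(Record, Default) ->
    [element(Pos, Record) || Pos <- lists:seq(2, tuple_size(Default)),
                             element(Pos, Default) =:= key].
```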
It is also common to abstract wherever possible, creating a generic record that describes different managed entities of a specific type. If these managed entities have properties that do not fit the abstract record definition, extensions can be defined. Extensions may either be included as data structures in the record, or exist as an additional MRI record sharing the same keys and life span.

The actual MRI records should not be stored in the CP. They should be instantiated only when needed, by copying the data from other internal CP records or by reading it from hardware devices. When converting from one record type to another, a one-to-one mapping between the CP and the MRI records should be kept wherever possible. The CP record contains the MRI-specific fields and, in addition, other fields used only in the CP. These fields are ignored in the mapping and never transferred. The same applies to the layer above the MRI, where the MRI records are mapped to another record, extracting the data stored in the management field. This target record would probably be the internal format used by the element manager, or by the subsystem providing the translation to the standardized management protocol in use.

Defining records correctly is the key to successfully implementing an MRI. The records should follow the same principles used when defining and optimizing tables in a relational database: data should not be replicated, and records should be divided up and optimized with regard to generic and specific data. If a mapping between primary and secondary keys exists, it can be placed in an independent record defined solely for that purpose.

Defining Actions

MRI action definitions deal with changing the internal CP state, yielding a high-level definition of how the system should be configured and managed. Executing these actions triggers events that change the state of the software and hardware.
Actions may be general to a family of MRI records, or specific to a single one. Common generic actions include create, delete and configure. Action definitions specify which fields in the MRI records are used by the action; all other fields are ignored. Abnormal behaviors and side effects should also be documented. Abnormal behaviors might include switchovers, where standby equipment becomes active and active equipment becomes standby. This is done in one operation, by passing just a system node record as an argument to the action. The side effects, however, involve numerous boards in the active and standby tables: the records representing the hardware change without an operation being applied directly to those managed instances.

Successful actions executed through the MRI return either the atom ok [8] or a fully instantiated copy of the MRI record involved. If an error occurs, an error tuple is returned. The possible errors are defined together with the actions. Some error types are general to many managed entities and could include instance, instance_exists or hw_error. Others are specific to certain managed objects and are returned only within the subset of the MRI in which the actions are defined.

Actions should be atomic, and no two actions that depend on each other should be combined. In telecom systems, an example is setting the management status of a slot to unmanaged, which means that any hardware inserted in, or already in, this slot will be ignored. Disabling a slot has the precondition that if a board is inserted in the slot, the board has to be disabled as well. Boards can be enabled and disabled regardless of whether the operator wants to disable the slot, and a slot's management status may be disabled regardless of whether a board is inserted. It would thus be incorrect to combine disabling the board and disabling the slot in one operation.
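The separation between the two operations can be sketched with two hypothetical actions: disable on a board record and unmanage on a slot record. The record layouts, the error atom board_enabled and the map used as CP state are assumptions made for illustration. Unmanaging the slot fails while its board is enabled, so the operator has to disable the board first with a separate action.

```erlang
-module(mri_slot_board).
-export([action/3]).

%% Hypothetical records: a slot and the board inserted in it share keys.
-record(slot,  {rack = key, slot = key, mgt_status, mgt}).
-record(board, {rack = key, slot = key, oper_state, mgt}).

%% State maps {Rack, Slot} to {MgtStatus, BoardState}, where BoardState is
%% the atom 'none' when the slot is empty. Each action changes one entity
%% only. The mgt fields are left at their default for brevity.
action(disable, #board{rack = R, slot = S}, State) ->
    case maps:find({R, S}, State) of
        {ok, {MgtStatus, BoardState}} when BoardState =/= none ->
            {#board{rack = R, slot = S, oper_state = disabled},
             State#{{R, S} := {MgtStatus, disabled}}};
        _ ->
            {{error, instance}, State}
    end;
action(unmanage, #slot{rack = R, slot = S}, State) ->
    case maps:find({R, S}, State) of
        {ok, {_MgtStatus, enabled}} ->
            %% Precondition violated: the board must be disabled first.
            %% The dependency is documented only through this error.
            {{error, board_enabled}, State};
        {ok, {_MgtStatus, BoardState}} ->
            {#slot{rack = R, slot = S, mgt_status = unmanaged},
             State#{{R, S} := {unmanaged, BoardState}}};
        error ->
            {{error, instance}, State}
    end.
```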
The only documentation of such a dependency between slot and board should be the error returned when the slot is disabled while the board is still enabled. Care should also be taken to avoid inserting fields in MRI records that determine the type of action. Assume an MRI board record has a state field taking the values active, inactive and create. Defining an action set_state and relying on the value of that field to either create the board or set its state to active or inactive is inappropriate [9], because the state field will never have the value create when the record is retrieved. One should instead have two actions, create and set_state, or preferably go a step further and define the actions create, activate and deactivate, without relying on the record field at all. The MRI record then tells what is being created, activated or deactivated, making the code more readable and the information model more straightforward.

Alarms & Other Events

The information model contains a list of all alarms and events supported by the system. These alarm definitions contain the originator type, the category and the severity. This specification is needed to simplify the mapping from the MRI to other standardized management protocols and to ensure that the mapping is correct. The type system in Erlang allows alarms and state changes to be treated in the same manner, the only difference being that an active alarm list is kept for alarms that have not been cleared. All alarms and state changes are encapsulated in the same notification record and forwarded to the subscribing functions. Alarms and events stored in the log, as well as active alarms, use a unique index as key, which in turn is retrieved with the get_keys operation.

The MRI was built to follow a manager-agent paradigm. This paradigm can easily be adapted to fulfill the specifications required by other standardized management protocols. Some protocols involve a few specialized network elements, each with its own requirements. This can easily be achieved by defining a specific information model for that functionality. The management system will then know which network element provides which functionality and execute the corresponding MRI operations. In the case of homogeneous network elements, the solution is even simpler. Other functionality that can be implemented with the MRI is a central processor distributed over several Erlang nodes and boards, providing fault tolerance, reliability and high availability.

No functional model has been defined for the MRI. Defining functionality for provisioning, performance and fault management, accounting, billing and security in the model would have greatly complicated the mapping of the MRI to other standardized management protocols. The complexity belonging to the functional model is instead contained in the information model. This does not, however, imply that a functional model cannot exist. An MRI functional model can be defined in terms of MRI records and actions. MRI records and actions have in fact been reused in different systems, resulting in the first de facto functional model [see Appendix B, A Managed Resource Interface Example].
[1] Other protocols can be used if implemented, but no other implementation exists today.
[2] Messages are currently sent unencrypted over the network.
[3] The agent can run on the same Erlang system as the Central Processor.
[4] This would also allow the element manager to execute on the same Erlang node as the CP.
[5] A crash of the Erlang virtual machine, not of the Erlang code it executes.
[6] The default network configuration of the underlying OS is sufficient.
[7] Often referred to as hot and cold restarts.
[8] Used only if the action is delete, as no instance of the managed object will remain in the CP.
[9] This is a common mistake made by people who have worked extensively with SNMP.