Storage Performance - OpenStack Architecture Design Guide.pdf

When considering the performance of OpenStack Block Storage, hardware and architecture choices are important. Block Storage can use enterprise backend systems such as NetApp or EMC, scale-out storage such as GlusterFS and Ceph, or simply the capabilities of directly attached storage in the nodes themselves. Block Storage may be deployed so that traffic traverses the host network, which could affect, and be adversely affected by, the front-side API traffic performance. As such, consider using a dedicated data storage network with dedicated interfaces on the Controller and Compute hosts.

When considering the performance of OpenStack Object Storage, a number of design choices will affect performance. A user's access to Object Storage is through the proxy services, which typically sit behind hardware load balancers. Because a highly resilient storage system replicates its data, replication traffic affects the performance of the overall system. For this reason, 10 GbE (or better) networking is recommended throughout the storage network architecture.
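
To see why replication drives the bandwidth recommendation, a rough estimate can help. The following is a minimal sketch, assuming a replica count of three and that every object written through the proxy is sent once to each replica; the replica count, the ingest rate, and both function names are illustrative assumptions, not values from this guide.

```python
def replication_bandwidth_gbps(ingest_gbps: float, replicas: int = 3) -> float:
    """Approximate storage-network traffic generated by client writes.

    Assumes each written object is sent once to every replica and ignores
    background replication, auditing, and protocol overhead.
    """
    if ingest_gbps < 0 or replicas < 1:
        raise ValueError("ingest_gbps must be >= 0 and replicas must be >= 1")
    return ingest_gbps * replicas


def link_utilization(traffic_gbps: float, link_gbps: float = 10.0) -> float:
    """Fraction of a single link consumed by the given traffic."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return traffic_gbps / link_gbps


# Hypothetical example: 2 Gbit/s of client writes with three replicas.
write_traffic = replication_bandwidth_gbps(2.0, replicas=3)
print(write_traffic, link_utilization(write_traffic))
```

Under these assumptions, a modest client write rate already consumes a large share of a 10 GbE link, which is consistent with the recommendation above.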

Availability

In OpenStack, the infrastructure is integral to providing services and should always be available, especially when operating with SLAs. Ensuring network availability is accomplished by designing the network architecture so that no single point of failure exists.

The number of switches and routes, and the redundancy of power, should be factored into the core infrastructure design, as should the bonding of network interfaces to provide diverse routes to your highly available switch infrastructure.

The OpenStack services themselves should be deployed across multiple servers that do not represent a single point of failure. Ensuring API availability can be achieved by placing these services behind highly available load balancers that have multiple OpenStack servers as members.
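
The behavior expected from a load balancer pool with multiple API servers as members can be sketched as follows. This is a simplified illustration, not how any particular load balancer is implemented; the class name, the member names, and the round-robin policy are all assumptions made for the example.

```python
class ApiPool:
    """Round-robin selection over the healthy members of a pool."""

    def __init__(self, members):
        self.members = list(members)
        self.healthy = set(self.members)
        self._index = 0

    def mark_down(self, member):
        """Record that a health check for this member failed."""
        self.healthy.discard(member)

    def mark_up(self, member):
        """Record that a known member passed its health check again."""
        if member in self.members:
            self.healthy.add(member)

    def next_member(self):
        """Return the next healthy member, skipping any that are down."""
        for _ in range(len(self.members)):
            candidate = self.members[self._index % len(self.members)]
            self._index += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy API servers in the pool")


# Hypothetical member names.
pool = ApiPool(["api-1", "api-2"])
pool.mark_down("api-1")
print(pool.next_member())  # requests continue to be served by api-2
```

With at least two members, the loss of one server leaves the API reachable; with a single member, the pool itself becomes the single point of failure.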

OpenStack lends itself to deployment in a highly available manner, where it is expected that at least two servers are used. These servers can run all the services involved, from the message queuing service, for example RabbitMQ or Qpid, to an appropriately deployed database service such as MySQL or MariaDB. As services in the cloud are scaled out, backend services will need to scale too. Monitoring and reporting on server utilization and response times, as well as load testing your systems, will help inform scale-out decisions.
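
One way to turn monitoring data into a scale-out signal is sketched below. It is a minimal example, assuming that CPU utilization samples (as fractions between 0 and 1) and API response times (in milliseconds) have already been collected; the function names and both thresholds are illustrative assumptions, not recommended values.

```python
from statistics import mean


def percentile_95(samples):
    """Return the 95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def should_scale_out(cpu_utilization, response_times_ms,
                     cpu_limit=0.75, p95_limit_ms=500.0):
    """Suggest scaling out when either metric exceeds its limit.

    Both limits are placeholders; derive real values from load testing.
    """
    return (mean(cpu_utilization) > cpu_limit
            or percentile_95(response_times_ms) > p95_limit_ms)


# Hypothetical samples: moderate CPU load, but a slow tail of responses.
print(should_scale_out([0.40, 0.55, 0.50], [120, 140, 900, 130]))
```

The 95th percentile is used instead of the average response time so that a slow tail of requests is not hidden by many fast ones.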

Care must be taken when deciding network functionality. Currently, OpenStack supports both the legacy Nova-network system and the newer, extensible OpenStack Networking.

Both have their pros and cons when it comes to providing highly available access. Nova-network, which provides networking access maintained in the OpenStack Compute code, provides a feature that removes a single point of failure when it comes to routing, and this feature is currently missing in OpenStack Networking. Nova-network's multi-host functionality restricts the failure domain to the host running a given instance.

On the other hand, when using OpenStack Networking, the OpenStack controller servers or separate OpenStack Networking hosts handle routing. For a deployment that requires features available only in OpenStack Networking, it is possible to remove this restriction by using third-party software that helps maintain highly available L3 routes. Doing so allows for common APIs to control network hardware, or to provide complex multi-tier web applications in a secure manner. It is also possible to completely remove routing from OpenStack Networking, and instead rely on hardware routing capabilities. In this case, the switching infrastructure must support L3 routing.

OpenStack Networking (Neutron) and Nova Network both have their advantages and disadvantages. They are both valid and supported options that fit different use cases as described in the following table.

OpenStack Networking vs Nova Network

Nova Network                 OpenStack Networking
Simple, single agent         Complex, multiple agents
More mature, established     Newer, maturing
Flat or VLAN                 Flat, VLAN, Overlays, L2-L3, SDN
No plugin support            Plugin support for 3rd parties
Scales well                  Scaling requires 3rd party plugins
No multi-tier topologies     Multi-tier topologies

Ensure your deployment has adequate backup capabilities. As an example, in a deployment that has two infrastructure controller nodes, the design should include controller availability. In the event of the loss of a single controller, cloud services will run from the remaining controller. Where the design has higher availability requirements, it is important to meet those requirements by designing the proper redundancy and availability of controller nodes.
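
The effect of adding controller nodes can be estimated with a simple redundancy calculation. The sketch below assumes that controllers fail independently and that the service is up as long as at least one controller is up; real deployments rarely meet both assumptions exactly, and the function name and the 99% figure are purely illustrative.

```python
def combined_availability(single_node_availability: float, nodes: int) -> float:
    """Availability of a group of redundant nodes.

    Assumes independent failures and that one surviving node is enough:
    availability = 1 - (1 - a) ** n
    """
    if not 0.0 <= single_node_availability <= 1.0:
        raise ValueError("single_node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 - (1.0 - single_node_availability) ** nodes


# Hypothetical per-controller availability of 99%.
for count in (1, 2, 3):
    print(count, round(combined_availability(0.99, count), 6))
```

Under these assumptions, a second controller moves a hypothetical 99% availability to roughly 99.99%, which gives a starting point for judging whether two controllers meet a stated requirement.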

Application design must also be factored into the capabilities of the underlying cloud infrastructure. If the compute hosts do not provide a seamless live migration capability, then it must be expected that when a compute host fails, the instances on that host, and any data local to those instances, will be lost. Conversely, when users are given the expectation that instances have a high level of uptime, the infrastructure must be deployed in a way that eliminates any single point of failure when a compute host disappears. This may include utilizing shared file systems on enterprise storage or OpenStack Block Storage to provide a level of guarantee to match service features.

For more information on HA in OpenStack, see the OpenStack High Availability Guide found at http://docs.openstack.org/high-availability-guide.

Security

A security domain comprises users, applications, servers or networks that share common trust requirements and expectations within a system. Typically they have the same authentication and authorization requirements and users.

These security domains are:

Public
Guest
Management
Data

These security domains can be mapped to an OpenStack deployment individually, or combined. For example, some deployment topologies combine both guest and data domains onto one physical network, whereas in other cases these networks are physically separated. In each case, the cloud operator should be aware of the appropriate security concerns. Security domains should be mapped out against your specific OpenStack deployment topology. The domains and their trust requirements depend upon whether the cloud instance is public, private, or hybrid.

The public security domain is an entirely untrusted area of the cloud infrastructure. It can refer to the Internet as a whole or simply to networks over which you have no authority. This domain should always be considered untrusted.

Typically used for compute instance-to-instance traffic, the guest security domain handles compute data generated by instances on the cloud but not services that support the operation of the cloud, such as API calls. Public cloud providers and private cloud providers who do not have stringent controls on instance use or who allow unrestricted internet access to instances should consider this domain to be untrusted. Private cloud providers may want to consider this network as internal and therefore trusted only if they have controls in place to assert that they trust instances and all their tenants.

The management security domain is where services interact. Sometimes referred to as the "control plane", the networks in this domain transport confidential data such as configuration parameters, usernames, and passwords. In most deployments this domain is considered trusted.

The data security domain is concerned primarily with information pertaining to the storage services within OpenStack. Much of the data that crosses this network has high integrity and confidentiality requirements and, depending on the type of deployment, may also have strong availability requirements. The trust level of this network is heavily dependent on other deployment decisions.
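
The mapping of security domains onto the networks of a specific deployment can be recorded in a small data structure, as sketched below. The network names, the choice of which domains share a network, and the rule that a network is treated as untrusted when it carries any untrusted domain are all assumptions made for illustration.

```python
from enum import Enum


class Domain(Enum):
    PUBLIC = "public"
    GUEST = "guest"
    MANAGEMENT = "management"
    DATA = "data"


# Hypothetical topology: guest and data domains share one physical network.
NETWORK_DOMAINS = {
    "external": {Domain.PUBLIC},
    "tenant-and-storage": {Domain.GUEST, Domain.DATA},
    "mgmt": {Domain.MANAGEMENT},
}


def untrusted_networks(network_domains, untrusted_domains):
    """Return the networks that carry at least one untrusted domain."""
    return sorted(
        name
        for name, domains in network_domains.items()
        if domains & untrusted_domains
    )


# Example: a public cloud that treats both public and guest as untrusted.
print(untrusted_networks(NETWORK_DOMAINS, {Domain.PUBLIC, Domain.GUEST}))
```

In this hypothetical topology, combining guest and data traffic on one network means the storage traffic inherits the lower trust level of the guest domain, which is the kind of consequence the mapping exercise is meant to surface.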

When OpenStack is deployed in an enterprise as a private cloud, it is usually behind the firewall and within the trusted network alongside existing systems. Users of the cloud are, traditionally, employees who are bound by the security requirements set forth by the company. This tends to push most of the security domains towards a more trusted model.

However, when deploying OpenStack in a public-facing role, no assumptions can be made and the attack vectors significantly increase. For example, the API endpoints, along with the software behind them, become vulnerable to bad actors wanting to gain unauthorized access or prevent access to services, which could lead to loss of data, functionality, and reputation. These services must be protected through auditing and appropriate filtering.

Care must be taken when managing the users of the system for both public and private clouds. The Identity service allows LDAP to be part of the authentication process. Including such systems in an OpenStack deployment may ease user management when integrating into existing systems.

It's important to understand that user authentication requests contain sensitive information, including usernames, passwords and authentication tokens. For this reason, placing the API services behind hardware that performs SSL termination is strongly recommended.
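
A simple check that published API endpoints are only offered over an encrypted transport might look like the sketch below. It only inspects the URL scheme and does not verify certificates or the termination hardware itself; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def insecure_endpoints(endpoint_urls):
    """Return the endpoint URLs whose scheme is not HTTPS."""
    return [
        url for url in endpoint_urls
        if urlparse(url).scheme.lower() != "https"
    ]


# Hypothetical endpoint list; the second entry would be flagged.
endpoints = [
    "https://cloud.example.com:5000/v3",
    "http://cloud.example.com:8774/v2",
]
print(insecure_endpoints(endpoints))
```

A check of this kind could be run against the list of public endpoints registered for the deployment to catch any that would send credentials or tokens in clear text.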

For more information on OpenStack security, see the OpenStack Security Guide, at http://docs.openstack.org/security-guide/.
