
xp idle


Idle Experimentation

Configuration

  • physical nodes: 1 control, 1 network, 20 computes.
  • control: neutron_server, nova_conductor, nova_scheduler, nova_novncproxy, nova_consoleauth, nova_api, glance_api, glance_registry, keystone, rabbitmq, mariadb, memcached, cron, kolla_toolbox, heka, cadvisor, grafana, influx, docker_registry, collectd
  • network: neutron_metadata_agent, neutron_l3_agent, neutron_dhcp_agent, neutron_openvswitch_agent, openvswitch_db, keepalived, haproxy, cron, kolla_toolbox, heka, cadvisor
  • compute: nova_ssh, nova_libvirt, nova_compute_fake_1, …, nova_compute_fake_#fake, openvswitch_db, openvswitch_vswitchd, neutron_openvswitch_agent, neutron_openvswitch_agent_fake_1, …, neutron_openvswitch_agent_fake_#fake, cron, kolla_toolbox, heka, cadvisor

Monitored information

  • collectd: memcached (hit, miss, set), mysql (total inserts/deletes/updates), docker (context switches), tcpconns (number of sockets open on a specific port).
  • cadvisor: networkIn/networkOut, mysql cpu/mem, rabbit cpu/mem. Logs into InfluxDB every 10 seconds; monitors one physical compute node.
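
For illustration, a cadvisor container configured this way could be started roughly as follows; this is a sketch, not the exact command used here: the image tag, the InfluxDB host ($INFLUX_HOST) and the database name are assumptions.

# Push container metrics to InfluxDB every 10 seconds.
docker run -d --name cadvisor \
  --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro \
  google/cadvisor:latest \
  -storage_driver=influxdb \
  -storage_driver_host=${INFLUX_HOST}:8086 \
  -storage_driver_db=cadvisor \
  -housekeeping_interval=10s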

Get results

First, find the name of the host machine.

cd results
vagrant up idle
XPHOST=`vagrant ssh-config idle | grep HostName | awk '{print $2}'`

Then, create ssh tunnels

# Access Grafana (http://localhost:3000 once the tunnel is up)
ssh -NL 3000:${XPHOST}:3000 rennes.g5k
# Access the nginx serving the Kolla logs (http://localhost:8000)
ssh -NL 8000:${XPHOST}:8000 rennes.g5k

Results

Nova API

Purpose: Nova API is the user-facing interface of the Nova (OpenStack compute) service. It processes client REST requests, which typically involve database reads/writes and, optionally, sending RPC messages to other Nova services via the oslo.messaging queue [fn:os-archi].
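
For illustration, a typical user-facing request of this kind; a sketch, assuming an authenticated session where $OS_TOKEN holds a Keystone token and $NOVA_ENDPOINT the compute endpoint from the catalog:

# Listing servers issues GET .../servers, which nova-api answers with database
# reads only (no RPC to other Nova services is needed for this call).
openstack server list
curl -s -H "X-Auth-Token: ${OS_TOKEN}" "${NOVA_ENDPOINT}/servers"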

Nova API 5 10 25 50
Mem max (GB) 1.60 1.60 1.72 1.60

Remarks: Using max for memory makes sense since the curve instantly reaches the max and never decreases.

Trend: The amount of memory used is constant regardless of the number of nodes.

Findings: TODO

[fn:os-archi] http://docs.openstack.org/developer/nova/architecture.html

Nova Conductor

Purpose: Nova conductor acts as a database proxy between the compute nodes and the database. Such a proxy makes it possible for an upgraded control plane to communicate with nova-compute services still running an older version. The conductor handles requests that need reconfiguration and objects that need conversion.

Nova Conductor 5 10 25 50
Mem max (MB) 2.47 2.47 2.45 2.47
CPU avg (#usage) 1.22 2.00 3.68 7.00
Rx avg 122kB/s 161kB/s 560kB/s 1.1MB/s
Tx avg 85kB/s 242kB/s 360kB/s 750kB/s

Remarks: For memory, using the max seems OK since the curve quickly (within 10 minutes) reaches the max and never decreases.

Trend: Memory is constant and CPU grows linearly with the number of nodes.

Findings: TODO

Nova Scheduler

Purpose: Nova scheduler decides which host gets each instance.

Nova Scheduler 5 10 25 50
Mem max (MB) 100.30 104.00 106.30 109.60

Remarks: Using max makes sense since the curve instantly reaches the max and never decreases. Is this linear with the number of nodes?

Findings: TODO

Keystone

Keystone is the OpenStack identity service. It provides an API for client authentication, service discovery, and multi-tenant authorisation. Nova API talks to Keystone through an HTTP connection.

Keystone 5 10 25 50
Mem max (MB) 774 771 772 780

Remarks: None

Trends: The maximum memory usage seems constant with respect to the number of nodes.

What about the CPU? Keystone is called 4 times during the deployment, and every call produces a CPU peak. The height of these peaks is the same across all experiments. However, the time between the second and the third peak increases with the number of nodes: 5 minutes in the first experiment, 7 minutes in the third one, and 9.3 minutes in the last one. Thus, OpenStack does something after the second peak that takes more time as the number of nodes increases. The time between the third and fourth peak increases in the same manner.

Findings: TODO

Neutron server

Purpose: Neutron is the OpenStack networking service; Nova API and nova-compute talk to Neutron through an HTTP connection.

Neutron Server 5 10 25 50
Mem max (MB) 419 420 359 441
CPU avg (#usage) 0.14 0.21 0.39 0.69

Remarks: The trend of memory usage is not clear, especially because of the third and fourth experiment results. It could be constant or linear.

Trends: I don’t know for the memory usage (see the remarks above). The trend of CPU usage is clearer: it appears to increase linearly with the number of nodes.

haproxy

TODO: Purpose

Haproxy 5 10 25 50
Mem max (MB) 6.27 6.32 7.04 8.71
Mem avg (MB) 5.60 5.54 5.99 7.22
CPU avg (#usage) 0.11 0.18 0.33 0.49

Remarks: Mem & CPU seem linear.

TODO: What we could say

rabbitmq

TODO: Purpose

rabbitmq 5 10 25 50
Mem max (GB) 1.59 2.52 5.08 11.25
CPU avg (#usage) 1 1 3 5
#connection avg (K) 1.32 2.35 5.53 10.05
#connection max (stationary value) (K) 1.5 2.93 6.89 13.5

Remarks: Mem, CPU and #connection seem linear.

TODO: What we could say

mariadb

TODO: Purpose

mariadb 5 10 25 50
Mem max (MB) 502 546 570 594
CPU avg 0.03 0.06 0.13 0.21
#connection avg 68 72 100 132
#connection total (more stationary) 79 85 120 170
#select / s todo
#update / s todo

Remarks: Linear.

TODO: What we could say

memcached

The number of memcached get/hit/miss/set operations stays constant.

Mariadb periodic requests

nova-compute → nova-scheduler

Compute nodes periodically send the UUIDs of their instances to the scheduler.

See https://github.com/openstack/nova/blob/35b2132723cf2412e42bb5e52f72abaef31dadbd/nova/scheduler/host_manager.py#L682

Controlled by the configuration parameter scheduler_instance_sync_interval: https://github.com/openstack/nova/blob/407e659eb9c228eb1ec06ec49864279aeab0a1a1/nova/conf/compute.py#L438

  • 100 computes -> 5039 times in 1h48 -> 46 per minute
  • 200 computes -> 4285 times in 1h25 -> 50 per minute (do we really have 200 computes here?) EDIT: there was a misconfiguration in these results; this is fixed (TODO put the right number here)
  • 500 computes -> 18223 times in 1h14 -> 246 per minute
  • 1000 computes -> 34996 times in 1h11 -> 500 per minute -> 8.3 /s

These numbers are consistent since, by default, instances are synced every 120 s.
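
As a quick sanity check using the numbers above, the observed counts match the default 120 s sync interval; for instance, for the 1000-compute run:

N=1000; DURATION_MIN=71; SYNC_INTERVAL=120                              # values from the run above
echo "expected syncs: $(( N * DURATION_MIN * 60 / SYNC_INTERVAL ))"    # ~35500, close to the 34996 observed
echo "expected rate : $(( N * 60 / SYNC_INTERVAL )) per minute"        # 500 per minute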

nova-scheduler runs some periodic tasks (but their frequency shouldn’t depend on the number of compute nodes)

See https://github.com/openstack/nova/blob/35b2132723cf2412e42bb5e52f72abaef31dadbd/nova/scheduler/manager.py#L79

nova-conductor

From the conductor logs (connection_debug=100)

Each nova-conductor updates its state in the db

  • 1 UPDATE every 10s
  • 25 SELECTs / min for the conductor service

Nova-compute updates its state in the db

  • 1 UPDATE every 10s
  • 1 SELECT, at a slightly lower frequency than above, for nova-compute

For each compute we have periodic tasks about the state (instances, …)

-> if we count every match that does not deal with the conductor itself:

# Count the DB-query log lines that do not concern the conductor itself
grep -e "-\] (0," /var/lib/docker/volumes/kolla_logs/_data/nova/nova-conductor.log | grep -v -e "(0, 7, 1)" | grep -v -e 'nova-conductor' | wc -l

We get a count that increases linearly with the number of computes, at about 12 matches/min/compute.

In this context, one match corresponds to at least one SELECT and sometimes to a SELECT plus an UPDATE (service update).

Every Nova service reports its state periodically; this is controlled by the parameter report_interval in nova.conf: https://github.com/openstack/nova/blob/407e659eb9c228eb1ec06ec49864279aeab0a1a1/nova/conf/service.py#L24
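
For reference, a minimal sketch of how these two intervals could be tuned; the values shown are the defaults discussed above, and the /etc/kolla/config/nova.conf override path is an assumption about the deployment:

# Sketch: both intervals live in the [DEFAULT] section of nova.conf.
cat >> /etc/kolla/config/nova.conf <<'EOF'
[DEFAULT]
report_interval = 10
scheduler_instance_sync_interval = 120
EOF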

neutron-server

From the neutron-server logs (connection_debug=100)

Agent statuses are updated periodically, thus the load increases linearly with the number of openvswitch agents.

Observation: 1 SELECT and 1 UPDATE / 25s / agent

Low level access to all mariadb queries

This could be used to see what’s missing: https://mariadb.com/kb/en/mariadb/general-query-log/
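
For instance, the general query log can be enabled at runtime; a sketch, assuming the kolla "mariadb" container and the DB root password in $DB_PASS:

# Beware: this logs every statement, so only keep it enabled for a short window.
docker exec mariadb mysql -uroot -p"${DB_PASS}" -e "
  SET GLOBAL log_output = 'FILE';
  SET GLOBAL general_log_file = '/var/lib/mysql/general.log';
  SET GLOBAL general_log = 'ON';"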

Notes on connection pooling

Connection pools are used for the connections to the:

  • DB and
  • Messaging middleware

For the DB, it relies on the SQLAlchemy pooling system: http://docs.sqlalchemy.org/en/latest/core/pooling.html. This can be controlled on the OpenStack side with the following parameters:

  • (nova default) max_overflow (= None), max_pool_size (= None), pool_timeout (= None)
  • (sqlalchemy default) max_overflow (= 10), max_pool_size (= 5), pool_timeout (= 30)
  • (kolla default) max_overflow (= 1000), max_pool_size (= 50), pool_timeout (= -1)

Ref: http://docs.openstack.org/mitaka/config-reference/compute/config-options.html#nova-common
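
As an illustration, the kolla defaults above map to the following oslo.db options in the [database] section; a sketch, with the /etc/kolla/config/nova.conf override path again being an assumption:

# Sketch: kolla's pooling defaults expressed as [database] options.
cat >> /etc/kolla/config/nova.conf <<'EOF'
[database]
max_pool_size = 50
max_overflow = 1000
pool_timeout = -1
EOF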