Landscape's OpenStack deployment fails at "Configure availability zones"

I'm using the current Landscape "OpenStack Beta" option to deploy OpenStack on my MAAS setup. I get to 98% completion, with one failure on "Configure availability zones". My settings use KVM and Open vSwitch, and I'm using Ceph for both object and block storage. When I look at /var/log/landscape/job-handler-1.log on the Landscape machine, I see over 100 errors like:

2015-03-05 21:18:38 INFO root RetryingCall for '_get_nova_info' failed, trying 103 more time(s): 2015-03-05 21:18:38 INFO root Traceback: : Missing 4 nova-compute units
/usr/lib/python2.7/threading.py:783:__bootstrap
/usr/lib/python2.7/threading.py:810:__bootstrap_inner
/usr/lib/python2.7/threading.py:763:run
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:191:_worker
/usr/lib/python2.7/dist-packages/twisted/python/context.py:118:callWithContext
/usr/lib/python2.7/dist-packages/twisted/python/context.py:81:callWithContext
/usr/lib/python2.7/dist-packages/storm/twisted/transact.py:76:_wrap
/opt/canonical/landscape/canonical/landscape/model/openstack/jobs.py:751:_get_nova_info


NOTE: The line number in jobs.py is off because I've added some print statements for debugging. It's the assert in the _get_nova_info() function near line #741 (if memory serves), and yes, I'm using the newest version of Landscape as of today, from the Landscape PPA for trusty.

So I modified the _get_nova_info() function in /opt/canonical/landscape/canonical/landscape/model/openstack/jobs.py to print the length of nova_compute_hostnames, and I got zero. I chased that into get_nova_compute_hostnames() in /opt/canonical/landscape/canonical/landscape/model/openstack/region.py and found that self.juju_environment.get_computer_ids().count() was also zero. So I added a call to self.juju_environment.has_computers() and got False. Then I ran self.juju_environment.get_juju_home() and got /var/lib/landscape/juju-homes/20. (Yes, this is my 20th attempt on my 2nd rebuild of the Landscape box; I've been at this for a while.) So I ran juju status using the juju home mentioned above, and all looked well: all 5 machines and services were started, with no pending or error states (including the 4 nova-compute nodes). Any ideas? I'm somewhat new to Landscape, MAAS, Juju, and Python, so my debugging is a bit slow.
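To make the failure mode concrete, here is a minimal sketch of the kind of check that appears to be failing. This is illustrative only, not Landscape's actual code: EXPECTED_NOVA_COMPUTE_UNITS and check_nova_compute() are names I made up to mirror the assert near line #741 of jobs.py.

```python
# Hypothetical sketch: the job retries while fewer nova-compute units have
# registered with Landscape than the deployment expects.
EXPECTED_NOVA_COMPUTE_UNITS = 4

def check_nova_compute(registered_hostnames):
    """Raise, like the assert in jobs.py, if units are still missing."""
    missing = EXPECTED_NOVA_COMPUTE_UNITS - len(registered_hostnames)
    if missing > 0:
        raise AssertionError("Missing %d nova-compute units" % missing)
    return registered_hostnames

# With get_nova_compute_hostnames() returning an empty list (as in my
# debugging above), this reproduces the log message:
try:
    check_nova_compute([])
except AssertionError as exc:
    print(exc)  # Missing 4 nova-compute units
```

So even though juju status shows the units as started, the Landscape side still sees zero registered computers, which is what this check trips on.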


UPDATE 1:

Per the request, I've got the two logs (although my juju home is now #23): juju status and broker.log. I think I now know what my problem is, per the snippet of broker.log below. (Thanks, dpb, for pointing me there.) My MAAS machine hands out a DHCP address to my Landscape LXC, but the Landscape LXC is not in the MAAS-controlled DNS because it's not provisioned by MAAS. Therefore the provisioned machines cannot connect to the Landscape server by name.

That leads me to a related question: is there a good way to have MAAS auto-update its DNS with machines that aren't provisioned by (or under the control of) MAAS? If not, I'll have to give the LXC a static IP outside my DHCP range and set the DNS entry manually.

2015-03-06 17:09:50,665 INFO [MainThread] Broker started with config /etc/landscape/client.conf
2015-03-06 17:09:52,382 INFO [MainThread] Starting urgent message exchange with https://landscape/message-system.
2015-03-06 17:09:52,389 ERROR [PoolThread-twisted.internet.reactor-1] Error contacting the server at https://landscape/message-system.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/landscape/broker/transport.py", line 71, in exchange
    message_api)
  File "/usr/lib/python2.7/dist-packages/landscape/broker/transport.py", line 45, in _curl
    headers=headers, cainfo=self._pubkey, curl=curl))
  File "/usr/lib/python2.7/dist-packages/landscape/lib/fetch.py", line 109, in fetch
    raise PyCurlError(e.args[0], e.args[1])
PyCurlError: Error 6: Could not resolve host: landscape
2015-03-06 17:09:52,390 INFO [MainThread] Message exchange failed.
2015-03-06 17:09:52,391 INFO [MainThread] Message exchange completed in 0.01s.
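A quick way to confirm this from any provisioned node is to test name resolution directly. This is a generic diagnostic sketch; "landscape" is the hostname from the PyCurlError above, and the messages are just suggestions:

```shell
# Does the Landscape server's hostname resolve on this node?
if getent hosts landscape >/dev/null; then
    echo "landscape resolves"
else
    echo "landscape does NOT resolve - fix DNS (or add an /etc/hosts entry)"
fi
```

If it doesn't resolve, the landscape-client agents on the nodes can never register, which is exactly the "Missing N nova-compute units" symptom above.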


UPDATE 2:

My setup is a bit limited: I was only given 6 machines (5 nodes and 1 controller) to demonstrate the capabilities of OpenStack/Landscape, so I can't use a dedicated machine for Landscape. I was running landscape-server-quickstart in an LXC on my MAAS controller so I could quickly blow it away and start over fresh.

So I blew away the Landscape setup, set the LXC to a static IP, and modified the DNS (controlled by MAAS) to have a static entry for my Landscape server. Then I installed Landscape Dedicated Server on the LXC using the landscape-server-quickstart method mentioned above.
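For reference, the static-IP half of that looked roughly like the following inside the LXC. All addresses are placeholders for my network, with the static address chosen outside the MAAS DHCP range:

```
# /etc/network/interfaces inside the Landscape LXC (example values only)
auto eth0
iface eth0 inet static
    address 10.0.0.5
    netmask 255.255.255.0
    gateway 10.0.0.1
    dns-nameservers 10.0.0.1
```

With that in place, the matching DNS entry in the MAAS-controlled zone lets the provisioned nodes reach the Landscape server by name.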

After this re-install (mainly to clean out all my debugging mess) I was finally able to install OpenStack through Landscape. Thanks.


Solution 1:

The "Missing N nova-compute units" message is about landscape-client agents registering back to the Landscape server. Check /var/log/landscape/broker.log on the missing units.

UPDATE:

As you have correctly identified, things work most smoothly if LDS (Landscape Dedicated Server) is installed into the same MAAS where your OpenStack will live, mostly because of network routing and DNS. However, countless valid topology variations exist, with routes between networks, etc.

Some suggestions on things to try; please read them all. In the end you will need to determine your own deployment topology:

  • For a test, deploy LDS into the same MAAS where your OpenStack will be, just to check whether things work there. Use the openstack-install tool, or the landscape-dense-maas bundle with juju-quickstart directly, to facilitate this.

  • Your clients need to be able to reach LDS, as you have stated. If they can route by IP to where LDS is deployed, you can tear down the openstack install, change your apache servername setting and try again. juju set apache2 servername=IP_ADDRESS. After doing this, follow juju debug-log, make sure all comes up OK, and make sure you can browse to the LDS GUI at that https://IP_ADDRESS/ URL.