How to convert an Ansible playbook (YAML) to a Python data structure

First of all, this is my very first question on SO, so it might not entirely conform to SO standards.

I am trying to figure out how to convert an Ansible playbook file to a Python data structure, as shown in this example from docs.ansible.com:

# create data structure that represents our play, including tasks, this is basically what our YAML loader does internally.
play_source = dict(
    name="Ansible Play",
    hosts=host_list,
    gather_facts='no',
    tasks=[
        dict(action=dict(module='shell', args='ls'), register='shell_out'),
        dict(action=dict(module='debug', args=dict(msg='{{shell_out.stdout}}'))),
        dict(action=dict(module='command', args=dict(cmd='/usr/bin/uptime'))),
    ]
)

The reason I want to do this is that Play() does not accept a raw YAML file, and the example hooks nicely into the provided ResultsCollectorJSONCallback(), which gives me a very good way to capture output. I'm well aware there is a PlaybookExecutor, but it doesn't quite cut it because all output is dumped to stdout.

This piece of code makes it possible to capture the output for each host (also from the documentation):

print("UP ***********")
for host, result in results_callback.host_ok.items():
    print('{0} >>> {1}'.format(host, result._result['stdout']))

print("FAILED *******")
for host, result in results_callback.host_failed.items():
    print('{0} >>> {1}'.format(host, result._result['msg']))

print("DOWN *********")
for host, result in results_callback.host_unreachable.items():
    print('{0} >>> {1}'.format(host, result._result['msg']))

I tried to find documentation on where Ansible converts YAML to that Python data structure. The comment clearly says "this is basically what our YAML loader does internally", but I can't figure out how they do it. I even tried to work out how the PlaybookExecutor does the trick, but it's quite hard to see what's really happening. I was hoping to find a yaml_to_datastructure function somewhere in Ansible's YAML-parsing routines, but failed to find one. Does anyone have experience with this?

My playbook just for testing purposes:

---

- hosts: all
  become: no
  tasks:
    - name: create folders
      file:
        path: /tmp/playbook_user
        state: directory
        owner: playbook_user

    - shell: "uname -a"
      register: output

    - name: give output on screen
      debug:
        var: output.stdout_lines

    - name: save output to local directory
      copy:
        content: "{{ output.stdout | replace('\\n', '\n') }}"
        dest: "/tmp/playbook_user/test_{{ ansible_date_time.date }}_{{ inventory_hostname }}.txt"

    - local_action:
        module: copy
        content: "{{ output.stdout | replace('\\n', '\n') }}"
        dest: /tmp/show_cmd_ouput_{{ inventory_hostname }}.txt
      run_once: true    

Regards, Sjoerd


Solution 1:

The data passed to Play().load is just the content of a single play from a playbook. That is, if I have a playbook that looks like:

- hosts: localhost
  tasks:
    - debug:
        msg: "This is a test"

I can load it like this:

>>> import yaml
>>> from ansible.inventory.manager import InventoryManager
>>> from ansible.parsing.dataloader import DataLoader
>>> from ansible.vars.manager import VariableManager
>>> from ansible.playbook.play import Play
>>> loader = DataLoader()
>>> inventory = InventoryManager(loader=loader)
[WARNING]: No inventory was parsed, only implicit localhost is available
>>> variable_manager = VariableManager(loader=loader, inventory=inventory)
>>> with open('playbook.yml') as fd:
...     playbook = yaml.safe_load(fd)
...
>>> play = Play().load(playbook[0], variable_manager=variable_manager, loader=loader)

Etc.

Note that I'm passing playbook[0] to Play().load (the first play in the playbook), rather than the entire playbook.
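To relate the shorthand task form in a playbook to the explicit action=dict(module=..., args=...) form from the documentation example, here is a minimal sketch. It is only an illustration: Play().load accepts the shorthand form directly, so this rewrite is not required, and it is not how Ansible's own parser is implemented. The helper name to_explicit_action and the TASK_KEYWORDS subset are made up for this example, and special forms such as local_action are not handled.

```python
# Task-level keywords that are not module names. This is a small,
# hand-picked subset for illustration, not Ansible's full keyword list.
TASK_KEYWORDS = {
    "name", "register", "when", "become", "run_once",
    "loop", "vars", "tags", "ignore_errors",
}


def to_explicit_action(task):
    """Rewrite a shorthand task dict into the explicit action form.

    Hypothetical helper: assumes the task has exactly one key that is
    not a task keyword, and treats that key as the module name.
    """
    if "action" in task:
        return dict(task)  # already in explicit form, return a copy

    keywords = {k: v for k, v in task.items() if k in TASK_KEYWORDS}
    modules = {k: v for k, v in task.items() if k not in TASK_KEYWORDS}
    if len(modules) != 1:
        raise ValueError("expected exactly one module key, got %r" % sorted(modules))

    module, args = next(iter(modules.items()))
    return dict(action=dict(module=module, args=args), **keywords)


# The example play from above, written as the plain dict that
# yaml.safe_load returns for playbook[0].
play = {"hosts": "localhost", "tasks": [{"debug": {"msg": "This is a test"}}]}
explicit_tasks = [to_explicit_action(task) for task in play["tasks"]]
print(explicit_tasks)
```

The point is that both forms are plain dicts and lists, so there is no separate conversion step to look for: the parsed YAML for one play already is the data structure.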

But if your goal is to run Ansible playbooks using Python, you might be better off using ansible-runner. That looks like:

>>> import ansible_runner
>>> res = ansible_runner.interface.run(private_data_dir='.',playbook='playbook.yml')
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "This is a test"
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

At this point, res.events contains the individual events generated by the playbook run and probably has all the data you could want.
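As a rough sketch of how those events could be grouped per host, similar to the host_ok / host_failed / host_unreachable split in the question: the function below takes any iterable of event dicts, so it can be fed res.events. The event names (runner_on_ok, runner_on_failed, runner_on_unreachable) and the event_data keys (host, task, res) are my assumptions about the event format, so check them against the events from your own run. The helper name results_by_host and the sample events are made up for this example.

```python
from collections import defaultdict

# Event names assumed to mark a finished task on a host, mapped to a
# short status label. Verify these against your own res.events.
RESULT_EVENTS = {
    "runner_on_ok": "ok",
    "runner_on_failed": "failed",
    "runner_on_unreachable": "unreachable",
}


def results_by_host(events):
    """Group per-task results by host name (hypothetical helper)."""
    grouped = defaultdict(list)
    for event in events:
        status = RESULT_EVENTS.get(event.get("event"))
        if status is None:
            continue  # skip play/task start events and anything else
        data = event.get("event_data", {})
        grouped[data.get("host")].append({
            "task": data.get("task"),
            "status": status,
            "result": data.get("res", {}),
        })
    return dict(grouped)


# Hand-written sample events standing in for res.events.
sample_events = [
    {"event": "playbook_on_start", "event_data": {}},
    {
        "event": "runner_on_ok",
        "event_data": {
            "host": "localhost",
            "task": "debug",
            "res": {"msg": "This is a test"},
        },
    },
]

summary = results_by_host(sample_events)
for host, tasks in summary.items():
    for entry in tasks:
        print("{0} [{1}] {2} >>> {3}".format(host, entry["status"], entry["task"], entry["result"]))
```

With a real run you would call results_by_host(res.events) instead of passing the sample list, and write each host's entries to a file rather than printing them.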