How can I mirror a yum repository but download only the newest versions of each package?

I would like to mirror the following Yum/RPM repositories at http://yum.puppetlabs.com/ :

  • http://yum.puppetlabs.com/el/6/products/
  • http://yum.puppetlabs.com/el/6/dependencies/
  • http://yum.puppetlabs.com/el/5/products
  • http://yum.puppetlabs.com/el/5/dependencies/

The Puppet repository contains every Puppet product ever released and is quite large at about 8GB. I only need to mirror the newest versions of the files.

I have tried to mirror the repository using reposync --newest-only:

reposync --config=puppetlabs.repo.el6 --repoid=puppetlabs-products --repoid=puppetlabs-deps --newest-only --download_path=el/6 --quiet --downloadcomps

and this downloads the newest packages like I need. However, reposync doesn't automatically create the regular directory structure (x86_64, noarch, SRPMS, etc.) and doesn't mirror the repository metadata (repodata/repomd.xml). As a result, my yum clients get errors like this:

[root@web1 ~]# yum --quiet install puppet
http://mirrors.example.org/pub/puppet/el/6/puppetlabs-deps/x86_64/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: puppetlabs-deps. Please verify its path and try again
[root@web1 ~]# 

Is there a way to programmatically mirror only the new files from a Yum repo and follow the standard repository directory structure?


Solution 1:

reposync is the only reliable way to do this. You will need a small bash script that runs reposync (with -a if you want each architecture downloaded into a separate folder) and then runs createrepo on each downloaded repository to generate the metadata.

Here is a small script that I use (it runs on Ubuntu, but that doesn't matter; you get the idea):

cat sync-repos

#!/bin/bash

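# -n: newest packages only; -c: yum config to read; -p: download path;
# -d: delete local packages no longer in the repo; -r: repo id to sync
reposync -n -c /etc/yum/yum.conf -p /repos/centos6 -d -r base -r updates -r extras -r centosplus -r contrib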
createrepo -g /repos/centos6/base/repodata/comps.xml /repos/centos6/base
createrepo /repos/centos6/updates
createrepo /repos/centos6/extras
createrepo /repos/centos6/centosplus

reposync -n -c /etc/yum/yum.conf -p /repos -d -r vmware -r home_xtreemfs
createrepo /repos/vmware
createrepo /repos/home_xtreemfs

reposync -n -c /etc/yum/yum.conf -p /repos/vz -d -r openvz-utils -r openvz-kernel-rhel6
createrepo /repos/vz/openvz-utils
createrepo /repos/vz/openvz-kernel-rhel6

reposync -n -c /etc/yum/yum.conf -p /repos/nginx -d -r nginx-stable -r nginx-mainline
createrepo /repos/nginx/nginx-stable
createrepo /repos/nginx/nginx-mainline
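
Applied to the Puppet Labs repositories from the question, the same reposync-then-createrepo pattern might look like the sketch below. The config file name, repo ids, and download path are taken from the question's own command; the DRY_RUN switch and the run and mirror_repo helpers are hypothetical names added for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: mirror the newest packages of each repo, then build metadata.
# CONFIG, DEST and the repo ids come from the question; run(), mirror_repo()
# and DRY_RUN are illustrative helpers, not part of reposync or createrepo.
set -u

DRY_RUN=${DRY_RUN:-1}            # 1 = print the commands, 0 = execute them
CONFIG=puppetlabs.repo.el6
DEST=el/6
REPOS="puppetlabs-products puppetlabs-deps"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mirror_repo() {
    local repo=$1
    # reposync places the packages in $DEST/<repoid>
    run reposync --config="$CONFIG" --repoid="$repo" --newest-only \
        --download_path="$DEST" --quiet
    # createrepo writes repodata/repomd.xml at the top of that directory
    run createrepo "$DEST/$repo"
}

for repo in $REPOS; do
    mirror_repo "$repo"
done
```

Because createrepo runs on el/6/<repoid>, the metadata ends up in el/6/<repoid>/repodata, so a client's baseurl would need to point at that directory rather than at an x86_64 subdirectory beneath it.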

Solution 2:

You can do this with pulp and the yum rpm distributor plugin.

When configuring a new repo, to get only one version of each rpm, set the retain_old_count parameter:

retain_old_count
Count indicating how many old rpm versions to retain; by default it will 
download all versions available.

So something along the lines of:

$ pulp-admin rpm repo create \
          --repo-id=rhel6-puppet-products \
          --relative-url=rhel6-puppet-products \
          --feed=http://yum.puppetlabs.com/el/6/products/ \
          --retain-old-count 1
$ pulp-admin rpm repo sync run \
          --repo-id=rhel6-puppet-products

This should achieve what you want. There is a quick start guide that should give you an idea of how it works, in case you have not tried it before.
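
To cover all four repositories from the question, the two pulp-admin commands could be wrapped in a loop, roughly as sketched here. Only the flags shown above are used. The repo ids other than rhel6-puppet-products are assumptions that follow the same naming pattern, and DRY_RUN, run and pulp_mirror are hypothetical helpers. By default the script only prints the commands.

```shell
#!/bin/bash
# Sketch only: create and sync one Pulp repo per upstream Puppet Labs repo.
# Repo ids follow the rhel6-puppet-products pattern used above; all ids
# except that one are assumed names. run() and pulp_mirror() are
# illustrative helpers, not part of pulp-admin.
set -u

DRY_RUN=${DRY_RUN:-1}            # 1 = print the commands, 0 = execute them
FEED_BASE=http://yum.puppetlabs.com

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pulp_mirror() {
    local release=$1 kind=$2
    local repo_id="rhel${release}-puppet-${kind}"
    run pulp-admin rpm repo create \
        --repo-id="$repo_id" \
        --relative-url="$repo_id" \
        --feed="$FEED_BASE/el/$release/$kind/" \
        --retain-old-count 1
    run pulp-admin rpm repo sync run --repo-id="$repo_id"
}

for release in 5 6; do
    for kind in products dependencies; do
        pulp_mirror "$release" "$kind"
    done
done
```

Running it with DRY_RUN=0 would execute the commands instead of printing them; repo create would presumably need to run only once per repository, with later runs limited to the sync step.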