Evolution data migration (on update) missed all but the first "accounts" (mail, addressbook, calendar...)

The PHP script above got me started, but I had problems with junk in the cards.

I am a Python guy, so I wrote a new program in Python. I ran it under Python 2.x; it treats the file contents as a plain byte string, so it will not run unchanged under Python 3.

This uses a Python dictionary to keep track of the email addresses already seen in address cards, and it stores only one card per address (the longest one, if the same address turns up more than once). This also solved another problem I had: duplicate card records.
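
The dictionary idea on its own looks roughly like this. It is a minimal sketch, separate from the full program below; the function name `dedupe_by_email` and the list of (email, card) pairs are made up for the illustration.

```python
def dedupe_by_email(pairs):
    """Return {email: card}, keeping the longest card seen for each email.

    `pairs` is any iterable of (email, card_text) tuples; both the function
    name and this input shape are assumptions made for the illustration.
    """
    best = {}
    for email, card in pairs:
        # First sighting of this address, or a longer (presumably more
        # complete) card than the one already stored: keep this one.
        if email not in best or len(card) > len(best[email]):
            best[email] = card
    return best


cards = dedupe_by_email([
    ("joe@example.com", "BEGIN:VCARD\nEMAIL:joe@example.com\nEND:VCARD"),
    ("joe@example.com", "BEGIN:VCARD\nFN:Joe\nEMAIL:joe@example.com\nEND:VCARD"),
])
print(len(cards))                            # 1
print("FN:Joe" in cards["joe@example.com"])  # True
```

Keeping the longest copy is only a guess at which duplicate is the most complete; it does not merge fields from the shorter copies.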

I think the binary junk between the cards was used by Evolution to figure out which cards were valid. This program just uses a couple of rules of thumb: if a card has garbage binary characters in it, or isn't properly terminated, or doesn't have an email address in it, it's a bad card and goes in the "bad" output file.
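
Those rules of thumb can be written as one small test. This is a sketch only: the helper names `has_junk`, `first_email` and `is_bad_card` are invented here, and treating everything outside 7-bit ASCII as junk is an assumption carried over from the program below.

```python
S_END = "END:VCARD"


def has_junk(card):
    """True if the card has control characters (other than tab, LF, CR)
    or anything outside 7-bit ASCII."""
    return any((ord(ch) < 32 and ch not in "\t\n\r") or ord(ch) > 127
               for ch in card)


def first_email(card):
    """Return the value of the first line that mentions EMAIL, or ''."""
    for line in card.split("\n"):
        if "EMAIL" in line:
            return line.partition(":")[2].strip()
    return ""


def is_bad_card(card):
    """A card is bad if it has junk, is unterminated, or has no email."""
    return has_junk(card) or S_END not in card or not first_email(card)


good = "BEGIN:VCARD\nFN:Joe\nEMAIL;TYPE=WORK:joe@example.com\nEND:VCARD"
print(is_bad_card(good))                              # False
print(is_bad_card(good + "\x07"))                     # True (binary junk)
print(is_bad_card(good.replace(S_END, "")))           # True (not terminated)
print(is_bad_card("BEGIN:VCARD\nFN:Joe\nEND:VCARD"))  # True (no email)
```

Note that the 7-bit rule also rejects otherwise valid cards that contain accented names, which is one more reason to look through the "bad" file afterwards.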

After you are done converting, you can check the "bad" output file and see if there is anything in there that isn't in the .vcf output file. In my case, there wasn't; this program got all the good cards for me.
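
If the "bad" file is too big to read by eye, that check can be partly automated by comparing email addresses. The following is a hedged sketch, not part of the program below: `emails_in` and `missing_emails` are hypothetical helpers, and they only look at lines mentioning EMAIL, so a bad card with no email at all still has to be checked by hand.

```python
def emails_in(text):
    """Collect the value of every line that mentions EMAIL."""
    found = set()
    for line in text.split("\n"):
        if "EMAIL" in line:
            email = line.partition(":")[2].strip()
            if email:
                found.add(email)
    return found


def missing_emails(bad_text, good_text):
    """Emails that appear in the bad file but not in the .vcf output."""
    return sorted(emails_in(bad_text) - emails_in(good_text))


bad = ("BEGIN:VCARD\nEMAIL:ann@example.com\n\x01\x02\n\n"
       "BEGIN:VCARD\nEMAIL:joe@example.com\n")
good = "BEGIN:VCARD\nEMAIL:joe@example.com\nEND:VCARD\n"
print(missing_emails(bad, good))   # ['ann@example.com']
```

To use it on real data, read the contents of the "bad" file and the .vcf file into two strings and pass them in; an empty list means every address in the bad file also made it into the .vcf output.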

#!/usr/bin/python

import re
import sys

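def bad_chars():
    # Control characters, except tab (9), LF (10) and CR (13)...
    for n in range(0, 32):
        if n not in (9, 10, 13):
            yield chr(n)
    # ...and everything outside 7-bit ASCII.
    for n in range(128, 256):
        yield chr(n)

def has_bad(s):
    # True if the string contains any character treated as binary junk.
    return any(ch in s for ch in bad_chars())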

def get_email(card):
    # Return the value of the first line mentioning EMAIL, or '' if none.
    for line in card.split('\n'):
        if "EMAIL" in line:
            _, _, email = line.partition(':')
            return email.strip()
    return ''

if len(sys.argv) != 4:
    print("Usage: cvt.py <input_old_address_book> <output.vcf> <bad.txt>")
    sys.exit(1)


with open(sys.argv[1], "rb") as in_f:
    s = in_f.read()

s_start = "BEGIN:VCARD"
s_end = "END:VCARD"

cards = {}
lst_bad = []

while s:
    i_start = s.find(s_start)
    if i_start == -1:
        break

    i_next = s.find(s_start, i_start + len(s_start))
    if i_next == -1:
        i_next = len(s)

    i_end = s.find(s_end, i_start + len(s_start))
    if i_end == -1:
        i_end = len(s)
    else:
        i_end += len(s_end)

    # An unterminated card stops where the next card begins.
    if i_next < i_end:
        i_end = i_next

    # i_end is exclusive, so the slice takes no byte past the end of the card.
    card = s[i_start:i_end].strip()
    s = s[i_end:]

    card = card.replace('\r', '')
    card = card.replace('\0', '')
    if not card:
        continue

    key = get_email(card)
    if has_bad(card) or s_end not in card or not key:
        lst_bad.append(card)
        continue

    # One card per email address; on a duplicate, keep the longer card.
    if key not in cards or len(card) > len(cards[key]):
        cards[key] = card

with open(sys.argv[2], "w") as out_f:
    for key in sorted(cards.keys()):
        out_f.write(cards[key] + "\n\n")

with open(sys.argv[3], "w") as bad_f:
    for s in lst_bad:
        bad_f.write(s + "\n\n")