Delete duplicates from over 150,000 bookmarks on Google Chrome
Solution 1:
I'm in the process of dealing with this myself. I had 260,000 bookmarks, the vast majority of which were "phantoms" -- folders with no name, many duplicates, etc. Every time I started Chrome, it would consume many gigabytes of RAM, to the point of really impacting my productivity.
I decided to delete all bad bookmarks, but I had to wipe out the cloud copy of my Chrome sync data entirely to get the change to "stick". I don't have a quick solution, but I believe the following works.
I started with a computer that had a complete copy of my Chrome data. I backed up the profile directory with all that data. (See Where are the user profile directories of Google Chrome located in?).
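For the backup step, here is a minimal sketch, assuming a Linux machine where the profile lives under ~/.config/google-chrome (the path used in Solution 2 below; other platforms differ, see the linked question). backup_profile is a hypothetical helper name, not something Chrome ships, and it is probably safest to run it with Chrome closed so the files aren't changing mid-copy.

```shell
# Sketch: copy a Chrome profile directory to a timestamped sibling directory.
# backup_profile is a hypothetical helper name (an assumption, not a Chrome tool).
backup_profile() {
    src=${1%/}                                     # profile directory, trailing slash removed
    dest="${src}.backup-$(date +%Y%m%d-%H%M%S)"    # sibling directory with a timestamp suffix
    # -a preserves permissions and timestamps and copies recursively;
    # the path of the copy is printed only if the copy succeeded.
    cp -a -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Usage, e.g.:
$ backup_profile ~/.config/google-chrome
It prints the path of the copy, which can be moved back into place if the clean-up goes wrong.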
I started Chrome, waited for it to settle, then went to Chrome's settings and turned off sync. Next I went to https://www.google.com/settings/chrome/sync and clicked "Stop and Clear", which disables sync and deletes all your Chrome profile data (including all the duplicate and phantom bookmarks) from the Google Cloud. The data should still be stored in your Chrome profile on this computer.
I used the bookmark manager to manually delete all the phantom bookmarks. Luckily, most of mine were organized into duplicate folders so I only had a dozen or so things to delete. It still took a long time. The mass of phantom bookmarks brought Chrome to a crawl -- I'd right click on one of these duplicate folders and it was sometimes minutes before the menu with the "delete" option appeared.
So after getting rid of all those bookmarks on that one machine, I exited Chrome just to give it a chance to recover. I restarted Chrome, went into settings and turned sync back on. It uploaded the remaining bookmarks plus the passwords etc. that are still saved on that computer.
Then, on each of my other computers, I exited Chrome, moved my Chrome profile data to the trash (because those copies of the profile still have all the phantom bookmarks), restarted Chrome, signed in, and just waited until sync could restore all my information.
FYI: I've been looking all over for a way to force Chrome to sync everything right this minute. I've found plenty of reasonable suggestions, but so far none of them work. It sometimes takes minutes or hours before the sync is complete. Go figure.
Solution 2:
Having tested all of the suggestions, I found that using the bookmark manager to manually delete the duplicate bookmarks is the most reliable approach (same behaviour and resolution as detailed in Garrett Mitchener's response above).
The main sticking point was ensuring that only the duplicates were deleted. That means getting a list of the unique bookmark URLs before the clean-up, to compare against afterwards.
This worked quite well using standard Linux tools on Ubuntu Trusty (the commands below are run from the home directory):
Back up the bookmarks file in case a unique folder accidentally gets deleted:
$ cp -av .config/google-chrome/Default/Bookmarks{,.orig}
‘.config/google-chrome/Default/Bookmarks’ -> ‘.config/google-chrome/Default/Bookmarks.orig’
Get a count of all URLs:
$ grep -c '"url": ' .config/google-chrome/Default/Bookmarks
Get a count of all unique URLs:
$ grep '"url": ' .config/google-chrome/Default/Bookmarks | awk '{print $2}' | sort | uniq | wc -l
Piping grep into awk is a great deal faster than awk matching alone, and awk has to be piped into sort in order to accurately get unique entries.
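To see which URLs account for the gap between those two counts, something like the following lists the most-duplicated ones. It's a sketch built from the same grep/awk approach as above. top_duplicates is a hypothetical helper name, and like the commands above it assumes each "url" value contains no spaces.

```shell
# Sketch: list the most-duplicated URLs in a Chrome Bookmarks file.
# top_duplicates is a hypothetical helper name (an assumption).
# $1 = path to the Bookmarks file, $2 = how many lines to show (default 10).
top_duplicates() {
    file=$1
    limit=${2:-10}
    grep '"url": ' "$file" \
        | awk '{print $2}' \
        | sort | uniq -c \
        | awk '$1 > 1' \
        | sort -rn \
        | head -n "$limit"
    # uniq -c prefixes each URL with its count, awk keeps counts above 1,
    # and sort -rn puts the worst offenders first.
}
```

Usage, e.g.:
$ top_duplicates .config/google-chrome/Default/Bookmarks 20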
Stick them all in a file, trimming off the extraneous double-quotes while we're at it:
$ grep '"url": ' .config/google-chrome/Default/Bookmarks | awk '{print $2}' | sort | uniq | sed 's/^"//;s/"$//' > Bookmarks-Original.txt
Perform the clean-up in the Bookmark manager, then extract all unique URLs from the bookmarks file:
$ grep '"url": ' .config/google-chrome/Default/Bookmarks | awk '{print $2}' | sort | uniq | sed 's/^"//;s/"$//' > Bookmarks-New.txt
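Run the comparison, matching each URL as a whole fixed string so that regex characters or partial matches can't hide a deleted bookmark:
$ while IFS= read -r URL; do grep -qxF -- "$URL" Bookmarks-New.txt || echo "$URL"; done < Bookmarks-Original.txt > Bookmarks-Discrep.txt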
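The same comparison can probably be done in one pass with comm, which avoids running grep once per URL (that matters with this many bookmarks). A sketch, with missing_urls as a hypothetical helper name: it re-sorts both lists under LC_ALL=C because comm expects its inputs sorted in the ordering it uses itself, and since comm compares whole lines, a URL that is merely a prefix of another one is still reported.

```shell
# Sketch: print lines that are in the first file but missing from the second.
# missing_urls is a hypothetical helper name (an assumption).
# $1 = list taken before the clean-up, $2 = list taken after it.
missing_urls() {
    before=$(mktemp) && after=$(mktemp) || return 1
    # Sort both lists the same way so comm can walk them side by side.
    LC_ALL=C sort -u "$1" > "$before"
    LC_ALL=C sort -u "$2" > "$after"
    # -23 suppresses lines unique to the second file and lines common to both,
    # leaving only lines unique to the first file.
    LC_ALL=C comm -23 "$before" "$after"
    rm -f "$before" "$after"
}
```

Usage, e.g.:
$ missing_urls Bookmarks-Original.txt Bookmarks-New.txt > Bookmarks-Discrep.txt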
Now it's possible to search the original Bookmarks file for each URL listed in Bookmarks-Discrep.txt, extract its metadata, and carefully add it back to the new Bookmarks file (taking a backup of the newest file first), e.g.
{
"date_added": "13026268601621410",
[...]
"url": "https://wiki.mozilla.org/Security/Server_Side_TLS"
},
If the metadata are unimportant, it's easier just to create new bookmarks for each in the Bookmark manager and move into the relevant folder.
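For pulling a missing entry's metadata out of the backup, a grep with some leading context is probably enough. This is only a sketch: show_entry is a hypothetical helper name, and the default of 8 context lines is a guess, since the number of fields stored before "url" may vary between Chrome versions.

```shell
# Sketch: print the lines around a given URL in a Bookmarks backup.
# show_entry is a hypothetical helper name (an assumption).
# $1 = Bookmarks file, $2 = exact URL, $3 = lines of leading context (default 8, a guess).
show_entry() {
    # -F treats the pattern as a fixed string, so characters such as ? and .
    # in the URL are matched literally; -A 1 also shows the closing brace.
    grep -F -B "${3:-8}" -A 1 -- "\"url\": \"$2\"" "$1"
}
```

Usage, e.g.:
$ show_entry .config/google-chrome/Default/Bookmarks.orig https://wiki.mozilla.org/Security/Server_Side_TLS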