What exactly does the number listed under "places.history.expiration.transient_current_max_pages" represent in Firefox?
From the docs: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Places/Places_Expiration
places.history.expiration.max_pages: The maximum number of pages that may be retained in the database before starting to expire. Default value is calculated on startup and put into the places.history.expiration.transient_current_max_pages preference. This transient version of the preference is just mirroring the current value used by expiration, setting it won't have any effect.
The same page links to the actual source code.
The interesting parts are here: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/places/nsPlacesExpiration.js#143
":max_uris" in the sql queries is replaced with the value of places.history.expiration.max_pages
We see that "regular" pages are only removed if the number of rows in moz_places exceeds places.history.expiration.max_pages. (Search for the places.sqlite file if you want to check your current count.)
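To check the current count, something like the following could be run against a copy of places.sqlite. This is only a sketch: the helper names (`open_read_only`, `count_places`, `is_over_limit`) are made up for illustration, and the max_pages value has to be read from about:config by hand.

```python
import sqlite3
from pathlib import Path


def open_read_only(path: str) -> sqlite3.Connection:
    """Open a places.sqlite copy read-only so nothing is modified by accident."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def count_places(conn: sqlite3.Connection) -> int:
    """Return the number of rows in moz_places."""
    (count,) = conn.execute("SELECT COUNT(*) FROM moz_places").fetchone()
    return count


def is_over_limit(conn: sqlite3.Connection, max_pages: int) -> bool:
    """True if the place count exceeds the given max_pages value."""
    return count_places(conn) > max_pages
```

Usage would be along the lines of `count_places(open_read_only("places-copy.sqlite"))`, where the file name is a placeholder for wherever the copy was saved.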
BUT this query also seems to be active:
    // Some visits can be expired more often than others, cause they are less
    // useful to the user and can pollute awesomebar results:
    // 1. urls over 255 chars
    // 2. redirect sources and downloads
    // Note: due to the REPLACE option, this should be executed before
    // QUERY_FIND_VISITS_TO_EXPIRE, that has a more complete result.
    QUERY_FIND_EXOTIC_VISITS_TO_EXPIRE: {
      sql: `INSERT INTO expiration_notify (v_id, url, guid, visit_date, reason)
            SELECT v.id, h.url, h.guid, v.visit_date, "exotic"
            FROM moz_historyvisits v
            JOIN moz_places h ON h.id = v.place_id
            WHERE visit_date < strftime('%s','now','localtime','start of day','-60 days','utc') * 1000000
            AND ( LENGTH(h.url) > 255 OR v.visit_type = 7 )
            ORDER BY v.visit_date ASC
            LIMIT :limit_visits`,
      actions: ACTION.TIMED_OVERLIMIT | ACTION.IDLE_DIRTY | ACTION.IDLE_DAILY |
               ACTION.DEBUG,
    },
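The action list indicates that this runs daily (among other triggers).
It does not depend on expiration.max_pages, and if I read the code correctly it removes visits (not the actual URL, but a record of how the URL was visited) that are older than 60 days and either are downloads (visit_type = 7, if I have the constant right; the comment also mentions redirect sources) or belong to pages with URLs longer than 255 chars.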
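To see which visits match those criteria without deleting anything, the SELECT part of the query can be run on its own against a copy of places.sqlite. A rough sketch follows; `find_exotic_visits` is a name invented here, and the WHERE clause is copied from the quoted source rather than from whatever Firefox version is actually installed.

```python
import sqlite3

# SELECT part of QUERY_FIND_EXOTIC_VISITS_TO_EXPIRE, without the INSERT.
EXOTIC_VISITS_SQL = """
SELECT v.id, h.url, v.visit_date, v.visit_type
FROM moz_historyvisits v
JOIN moz_places h ON h.id = v.place_id
WHERE v.visit_date < strftime('%s','now','localtime','start of day','-60 days','utc') * 1000000
  AND (LENGTH(h.url) > 255 OR v.visit_type = 7)
ORDER BY v.visit_date ASC
LIMIT :limit_visits
"""


def find_exotic_visits(conn: sqlite3.Connection, limit_visits: int = 100):
    """Return (visit id, url, visit_date, visit_type) rows matching the criteria."""
    params = {"limit_visits": limit_visits}
    return conn.execute(EXOTIC_VISITS_SQL, params).fetchall()
```

The default limit of 100 is arbitrary; the value Firefox binds to :limit_visits is not shown in the quoted snippet.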
And this:
    // Finds orphan URIs in the database.
    // Notice we won't notify single removed URIs on History.clear(), so we don't
    // run this query in such a case, but just delete URIs.
    // This could run in the middle of adding a visit or bookmark to a new page.
    // In such a case since it is async, could end up expiring the orphan page
    // before it actually gets the new visit or bookmark.
    // Thus, since new pages get frecency -1, we filter on that.
    QUERY_FIND_URIS_TO_EXPIRE: {
      sql: `INSERT INTO expiration_notify (p_id, url, guid, visit_date)
            SELECT h.id, h.url, h.guid, h.last_visit_date
            FROM moz_places h
            LEFT JOIN moz_historyvisits v ON h.id = v.place_id
            WHERE h.last_visit_date IS NULL
            AND h.foreign_count = 0
            AND v.id IS NULL
            AND frecency <> -1
            LIMIT :limit_uris`,
      actions: ACTION.TIMED | ACTION.TIMED_OVERLIMIT | ACTION.SHUTDOWN_DIRTY |
               ACTION.IDLE_DIRTY | ACTION.IDLE_DAILY | ACTION.DEBUG,
    },
This indicates that URLs (aka "places") that belonged exclusively to such visits will be purged later. (Not sure what h.foreign_count is.)
The h.last_visit_date IS NULL condition would seemingly save most places, but I have a bunch of places with a NULL last_visit_date that I have most certainly visited.
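The places this second query would pick up can be listed the same way, by running only its SELECT part on a copy of the database. This is a sketch under the assumption that the column names match the quoted source; `find_orphan_places` is a made-up helper name.

```python
import sqlite3

# SELECT part of QUERY_FIND_URIS_TO_EXPIRE, without the INSERT.
ORPHAN_PLACES_SQL = """
SELECT h.id, h.url
FROM moz_places h
LEFT JOIN moz_historyvisits v ON h.id = v.place_id
WHERE h.last_visit_date IS NULL
  AND h.foreign_count = 0
  AND v.id IS NULL
  AND h.frecency <> -1
ORDER BY h.id
LIMIT :limit_uris
"""


def find_orphan_places(conn: sqlite3.Connection, limit_uris: int = 100):
    """Return (place id, url) rows with no visits and no foreign references."""
    params = {"limit_uris": limit_uris}
    return conn.execute(ORPHAN_PLACES_SQL, params).fetchall()
```

The ORDER BY is added only to make the output stable; the original query has none.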
In conclusion:
Firefox will remove history even when places.history.expiration.max_pages is not exceeded, in particular URLs longer than 255 chars and download URLs. (The URL of this page is 119 chars long.)
Update: Based on a previous backup of places.sqlite, I've verified that my Firefox installation (100k places, max_pages set to 500k) has deleted 325 places in the last three months.
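A comparison like this can be done by attaching the backup to the current file and looking for URLs that exist only in the backup. The sketch below is one possible way to do it, not necessarily how the numbers above were obtained; `find_removed_places` and the schema alias `old` are names chosen here, and matching on url rather than id is an assumption.

```python
import sqlite3

REMOVED_PLACES_SQL = """
SELECT b.id, b.url
FROM old.moz_places b
LEFT JOIN main.moz_places c ON c.url = b.url
WHERE c.id IS NULL
ORDER BY b.id
"""


def find_removed_places(current_path: str, backup_path: str):
    """Return (id, url) of places present in the backup but not in the current file."""
    conn = sqlite3.connect(current_path)
    try:
        # Attach the backup under the schema name "old" so both can be joined.
        conn.execute("ATTACH DATABASE ? AS old", (backup_path,))
        return conn.execute(REMOVED_PLACES_SQL).fetchall()
    finally:
        conn.close()
```

Both paths should point at copies, since Firefox keeps the live file locked while it is running.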
Most of the missing entries are garbage, e.g. "tracker URLs" that end up redirecting to a shorter URL that's kept (Facebook, Google, etc. are the primary "offenders").
The problem is not that these tracker URLs are gone, but that their visits are also gone, breaking the chain.
Example:
A: Url when I clicked the tracker url: google.com/search?q=give-me-news
B: Removed tracker url: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&so......
C: Actual url: some-newspaper.com/articleX
When I clicked on B, two visits were created: A -> B and B -> C.
That enables me to later find out that I read articleX because I searched for "give-me-news".
Firefox removes visit A -> B because the URL of B is garbage and not interesting to keep around, and suddenly it has become much harder to trace back to the source. It's still possible to make a good guess, but it's no longer a simple SQL query.
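For reference, the "simple SQL query" could look roughly like the sketch below, which walks a visit chain backwards. It assumes moz_historyvisits.from_visit holds the id of the preceding visit; `trace_visit_chain` and the depth cap of 50 are inventions for this example.

```python
import sqlite3

# Follow from_visit links backwards, starting at one visit id.
CHAIN_SQL = """
WITH RECURSIVE chain(visit_id, from_visit, url, depth) AS (
    SELECT v.id, v.from_visit, h.url, 0
    FROM moz_historyvisits v
    JOIN moz_places h ON h.id = v.place_id
    WHERE v.id = :visit_id
  UNION ALL
    SELECT v.id, v.from_visit, h.url, chain.depth + 1
    FROM chain
    JOIN moz_historyvisits v ON v.id = chain.from_visit
    JOIN moz_places h ON h.id = v.place_id
    WHERE chain.depth < :max_depth
)
SELECT url FROM chain ORDER BY depth
"""


def trace_visit_chain(conn: sqlite3.Connection, visit_id: int, max_depth: int = 50):
    """Return the URLs from the given visit back to the start of its chain."""
    params = {"visit_id": visit_id, "max_depth": max_depth}
    return [url for (url,) in conn.execute(CHAIN_SQL, params)]
```

With the A, B, C example intact this returns C, B, A for the visit to C. Once the visit to B has been expired, the join finds nothing for C's from_visit and the result is just C.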
If Firefox insists on removing such URLs (which might very well be the correct thing to do), it would be nice if it could either leave the visit or modify the affected visits, i.e. modify B -> C to be A -> C, possibly keeping a record that a link in the chain has been removed.
Last: I don't get why they insist on removing downloads. A lot of my downloads have meaningful filenames in the URL and would sometimes be helpful to get as suggestions in the omnibar (e.g. quarterly reports).
Making a backup every 60 days seems to be sufficient to keep all history. SQLite doesn't reuse old ids (I think?), so merging the backups shouldn't be too hard.
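A merge along those lines might look like the sketch below. It rests on assumptions that are not verified here: that ids are never reused (SQLite only guarantees this for AUTOINCREMENT columns), that both files share the same schema, and that the merge target is a separate archive file rather than the live profile. The function name `merge_backup_into_archive` is made up, and this has only been tried against simplified tables.

```python
import sqlite3


def merge_backup_into_archive(archive_path: str, backup_path: str,
                              tables=("moz_places", "moz_historyvisits")):
    """Copy rows from the backup into the archive, skipping rows that would
    collide on id or any other unique constraint. Returns rows added per table."""
    conn = sqlite3.connect(archive_path)
    try:
        conn.execute("ATTACH DATABASE ? AS old", (backup_path,))
        added = {}
        for table in tables:
            before = conn.total_changes
            # OR IGNORE silently skips rows that already exist in the archive.
            conn.execute(
                f"INSERT OR IGNORE INTO main.{table} SELECT * FROM old.{table}"
            )
            added[table] = conn.total_changes - before
        conn.commit()
        return added
    finally:
        conn.close()
```

Starting the archive as a copy of the oldest backup and merging each newer backup into it in turn would be one way to use this.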