How to manage a login session in headless Chrome?

I want to create a scraper that:

  1. opens a headless browser,
  2. goes to a url,
  3. logs in (there is steam oauth),
  4. fills some inputs,
  5. and clicks 2 buttons.

My problem is that every new instance of the headless browser clears my login session, so I have to log in again and again...

How can I persist the session across instances? (I'm using Puppeteer with headless Chrome.)

Or how can I open a headless Chrome instance that is already logged in (given that I have already logged in in my main Chrome window)?


You can persist user data with the userDataDir option when launching Puppeteer. Chrome keeps its profile in that directory, including the session and other browser state, so it is still there on the next launch.

const browser = await puppeteer.launch({
  userDataDir: "./user_data"
});

It doesn't go into great detail but here's a link to the docs for it: https://pptr.dev/#?product=Puppeteer&version=v1.6.1&show=api-puppeteerlaunchoptions
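
As a rough sketch of how the userDataDir option above might fit into a scraper: the helper below makes sure the profile directory exists and passes it to launch(), so a login done in one run should still be there in the next. The name launchWithProfile, the extraOptions parameter and the ./user_data path are my own illustrative choices, not part of Puppeteer. The puppeteer object is passed in as an argument only so the helper can be exercised without a real browser.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not a Puppeteer API): launch a browser whose
// profile is stored in profileDir so it persists between runs.
// `puppeteer` is whatever require('puppeteer') returns.
async function launchWithProfile(puppeteer, profileDir, extraOptions = {}) {
  // Resolve to an absolute path; a relative path is taken from the
  // current working directory of the Node process.
  const userDataDir = path.resolve(profileDir);
  // Make sure the directory exists before launching.
  fs.mkdirSync(userDataDir, { recursive: true });
  return puppeteer.launch({ ...extraOptions, userDataDir });
}

// Usage (inside an async function), assuming puppeteer is installed:
//   const puppeteer = require('puppeteer');
//   const browser = await launchWithProfile(puppeteer, './user_data');
//   const page = await browser.newPage();
//   await page.goto('https://example.com'); // placeholder URL
//   // ...log in only if the page shows you are logged out...
//   await browser.close();
```

As far as I know, Chrome locks a profile directory while it is in use, so close the browser before launching another instance with the same directory.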


In Puppeteer you have access to the session cookies through page.cookies().

So once you log in, you can get every cookie and save it in a JSON file:

const fs = require('fs');
const cookiesFilePath = 'cookies.json';
// Save session cookies
const cookiesObject = await page.cookies();
// Write cookies to a file so a later run can restore them
fs.writeFile(cookiesFilePath, JSON.stringify(cookiesObject), function (err) {
  if (err) {
    console.log('The file could not be written.', err);
    return;
  }
  console.log('Session has been successfully saved');
});

Then, on your next iteration right before using page.goto() you can call page.setCookie() to load the cookies from the file one by one:

const previousSession = fs.existsSync(cookiesFilePath)
if (previousSession) {
  // If the file exists, load the cookies
  const cookiesString = fs.readFileSync(cookiesFilePath, 'utf8');
  const parsedCookies = JSON.parse(cookiesString);
  if (parsedCookies.length !== 0) {
    for (let cookie of parsedCookies) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}

Check out the docs:

  • https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagecookiesurls
  • https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetcookiecookies

Here is a version of the above solution that works as written and uses the standard fs module instead of jsonfile:

Setup:

const fs = require('fs');
const cookiesPath = "cookies.txt";

Reading the cookies (put this code first):

// If the cookies file exists, read the cookies.
const previousSession = fs.existsSync(cookiesPath)
if (previousSession) {
  const content = fs.readFileSync(cookiesPath);
  const cookiesArr = JSON.parse(content);
  if (cookiesArr.length !== 0) {
    for (let cookie of cookiesArr) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}

Writing the cookies:

// Write Cookies
const cookiesObject = await page.cookies()
fs.writeFileSync(cookiesPath, JSON.stringify(cookiesObject));
console.log('Session has been saved to ' + cookiesPath);
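
If you want to reuse the two steps above, they can be wrapped in a pair of small functions. This is only a sketch: saveCookies and restoreCookies are names I made up, and page is assumed to be a Puppeteer Page (or anything with the same cookies() and setCookie() methods). It passes all cookies to page.setCookie() in a single call instead of looping, which that method accepts.

```javascript
const fs = require('fs');

// Hypothetical helper: write the page's current cookies to filePath as JSON.
// Returns the number of cookies saved.
async function saveCookies(page, filePath) {
  const cookies = await page.cookies();
  await fs.promises.writeFile(filePath, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

// Hypothetical helper: load cookies from filePath into the page.
// Returns the number of cookies restored (0 if there is no usable file).
async function restoreCookies(page, filePath) {
  let cookies;
  try {
    cookies = JSON.parse(await fs.promises.readFile(filePath, 'utf8'));
  } catch (err) {
    // Missing or unreadable file, or invalid JSON: treat as no saved session.
    return 0;
  }
  if (!Array.isArray(cookies) || cookies.length === 0) return 0;
  // page.setCookie takes any number of cookie objects in one call.
  await page.setCookie(...cookies);
  return cookies.length;
}
```

Call restoreCookies(page, cookiesPath) right after browser.newPage() and before page.goto(), and call saveCookies(page, cookiesPath) once the login has completed. Keep in mind that page.cookies() without arguments only returns cookies for the current page URL, so with an OAuth login you may need to pass the provider's URL as well if you want its cookies saved.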