Google Colab: how to read data from my google drive?

The problem is simple: I have some data on gDrive, for example at /projects/my_project/my_data*.

Also I have a simple notebook in gColab.

So, I would like to do something like:

for file in glob.glob("/projects/my_project/my_data*"):
    do_something(file)

Unfortunately, all examples (like this - https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb, for example) suggests to only mainly load all necessary data to notebook.

But, if I have a lot of pieces of data, it can be quite complicated. Is there any opportunities to solve this issue?

Thanks for help!


Solution 1:

Edit: As of February, 2020, there's now a first-class UI for automatically mounting Drive.

First, open the file browser on the left hand side. It will show a 'Mount Drive' button. Once clicked, you'll see a permissions prompt to mount Drive, and afterwards your Drive files will be present with no setup when you return to the notebook. The completed flow looks like so:

Drive auto mount example

The original answer follows, below. (This will also still work for shared notebooks.)

You can mount your Google Drive files by running the following code snippet:

from google.colab import drive
drive.mount('/content/drive')

Then, you can interact with your Drive files in the file browser side panel or using command-line utilities.

Here's an example notebook

Solution 2:

Good news, PyDrive has first class support on CoLab! PyDrive is a wrapper for the Google Drive python client. Here is an example on how you would download ALL files from a folder, similar to using glob + *:

!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# choose a local (colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
try:
  os.makedirs(local_download_path)
except: pass

# 2. Auto-iterate using the query syntax
#    https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk' in parents"}).GetList()

for f in file_list:
  # 3. Create & download by id.
  print('title: %s, id: %s' % (f['title'], f['id']))
  fname = os.path.join(local_download_path, f['title'])
  print('downloading to {}'.format(fname))
  f_ = drive.CreateFile({'id': f['id']})
  f_.GetContentFile(fname)


with open(fname, 'r') as f:
  print(f.read())

Notice that the arguments to drive.ListFile is a dictionary that coincides with the parameters used by Google Drive HTTP API (you can customize the q parameter to be tuned to your use-case).

Know that in all cases, files/folders are encoded by id's (peep the 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk) on Google Drive. This requires that you search Google Drive for the specific id corresponding to the folder you want to root your search in.

For example, navigate to the folder "/projects/my_project/my_data" that is located in your Google Drive.

Google Drive

See that it contains some files, in which we want to download to CoLab. To get the id of the folder in order to use it by PyDrive, look at the url and extract the id parameter. In this case, the url corresponding to the folder was:

https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk

Where the id is the last piece of the url: 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk.

Solution 3:

Thanks for the great answers! Fastest way to get a few one-off files to Colab from Google drive: Load the Drive helper and mount

from google.colab import drive

This will prompt for authorization.

drive.mount('/content/drive')

Open the link in a new tab-> you will get a code - copy that back into the prompt you now have access to google drive check:

!ls "/content/drive/My Drive"

then copy file(s) as needed:

!cp "/content/drive/My Drive/xy.py" "xy.py"

confirm that files were copied:

!ls

Solution 4:

What I have done is first:

from google.colab import drive
drive.mount('/content/drive/')

Then

%cd /content/drive/My Drive/Colab Notebooks/

After I can for example read csv files with

df = pd.read_csv("data_example.csv")

If you have different locations for the files just add the correct path after My Drive