Why is there an Internal Stage in Snowflake?

Solution 1:

An external stage is managed by you, the customer: you can arrange and secure the files in it however you like. Then, when you want to load data from an external stage into Snowflake, you just reference that stage.
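For example, here is a minimal sketch of registering your own bucket as an external stage; the bucket path and storage integration name are hypothetical placeholders:

```sql
-- Register your own S3 bucket as an external stage; names are placeholders.
CREATE STAGE my_ext_stage
  URL = 's3://my-company-bucket/landing/'
  STORAGE_INTEGRATION = my_s3_integration;

-- Loading then just references the stage.
COPY INTO my_table FROM @my_ext_stage;
```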

An internal stage is managed by Snowflake: you can PUT files into it, and everything else about it is handled by Snowflake. The storage behind internal stages is abstracted away from you. When I say PUT, that is a command you can run from the Snowflake CLI (SnowSQL) that takes a local file and uploads it into an internal stage.
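A quick sketch of what that looks like from SnowSQL (the stage and file names are made up for illustration):

```sql
-- Create a named internal stage: no URL, so Snowflake manages the storage.
CREATE STAGE my_int_stage;

-- Run from SnowSQL: upload a local file into the internal stage.
PUT file:///tmp/orders.csv @my_int_stage AUTO_COMPRESS = TRUE;
```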

As to why internal stages exist? I'd guess something along the lines of:

  • For flexibility: you can use Snowflake's internal blob storage (whatever that may be) or your own storage to stage your data.

  • You can use Snowflake and load data into tables quickly without having blob storage of your own.

  • It makes things easier for non-administrator users. End-users of Snowflake can load data into their own tables without having to know how to use S3 / Azure Blob / GCS etc. Each user gets their own little internal stage area at @~, a bit like a home directory, and each table gets its own internal stage that you can PUT into (there's a short example after this list).
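To make that concrete, here is roughly what those per-user and per-table stages look like from SnowSQL; the file and table names are placeholders:

```sql
-- Upload into your own user stage (the @~ "home directory").
PUT file:///tmp/my_data.csv @~;

-- Upload into the stage that belongs to a specific table.
PUT file:///tmp/my_data.csv @%my_table;

-- See what you have staged so far.
LIST @~;
```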

Solution 2:

Unique to Snowflake is the concept of a stage: it is the last place data lives before it is loaded into a target table.

  1. User stage, @~ (internal): every user has their own, and you can load your files into it using a PUT command from your desktop, i.e. from on-prem to cloud. This location also happens to be where your worksheets are stored!
  2. Table stage, @% (internal): every table has one, and you can PUT files there ready for loading into that table.
  3. Named stage, @ (internal or external): when external, the files land in your S3 bucket by whatever method you use to get them there (PUT only works against internal stages). When internal, they are uploaded into Snowflake's own storage using PUT. The difference? With an external stage you manage the content completely yourself before loading, including encryption and compression; Snowflake does all of that management when the stage is internal. (See the examples after this list.)
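Side by side, the three flavours of stage are addressed like this (the object names are hypothetical):

```sql
LIST @~;               -- user stage
LIST @%my_table;       -- table stage
LIST @my_named_stage;  -- named stage, internal or external

-- COPY works the same way against any of them.
COPY INTO my_table FROM @%my_table FILE_FORMAT = (TYPE = CSV);
```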

All content hosted as files, externally or internally, must be copied into a Snowflake table (COPY command) to take advantage of Snowflake's proprietary micro-partition storage mechanism and things like zero-copy cloning. Alternatively, you could keep the files in an S3 bucket as an external stage but register them as an external table in Snowflake and run SQL on them directly. The supported file formats are CSV, Parquet, Avro, ORC and JSON. Of course, you don't get the benefits listed above.
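A sketch of both routes, with placeholder names and paths:

```sql
-- Route 1: COPY staged files into a real table (micro-partitions, cloning, etc.).
COPY INTO sales FROM @my_ext_stage/sales/
  FILE_FORMAT = (TYPE = PARQUET);

-- Route 2: leave the files in the bucket and query them as an external table.
-- With no column list, rows are exposed through a single VARIANT column named VALUE.
CREATE EXTERNAL TABLE sales_ext
  LOCATION = @my_ext_stage/sales/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;

SELECT COUNT(*) FROM sales_ext;
```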

Basically, everything is a file before it is loaded into Snowflake tables (which, by the way, also achieve better compression thanks to improved compression algorithms).

For your reading: https://docs.snowflake.com/en/user-guide/data-load-overview.html