What is the difference between AWS S3 Select and AWS Athena?
Solution 1:
It also looks like we are missing one major thing:
S3 Select operates on only one object at a time, while Athena can run queries across multiple paths and will include all files within each path.
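To make the "one object" point concrete, here is a minimal sketch of an S3 Select call, assuming boto3 and a CSV object with a header row. The helper name `select_from_object` is made up for illustration, and the client is passed in so the function is easy to try with a stub.

```python
def select_from_object(s3, bucket, key, expression):
    """Run one S3 Select expression against exactly one CSV object."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,  # a single object key; the call has no prefix or wildcard form
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; matching rows arrive in "Records" events.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the raw bytes first so a character split across chunks decodes cleanly.
    return b"".join(chunks).decode("utf-8")
```

With a real client this could be called as `select_from_object(boto3.client("s3"), "example-bucket", "logs/app.csv", "SELECT s.status, s.path FROM S3Object s WHERE s.status = '500'")`, where the bucket, key, and column names are placeholders. Covering many objects this way means looping over keys yourself, which is the gap Athena fills.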
Solution 2:
You can think of AWS S3 Select as a cost-efficiency optimization at the storage layer: it retrieves only the data that matches a predicate from objects in S3 and Glacier, a technique also known as pushdown filtering.
AWS Athena is a fully managed analytical service that lets you run arbitrary ANSI SQL-compliant queries: GROUP BY, HAVING, window and geospatial functions, and SQL DDL and DML.
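For comparison, the Athena side might look like the sketch below, assuming boto3 and a table that has already been defined over an S3 prefix. The helper name `run_athena_query`, the table `access_logs`, and its columns are hypothetical.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return rows as lists of strings."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs asynchronously, so poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {status['State']}")

    # Only the first page of results is read here; large results need pagination.
    # For SELECT statements the first row holds the column names.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Hypothetical table and columns. The table's LOCATION would be an S3 prefix,
# so the aggregate covers every file under that prefix.
EXAMPLE_SQL = """
SELECT status, COUNT(*) AS hits
FROM access_logs
GROUP BY status
HAVING COUNT(*) > 100
ORDER BY hits DESC
"""
```

A call such as `run_athena_query(boto3.client("athena"), EXAMPLE_SQL, "weblogs", "s3://example-bucket/athena-results/")` would run the aggregate; the database name and output location are placeholders. An aggregate like this spans many files and is not something a single S3 Select call over one object is aimed at.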
Solution 3:
Athena is (from the little I've used it) more intended as a business reporting or analysis tool backed by S3.
S3 Select appears to use the same sort of technology, but I would guess it's aimed more at direct use by applications that need to filter or shard their data sets.
Solution 4:
Amazon Athena: Amazon Athena is a query service that makes it easy to analyze data stored in S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. It scales automatically, executing queries in parallel, which produces fast results even with large datasets and complex queries.
Use cases: Athena can be used to process logs, perform ad-hoc analysis, and run interactive queries and joins. It runs queries across multiple paths, covering all the files under each path.
S3 Select: S3 Select is an S3 feature designed to retrieve a subset of an object’s data (using simple SQL expressions) instead of the entire object, which can be up to 5 terabytes in size. S3 Select runs queries on a single object at a time in the S3 bucket.
Conclusion:
Athena can be used for complex queries on files, and those queries can span multiple folders in an S3 bucket.
S3 Select can be used for simple queries against a single object.
Solution 5:
S3 Select makes it easy to retrieve specific data from the contents of an object using simple SQL expressions, so there is no need to retrieve the entire object. It can be used with Lambda to build serverless apps and can be tied in with Big Data frameworks like Apache Spark and Presto. It can improve performance by up to 400%.
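As a rough sketch of the Lambda use mentioned above, assuming boto3 and a JSON Lines object: the event fields `bucket` and `key`, the helper name `filter_object`, and the column names are all made up for illustration. The stream's "Stats" event reports how many bytes were scanned versus returned, which is where any saving from not fetching the whole object would show up.

```python
def filter_object(s3, bucket, key, expression):
    """Return (matching_rows_text, stats) for one JSON Lines object."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"JSON": {"Type": "LINES"}},
        OutputSerialization={"JSON": {}},
    )
    body, stats = [], {}
    for event in response["Payload"]:
        if "Records" in event:
            body.append(event["Records"]["Payload"])
        elif "Stats" in event:
            # Details carries BytesScanned, BytesProcessed and BytesReturned.
            stats = event["Stats"]["Details"]
    return b"".join(body).decode("utf-8"), stats


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; imported here so the
    # helper above can be exercised locally with a stub client.
    import boto3

    # Hypothetical event shape: {"bucket": "...", "key": "..."}.
    rows, stats = filter_object(
        boto3.client("s3"),
        event["bucket"],
        event["key"],
        "SELECT s.user_id FROM S3Object s WHERE s.level = 'ERROR'",
    )
    return {
        "rows": rows.splitlines(),
        "bytes_scanned": stats.get("BytesScanned"),
        "bytes_returned": stats.get("BytesReturned"),
    }
```

The function only ever receives the filtered rows, so its memory and transfer needs depend on the size of the match rather than the size of the object.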
Amazon Athena is an interactive query service. It is serverless, and there is no need to load data into Athena. It is built on Presto, runs standard SQL, and is mainly used to analyze Big Data.