Handling very large data with MySQL
Solution 1:
- Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.
- Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 8 years old and on the verge of no longer being supported.
- Best data store for billions of rows -- If you mean 'Engine', then InnoDB.
- How big can a MySQL database get before the performance starts to degrade -- Again, that depends on the queries. I can show you a 1K-row table that will melt down; I have worked with billion-row tables that hum along.
- Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.
- Can MySQL handle tables which will hold about 300 million records? -- Again, yes. The limit is somewhere around a trillion rows.
- (for InnoDB tables which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings here in Percona blog -- yes (a sizing sketch follows this list).
- having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this critical area. (An EXPLAIN sketch follows this list.)
- partitioning the table -- "Partitioning is not a panacea!" I harp on that in my blog.
- MySQL Sharding -- Currently this is DIY.
- MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle's "Group Replication" is a viable contender.
- Partitioning does not support `FOREIGN KEY` or "global" `UNIQUE`.
- UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type 1 UUIDs may be a workaround (sketched after this list).
- Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative `CREATE TABLE` and how you intend to feed the data in. (A batched-loading sketch follows this list.)
- Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.
- Do build summary tables (a sketch follows this list).
- 2.3 million transactions per day -- If that is 2.3M inserts (30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc., may be necessary.
- deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or anyone else) will have I/O issues.
- Shrinking the datatypes helps in a table like yours. I doubt if you need 4 bytes to specify `fuel_type`. There are multiple 1-byte approaches (sketched after this list).
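For the buffer_pool point, here is a minimal sizing sketch, assuming a dedicated database server with 64 GB of RAM (the RAM amount and the 80% rule of thumb are assumptions; adjust for your machine). On 5.5 the setting is not dynamic, so it goes in my.cnf and takes effect after a restart:

```ini
# my.cnf -- assuming a dedicated 64GB server; roughly 80% of RAM for InnoDB
[mysqld]
innodb_buffer_pool_size = 51G
```

(5.7.5+ can also resize the buffer pool at runtime with SET GLOBAL.)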
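For the indexing point, a sketch of what "let's see them" is looking for. The readings table, its columns, and the query are hypothetical, just to show the pattern of checking a query with EXPLAIN and giving it a composite index:

```sql
-- Hypothetical query: last week of data for one sensor
EXPLAIN
SELECT reading_time, value
  FROM readings
 WHERE sensor_id = 1234
   AND reading_time >= NOW() - INTERVAL 7 DAY;

-- Composite index: the '=' column first, the range column last,
-- so the WHERE clause becomes a single index range scan.
ALTER TABLE readings ADD INDEX idx_sensor_time (sensor_id, reading_time);
```

After adding the index, EXPLAIN should show idx_sensor_time under key and a rows estimate in line with the date range, not the whole table.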
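For the UUID point, the usual Type 1 workaround is to rearrange the time fields so values arrive in roughly chronological order and store them as BINARY(16); inserts then land near the "hot" end of the index instead of being scattered across all of it. A sketch (MySQL 8.0's UUID_TO_BIN(uuid, 1) does the same swap; on older versions it has to be done by hand):

```sql
SET @u = UUID();   -- MySQL's UUID() produces a Type 1 (time-based) UUID

-- Move time_hi/time_mid/time_low to the front, then pack into 16 bytes
SELECT UNHEX(CONCAT(
         SUBSTRING(@u, 15, 4),   -- time_hi_and_version
         SUBSTRING(@u, 10, 4),   -- time_mid
         SUBSTRING(@u,  1, 8),   -- time_low
         SUBSTRING(@u, 20, 4),   -- clock_seq
         SUBSTRING(@u, 25, 12)   -- node
       )) AS uuid_bin;
```

Stored as a BINARY(16) primary key, new rows cluster together and the buffer_pool stays effective; only Type 1 UUIDs carry a timestamp that can be exploited this way.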
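For insert speed, a sketch of the two usual feeding patterns, multi-row INSERTs and LOAD DATA, again using the hypothetical readings table and a made-up file name:

```sql
-- Batch many rows per statement (100-1000 per INSERT is a common sweet spot)
INSERT INTO readings (sensor_id, reading_time, value) VALUES
  (1, '2017-06-01 00:00:00', 3.2),
  (1, '2017-06-01 00:00:10', 3.3),
  (2, '2017-06-01 00:00:00', 7.9);

-- Bulk-load from a file (usually the fastest option for large back-fills)
LOAD DATA LOCAL INFILE '/tmp/readings.csv'
  INTO TABLE readings
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  (sensor_id, reading_time, value);
```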
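For summary tables, a sketch of the usual pattern: a small table keyed by the reporting dimensions, updated as each batch of raw data arrives, so reports never have to scan the billion-row detail table. Table and column names are again hypothetical:

```sql
CREATE TABLE readings_hourly (
  sensor_id INT UNSIGNED NOT NULL,
  hr        DATETIME     NOT NULL,   -- hour bucket
  cnt       INT UNSIGNED NOT NULL,
  sum_value DOUBLE       NOT NULL,   -- keep SUM and COUNT; derive AVG when reporting
  PRIMARY KEY (sensor_id, hr)
) ENGINE=InnoDB;

-- Roll up whatever range was just loaded
INSERT INTO readings_hourly (sensor_id, hr, cnt, sum_value)
SELECT sensor_id,
       DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00'),
       COUNT(*),
       SUM(value)
  FROM readings
 WHERE reading_time >= '2017-06-01' AND reading_time < '2017-06-02'
 GROUP BY sensor_id, DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00')
ON DUPLICATE KEY UPDATE
  cnt       = cnt + VALUES(cnt),
  sum_value = sum_value + VALUES(sum_value);
```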
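For the datatype shrinking, two common 1-byte ways to store something like fuel_type (the vehicles table and the value list are illustrative, not from your schema):

```sql
-- Option A: TINYINT UNSIGNED (0-255) pointing at a tiny lookup table -- 1 byte per row
ALTER TABLE vehicles MODIFY fuel_type TINYINT UNSIGNED NOT NULL;

-- Option B: ENUM -- also 1 byte for up to 255 distinct values
ALTER TABLE vehicles
  MODIFY fuel_type ENUM('gasoline','diesel','lpg','cng','electric','hybrid') NOT NULL;
```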
Solution 2:
When collecting billions of rows, it is better (when possible) to consolidate, process, summarize, whatever, the data before storing. Keep the raw data in a file if you think you need to get back to it.
Doing that will eliminate most of your questions and concerns, plus speed up the processing.
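A sketch of that flow, using the same hypothetical readings feed as above: keep the raw CSV on disk, bulk-load it into a throwaway staging table, fold it into the hourly summary, and drop the detail rows:

```sql
-- 1. Staging table for one batch; the raw file itself stays on disk as the archive
CREATE TABLE readings_stage (
  sensor_id    INT UNSIGNED NOT NULL,
  reading_time DATETIME     NOT NULL,
  value        DOUBLE       NOT NULL
) ENGINE=InnoDB;

LOAD DATA LOCAL INFILE '/tmp/readings_2017-06-01.csv'
  INTO TABLE readings_stage
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  (sensor_id, reading_time, value);

-- 2. Consolidate into the summary table (same rollup as the summary-table sketch above)
INSERT INTO readings_hourly (sensor_id, hr, cnt, sum_value)
SELECT sensor_id, DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00'), COUNT(*), SUM(value)
  FROM readings_stage
 GROUP BY sensor_id, DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00')
ON DUPLICATE KEY UPDATE
  cnt       = cnt + VALUES(cnt),
  sum_value = sum_value + VALUES(sum_value);

-- 3. Billions of detail rows never accumulate in the database
DROP TABLE readings_stage;
```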