How can I import a large (14 GB) MySQL dump file into a new MySQL database?
How can I import a large (14 GB) MySQL dump file into a new MySQL database?
I've searched around, and only this solution helped me:
mysql -u root -p
set global net_buffer_length=1000000; --Set network buffer length to a large byte number
set global max_allowed_packet=1000000000; --Set maximum allowed packet size to a large byte number
SET foreign_key_checks = 0; --Disable foreign key checking to avoid delays,errors and unwanted behaviour
source file.sql --Import your sql dump file
SET foreign_key_checks = 1; --Remember to enable foreign key checks when procedure is complete!
The answer is found here.
Have you tried just using the mysql
command line client directly?
mysql -u username -p -h hostname databasename < dump.sql
If you can't do that, there are any number of utilities you can find by Googling that help you import a large dump into MySQL, like BigDump
On a recent project we had the challenge of working with and manipulating a large collection of data. Our client provided us with a 50 CSV files ranging from 30 MB to 350 MB in size and all in all containing approximately 20 million rows of data and 15 columns of data. Our end goal was to import and manipulate the data into a MySQL relational database to be used to power a front-end PHP script that we also developed. Now, working with a dataset this large or larger is not the simplest of tasks and in working on it we wanted to take a moment to share some of the things you should consider and know when working with large datasets like this.
- Analyze Your Dataset Pre-Import
I can’t stress this first step enough! Make sure that you take the time to analyze the data you are working with before importing it at all. Getting an understand of what all of the data represents, what columns related to what and what type of manipulation you need to will end up saving you time in the long run.
- LOAD DATA INFILE is Your Friend
Importing large data files like the ones we worked with (and larger ones) can be tough to do if you go ahead and try a regular CSV insert via a tool like PHPMyAdmin. Not only will it fail in many cases because your server won’t be able to handle a file upload as large as some of your data files due to upload size restrictions and server timeouts, but even if it does succeed, the process could take hours depending our your hardware. The SQL function LOAD DATA INFILE was created to handle these large datasets and will significantly reduce the time it takes to handle the import process. Of note, this can be executed through PHPMyAdmin, but you may still have file upload issues. In that case you can upload the files manually to your server and then execute from PHPMyAdmin (see their manual for more info) or execute the command via your SSH console (assuming you have your own server)
LOAD DATA INFILE '/mylargefile.csv' INTO TABLE temp_data FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
- MYISAM vs InnoDB
Large or small database it’s always good to take a little time to consider which database engine you are going to use for your project. The two main engines you are going to read about are MYISAM and InnoDB and each has their own benefits and drawbacks. In brief the things to consider (in general) are as follows:
MYISAM
- Lower Memory Usage
- Allows for Full-Text Searching
- Table Level Locking – Locks Entire Table on Write
- Great for Read-Intensive Applications
InnoDB
- List item
- Uses More Memory
- No Full-Text Search Support
- Faster Performance
- Row Level Locking – Locks Single Row on Write
- Great for Read/Write Intensive Applications
- Plan Your Design Carefully
MySQL AnalyzeYour databases design/structure is going to be a large factor in how it performs. Take your time when it comes to planning out the different fields and analyze the data to figure out what the best field types, defaults and field length. You want to accommodate for the right amounts of data and try to avoid varchar columns and overly large data types when the data doesn’t warrant it. As an additional step after you are done with your database, you make want to see what MySQL suggests as field types for all of your different fields. You can do this by executing the following SQL command:
ANALYZE TABLE my_big_table
The result will be a description of each columns information along with a recommendation for what type of datatype it should be along with a proper length. Now you don’t necessarily need to follow the recommendations as they are based solely on existing data, but it may help put you on the right track and get you thinking
- To Index or Not to Index
For a dataset as large as this it’s infinitely important to create proper indexes on your data based off of what you need to do with the data on the front-end, BUT if you plan to manipulate the data beforehand refrain from placing too many indexes on the data. Not only will it will make your SQL table larger, but it will also slow down certain operations like column additions, subtractions and additional indexing. With our dataset we needed to take the information we just imported and break it into several different tables to create a relational structure as well as take certain columns and split the information into additional columns. We placed an index on the bare minimum of columns that we knew would help us with the manipulation. All in all, we took 1 large table consisting of 20 million rows of data and split its information into 6 different tables with pieces of the main data in them along with newly created data based off the existing content. We did all of this by writing small PHP scripts to parse and move the data around.
- Finding a Balance
A big part of working with large databases from a programming perspective is speed and efficiency. Getting all of the data into your database is great, but if the script you write to access the data is slow, what’s the point? When working with large datasets it’s extremely important that you take the time to understand all of the queries that your script is performing and to create indexes to help those queries where possible. One such way to analyze what your queries are doing is by executing the following SQL command:
EXPLAIN SELECT some_field FROM my_big_table WHERE another_field='MyCustomField';
By adding EXPLAIN to the start of your query MySQL will spit out information describing what indexes it tried to use, did use and how it used them. I labeled this point ‘Finding a balance’ because although indexes can help your script perform faster, they can just as easily make it run slower. You need to make sure you index what is needed and only what is needed. Every index consumes disk space and adds to the overhead of the table. Every time you make an edit to your table, you have to rebuild the index for that particular row and the more indexes you have on those rows, the longer it will take. It all comes down to making smart indexes, efficient SQL queries and most importantly benchmarking as you go to understand what each of your queries is doing and how long it’s taking to do it.
- Index On, Index Off
As we worked on the database and front-end script, both the client and us started to notice little things that needed changing and that required us to make changes to the database. Some of these changes involved adding/removing columns and changing the column types. As we had already setup a number of indexes on the data, making any of these changes required the server to do some serious work to keep the indexes in place and handle any modifications. On our small VPS server, some of the changes were taking upwards of 6 hours to complete…certainly not helpful to us being able to do speedy development. The solution? Turn off indexes! Sometimes it’s better to turn the indexes off, make your changes and then turn the indexes back on….especially if you have a lot of different changes to make. With the indexes off, the changes took a matter of seconds to minutes versus hours. When we were happy with our changes we simply turned our indexes back on. This of course took quite some time to re-index everything, but it was at least able to re-index everything all at once, reducing the overall time needed to make these changes one by one. Here’s how to do it:
-
Disable Indexes:
ALTER TABLE my_big_table DISABLE KEY
-
Enable Indexes:
ALTER TABLE my_big_table ENABLE KEY
- Give MySQL a Tune-Up
Don’t neglect your server when it comes to making your database and script run quickly. Your hardware needs just as much attention and tuning as your database and script does. In particular it’s important to look at your MySQL configuration file to see what changes you can make to better enhance its performance.
- Don’t be Afraid to Ask
Working with SQL can be challenging to begin with and working with extremely large datasets only makes it that much harder. Don’t be afraid to go to professionals who know what they are doing when it comes to large datasets. Ultimately you will end up with a superior product, quicker development and quicker front-end performance. When it comes to large databases sometimes it’s take a professionals experienced eyes to find all the little caveats that could be slowing your databases performance.
I'm posting my finding in a few of the responses I've seen that didn't mention what I ran into, and apprently this would even defeat BigDump, so check it:
I was trying to load a 500 meg dump via Linux command line and kept getting the "Mysql server has gone away" errors. Settings in my.conf didn't help. What turned out to fix it is...I was doing one big extended insert like:
insert into table (fields) values (a record, a record, a record, 500 meg of data);
I needed to format the file as separate inserts like this:
insert into table (fields) values (a record);
insert into table (fields) values (a record);
insert into table (fields) values (a record);
Etc.
And to generate the dump, I used something like this and it worked like a charm:
SELECT
id,
status,
email
FROM contacts
INTO OUTFILE '/tmp/contacts.sql'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES STARTING BY "INSERT INTO contacts (id,status,email) values ("
TERMINATED BY ');\n'
I have made a PHP script which is designed to import large database dumps which have been generated by phpmyadmin or mysql dump (from cpanel) . It's called PETMI and you can download it here [project page] [gitlab page].
It works by splitting an. sql file into smaller files called a split and processing each split one at a time. Splits which fail to process can be processed manually by the user in phpmyadmin. This can be easily programmed as in sql dumps, each command is on a new line. Some things in sql dumps work in phpmyadmin imports but not in mysqli_query so those lines have been stripped from the splits.
It has been tested with a 1GB database. It has to be uploaded to an existing website. PETMI is open source and the sample code can be seen on Gitlab.
A moderator asked me to provide some sample code. I'm on a phone so excuse the formatting.
Here is the code that creates the splits.
//gets the config page
if (isset($_POST['register']) && $_POST['register'])
{
echo " <img src=\"loading.gif\">";
$folder = "split/";
include ("config.php");
$fh = fopen("importme.sql", 'a') or die("can't open file");
$stringData = "-- --------------------------------------------------------";
fwrite($fh, $stringData);
fclose($fh);
$file2 = fopen("importme.sql","r");
//echo "<br><textarea class=\"mediumtext\" style=\"width: 500px; height: 200px;\">";
$danumber = "1";
while(! feof($file2)){
//echo fgets($file2)."<!-- <br /><hr color=\"red\" size=\"15\"> -->";
$oneline = fgets($file2); //this is fgets($file2) but formatted nicely
//echo "<br>$oneline";
$findme1 = '-- --------------------------------------------------------';
$pos1 = strpos($oneline, $findme1);
$findme2 = '-- Table structure for';
$pos2 = strpos($oneline, $findme2);
$findme3 = '-- Dumping data for';
$pos3 = strpos($oneline, $findme3);
$findme4 = '-- Indexes for dumped tables';
$pos4 = strpos($oneline, $findme4);
$findme5 = '-- AUTO_INCREMENT for dumped tables';
$pos5 = strpos($oneline, $findme5);
if ($pos1 === false && $pos2 === false && $pos3 === false && $pos4 === false && $pos5 === false) {
// setcookie("filenumber",$i);
// if ($danumber2 == ""){$danumber2 = "0";} else { $danumber2 = $danumber2 +1;}
$ourFileName = "split/sql-split-$danumber.sql";
// echo "writing danumber is $danumber";
$ourFileHandle = fopen($ourFileName, 'a') or die("can't edit file. chmod directory to 777");
$stringData = $oneline;
$stringData = preg_replace("/\/[*][!\d\sA-Za-z@_='+:,]*[*][\/][;]/", "", $stringData);
$stringData = preg_replace("/\/[*][!]*[\d A-Za-z`]*[*]\/[;]/", "", $stringData);
$stringData = preg_replace("/DROP TABLE IF EXISTS `[a-zA-Z]*`;/", "", $stringData);
$stringData = preg_replace("/LOCK TABLES `[a-zA-Z` ;]*/", "", $stringData);
$stringData = preg_replace("/UNLOCK TABLES;/", "", $stringData);
fwrite($ourFileHandle, $stringData);
fclose($ourFileHandle);
} else {
//write new file;
if ($danumber == ""){$danumber = "1";} else { $danumber = $danumber +1;}
$ourFileName = "split/sql-split-$danumber.sql";
//echo "$ourFileName has been written with the contents above.\n";
$ourFileName = "split/sql-split-$danumber.sql";
$ourFileHandle = fopen($ourFileName, 'a') or die("can't edit file. chmod directory to 777");
$stringData = "$oneline";
fwrite($ourFileHandle, $stringData);
fclose($ourFileHandle);
}
}
//echo "</textarea>";
fclose($file2);
Here is the code that imports the split
<?php
ob_start();
// allows you to use cookies
include ("config.php");
//gets the config page
if (isset($_POST['register']))
{
echo "<div id**strong text**=\"sel1\"><img src=\"loading.gif\"></div>";
// the above line checks to see if the html form has been submitted
$dbname = $accesshost;
$dbhost = $username;
$dbuser = $password;
$dbpasswd = $database;
$table_prefix = $dbprefix;
//the above lines set variables with the user submitted information
//none were left blank! We continue...
//echo "$importme";
echo "<hr>";
$importme = "$_GET[file]";
$importme = file_get_contents($importme);
//echo "<b>$importme</b><br><br>";
$sql = $importme;
$findme1 = '-- Indexes for dumped tables';
$pos1 = strpos($importme, $findme1);
$findme2 = '-- AUTO_INCREMENT for dumped tables';
$pos2 = strpos($importme, $findme2);
$dbhost = '';
@set_time_limit(0);
if($pos1 !== false){
$splitted = explode("-- Indexes for table", $importme);
// print_r($splitted);
for($i=0;$i<count($splitted);$i++){
$sql = $splitted[$i];
$sql = preg_replace("/[`][a-z`\s]*[-]{2}/", "", $sql);
// echo "<b>$sql</b><hr>";
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
}
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
} elseif($pos2 !== false){
$splitted = explode("-- AUTO_INCREMENT for table", $importme);
// print_r($splitted);
for($i=0;$i<count($splitted);$i++){
$sql = $splitted[$i];
$sql = preg_replace("/[`][a-z`\s]*[-]{2}/", "", $sql);
// echo "<b>$sql</b><hr>";
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
}
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
} else {
if($table_prefix !== 'phpbb_') $sql = preg_replace('/phpbb_/', $table_prefix, $sql);
$res = mysql_query($sql);
if(!$res) { echo '<b>error in query </b>', mysql_error(), '<br /><br>Try importing the split .sql file in phpmyadmin under the SQL tab.'; /* $i = $i +1; */ } else {
echo ("<meta http-equiv=\"Refresh\" content=\"0; URL=restore.php?page=done&file=$filename\"/>Thank You! You will be redirected");
}
}
//echo 'done (', count($sql), ' queries).';
}