Few days back our client communicated that their archive tables on MYSQL instance are consuming a lot of disk space (> 50GB >10000000 rows) and half of the data is no longer required.
We have to clean the data and schedule regular removal on datetime basis and we cannot change any configuration on server.
It was risky on production to execute direct DML :( CTAS was an option too but when we compare with volume of data, it takes a lot time and server load.
MYSql supports bulk data operation with file very effectively.I have used that path and it was successful.
Following I have described that scenario for viewers. You can set your filter criteria and indexes accordingly.
Have a nice weekend :)
-- Create test table
CREATE TABLE `migtest` (
`Id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(20) DEFAULT NULL,
KEY `Id` (`Id`)
);
-- Create procedure to Load test data
DELIMITER #
CREATE PROCEDURE Load_Test()
BEGIN
DECLARE v1 INT DEFAULT 0;
WHILE v1 < 1000001 DO
INSERT INTO migtest (NAME) VALUES (CONCAT('John',v1));
SET v1=v1+1;
IF MOD(v1,5000) = 0 THEN -- Commit every 5000 records
COMMIT;
END IF;
END WHILE;
END #
-- Call stored porc to laod data
CALL Load_Test();
-- counting rows
SELECT COUNT(*) FROM migtest;
-- Creating New host table for data
CREATE TABLE `migtest_b` (
`Id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(20) DEFAULT NULL,
KEY `Id` (`Id`));
-- Exporting data to file.
-- I am using limit to transfer records. You can use your criteria to export records
SELECT * FROM migtest
-- WHERE Your criteria
LIMIT 500000
INTO OUTFILE 'F:\\lg_test\\logs.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n';
SELECT '=============== importing File =======================================';
# Populating new table
LOAD DATA INFILE 'F:\\lg_test\\logs.csv' INTO TABLE migtest_b
FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n';
-- counting rows
SELECT COUNT(*) FROM migtest_b;
-- Table Operations
RENAME TABLE migtest TO migtest_del;
RENAME TABLE migtest_b TO migtest;
DROP TABLE migtest_del;
We have to clean the data and schedule regular removal on datetime basis and we cannot change any configuration on server.
It was risky on production to execute direct DML :( CTAS was an option too but when we compare with volume of data, it takes a lot time and server load.
MYSql supports bulk data operation with file very effectively.I have used that path and it was successful.
Following I have described that scenario for viewers. You can set your filter criteria and indexes accordingly.
Have a nice weekend :)
-- Create test table
CREATE TABLE `migtest` (
`Id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(20) DEFAULT NULL,
KEY `Id` (`Id`)
);
-- Create procedure to Load test data
DELIMITER #
CREATE PROCEDURE Load_Test()
BEGIN
DECLARE v1 INT DEFAULT 0;
WHILE v1 < 1000001 DO
INSERT INTO migtest (NAME) VALUES (CONCAT('John',v1));
SET v1=v1+1;
IF MOD(v1,5000) = 0 THEN -- Commit every 5000 records
COMMIT;
END IF;
END WHILE;
END #
-- Call stored porc to laod data
CALL Load_Test();
-- counting rows
SELECT COUNT(*) FROM migtest;
-- Creating New host table for data
CREATE TABLE `migtest_b` (
`Id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(20) DEFAULT NULL,
KEY `Id` (`Id`));
-- Exporting data to file.
-- I am using limit to transfer records. You can use your criteria to export records
SELECT * FROM migtest
-- WHERE Your criteria
LIMIT 500000
INTO OUTFILE 'F:\\lg_test\\logs.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n';
SELECT '=============== importing File =======================================';
# Populating new table
LOAD DATA INFILE 'F:\\lg_test\\logs.csv' INTO TABLE migtest_b
FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n';
-- counting rows
SELECT COUNT(*) FROM migtest_b;
-- Table Operations
RENAME TABLE migtest TO migtest_del;
RENAME TABLE migtest_b TO migtest;
DROP TABLE migtest_del;
No comments:
Post a Comment