
Friday, October 3, 2014

MySQL: Migrate Data in a Big Table

A few days back, our client reported that the archive tables on their MySQL instance were consuming a lot of disk space (> 50 GB, > 10,000,000 rows) and that half of the data was no longer required.

We had to clean up the data and schedule regular removal on a datetime basis, and we could not change any configuration on the server (a scheduling sketch appears at the end of this post).

Executing direct DML on production was risky :( CTAS was an option too, but given the volume of data it would have taken a long time and put heavy load on the server.
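For reference, the rejected CTAS route would have looked something like the sketch below (the table name and filter are hypothetical, and note that MySQL's CREATE TABLE ... SELECT does not copy indexes):

-- CTAS sketch: copy only the rows to keep into a new table
CREATE TABLE archive_keep AS
SELECT *
FROM   archive_table                 -- hypothetical archive table
WHERE  log_time >= '2014-01-01';     -- hypothetical "keep" criteria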

MySQL supports bulk data operations through files very effectively. I took that path and it was successful.

Below I have described that scenario for readers. You can set your filter criteria and indexes accordingly.

Have a nice weekend :)

-- Create test table
CREATE TABLE `migtest` (
  `Id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(20) DEFAULT NULL,
  KEY `Id` (`Id`)
);

-- Create procedure to load test data
DELIMITER #
CREATE PROCEDURE Load_Test()
BEGIN
  DECLARE v1 INT DEFAULT 0;

  WHILE v1 < 1000001 DO
    INSERT INTO migtest (name) VALUES (CONCAT('John',v1));
    SET v1 = v1 + 1;

    IF MOD(v1,5000) = 0 THEN -- Commit every 5000 records
      COMMIT;                -- only takes effect when autocommit is off
    END IF;
  END WHILE;
END #
DELIMITER ;

 -- Call stored proc to load data
 CALL Load_Test();

 -- counting rows
 SELECT COUNT(*) FROM migtest;

 -- Creating New host table for data
 CREATE TABLE `migtest_b` (
  `Id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(20) DEFAULT NULL,
  KEY `Id` (`Id`));
 
-- Exporting data to file.
-- I am using LIMIT to pick the records to transfer; replace it with your own export criteria.
SELECT * FROM migtest
-- WHERE  Your criteria
LIMIT 500000
INTO OUTFILE 'F:\\lg_test\\logs.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n';
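Note: depending on the server's settings, SELECT ... INTO OUTFILE may only be allowed to write under one specific directory. You can check that before picking the file path:

-- Check where the server allows OUTFILE writes
SHOW VARIABLES LIKE 'secure_file_priv';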

SELECT '=============== importing File =======================================';
-- Populating the new table from the exported file
LOAD DATA INFILE 'F:\\lg_test\\logs.csv'  INTO TABLE migtest_b
FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n';

 -- counting rows
 SELECT COUNT(*) FROM migtest_b;

-- Table operations: swap the tables, then drop the old one
-- (the two renames are combined into one atomic RENAME)
RENAME TABLE migtest TO migtest_del, migtest_b TO migtest;
DROP TABLE migtest_del;
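As mentioned above, the requirement also included regular removal on a datetime basis. A minimal sketch of scheduling that with a MySQL event follows; the created_at column and the 6-month retention are hypothetical, and the event scheduler must already be enabled on the server:

-- Scheduled cleanup sketch (hypothetical created_at DATETIME column)
CREATE EVENT purge_migtest
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM migtest
  WHERE created_at < NOW() - INTERVAL 6 MONTH
  LIMIT 5000; -- keep each run small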

Friday, November 1, 2013

Delete Millions of Rows From MSSQL Table

Recently we faced poor performance in one of our ETL routines. It uses a join on two tables, and the structure is very simple: both tables have PK indexes and the SQL seeks the right results. The only problem was data size. Our table2 holds about 27 million records and joins to table1, which has only 10,000 records. A cleanup step was missing, so records were never deleted from table2 :(.

So the task was to remove millions of unwanted rows. We could issue a single DELETE, but an operation that large generates a huge amount of transaction log and can ultimately lead the RDBMS to escalate to a table lock. So the better way is to delete in chunks.

Here is the test run:
--======================
-- Creating First Table
--=======================
CREATE TABLE test1
(id INT NOT null);
ALTER TABLE test1 ADD CONSTRAINT pk_test1 PRIMARY KEY (id);

--Loading data to table 1
declare @i int
declare @rows_to_insert int
set @i = 0
set @rows_to_insert = 10000

while @i < @rows_to_insert
    begin
    INSERT INTO test1 VALUES (@i)
    set @i = @i + 1
    end
-- verifying load
select count(*) from test1;

--======================
-- Creating Second Table
--=======================
CREATE TABLE test2
(id INT NOT null);
ALTER TABLE test2 ADD CONSTRAINT pk_test2 PRIMARY KEY (id);

--Loading data; set @rows_to_insert to the number of rows you want to load
declare @i int
declare @rows_to_insert int
set @i = 0
set @rows_to_insert = 200000

while @i < @rows_to_insert
    begin
    INSERT INTO test2 VALUES (@i)
    set @i = @i + 1
    END

-- verifying load   
SELECT count(*) FROM test2;

-- Query to fetch records that exist in test2 but not in test1

SELECT  t1.id AS id1, t2.id AS id2
FROM    test2 t2
        LEFT JOIN test1 t1 ON t1.id = t2.id
WHERE   t1.id IS NULL;


-- Delete in chunks of 5000 rows
-- (@@ROWCOUNT enters the loop non-zero because the SELECT above returned rows)
WHILE @@rowcount > 0
    BEGIN
        DELETE TOP ( 5000 ) t2
        FROM    test2 t2
                LEFT JOIN test1 t1 ON t1.id = t2.id
        WHERE   t1.id IS NULL;
    END
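
One optional refinement (a sketch, not part of the original run): drive the loop with a variable and pause briefly between batches, so concurrent sessions get a chance at locks and the log has time to clear:

-- Chunked delete with a pause between batches (sketch)
DECLARE @deleted INT = 1;

WHILE @deleted > 0
    BEGIN
        DELETE TOP ( 5000 ) t2
        FROM    test2 t2
                LEFT JOIN test1 t1 ON t1.id = t2.id
        WHERE   t1.id IS NULL;

        SET @deleted = @@ROWCOUNT;

        WAITFOR DELAY '00:00:01'; -- short pause between batches
    END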