Challenges with large data during platform upgrade

We’ve confirmed that 4 of the 5 changesets (the ones that ADD and DROP) can be run “online” without locks in MySQL 5.6 (though they still take about an hour each to run), with one caveat… if your database was originally created with a version of MySQL prior to 5.6, the first change will need to lock and copy the entire table, at which point it will be updated to 5.6.

The changeset that alters the value_complex requires a full column copy and can take take 1-2 hours to run… it reading the docs it also appears that upgrading to MySQL 5.7 would not help this since the size of the VARCHAR field is being changed from 255 to 1000 (see https://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-overview.html).

I was working yesterday on using Percona Toolkit to do this and have come up with something that appears to be working… it takes a few hours to run, but allows all 5 changesets to be run while the system is “online”… Percona Toolkit stresses through out its docs that you should use read the documentation very carefully, so here’s the command we use in ‘dryrun’ format:

pt-online-schema-change --alter ‘ADD COLUMN form_namespace_and_path varchar(255), ADD COLUMN status varchar(32) NOT NULL DEFAULT “FINAL”, ADD COLUMN interpretation varchar(32), DROP COLUMN value_boolean, MODIFY COLUMN value_complex varchar(1000)’ D=openmrs,t=obs -u root --ask-pass --alter-foreign-keys-method drop_swap --charset utf8 --print --statistics --dry-run

(Note that there are backticks around the column names that seem to get lost in the formatting). To run it for “real” change “–dry-run” to “–execute”… but please review first and note that we haven’t run this in production yet!!

Note that after running this changeset, all the foreign key constraints will have been renamed… you’ll want to rename these back, as future liquibase changesets may reference them. The following should do the trick, but should not be run against a live system since we temporarily turn off key keys… but with key checks off it appears to run in a matter of seconds:

SET FOREIGN_KEY_CHECKS=0;

ALTER TABLE obs
DROP FOREIGN KEY `_answer_concept`,
DROP FOREIGN KEY `_answer_concept_drug`,
DROP FOREIGN KEY `_encounter_observations`,
DROP FOREIGN KEY `_obs_concept`,
DROP FOREIGN KEY `_obs_enterer`,
DROP FOREIGN KEY  `_obs_grouping_id`,
DROP FOREIGN KEY  `_obs_location`,
DROP FOREIGN KEY `_obs_name_of_coded_value`,
DROP FOREIGN KEY  `_obs_order`,
DROP FOREIGN KEY  `_person_obs`,
DROP FOREIGN KEY  `_previous_version`,
DROP FOREIGN KEY  `_user_who_voided_obs`,
ADD CONSTRAINT `answer_concept` FOREIGN KEY (`value_coded`) REFERENCES `concept` (`concept_id`),
ADD CONSTRAINT `answer_concept_drug` FOREIGN KEY (`value_drug`) REFERENCES `drug` (`drug_id`),
ADD CONSTRAINT `encounter_observations` FOREIGN KEY (`encounter_id`) REFERENCES `encounter` (`encounter_id`),
ADD CONSTRAINT `obs_concept` FOREIGN KEY (`concept_id`) REFERENCES `concept` (`concept_id`),
ADD CONSTRAINT `obs_enterer` FOREIGN KEY (`creator`) REFERENCES `users` (`user_id`),
ADD CONSTRAINT `obs_grouping_id` FOREIGN KEY (`obs_group_id`) REFERENCES `obs` (`obs_id`),
ADD CONSTRAINT `obs_location` FOREIGN KEY (`location_id`) REFERENCES `location` (`location_id`),
ADD CONSTRAINT `obs_name_of_coded_value` FOREIGN KEY (`value_coded_name_id`) REFERENCES `concept_name` (`concept_name_id`),
ADD CONSTRAINT `obs_order` FOREIGN KEY (`order_id`) REFERENCES `orders` (`order_id`),
ADD CONSTRAINT `person_obs` FOREIGN KEY (`person_id`) REFERENCES `person` (`person_id`) ON UPDATE CASCADE,
ADD CONSTRAINT `previous_version` FOREIGN KEY (`previous_version`) REFERENCES `obs` (`obs_id`),
ADD CONSTRAINT `user_who_voided_obs` FOREIGN KEY (`voided_by`) REFERENCES `users` (`user_id`);

SET FOREIGN_KEY_CHECKS=1

Hope this helps, but please use with care and test first! I will report back if we run into any issues…

Also, I guess this should be all be moved to the wiki in the second about upgrading to 2.x… I’ll see if there’s an appropriate place (2.x release notes?)

Take care, Mark

1 Like