Managing Moving Migrations: Who cares?
TL;DR: The most important person is the end user, who needs to know, at all times, where their data is, but there are other very interested parties.
Last week I talked about Managing Moving Migrations: The Only Constant is Change, which explained why these days we have to migrate data between systems whilst the users are actively using both the source and target systems. I outlined how we used to do it and why that approach causes problems.
I haven’t provided a new way of doing it yet, but I’d like to expand on the most important part of any migration: the people involved.
The Users
Users need to know where each piece of content is at all times
This is the single most important requirement. If a user is not sure where a piece of content is when they need that content, then bad things happen:
- They need a piece of content to do their job and they can’t find or access it, so they can’t do their job.
- They no longer trust either the source or target system, so they create their own; suddenly you have content stored outside of the corporate system. Hello, Shadow IT.
- They recreate the document in the source system, so now there are two copies of the content, one in the target and one in the source, that do not agree. Goodbye, content integrity.
- They recreate the document in the target system, so when the original is migrated there is a conflict. Goodbye, content integrity.
Note they don’t need to have access to their content; they just need to know where it is. It will be in one of three states:
- In the source system.
- In transit.
- In the target system.
What is fatal is if users can access it in both systems at the same time; this must be avoided.
Not fatal, but important, is if the user can’t access it at all; this can be mitigated by ensuring the user knows it’s temporarily unavailable and limiting this unavailability as much as we can (a minimal sketch of how a tool might track these states follows the requirements below).
We have identified the first of our requirements for a moving migration:
1. Users must know where their content is at all times.
2. Users must not be able to access the same content in multiple locations.
3. If a user cannot access a piece of content, then the period it’s not accessible must be as short as possible.
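To make these requirements concrete, here is a minimal sketch, in Python, of how a migration tool might track where each piece of content lives and which system is allowed to serve it. The `ContentLocation` enum and `accessible_system` helper are illustrative names made up for this post, not any particular product’s API.

```python
from enum import Enum, auto


class ContentLocation(Enum):
    """The three states a piece of content can be in during a moving migration."""
    SOURCE = auto()      # still in the source system
    IN_TRANSIT = auto()  # being copied; temporarily unavailable
    TARGET = auto()      # migrated; the source copy must no longer be served


def accessible_system(location: ContentLocation) -> str | None:
    """Return the single system (if any) that may serve the content.

    At most one system is ever returned, which enforces requirement 2
    (never accessible in two places) and makes requirement 1 answerable:
    we can always tell the user where their content is, even while they
    cannot open it.
    """
    if location is ContentLocation.SOURCE:
        return "source"
    if location is ContentLocation.TARGET:
        return "target"
    return None  # IN_TRANSIT: tell the user it is temporarily unavailable
```

The point is not the code itself but that a single, queryable record of each item’s location is what lets you honour all three requirements.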
The Managers
Project Managers can’t use waterfall so they can’t estimate the project
This is a real issue because managers, reasonably, want to know how long the process will take, but technology, reasonably, can’t answer that.
Why does a manager need to know how long it will take?
At the end of the day, they are managing resources. This may be reflected in staff availability (if I have three people working on this, how long before they can work on something else?) or in budget (if I need to procure people at £1,000 / day, how much budget do I need?).
There is another constraint they are managing: deadlines. Be aware that most deadlines are not actual deadlines; they are dates people would like things done by, not dates that, if missed, mean that something stops working. Examples we typically see of real deadlines are:
- Burning platforms, that is, data hosted on a third party’s systems that is being switched off at a known date.
- Capacity, that is, the current system has a hard capacity limit, typically storage size, that cannot be breached; as the system is currently in use, it will reach that capacity limit at a date that can be projected from current usage patterns.
A deadline is only a deadline if the project cannot control the date.
Why doesn’t technology know how long it will take?
Essentially because of the sheer complexity of the system and the challenge posed by moving a system that is still in use.
- Unless they are using a third-party tool, they need time to write such a tool, and they can’t know it covers all the data until the process has migrated the final piece of content.
- If they are using a third-party tool, then they don’t know if it will work on all the content until it has migrated the last bit of content.
- Time to move content is dependent on its transfer size and metadata complexity, both of which can be hard to report on.
- Many systems use throttling to prevent degradation of service, and migration tools typically need to make a lot of calls for reporting and migrating, so they will often be throttled, meaning performance is hard to predict.
- Content can’t be migrated if it is being updated, so this requires some form of retry mechanism (see the sketch after this list).
- As users continue to use the source system it will continue to grow, but it’s hard to predict the effect that migrating content will have on the source system’s usage, so historical usage figures, which are often seasonal and complex, are impacted by the progress of the project itself.
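To illustrate the throttling and in-use problems above, here is a minimal retry-with-backoff sketch in Python. `migrate_item`, `ThrottledError` and `ContentLockedError` are hypothetical stand-ins for whatever your migration tool or the target system’s API actually provides.

```python
import random
import time


class ThrottledError(Exception):
    """Raised when the source or target API asks us to slow down."""


class ContentLockedError(Exception):
    """Raised when the content is being edited and cannot be copied yet."""


def migrate_with_retry(item, migrate_item, max_attempts: int = 5) -> bool:
    """Try to migrate one item, backing off when throttled or locked.

    Returns True on success, False if the item should be re-queued for a
    later pass (for example, a user kept it open for the whole window).
    """
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            migrate_item(item)
            return True
        except (ThrottledError, ContentLockedError):
            if attempt == max_attempts:
                return False  # hand back to the queue rather than block the run
            # Exponential backoff with jitter so parallel workers don't
            # all hit the service again at the same moment.
            time.sleep(delay + random.uniform(0, delay))
            delay *= 2
    return False
```

Notice that even this toy version makes run time unpredictable: it depends on how often the services throttle us and how often users have content open.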
Frustrating, but the project manager needs to be able to report on progress and budget. So technology does need to give indicative figures for each stage of the process, not estimates, because in reality nobody can say how reasonable they are. This might look something like:
- Develop the toolkit to do the migration, weeks for 2 people.
- Deploy the toolkit to production, hours for 1 person.
- Migrate a representative sample of content, days for 1 person.
- Migrate all content based on current metrics, months for 2 people.
Even if these were accurate it wouldn’t matter, because of user engagement. Our experience has been that the more users are involved in the process, the more chaotic it becomes. The reasons vary but the effect is the same: the planned process takes longer than anyone expects. The only solution is to completely exclude users from the process, but if you can’t manage that then expect the following:
- Some content is seasonal, that is, it cannot be migrated at certain times of the year; examples we have seen are annual reports, financial data pertaining to year end, and content about scheduled events. This means some content can only be moved in specific windows, which have been as specific as only in the third week of any month, except March when it can’t be migrated at all.
- Some users do not have the capacity to deal with a content move, or, if sign-off on a migration is required, the time to sign off. This results in planned content migrations being rescheduled, often multiple times.
- Some content does not have obvious owners, especially where it is being generated by processes, and thus the effects of migrating it are hard to establish. Tracking down owners depends on someone who knows the business finding them, a ‘treasure hunt’ not a ‘hole digging’ enterprise.
- Externally generated content needs to have its external generator reconfigured to point to the equivalent target content container; this brings in a third-party dependency, because until it is reconfigured new content will keep being added to the source system.
Waterfall is a great project management style where the start and finish of a project are known, but when performing a migration between two systems that are in use the finish is not known. When clients have insisted on waterfall for these types of projects, the poor project managers have spent days or weeks just updating the project plan.
There are alternative methods for managing projects worth looking at; whatever is chosen needs to be able to manage a continuously evolving process with a stop condition.
Looks like we have another few requirements:
4. Managers must have a management method that caters for a continuously evolving process with a stop condition rather than a known end date.
5. Managers need indicative timings from technology.
6. Managers need user engagement to be minimised to reduce the number of changes during the process.
7. Third parties that generate content must be catered for.
The Developers
Technology can’t practise the migration in its entirety and so can’t be sure it will work
Migration is nasty because it’s data, and these days often complex unstructured data. We’ll talk a lot more about granularity as this series progresses, but with the rise of unstructured data, until you’ve moved the very last bit you don’t know the boundaries of the data schema. Even if you’re lucky enough to be migrating a structured data source, such as SQL, custom extensions often mean discovering structures in JSON and XML held as part of the record.
The only way you know that you can migrate the data is to have successfully migrated it.
What’s worse, until the migration finishes there is never a point at which a new type of data, or a new twist on existing data usage, can’t appear. We’ve seen examples such as:
- Permissions models that differ wildly between source and target and have externally managed aspects; yes, I’m thinking of the difference between a ‘classic’ SharePoint Site Collection and a ‘Modern’ SharePoint Site.
- URLs embedded in fields that reference other items in the source system.
- IDs that are auto-generated but system-dependent, so they don’t match as content is migrated between systems (see the sketch after this list).
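To show why embedded URLs and system-generated IDs are awkward, here is a minimal sketch, in Python, of the kind of ID map and link rewriting a migration ends up needing. The URL patterns and the `id_map` structure are assumptions for illustration only, not any specific tool’s behaviour.

```python
import re

# Built up as content is migrated: source item ID -> target item ID.
# Until the last item has moved this map is necessarily incomplete,
# which is why link rewriting usually needs a final fix-up pass.
id_map: dict[str, str] = {}

SOURCE_LINK = re.compile(r"https://source\.example\.com/items/(?P<id>\d+)")


def rewrite_links(body: str) -> str:
    """Rewrite embedded source-system URLs to their target equivalents.

    Links whose targets have not been migrated yet are left untouched,
    so the function can safely be re-run on later passes.
    """
    def replace(match: re.Match) -> str:
        target_id = id_map.get(match.group("id"))
        if target_id is None:
            return match.group(0)  # target not migrated yet; leave for a later pass
        return f"https://target.example.com/items/{target_id}"

    return SOURCE_LINK.sub(replace, body)
```

Permissions models are harder still, because there may be no mechanical mapping at all between the two systems.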
So even if you buy in a migration tool you can’t be sure it will cover all your content.
This means that the only place you can test the system is in production. You could clone the system and migrate the clone, but there are big issues with this:
- Not all systems can be easily cloned due to complexity or size.
- The cloned system will not have people actively using it, so you won’t see issues arising from live usage.
- It’s a copy of production data and as such needs to be considered for GDPR and other compliance and security issues. Many of our clients do not allow the cloning of production data for non-production use.
This means that technology needs a process that can be modified as it goes, to account for what is discovered during the migration (a minimal checkpoint-and-resume sketch follows the requirements below).
Good, some more requirements:
8. Developers need a representative system, but not a production clone, to test the migration process on.
9. Developers need to be able to halt the process in the case of unexpected data, resolve the issue, and restart the process.
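Requirement 9 amounts to a checkpointed loop: record what has been migrated, stop cleanly when something unexpected appears, and carry on from where you left off. Here is a minimal sketch in Python, assuming a simple JSON file as the checkpoint store and a hypothetical `UnexpectedDataError`; a real tool would use something more robust than a local file.

```python
import json
from pathlib import Path

CHECKPOINT = Path("migration_checkpoint.json")


class UnexpectedDataError(Exception):
    """Raised when an item doesn't fit any shape the tooling knows about."""


def load_done() -> set[str]:
    """Read the IDs of items already migrated on previous runs."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()


def save_done(done: set[str]) -> None:
    """Persist progress so the next run can resume rather than restart."""
    CHECKPOINT.write_text(json.dumps(sorted(done)))


def run_migration(item_ids, migrate_item) -> None:
    """Migrate items, skipping those already done; halt on unexpected data."""
    done = load_done()
    for item_id in item_ids:
        if item_id in done:
            continue  # already migrated on a previous run
        try:
            migrate_item(item_id)
        except UnexpectedDataError:
            save_done(done)  # keep the progress made so far
            raise            # halt so a human can look at the odd item
        done.add(item_id)
    save_done(done)
```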
The Governors
Governance can’t easily determine if the content is consistent
Often forgotten, but critical. They ‘own’ the risk associated with the project. Typically holding a title like SRO (Senior Responsible Owner), this is the person responsible for ensuring that the project delivers on time, to budget, and achieves its goals; in this case, that no content is modified in an unacceptable way, and often that an audit trail of the content migration is available.
The requirement to complete on time means they need to know what ‘on time’ means. We have established this is hard to determine, even roughly, in advance, so they may choose other metrics:
- Peak storage usage
- Active users
- Rate of change of number of pieces of content
- Total cost to date of consultants
- Total cost to date of service, e.g. Cloud costs
Alternatively, they may split the project up into a number of phases, each with a gateway review that needs to show progress, even though the total number of phases may be unknown.
They have their own requirements:
10. Governors need to be able to show that all content has been migrated in an acceptable manner (a minimal audit-record sketch follows these requirements).
11. Governors need to be able to show appropriate project metrics.
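For requirement 10, one common approach, sketched here in Python on the assumption that both systems can hand you the raw bytes of an item, is to record a per-item audit entry with content hashes taken before and after the move. Any real audit trail would also need to cover metadata and permissions, which hashes alone don’t prove.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """One line of the migration audit trail."""
    item_id: str
    source_sha256: str
    target_sha256: str
    migrated_at: str

    @property
    def consistent(self) -> bool:
        # Matching hashes show the content itself was not modified in transit;
        # metadata and permissions need their own, separate checks.
        return self.source_sha256 == self.target_sha256


def audit_item(item_id: str, source_bytes: bytes, target_bytes: bytes) -> AuditRecord:
    """Build the audit entry for one migrated item."""
    return AuditRecord(
        item_id=item_id,
        source_sha256=hashlib.sha256(source_bytes).hexdigest(),
        target_sha256=hashlib.sha256(target_bytes).hexdigest(),
        migrated_at=datetime.now(timezone.utc).isoformat(),
    )
```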