The Impact of Evergreen Platforms on Development, and How Not to Handle It
TL;DR: “We’ll just update every so often” doesn’t work in an evergreen world. If you can’t do continuous upgrades, then be very careful when doing periodic updates.
Again, a real story from a real site.
The cloud is evergreen. We all know this, but I don’t think the development community really understands the impact of it.
We went back to a site where we had done some work about eighteen months ago; specifically, we went to migrate from ADAL to MSAL. What we hadn’t appreciated was how much changes in eighteen months in the cloud DevOps world:
- Authentication libraries: ADAL has been deprecated.
- Platform libraries: Azure AD Graph has been deprecated, and PnP JS Core has moved from V2 to V3.
- Platform as a Service updates, such as Azure Functions V4 and language versions; Node 10, for example, is no longer available.
- Azure best practice: the use of Microsoft’s Cloud Adoption Framework with its Landing Zones, or Managed Identities instead of Azure AD App Registrations.
- DevOps functionality, such as multi-stage Azure DevOps Pipelines and the deprecation of hosted deployment agents, such as Windows 2012.
- Credential lifetimes: for example, Azure DevOps PATs and certificate validity are now recommended to be three months or less.
So we took a small project and started to upgrade it to the latest versions and best practices for everything.
What happened?
It failed in deployment to the test environment. The first issue was around naming conventions: the client had come up with their own, and they are similar to, but different from, the Microsoft ones. They have automated deployment of some Azure resources via Terraform, and as a result you can’t define the names of Azure resources yourself; they get created based on your project’s information. This is good, but it threw us.
Then we discovered that the Release Pipelines we had converted to multi-stage pipelines had a few subtle bugs in them. They relied on Variables, and because those had been set to never expire, someone had since expired them manually. We reverted to the originals and discovered this was already an issue, not one we had introduced.
We had used specific versions of the Azure DevOps hosted deployment machines, so those don’t change; but as some of these images are being retired, Microsoft have ‘brown out’ periods when they don’t work, to force you to upgrade.
Once we got around all of these environment issues, we realised that updating all of the code to the latest versions, often with API changes, was a big task, and in many cases it was going to be quicker to rewrite the applications, which is not what a client wants to hear. We did have one win, though: thanks to Azure Monitor we were able to figure out which applications are actually being used, and due to changes to Software as a Service products, specifically SharePoint Online, we found that about 50% of the custom extensions were no longer being used because the product now did those jobs out of the box.
What we should have done
We learnt these lessons and applied them to the next project.
In essence, instead of trying to do it all in one go, we took each step one at a time, fixed it, tested it in UAT and then moved on to the next step:
- Ensure your baseline: we ran a deployment ‘as is’ into the UAT environment and fixed up any issues with certificates and tokens.
- We upgraded from ADAL to MSAL, keeping the existing versions of the other libraries (a minimal sketch of the MSAL call follows this list).
- We deployed to production.
- We moved on to the next project.
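To make the ADAL-to-MSAL step concrete, here is a minimal sketch of app-only token acquisition with the @azure/msal-node ConfidentialClientApplication; the client ID, tenant ID and environment variable name are placeholders rather than the client’s actual configuration, and the target shown is Microsoft Graph as the replacement for the deprecated Azure AD Graph.

```typescript
import { ConfidentialClientApplication } from "@azure/msal-node";

// Minimal sketch only: the client ID, tenant ID and env var name are placeholders.
const msalClient = new ConfidentialClientApplication({
  auth: {
    clientId: "<app-registration-client-id>",
    authority: "https://login.microsoftonline.com/<tenant-id>",
    clientSecret: process.env.CLIENT_SECRET ?? "",
  },
});

// Acquire an app-only token for Microsoft Graph (the successor to Azure AD Graph).
export async function getGraphToken(): Promise<string> {
  const result = await msalClient.acquireTokenByClientCredential({
    scopes: ["https://graph.microsoft.com/.default"],
  });
  if (!result?.accessToken) {
    throw new Error("Failed to acquire token");
  }
  return result.accessToken;
}
```

Keeping the change this small, with everything else on existing versions, is what made it safe to deploy straight to UAT and then production.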
Once we have solved the immediate issue, we’ll need to sit down with the client and agree how to support their custom code going forward, because whilst the code won’t spontaneously change, everything else will.
Lessons learned, or how to future-proof evergreen
Be Agile — Be Deployable
Always have a deployable version. This means that before you start a change, check you can actually deploy the current system.
Even if you change nothing, deployments may stop working due to any number of factors, which is the last thing you want when you have an urgent bug fix to get out.
I recommend you set your UAT deployment up to run on a regular basis, maybe weekly, and ensure that someone is notified if it fails.
This means you don’t need to be explicitly notified by the person making the change, whether it’s an internal group or an external supplier: your UAT system will fail to deploy, and that warns you that the same is going to happen in production.
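As a rough sketch, assuming a YAML pipeline on a main branch with a pinned hosted image, the scheduled run can look like this; the ‘someone is notified’ part is configured through Azure DevOps notification settings rather than in the YAML itself.

```yaml
# Scheduled "is it still deployable?" run: deploy main to UAT every Monday,
# even if nothing has been committed since the last run.
schedules:
  - cron: "0 6 * * 1"          # 06:00 UTC every Monday
    displayName: Weekly UAT deployment check
    branches:
      include:
        - main
    always: true               # run even when there are no new changes

pool:
  vmImage: "windows-2022"      # pinned hosted image; review before Microsoft retires it

steps:
  - script: echo "Run the real UAT deployment steps here"
    displayName: Deploy current system to UAT
```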
Refactor — don’t rewrite
It’s always tempting as a developer to throw out the old and start afresh, but this means that during this period you have to maintain two systems, and if that stretches to weeks or months, or in one client’s case years, how do you keep the changes needed in sync?
Instead, iterate towards the desired state by refactoring. Change one thing, deploy to UAT, test, and then move on to the next. You may not be deploying every individual change to production; these might be batched up, but you always can.
For your own sake, put each refactor into its own branch in your source code repository. Yes, merging branches can be a pain, but if you do your work quickly, say in days rather than weeks, the chance of a conflict is small. It makes it clear to everyone what the current releasable system is, and it means that your scheduled UAT deployment is testing that version and not some ‘half done’ refactor.
Refresh credentials — frequently
Credentials are an issue. They expire. They should expire, so that if they are compromised the breach is minimised.
We have become very used to creating credentials and storing them. There are now better alternatives:
- Use resources that manage their own credentials. Azure Managed Identities are a good example: they roll their own certificates automatically, so all you need to do is specify which identity to use.
- We have had some good experiences with generating credentials on the fly, for example changing the password on a SQL Server’s admin account in a pipeline, attaching, performing the work and then setting the password to a random value, which we don’t store.
- If you have to store credentials, put them in a key vault, but implement something that will automatically refresh them, we recommend every six weeks or less, and then retrieve them as needed. Recently a development team asked us for ‘get’ only permission on a key vault; they didn’t even want ‘list’, as they know exactly which secret they need and don’t need to be able to scan the vault for all secrets. (A minimal sketch of the Managed Identity and ‘get’ only approach follows this list.)
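Here is a minimal sketch of how those two points combine in code, using the @azure/identity and @azure/keyvault-secrets packages; the vault URL and secret name are hypothetical, and the assumption is that the app runs with a Managed Identity assigned to it.

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { SecretClient } from "@azure/keyvault-secrets";

// With a Managed Identity assigned to the Function or App Service there is no
// client secret of our own to store or rotate; DefaultAzureCredential picks the
// identity up at runtime (and falls back to developer credentials locally).
const credential = new DefaultAzureCredential();

// Hypothetical vault URL and secret name, for illustration only.
const secrets = new SecretClient("https://contoso-example.vault.azure.net", credential);

// The identity behind this call only needs the 'Get' secret permission;
// it never lists or enumerates the vault.
export async function getApiKey(): Promise<string> {
  const secret = await secrets.getSecret("third-party-api-key");
  return secret.value ?? "";
}
```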
Be supported — yes, it’s a hidden cost
You wouldn’t buy Software as a Service without a support contract, but companies do bring in third parties to develop custom applications and then, having deployed them, either hand them over to internal staff or, more often, just leave them alone.
This did work quite well from a cost point of view whilst the platforms were static, but in the cloud that doesn’t work any more.
Ideally, don’t do custom development. I know we make a living from it, but it’s getting harder and harder to justify as a good way forward for our clients; instead, buy services and pay for support.
However, your requirement may not be available as a service, and in that case you will need someone to develop it for you.
When you do, ask them how it will be supported going forward.
What will happen, for example, when Azure Functions v1 is deprecated?
What will happen if a security flaw is discovered in a library it uses? Log4j, anyone?
Put in place a support contract that provides active support. A reputable company will automate a lot of the tasks, so the actual overhead can be small, but I reckon each of our systems needs days, not weeks, per year to keep it up to date in an ever-changing world.
TL;DR: “We’ll just update every so often” doesn’t work in an evergreen world. If you can’t do continuous upgrades, then be very careful when doing periodic updates.