Designed 4 Cloud Limited

Roles for Cloud Journey

Your journey to the cloud is much like a journey to a wonderland in the middle of nowhere: you have to carry everything you will need while you are there, cross all sorts of terrain on the way, and every living thing along the route tries to stop you from getting past!

It all depends on how far you have to travel to reach your wonderland, how big your crew is, what they need to support the journey and, more than anything, how to keep everyone safe both on the way and once you are there.

Let’s come back from wonderland to the cloud world. Your journey to the cloud is determined by:

The size and needs of your organisation
The technology, processes and skills available in your organisation
Your organisation’s appetite for risk

These factors determine which of the roles, or collections of roles, below are required for your cloud programme.

Stakeholders and Operational Owners
Programme and Delivery Management
Cloud Architecture
Cloud Security
Cloud Engineering
- DevOps
- Reliability

The first two areas are not specific to the cloud, nor are they expected to be.

Stakeholders and Operational Owners
- Senior Leadership
- Product Manager / Owner (Business and Technical)
Programme and Delivery Management
- Programme Manager / Project Manager
- Scrum Lead / Delivery Lead

Let’s talk about the roles that are specifically required for your Cloud Journey.

Cloud Architecture

The Cloud is already architected by the Cloud providers, so why would your Cloud Journey need Cloud Architecture roles? Are you looking to work for a Cloud provider?

If you are looking to work for a Cloud provider, the roles they need may be drastically different and highly specialised for each of the services they offer, so we won’t try to describe those roles here.

Let’s talk about the roles you need if you are using the Cloud as a customer.

Enterprise Architect – Cloud

This is the person who works with Senior Leadership and other stakeholders to come up with the Cloud Strategy, Architecture and Guiding Principles that take your organisation’s Cloud Journey in the right direction.

This role is also expected to test its deliverables with cloud subject matter experts and with subject matter experts in the broader technology ecosystem.

You are expected to have a broad understanding of your organisation’s business strategy and technology strategy, and to think ahead about how Cloud technologies can transform both.

This is a critical role for your organisation because, without a clear strategy and direction, its cloud journey may not be successful.

Platform Architect

The person who designs the foundational services for the Cloud platform in their organisation: network connectivity, a secure landing zone for application workloads and customised services that integrate with the organisation’s enterprise capabilities.

This role requires cross-functional skills across pretty much every technology landscape in the organisation that the Cloud platform needs to integrate with, according to agreed standards.

If you come from an infrastructure, networking, security or integration background, it may help you move into the Platform Architect role, as this role considers all of those domains part of the Cloud Platform ecosystem.

Solution Architect

The person who designs the application to integrate with the Cloud ecosystem and the enterprise services provided by the Cloud platform. They make sure the application workload is compliant with architecture principles and security policies.

You are expected to know the application’s problem domain, the technologies used within the application (whether it’s a COTS or bespoke solution) and the dependencies it needs to function in any environment.

In this role, your focus will be more on solving the specific business problem than on holistically enabling the Cloud Platform for your organisation.

Cloud Architect

This is a cross-functional role that spans the Platform Architect and Solution Architect roles, bringing Cloud expertise into the architecture practice.

The Cloud Architect is expected to know most of the core cloud services within a given cloud provider and to be able to do hands-on coding where it’s required.

There is a thin line between the Cloud Architect and Cloud Engineer roles when you end up coding most of the time instead of designing.

But this also gives Cloud Engineers an opportunity to step into the Architecture discipline. Compared to other domains, Cloud Architecture won’t be challenging if you are already a Cloud Engineer; it comes down to understanding the discipline of Architecture more than the Cloud.

Cloud Data Architect

This role is a new trend in the industry, as data experts without cloud experience enter the Cloud environment. The way we treat data on-premise is not the way we treat it in the Cloud, hence the introduction of this role.

Data is the organisation’s main asset when it comes to the Cloud, which means it needs to be protected and maintained even more carefully than the cloud infrastructure itself.

The Cloud Data Architect will focus on making use of Cloud data services to load, process and store data while still following industry best practices around data modelling, data ingestion, data handling, data processing and presentation.

DevOps Architect

This person designs the DevOps tooling around CI/CD in the Cloud platform. This includes source control, the artifact repository, CI/CD automation tools and the associated processes needed to provide complete automation in the Cloud.

You are expected to come from a strong hands-on DevOps background, with a strong motivation to make use of cloud-native services.

Your expertise helps design and provide the facilities for platform and application workload teams to do DevOps in the Cloud, so that each of them doesn’t have to come up with their own way of doing it.

Cloud Security Architect

Enabling the security framework, certifying cloud services within the cloud platform, securing identity and access and securing connectivity are all part of the Cloud Security Architect’s role.

In a smaller organisation, the Platform Architect is usually expected to assume this role as well, so it’s important to have Cloud security expertise if you would like to be a Platform Architect or Security Architect in the Cloud.

You are expected to come from a strong security consulting background, be familiar with industry-standard security frameworks such as NIST and CIS, and be aware of the security frameworks within your own business domain.

Cloud Network Architect

This role designs the networking from on-premise to the Cloud, to and from the Internet, and the connectivity required to integrate other services within the organisation.

To be successful in this role, you will need a deep understanding of TCP/IP, BGP and DNS, as well as familiarity with network devices and how they are interconnected.

In addition, you will need expertise in Cloud network services, including the Hub/Spoke model, Virtual Private Cloud, Endpoint Interfaces and Gateway network patterns.

Cloud Security

When you are using the public cloud, it’s critical to understand your responsibility as a customer, which is set out in the shared responsibility model of the respective cloud provider.

Cloud Security Consultant

You should have good exposure to industry-standard security frameworks, best practices, threat modelling and risk assessment techniques and, at the same time, a strong understanding of your organisation’s risk appetite and security posture.

This role is critical to maintaining the Cloud security standards that apply to the organisation, working closely with the wider IT security consultants.

In fact, this role is the custodian of those standards and makes sure all cloud environments comply with them.

This role operates in an advisory capacity rather than an operational one and may involve external security advisors for specialised services.

Cloud Security Advisor

This role is normally brought in as reinforcement to supplement the Cloud Security Consultant, and is expected to bring expertise in industry-standard security practices, frameworks and cloud security.

It’s quite normal for this type of role to be brought in during the early stages of cloud adoption to build the Cloud security foundation, processes and even codified automation to help the Cloud Security Consultant and Cloud DevSecOps.

Cloud Engineering

Cloud Engineering is an extension of software engineering in which the cloud infrastructure is the piece of software and the template that creates the infrastructure is its code.

If you have come from a software development background and have the motivation to learn about Cloud, this discipline is the best fit for you.

Cloud DevOps

This role is no different from the standard DevOps role, with its “you own what you developed, even in production” model, except that Cloud infrastructure comes into the scope of DevOps alongside the software that runs on top of it.

You are expected to understand cloud-native template code development, generic template code development such as Terraform, and the programming language of your application software.

You are also expected to be familiar with the SDLC (Software Development Life Cycle), with its extension in the Cloud as an Infrastructure Development Life Cycle.

You are passionate about automation and, when it comes to infrastructure or software build and deployment, do nothing manually unless by exception.

Cloud DevSecOps

This is an extension of the Cloud DevOps role with more focus on security tooling, security controls, security monitoring and security event response.

You are expected to build the security guardrails around the infrastructure, software and data that are present in the public cloud but owned by your organisation.

You work with the Cloud Security Architect and Consultant to make sure your security controls and processes are in line with your organisation’s security standards.

Cloud Reliability

The actual reliability of the Cloud is the responsibility of your Cloud provider, so why does this role exist?

If you deploy Cloud resources (e.g. Cloud infrastructure) into the Cloud, responsibility for those resources, including their reliability, belongs to you as the customer and not to the cloud provider.

You are tasked with building observability around the infrastructure and software that you run on the selected cloud provider, and with making sure they stay in line with your internal and external SLAs.
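
As a rough illustration of what staying in line with an SLA can mean in practice, the sketch below compares measured availability for a period against a target. The function name, the `SlaReport` type and the 99.9% figure are assumptions made for this example; they are not prescribed by any cloud provider.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    availability_pct: float  # measured availability for the period
    target_pct: float        # availability promised in the SLA (assumed value)
    within_sla: bool


def check_availability(total_minutes: int, downtime_minutes: int, target_pct: float) -> SlaReport:
    """Compare measured availability for a period against an SLA target."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    availability = 100.0 * (total_minutes - downtime_minutes) / total_minutes
    return SlaReport(availability, target_pct, availability >= target_pct)


# A 30-day month has 43,200 minutes.
print(check_availability(43_200, 30, 99.9).within_sla)  # 30 minutes down
print(check_availability(43_200, 60, 99.9).within_sla)  # 60 minutes down
```

With a 99.9% target, thirty minutes of downtime in a month stays within the SLA while sixty minutes does not, which is why the observability has to be in place before the outage rather than after it.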

Your role!

It’s important to understand your current role, your motivation, your ultimate destination, and the skills and experience you need before you land a cloud role, just as with any other role.

But the key difference with cloud roles is that you always have to keep on top of the changes coming your way, as cloud providers release new services and features faster than individuals like yourself can keep up.

Choose your next steps towards your ultimate wonderland (Cloud Journey)!

Elasticity is Contextual!

Public Cloud became very popular in a very short timeframe, as it was well advertised by all cloud providers, led by Amazon Web Services calling out the concept of "Elasticity".

The term Elasticity means a lot in business and there is an in-depth definition provided here.

Well, what's wrong with it as advertised by Cloud providers?
 
As an individual, I can spin up a cloud resource when I want it, scale it out/in as well as up/down, and terminate it when I am done with it.

Even better, I can go serverless and/or outsource creation and lifecycle management to the cloud provider and relax. How Elastic is that!

That all looks great. The services offered by public cloud providers look very elastic in nature and we only pay for what we use, so let's move all workloads to the cloud so that our workloads and the services we offer become elastic too!

Let's put some context around Elasticity. As in any language, the meaning of the same word can differ considerably depending on where and how it's used. Would Elasticity be any different?

Startup

If you are the founder of a startup and all your services were born in the Cloud, you designed and built them for the cloud, so you have truly utilised the Elastic nature of the Cloud. As you were so interested in and excited about Cloud services, you codified everything and invested a lot of time in developing automation pipelines. They are always triggered by a successful merge to master, and your services scale out and in perfectly and are replaced with every update.

As time went on, you became so busy looking after the business and non-technical side of your startup that you hired a junior Cloud engineer and an intern to look after what you had developed, thinking they just needed to maintain what you had already done. You also asked them to add new features to the existing services and to make sure they always use the latest features released by the Cloud providers.

A few months after asking for the new features, you went to check what they were up to, only to learn they were still figuring out how to accommodate the new cloud features into the existing service and hadn’t even had time to think about your new features, which are your main business interest.

You only had a couple of months before your competitor might launch similar features, so you needed to get those features out immediately. In desperation, you hired two highly paid Cloud consultants to deliver them within a month. You thought you had told them everything and expected them to work with your junior Cloud engineer and the intern.

After a month, you went to check how they were getting on, and they impressively demonstrated the new features in their environment. You were impressed, and curious about what the overall experience would look like when consuming the whole service with these new features, so you asked them to demonstrate the overall service consumption.

They were all confused and asked, at the same time, what you meant by overall service experience: you just asked us to develop and integrate these new features, which we have already demonstrated, and there is nothing else to demonstrate!

You were shocked and speechless, and it took a while for you to recover and realise the situation. The consultants had released on a separate branch to get the new features out immediately, because you hadn’t guided them to work on top of your existing codebase or with your Cloud engineer and intern. Now not only are the new features not integrated with your existing codebase, but your Cloud engineer and intern are also not across the work done by the consultants.

You have already promised your investors that you will launch the new features on time, before your competitors. Now you have features developed in isolation by two consultants who start a new engagement next week, both of your cloud staff are clueless about the new features and how they were developed, and you are forced to make a decision!

The penny dropped! Yes, everything in the cloud follows the Elasticity principle, so let’s deploy the new features alongside the existing services, bring them up/down as needed and integrate via RESTful endpoints so they are loosely coupled with the existing services.

It doesn’t sound like a bad decision; it looks like perfectly loosely coupled microservices. But then you said to your cloud staff: please integrate both by the end of next week and get them into production by the end of the sprint starting next week, otherwise our competitors will get there before we do.

Your cloud staff panicked, slowly recovered, integrated everything by the end of the following week and managed to get it into production. It was excellent news for you and your investors, so you were happy with their performance and moved on.

Your promotion of the new feature became massively popular and generated a massive amount of traffic to your site, putting your Elastic Architecture to the test. Unfortunately, your customers started getting errors and service denials. Initially you suspected some kind of DDoS attack on your site, but that was quickly ruled out.

Your team later reported, based on their investigation, that the original service and the new features scale disproportionately and there is no queuing between the two, so slow scaling in one component puts stress on the other and eventually the entire site stops responding unless all sessions are abruptly closed!
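
To make the team's finding more tangible, here is a toy simulation of two components where the downstream one can only handle a fixed number of requests per tick. It is a sketch under stated assumptions: the traffic figures, the capacity and the `simulate` function are invented for illustration and do not describe any particular cloud service.

```python
def simulate(arrivals, downstream_capacity, queue_limit):
    """Return (served, rejected, backlog) after feeding arrivals tick by tick.

    queue_limit=0 models direct coupling: anything the downstream
    component cannot take in the same tick is rejected outright.
    """
    served = rejected = backlog = 0
    for incoming in arrivals:
        pending = backlog + incoming
        done = min(pending, downstream_capacity)
        served += done
        pending -= done
        # Whatever is left either waits in the queue or is rejected.
        backlog = min(pending, queue_limit)
        rejected += pending - backlog
    return served, rejected, backlog


burst = [10, 50, 50, 10, 0, 0]  # hypothetical requests per tick during a promotion

print(simulate(burst, downstream_capacity=20, queue_limit=0))    # (60, 60, 0)
print(simulate(burst, downstream_capacity=20, queue_limit=100))  # (110, 0, 10)
```

Without a buffer, half of the burst is rejected even though the upstream component accepted it. With a queue between the two, nothing is rejected and the slower component catches up once the burst has passed.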

You took a moment to pause and thought about the Elasticity advertised by Cloud providers: all of their services should be Elastic in nature, so why are ours not?

Your staff corrected you: the Cloud providers provide Elasticity for all of their services, but what about ours? You gave us a tough deadline to get to production, which left us no time to think about Elasticity for our overall services!

You were shocked to hear it, though it was not surprising, as you had also forgotten that the Elasticity of your services is the responsibility of you and your team, not of the Cloud provider.

Meanwhile, you are tasked with explaining the situation, and the action to remediate it immediately, to your stakeholders and customers, who won’t understand the term Elasticity in the context of your services.

Lessons Learnt

So in summary, you and your team have unknowingly made your technology stack, as well as your people and processes, brittle instead of making them Elastic!

Your services won’t inherit Elasticity principles simply by being deployed into the Cloud.
Provide the right context to your staff and consultants before commencing any major work.

Clearly understand your contextual boundary before architecting, designing and building your solution, to make sure it follows Elasticity principles end to end as well as component by component.

Small Business

That wasn’t a very good experience with your startup and, unfortunately, the startup wasn’t successful either. The main return on the investment that you and your investors made is the lessons learnt from the overall startup initiative.

You decided to take a break from startups and join a small business as Cloud Engineering Lead for a team of ten Cloud professionals. The experience you gained from understanding your services end to end at your startup helped prepare you for the Cloud Engineering Lead role.

You started the role with great excitement to architect and design your services following cloud-native principles and the well-architected framework.

You were given the requirement to create a campaign application. Considering the bursty nature of any campaign service, you architected the entire solution mostly using Serverless and spot instances.

You gave your team a full walkthrough, so they understood the campaign application end to end and had the full context, and they delivered the completed application into production in three months.

Several promotions were launched via your campaign application. It was perfectly architected for the cloud, handled thousands of responses at the same time and queued the responses for the next steps.

The business anticipated the popularity of the campaign and hired more business operations staff for the campaign period to process all the responses, and they were able to load the service activation requests into the main legacy on-premise system.

The legacy on-premise system has a monolithic architecture and, to reduce the load on the main database, throttling had been set up by a system admin who left the company last month. Unfortunately, it was never documented or communicated.

Because of this, even though your campaign application and the business operations team could handle the bursts of responses, everything was queued by the on-premise legacy system. This frustrated your customers, and some even discarded their original responses and went to your competitor, who had a similar campaign and was able to handle the load.
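
The effect of an undocumented throttle can be estimated with simple arithmetic. The sketch below is an illustration only: the backlog size and the hourly rates are made-up numbers, and `hours_to_drain` is a hypothetical helper, not part of any real system.

```python
import math


def hours_to_drain(backlog, intake_per_hour, throttle_per_hour):
    """Hours until the backlog is cleared, or None if it can never clear.

    The throttle is the maximum number of requests the legacy system
    accepts per hour, regardless of how elastic everything upstream is.
    """
    if backlog <= 0:
        return 0
    net = throttle_per_hour - intake_per_hour
    if net <= 0:
        return None  # requests arrive at least as fast as they are processed
    return math.ceil(backlog / net)


print(hours_to_drain(5_000, intake_per_hour=100, throttle_per_hour=200))  # 50
print(hours_to_drain(5_000, intake_per_hour=300, throttle_per_hour=200))  # None
```

However well the cloud side scales, the end-to-end processing time is set by the throttled component, and if requests arrive faster than the throttle allows, the backlog only grows.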

You are now tasked with understanding the legacy on-premise application, making it scalable and changing the throttling setting, so that all your customer responses are processed on time and, above all, you do not lose any more customers to your competitors.

Your team worked day and night to understand the code and create a manual deployment process, and was able to deploy the fixes to the legacy on-premise application so that the throttling setting could be relaxed. But the whole process took almost a month, and more than half of your customers went to your competitors as they didn’t want to wait that long.

On one side, your team delivered more than you expected and, on top of that, did more than they had to in order to stabilise the legacy on-premise system. On the other side, not only is the business unhappy with the outcome, but the story also became public and has damaged your company’s reputation.

Lessons Learnt

Understand your technology landscape end to end that forms your business, not just the service that you offer.
Make sure that the agility expected by the business is factored in before you move your workload to the cloud.

Though it wasn't a great experience, you felt that you and your team had learnt a lot and would be able to handle a similar situation well in advance.

Large Enterprise

Since leaving the Cloud Engineering Lead role, you have joined a large enterprise and, after multiple promotions, finally made it to the CTO role, where you are in charge of the entire Technology and Operations of your enterprise.

You have been under constant pressure from business owners, senior leadership and even the CEO over ever-increasing technology running costs in terms of hardware, licensing, people and management.

You also see the data centre lease coming up for renewal in a couple of years, with support for all of your on-premise hardware running out at the same time. In terms of your TCO (total cost of ownership), your on-premise data centre and hardware, plus the maintenance costs on both, amount to 40% of your entire technology expenditure per year.

You thought it was a perfect opportunity to migrate everything to the cloud: once in the cloud, you should be able to scale as needed because cloud services follow Elasticity principles. You also made sure to factor in all the learnings from your past experience elsewhere.

You have asked your teams to map out the entire technology portfolio, group the servers and storage into applications and perform assessments to understand the treatment of each of the applications.

You have also asked each of the application teams to make sure they look from end to end when they plan to migrate their applications into Cloud.

These were the two key lessons learnt from your previous engagements at the small business and the startup respectively, so you assumed you had got it all right this time!

You also have appointed a programme manager to oversee everything, who is also expected to appoint delivery leads to focus on each application domain.

Each team got off to a quick start, deployed its non-production instance into the Cloud and got it working standalone with local accounts.

Once the teams started to integrate with other applications, they all tried to create network connectivity from their cloud network to the other applications’ cloud networks, which had to be done by an already constrained network team.

There are almost a hundred different applications, a cloud network for each of the applications per environment and some of them need to integrate into each other and all of them should be reachable from the office private network in order to use them.

Though there are around a hundred applications, all network connectivity needs to be designed by one central networking team, which works closely with the single enterprise security team. Both teams became so busy with requirements gathering and design that they couldn’t connect even one cloud network for another three months!

Your programme manager decided to escalate the matter to you, as the whole programme was now delayed by six months without a single network connection established to a cloud environment.

At the same time, the security team, via the head of IT security, escalated another concern that they saw consistently across all of the application workloads. No application team was integrating with the central directory server, as there was no network connectivity, and no one had set up federated single sign-on, as the identity and access management team was also busy gathering requirements and designing with all of the cloud workload teams.

When you heard both escalations, you were speechless for a moment as you came to the realisation that none of the applications were going to be migrated to the cloud on time. You had planned to complete the migration within two years so that you wouldn’t have to renew the on-premise contract, which is due in a couple of years’ time.

Your technical background tells you that something is not right, but you can’t quite put your finger on it. You also realise that the detail is too low-level for you to get involved in, but that some strategy is missing altogether.

To investigate further, you asked your programme manager to bring in a consultant to produce a report on the critical issues that were delaying the overall progress.

The consultation took another three months, so you had now spent almost a year with no workloads functional in the cloud and were desperately waiting for the report.

The report highlighted a few critical issues:

All of the workload teams were trying to create point-to-point connectivity back to the office network via the on-premise network.
All of them brought specific cross-application connectivity requirements, which could have been dealt with as network filters rather than point-to-point networking.
There was no consistent strategy for single sign-on setup, which created more work for the identity and access management team, who were forced to come up with standard patterns, which wasn’t planned.
Most of the workload teams don’t have an infrastructure background, which created more confusion for the network and IT security teams when they received requirements from the workload teams.
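
To put rough numbers on the first two findings, the sketch below counts the links a central network team would have to design under two patterns. It assumes one cloud network per application in a single environment and treats the point-to-point figure as an upper bound (every network connected to every other); both function names are made up for this illustration.

```python
def point_to_point_links(networks: int) -> int:
    """Upper bound on links if any network may connect directly to any other."""
    return networks * (networks - 1) // 2


def hub_and_spoke_links(networks: int) -> int:
    """Links needed when each network attaches once to a shared hub."""
    return networks


apps = 100  # roughly the number of applications in the story
print(point_to_point_links(apps))  # 4950
print(hub_and_spoke_links(apps))   # 100
```

With a shared hub, cross-application requirements become filtering rules on a reusable pattern instead of new links, which is work a single constrained team can realistically absorb.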

As you went through these findings again and again, you were still clueless about the next steps. Then, on yet another reading, the last finding finally struck you: are we missing a team here altogether?

You also realised that on-premise resources have limits and are not elastic, so we need to create reusable patterns for networking, single sign-on and anything else that requires on-premise resources.

Everyone is busy with their own workloads, so who is going to provide consistent networking from on-premise to cloud, and who is going to work with the identity and access management team on a consistent single sign-on pattern that can be used for most of the workloads?

That’s the right conclusion, but at the wrong time as one year is already over and you have only one more year before the on-premise support contract expires.

Does this actually mean that most cloud migrations take more than two years, or even more than five?

Hang on: if I had planned upfront work on networking, identity and access management and any other standard patterns, would I have workloads active in the cloud by now?

This could be accurate, but what’s the fundamental principle hidden behind all this?

Lessons Learnt

Elasticity is Contextual: if you introduce resources that are not elastic by nature, such as on-premise infrastructure, you need to work with them and maximise reusability; otherwise you will be heavily constrained by your on-premise infrastructure no matter what the Cloud offers you.
Elasticity should be scalable in the context of a large enterprise; otherwise nothing, cloud or not, will provide Elasticity across the enterprise.

Executive Summary

Elasticity needs to be considered as a contextual property and needs to be assessed within each context before getting deeper into cloud work.

In fact, there isn't anything you need to do to make the cloud Elastic, but you will have to do a lot outside the cloud to make use of its Elastic aspects.

Elasticity should be considered end to end for each service.
Elasticity should be there across business processes and technology stacks.
Elasticity should be able to scale at enterprise level.

Disclaimer: This article was produced in my own capacity; no association should be assumed with the organisations that I am helping at present or have helped in the past.

Cloud (Data) Security – An unfinished business

When it comes to Cloud Security, the first thing that comes to everyone’s mind is that everything is running in someone else’s data centre, and how their only asset, which is their data, can be kept secure there. This has pretty much driven the threat modelling for Cloud security: everything in transit and at rest is to be encrypted with your own key, with added layers of network flow controls, identity protection and continuous compliance monitoring; in other words, defence in depth.

It sounds like a perfect security control: no one can access the data. Yes, no one, including those who legitimately need to access, update and delete it! So how do we know that a person is who they say they are, that their role requires access to a particular dataset, and how do we track what they access, when and why?

You may feel like I am trying to teach basic data security. In a way that’s right, but basic data security is not easy to get right. Let me tell you a story.

In those days people used to keep cash and jewellery at home before banks introduced vaults. Once people started using vaults in a bank, they occasionally visit the bank proof their identity, record their visit and get access to their valuables. When they take the valuables they put them into some protected briefcase, wallet or portable vault in transit and do the same at rest in their home. These are only retrieved when there is a need once the need is achieved these are placed back to where they belong in either short-term or long-term secure storage.

In this story, people never leave their valuables all over everywhere or even anywhere in their own house that’s seen or accessible by anyone, people were quite mature in terms of their awareness of threats and how to protect the valuables in transit and at rest when it comes to their own valuables.

Now if we move from valuables to sensitive data containing an identity document, such as a passport, people still keep them securely but not necessarily in a vault and may casually use those in public places. There is a reason for it, unlike valuables, the owner of the passport only can use it and no one else can use it even if they steal it. In other words, there is another factor to validate the authority to use the passport at the time of use and every time it’s been used.

If we think about the data protection provided to the passport, just because it cannot be used by someone we don’t leave for anyone to look at it as the information on the passport is still sensitive, but the level of protection is not the same as other valuables again because the passport only can be used by the owner and this needs to be validated for every single use.

Likewise in the Cloud, we won’t leave sensitive data open to the internet, but we do have to make sure the data can only be used by its owner, with parts of it accessible to others as delegated by the owner. Like the passport, whenever someone wants to access the data there should be a validation with audit events, and at any given time the owner of the data can revoke the delegated access.

The Cloud environment is perceived to be insecure by nature, as it’s someone else’s data centre and someone else physically has access to the storage devices holding the customer data. Yet the data security services, processes and maturity there are well above how data has ever been handled in an on-premise data centre or even in a corporate office.

These days, when there is a data breach, it’s usually due to immature practices that customers carried over from on-premise data centres or corporate offices; rarely has the Cloud provider itself been responsible.

But it takes many years to move data from an on-premise or corporate environment to the Cloud and, most of the time, the majority of the data stays on-premise, mainly because the perceived risk of the Cloud is always higher than that of on-premise.

Now let’s take a look at some of the controls one may introduce in the Cloud to keep data secure.

Encrypt and Bring Your Own Key – This is the 101 of Cloud data security; without it, you probably shouldn’t consider Cloud migration at all.
The key is protected by access policies and must require another factor for human access, just like how a passport works.
For an application or a cloud resource, the permission is only added for the lifetime of that application or cloud resource, and this is done by automation.
Any automation that alters the access policies of the encryption key must be controlled by approval.
All access to the key and the data, and all approvals, are to be audited.
All access, including access to the root of trust, must be revocable. When this happens, it requires a break-glass process with one-time elevated access that needs to be approved.
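
As a minimal sketch of how the controls above fit together, assuming a purely in-memory policy object rather than any real key management service: the class name KeyPolicy, its methods and the principal names are all hypothetical, and a real implementation would sit behind your Cloud provider's key service and your own approval workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Grant:
    principal: str          # human user or application identity
    expires_at: datetime    # every grant is temporary
    requires_mfa: bool      # humans need a second factor


@dataclass
class KeyPolicy:
    """Hypothetical in-memory stand-in for an encryption key's access policy."""
    grants: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _audit(self, event, principal, allowed):
        # Every grant, revoke and use attempt is recorded, allowed or not.
        self.audit_log.append((datetime.now(timezone.utc), event, principal, allowed))

    def add_grant(self, principal, ttl, is_human, approved_by):
        # Any change to the key policy must carry an approval.
        if not approved_by:
            self._audit("grant", principal, False)
            raise PermissionError("policy change requires approval")
        expires = datetime.now(timezone.utc) + ttl
        self.grants[principal] = Grant(principal, expires, requires_mfa=is_human)
        self._audit("grant", principal, True)

    def revoke(self, principal):
        self.grants.pop(principal, None)
        self._audit("revoke", principal, True)

    def can_use_key(self, principal, mfa_passed=False):
        grant = self.grants.get(principal)
        allowed = (
            grant is not None
            and datetime.now(timezone.utc) < grant.expires_at
            and (mfa_passed or not grant.requires_mfa)
        )
        self._audit("use", principal, allowed)
        return allowed


policy = KeyPolicy()
policy.add_grant("alice", timedelta(hours=1), is_human=True, approved_by="data-owner")
print(policy.can_use_key("alice"))                   # no second factor presented
print(policy.can_use_key("alice", mfa_passed=True))  # second factor presented
policy.revoke("alice")
print(policy.can_use_key("alice", mfa_passed=True))  # after revocation
```

The point of the sketch is that approval, expiry, second factor and audit live in one place, so revoking a grant takes effect on the very next access attempt.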

Wow! Yes, it’s not easy to implement Cloud data security, but once it’s implemented and proven to be consistent with the right level of automation, it’s almost impossible to penetrate.

There is a fundamental principle for Cloud data security: it’s identity-based, as there is no network boundary to trust, and you should control the root of trust and the encryption key. Almost all Cloud data breaches occur by penetrating a trusted application or cloud resource that is exposed, which may not be a failure of Cloud data security but a failure of the affected application or cloud resource.

Hence there should be sufficient controls before an application or a cloud resource can be trusted and permitted to access a certain dataset. Even better, there is an access challenge every time and only temporary access is granted, based on the source interacting with the application or cloud resource.
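
A challenge on every request that yields only temporary access could look roughly like the sketch below. It assumes a hard-coded trust table and made-up application, dataset and source names; the functions challenge and is_valid are hypothetical and stand in for whatever identity service you actually use.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical trust table: which application may read which dataset,
# and from which calling sources.
TRUSTED = {
    ("billing-app", "invoices"): {"api-gateway", "batch-runner"},
}


@dataclass(frozen=True)
class TemporaryAccess:
    token: str
    app: str
    dataset: str
    expires_at: datetime


def challenge(app, dataset, source, ttl=timedelta(minutes=5), now=None):
    """Run on every request: return short-lived access, or None if untrusted."""
    if now is None:
        now = datetime.now(timezone.utc)
    if source not in TRUSTED.get((app, dataset), set()):
        return None
    return TemporaryAccess(secrets.token_urlsafe(16), app, dataset, now + ttl)


def is_valid(access, dataset, now=None):
    """Access is only good for the dataset it was issued for, until it expires."""
    if now is None:
        now = datetime.now(timezone.utc)
    return access is not None and access.dataset == dataset and now < access.expires_at


access = challenge("billing-app", "invoices", source="api-gateway")
print(is_valid(access, "invoices"))
print(challenge("billing-app", "invoices", source="unknown-host"))
```

Because the access expires on its own, a compromised application only holds a usable token for a few minutes unless it can pass the challenge again from a trusted source.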

Yes, again it’s not easy to establish a mature Cloud data security process, but if it’s consistently established with the right level of automation, you probably can provide Cloud data security as a service!

Now let’s pause on the Cloud and move back on-premise, well, to the ground!

Let’s talk about the on-premise data centre. It’s an on-premise data centre, but is it your data centre? Well, no; as you know, most businesses that use data centres are not in the business of running them. So it mostly belongs to someone else, and there is a contractual arrangement standing between your data and your data centre supplier. I am not going to get into how we know they can’t access your data and, if it ever happens, how you would know when, what and why they accessed it, as I am pretty sure most of you have a smile as you read this 🙂

I also hear some of you saying that not many people get into the data centre, and that there will be physical access controls, biometric controls and surveillance cameras, with guards manning the data centre. Yes, I hear you, but don’t we have on-premise data centre breaches at all? They are just not reported the way Cloud breaches are reported. Most of the time, when an on-premise data centre breach occurs, responsibility lies with the operator’s own control failure rather than the customer’s.

This difference in responsibility is one of the key factors behind slow Cloud migration: if you can always hold the data centre operator responsible as long as you keep the data on-premise, why bother migrating to the Cloud only to become responsible yourself? Yes, this is again a perceived responsibility. I am not sure how clearly any on-premise data centre services agreement states that customer data is the responsibility of the operator rather than the customer. Really, it’s probably worth reading yours again!

Now let’s move into your corporate office, or in general your company premises. As you know, the level of access control is not the same as in your on-premise data centres; it probably doesn’t even require biometric verification and could even have back doors for various suppliers to come in, and so on. But mostly there will be good controls around which computers are allowed to connect to your office network, with multifactor authentication to log in to your company account.

So this creates a perceived trust in whoever can access your company network, who is most likely a staff member. However many staff your company has or allows onto that network, they are all in one flat network of trust!

Normally human beings, or living beings in general, like to be in a secure environment, and once they are in one they like to be a bit more independent: do what they think is right, explore things, and share without a second thought.

Take the pre-Covid-19 scenario: everyone could shake hands, hug each other and sometimes even share food with strangers, and no one saw any issue until Covid-19 became a thing!

Yes, it has not only shaken the world but changed people’s mindset. No one could assume a secure environment in the context of Covid-19, and people got used to living in an insecure environment, trusting no one, including their friends and even themselves! Yes: no handshakes, facemasks on, and even sanitising their own hands before using them. Covid-19 changed the way humans perceive a secure environment: nothing is secure. People adopted this new way of life for years, and it may well be how life is going to be forever!

Let’s go back to your company office. People are still in a trust mindset within the company network: they share data and sometimes even keep it on a shared drive that is pretty much open to the entire company network. Then one of the staff members gets an email and, curious to see the attachment, opens it. Before they realise what has happened, everything stored on their machine as well as on the shared drive is encrypted with a hacker’s key, and the hacker is asking for a ransom to release the data, or forget about it!

The moral of this story: you probably cannot trust anything, including your own inbox, regardless of whether you are using the Cloud or an on-premise data centre. Data is your asset and it’s your only asset, so don’t leave it everywhere, or even anywhere, that’s open to someone else, even if it’s a staff member or even yourself.

In other words, treat data as a valuable: only take it out of the vault when you need it and put it back when you don’t. The next time you need it you have to prove who you are, and the owner of the data should be able to see who used their data, what for and why, on every single use.

Companies that have already established this kind of mature data governance process in their own environment have already changed their people’s mindset. When their data goes to the Cloud, that mindset doesn’t need to change any further, as they already have the data governance maturity that the Cloud requires.

On the other hand, if you are two clicks away from a data breach, it’s probably better to implement a mature data governance process within your company environment before even planning a Cloud migration.

Your transformation does not have to wait for Cloud migration; it can begin now. But if you are not willing to go through the transformation, you are simply accepting that a data breach is two clicks away for any one of your staff members!

Early in its Covid-19 lockdown, New Zealand adopted a position: act as though you have Covid-19 and behave responsibly! When you have Covid-19, you stay away from others and self-quarantine. This is the mindset you need to create regardless of whether you are in the company network or the Cloud: nothing is to be trusted, including yourself. Remember, you are represented by your account, which could be hijacked by anyone. So it’s not physically you but a digital representation of yourself; how can you trust it?

Disclaimer

This article was produced in my own capacity and from my own experience so that it could be beneficial for others; no association should be assumed with the organisation I work for now or the organisations I have worked for in the past.

Networking Strategy for Cloud transition

If you decide to build a house in the wild that can only be reached via a narrow, steep footpath, where you want to live without worrying about taking anyone else with you or commuting anywhere else, then it’s probably not a bad idea.

But soon you see more people trying to move closer to where you are, and you want to commute elsewhere for various reasons, and you might start to think: have I made a poor decision?

Yes, for your requirements you should ideally have laid the road network before you moved in, and be able to scale that road network in and out based on demand.

Public Cloud

In the early days of the Public Cloud, there was a perception that network expertise was no longer required in the cloud, as networking is managed by the Cloud provider. This soon turned out to be a false assumption, due to factors such as the following, unless you are a startup using only one Public Cloud provider in a single region.

Large on-premise networks and office networks that require connectivity to Cloud networks.
Most applications in the Cloud require connectivity to on-premise, to the Cloud provider’s backbone and to the public internet.
Some shared capabilities, such as monitoring, tooling, service bus, domain controls, DNS, API Gateway and many more, require connectivity to pretty much every single Cloud application, which means every single private Cloud network.
The number of applications that need to be transitioned to the Cloud, combined with Cloud provider limits and the finite RFC1918 private IP address space.
Securing access to the Cloud backbone and the internet from internal networks.
Supporting the level of network segregation required by company security policies.
Supporting local and global on-premise sites and Cloud provider regions.

And this list goes on; it’s only going to get longer and never simpler.

It requires quite a lot of due diligence, design and planning to make sure your networks are still manageable while keeping all of your stakeholders happy. This also means, unfortunately, before you start your Cloud transition journey, you have to secure people with network expertise who can extend their network expertise from on-premise to Cloud and beyond!

The expertise required across on-premise and Cloud includes at least the following:

Routing or Layer 3 expertise, both in terms of static and dynamic routing.
Network firewall expertise, in terms of network filtering, inspection, detection and prevention using modern anomaly detection techniques and large scale policy lifecycle management.
Web application controls, in terms of policies, web application firewall rules, API specific policies, inspection, detection and prevention using anomaly detection.
Internal and external DNS, in terms of managing top-level domains, split DNS, forwarding across on-premise to Cloud and traffic management.
Expertise in Internet Edge protection services such as DDoS, CDN, data loss prevention and anomaly detection.

Once again, this list will grow as new trends and threats appear.

Depending on the stage of the Cloud transition, the expertise needed can change, so network specialists should be able to adopt an ever-increasing portfolio of services. In other words, the skills and expertise are elastic: you stretch and contract them as fits your organisation’s demand.

On the other hand, unless it has been very well planned, no one can predict when an organisation’s network is going to expand or shrink; again, it’s elastic. This also includes expanding into another region, if your company policy allows that.

Elastic Network as a Strategy

AWS named its network interface the Elastic Network Interface (ENI), as it can be attached to or detached from an EC2 instance, but it’s still limited to an Availability Zone. In the context of this blog, however, “Elastic Network” is the “Network Strategy”, not the AWS ENI!

Able to expand and shrink your internal and external network across on-premise, Cloud and Internet.
Able to expand your network team’s skills as you grow and able to allocate and release network addressing as needed across on-premise and Cloud.
Able to expand from local to global as needed across your on-premise, office networks and Cloud regions.
Able to easily provision and de-provision network services from Layer 3 and above with minimal interruption to applications.
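
The "allocate and release network addressing as needed" part can be illustrated with Python's standard ipaddress module. This is only a sketch under my own assumptions: the AddressPool class, the equal-sized subnets and the application identifiers are hypothetical, and a real IP address management tool would also have to handle persistence and concurrent requests.

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


class AddressPool:
    """Hypothetical allocator handing out equal-sized subnets from one supernet."""

    def __init__(self, supernet, prefixlen):
        net = ipaddress.ip_network(supernet)
        if not any(net.subnet_of(block) for block in RFC1918_BLOCKS):
            raise ValueError("expected RFC1918 private address space")
        self._free = list(net.subnets(new_prefix=prefixlen))
        self._allocated = {}   # application id -> subnet

    def allocate(self, app_id):
        # Re-requesting for the same application returns the same subnet.
        if app_id in self._allocated:
            return self._allocated[app_id]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        subnet = self._free.pop(0)
        self._allocated[app_id] = subnet
        return subnet

    def release(self, app_id):
        # Released space goes back into the pool for the next application.
        subnet = self._allocated.pop(app_id)
        self._free.append(subnet)
        self._free.sort()
        return subnet


pool = AddressPool("10.20.0.0/22", prefixlen=24)
print(pool.allocate("app-a"))
print(pool.allocate("app-b"))
pool.release("app-a")
print(pool.allocate("app-c"))   # reuses the space released by app-a
```

Keying the pool by application identifier means address space is returned as soon as an application is decommissioned, instead of leaking away over the years.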

In other words, it’s worth reconsidering your Cloud journey if your organisation lacks any one of the above elasticities, as in the long run that may lead to blockers, technical debt, and unhappy stakeholders and customers.

Networking First

Every network landscape has a different construct, so it’s recommended to create a network model for each scenario.

Network Model for an on-premise application including access from the office network.
Network Model for a Cloud application including dependent services and access from the office network.
Network Model for each environment and environment segregation by network or application.
Network Model for an external-facing on-premise application.
Network Model for an external-facing Cloud application.

There is no limit on the number of Network Models, so focus on the network models you need to consider based on your organisation’s strategy.

For example, an internal application deployed in an AWS VPC that needs access from the office network requires connectivity from the office network to the AWS VPC, routing in place between on-premise and the AWS VPC, and the network access allowed in the on-premise firewall, AWS NACLs and AWS Security Groups.

This becomes more complex if there are on-premise dependencies or if the application is external-facing.

The snippet below is a sample network provisioning request for an application:

enable_network(app) {
	enable_routing_onpremise(app)    # internal as well as advertised to cloud
	enable_routing_cloud(app)        # via Cloud Gateway or Hub
	allow_inbound_outbound_onpremise_firewall_rules(app)
	allow_inbound_outbound_cloud_firewall_rules(app)
}

If there are hundreds of such applications and this network model is not managed very well, it can lead to a far more complex network and can slow down the cloud transition, as the network team managing it may not have time to manage any more networks.

And if another Cloud provider is to be considered, as a second provider or even as a replacement, network management is simply going to get out of control.

Now you can appreciate the level of complexity. Ideally, this entire provisioning can be automated end to end for an application, and the same for de-provisioning and updating, all using the “application” as the “identity”. If your organisation is embracing the “zero trust” model as a strategic position, defining this identity model based on the application from the ground up is quite critical.
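
To make the idea of provisioning, updating and de-provisioning keyed on the application identity a little more concrete, here is a small Python sketch. It only records which steps would be applied; the NetworkAutomation class and the step names are hypothetical placeholders for whatever your routing and firewall tooling really exposes.

```python
# Step names mirror the provisioning request; each would call real tooling.
PROVISION_STEPS = [
    "routing_onpremise",          # internal as well as advertised to cloud
    "routing_cloud",              # via Cloud Gateway or Hub
    "onpremise_firewall_rules",
    "cloud_firewall_rules",
]


class NetworkAutomation:
    """Hypothetical orchestrator that tracks network state per application id."""

    def __init__(self):
        self.applied = {}   # application id -> steps currently in place

    def enable_network(self, app_id):
        steps = self.applied.setdefault(app_id, [])
        for step in PROVISION_STEPS:
            if step not in steps:      # idempotent: safe to re-run as an update
                steps.append(step)
        return list(steps)

    def disable_network(self, app_id):
        # Tear down in reverse order and forget the application.
        steps = self.applied.pop(app_id, [])
        return list(reversed(steps))


automation = NetworkAutomation()
print(automation.enable_network("payments-api"))
print(automation.disable_network("payments-api"))
```

Everything hangs off the application identifier, so nothing is provisioned that cannot later be found and removed for that same application.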

In the Cloud, a Security Group, or the firewall ruleset associated with the network interface of an application or load balancer, could be an example of such an “identity”, as long as it is owned by the respective application team.

Network Team Driven

If you are planning a large-scale Cloud transition, the first team, apart from your security teams, to be trained and prepared MUST be your network team. The infrastructure team and application teams come next.

Your funding vehicle here could be an interesting challenge. If your organisation’s strategy is operational excellence and funding is only allocated for work aligned with that strategy, then it needs to be handled very carefully, still meeting stakeholders’ expectations while working towards the “Elastic Network”.

If your network team needs to collectively improve its operational model, then all of its services need to be clearly defined and automated, and key metrics such as SLAs need to be measured for application onboarding, maintenance and decommissioning. For the organisation, operational excellence is all about how easy it is to introduce a new application, manage it and dispose of it.

The following is a high-level guideline to define the operational model:

Based on the application, select the Network Model
Based on the Network Model, select the dependencies
For all dependencies, define the interface to execute.
For each interface, work with the appropriate team to implement possibly using automation.
If that appropriate team is your Cloud infrastructure team and if it’s funded, you can engage them to automate it.
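
The guideline above is essentially a chain of lookups: application to network model, model to dependencies, dependency to interface and owning team. A minimal sketch, assuming a made-up catalogue (every model, dependency, team and interface name below is hypothetical):

```python
# Hypothetical catalogue; the names are illustrative only.
NETWORK_MODELS = {
    ("cloud", "internal"): "cloud-internal",
    ("cloud", "external"): "cloud-external",
    ("onpremise", "internal"): "onpremise-internal",
    ("onpremise", "external"): "onpremise-external",
}

DEPENDENCIES = {
    "cloud-internal": ["routing", "onpremise_firewall", "cloud_firewall"],
    "cloud-external": ["routing", "cloud_firewall", "internet_ingress"],
    "onpremise-internal": ["routing", "onpremise_firewall"],
    "onpremise-external": ["routing", "onpremise_firewall", "internet_ingress"],
}

# dependency -> (owning team, interface that team exposes)
INTERFACES = {
    "routing": ("network", "update_routes"),
    "onpremise_firewall": ("network", "update_firewall_policy"),
    "cloud_firewall": ("cloud-infrastructure", "update_security_groups"),
    "internet_ingress": ("security", "review_public_exposure"),
}


def plan_for(app):
    """Return the network model and the per-dependency work items for an app."""
    model = NETWORK_MODELS[(app["hosting"], app["exposure"])]
    steps = []
    for dependency in DEPENDENCIES[model]:
        team, interface = INTERFACES[dependency]
        steps.append({"dependency": dependency, "team": team, "interface": interface})
    return model, steps


model, steps = plan_for({"name": "payments-api", "hosting": "cloud", "exposure": "internal"})
print(model)
for step in steps:
    print(step)
```

Once the catalogue exists as data, deciding which team to engage, and whether that engagement can be automated, stops being a per-project debate.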

The last step above is agnostic of Cloud provider, and your network team does not have to debate whether talking about it, or even thinking about it, is funded. In other words, if you are not even thinking about it, you may not be aligned with your operational excellence strategy when the Cloud transition starts!

Aligning with your organisation’s strategy is a commitment expected from every single person in the organisation. Business or project management won’t tell you how when it comes to networking; it’s in your hands to make it work. That’s worth reminding yourself of before you next start debating in scope vs out of scope.

Operational Model for Elastic Network

Let’s look at what an operational model looks like under a truly Elastic Network aligned strategy.

One of your application teams wants to deploy their application into the Cloud and sends a request to the network team:

The network team selects the right network model and provisions all network constructs end-to-end.
The network team updates the CMDB for the respective application with the new network.
The network team hands over the network to the application team.

Now the application has gone live and the team requests decommissioning of the on-premise network that hosted the on-premise instance of the application.

The network team selects the on-premise network from the CMDB using the application identifier.
The network team de-provisions the on-premise network that belongs to the application.
The network team updates the CMDB to record the on-premise network as de-provisioned.

The application team receives a feature upgrade and now requests on-premise access from their Cloud application.

The network team selects the Cloud network from the CMDB using the application identifier.
The network team updates the network model in the CMDB to introduce on-premise connectivity.
This event triggers an update to the Cloud network of the respective application, which may involve creating on-premise connectivity if it does not already exist.
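
The "update the CMDB, and the change event triggers the network update" pattern can be sketched as a tiny publish/subscribe loop. The Cmdb class below is a hypothetical in-memory stand-in, not the API of any real CMDB product, and the model names are invented for illustration.

```python
class Cmdb:
    """Hypothetical in-memory CMDB that notifies listeners on record changes."""

    def __init__(self):
        self._records = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def get(self, app_id):
        return dict(self._records[app_id])

    def update(self, app_id, **changes):
        record = self._records.setdefault(app_id, {"app_id": app_id})
        before = dict(record)
        record.update(changes)
        if record != before:               # only real changes raise an event
            for listener in self._listeners:
                listener(app_id, before, dict(record))


applied = []


def reconcile_network(app_id, before, after):
    # In practice this would call the provisioning automation for the app.
    applied.append((app_id, after["network_model"]))


cmdb = Cmdb()
cmdb.subscribe(reconcile_network)
cmdb.update("app-42", network_model="cloud-internal")
cmdb.update("app-42", network_model="cloud-internal-with-onpremise")
cmdb.update("app-42", network_model="cloud-internal-with-onpremise")  # no change, no event
print(applied)
```

With the CMDB as the single source of truth, the network team changes the desired model and lets automation bring the provisioned network in line with it.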

The application team once again receives a feature upgrade; now they want to connect to a SaaS service from their Cloud application over the internet.

This pretty much follows the above sequence, except that Internet Egress connectivity is requested instead of on-premise connectivity.
Additionally, the SaaS application should be registered in the CMDB with its own network model, if not registered already.

Now the sales team has found external customers who want to access the same Cloud application, so they request that the network team allow access from the internet.

Once again, the sequence is quite similar, but this time Internet Ingress will be allowed and a new network model will be selected.
Depending on your organisation, more teams may be interested when you open up your application to the public internet.
Notify all your interested parties, including invoking their processes as sub-processes.

Now that the application is exposed to the public internet, it requires additional scaling capacity, which is currently limited by the network size.

The network team updates the network by extending the existing CIDR or adding a new CIDR.
This triggers updates to the already provisioned network based on the latest network model as per CMDB.
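
The choice between extending an existing CIDR and adding a new one can be reasoned about with the standard ipaddress module. The grow_network function below is a hypothetical sketch; whether a range can really be extended in place depends on the platform, so treat the first branch as an assumption.

```python
import ipaddress


def grow_network(current, in_use, spare):
    """Return the CIDR list for an application after adding capacity.

    current: the application's existing CIDRs (strings)
    in_use:  CIDRs already allocated to other applications
    spare:   unallocated CIDRs that may be handed out
    """
    nets = [ipaddress.ip_network(c) for c in current]
    others = [ipaddress.ip_network(c) for c in in_use]

    # Option 1: double the last CIDR if the enlarged block collides with nothing.
    bigger = nets[-1].supernet(prefixlen_diff=1)
    if not any(bigger.overlaps(other) for other in others + nets[:-1]):
        return [str(n) for n in nets[:-1]] + [str(bigger)]

    # Option 2: keep the existing CIDRs and add a new one from the spare pool.
    if not spare:
        raise RuntimeError("no spare CIDR available")
    return list(current) + [spare[0]]


print(grow_network(["10.0.0.0/24"], in_use=[], spare=[]))
print(grow_network(["10.0.0.0/24"], in_use=["10.0.1.0/24"], spare=["10.0.8.0/24"]))
```

Either way the result is written back to the CMDB first, so the provisioned network is updated from the model rather than by hand.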

Every service offered by the network team here comes with its SLAs: the better the SLA, the better the Operational Excellence, and the better the Operational Excellence, the faster the transition to Cloud can go.

Even better, thanks to the agility and pay-as-you-go model of the Cloud, Operational Excellence can improve more in the Cloud than on-premise.

To quantify how important your network team’s ability to execute the “Elastic Network” strategy is, let’s say you have a hundred applications and your network team takes a day to provision just the network for each one, working through them one after another: that is a hundred working days, or roughly five months. If your network team takes a week or two per application, this becomes roughly two to four years, again just to provision the networks for your Cloud applications.
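
If you want to play with this arithmetic for your own organisation, a few lines of Python are enough. The sketch assumes 260 working days per year (52 weeks of 5 days, no holidays) and strictly sequential provisioning unless you pass a number of parallel streams; both are my assumptions, so adjust them to your reality.

```python
WORKING_DAYS_PER_YEAR = 260   # assumption: 52 weeks x 5 days, no holidays


def provisioning_years(applications, days_per_application, parallel_streams=1):
    """Elapsed years needed to provision the network for every application."""
    total_days = applications * days_per_application / parallel_streams
    return total_days / WORKING_DAYS_PER_YEAR


for label, days in [("1 day", 1), ("1 week", 5), ("2 weeks", 10)]:
    years = provisioning_years(100, days)
    print(f"{label} per application: {years:.1f} years")
```

The absolute figures matter less than the shape: the elapsed time grows linearly with the per-application lead time, so cutting that lead time from weeks to hours is what makes a large transition feasible at all.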

The application and infrastructure teams have their own work to do to provide each application in the Cloud, and that normally takes much longer than networking. So if you have a serious Data Centre Exit Strategy, make sure your network team is able to execute changes in days, or even better hours, rather than weeks or months.

No Silver Bullet

You might be wondering whether I am re-inventing SD-WAN in this blog. An SD-WAN solution may help the “Elastic Network” strategy, but SD-WAN by itself does not guarantee that you will realise the “Elastic Network” strategy without people, process, modelling and readiness.

Even with SD-WAN, you will still have to create Layer 2.5 circuits, manage Hubs in the Cloud as well as on-premise and use another solution to provide network controls, inspection, detection and prevention.

Once again, I cannot emphasise enough the people and skills required before you start your Cloud transition journey.

The Strategy is yours

Any strategy, such as Elastic Network, belongs to you; you cannot expect the Strategy to come from your partners or solution providers. Your partners, vendors and other solution providers may be able to help provide the solution to realise and execute your strategy, but the Strategy always belongs to your organisation and you cannot purchase it from anyone.

To make use of the elasticity of Cloud computing, your network should also be elastic: like the Cloud, you want to provision networks when needed and de-provision them when no longer needed, and be able to scale out and in as you see fit. In other words, align your Networking Strategy to your Cloud Strategy as well as your organisation’s Strategy!


Cloud transition and Strategical (mis)alignment

It has been almost two decades since the public cloud started to make an impact in the technology industry. It has also been more than a decade since large enterprises started adopting cloud, mostly for IaaS use cases.

There are several reasons why organisations want to transition fully or partially into the Cloud. They can be summarised as:

Agility
Uplift the scalability
Uplift the service level
Uplift the security
Initial and Running Cost [Really? yes, please be patient to read further]

These reasons don’t make sense without a business context such as:

New business opportunity
Establishing and running a Startup business
Expanding into a new region
Operational excellence
Reducing business risks

Ultimately everything we do, including Cloud transition, should advance towards the business outcome that business owners had in mind when they agreed to let the technical team move to the Cloud.

We could have moved to the best Cloud provider in the world, we could have transitioned everyone to Scrum, we could have turned everyone into a DevOps person, but if we didn’t provide a business outcome, what we have done so far could be called a “debt” for the business. Yes, a massive technical and process debt!

Success story

Let’s take a look at a success story where a clearly defined strategy and direction potentially saved thousands of lives. This story is from a small Pacific country called “New Zealand”!

Before the WHO declared Covid-19 a global pandemic, NZ decided that it was going to be one and came up with a strategy on the assumption that no vaccine would be possible for at least another year. Vaccines are one way to eliminate Covid-19, so how were we supposed to eliminate it without one? But NZ clearly called its strategy the “Elimination Strategy”. This made everyone, including myself, think it was impossible without a vaccine, as that had been the only successful route to elimination.

But the NZ government didn’t stop there; it did all the groundwork to “Go Fast” and “Go Hard” whenever there was a community case. Now not only I but the entire world knows what they achieved, again and again: essentially the elimination of Covid-19 without even needing a vaccine!

Once a vaccine becomes available, it can serve the same strategy, which is Elimination; neither the strategy nor the “Go Fast” and “Go Hard” approaches change.

Strategy for a successful Business Outcome

Based on that success story, you can see that the strategy shouldn’t change with the solution: a great strategy always focuses on the final outcome, and there cannot be multiple!

Let’s get back to our topic. If an organisation’s strategy is “operational excellence” and the approaches are going to be “operational efficiency”, “streamlined processes” and “continuous improvement”, then it’s the same across the entire organisation and it doesn’t matter whether you are an expert in Cloud or on-premise. This way, everyone in the organisation is expected to do their best to align with this unified strategy and bring along the best solution, which might or might not include Cloud.

As a Cloud expert, if you have faith that your Cloud provider and your ability to integrate the Cloud environment will drive a successful Cloud transition that still aligns with your organisation’s Strategy, then the Cloud transition could be a solution. If your chosen Cloud provider doesn’t provide what you need to align with your strategy, pass that feedback to the provider and move on, as you are still expected to deliver to the unified strategy whether or not your Cloud provider is ready.

The same applies on-premise. If everyone is busy managing servers and no one has time to focus on “operational excellence”, then the problem is with your on-premise setup, not with your strategy. Move on, eliminate one problem at a time, and work with your counterparts on the Cloud side to see whether they can help eliminate your problem. And remember, everyone has to work towards “one” strategy!

Now let’s take a look at some example strategies for the entire organisation.

Operational Excellence

It’s worth understanding the background and context behind why an organisation chooses Operational Excellence as its strategy, as there could be several reasons.

Reducing the number of staff
Provide a better experience to customers by improving the service level
Ability to introduce new services faster than your competitor
Having acquired another company, operations need to be streamlined across the companies.

Now let’s imagine you are tasked with leading the Cloud transition. Every single decision you take should be aligned with this strategy and address the original reasons for which the strategy was born.

Just because you are tasked with leading the Cloud transition, you don’t have to define another strategy such as “Cloud Only”, as that may make your Cloud provider happy but may not even be aligned with your organisation’s strategy. Also, decommissioning your data centre may not deliver Operational Excellence if you don’t have enough skilled staff to manage the services once they have all been transitioned into the Cloud.

As a lead of Cloud transition, you could look at defining the SLAs for each service that aligns towards Operational Excellence and if your Cloud experts and Cloud providers can help, then it’s great, otherwise, move on with alternatives.

The initial and running cost of any IT landscape is also a factor in Operational Excellence. Just because the Cloud looks cheap on the surface and your Cloud provider offers a massive discount doesn’t mean you have to migrate everything to the Cloud, as that would be more aligned with your Cloud provider’s Strategy and may not be aligned with your organisation’s Strategy.

But if the services are designed 4 cloud, including taking operational cost into account, then they are definitely good candidates to transition into the Cloud. Normally IaaS is not going to be cheaper in the Cloud than on-premise, but with PaaS and SaaS offerings, especially for your non-production IT landscape, the cost savings can go to the next level!

De-risk

This is a really complex one and really needs focus from every leader in the organisation to achieve a better outcome.

Risks from competitors
Risks of not meeting customer expectations
Security Risks
Risks of not meeting regulatory and other compliance attestations
Ultimately risk of losing the business

This is quite a large area, and there could be many different teams helping to reduce each type of risk.

If the risk from competitors is about how soon we can provide better services before they do, and that can be achieved sooner in the Cloud than on-premise, that’s a good case for transitioning the affected services to the Cloud.

If it’s about Security Risks, the people leading in this space should have a clear understanding of the strategies adopted by the organisation. Reducing risk doesn’t mean we have to slow down on everything and focus only on security risk, as that may slow down or even block initiatives towards operational excellence. It’s worth understanding the overall security posture, the areas where we need to put our effort and the areas that could be outsourced; moving to the cloud may fall into that last category. (Ref cloud-is-middle-tier-in-3-tier-infrastructure)

One regulatory and compliance risk could be control framework failure. If it’s a large organisation whose on-premise environment has sprawled and, due to staff limitations and other reasons, we cannot demonstrate successful implementation of the controls framework, and that could be achieved easily in the Cloud, then it’s definitely a right purpose for Cloud transition.

The last risk is any organisation’s worst nightmare: yes, the risk of losing the business! If there is no BCP process should the on-premise environment be lost, and the Cloud could provide one through DR capability, then it’s a definite alignment with your organisation’s strategy.

Expansion

There are several reasons why a business decides to expand; this could involve expanding into a new region, a new business or a partnership.

Normally expanding into a new region involves a separate instance of the entire IT landscape, and it’s a good candidate for checking whether Cloud transition is the right approach. In this scenario, there are more downsides to on-premise than Cloud, as you don’t want to build a new data centre just because your business decided to expand into a new region. Having said that, if your existing data centre has sufficient capacity and provisioning and managing infrastructure there is faster and more efficient than in Cloud, then Cloud transition is not required.

A new business may or may not come with a new company. If it does require a new company, then it’s almost seen as a startup and it’s really hard to find any valid reason not to transition into Cloud, unless you can match the same on-premise and use the same people to manage the new company’s IT landscape. But it’s worth considering the pros and cons of using the same people to look after more than one company’s IT landscape, as you don’t want your strategy driven by the limitations of your existing staff!

The expansion of your partnership may or may not involve extending access to your services to your partners, which could be a primary driver for Cloud transition. Certainly, you won’t give any partner open access to your internal network, but you could provide access to particular services. It’s worth considering providing those services to partners via the internet or VPN and applying strong identity-based access controls to let them use your services.
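
As a minimal sketch of what identity-based access for partners might look like, the Python below decides access from who the caller is and which service they ask for, not from which network they come from. The `PartnerIdentity` type, the `ALLOWED_SERVICES` table and the partner and service names are all hypothetical and made up for illustration; a real setup would rely on your identity provider rather than hand-written checks.

```python
from dataclasses import dataclass

# Hypothetical mapping of partner organisation -> services they may use.
ALLOWED_SERVICES = {
    "partner-a": {"orders-api"},
    "partner-b": {"orders-api", "invoices-api"},
}


@dataclass(frozen=True)
class PartnerIdentity:
    """Claims we assume the identity provider has already verified."""
    organisation: str
    mfa_passed: bool


def can_access(identity: PartnerIdentity, service: str) -> bool:
    """Allow access only for a strongly authenticated partner and a granted service."""
    if not identity.mfa_passed:
        return False
    return service in ALLOWED_SERVICES.get(identity.organisation, set())
```

Note that nothing in the decision refers to an IP address or network segment, which is the point of the paragraph above.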

Summary

A clearly defined, unified, well-communicated and well-understood strategy can take your organisation to a level that’s seen as impossible otherwise. If a team of 5 million in NZ can work towards one strategy, why can’t all of your staff work towards one strategy?

Next time you do any of the following, ask yourself how it is aligned to your strategy and:

A VM on-premise: who is going to manage it?
A VM or service in Cloud: who is going to manage it?
A private IP address in Cloud: how is your customer going to access it?
A new security tool: how is it going to reduce the security risk?
A new strategy: do you really need another one?

All of you are experts in what you are doing and everyone is special in something; all you have to do is ask the right question at the right time and apply your common sense. It’s that simple!


Cloud is middle tier in 3-tier Infrastructure

In my previous blog take-incharge-of-your-public-cloud, I compared building a house in the wild with building an application in the cloud.

You might ask why we would put all that effort into building a house in the middle of the jungle unless we wanted to manage wild animals, enjoy wild nature and enjoy managing abundant wild land. Yes, you are right, we don’t have to move into the Wild for the sake of moving into the Wild. Hang on, am I saying you don’t have to move into Cloud for the sake of moving into Cloud?

Based on your lifestyle, you may invite wild visitors or visitors from the city; based on who you invite most, you may choose to be in the wild or in the city. If Cloud is like the Wild and the internet is like wild visitors, are we saying that unless we have internet visitors we should choose to deploy our applications on-premise?

Let’s get the real facts into the Cloud journey and be upfront about the challenges. If you have many visitors to your application from the internet, then Cloud could be a better place as it’s close to the internet and you don’t have to deal with all internet security yourself in your on-premise data centre. But if most of your visitors come from your internal network and you still choose to deploy your internal applications to the cloud, then you have to protect your cloud environment from the wild internet, as I mentioned in take-incharge-of-your-public-cloud.

Public Cloud providers have improved internet security so much that these days a large-scale DDoS attack is business as usual for them. But other companies protecting their own internal network from the internet have a long way to go to get protection from modern sophisticated cyberattacks, and most companies are never able to catch up with them. So it’s worth asking yourself, do you really want to expose your internal network to the public internet and then put all your effort into building protection for it?

These days, building private circuits from on-premise to cloud or to another on-premise environment has become a standard pattern even for small businesses, so if you have a private network in Cloud you don’t have to traverse the wild internet to connect from your on-premise network.

We have spoken about 3-tier applications for more than two decades now. They helped us to scale from hundreds to thousands, but not much further; going beyond that requires modern decoupled architecture.

But have you ever heard of 3-tier infrastructure? Probably not, unless someone else introduced it before you read this blog! If you have all of your users on the internet and they have to access valuable assets in your on-premise infrastructure, your cloud infrastructure becomes your middle-tier infrastructure!

In this architecture, wild internet traffic comes into your public cloud environment, you apply all cloud-native edge protection as it arrives, route the traffic to your internal cloud network within the same public cloud provider and finally connect to your on-premise network via private circuits, without traversing the wild internet.
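
The traffic flow described above can be sketched as a small table of allowed hops. This is only an illustration of the idea, assuming four made-up tier names (`internet`, `cloud-edge`, `cloud-internal`, `on-premise`); it does not configure any real cloud network.

```python
from typing import Sequence

# Hypothetical tier names, for illustration only.
ALLOWED_HOPS = {
    "internet": {"cloud-edge"},        # all internet traffic lands on the cloud edge
    "cloud-edge": {"cloud-internal"},  # after cloud-native edge protection
    "cloud-internal": {"on-premise"},  # via a private circuit only
    "on-premise": set(),
}


def is_valid_path(path: Sequence[str]) -> bool:
    """Return True if every consecutive hop in the path is allowed."""
    return all(
        nxt in ALLOWED_HOPS.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )
```

With this table, `["internet", "cloud-edge", "cloud-internal", "on-premise"]` is a valid path, while `["internet", "on-premise"]` is rejected because the on-premise tier is never reachable directly from the internet.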

Yes, I hear you, this sounds like a solution, what’s the actual problem that we are trying to solve? I’ll ask the other way around, what’s the problem we cannot solve in this way?

Do we really want to protect our on-premise network from the internet?
Do we really need to find a solution for staff to access applications from anywhere they want, if they can access all of them via the internet using strong identity protection?
Do we really want to manage complex network connectivity and network controls between applications across on-premise and the internal cloud network, just because we want to allow internet traffic in via on-premise while part of the applications are being migrated to the cloud?
Do we really want to slow down cloud migration, just because we are busy protecting the on-premise network from the internet?

In other words, it will change your entire cloud migration strategy and create a single focus across your IT department. When you have a mix of internal-facing and external-facing applications that all need to be accessed by a single user, your IT landscape becomes so large that most of your IT department will be busy supporting your on-premise internal applications and have no time to focus on cloud migration.

All public cloud providers have SaaS and PaaS offerings that use the cloud provider’s own internet and backbone without any traversal to on-premise or the internal cloud network. If that’s not possible, you can still have a public network in the cloud, use the cloud provider’s native internet edge protection capabilities and finally route to your internal cloud network, without managing any infrastructure for edge protection.

In the modern world, the network boundary of the user cannot be defined, so the entire world is moving towards strong identity and multi-factor authentication instead of relying on network controls. This is a key criterion for your application to be exposed to the internet in your preferred cloud environment.

If you are using Active Directory, with Windows 10 (latest release) you can join your devices directly to Azure AD without choosing hybrid domain join, which comes with its own complexity!

This is easier said than done, and it needs buy-in from your leadership team, especially for opening up all your applications to the internet. With that in place, all of you can focus on moving the applications to the cloud, provide access to your applications only via the internet and use the cloud as middle-tier infrastructure where you apply any edge protection or other network controls or inspections.

We had lockdowns, and some of us still have them, so whether we like it or not we are accessing all of our applications remotely, and some via the internet. This trend is going to continue to grow as working from home becomes the new norm!

In summary, let the public cloud providers deal with the threats from the internet. That reduces your network boundary significantly, so you can focus on your own applications and their cloud migration journey, which is more aligned to your own and the world’s strategic position. Yes, more than anything, you focus on what you can do best and let others do their best in the area they are already experts in! If you are ever successful in fully migrating everything to the cloud, then you can say, now I have 2-tier infrastructure! In other words, we are still aligned and not accumulating technical debt by investing in on-premise infrastructure that’s later supposed to disappear!


Cloud Migration – Take off, Autopilot and Landing

Airline Industry

A to B: Travel / Migration

We are living in a pandemic world where travelling on flights has become a thing of the past and only appears in your dreams. Say in one such dream you are taking a long-haul flight from London to Los Angeles, which normally takes 11+ hours in the air if all goes well. Yes, all must go well, otherwise the airline will be grounded, which creates a demand for close to 100% reliability with no SLA breach.

If you consider yourself and your baggage as the payload, the reliability and SLA requirements for delivering that payload from A to B are almost non-negotiable: they have to be perfect.

Is that really easy?

NO.

There is a lot of groundwork before an aircraft can be made available for take off, a lot of process to onboard passengers and their baggage, a really complicated readiness checklist before take off and even more complicated and risky maneuvering by the pilot during take off.

The challenge doesn’t stop there. In flight, even though the aircraft will be flying in autopilot mode most of the time, it must detect any anomalies and bring them to the pilot’s attention for intervention, so the autopilot system must be absolutely perfect. This system is the outcome of the hard work of hundreds of thousands of people, of satisfying hundreds if not thousands of compliance requirements and of continuous regression testing to make sure there aren’t any defects.

If that’s not challenging enough, landing is probably even more complicated, especially the coordination with the control tower and making sure all the checklists are done in the right sequence, and all of this is done by a handful of pilots and control tower staff. This is followed by the arrival process, which is again process intensive but works all the time.

There are a few key observations here. The airline industry has zero tolerance for mistakes, yet it has automated the areas where staff would otherwise have to work for long periods, tirelessly and with the same level of attention (i.e. pilots). It is still not trying to automate the areas where staff need to make intelligent decisions and work effectively while, most importantly, meeting all compliance and regulatory obligations from various authorities.

Let’s come out of the airline industry; you may be wondering why I am talking about flying in a Cloud blog!

Obviously no one would say that Cloud migration is complicated compared to sending a passenger and their baggage from A to B. But have we achieved near 100% reliability, and do we almost certainly meet the SLA, when it comes to Cloud migration?

Takeaways from airline industry

All processes are streamlined for departure, arrival and aircraft manufacturing.
Everyone involved is working towards one outcome, which is delivering the passenger from A to B with their baggage.
Where automation is a must and possible, it’s automated, but where automation is too complex and more accuracy is required, streamlined manual processes are put in place.
Throughout the journey, what works hard are the computers and other equipment in both airports, the aircraft itself and its software systems, but NOT people.

Cloud Migration

Let’s consider each passenger and their baggage as a workload that needs to be migrated from A (on-premise) to B (Cloud) with a certain level of reliability, if not 100%, and within an agreed timeframe (i.e. SLA).

Obviously all passengers and their baggage are almost the same, whereas each workload has its own unique requirements, so that creates a unique challenge for Cloud migration that the airline industry doesn’t have to handle.

But not all people can travel, and before anyone can travel, each person has to go through a process to qualify for travel. The same applies to workloads.

Let’s get into each stage of the migration process.

Take Off

Workload Readiness

Before a person arrives at the departure airport, the person needs to be ready to depart with compliant baggage, otherwise the person won’t be allowed to travel.

The same applies to each workload, before each workload can be migrated, the workload needs to be ready with the right level of treatment required so the workload is Designed 4 Cloud and ready for Cloud.

Tooling and Cloud Readiness

On the same note, no one is allowed to board if the departure or arrival airport is not ready or if the flight is not ready. Once the passenger is checked in, they should be able to reach the final destination without any delays.

For a successful migration of a workload to Cloud, it’s not only the workload that needs to be ready, but also the tooling required to migrate the workload, the cloud environment that’s going to run the workload once migrated and the crew that is going to look after the workload in Cloud.
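
The four readiness conditions above can be expressed as a simple go/no-go gate. The sketch below is only an illustration; the `Readiness` class and its field names are assumptions made up for this example, not part of any migration tool.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    workload_ready: bool     # workload has been treated and designed for cloud
    tooling_ready: bool      # migration tooling is in place
    environment_ready: bool  # target cloud environment can run the workload
    crew_ready: bool         # the team that will operate it is prepared

    def blockers(self) -> list:
        """Names of the checks that are still failing."""
        return [name for name, ok in vars(self).items() if not ok]

    def cleared_for_take_off(self) -> bool:
        """A workload takes off only when nothing is blocking it."""
        return not self.blockers()
```

For example, `Readiness(True, True, False, True).blockers()` names the environment as the only thing holding the workload at the gate.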

Migration Focus

One thing that stands out in the airline industry is the focus on getting the passenger and their baggage from A to B safely with minimal or no delay.

So when it comes to migrating a workload to Cloud, everyone must be focused on migrating the workload in a timely and sustainable manner.

Lessons from Airline to Cloud on Take Off

The workloads that need to be migrated must be carefully analysed and must go through a cloud migration treatment, including designing the workload for cloud, before they can be considered for migration.

Here the usual challenges are with COTS and legacy workloads as they were never designed 4 cloud. My blog on enterprise-cloud-transformation-a-recipe-for-successful-strategies might help for further reading on this.

Readiness of the migration tooling and cloud environment is normally a chicken-and-egg problem for the first few workloads, as it all depends on how funding works within each organisation. It’s best to get the base level of migration tooling and a foundational cloud environment in place before considering the first workload. A very strong Cloud strategy, Technology Leadership and Technology funding are required, because until a workload is being considered for migration, no business unit will be ready to fund it.

It’s really critical for the first few migrations to get alignment between technology and business. The worst outcome would be technology spending all of its funding on building migration tooling and the cloud environment while the business spends all of its funding on on-premise, to either remediate risks or add more services!

Now is a good time to talk about the outcome. Yes, it’s really tricky when it comes to Cloud migration, but this is the one thing that needs to be aligned from the CEO to any layman: the outcome must be clear, such as migration of a workload to cloud, or out of the data center, by a certain date.

Autopilot

Assuming the workload is allowed to enter Cloud migration (i.e. take off), the actual migration itself is an intense and time-consuming operation, the same as an aircraft’s navigation after take off.

So this is where there must be a focus on automated processes to migrate the workload from on-premise to cloud so it can land in the new cloud environment.

The automation must also allow for human intervention where an intelligent decision is required, the same as how the autopilot system alerts the pilot and allows manual override.

The automation must also have automated regression testing to make sure it’s always functional without any defects.

Here again, the focus must be on the workload in question, the crew who is going to manage it once it’s in cloud and the target cloud environment, rather than on a preferred technology or language. Of course there is always a preference for one technology over another, but it must lead to the same outcome as the rest of the organisation.

Lessons from Airline to Cloud on Autopilot

Develop the automation with sufficient automated testing and alerting to get human intervention where the automation cannot make an intelligent decision.
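
A minimal sketch of this lesson, assuming a migration is modelled as an ordered list of steps: each step runs automatically, and a step that cannot decide safely raises an exception that hands control to a human, much like an autopilot alert. The names `NeedsHumanDecision`, `run_migration` and `ask_operator` are hypothetical and exist only for this example.

```python
from typing import Callable, List, Tuple


class NeedsHumanDecision(Exception):
    """Raised by a step when automation cannot decide safely on its own."""


def run_migration(
    steps: List[Tuple[str, Callable[[], None]]],
    ask_operator: Callable[[str, str], bool],
) -> List[str]:
    """Run steps in order; pause and ask a human when a step cannot decide.

    Returns the names of the steps that were completed or approved.
    Stops at the first step the operator declines to approve.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except NeedsHumanDecision as reason:
            # Alert the operator, like an autopilot handing over to the pilot.
            if not ask_operator(name, str(reason)):
                break
        completed.append(name)
    return completed
```

Because the operator is passed in as a plain function, the same flow can be exercised in automated tests with a stand-in operator that always approves or always declines.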

Let the Cloud provider and their technology work hard for the automation, as long as the crew who is going to manage it is happy to adopt it and can run with it efficiently.

Don’t let people build complex, heavy and unmaintainable automation on top, with a technology that doesn’t suit the cloud provider or that the crew is not familiar with.

Landing

The moment of truth! Yes, regardless of whether it’s a passenger or a workload, this is the time for celebration if all goes well.

But it’s easier said than done. For a successful cloud migration to go live in cloud, a lot of coordination is required with various stakeholders and the business, just like how the pilot coordinates with the control tower.

At certain times, the control tower won’t give permission for the aircraft to land and the aircraft ends up circling in the air until clearance is issued by the control tower or the fuel is about to run out, in which case it would be diverted to the nearest alternative airport.

The same analogy applies to a workload going live in Cloud: if the Cloud environment is not ready, then the workload has to wait until the Cloud environment is ready or the budget for that particular workload runs out.

If the Cloud environment is not ready before the budget runs out, the workload ends up in an unfortunate situation until further budget is available, or its migration may never even be attempted as the business would have moved away from the particular workload/application.

Lessons from Airline to Cloud on Landing

In the traditional world, go-live can have rollback as a strategy, but as you can see there is no rollback when it comes to aircraft, and this equally applies to Cloud migration.

Not that rollback is impossible, but if the workload is not migrated on the scheduled day/time, it’s probably never going to be migrated in the foreseeable future, so better be prepared for close to 100% success with a roll-forward strategy, not a rollback strategy.


Public Cloud Adoption

The first step in entering a public cloud will be creating the initial construct, which is mostly referred to as a subscription, created in the name of your organisation regardless of the size of your organisation or the cloud provider.

A subscription binds your organisation’s business email, company name, pricing tier, support level and payment method, normally with a monthly recurring billing arrangement until you decide to cancel it.

Startup

This model normally provides free credit to be used with the first subscription and does not limit the usage, hence this model is normally well adopted by R&D institutions and startups.

For startups, the critical factor would be the budget more than anything, so the first guardrail for any startup is to set up a budget before starting to use any cloud subscription. You need to be mindful that the budget is normally only updated once a day, so for spending within a day you will have to wait until the next day to understand how much of the budget has been burnt.
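
One way to allow for that once-a-day lag is to pad the last reported spend with an estimate of one more day of spending before comparing it with the budget. The sketch below is an assumption-laden illustration: `budget_status`, its parameters and the 80% warning ratio are made up for this example and are not any cloud provider’s budgeting API.

```python
def budget_status(
    monthly_budget: float,
    reported_spend: float,
    est_daily_spend: float,
    alert_ratio: float = 0.8,
) -> str:
    """Classify spend as "ok", "warning" or "over".

    The reported figure may be up to a day old, so one extra day of
    estimated spend is added before comparing against the budget.
    """
    worst_case = reported_spend + est_daily_spend
    if worst_case >= monthly_budget:
        return "over"
    if worst_case >= alert_ratio * monthly_budget:
        return "warning"
    return "ok"
```

The design choice here is to err on the side of warning early, since by the time the reported figure catches up the money has already been spent.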

All the public cloud providers have this subscription construct or account available with free credits to encourage cloud adoption among developers and startups. On top of this, some cloud providers also provide additional credits for eligible startups, case by case, based on the startup scenario.

Small business

For small businesses, the scenario has added complexity compared to the startup scenario, as small businesses would normally have applications running on-premise or on office computers unless they are completely paper-based!

On the surface, it looks almost like a startup scenario, but it’s way more complicated to manage, which unfortunately most people realise only after migrating one or two workloads into the cloud.

Managing workloads in the cloud is completely different from managing workloads on-premise in terms of skillsets, cost, provisioning and risk. Small businesses will mostly have one person, or at best two, in each of IT, finance and security.

So ideally all those people need to be upskilled and prepared to take on new workloads in the cloud, otherwise after migrating one workload they will all end up spending most of their time managing one or two workloads in the cloud.

It’s worth considering your business outcome and doing a SWOT analysis of doing the same thing on-premise vs cloud to understand where you are more likely to get closer to your business outcome.

A business focus MUST be on how to reduce the cost while increasing the value and aligning with strategic outcomes

Cloud is promoted more for speed and agility, but it’s only effective if you can sustain all support aspects in the cloud together with everything else you have on-premise. So be very clear about your business outcome and see whether cloud migration is going to get you the business outcome that you desire; do NOT expect your desired business outcome just by moving into cloud.

Enterprise

You can imagine an enterprise as a collection of multiple small businesses based on a product or value stream, yet with IT, finance and security centrally managed to drive standardisation and cost-efficiency. Although that’s an understatement, you can now clearly see the order of magnitude of complexity in migrating even a single enterprise workload into cloud compared to a small business.

But regardless of whether it’s a small business or a large enterprise, everything that we do, including cloud migration, MUST be aligned towards the desired business outcome. This is easier said than achieved, as it includes so many factors and requires a very good macro-level understanding of the big picture across technology including cloud, finance, operations and market proposition.

The CEO would have nominated a CTO, CFO, COO and CMO to drive the business outcome and steer the company in the strategic direction. Once that gets translated into each of these CXO roles, they will start looking at their own area, and it’s very common that each one of them then starts driving outcomes that they think will help the overall business and strategic outcome of the company, not necessarily all aligned to each other.

Our workload in question here is Application A

At enterprise scale, a clear strategy is required for how you want to set up your Public Cloud subscription. Likewise, any Public Cloud offering is expected to provide a construct that can manage all these different viewpoints and still make sure the overall outcome is managed so that it is always aligned.

The enterprise demands centralised management and a dashboard view of cost, technology and application footprint, operations and governance and, last but not least, the entire product portfolio and significant features visible across your Cloud deployment.

This was so hard to achieve that even major Public Cloud providers like Amazon and Microsoft took several years to get to where they are as of the time of writing this blog. Google is slowly getting there in supporting enterprise-level organisation structures and the rest are still far behind.

This is just to highlight the fact that enterprise adoption of Public Cloud is not only evolving within enterprises; the cloud providers are also rapidly introducing new constructs and features to fast-track it.

The Public Cloud construct for enterprises and enterprise cloud strategy are separate topics of their own and will be covered by other blog posts.

Take Incharge of YOUR Public Cloud

Reading this title may make you think the blogger is coming from a traditional on-premise background, or that the blogger is yet to understand PaaS and SaaS offerings and is still talking only about IaaS offerings in Public Cloud. For those who are not convinced by the title, let me take you on a short journey into something completely different.

Say that you are working in Cloud and earn a lot, so you want to build a house in an urban area where you have bought a small piece of land that is also an easy commute to work. What would you do to secure your land? Probably put up a fence with one or two gates, so people can only enter via the gates; of course, we are not talking about people who jump over the fence. Then you would know that those who enter your land via a gate can be treated as more trusted than the people seen outside your land, so you don’t have to put a massive door on your house as well!

Sorted: Your house protected by the fence then the door.

Now things have changed in your life: you decide you have had enough of the computer screen and the Cloud world, and you leave it all to do some farming! Having said that, I am not reading anyone’s mind here! Obviously, your urban land would not scale for farming, so you go and find a massive piece of land in a very remote area. Once you have bought the land, you find out it’s a hostile environment with all sorts of wild animals and poisonous creatures. The only reason you went for a remote area was scale, but you have unintentionally inherited additional risk by going into a remote wilderness area. Just because you have this massive land, would you go and build the house and farm straight away, or would you make sure your area is secure enough before you start any building work?

Land in a remote wilderness: How do you go about protecting it?

Hope that created some thought process around adopting public Cloud when you want to migrate from your managed data centres. Public Clouds were introduced to solve two specific problems.

Agility
Scale

To get both of these benefits without introducing extra risk, you will need to consciously build sufficient guardrails, like you would in the farmhouse scenario, before you can build and put your data there. This is exactly what I referred to in the title as taking control of your public cloud.

To prepare yourself for this journey, you will need to understand your risk appetite and the controls you need in the Cloud, so that moving into the Cloud does not take you beyond your risk appetite.

If you already have an established on-premise environment, then you will also need a clear strategy for your hybrid model. It’s now evident that it takes months, and mostly years, to migrate completely from on-premise, and in the meantime you will still have people accessing various services from your offices, which also needs to be considered.

The specific areas around taking control of your public cloud and hybrid strategies are topics of their own that need examples to give meaningful insights, so there will be other blogs here that take you to the next level, if you are convinced to move on of course 🙂

Enterprise Cloud Transformation – A recipe for successful strategies

Recipe

Any great food requires a great recipe and great execution, through multiple transitions. The recipe might need adjustment if the available local ingredients are different, and in fact local ingredients may be preferred by locals, but ultimately everyone is expecting the same outcome, which is great food!

Similarly, a successful Cloud transformation requires a great recipe with all the required ingredients, including local ingredients, each in the right portion. As you know, just like with food, any mistake we make in getting the recipe right leads to complete failure, and that’s no different in Cloud transformation.

Background

Each organisation is different in terms of its business model, technology strategy, customer expectations, industry obligations and corporate compliance.

Most of these organisations are facing challenges around digital transformation and IT modernisation, which become a by-product of Cloud Transformation.

This paper explores whether a recipe could help here and, if so, how to come up with an enterprise cloud strategy that would fit each organisation in the abstract.

What’s in Recipe

Your business domain(s) and business model
Your technology domain(s) and technology operations
Your people and process
Your compliance requirements

These four are not an exhaustive list of everything in a recipe, but without these, the recipe would not be considered complete.

Recipe Item One — Your Business Domain(s) and business model

There are three aspects of the business domain that need to be understood.

1) The current state of the business domain and how it’s related to customers, partners, and suppliers.
Furthermore, understand any contractual obligations associated with any of the above parties, as well as industry regulations, regional jurisdictions and global laws such as data privacy acts.

2) What does the business strategy look like in 6-month, 1-year, 2-year and 5-year timeframes?
This is probably the hardest, and one must engage everyone from C-level managers all the way down to business SMEs to get a consolidated view of the business strategy.

3) Understand current business problems, inefficiencies, and data quality issues.
The business operations people would be the right people to provide insight into this; approaching the right person and asking the right ‘why’ questions while showing appreciation would help to get a better outcome.

Recipe two — Your Technology Domain and Technology Operations

1) The current state of the technology domain and what infrastructures and applications are forming the critical part of the workload.
This is a static view of the current state to understand the critical part of the IT landscape as a minimum requirement.

2) How the technology strategy looks in terms of infrastructure, application, delivery approach and cloud.
This needs to have details on each technology choice for long-term use; where that doesn’t exist, it needs to be established. Delivery approaches such as agile and lean also come into play in the cloud strategy, so they are not something to be ignored.

3) Understand the current IT operational issues and deprecated platforms.
The IT operations/DevOps/SysOps/application support people would be the right people to provide insight into this; again, the right person and the right question, with appreciation of their experience, would provide a better outcome.

Recipe Item Three — Your people and Processes

1) Gather information about IT skill sets across the organisation subject to the organisation’s long-term strategy to retain the capability in-house.
Either we need to have people already skilled in the chosen cloud or people who are passionate about gaining experience; otherwise, this needs to be addressed first, before any further steps.

2) If the organisation is looking to outsource parts of IT or entire IT, then we still need to understand what exactly needs to be outsourced.
Understanding and capturing clear roles and responsibilities now and later is critical for any transformation, not just cloud.

3) Understand any macro-level drivers such as re-structuring in progress/planned, acquisitions and disposals.
All of these factors affect people; it’s critical to understand which skills will remain and be useful for the cloud.

Recipe Item Four — Your compliance requirements

1) Understand any existing certifications and any attempts to gain new certifications.
Knowing these helps to understand the future compliance certification requirements for the cloud. It also highlights any special security requirements beyond standard security.

2) Any data-related compliance requirements.
This includes any prescribed location of data (at rest, in use and in transit) and local/regional/international data privacy regulations such as GDPR.

3) Financial compliance such as operational budget and transparency.
Understand whether there are budget caps for overall operations, by subdomain, by business capability, and by business service.

For further reading on Cloud Strategy, have a look at Cloud transition and strategical misalignment.

Conscious Choice — Understand the worst case of not knowing all the above!

This is very important. If a business does not know or understand the consequences of a poor cloud strategy, it’s the IT consultant’s responsibility to clearly articulate the risks of not considering all of the aspects above, so the business can make a conscious decision.

For the business to understand, IT should use the terms of business subdomains, business capabilities and business services. Talking to the business about a particular application may have a worse impact.

An application may be used across multiple business units, and a single business unit is never going to understand the impact on other business units when we migrate the application to the cloud. Using terminology that the business understands is critical.

In the cloud, we talk about workloads, which could be fine-grained business services rather than all the applications from the entire on-premise data center.

If there is a reason that an application must be migrated in one go, then we need to understand the impact on all affected business units and business services.
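As a minimal sketch of the point above, the mapping from an application to the business units and business services that use it can be kept as simple data and queried before a migration. The application names, business units and the `APP_USAGE` structure below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical usage map: application -> (business unit, business service) pairs.
APP_USAGE = {
    "billing-app": [("Finance", "Invoicing"), ("Sales", "Quote to Cash")],
    "crm-app": [("Sales", "Lead Management")],
}


def impacted_by_migration(app, usage=APP_USAGE):
    """Group the business services affected by migrating `app`, by business unit."""
    impact = defaultdict(list)
    for unit, service in usage.get(app, []):
        impact[unit].append(service)
    return dict(impact)


# Which business units need to be consulted before moving the billing application?
print(impacted_by_migration("billing-app"))
```

Reporting the result per business unit and business service, rather than per application, keeps the conversation in terms the business already uses.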

Ingredients

Now that you have a clear understanding of your recipe for the Cloud Strategy, let’s look at the ingredients to go with the recipe.

Cloud Provider(s) and location of data

1) The factors that may influence the selection of cloud providers include corporate partnerships, technology strategy and whether the required capacities are available in the cloud.

2) For each business subdomain, business capability and business service, we need to define where the data can be stored at rest.

Separation of concerns for cloud accounts and data

1) Establish how the cloud accounts should be structured, considering the business domain, subdomain, business capabilities, environments and compliance requirements.

2) Establish data stores and their locations for all business capabilities and accounts based on data access/retention policies, data privacy/classification requirements and compliance requirements.
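One possible way to reason about account structure is to enumerate accounts from the dimensions listed above. The sketch below assumes one account per business subdomain and environment, with a marker for subdomains that carry special compliance requirements. That convention, the `plan_accounts` function and the naming scheme are assumptions for illustration, not a recommendation from any cloud provider.

```python
from itertools import product


def plan_accounts(subdomains, environments, regulated=()):
    """Return hypothetical account names, one per subdomain and environment.

    Subdomains listed in `regulated` get a "-regulated" suffix so that
    compliance-scoped workloads are visibly separated.
    """
    accounts = []
    for subdomain, env in product(subdomains, environments):
        suffix = "-regulated" if subdomain in regulated else ""
        accounts.append(f"{subdomain}-{env}{suffix}")
    return accounts


for name in plan_accounts(["payments", "marketing"], ["dev", "prod"], regulated={"payments"}):
    print(name)
```

An organisation may equally choose to split by business capability or by compliance boundary first; the point is to make the chosen dimensions explicit.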

Understand or Establish architecture principles and standards

1) The principles must cover data, technology and any other relevant areas.

2) The standards must include data classification, data application, and data ownership.

3) Microservices strategy based on the business domain.

Establish migration strategies for each business service or application

1) Using “The 6Rs” approach, we need to establish a migration strategy for each business service and application.
The 6Rs are Rehosting, Replatforming, Refactoring, Repurchasing, Retiring and Retaining.

2) Identify and catalog the datasets that need to be migrated with each business service or application.
In a large enterprise, migrating a business service or application may have an impact on other business services if they are not already microservices, and on other applications if they are shared across multiple business services.

3) Identify and catalog data classification and location for each business service and application.
This is critical, as they may include data that must stay on-premise, data that must not be stored in certain regions, data that must be encrypted at rest (with a provided or bring-your-own key) and data that needs to be shared for wider use (limited or open).

4) If the business services or applications are going to be separated from the datasets they require, then establish the data access strategies.
Here, special consideration should be given to security and compliance requirements for data in transit. It is best to use APIs (provided by the cloud provider) rather than direct access to data stores.

5) Identify and establish networking and inter-networking requirements with appropriate network access control and required bandwidth.
Here we don’t have to go down to individual NACLs or subnets, but we should identify the number of security zones required and how they relate to each other within a network and across networks.
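The catalog described in the steps above can be sketched as a small data model: each workload records one of the 6Rs and the datasets that travel with it, and each dataset records its classification and where it may be stored. The class names, classification labels and location names below are hypothetical; a real catalog would follow the organisation's own standards.

```python
from dataclasses import dataclass, field
from enum import Enum


class MigrationStrategy(Enum):
    REHOST = "Rehosting"
    REPLATFORM = "Replatforming"
    REFACTOR = "Refactoring"
    REPURCHASE = "Repurchasing"
    RETIRE = "Retiring"
    RETAIN = "Retaining"


@dataclass
class Dataset:
    name: str
    classification: str        # hypothetical label, e.g. "confidential"
    allowed_locations: set     # hypothetical names, e.g. {"on-premise", "eu-west"}
    encrypted_at_rest: bool = True


@dataclass
class Workload:
    name: str
    strategy: MigrationStrategy
    datasets: list = field(default_factory=list)


def placement_issues(workload, target_location):
    """Return the names of datasets that may not be stored at target_location."""
    if workload.strategy in (MigrationStrategy.RETIRE, MigrationStrategy.RETAIN):
        return []  # nothing is moved for retired or retained workloads
    return [d.name for d in workload.datasets
            if target_location not in d.allowed_locations]


invoicing = Workload(
    name="invoicing",
    strategy=MigrationStrategy.REPLATFORM,
    datasets=[
        Dataset("customer-ledger", "confidential", {"on-premise"}),
        Dataset("price-list", "public", {"on-premise", "eu-west"}),
    ],
)
print(placement_issues(invoicing, "eu-west"))
```

A check like this makes it visible early when a dataset's location rule conflicts with the chosen migration strategy, which is when a data access strategy (step 4) needs to be established.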

Establish policies

1) Data retention policy.
At a minimum, this will establish how long the data should be accessible via the application and how long the data should be downloadable/accessible from the archived storage.

2) Data access policy.
This would involve expanding data classification with aspects such as audit requirements, whether the data is allowed to be modified or deleted, and versioning if modification is allowed.
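A data retention policy of the kind described above can be expressed as two durations: how long data stays accessible via the application, and how long it then remains retrievable from archived storage. The sketch below assumes both are counted in days from the creation date; the `RetentionPolicy` class, the state names and the example durations are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    online_days: int    # accessible via the application
    archive_days: int   # afterwards, retrievable from archived storage


def data_state(created, today, policy):
    """Classify a record as "online", "archived" or "expired" by its age in days."""
    age = (today - created).days
    if age <= policy.online_days:
        return "online"
    if age <= policy.online_days + policy.archive_days:
        return "archived"
    return "expired"


# Hypothetical policy: one year online, then six further years in the archive.
policy = RetentionPolicy(online_days=365, archive_days=6 * 365)
print(data_state(date(2024, 1, 1), date(2024, 6, 1), policy))
```

The data access policy could extend the same record with flags for audit requirements, modification, deletion and versioning.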

Summary — No single cloud strategy

All of the above covers a broad perspective to consider for a long-term enterprise cloud strategy that may benefit both business and technology strategies.

The level of consideration given to each of the above perspectives may differ from organisation to organisation and with the level of maturity they are seeking.

These guidelines would help to make a conscious choice to come up with an enterprise cloud strategy.

This could be a joint effort between an enterprise architect, solutions architect(s) and cloud architect(s) “roles” with consultation from senior managers and others as required from business and IT.

The size of the organisation might determine how many of these “roles” are combined into one person, as not all organisations may have an individual for each of these “roles”.

The enterprise and solutions architect roles may need lots of people skills, while the cloud architect role provides broader and deeper insight into cloud technology and the latest trends.

Last but not least, IT’s primary function is to support the business. Reiterating this in every solution helps to come up with the right-sized solution for each business need.

Disclaimer: This article was produced in my own capacity; no association should be assumed with the organisations that I am helping at present or have helped in the past.