3. Unhindered access to the endpoints required.
The next principle is to get your traffic to Microsoft as unhindered as possible. This means minimising the work done on the traffic, or ensuring that any work which is done doesn’t impact it. When moving to the cloud it’s worth rethinking how you handle traffic leaving your managed network. Understandably, the security model will often be to break and inspect all outbound traffic to ensure that the endpoint is something we want users to be able to access, and that nothing malicious is being brought into the corporate network. Whilst necessary, this inspection is computationally expensive and can impact performance. When browsing unknown websites this impact isn’t vitally important, but with SaaS services it is.
The bulk of your cloud traffic will be user initiated, from the client to the cloud, and that is what this section refers to: traffic to Microsoft endpoints, which will account for a large volume of the traffic flow. For inbound flows (Microsoft initiated) it is understood that a higher degree of inspection/control may be desired; the volume of traffic in this direction is also relatively low. There are also elements of the service (those for which we can’t/don’t provide IP addresses on the URL & IP page) which don’t reside within Microsoft’s infrastructure (mainly CDN endpoints for non-customer data such as scripts/images or help/additional content). It’s understood that it may be desirable to route these particular endpoints via a standard proxy, for example, as they are not directly managed by Microsoft and thus more controlled access is required.
When there is a move to SaaS services, numerous connections which were once kept entirely within the corporate network now traverse the egress equipment to reach endpoints in the cloud. As noted, this is often a large volume of traffic, which puts a heavy load on the standard egress equipment/method. That method is often used simply because it is the default route to the internet, and it therefore performs break and inspect on all traffic.
Here is the question we need to ask though. Does the same policy which applies to unmanaged and unknown endpoints on the internet, need to apply to known, managed and trusted endpoints used for business-critical services? The answer is hopefully no. Services like Office 365 should be treated somewhere between how you treat internet traffic, and how you treat traffic to your on premises datacentres (which usually has no security applied). Where exactly in-between depends on how much you trust Microsoft but it’s certainly worth assessing whether what you’re doing at the edge is already done, or can be done in the backend, and thus is unnecessary inline where it can cause performance issues. Data Loss Prevention can be done in Office 365 itself and AV/anti malware scanning is also done by Office 365 so isn’t required inline. We’re not going to unknown and untrusted/unmanaged endpoints with Office 365 like we are with the internet traffic, we’re going to known, business critical, managed and security controlled endpoints. Microsoft Trust Center holds all the relevant information on how we secure your data in the cloud.
Therefore, Microsoft strongly recommend that SSL interception is not performed on Office 365 traffic going to Microsoft owned endpoints. If interception is done for any other endpoints, it should be carefully managed so as not to cause a bottleneck. If SSL interception absolutely must be done, the devices doing it should be scaled up considerably to reduce the bottleneck this will inevitably cause; this can often be very costly given the volumes of data SaaS services entail. Our support statement on traffic inspection devices is here.
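As a rough illustration of the split described above (no interception for Microsoft owned endpoints, standard handling for everything else), the decision can be sketched as a simple hostname check. This is a minimal sketch, not a vendor configuration: the function name `egress_policy` is hypothetical and the suffix list is a placeholder chosen for the example, not the published Office 365 endpoint list, which you would need to source from the URL & IP page.

```python
# Placeholder suffixes for illustration only; NOT the official endpoint list.
TRUSTED_SUFFIXES = ("office365.com", "office.com", "sharepoint.com")


def egress_policy(hostname: str) -> str:
    """Return 'bypass' for hosts under a trusted suffix, otherwise 'inspect'."""
    # Normalise case and drop any trailing dot from a fully qualified name.
    host = hostname.lower().rstrip(".")
    for suffix in TRUSTED_SUFFIXES:
        # Match the suffix itself or a true subdomain of it, so that
        # look-alike names such as "evil-office365.com" do not match.
        if host == suffix or host.endswith("." + suffix):
            return "bypass"
    return "inspect"


if __name__ == "__main__":
    for name in ("outlook.office365.com", "example.com", "evil-office365.com"):
        print(name, egress_policy(name))
```

Matching on a leading dot rather than a bare suffix is the important design choice here: it keeps look-alike domains on the inspected path.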
This brings me to my next point: what is Microsoft’s recommended egress model? How do we recommend you get Office 365 traffic out of the corporate network and to Microsoft? There are essentially three main methods:
- Proxied Access
- Direct Routing
- ExpressRoute (which is essentially direct routing via another path)
There are distinct pros and cons to each method:
Proxied Access
This is a common method used to connect to the internet as it simplifies the connectivity process and allows a centralized device to control access and intercept traffic.
Pros:
- Easy to configure to get Office 365 connected
- Often the existing internet access method
- Small number of IP addresses for clients to direct traffic to
- Uses known ports for easy firewall traversal
- No need to route external IP addresses on the internal network
- Easy monitoring/auditing
- Provides a security barrier between clients and the internet
Some of these pros mean a proxy is the only way a customer can access the internet without major network redesign work. All Office 365 services will work through a proxy, even Skype for Business; however, there are a large number of drawbacks to this method.
Cons:
- Proxies generally do not handle UDP traffic, so Skype traffic is forced over TCP to traverse the proxy.
- Skype’s coping mechanisms for poor networks are drastically reduced when TCP is used.
- The proxy functions can delay frames on their way through, adding jitter and latency.
- Older proxies often struggle to deal with the long lived, high throughput connections SaaS services entail.
- Proxies commonly alter TCP level settings, which can cause performance issues.
- SSL issues can also occur as the proxy is a ‘man in the middle’.
- Proxies often don’t scale and were not installed/designed with SaaS services in mind.
- Capacity upgrades to cope with the additional workload SaaS services will add in terms of ports/memory/processing are likely to be very expensive.
- The end result is often poor-quality calls and performance.
As you can see, there are some considerable downsides to standard proxies when it comes to SaaS services. Skype traffic is highly likely to run into issues via standard on premises proxies, because it is forced onto TCP, which is non-optimal for real-time voice/video traffic (as opposed to UDP), and because processing at the proxy layer is likely to introduce issues such as jitter. Imagine the load the proxy would be under handling thousands of real-time TCP media sessions during an all hands call, for example. Scaling up to handle this is likely to be very expensive and still runs the risk of causing performance issues due to the protocol used. The other thing to bear in mind is that proxies were likely designed for access to transient endpoints: a TCP connection is made to a website, the data is obtained, the session is closed and the resources (memory, processing, ports) are returned to the pool. SaaS services tend to work very differently. Outlook, as an example, will open multiple TCP connections per user and keep them in use all day. As such, the resources aren’t returned to the pool as they would be with transient access, and again the devices need upgrading to deal with this extra load. We recommend around 2000-4000 clients per public IP address for network address translation.
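The NAT guidance above lends itself to a quick sizing calculation. The sketch below is illustrative only: the function names and the usable port figure are assumptions made for the example, and the only number taken from this post is the 2000-4000 clients per public IP address.

```python
import math

# Assumption for illustration: roughly 64,000 source ports usable per public IP.
USABLE_PORTS_PER_IP = 64_000


def public_ips_needed(clients: int, clients_per_ip: int = 2000) -> int:
    """Round up the number of public NAT addresses needed for a client count."""
    if clients < 0 or clients_per_ip <= 0:
        raise ValueError("clients must be >= 0 and clients_per_ip must be > 0")
    return math.ceil(clients / clients_per_ip)


def ports_per_client(clients_per_ip: int) -> int:
    """Rough per-client port budget on one public IP, under the assumption above."""
    if clients_per_ip <= 0:
        raise ValueError("clients_per_ip must be > 0")
    return USABLE_PORTS_PER_IP // clients_per_ip


if __name__ == "__main__":
    # Compare both ends of the recommended range for a 10,000 client site.
    for ratio in (2000, 4000):
        print(ratio, public_ips_needed(10_000, ratio), ports_per_client(ratio))
```

Running both ends of the range shows the trade-off: fewer clients per address means more public IPs but a larger port budget per client for long lived connections.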
Because these devices were designed for a different role and carry a high risk of causing performance issues, Microsoft recommends you don’t use these types of proxy solutions unless absolutely necessary. If there is no other option, or there is a very strong requirement to use proxies, the following advice should be followed.
• Ensure the devices are scaled up to cope with SaaS services, in terms of memory, processing and NAT capability
• Avoid overly centralized proxies which can increase latency
• Ensure they are in the local region of the client
• Evaluate Cloud Proxy nodes if the above isn’t possible
• Avoid packet inspection (i.e. SSL break & inspect)
• Ensure all settings are checked and optimized
• Avoid using Skype for Business through these devices unless they can bypass for UDP
Whilst not a recommendation for any vendor over another, Microsoft are working with various vendors such as Z-scaler and Bluecoat to help better align cloud proxy products to best practices for Office 365. Z-scaler for example have a button which automatically optimizes Office 365 traffic (e.g. disables SSL offload) for customers who use the service. Bluecoat have updated their firmware to bypass Skype traffic from SSL decrypting.
If a proxy is a requirement for your business, it’s worth checking that your current implementation is going to work well with SaaS services like Office 365. If not, it’s worth talking to your chosen vendors about how to remediate this and how well they align to the Office 365 (and cloud in general) connectivity principles discussed in this post, and ensuring those principles are followed upon implementation.
A final point on proxies: they are absolutely a supported method for our customers to reach Office 365. However, they are very likely to cause performance issues if not redesigned and uplifted from their old internet access design to their new usage with cloud services.
Direct Routing
Direct routing is similar to what you have at home: a single TCP session is used to connect to the endpoint with (in most cases) the source IP simply being translated from an internal address (e.g. 10.x.x.x) to a publicly routable one on the way out of the managed network. The egress device may also check that the destination IP and/or port is allowed. This means the endpoint receives the request from the translated (public) source IP, while the client connects to the public IP address of that endpoint.
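The destination check an egress device may perform can be sketched with Python’s standard `ipaddress` module. This is a minimal sketch under stated assumptions: the ranges below are documentation-only placeholders, not real Office 365 ranges, the port set is an assumption for the example, and `is_allowed` is a hypothetical helper rather than any firewall’s API.

```python
import ipaddress

# Placeholder ranges reserved for documentation; NOT real Office 365 ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
# Assumed for illustration; use the ports published for the services you run.
ALLOWED_PORTS = {80, 443}


def is_allowed(dest_ip: str, dest_port: int) -> bool:
    """Return True when both the destination address and port are on the allow list."""
    addr = ipaddress.ip_address(dest_ip)
    in_range = any(addr in net for net in ALLOWED_NETWORKS)
    return in_range and dest_port in ALLOWED_PORTS


if __name__ == "__main__":
    print(is_allowed("192.0.2.10", 443))   # address and port both listed
    print(is_allowed("203.0.113.5", 443))  # address outside the listed ranges
```

Note that the payload is never touched here; only the destination address and port are compared, which is why this style of egress adds so little work per connection.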
This method is generally the recommended way to connect Office 365 services if possible.
Pros:
• Allows direct UDP traffic meaning Skype can work at its best.
• Generally, no interference with payload at egress meaning optimal connectivity for all services
• Allows for local egress use in most cases meaning minimal latency
• Minimal work done on traffic means scaling (whilst still necessary for the volume of connections and network address translation) is less demanding
• Best connection method to Office 365 for most customers when ISP routing is optimal
Cons:
• Customers need to authorize Office 365 URLs/IPs and open required ports on all firewalls used (if controlled egress is desired). These need to be constantly monitored and firewalls updated with changes, which can be a challenge in large organizations, and missing updates to IP ranges can cause connectivity issues. Having controlled egress by using the IPs for Office 365 also means endpoints where we cannot provide the IPs (such as CDNs, DNS, CRL lookups) have to be routed via another path with URL based or unrestricted access.
• Routing to the appropriate egress needs to be managed internally and needs to include external IP address routing (which can be an issue for some customers)
• External DNS resolution is required (which can be an issue for some customers)
• Devices still need to scale to the increased connection count needed for Office 365 services
Due to the efficient, low impact manner of egress, allowing connections to flow direct, using the protocol of choice, this method is the recommended method to connect your Office 365 services wherever it is possible.
ExpressRoute
ExpressRoute is private peering with the Microsoft global network described above. Essentially, it’s simply a private network connection from the edge of the customer network to the edge of Microsoft’s network (the same network you’d reach over the internet) avoiding the leg which the internet takes in connecting to Microsoft. This private network can carry some elements of Microsoft bound traffic via three types of peering:
1. Azure Private – Connecting to virtual networks in Azure (e.g. to private IP addresses on virtual machines)
2. Azure Public – Connecting to Public IP addresses in Azure (connections require network address translation)
3. Microsoft Peering – Connecting to a subset of Office 365 endpoints (the same public endpoints reachable over the internet path)
The first two types of peering are the recommended connection methods to Azure endpoints. They are relatively easy to configure, requiring little change in the corporate network infrastructure. The third type of peering is the one we’re interested in with regard to Office 365 connectivity. This type of peering requires authorization from Microsoft to enable, and for good reasons; however, let’s look at the pros of this type of peering first.
Pros:
• We can provide a 99.95% SLA for availability
• Because it’s a dedicated circuit for Microsoft traffic and managed end-to-end, it can provide predictable performance and bandwidth
• It can provide better QoS capability than the internet path for Skype (All QoS markings are stripped at the edge over the internet path).
• It avoids the internet path for the bulk of Office 365 traffic, some organizations have a regulatory requirement for this.
• If configured such, it allows a customer to bypass network egress equipment doing SSL interception or other behaviour which may cause a bottleneck
• It allows Skype for Business to use UDP which is the preferred protocol for performance on real-time traffic
Cons:
• Good internet connectivity is still required for endpoints which cannot use ExpressRoute (e.g. DNS, CDN, CRL checks). As such, if this internet pipe is not available, Office 365 is inoperable to a large degree.
• A good internet connection can, in many cases, give similar or even better performance (for example if the internet peer point is closer than the ExpressRoute peer point).
• Often encourages the hub and spoke model for connectivity which runs contrary to the high-level guidance of local connectivity.
• Often a higher cost of implementation and usage than a standard internet connection. (This isn’t always the case depending on the locality and equipment upgrades required)
• Enabling this type of peering (Microsoft as opposed to Azure) is very complex and without what we see as typically 2-6 months of planning and work from a large cross skilled team, will very likely result in an outage of your Office 365 implementation
• If the direct method is used (i.e. non-proxied to the edge) then external DNS and IP routing needs to be available as with the direct internet path above.
• It is often possible to resolve performance issues quicker, easier and at a lower cost by isolating the issue and resolving it via using an unhindered, direct internet peered connectivity model or other optimized method.
Because of this list of cons, especially the complexity and high risk of outages if not correctly implemented, Microsoft have a review policy for requests to use ExpressRoute for Office 365. This is so we can discuss these pros/cons with the customer and ensure that all parties are aware of the 2-6 months of planning, the extra complexity and what ExpressRoute can/cannot deliver. The end goal is that if the customer chooses ExpressRoute, they have done so fully armed to make an informed decision that it’s the right thing for their business, and that they are aware of the guidance needed to ensure the implementation is a success and is going to deliver the desired benefits before spending the time, money and effort implementing.
We have a wealth of technical guidance which covers the implementation, routing, some training videos and some Ignite 2016 content which should give you a good overview of what is required to implement this type of peering. As you can see, this option is not for everyone, in fact a direct internet path connection is the best for the majority of use cases. However, if you think ExpressRoute might be the right thing for your organization’s Office 365 traffic, after reviewing the links here, contact your Microsoft account team for assistance in requesting a review for approval and we can then work to help you make the right decision for your business.
So, in summary, there is an array of ways to get your traffic to Microsoft’s network. Direct routing generally works best to ensure unhindered and quick access. If you must use a proxy, ensure it isn’t causing a bottleneck to your traffic and follows the principles outlined in this post. Finally, ExpressRoute for Office 365 is not simple to implement and isn’t for everyone.
Next Up: Part 4: Local DNS Resolution