Fix the NLS website, of course! The most common cause of this trouble is the SSL certificate expiring on the NLS web server. NLS must be an HTTPS site, and therefore must have a valid SSL certificate. Often this certificate is issued from an internal CA server to save costs, so you won't get a notification when it is about to expire unless you have put the date in your calendar by hand. Keep track of this date, and replace the certificate before it expires. Of course, any other ordinary IIS or web server problem that affects the NLS website can cause this same situation, so it is recommended that the NLS be highly available when possible.
To counteract this point just a little, I have had a couple of occasions now where there was trouble validating NLS when the site was sitting behind an internal load balancer. The website seemed to be functioning properly and would pull right up in a browser, even from the DirectAccess client computers, but the underlying Network Connectivity Status Indicator (NCSI) detection mechanism (the NLS's under-the-hood parts) wasn't validating it, so the clients continued to think of themselves as outside the network. In both cases, as soon as we pulled the NLS website out from behind the load balancer, everything started working normally.
One other cause of this behavior that I have experienced is using the default IIS splash screen as the web page for NLS. It is common when turning on an NLS server to simply set up a new website with the default settings, including letting the website use the default splash screen as its default page. For unknown reasons, this works fine in some environments, and in others NLS detection fails completely. Simply replacing the default page with a brand-new default.htm that contains just some simple text will resolve the issue and get NLS up and running.
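Since the expiring certificate is the most common culprit, it can help to check the date automatically rather than rely on a calendar entry. The following is a minimal Python sketch, not a finished monitoring tool. The function names, the nls.contoso.local host name, and the CA file path are all hypothetical; substitute your own. If the certificate was issued by an internal CA, the machine running the check must be told to trust that CA, which is what the cafile parameter is for.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's
    notAfter value, given in the form 'Jan  1 00:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443,
                    cafile: Optional[str] = None) -> str:
    """Connect to host:port over TLS and return the certificate's
    notAfter string. The handshake fails with an ssl.SSLError if the
    certificate is untrusted or has already expired."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example use (hypothetical host name and CA file):
#   not_after = fetch_not_after("nls.contoso.local", cafile="internal-ca.pem")
#   remaining = days_until_expiry(not_after, datetime.now(timezone.utc))
#   if remaining < 30:
#       print(f"NLS certificate expires in {remaining:.0f} days")
```

Note that an already-expired certificate makes the handshake itself fail, so a scheduled job built on this should treat a connection error as an alarm too, not just a low day count.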
I enabled NLB and DA broke!
Say you have a DirectAccess server up and running. Everything is going great, you have all kinds of users connected, and you are just loving it! So now that the technology has proven itself, it's time to turn up another DA server and create a cluster so you can make this thing redundant. You follow all of the guides, bring the new server up to best practice standards, add the roles, and walk through the wizards on the primary server to create the Load Balanced Cluster. The wizard is going to ask you to specify some new IP addresses, so that it can turn your existing IP addresses into the new virtual IPs (VIPs). That way, when you finish this wizard and the settings on the server side are changed, you continue to use the same IP addresses on the public side, so your clients can keep connecting just as they do right now and won't have to come into the office for a Group Policy refresh before working. If the wizard used a new IP address for the cluster instead of your existing IP, the client-side GPOs would have to change to reflect the new IP information, and the connection would break for everyone. This is why the wizard uses the existing IPs as the VIPs, and asks you to specify new dedicated IPs (DIPs) for the server to use itself.
Back to the task at hand. You walk through the wizard, specify your new IP addresses for the server to use, click on the Commit button, and everything says that it completed successfully. Great! Except the phone rings. Nobody can connect from outside. You check out the server and everything is green checks; it's happy. Everything is well as far as you can tell from the server. However, if you go to an external computer and try to telnet daserver.contoso.com 443 (the port that the IP-HTTPS listener should be answering on), you get a timeout. You head onto the DirectAccess server itself, and the same command results in a connection! So the server is indeed listening, but the packets are not flowing from outside to inside. But they were just a minute ago. What gives?
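If the telnet client isn't installed on the machine you are testing from, the same check can be done with a few lines of Python. This is a minimal sketch; the function name tcp_port_open is my own, and daserver.contoso.com is simply the example name used above. It only confirms that a TCP handshake completes on the port and says nothing about whether IP-HTTPS itself is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is established
    within `timeout` seconds, False on timeout, refusal, or DNS failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return False


# Example use (run once from an external machine, once on the DA server):
#   print(tcp_port_open("daserver.contoso.com", 443))
```

Running it from both sides reproduces the symptom described here: True on the DirectAccess server itself, False from the outside.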
This issue, believe it or not, is most likely being caused by your switches. It could also be a firewall or router, or even your datacenter's ISP equipment that sits between your external clients and the external NIC of the DirectAccess server, but whichever in-line device is at fault, something in that stream of devices is not allowing the packets to pass. The reason is the ARP cache. Many network devices are "smart" enough (when you are in this situation you aren't going to think of them as very smart, though) to remember the MAC address of the NIC that has been accepting packets for the external IP address. The device does this to speed up communications. If it can remember where packets destined for a particular IP address have to go, it can send them straight there instead of broadcasting an ARP request on the network to find the right destination. When you configure NLB on a DirectAccess server, the MAC address changes. Now that there are virtual IP addresses involved, which are going to be shared by two or more physical hosts, the wizards swap out the physical MAC address that has been used up until this point with a virtual MAC address. When this happens, your switch or other networking equipment may continue to send packets to the old physical MAC address, and those packets will not be received
by the new configuration. So the server thinks it's listening, but the equipment in between is not delivering the packets to it.
The resolution
Clear the ARP cache. You may have to dig through some documentation or reach out to your network guy to make this happen, but as soon as you flush the ARP cache on those switches or devices, they will learn and cache the new MAC address, and packets will start flowing again. This issue is actually quite common, because most shops that are big enough to have more than one DirectAccess server are also big enough to have "smart" networking equipment. If you don't have any smart networking equipment but the symptoms line up, contact your ISP. And don't let them discount your theory! I have had to argue with ISPs more than once to convince them that this was their problem, and that simply resetting the ARP cache in their router would resolve the issue. In one case, I am certain they cleared the cache just to get rid of me because I was being so utterly persistent, and then all I could hear were crickets on their end of the line, because the traffic had magically started flowing as soon as they cleared it. It's actually quite surprising how many networking personnel have no idea that this cache exists.
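Before picking up the phone, it can be useful to have evidence of a stale entry in hand. The Python sketch below makes two assumptions that you should verify in your own environment. First, it assumes NLB is running in its default unicast mode, where the cluster MAC address is 02-BF followed by the four octets of the cluster IP (NLB Manager shows the real value). Second, it assumes you can capture Windows-style arp -a output from a machine on the same subnet as the external NIC. The function names and the sample addresses are illustrative only.

```python
import re
from typing import Optional


def nlb_unicast_mac(vip: str) -> str:
    """Expected NLB cluster MAC in unicast mode: 02-BF plus the VIP octets."""
    octets = [int(part) for part in vip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {vip!r}")
    return "-".join(f"{b:02x}" for b in [0x02, 0xBF, *octets])


def cached_mac(arp_output: str, ip: str) -> Optional[str]:
    """Find the MAC cached for `ip` in Windows-style `arp -a` output,
    where each entry line starts with the IP followed by the MAC."""
    pattern = re.compile(
        r"^\s*" + re.escape(ip) + r"\s+([0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5})\b",
        re.MULTILINE,
    )
    match = pattern.search(arp_output)
    if match is None:
        return None
    return match.group(1).lower().replace(":", "-")


# Illustrative output only; paste in what `arp -a` really prints.
sample = """
  Internet Address      Physical Address      Type
  203.0.113.1           00-1b-21-aa-bb-cc     dynamic
  203.0.113.10          00-15-5d-01-02-03     dynamic
"""

vip = "203.0.113.10"
if cached_mac(sample, vip) != nlb_unicast_mac(vip):
    print(f"{vip} is cached as {cached_mac(sample, vip)}, "
          f"expected {nlb_unicast_mac(vip)}: stale ARP entry?")
```

A mismatch like this one is exactly the kind of detail that makes the conversation with the network team or the ISP a lot shorter.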