Firstly, there is a configuration issue to resolve. I notice you have the HDC node configured as “Read Only” in the Memberships tab. That’s correct. This is how you should configure a Read-Only member.
However, under Connections you have the HDC member connection disabled. That’s a big problem! You are now in an unsupported state. You cannot use the connections as a way of “forcing” a Read Only replication… that is not what they’re for. This configuration is allowing OSK to talk to HDC, but not allowing HDC to talk back to OSK… even to give its replication state or other control information!
You are now in a precarious state because, depending on how long this connection has been disabled, you could get into some nasty problems once the connection is re-activated. Ned Pyle from Microsoft wrote an article on how to deal with this. You will need to follow the instructions in that article to fix it.
This is probably the cause of your issues and the strange (and misleading) error messages.
As for diagnosing DFSR, there are multiple ways to do it:
1. A replication report from within the “DFS Management” snap-in (it seems you already know how to do this)
2. Dfsradmin.exe (not great for diagnostics)
3. Dfsrdiag.exe (better for diagnostics)
4. Windows Event Log (under “Applications and Services Logs” -> “DFS Replication”)
5. The DFSR debug logs in c:\windows\debug\dfsr*.log (this is very detailed information and a great place to work out weird issues). Note that the logs are automatically gzipped and rotated once they reach a set size (20MB).
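Because the rotated debug logs are gzipped, a plain text search misses most of the history. Here is a minimal Python sketch that searches both the current and the rotated logs in one pass. It is only an illustration: the function names are my own, and it assumes the rotated files keep the dfsr*.log name with a .gz suffix added and that the logs are readable as text (undecodable bytes are replaced rather than failing).

```python
import glob
import gzip
import os


def iter_dfsr_log_lines(log_dir):
    """Yield (file name, line) for every DFSR debug log in log_dir, sorted by file name."""
    # Assumption: rotated logs are named like the live log plus a ".gz" suffix.
    paths = sorted(glob.glob(os.path.join(log_dir, "[Dd]fsr*.log*")))
    for path in paths:
        # Pick a gzip-aware opener for rotated logs, the plain one otherwise.
        opener = gzip.open if path.lower().endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield os.path.basename(path), line.rstrip("\r\n")


def find_in_dfsr_logs(log_dir, needle):
    """Return every (file name, line) whose line contains needle, ignoring case."""
    needle = needle.lower()
    return [
        (name, line)
        for name, line in iter_dfsr_log_lines(log_dir)
        if needle in line.lower()
    ]
```

For example, `find_in_dfsr_logs(r"C:\Windows\debug", "error")` would list every line mentioning "error" across all the logs. The search term is up to you; I am not assuming any particular log line format here.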
For checking the replication speed, it’s best to simply open Resource Monitor. Under the Network tab you’ll see if DFSR is sending anything.
However, note that DFSR does a lot of work that doesn’t involve sending network traffic: it has to build its hash tables, check files, etc. Typically DFSR is extremely hungry for IO while doing the first hashing of files, and will saturate the IO of any storage device it uses. Be careful of this! You should watch it carefully in the Disk tab of Resource Monitor. If “Active Time” is near 100%, your storage may be showing latency for everyone using it.
Also note that DFSR is not configured for best speed “out of the box”. You need to make adjustments to it if you want to run it at its highest performance. However, these settings consume (a little) more RAM and (a lot) more disk IO. To enhance performance, apply the highest value for each of the DFSR performance settings.
To reduce disk load and optimize consistency you should also:
1. Disable 8.3 filenames on the filesystem (fsutil 8dot3name set 1)
2. Set optimal MFT size (fsutil behavior set mftzone 4)
3. Set optimal memory cache usage (fsutil behavior set memoryusage 2)
4. Install the Dynamic Cache Service to avoid a horrible bug with caching (this bug also exists in Windows 2008 R2 and Microsoft has provided us with the fix, a service: install from \\Domain.com\Distrib\Storage\DynamicCache Service )
5. Enable the USN Journal on all replicated disks (fsutil usn createjournal m=5000 a=500 <driveletter>)
It is actually a good idea to perform steps 1 to 4 above on all servers that handle a lot of network-based file access.
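If you have several servers to go through, the fsutil steps above can be scripted. The following is a rough Python sketch, not a finished tool: the function names and the dry-run default are my own choices, the drive letters are whatever you pass in, and step 4 (the Dynamic Cache Service install) is deliberately left out because it is a manual install. The fsutil commands themselves are exactly the ones listed above and need an elevated prompt.

```python
import subprocess


def build_tuning_commands(drives):
    """Return the fsutil commands for steps 1-3 and 5 as argument lists."""
    commands = [
        ["fsutil", "8dot3name", "set", "1"],              # step 1: disable 8.3 names
        ["fsutil", "behavior", "set", "mftzone", "4"],    # step 2: MFT zone size
        ["fsutil", "behavior", "set", "memoryusage", "2"],  # step 3: memory cache usage
    ]
    # Step 5: one USN journal command per replicated drive, e.g. "D:".
    for drive in drives:
        commands.append(["fsutil", "usn", "createjournal", "m=5000", "a=500", drive])
    return commands


def apply_tuning(drives, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in build_tuning_commands(drives):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)
```

Calling `apply_tuning(["D:", "E:"])` only prints the commands so you can review them first; pass `dry_run=False` on the server itself once you are happy with the list.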
Additionally, you should tweak all Windows networking settings to high performance settings, according to the Windows 2008 R2 Performance Tuning Guide. Words cannot say how strongly I recommend this! Without these settings you will limit the performance of Windows NAS servers.
Finally, you should tweak the network card settings. Ensure the number of RSS queues matches your number of CPU cores, and set transmit and receive buffers to their maximum possible settings. Review the other settings on the network cards to ensure you’re using the best possible performance options. Also ensure you have the latest drivers and firmware. This is especially critical for Broadcom and Intel cards, which both have serious flaws with non-latest versions. Intel cards in particular will suffer random link disconnects using version 17.x of the PROSet drivers, and you must upgrade to 18.x or newer to fix this.
After you’ve applied each of these fixes and settings, review again to see if your problems are gone.
Finally, I notice that you’ve named your replication groups “OSK -> HDC” and so on. This goes against our standard: the RG name should preferably be the pathname in DFS, so we know exactly which paths are being replicated. For example, we would use the name Domain.com\Folder\Test. You should only use another naming format when there is no DFS path for the replication (e.g. “Teamlink” and “Synapse”). In these cases the owner and purpose should be clearly identified in the RG description.
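If you want to check existing RG names against this standard in bulk, something like the Python sketch below could work. Treat it as an assumption-heavy illustration: the function name is hypothetical, the pattern is only my guess at what “looks like a DFS path” (a dotted domain name followed by one or more backslash-separated folders, as in Domain.com\Folder\Test), and for other names it only checks that a description exists, not that it really names the owner and purpose.

```python
import re

# Assumption: a DFS path is a dotted domain followed by one or more
# backslash-separated folder names, e.g. Domain.com\Folder\Test.
_DFS_PATH = re.compile(r'^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(\\[^\\/:*?"<>|]+)+$')


def follows_naming_standard(rg_name, description=""):
    """Return True if the RG name is a DFS-style path, or has a description."""
    if _DFS_PATH.match(rg_name):
        return True
    # Non-path names are only acceptable with a description (owner and purpose).
    return bool(description.strip())
```

With this, `follows_naming_standard(r"Domain.com\Folder\Test")` passes, while `follows_naming_standard("OSK -> HDC")` fails because the name is not a path and there is no description.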