With some pain we recently uncovered a risk in using Observable.Timer(...) in Reactive Extensions.
The system is a WPF real-time trading system that uses RX heavily. Some of our users access the system via Citrix, and it was these users who first started experiencing the issue. Sporadically, a thread pool callback would crash the system with an unhandled exception, thrown from inside RX:
System.InvalidOperationException: Failed to start monitoring system clock changes.
at System.Reactive.PlatformServices.PeriodicTimerSystemClockMonitor.NewTimer()
at System.Reactive.PlatformServices.PeriodicTimerSystemClockMonitor.TimeChanged()
at System.Reactive.Concurrency.ConcurrencyAbstractionLayerImpl.PeriodicTimer.Tick(Object state)
at System.Threading.TimerQueueTimer.CallCallbackInContext(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.TimerQueueTimer.CallCallback()
at System.Threading.TimerQueueTimer.Fire()
at System.Threading.TimerQueue.FireQueuedTimerCompletion(Object state)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Because the affected users were on Citrix, I started hypothesising that this might be some kind of CPU load issue. Then a “normal” desktop user encountered the exception. Her application kept crashing, and upon investigation I found another process on her 4-core box taking 25% CPU, the equivalent of one full core. Killing that process, which seemed to have gone rogue, immediately cleared our application of the RX error. So again, it pointed to CPU-bound slowness.
The RX source which throws the exception is the NewTimer method of PeriodicTimerSystemClockMonitor.
If you use Observable.Timer(...), RX will internally create a new PeriodicTimerSystemClockMonitor instance and subscribe to its SystemClockChanged event. This class creates a timer to raise the event. Every time this timer calls back into TimeChanged, a time threshold is checked. If the callback arrived too far from when it was expected, a new timer is created within NewTimer. Again, the length of time it takes to create this timer is checked. If it exceeds the SYNC_MAXDELTA threshold (in milliseconds), another attempt is made. The loop allows for 100 attempts. If, after all of these, the timer still hasn’t been created within the allowed time threshold, the offending InvalidOperationException is thrown.
At which point your application will die.
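To make the retry loop concrete, here is a minimal, self-contained sketch of its shape. It is not the RX source: the names ClockMonitorSketch, StartWithRetries and Simulate are hypothetical, the clock is a fake one that is advanced by hand, and the 10 ms / 100 retries values are the ones discussed in this article.

```csharp
using System;

public static class ClockMonitorSketch
{
    // Stand-ins for RX's internal thresholds (values as used in this article).
    public const double SyncMaxDeltaMs = 10;
    public const int SyncMaxRetries = 100;

    // Times each "timer creation" and retries while it takes longer than
    // the allowed delta. 'now' and 'startTimer' are injected so the loop
    // can be exercised without a real timer.
    public static int StartWithRetries(Func<DateTimeOffset> now, Action startTimer)
    {
        var n = 0;
        double diff;
        do
        {
            var before = now();
            startTimer();
            diff = Math.Abs((now() - before).TotalMilliseconds);
        } while (diff > SyncMaxDeltaMs && ++n < SyncMaxRetries);

        if (n >= SyncMaxRetries)
            throw new InvalidOperationException(
                "Failed to start monitoring system clock changes. diff=" + diff);

        return n; // number of retries that were needed
    }

    // Simulates timer creation that always costs a fixed number of
    // milliseconds by advancing a fake clock instead of sleeping.
    public static int Simulate(double creationCostMs)
    {
        var clock = DateTimeOffset.UtcNow;
        return StartWithRetries(
            () => clock,
            () => clock = clock.AddMilliseconds(creationCostMs));
    }

    public static void Main()
    {
        // Healthy machine: creation takes 2 ms, so no retries are needed.
        Console.WriteLine(Simulate(2));

        // Starved machine: every attempt is too slow, so the loop gives up.
        try
        {
            Simulate(489);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The point of the sketch is that nothing in the loop distinguishes "the clock changed" from "this thread was not scheduled for a while": a machine that is consistently slow to get through one iteration will exhaust all retries.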
It took a few attempts at reading the code before I understood this; it is not immediately easy to get this set up in debug. I wanted to add some logging to give us an idea of how slow the timer recreation was. To achieve this I had to implement a custom IPlatformEnlightenmentProvider (set via: PlatformEnlightenmentProvider.Current = new CustomPlatformEnlightenmentProvider();), which unfortunately involved some reflection so I could instantiate existing classes internal to RX. But now I am able to get RX to use a ConfigurablePeriodicTimerSystemClockMonitor where I can get some logging going and experiment with the time thresholds.
The two classes are at the end of the article. The important thing is I can see when RX decides the callback is too slow and that a new timer is required. And then, when the exception is finally thrown inside NewTimer, I can see how slow the timer creation operation was:
if (n >= _syncMaxRetries)
{
    _logger.Warning("Failed to start monitoring system clock changes (Configurable Impl). diff={0}", diff);
    throw new InvalidOperationException(
        "Failed to start monitoring system clock changes (Configurable Impl).");
}
With all this in place I sat back and waited for the crash reports to come in. The first exception report looked like this:
Failed to start monitoring system clock changes (Configurable Impl). diff=489.0552
So in this case, a call which RX wants to complete within 10ms is taking nearly half a second. At the time of writing I have questions outstanding with our Citrix team about CPU usage, spikes, load, memory and so on. I am also pondering other things: did a gen2 GC kick in? What other processes are running? Do we also have a memory leak? Perhaps I’m clutching at a few straws :)
So what’s next?
I’ve got options, mainly two, but I don’t particularly like either:
- Make the time check much more relaxed. But how relaxed? There is no line in the sand here…
- Remove all usages of Observable.Timer. And stand guard over every developer’s check-ins for the application’s lifetime?!
Right now we are still contemplating our next step and collating information. I’ll try to update this post with our final approach. I will also try to solicit a response from the RX team for their opinion on this.
UPDATE: Solution From Microsoft
Kudos to the RX team at Microsoft who very quickly came back with a possible solution. There are a couple of gotchas, but it’s very workable. The suggestion is to “create a custom system clock monitor using SystemEvents.TimeChanged”.
So we rely on the Win32 event to notify RX of changes to the system clock, instead of the more complicated PeriodicTimerSystemClockMonitor, which has all the time checks and constraints at the root of the problem we are seeing.
There are a couple of small issues:
- We will still have to use our custom IPlatformEnlightenmentProvider, which uses reflection to instantiate RX internal classes. This leaves us slightly vulnerable to future re-workings of the innards of RX. UPDATE: dealt with; see UPDATE 2 below.
- The solution requires a message pump (which apparently is why RX doesn’t use this method by default). Luckily for us, being a WPF application, we have one.
So I’ve created a new class:
using System;
using System.Reactive.PlatformServices;
using System.Threading;
using Microsoft.Win32;

public class SystemEventsSystemClockMonitor : INotifySystemClockChanged
{
    private int _refCount;
    private EventHandler<SystemClockChangedEventArgs> _systemClockChanged;

    public event EventHandler<SystemClockChangedEventArgs> SystemClockChanged
    {
        add
        {
            AddRef();
            _systemClockChanged += value;
        }
        remove
        {
            RemoveRef();
            _systemClockChanged -= value;
        }
    }

    protected virtual void OnSystemClockChanged(SystemClockChangedEventArgs e)
    {
        var handler = _systemClockChanged;
        if (handler != null) handler(this, e);
    }

    // The first subscriber hooks the system-wide event...
    private void AddRef()
    {
        if (Interlocked.Increment(ref _refCount) == 1)
        {
            SystemEvents.TimeChanged += SystemEventsOnTimeChanged;
        }
    }

    // ...and the last one to leave unhooks it again.
    private void RemoveRef()
    {
        if (Interlocked.Decrement(ref _refCount) == 0)
        {
            SystemEvents.TimeChanged -= SystemEventsOnTimeChanged;
        }
    }

    private void SystemEventsOnTimeChanged(object sender, EventArgs eventArgs)
    {
        OnSystemClockChanged(new SystemClockChangedEventArgs());
    }
}
Then inside our CustomPlatformEnlightenmentProvider I can return it when asked inside GetService. I haven’t reflected that change in the implementation listed below, so beware: understand what’s happening, and choose your INotifySystemClockChanged implementation wisely before using this in your system…
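The ref-counted subscription used by SystemEventsSystemClockMonitor can be tried out in isolation. The sketch below is an illustration only, with no RX or SystemEvents dependency: FakeSource is a hypothetical stand-in for SystemEvents, and RefCountedRelay and Demo are made-up names that follow the same add/remove pattern.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a shared, process-wide source such as SystemEvents.
public static class FakeSource
{
    private static EventHandler _changed;

    // Counts how many handlers are currently attached to the source.
    public static int Subscriptions;

    public static event EventHandler Changed
    {
        add { Subscriptions++; _changed += value; }
        remove { Subscriptions--; _changed -= value; }
    }

    public static void Raise()
    {
        var handler = _changed;
        if (handler != null) handler(null, EventArgs.Empty);
    }
}

public class RefCountedRelay
{
    private int _refCount;
    private EventHandler _changed;

    public event EventHandler Changed
    {
        add
        {
            // The first subscriber attaches the relay to the source.
            if (Interlocked.Increment(ref _refCount) == 1)
                FakeSource.Changed += OnSourceChanged;
            _changed += value;
        }
        remove
        {
            // The last subscriber to leave detaches it again.
            if (Interlocked.Decrement(ref _refCount) == 0)
                FakeSource.Changed -= OnSourceChanged;
            _changed -= value;
        }
    }

    private void OnSourceChanged(object sender, EventArgs e)
    {
        var handler = _changed;
        if (handler != null) handler(this, e);
    }
}

public static class Demo
{
    // Returns how many times the two handlers were invoked in total.
    public static int Run()
    {
        var relay = new RefCountedRelay();
        var calls = 0;
        EventHandler a = (s, e) => calls++;
        EventHandler b = (s, e) => calls++;

        relay.Changed += a;
        relay.Changed += b;   // still only one subscription on FakeSource
        FakeSource.Raise();   // both handlers run

        relay.Changed -= a;
        relay.Changed -= b;   // last handler gone, relay detaches
        FakeSource.Raise();   // nobody is listening any more

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
        Console.WriteLine(FakeSource.Subscriptions);
    }
}
```

One caveat that applies to the sketch and to the pattern in general: the count assumes every remove is paired with an earlier add. Removing a handler that was never added would unbalance it.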
UPDATE 2: Slight improvement…
I’ve only just spotted that the RX factory method CurrentPlatformEnlightenmentProvider.GetService() is marked as virtual. So I can inherit from this class, override the method, and provide my custom class where applicable. This resolves any issue I had regarding reflection. Now the enlightenment provider is pretty simple:
using System.Reactive.PlatformServices;

public class CustomPlatformEnlightenmentProvider : CurrentPlatformEnlightenmentProvider
{
    public override T GetService<T>(object[] args)
    {
        var t = typeof(T);

        // Swap in the SystemEvents-based monitor; defer everything else to RX.
        if (t == typeof(INotifySystemClockChanged))
            return (T)(object)new SystemEventsSystemClockMonitor();

        return base.GetService<T>(args);
    }
}
Happy days…
Customised RX classes from before Microsoft’s proposed solution, which were used during the investigation:
public class CustomPlatformEnlightenmentProvider : IPlatformEnlightenmentProvider
{
    private readonly ILogger _logger;
    private readonly bool _useCustomImplementations;

    public CustomPlatformEnlightenmentProvider(ILogger logger)
    {
        if (logger == null) throw new ArgumentNullException("logger");
        _logger = logger;
        _useCustomImplementations = Session.IsRunningRemotely || Debugger.IsAttached;
        if (_useCustomImplementations)
            _logger.Info("Platform enlightenment will run ConfigurablePeriodicTimerSystemClockMonitor.");
    }

    public T GetService<T>(params object[] args) where T : class
    {
        var t = typeof(T);

        if (t == typeof(IExceptionServices))
        {
            var type = Type.GetType("System.Reactive.PlatformServices.ExceptionServicesImpl, System.Reactive.PlatformServices", true);
            return (T)Activator.CreateInstance(type);
        }

        if (t == typeof(IConcurrencyAbstractionLayer))
        {
            var type = Type.GetType("System.Reactive.Concurrency.ConcurrencyAbstractionLayerImpl, System.Reactive.PlatformServices", true);
            return (T)Activator.CreateInstance(type);
        }

        if (t == typeof(IScheduler) && args != null)
        {
            switch ((string)args[0])
            {
                case "ThreadPool":
                    return (T)(object)ThreadPoolScheduler.Instance;
                case "TaskPool":
                    return (T)(object)TaskPoolScheduler.Default;
                case "NewThread":
                    return (T)(object)NewThreadScheduler.Default;
            }
        }

        if (t == typeof(INotifySystemClockChanged))
        {
            return (T) (_useCustomImplementations
                ? new ConfigurablePeriodicTimerSystemClockMonitor(
                    TimeSpan.FromSeconds(1),
                    _logger,
                    syncMaxRetries: 100,
                    syncMaxDelta: 10)
                : (INotifySystemClockChanged) new PeriodicTimerSystemClockMonitor(TimeSpan.FromSeconds(1)));
        }

#if HAS_TPL46
        if (t == typeof(ITaskServices))
        {
            return (T)(object)new TaskServicesImpl();
        }
#endif

        if (t == Type.GetType("System.Reactive.Linq.IQueryServices, System.Reactive.Linq"))
        {
            //
            // We perform this Debugger.IsAttached check early rather than deferring
            // the decision to intercept query operator methods to the debugger
            // assembly that's dynamically discovered here. Also, it's a reasonable
            // expectation it'd be pretty hard to turn on interception dynamically
            // upon a debugger attach event, so we should make this check early.
            //
            // In the initial release of v2.0 (RTM), we won't have the corresponding
            // debugger assembly available yet, so the dynamic load would always
            // fail. We also don't want to take the price of (an attempt to) a dynamic
            // assembly load for the regular production case.
            //
            if (Debugger.IsAttached)
            {
#if NETCF35
                var name = "System.Reactive.Linq.QueryDebugger, System.Reactive.Debugger";
#else
#if (CRIPPLED_REFLECTION && HAS_WINRT)
                var ifType = t.GetTypeInfo();
#else
                var ifType = t;
#endif
                var asm = new AssemblyName(ifType.Assembly.FullName);
                asm.Name = "System.Reactive.Debugger";
                var name = "System.Reactive.Linq.QueryDebugger, " + asm.FullName;
#endif
                var dbg = Type.GetType(name, false);
                if (dbg != null)
                    return (T)(object)Activator.CreateInstance(dbg);
            }
        }

        return null;
    }
}
public class ConfigurablePeriodicTimerSystemClockMonitor : INotifySystemClockChanged
{
    private readonly TimeSpan _period;
    private readonly ILogger _logger;
    private readonly SerialDisposable _timer;
    private DateTimeOffset _lastTime;
    private EventHandler<SystemClockChangedEventArgs> _systemClockChanged;
    private readonly int _syncMaxRetries = 100;
    private readonly double _syncMaxDelta = 10;
    private readonly int _maxError = 100;

    /// <summary>
    /// Creates a new monitor for system clock changes with the specified polling frequency.
    /// </summary>
    /// <param name="period">Polling frequency for system clock changes.</param>
    /// <param name="logger"></param>
    /// <param name="syncMaxRetries"></param>
    /// <param name="syncMaxDelta"></param>
    /// <param name="maxError"></param>
    public ConfigurablePeriodicTimerSystemClockMonitor(TimeSpan period, ILogger logger, int syncMaxRetries = 100, int syncMaxDelta = 10, int maxError = 100)
    {
        if (logger == null) throw new ArgumentNullException("logger");
        _period = period;
        _logger = logger;
        _syncMaxRetries = syncMaxRetries;
        _syncMaxDelta = syncMaxDelta;
        _maxError = maxError;
        _timer = new SerialDisposable();
    }

    /// <summary>
    /// Event that gets raised when a system clock change is detected.
    /// </summary>
    public event EventHandler<SystemClockChangedEventArgs> SystemClockChanged
    {
        add
        {
            NewTimer();
            _systemClockChanged += value;
        }
        remove
        {
            _systemClockChanged -= value;
            _timer.Disposable = Disposable.Empty;
        }
    }

    private void NewTimer()
    {
        _timer.Disposable = Disposable.Empty;
        double diff;
        var n = 0;
        do
        {
            _lastTime = SystemClock.UtcNow;
            _timer.Disposable = PlatformEnlightenmentProvider.Current.GetService<IConcurrencyAbstractionLayer>().StartPeriodicTimer(TimeChanged, _period);
            diff = Math.Abs((SystemClock.UtcNow - _lastTime).TotalMilliseconds);
        } while (diff > _syncMaxDelta && ++n < _syncMaxRetries);

        _logger.Info("Took {0} attempts to start new timer.", n);

        if (n >= _syncMaxRetries)
        {
            _logger.Warning("Failed to start monitoring system clock changes (Configurable Impl). diff={0}", diff);
            throw new InvalidOperationException(
                "Failed to start monitoring system clock changes (Configurable Impl).");
        }
    }

    private void TimeChanged()
    {
        var now = SystemClock.UtcNow;
        var diff = now - (_lastTime + _period);
        if (Math.Abs(diff.TotalMilliseconds) >= _maxError)
        {
            var scc = _systemClockChanged;
            if (scc != null)
                scc(this, new SystemClockChangedEventArgs(_lastTime + _period, now));
            _logger.Warning("Clock Monitor exceeded max error, initiating new timer. Vars: now={0}, _lastTime={1}, diff={2}", now.ToString("O"), _lastTime.ToString("O"), diff);
            NewTimer();
        }
        else
        {
            _lastTime = SystemClock.UtcNow;
        }
    }
}
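For anyone experimenting with the maxError threshold, the check at the top of TimeChanged can be pulled out into a pure function and played with on its own. This is a sketch under assumptions, not part of the classes above: DriftCheckSketch and LooksLikeClockChange are hypothetical names, and the dates are arbitrary.

```csharp
using System;

public static class DriftCheckSketch
{
    // Same comparison as in TimeChanged: a tick is suspicious when it arrives
    // at least maxErrorMs away from (lastTime + period), early or late.
    public static bool LooksLikeClockChange(
        DateTimeOffset lastTime, TimeSpan period, DateTimeOffset now, double maxErrorMs)
    {
        var diff = now - (lastTime + period);
        return Math.Abs(diff.TotalMilliseconds) >= maxErrorMs;
    }

    public static void Main()
    {
        var last = new DateTimeOffset(2015, 1, 1, 0, 0, 0, TimeSpan.Zero); // arbitrary
        var period = TimeSpan.FromSeconds(1);

        // Tick arrives 20 ms late: within tolerance.
        Console.WriteLine(LooksLikeClockChange(last, period, last.AddMilliseconds(1020), 100)); // False

        // Tick arrives 400 ms late because the thread pool was starved:
        // treated exactly like a clock change.
        Console.WriteLine(LooksLikeClockChange(last, period, last.AddMilliseconds(1400), 100)); // True

        // The clock really was set back by an hour.
        Console.WriteLine(LooksLikeClockChange(last, period, last.AddHours(-1), 100)); // True
    }
}
```

The second case is the one that bit us: a late callback on a loaded machine looks the same as a genuine clock change, so it sends the monitor into NewTimer and its retry loop.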