06766: java.exe crashes on Windows XP (2000 as well?)
Category: Container Status: Closed Severity: MEDIUM Reported against release: 1.0 Fixed in release: 1.1
PROBLEM: During 1.0 release testing, we observed intermittent crashes of
java.exe on Windows XP. This has also been observed on Windows 2000
machines, but less frequently. The crashes appear to be container
processes dying. This often occurs soon after a process is stopped or
when Openwings is shut down, but it has been observed at other times as
well.
ANALYSIS: Java version is 1.4.1_02.
Windows XP brings up a very unhelpful error dialog when these crashes
occur, but Java itself outputs some log files that are somewhat
helpful. These files are of the form hs_err_pidXXXX.log, and are
usually found in the directory in which the Openwings core is started
($OW_HOME\openwings-1.0\lib or the directory of your command prompt).
All of these log files report errors that occurred in native code.
Highlights follow.
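To collect these logs from a given directory, something like the following could be used. This is only a minimal sketch: the class name HsErrLogFinder is hypothetical, and it assumes the process ID part of the file name (the XXXX above) is purely numeric.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Openwings.
public class HsErrLogFinder {

    /** Returns the names of hs_err_pid<digits>.log files in dir, sorted by name. */
    public static List<String> find(File dir) {
        String[] names = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                // Assumption: the pid portion of the name is all digits.
                return name.matches("hs_err_pid\\d+\\.log");
            }
        });
        List<String> result = new ArrayList<String>();
        if (names != null) { // null means dir does not exist or is not a directory
            Arrays.sort(names);
            result.addAll(Arrays.asList(names));
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults to the current directory, e.g. the command prompt's directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : find(dir)) {
            System.out.println(new File(dir, name).getPath());
        }
    }
}
```

Running it with the directory from which the Openwings core was started as the only argument lists any crash logs found there.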
STACK TRACE 1:
Unexpected Signal : EXCEPTION_ACCESS_VIOLATION occurred at PC=0x6D3B6B07
Function=[Unknown.]
Library=C:\j2sdk1.4.1_02\jre\bin\client\jvm.dll
NOTE: We are unable to locate the function namd symbol for the error
just occurred. Please refer to release documentation for possible
reason and solutions.
Current Java thread:
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:718)
at java.net.InetAddress.getAddressFromNameService
(InetAddress.java:996)
at java.net.InetAddress.getLocalHost(InetAddress.java:1125)
at net.jini.discovery.LookupDiscovery$ResponseListener.interrupt
(LookupDiscovery.java:407)
at net.jini.discovery.LookupDiscovery$Requestor.run
(LookupDiscovery.java:528)
- locked <032947B8> (a java.util.LinkedList)
ANALYSIS: This appears to be an instance of Java Bug
"Bug ID: 4801219 Inet4AddressImpl.getHostByAddr() throws exception in
WinXP native code"
http://developer.java.sun.com/developer/bugParade/bugs/4801219.html
This bug has been marked as fixed in the bug database for the 1.4.2
release. No workaround.
STACK TRACE 2:
Unexpected Signal : EXCEPTION_ACCESS_VIOLATION occurred at PC=0x70006E
Function=[Unknown.]
Library=(N/A)
NOTE: We are unable to locate the function namd symbol for the error
just occurred. Please refer to release documentation for possible
reason and solutions.
Current Java thread:
at sun.awt.windows.WToolkit.eventLoop(Native Method)
at sun.awt.windows.WToolkit.run(WToolkit.java:253)
at java.lang.Thread.run(Thread.java:536)
ANALYSIS: Unfortunately, many Java bugs share this exact stack trace.
The most likely candidates that are still open, or whose fix is not yet
in a public release, are listed below:
http://developer.java.sun.com/developer/bugParade/bugs/4821397.html
(popup, has workaround)
http://developer.java.sun.com/developer/bugParade/bugs/4779118.html
(file dialogs)
http://developer.java.sun.com/developer/bugParade/bugs/4816519.html
(color)
Some of these have workarounds, but I'm not sure if any of them is the
actual problem we experienced.
STACK TRACE 3:
An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : EXCEPTION_ACCESS_VIOLATION occurred at PC=0x77C43E05
Function=wcscmp+0x15
Library=C:\WINDOWS\system32\MSVCRT.dll
Current Java thread:
at com.sun.security.auth.module.NTSystem.getCurrent(Native
Method)
at com.sun.security.auth.module.NTSystem.<init>
(NTSystem.java:50)
at com.sun.security.auth.module.NTLoginModule.login
(NTLoginModule.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke
(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at javax.security.auth.login.LoginContext.invoke
(LoginContext.java:675)
at javax.security.auth.login.LoginContext.access$000
(LoginContext.java:129)
at javax.security.auth.login.LoginContext$4.run
(LoginContext.java:610)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokeModule
(LoginContext.java:607)
at javax.security.auth.login.LoginContext.login
(LoginContext.java:534)
at
com.gd.openwings.container.remote.RemoteManagerConnectorUserProxyImpl.co
nnect(RemoteManagerConnectorUserProxyImpl.java:73)
at
com.gd.openwings.container.remote.RemoteContainerSmartProxy.connect
(RemoteContainerSmartProxy.java:592)
at
com.gd.openwings.container.remote.RemoteManagerSmartProxy.connect
(RemoteManagerSmartProxy.java:365)
at
com.gd.openwings.component.jini.UseServiceManager.connectService
(UseServiceManager.java:743)
at com.gd.openwings.component.jini.UseServiceManager.useService
(UseServiceManager.java:260)
at com.gd.openwings.component.jini.JiniComponent.useService
(JiniComponent.java:713)
at
com.gd.openwings.context.ui.common.OpenwingsEntityRoot.lookUpInitialCMs
(OpenwingsEntityRoot.java:100)
- locked <03018F98> (a
com.gd.openwings.context.ui.common.OpenwingsEntityRoot)
at
com.gd.openwings.context.ui.common.OpenwingsEntityRoot.refreshAllMyDesce
ndantsFromScratch(OpenwingsEntityRoot.java:638)
at
com.gd.openwings.context.ui.explorer.ExplorerViewTree$1.construct
(ExplorerViewTree.java:79)
at com.gd.openwings.context.ui.explorer.SwingWorker$2.run
(SwingWorker.java:108)
at java.lang.Thread.run(Thread.java:536)
ANALYSIS:
Apparently the JAAS "NTLoginModule" does not work on Windows XP or NT,
although it does appear to work on Windows 2000. Note that this isn't
really one of the "random" crashes noted in the problem description of
this defect.
UPDATE 9/4/03:
There continue to be problems with JVM crashes on all Openwings builds
on JDK 1.3.1_03 and later on Windows platforms. These crashes appear to
occur randomly when a process in the container is stopped, but certain
processes seem to cause the crash with more regularity
(HelloWorldPublisher, HelloWorldSubscriber).
The hs_err_pidXXXX.log files are apparently no longer created starting
with Java 1.4.X, but you can usually find the stack trace information
in a "drwtsn32.log" file (location varies by platform). Unfortunately,
there is no Java stack trace information available - apparently the
crash is not associated with a particular Java thread. However, it is
very definitely related to calling Thread.stop().
When the container shuts down a process, it calls any shutdown hooks or
ProcessShutdown callbacks the process has registered, then gives a few
seconds for the process to shut itself down. If there are any threads
left in the process's thread group, they are interrupted (several
attempts are made), then stopped. The crash occurs pretty reliably if a
process's threads are still alive after the interrupts and they have to
be stopped.
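The interrupt phase described above might be sketched as follows. This is not the container's actual code: the class GroupShutdown, the method names, and the attempt count and wait time are all assumptions made for illustration. Threads that survive every attempt are returned to the caller and left running; the sketch never calls Thread.stop().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the interrupt phase; not the Openwings container code.
public final class GroupShutdown {

    /** Snapshot of the live threads in the group (and subgroups), excluding the caller. */
    static List<Thread> liveThreads(ThreadGroup group) {
        // activeCount() is only an estimate, so leave some slack in the buffer.
        Thread[] buf = new Thread[group.activeCount() + 16];
        int n = group.enumerate(buf, true);
        List<Thread> live = new ArrayList<Thread>();
        for (int i = 0; i < n; i++) {
            if (buf[i] != Thread.currentThread() && buf[i].isAlive()) {
                live.add(buf[i]);
            }
        }
        return live;
    }

    /**
     * Interrupts every live thread in the group, waits waitMillis, and repeats
     * up to 'attempts' times. Returns the threads that are still alive.
     */
    public static List<Thread> interruptAll(ThreadGroup group, int attempts, long waitMillis)
            throws InterruptedException {
        List<Thread> live = liveThreads(group);
        for (int i = 0; i < attempts && !live.isEmpty(); i++) {
            for (Thread t : live) {
                t.interrupt();
            }
            Thread.sleep(waitMillis); // give the threads a chance to exit
            live = liveThreads(group);
        }
        return live; // survivors are reported, not stopped
    }
}
```

A caller could log the names of the returned threads to see which ones ignore interruption in a given process.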
A quick analysis of the threading for HelloWorldPublisher revealed that
there were two threads that didn't respond to interruption. Perhaps
more could be done to make sure that these threads exit.
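For comparison, a thread written to respond to interruption could look like the sketch below. InterruptibleWorker is a hypothetical class, not one of the HelloWorldPublisher threads, and the 100 ms sleep stands in for whatever blocking work a real thread does.

```java
// Hypothetical worker that exits promptly when interrupted.
public class InterruptibleWorker implements Runnable {

    public void run() {
        try {
            // Check the interrupt flag on every pass through the loop.
            while (!Thread.currentThread().isInterrupted()) {
                doWork();
            }
        } catch (InterruptedException e) {
            // Interrupted while blocked: restore the flag and fall through to exit.
            Thread.currentThread().interrupt();
        }
    }

    /** Placeholder for one unit of (possibly blocking) work. */
    private void doWork() throws InterruptedException {
        Thread.sleep(100);
    }
}
```

The key point is that the InterruptedException is not swallowed inside the loop; a thread that catches it and carries on looping will never exit in response to interrupt().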
The proposed solution is to not call Thread.stop() anywhere in the
container implementation on Windows. This has the side effect of
causing a slow memory leak in the container; however, that is
preferable to the alternative of having this major reliability problem.
A separate analysis will be done to see if more can be done to get
process threads to exit successfully. Most applications don't tend to
do much complex thread activity; it's usually Jini or Openwings
Component Services threads that don't exit.