Posts Tagged ‘ssh’

Remote Scripting for AWS

Saturday, January 24th, 2009

When signing up for Amazon Web Services, you end up generating all of the following identifying information:

  • Your Account Number
  • An Access Key
  • A Secret Access Key
  • An SSL Certificate file
  • The Private Key for your SSL Cert

The various AWS command APIs and tools will require you to provide one or more of these pieces of information. Rather than actually host all of this information on my instances, I have instead chosen to build Ruby scripts around Net::SSH and the amazon-ec2 gem.

I’m small-scale for the moment, so I can keep a centralized list of my instance IDs. Without too much effort, I can look up the instance(s), identify their public DNS (plus everything else), and then open up an ssh connection and push data into a channel’s environment or upload files for bundling purposes.
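
In rough outline, that flow looks something like the sketch below. Here, lookup_public_dns is a hypothetical stand-in for my instance-ID lookup, and the user and key path are placeholders:

require 'net/ssh'

# hypothetical helper: maps my local shorthand to an instance's public DNS
# (in practice, backed by an EC2 describe-instances lookup)
public_dns = lookup_public_dns("web-1")

Net::SSH.start(public_dns, "root", :keys => ["~/.ssh/ec2-keypair"]) do |ssh|
	puts ssh.exec!("uptime")
end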

Here are a few things that I’ve learned in the process.

Channels

Most of your core work gets done on a Net::SSH::Connection::Channel, and sometimes asynchronously (as in the case of my Health Check script). There are a lot of library shorthands — the prototypical Hello World example is channel-less:

Net::SSH.start("localhost", "user") do |ssh|
	# 'ssh' is an instance of Net::SSH::Connection::Session
	ssh.exec! "echo Hello World"
end

But as with all examples, you soon find that surface-level tools won’t quite do the trick. The main thing you’ll need all the time is convenient access to the data coming back from the channel, asynchronously or otherwise. So create yourself a simple observer:
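
Something along these lines will do; it’s a sketch, the class name is my own, and the callbacks are standard Net::SSH channel hooks:

class ChannelObserver
	attr_reader :stdout, :stderr, :exit_status

	def initialize(channel)
		@stdout = ""
		@stderr = ""

		# collect everything the remote command writes to stdout and stderr
		channel.on_data          { |ch, data| @stdout << data }
		channel.on_extended_data { |ch, type, data| @stderr << data }

		# capture the exit status when the command finishes
		channel.on_request("exit-status") { |ch, data| @exit_status = data.read_long }
	end

	def success?
		exit_status == 0
	end
end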

Then during and after executions, you’ll have access to all the core information you’ll need to make conditional decisions, etc.

Pushing Your Environment

Many of the AWS tools will recognize specific environment variables. The usual suspects are:

  • AMAZON_ACCESS_KEY_ID
  • AMAZON_SECRET_ACCESS_KEY

So, you can use Net::SSH::Connection::Channel.env to inject key-value pairs into your connection. However, you’ll need to make some config changes first — and major thanks to the Net::SSH documentation for clarifying this in its own RDoc:

“Modify /etc/ssh/sshd_config so that it includes an AcceptEnv for each variable you intend to push over the wire”

#       accept EC2 config
AcceptEnv  AMAZON_ACCESS_KEY_ID AMAZON_SECRET_ACCESS_KEY
...

After you make your changes, make sure to bounce the ssh daemon:

/etc/init.d/sshd restart

Then your selected environment will shine through.
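
On the Ruby side, the push itself looks something like this sketch, where public_dns, access_key and secret_key are placeholders:

Net::SSH.start(public_dns, "user") do |ssh|
	ssh.open_channel do |ch|
		# these names must match the AcceptEnv list in sshd_config
		ch.env "AMAZON_ACCESS_KEY_ID", access_key
		ch.env "AMAZON_SECRET_ACCESS_KEY", secret_key

		ch.exec "env | grep AMAZON" do |ch, success|
			raise "could not exec" unless success
		end
	end
	ssh.loop
end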

Running sudo Remotely

It’s reasonable to execute root-level commands using sudo, because it gives you a good amount of granular control.

The first thing that we’ll all run into is:

sudo: sorry, you must have a tty to run sudo

Fortunately, there’s Net::SSH::Connection::Channel.request_pty:

ch.request_pty do |ch, success|
	raise "could not start a pseudo-tty" unless success

	#	full EC2 environment
	###ch.env 'key', 'value'
	###...

	ch.exec 'sudo echo Hello 1337' do |ch, success|
		raise "could not exec against a pseudo-tty" unless success
	end
end

I’ve taken the approach of allowing NOPASSWD execution for everything I do remotely, after making darn sure that I had constrained exactly what could be done under those auspices (HINT: pounds of caution, tons of prevention). You can configure all of this by editing /etc/sudoers:

# EC2 tool permissions
username  ALL=(ALL)  NOPASSWD: SETENV: /path/to/scripts/*.rb
...

Also, if you intend to have those scripts consume the environment variables that you’ve injected into your channel, you’ll need to annotate the line with SETENV, or they won’t carry across into your sudo execution.

If you are more security-minded, there are many other variations you can play with in /etc/sudoers. I haven’t yet experimented with pushing a sudo password across the wire, but that may be where Net::SSH::Connection::Channel.send_data comes into play.

Port Forwarding

I wanted to easily do SSH tunneling / port-forwarding against an instance, referenced by my local shorthand. So I wrote my usual EC2 instance ID lookup and started a Net::SSH connection. Here’s how you put yourself in ‘dumb forward’ mode:
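
The forwarding itself is a sketch along these lines; the 9999 ports and public_dns are placeholders, and ssh.forward.local does the heavy lifting:

Net::SSH.start(public_dns, "user") do |ssh|
	# forward local port 9999 through the tunnel to port 9999 on the far side
	ssh.forward.local(9999, "localhost", 9999)

	interrupted = false
	trap("INT") { interrupted = true }
	ssh.loop(0.1) { !interrupted }	# sit here and service the tunnel
end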

And then [Ctrl-C] will break you out.

Inconsistent EC2 Marshalling

I’d gotten all of my EC2 data parsing working great; then I started to expand my capabilities with the Amazon S3 API gem (aws-s3). Suddenly and magically, the results coming back from my EC2 queries had taken on a new form. Moreover, this was not consistent behavior across all operating systems; at least I do not see it on WinXP, under which my health monitoring checks execute.

The Amazon APIs are SOAP-driven, so I guessed that amazon-ec2 (and aws-s3) would both leverage soap4r. Well, that is in fact not the case; Glenn has commented below on this known issue. He uses the HTTP Query API, and the cause of the discrepancy is (as of this writing) still undetermined.

Regardless, our job is to get things to work properly. So in the meantime, let’s use a quick & dirty work-around. The two core differences are:

  • the data coming back may be wrapped in an additional key-value pair
  • single value result-sets come back as an Object (vs. an Array with a single Object)

So I crafted a little helper singleton, which still allows me to distinguish between ‘valid’ and ‘empty’ results (nil vs []):
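
Something in this vein covers both differences; it’s a sketch, and the module name and the 'item' wrapper key are my assumptions about the shape of the parsed response, not part of either gem:

module EC2Results
	module_function

	# returns nil when there's no data; otherwise always returns an Array
	def normalize(result, wrapper = "item")
		return nil if result.nil?

		# peel off the extra wrapping key-value pair, when present
		result = result[wrapper] if result.is_a?(Hash) && result.key?(wrapper)

		case result
		when nil   then nil
		when Array then result
		else            [result]	# a single value came back as a bare Object
		end
	end
end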

That will address our immediate needs for the moment, until a beneficent coding Samaritan comes along.

Remote Debugging using JConsole, JMX and SSH Tunnels

Thursday, January 22nd, 2009

I’ve recently hosted my Spring / Hibernate webapp in the cloud thanks to Amazon Web Services. A future post will mention the monitoring I’ve put in place, but Tomcat keeps dying after about 36 hours. The first thing I need to do is enable debugging and JMX remoting so that I can put JConsole to work.

This turned out not to be as easy as I’d have liked. Even with a bevy of useful resources available — lotsa people have run into these issues — it took a while to get the right combination. Let’s hope I can save you a bit of that pain …

JConsole / JMX Remoting via SSH Tunnels

As I mentioned, I’m hosting this solution in the cloud. So, when things go bad, I need to be able to debug remotely. I don’t want to open up my security group to general traffic, so using SSH tunnels is the best option. JConsole is a great tool for measuring current statistics and performance of your Java app, and relies on JMX remoting. It works great locally, or even within a trusted network, but once you’re behind a firewall with broad port-blocking, there are some significant issues. There are several core Java forum topics related to this discussion.

Daniel Fuchs has written several articles which illustrate these issues and provide good solutions. He explains that JMX remoting needs two ports: one for the RMI registry, and one for the RMI connection objects which are stubs used for remoting all the critical data. If you’re using the default JVM agent, you’ll tend to use the following JVM System.property’s on the server:

-Djava.rmi.server.hostname
-Djava.rmi.server.useLocalHostname
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port
-Dcom.sun.management.jmxremote.ssl
-Dcom.sun.management.jmxremote.ssl.need.client.auth
-Dcom.sun.management.jmxremote.authenticate
-Dcom.sun.management.jmxremote.access.file
-Dcom.sun.management.jmxremote.password.file

I’ll come back to these, but the one that’s important here is jmxremote.port. It allows you to specify the RMI registry port, the one that you’ll use to establish your remote connection via JConsole. However, the port for RMI export, which is used for all the critical data callbacks, is randomly chosen (on a session basis or JVM basis, not sure) and cannot be specified. And you can’t open a port for tunneling if you don’t know what it is.

You can see this issue if you crank up the debugging on JConsole. I was having issues getting the logging output, so I took the double-barrel approach, using both the -debug argument and the custom java.util.logging descriptor, the contents of which I stole from here. Invoke it as follows:

jconsole -debug -J"-Djava.util.logging.config.file=FILENAME"

The quotes are optional. Provide the logging descriptor filename. You can call out the JMX Service URL or the hostname:port combination at the end if you like. Eventually you’ll see debug output much like this:

FINER: [javax.management.remote.rmi.RMIConnector: jmxServiceURL=service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi] connecting...
FINER: [javax.management.remote.rmi.RMIConnector: jmxServiceURL=service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi] finding stub...
FINER: [javax.management.remote.rmi.RMIConnector: jmxServiceURL=service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi] connecting stub...
FINER: [javax.management.remote.rmi.RMIConnector: jmxServiceURL=service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi] getting connection...
FINER: [javax.management.remote.rmi.RMIConnector: jmxServiceURL=service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi] failed to connect: java.rmi.ConnectException: Connection refused to host: IP-ADDRESS; nested exception is:
java.net.ConnectException: Operation timed out

PORT will be the RMI registry port you’re tunneling into. IP-ADDRESS is special, we’ll get to that, and it’s important to note that it’s a ‘ConnectException‘ occurring against that host.

This debugging information can show up rather late in the whole connection process, undoubtedly because it’s an ‘Operation timed out’ issue, so don’t be surprised if it takes a while. Fortunately, you can also see immediate verbose feedback when you set up your ssh tunnel connection (see below).

Addressing the Randomly-Generated RMI Export Port

The first problem I chose to resolve was the one relating to the random RMI export port issue. Daniel has provided a fine example of how to implement a custom ‘pre-main’ Agent which you can use to supplant the standard JVM one. There’s his quick’n’dirty version which doesn’t address security — which is where I started. And then there’s a more full-fledged version, which I modified to be configurable.

Most importantly, it builds its JMXConnectorServer with the following service URL:

service:jmx:rmi://HOSTNAME:RMI-EXPORT-PORT/jndi/rmi://HOSTNAME:RMI-REGISTRY-PORT/jmxrmi

Traditionally, you’ll see this service URL from the client perspective, where HOSTNAME:RMI-EXPORT-PORT is not defined and you just have ‘service:jmx:rmi:///jndi/rmi://...‘. JConsole will build that sort of URL for you if you just provide HOSTNAME:RMI-REGISTRY-PORT (eg. hostname:port) when connecting.

By calling out the RMI-EXPORT-PORT in the agent’s service URL, you can affix it and tunnel to it. You can use the same port as the RMI registry; this only requires you to open one port for tunneling.

On your client / for JConsole, the HOSTNAME will probably be localhost, where you’ve opened your tunnel like so:

ssh -N -v -L9999:REMOTE-HOST:9999 REMOTE-HOST

9999 is just an example port. REMOTE-HOST is the host you’re tunneling to. You can remove the -v argument, but it’s good to have that around so that you can see the network activity. You can also use the -l argument to specify the login username on the remote host. Note that you’re opening the same port locally as you’re hitting on the server, with no offset. You need the same port because your agent on the server has to know which port to call back to itself on for RMI export, and that won’t work if you have an offset. So you might as well use the same port for both the RMI registry and RMI export, and just keep that one port available locally.

On the server in your agent, the HOSTNAME part of the service URL can either be InetAddress.getLocalHost().getHostName(), or an IP address (127.0.0.1), or in my case ‘localhost’ just worked fine.

The major reason to create the custom agent is to build the port-qualifying service URL. As usual, the example code takes a lot of shortcuts. So, I built myself a more re-usable agent — influenced by the standard JVM agent’s System.property’s — which allowed me to configure the same sorts of things as mentioned above:

  • hostname : to be used as the HOSTNAME value above
  • port.registry : to be used for RMI-REGISTRY-PORT
  • port.export : to be used for RMI-EXPORT-PORT, defaulting to the same as port.registry if not provided
  • ssl : true to enable SSL
  • authenticate : true to enable credentials / authentication
  • access.file : specifies the location of the user credentials file
  • password.file : specifies the location of the user password file

And so I was able to configure my agent service URL for localhost, using the same port for both RMI requirements, and using simple password-based auth. I did not go down the SSL route, though many of the posts from Daniel and others explain this as well. Do that once you’ve tackled the core problem :)

Another great post relating to this issue mentions that Tomcat has a custom Listener for setting up a similar agent. The example was:

<Listener className="org.apache.catalina.mbeans.JMXAdaptorLifecycleListener" namingPort="RMI-REGISTRY-PORT" port="RMI-EXPORT-PORT" host="localhost"/>

I didn’t look any deeper into this to see whether it supports SSL and/or basic authentication. But it seems clear that this is *not* a Java agent, because you have to set those up via System.property’s. Here’s what I needed to add to Tomcat startup for my custom agent:

-D___.hostname=localhost
-D___.port.registry=RMI-REGISTRY-AND-EXPORT-PORT
-D___.authenticate=true
-D___.password.file=$CATALINA_HOME/conf/jmxremote.password
-D___.access.file=$CATALINA_HOME/conf/jmxremote.access 
-javaagent:PATH/TO/AGENT.jar

I’ve omitted the namespace I used for my agent, and the formal name of the ‘pre-main’-compatible JAR file that I built using the instructions that Daniel provided. Tomcat won’t start up properly until the agent is happy; after that, you’re golden.

So I got Tomcat running, started up an ssh tunnel, and invoked JConsole. But no matter what I did, I still got ConnectException: Operation timed out. I tried to connect via JConsole in all the following ways:

HOSTNAME:PORT
service:jmx:rmi:///jndi/rmi://HOSTNAME:PORT/jmxrmi
service:jmx:rmi://HOSTNAME:PORT/jndi/rmi://HOSTNAME:PORT/jmxrmi

All of these are valid URLs for connecting via JConsole. For a while there I wasn’t sure whether you could use the same port for both the RMI registry and export, and I could see that the JConsole log output was different when I called out the RMI export info explicitly in the service URL. Still, it didn’t seem to help.

Then I started to realize that there were two separate issues going on, although they tended to blend together a lot in the posts I’d been reading.

Addressing the RMI Export Hostname

The short version is, even if you’ve set up your JMX service URL properly on the server — yes, even if you’ve set its HOSTNAME up to be ‘localhost’ — you’ll still need to tell JMX remoting which hostname the RMI export objects should use for callbacks. This requires you to provide the following System.property’s as well:

-Djava.rmi.server.hostname=localhost
-Djava.rmi.server.useLocalHostname=true

The useLocalHostname may not be relevant, but it doesn’t hurt. All this time I’d thought that because I was configuring that information in the service URL, RMI would build the objects accordingly. But I was wrong … it doesn’t … you need to call that out separately.

What was not apparent to me — until I started to see the same articles pop up when I revised my search criteria — was the IP-ADDRESS in this exception dump:

FINER: [javax.management.remote.rmi.RMIConnector: jmxServiceURL=service:jmx:rmi:///jndi/rmi://localhost:PORT/jmxrmi] failed to connect: java.rmi.ConnectException: Connection refused to host: IP-ADDRESS; nested exception is:
java.net.ConnectException: Operation timed out

It was the IP address of my instance in the cloud. The callbacks were being made back to my VM, but they needed to be made to ‘localhost’ so that they could go through the tunnel that I’d opened. The ‘Operation timed out‘ was due to the port being closed, which is the whole reason you’re using ssh tunnels in the first place. Once the RMI exported objects know to use ‘localhost’, that addresses the problem. And magically, JConsole will connect and show you all the data in the world about your server.

So you must provide those System.property’s above regardless of what other configuration you’ve provided in your custom JMX agent.

Additional Concerns

There were a number of other red herrings that I followed for a while, but they were called out as being potential issues, so I kept note.

  • If your server is running Linux, there are a couple of things you’ll want to check, to make sure that your /etc/hosts is self-referencing correctly, and that you’re not filtering out packets.
  • You will have troubles stopping Tomcat when it has been started with your custom JMX agent; you’ll have to kill the process. Apparently agents don’t release their thread-pools very nicely. Daniel provides an example of an agent with a thread-cleaning thread — which still has some limitations, and raises the philosophical question ‘who cleans the thread-cleaner‘? He also provides an agent that can be remotely stopped — which is reasonably complex. I’ll save that one for a rainy day.
  • If you want to use SSL in your authentication chain, read up on Daniel’s other postings, and use the following System.property’s on both the server and when starting JConsole:

    -Djavax.net.ssl.keyStore
    -Djavax.net.ssl.keyStorePassword
    -Djavax.net.ssl.trustStore
    -Djavax.net.ssl.trustStorePassword
  • I have built some Ruby scripts which allow me to dynamically look up an AWS instance’s public DNS entry and then start up a Net::SSH process with port-forwarding tunnels. This works fine for HTTP and even for remote JVM debugging, but it did not work for JMX remoting. I’m not sure why, so you should stick with using ssh for setting up your tunnels.
  • I started out this exercise using Wireshark for packet sniffing. I’m using OS X, and I installed Wireshark — the successor to Ethereal — via MacPorts. It runs under X11, which you’ll need to install from either Apple’s site or your Optional Installs DVD. I couldn’t get any Interfaces (ethernet card, etc.) to show up, until I learned that I should:

    sudo wireshark

    The app will warn you that this is unsafe, but it works. The Nominet team says that you can address this issue by providing:

    sudo chmod go+r /dev/bpf*

    However that is volatile, and has to be done whenever your Mac starts up. More config involved, so I took the easy path.

  • If you’re using a script to start and stop Tomcat, you’ll need to somehow separate out the System.property’s that should be used on startup, and omit them when invoking shutdown. If you invoke shutdown with your debug and/or RMI ports specified, the shutdown will fail because those ports are already in use.

    I’m using the newest standard Tomcat RPM available for Fedora Core 8 — tomcat5-5.5.27 — and it’s uniquely nutty in terms of how it is deployed:

    /etc/init.d/tomcat5
    /usr/bin/dtomcat5
    /etc/tomcat5/*.conf

    That’s a very non-standard arrangement. The init.d script awks the *.conf files and does a whole array of other exciting things. I still haven’t gotten it to properly do an init.d restart due to how it blends the JAVA_OPTS handling. So that’s left as a case-specific exercise.

  • The whole reason I went down this path was to address a memory leak relating to Hibernate sessions which I blogged about a long time ago. The fix required me to invoke Tomcat with the following System.property’s:

    -Djavax.rmi.CORBA.PortableRemoteObjectClass
    -Djava.naming.factory.initial

    The org.objectweb.carol JAR, which these settings were targeted at, is part of my webapp, so it’s available in its own Classloader. However, once I put the custom JMX agent in place, I got:

    FATAL ERROR in native method: processing of -javaagent failed
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    Caused by: java.lang.ClassNotFoundException: org.objectweb.carol.jndi.spi.MultiOrbInitialContextFactory

    Attempting to create a symlink to the app-specific JAR in either common/lib, common/endorsed or shared/lib did not address the issue. I had to hack the JAR into the --classpath in order to get Tomcat to start. And yes, hack was the operative term (again).

In Summary

Frankly, all that discovery was enough for one day. And yes, it took me that long to find all of the corner-cases I was dealing with. I hope that if you find this article, it will make your path a bit easier. I know I’ll be glad that I blogged about the details the next time I bump into the issue!