Monday, December 30, 2013

How to correctly parse IPv6 addresses (in C and Java)

I recently started doing some bug fixing in GNU Netcat. One of the things I worked on was better support for IPv6. In principle, IPv6 support was added to GNU Netcat quite some time ago on the trunk (aka 0.8-cvs), but it turned out that it didn't really work. After fixing two obvious bugs (c8c0234, 714dcc5), I stumbled over another interesting issue.

One experiment I wanted to do with Netcat was to connect to another host over IPv6 using a link-local address. With IPv6, a link-local address is assigned automatically to each interface that has a MAC address (i.e. all Ethernet interfaces, but not the loopback interface). The IPv6 address is derived from the MAC address using the modified EUI-64 format and is unique (because MAC addresses are unique). E.g. an interface with the MAC address 08:00:27:84:0b:e2 would get the following IPv6 address: fe80::a00:27ff:fe84:be2.

The problem with link-local addresses is that because of the way they are defined, the routing code in the operating system has no clue which interface it has to use in order to send packets to such an address. Here is where zone IDs come into play. The zone ID (also called scope ID) is a new feature in IPv6 that has no equivalent in IPv4. Basically, in the case considered here, it identifies the interface through which packets have to be sent (but the concept is more general).

Together with the concept of zone ID, the IPv6 specification also introduced a distinct notation to represent an address with an associated zone ID:

<address>%<zone_id>

In the case considered here, the zone ID is simply the interface name (at least, that is how it works on Linux and Mac OS X). E.g. assuming that the remote host with MAC address 08:00:27:84:0b:e2 is attached to the same network as the eth0 interface on the local host, the complete address including the zone ID would be:

fe80::a00:27ff:fe84:be2%eth0

This address can indeed be used with programs such as SSH to connect to the remote host. Unfortunately that didn't work with GNU Netcat:

$ netcat -6 fe80::a00:27ff:fe84:be2%eth0 22
Error: Couldn't resolve host "fe80::a00:27ff:fe84:be2%eth0"

That raises the question of how to correctly parse host parameters (passed on the command line or read from a configuration file) so that IPv6 addresses with zone IDs are recognized. It turns out that Netcat was using the following strategy:

  1. Attempt to use inet_pton to parse the host parameter as an IPv4 or IPv6 address.
  2. If the host parameter is parsable neither as an IPv4 address nor as an IPv6 address, assume that it is a host name and use gethostbyname to look up the corresponding address.

The problem with that strategy is that although inet_pton and gethostbyname both support IPv6 addresses, they don't understand zone IDs. That is to be expected because both functions produce an in6_addr structure, but the zone ID is part of the corresponding socket address structure sockaddr_in6.

To fully support IPv6, several enhancements have been introduced in the Unix socket APIs. In our context the getaddrinfo function is the most relevant one. It is able to parse IP addresses and to translate host names, but in contrast to inet_pton and gethostbyname it produces sockaddr_in6 (or sockaddr_in) structures and fully supports zone IDs.

As a conclusion, to write C code that supports all types of IP address including IPv6 addresses with zone IDs, use the following approach:

  1. Don't use inet_pton and gethostbyname; always use getaddrinfo.
  2. Don't assume that the information needed to connect to a remote host can be stored separately as a host address (in_addr or in6_addr) and a port number: that is only true for IPv4, not for IPv6. Instead, always use a socket address so that the zone ID can be stored as well. Obviously there are use cases where the host address and port number need to be processed at different places in the code (consider e.g. a port scanner that takes a host address/name and a port range). In those cases, you can still use getaddrinfo, but with a NULL value for the service argument. You then have to store the partially filled socket address and fill in the port number later.
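A minimal sketch of that approach is shown below. Error handling is kept to a minimum, and the zone ID lo is used purely so that the example runs on any Linux machine (the loopback interface always exists); in a real invocation the interface name would come from the user, as in the Netcat example above:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    /* Pass NULL as the service argument: getaddrinfo returns a
     * partially filled socket address (including the zone ID in
     * sin6_scope_id); the port number is set afterwards. */
    int err = getaddrinfo("fe80::a00:27ff:fe84:be2%lo", NULL, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return 1;
    }
    if (res->ai_family == AF_INET6) {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)res->ai_addr;
        sin6->sin6_port = htons(22); /* complete the port number later */
        /* The zone ID was translated to the interface index: */
        printf("scope id: %u\n", sin6->sin6_scope_id);
    }
    freeaddrinfo(res);
    return 0;
}
```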

Unfortunately, fixing existing code to respect those guidelines may require some extensive changes.

Interestingly, things are much easier and more natural in Java. Java considers the zone ID to be part of the host address (an Inet6Address instance in this case), so that the socket address (InetSocketAddress) simply comprises a host address and a port number, exactly as in IPv4. This means that any code that uses the standard InetAddress.getByName method to parse an IP address will automatically support IPv6 addresses with zone IDs. This is true even for code not specifically written with IPv6 support in mind (and even for code written before the introduction of IPv6 support in Java 1.4), unless of course the code casts the returned InetAddress to an Inet4Address, or is not prepared to encounter a : in the host address, e.g. because it uses : as the separator between the host address and the port number.

Thursday, December 19, 2013

Inspecting socket options on Linux

The other day the question came up whether on Linux it is possible to determine the socket options for a TCP socket created by some running process. The lsof command actually has an option for that (-T f), but it is not supported on Linux. The reason is that socket options are not exposed via the /proc filesystem. This means that the only way to get at them is to use strace, ltrace or similar tools. This is problematic because these tools require some special setup and/or produce large amounts of data that one needs to analyze in order to extract the desired information. Moreover, since they trace the invocation of the setsockopt syscall, they have to be used at socket creation time and are useless if one needs to determine the options set on an already created socket.

In some cases, it is possible to determine the setting of a particular socket option indirectly. E.g. the netstat -to command makes it possible to determine whether SO_KEEPALIVE is enabled on the socket for an established TCP connection: the -o option displays the currently active timer for each socket, and for established TCP connections with SO_KEEPALIVE set, this will be the keepalive timer. Obviously this is not a general solution since it only works for a small subset of all socket options.

To solve that issue, my original plan was to patch the Linux kernel to add the necessary information to the relevant files in /proc/net (tcp, tcp6, udp, udp6, etc.). However, it turned out that this is not a trivial change (such as adding a format specifier and argument to a printf call):

  • The files in /proc/net are not meant to be human readable; they define an interface between the kernel and user space tools. This means that before adding information about socket options, one first has to carefully define the format in which this information is presented.
  • The code that formats the entries in the various files in /proc/net is scattered over multiple files and partially duplicated (see e.g. tcp4_seq_show in net/ipv4/tcp_ipv4.c and tcp6_seq_show in net/ipv6/tcp_ipv6.c, as well as the functions called by these two functions).

That's why I finally settled on another idea, namely to write a kernel module that adds new files with the desired information to /proc/net. These files would be human readable (with a format similar to the output of the netstat command), so that one has more flexibility with respect to the presentation of the information in these files.

Fortunately the TCP/IP stack in Linux exports just enough of the relevant functions to enable reusing part of the code that generates the /proc/net/tcp and /proc/net/tcp6 files, making it fairly easy to implement such a kernel module. The source code for the module is available as a project on GitHub called knetstat. After building and loading the knetstat module, two new files appear in /proc/net:

$ cat /proc/net/tcpstat
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Options
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0 127.0.0.1:6011          0.0.0.0:*               LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0 127.0.0.1:6010          127.0.0.1:59038         ESTABLISHED SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0 127.0.0.1:59038         127.0.0.1:6010          ESTABLISHED SO_REUSEADDR=0,SO_KEEPALIVE=1
tcp        0      0 192.168.1.15:22         192.168.1.6:57125       ESTABLISHED SO_REUSEADDR=1,SO_KEEPALIVE=1
tcp        0      0 192.168.1.15:22         192.168.1.6:57965       ESTABLISHED SO_REUSEADDR=1,SO_KEEPALIVE=1
$ cat /proc/net/tcp6stat
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Options
tcp6       0      0 ::1:6010                :::*                    LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp6       0      0 ::1:6011                :::*                    LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp6       0      0 :::22                   :::*                    LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0

As implied by the name of the module, the format is indeed similar to the output of netstat, except of course for the last column with the socket options:

$ netstat -tan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.0.1:6010          0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:6011          0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:6010          127.0.0.1:59038         ESTABLISHED
tcp        0      0 127.0.0.1:59038         127.0.0.1:6010          ESTABLISHED
tcp        0      0 192.168.1.15:22         192.168.1.6:57125       ESTABLISHED
tcp        0      0 192.168.1.15:22         192.168.1.6:57965       ESTABLISHED
tcp6       0      0 ::1:6010                :::*                    LISTEN     
tcp6       0      0 ::1:6011                :::*                    LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN     

Note that at the time of writing, knetstat only supports a small set of socket options and lacks support for socket types other than TCP. Check the README.md file for the current list of supported features.

Building and installing a vanilla Linux kernel on Ubuntu

This post describes a simple procedure to build and install a new Linux kernel on Ubuntu using the official source code from the kernel developers' Git repository. The aim is to produce a kernel that can be used as a drop-in replacement for the kernels shipped by Ubuntu and that fits neatly into the distribution. The procedure was tested with Linux 3.12 on Ubuntu 13.10.

  1. Ensure that you have enough free disk space. Building the kernel using the present procedure may require up to 13 GB (!) of storage.

  2. Install the necessary build tools:

    sudo apt-get install kernel-package git
    
  3. Download the kernel sources:

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    
  4. Check out the tag or branch for the kernel version you want to build. For example:

    cd linux
    git checkout v3.12
    
  5. Copy the configuration of the Ubuntu kernel. For the currently running kernel, use the following command:

    cp /boot/config-$(uname -r) .config
    
  6. Initialize new configuration options to their default values:

    yes "" | make oldconfig
    
  7. Use make-kpkg to compile the kernel and create Debian packages. You may want to use --append-to-version to add something to the version number, e.g. if you intend to apply patches to the kernel:

    fakeroot make-kpkg --initrd --append-to-version=-patched kernel-image kernel-headers
    
  8. Go back to the parent directory and install the generated packages using dpkg -i. This should take care of creating the initial ramdisk and configuring the boot loader. You can now reboot your system to load the new kernel.

Monday, December 9, 2013

Broken by design: WebSphere's default StAX implementation (part 2)

A few weeks ago I posted an article describing a vulnerability in WebSphere's default StAX implementation (XLXP 2). In the meantime, IBM has acknowledged that the problem I described indeed causes a security issue and they have produced a fix (see APAR PM99450). That fix introduces a new system property called com.ibm.xml.xlxp2.api.util.encoding.DataSourceFactory.bufferLoadFactor described as follows:

The value of the property is a non-negative integer which determines the minimum number of bytes (as a percentage) that will be loaded into each buffer. The percentage is calculated with the following formula: 1/2^n.

When the system property is not set, its default value is 3. Setting the property to a lower value than the default can improve memory usage but may also reduce throughput.

In the last sentence IBM makes an interesting statement that raises some questions. Why would a change enabling the parser to read data into an already reserved and partially filled buffer potentially cause a reduction in throughput? To answer that question, one has to understand how IBM actually implemented that feature. Fortunately this doesn't require access to the source code. It is enough to carefully examine the visible behavior of the parser, namely by feeding it an InputStream that returns data in small chunks and determining the correlation between read operations on that InputStream and the events produced by the parser.

This reveals that the parser now uses the following algorithm: if after a first read operation the fill level determined by the new system property is not reached, a second read request will be issued immediately and this is repeated until the required fill level has been reached. The implication of this is that XMLStreamReader.next() may need to read much more data than what is necessary to produce the next event. Stated differently, a call to XMLStreamReader.next() may block even if enough data is available in the input stream.

As a matter of fact, this may have an impact on performance. Consider e.g. the processing of a SOAP message. With a well designed StAX implementation, the SOAP stack will be able to start processing the SOAP header before the SOAP body has been received completely. That is because a StAX implementation is expected to return an event as soon as enough data is available from the underlying input stream. E.g. if the next event is START_ELEMENT then a call to XMLStreamReader.next() should return as soon as the start tag for that element has been read from the stream. With the change introduced by PM99450, that assumption is no longer true for XLXP 2.