Jul 2, 2008

Writing sysctl Drivers on Linux

Writing drivers for the Linux kernel can be a grotesque experience, for neither the kernel nor the documentation is anywhere near A-1 at Lloyd's. It can be a lot of fun doing things in kernels, however, and it is the right place for a lot of hardware-related stuff, meta-process stuff, et cetera. I would like to demonstrate a rarely used technique for a fairly rarely used type of driver that is much simpler and faster than the oft-used character device driver.

The type of driver I am talking about handles sysctl requests, and I think you'll find it an excellent alternative to similarly generic character devices, especially since it can easily masquerade as one. I think sysctl drivers are particularly nice if you want an interface for the command line and for a high-speed C API with a minimal amount of thrashing around with Linux's cruft. sysctl means system control and it is a BSDism that found its way into Linux. Naturally, sysctl is also available on the fine BSDs like NetBSD and FreeBSD, but those systems aren't in as much need of a solid driver system, as they are quite well-designed. Therefore, Linux is my focus du jour.

There are three ways of interfacing with the kernel via a sysctl. First, defining a sysctl handler can, at your discretion, add a file to /proc/sys (assuming you are using procfs; nearly everyone is) that you can, with proper permissions, read and write like a character device. Secondly, you can access your handler with the sysctl utility that usually lives in /sbin. And, thirdly, you can use the sysctl system call, either through its aptly named library function, sysctl, or in a raw way. These direct access choices provide excellent efficiency.

There is a mild caveat, though. sysctl is intended for fairly low-bandwidth communication with the kernel. You have basically two options: you can use sysctl for fairly small, discrete chunks of data (as it is almost universally used in the kernel right now), or you can run some sort of stateful protocol over your accesses to it (I'd like to see more of this style of complex interfacing with sysctl handlers).

One very simple use of a sysctl driver is to just make a textual or numeric setting that can be accessed through the multiple ways enumerated above. This is very easy to do. To get started, simply fill out an array of ctl_table structures and then register that array with the register_sysctl_table function. When you register the table, a ctl_table_header is returned (or NULL if there is an error). If you want to remove your handler, you pass that table header to the unregister_sysctl_table function:

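    #define __KERNEL__
    #define MODULE

    /* The usual headers for a 2.4 module that registers sysctls */
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/sysctl.h>

    /* Nothing but the terminating empty entry for now */
    static ctl_table my_ctl_table[] = {
        {0}
    };

    static struct ctl_table_header *my_table_header;

    int init_module (void)
    {
        /* The second argument is insert_at_head */
        my_table_header = register_sysctl_table(my_ctl_table, 0);
        if (!my_table_header)
            return -EPERM;   /* errors are reported as negative errno values */
        return 0;
    }

    void cleanup_module (void)
    {
        if (my_table_header)
            unregister_sysctl_table(my_table_header);
    }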

init_module and cleanup_module are automatically called when your module is loaded and unloaded, respectively. The ctl_table array expects an empty entry ("{0}") at the end of the array, which also happens to be the beginning in this example. To compile the example, if it was named ctl_test.c, we would do something like this:

    gcc -I/usr/src/linux-2.4/include -c ctl_test.c

The result would be a tiny file named ctl_test.o that you could load with insmod. Your Linux source code might be somewhere other than this; change the path appropriately. You may need to download the Linux source. Although you probably have similarly named headers in your include path already (in /usr/include), these are missing information that you need; get the source if you can.

As root, you can load the module with "insmod ctl_test.o", check its existence with "lsmod", and then remove it with "rmmod ctl_test". Those utilities normally live in /sbin by the way.

Okay, now we will add a real ctl_table to the array in order to make the first functional device. The structure of ctl_table can be found in linux/sysctl.h and is this:

    struct ctl_table
    {
        int ctl_name;                 /* Binary ID */
        const char *procname;         /* Text ID for /proc/sys, or zero */
        void *data;
        int maxlen;
        mode_t mode;
        ctl_table *child;
        proc_handler *proc_handler;   /* Callback for text formatting */
        ctl_handler *strategy;        /* Callback function for all r/w */
        struct proc_dir_entry *de;    /* /proc control block */
        void *extra1;
        void *extra2;
    };

To understand this structure, let's look at how we will fill out this structure to make a simple string-based entry.

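    enum { CTL_MYTEST = 555 };
    char buffer[128];

    static ctl_table my_ctl_table[] = {
        {
            CTL_MYTEST,        /* ctl_name */
            "mytest",          /* procname */
            buffer,            /* data */
            sizeof(buffer),    /* maxlen */
            0666,              /* mode */
            NULL,              /* child */
            &proc_dostring,    /* proc_handler */
            &sysctl_string,    /* strategy */
        },
        {0}
    };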

The first member is the ctl_name and is a unique integer 'name' that is used to rapidly locate this sysctl. The Linux kernel defines the 'names' up to 10 or so (see linux/sysctl.h), so I like to use a nice high number. The second member, procname, is a string giving a human-readable name for the sysctl. This name will actually appear in /proc/sys as a file (e.g. /proc/sys/mytest for our code). If the procname is NULL, no file will be created and this sysctl will only be accessible through the sysctl function (and you can set proc_handler to NULL).

Data is transported to and from the handler with the next member, data, which in this case is a 128 character array called buffer. The size of data is given by the next member, maxlen. Getting this size right matters: the stock handlers used here (proc_dostring and sysctl_string) treat it as the limit on how much they copy into data, so it must never exceed the real size of the array. Next comes the access mode, which in our example allows read and write permissions for anyone who wants to use it (i.e. including non-root users). The next member is child, which allows another ctl_table array to be embedded; this is how directories are made:

    enum { CTL_MYTEST = 555, MYTEST_BUFFER = 1 };
    char buffer[128];

    /* The entries that live inside the directory */
    static ctl_table my_inner_ctl_table[] = {
        {
            MYTEST_BUFFER,
            "mytest",
            buffer,
            sizeof(buffer),
            0666,
            NULL,
            &proc_dostring,
            &sysctl_string,
        },
        {0}
    };

    /* The directory itself: no data, just a child table */
    static ctl_table my_ctl_table[] = {
        {
            CTL_MYTEST,
            "mydir",
            0,
            0,
            0,
            my_inner_ctl_table,
        },
        {0}
    };

Now you actually get /proc/sys/mydir/mytest. The last two members that we need to consider are proc_handler and strategy, a pair of function pointers that actually do the handling of the sysctl. Here are their types (strategy is of type ctl_handler):

    typedef int ctl_handler (ctl_table *table, int *name, int nlen,
                             void *oldval, size_t *oldlenp,
                             void *newval, size_t newlen,
                             void **context);

    typedef int proc_handler (ctl_table *ctl, int write, struct file * filp,
                              void *buffer, size_t *lenp);

We'll get to writing our own handlers in a jiffy (wink), but, in our example we are using a pair of very convenient, ready-to-use handlers called proc_dostring and sysctl_string. These will automatically handle the getting and setting of a string using buffer. We could have used their int array cousins, proc_dointvec and sysctl_intvec for lists of integers. These functions are defined in kernel/sysctl.c.

So, upon compiling and loading this module we get the file as before, mytest, in the /proc/sys tree. Now we can set the buffer by doing either of these two commands (if we use a directory with the sysctl command-line utility, we join the path elements with a "." character (e.g. mydir.mytest)):

    echo 'test string' > /proc/sys/mytest
or
    sysctl -w mytest='test string'

Both of these commands will now cause "cat /proc/sys/mytest" to return "test string". Now, if we wanted to access it from a high speed C program, this is how we would do it (note that when writing, we are passing the buffer_length by value and not by reference):

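    #include <stdio.h>
    #include <string.h>
    #include <sys/sysctl.h>   /* sysctl() as declared by glibc of this era */

    enum { CTL_MYTEST = 555 };

    int main (void)
    {
        int name[] = { CTL_MYTEST };
        int name_length = 1;
        char buffer[128];
        size_t buffer_length = sizeof(buffer);

        /* Reading: the length goes in by reference and is updated */
        if (sysctl(name, name_length, buffer, &buffer_length, 0, 0) == 0)
            printf("mytest: %s\n", buffer);
        else
            perror("sysctl read");

        /* Writing: the length goes in by value */
        strcpy(buffer, "potato");
        buffer_length = strlen(buffer);
        if (sysctl(name, name_length, 0, 0, buffer, buffer_length) != 0)
            perror("sysctl write");

        return 0;
    }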

What remains now is to customize the proc_handler and/or ctl_handler in our driver so that we can wind up doing just about anything through sysctl calls. Only one of the two handlers is called for a given access: the proc_handler if you go through the /proc interface (e.g. with cat or the sysctl CLI utility), the ctl_handler if you use the sysctl function call. If the ctl_handler returns 1 or more, the kernel treats the request as fully handled and skips its automatic copy of data; if it returns 0, the automatic copy still takes place. With the proc_handler you get an actual file, a read/write flag, and an input/output buffer, whereas with the ctl_handler you get arguments that very closely mirror the sysctl function call arguments:

    typedef int ctl_handler (ctl_table *table, int *name, int nlen,
                             void *oldval, size_t *oldlenp,
                             void *newval, size_t newlen,
                             void **context);

    typedef int proc_handler (ctl_table *ctl, int write, struct file * filp,
                              void *buffer, size_t *lenp);

If you don't define a strategy in your ctl_table, the kernel will automatically copy the data, by the way, and you can mix and match the prebuilts (say, to produce formatted integer array output) with your own functions. What would be fun here, though, would be to hook into the call and do whatever you want in the kernel. You get an easy data transport mechanism that can readily handle your own custom C struct via a ctl_handler, multiple access methods, and more, and all you have to do is define a custom handler. For example:

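    #include <asm/uaccess.h>   /* copy_to_user, copy_from_user, put_user */

    static int my_ctl_handler (ctl_table *table, int *name, int nlen,
                               void *oldval, size_t *oldlenp,
                               void *newval, size_t newlen,
                               void **context)
    {
        static const char reply[] = "bonk!";   /* 6 bytes with the NUL */
        char *data = table->data;
        size_t len;

        if (oldval) {
            printk("Ah, so you wanted the value. Well, Bonk!\n");
            if (copy_to_user(oldval, reply, sizeof(reply)))
                return -EFAULT;
            if (oldlenp && put_user(sizeof(reply), oldlenp))
                return -EFAULT;
        }
        if (newval && table->maxlen > 0) {
            /* Clamp to the buffer size and keep room for the terminator */
            len = newlen;
            if (len > (size_t) table->maxlen - 1)
                len = table->maxlen - 1;
            if (copy_from_user(data, newval, len))
                return -EFAULT;
            data[len] = '\0';
            printk("Good golly, someone presented '%s'!\n", data);
        }
        return 1;   /* positive: the request has been handled here */
    }

    static ctl_table my_ctl_table[] = {
        {
            CTL_MYTEST,
            "mytest",
            buffer,
            sizeof(buffer),
            0666,
            NULL,
            &proc_dostring,
            &my_ctl_handler,
        },
        {0}
    };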

Note that we return 1 to tell the kernel that the handler has dealt with the request itself, so the automatic copy of buffer is skipped. If we run the test program we made above, it outputs "mytest: bonk!". And, if we then execute the dmesg utility, the following appears in the log:

    Ah, so you wanted the value. Well, Bonk!
    Good golly, someone presented 'potato'!

You can do anything once you are in here! For example, you could dynamically create more ctl_table entries to give unique handles to a caller. On the reception of an 'open' sysctl, you could generate a newly numbered sysctl and return that number. You would then have a unique 'device' to access (much like a socket, etc).

You can also tap the proc_handler in order to change the formatted appearance of the file or to make it take the extra trip into the ctl_handler:

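    static int my_proc_handler (ctl_table *ctl,
                                int write,
                                struct file *filp,
                                void *buffer,
                                size_t *lenp)
    {
        if (write) {
            /* buffer is a user-space pointer, just like newval */
            my_ctl_handler(ctl, NULL, 0, NULL, NULL, buffer, *lenp, NULL);
        } else if (filp->f_pos) {
            /* The value has already been delivered: report end of file */
            *lenp = 0;
        } else {
            /* lenp points into kernel memory, so it must not be handed
             * to put_user as oldlenp; set the length here instead. */
            my_ctl_handler(ctl, NULL, 0, buffer, NULL, NULL, 0, NULL);
            *lenp = 6;   /* "bonk!" plus its terminating NUL */
            filp->f_pos = 1;
        }
        printk("We were in the proc_handler\n");
        return 0;
    }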

Now, if we do an "echo 'peekaboo' > /proc/sys/mytest", dmesg shows (note the tacked-on newline):

    Good golly, someone presented 'peekaboo
    '!
    We were in the proc_handler

If we do a "cat /proc/sys/mytest" we get "bonk!". If we do dmesg then we see:

    Ah, so you wanted the value. Well, Bonk!
    We were in the proc_handler
    We were in the proc_handler

Note that it entered the proc_handler twice. When reading, the handler keeps being entered until it sets *lenp to 0. So, we use the file position to mark that we've already delivered the value. Note that you don't have to call the ctl_handler from the proc_handler or vice versa. You could have the ctl_handler use a nice binary struct format (perfect for your own userland API that hops into the kernel for stuff) while the proc_handler provides a more human-readable interface.

Well, that rounds up this quick look at Linux sysctl driver programming. The rest is up to you. Once you are in the kernel, you really can do just about anything you want. Happy hacking!
