A Software Tool to Bind Allegro CL to C Libraries


Draft History:


2000Oct31 Update to version 1.6.1 - ACL 6.0
2004Jun12 Update to 1.6.3 - ACL 7.0 and 8.0



Introduction

The purpose of this tool is to facilitate the creation of an interface between Allegro CL and a library of C functions and subprograms.

The C library is typically defined by one or more header files that declare the functions and variables that make up the interface and define named constants significant at the interface. This tool parses the header files and generates appropriate Lisp and C code to create the interface. In some cases, warning messages are generated to point out areas that may need additional programmer intervention.

The Binder tools assume that the (header file) input is a complete and accurate representation of the _interface_ to an application library.

The Binder tools are not designed to examine a complete application and to somehow abstract out the interface definition. We assume that this abstraction step has already been taken by the time the input is presented to the tools.

For third-party libraries, this assumption typically holds. The library is defined by a set of header files that are needed to write correct C code that calls the library. The same headers are necessary and usually sufficient to generate the Lisp interface to the library.

For user-written applications or libraries, the user must create appropriate header files defining the interface. The binder tools will fail miserably if simply presented with the application code files. For example, the bodies of function definitions are skipped entirely by the source code analyzer; thus, if a type declaration occurs only in the scope of a function definition, it will never be seen.
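The last point can be illustrated with a small C file. This is only a sketch: the names point_t, scale and local_only are invented for the example and do not come from the tool. Since function bodies are skipped, only the file-scope typedef and the prototype would be candidates for the generated interface.

```c
#include <assert.h>

/* File-scope declarations: the analyzer sees these, so interface
   components can be generated for them.  (Hypothetical names.) */
typedef struct point { int x; int y; } point_t;

int scale(const point_t *p, int factor);

int scale(const point_t *p, int factor)
{
    /* Declared only inside a function body.  The body is skipped by
       the analyzer, so this type never reaches the interface. */
    struct local_only { int sum; } acc;

    assert(p != 0);
    acc.sum = (p->x + p->y) * factor;
    return acc.sum;
}
```

If struct local_only were part of the interface, it would have to be moved to file scope in a header before running the binder.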

Using the Tool

The Main Conversion Function

The binding process is normally invoked with the function ff:build-c-binding with the following arguments:

ff:build-c-binding c-or-h-file             Function
                          &key (c-args "")
                               (lisp-out t)
                               (c-out t)

                               include
                               exclude

                               (package nil)

                               (case ff:*decode-intern-case*)
                               (hyphen ff:*decode-intern-hyphen*)
                               (dash ff:*decode-intern-dash*)
                               (res ff:*decode-intern-res*)

                               (verbose nil)

The first and only required argument, c-or-h-file, is the pathname to a file of C code. This file, and any files that it includes, will be parsed in order to determine what interface components need to be generated. This is normally a header file (.h file type or extension), but may also be a C source file.

The keyword argument c-args is very important to the correct functioning of the interface generator. It is a string containing all the -D and -I switches required for a complete and correct compilation of the file specified in the c-or-h-file argument. If this argument is missing or incorrect, the effect is usually to print many pre-processor and parser warnings and error messages; but it may also silently result in an incorrect interface definition.
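As a sketch of how a missing switch can silently change the interface, consider the header fragment below. The macro WIDE_IDS and the type names are hypothetical. If the library was built with -DWIDE_IDS but that switch is omitted from c-args, the fragment still parses without any warning, yet the id field has a different type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header fragment: the layout of struct record depends
   on whether -DWIDE_IDS was given on the command line. */
#ifdef WIDE_IDS
typedef long long record_id;
#else
typedef int record_id;
#endif

struct record {
    record_id id;
    char      tag[4];
};

/* Reports the size the C compiler actually used for the id field. */
size_t record_id_size(void)
{
    return sizeof(record_id);
}

/* Fills in a record; r must not be NULL. */
void record_init(struct record *r, record_id id)
{
    assert(r != NULL);
    r->id = id;
    memset(r->tag, 0, sizeof r->tag);
}
```

Compiled without -DWIDE_IDS, record_id_size() reports the size of an int; compiled with it, the size of a long long.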

The keyword argument lisp-out is the pathname of a file where the generated Lisp code is placed. If this argument is NIL, nothing is generated. If this argument is t, the output is printed to the current value of *standard-output*.

The keyword argument c-out is the pathname of a file where the generated C code is placed. If this argument is NIL, nothing is generated. If this argument is t, the output is printed to the current value of *standard-output*. C program text is generated when wrapper functions are needed to transmit data correctly between Lisp and C; one situation is when we need to pass structure arguments by value between Lisp and C.

The keyword arguments include and exclude determine which C files are used to generate Lisp interface components. Only one of these arguments may be used in any call to ff:build-c-binding. When specified, the value must be a pathname or a list of pathnames. The effect of the include keyword is to generate Lisp interface components only from the specified file or files; any other C files included in the compilation are ignored by the Lisp binder but may be essential to the C parsing stage of the process. The effect of the exclude keyword is to ignore the specified files in the Lisp binder. For large libraries using complex collections of include files, it may be necessary to make several passes through the binding process in order to sort out the files needed in the Lisp interface. The file selection arguments apply only to interface components generated from C statements. The current implementation cannot determine the origin of a C macro and therefore all constant definitions are always included in the generated output.

The keyword arguments case, hyphen, dash, and res control how foreign symbol strings are translated into Lisp symbols. The default values are taken from corresponding special variables described in the sections on Lisp output and customization. The default behavior is to convert foreign name strings according to the current readtable case and to signal an error if a conflict occurs. A conflict occurs when two different foreign strings map to the same Lisp symbol or when the Lisp symbol is already bound or defined as a function or macro.

The keyword argument package is the name of the package in which foreign symbols are interned. The default is NIL, which means that the value of the global variable ff:*default-foreign-symbol-package* is used. If the value of that variable is also NIL, a package defined as follows is used:

(defpackage "C" (:use common-lisp foreign-functions))

It is a good idea to place all the foreign symbols in a Lisp package that does not use any other Lisp packages. This is the only definitive way to avoid symbol name conflicts.

The keyword argument verbose controls the amount of information printed when a parse or translation error is encountered. When T, a fragment of the parse tree is printed.

Generated Lisp Output

The generated Lisp output consists of calls to binder macros such as ff:bind-c-function, ff:bind-c-type, and ff:bind-c-constant, illustrated in the examples below.

The purpose of these macros is to emit an ff:bind-c-export form that conditionally exports the Lisp symbols corresponding to foreign identifiers, and to allow further customization of the generated code when the user so desires.

When the C file contains a function declaration, a Lisp form such as the one below is generated:


;; c-ex01.h:7 <2> 
;; struct hostent* gethostbyname( const char*, struct hostent*, char*, int,
;;              int* h_errnop);
(bind-c-function gethostbyname
     :unconverted-entry-name "gethostbyname"
     :c-return-type ("struct" "hostent" "*")
     :return-type (* hostent)
     :c-arg-types (("const" "char" "*") ("struct" "hostent" "*") ("char" "*")
                   ("int") ("int" "*"))
     :c-arg-names (Arg0 Arg1 Arg2 Arg3 h_errnop)
     :arguments ((* :char) (* hostent) (* :char) :int (* :int))
     :prototype t
     )

When compiled, this form macroexpands into a suitable ff:defforeign form.

The leading comment identifies the C statement number in the source file and shows a reconstruction of the original C source declaration. The Lisp type specified for each argument in the :arguments list is the most specific Lisp type that includes all the possible Lisp argument types.

When the C file contains a struct declaration or a typedef, a corresponding ff:bind-c-type or ff:bind-c-typedef form is generated:


;; c-ex01.h:10 <3> 
;; struct servent {
;;         char* s_name; char** s_aliases; int s_port; char* s_proto; };
(bind-c-type servent (:struct
  (s_name (* :char))   ;; char* s_name
  (s_aliases (* :char))   ;; char** s_aliases
  (s_port :int)   ;; int s_port
  (s_proto (* :char))   ;; char* s_proto
  ))   ;; bind-c-type servent

;; c-ex01.h:17 <4> typedef unsigned long ulong;
(bind-c-type ulong :unsigned-long)  

Constants defined in the C source files are translated to ff:bind-c-constant forms.

;; #define BAR 17 
(BIND-C-CONSTANT BAR 17)     ;; 0x11  
;; #define XST_DATARCVD 6 
(BIND-C-CONSTANT XST_DATARCVD 6)     ;; 0x6  

;; #define SETRGBSTRINGA "commdlg_SetRGBColor" 
(BIND-C-CONSTANT SETRGBSTRINGA "commdlg_SetRGBColor")     

;; #define WM_DDE_FIRST 992 
(bind-c-constant WM_DDE_FIRST 992)     ;; 0x3e0  
(bind-c-constant WM_DDE_TERMINATE (+ WM_DDE_FIRST 1))     
(bind-c-constant WM_DDE_ADVISE (+ WM_DDE_FIRST 2))     

When the C file contains a macro definition that defines an alternate name for a declared function, the additional names appear in a :all-names keyword argument in the ff:bind-c-function form. The value is an alist of a symbol and foreign string pair for each name of the foreign function.

By default this list is sorted by the length of the foreign string. The first item in the list is used as the name of the primary Lisp function defined by defforeign. The other names are used to define alternate macros with a bind-c-alternate form such as:

;; #define AddAtom AddAtomA
(BIND-C-ALTERNATE ADDATOMA (&rest args) `(ADDATOM ,@args))

Generated C Output

When a C function is declared to receive a struct by value, we need to generate some new C program code because ACL only allows struct pointers as arguments. We create an intermediate C function that receives a pointer argument and passes the dereferenced pointer to the intended function. When this situation is encountered in the C source file, the following comment and definition are generated.


;; c-ex02.h:7 <5> void passHostEnt( struct hostent x);

;;NOTE: C wrapper needed to pass structure or union type
;;   hostent
;;   as argument.

(bind-c-function passHostEnt
     :unconverted-entry-name "ACL_passHostEnt"
     :c-return-type ("void")
     :return-type :void
     :c-arg-types (("struct" "hostent" "*"))
     :c-arg-names (x)
     :arguments ((* hostent))
     :prototype t
     )

Note how the foreign name in this case is not identical to the Lisp name of the C function.

The additional C definition is generated in the C output file as follows:


/* Wrapper function to dereference pointers to structure arguments. */
void  ACL_passHostEnt(  struct hostent * x)
{
  passHostEnt(*x);
}
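The same pattern can be tried in isolation. The sketch below uses a made-up struct pair and function pair_sum instead of struct hostent; only the ACL_ prefix follows the generated code shown above.

```c
#include <assert.h>

/* Hypothetical library type and a function that takes it by value. */
struct pair { int first; int second; };

int pair_sum(struct pair p)
{
    return p.first + p.second;
}

/* Wrapper in the style of the generated code: it receives a pointer,
   dereferences it, and forwards the struct by value. */
int ACL_pair_sum(struct pair *p)
{
    assert(p != 0);
    return pair_sum(*p);
}
```

The caller on the Lisp side then only ever handles a pointer to the structure, which is the argument form the text says ACL accepts.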

When a C function returns a struct by value, a similar wrapper is generated to return a pointer to the structure in freshly allocated malloc memory.


;; c-ex02.h:9 <6> struct hostent ReturnHostEnt( int);

;;NOTE: C wrapper needed to return structure or union type
;;   hostent.

(bind-c-function ReturnHostEnt
     :unconverted-entry-name "ACL_ReturnHostEnt"
     :c-return-type ("struct" "hostent" "*")
     :return-type (* hostent)
     :c-arg-types (("int"))
     :c-arg-names (Arg0)
     :arguments (:int)
     :prototype t
     )

Generated C wrapper:


/* Wrapper function to return pointer to structure. */
int  ACL_ReturnHostEnt(  int Arg0)
{
  int ptr = (int)malloc(sizeof(struct hostent ));
  *((struct hostent *)ptr) = ReturnHostEnt(Arg0);
  return(ptr);
}
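Note that the wrapper above stores the pointer in an int, which only works on platforms where a pointer fits in an int. The following is a sketch of a pointer-typed variant of the same idea, not output of the tool; struct pair and make_pair are hypothetical stand-ins for the library type and function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical library type and a function returning it by value. */
struct pair { int first; int second; };

struct pair make_pair(int n)
{
    struct pair p;
    p.first = n;
    p.second = n + 1;
    return p;
}

/* Copies the returned struct into freshly malloc'ed memory and hands
   back a pointer to it.  Returns NULL if the allocation fails.  The
   caller owns the memory and must eventually release it with free(). */
struct pair *ACL_make_pair(int n)
{
    struct pair *ptr = malloc(sizeof *ptr);

    if (ptr != NULL)
        *ptr = make_pair(n);
    return ptr;
}
```

Because each call allocates a new block, code that calls such a wrapper repeatedly needs some way to free the results, or memory will leak.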

When the C function returns an unsigned value, we generate a Lisp wrapper to extract the value correctly.

Customizing Lisp Output

The macros ff:bind-c-function and friends are generated in the Lisp output file to allow convenient user customization of the actual foreign interface. The built-in definition in file cdbind.cl emits an ff:defforeign form built by simply selecting the appropriate components of the ff:def-c-binding form. Users with their own foreign type layer may use other components of that form to generate a more specific foreign interface call.

The function ff::decode-intern is used exclusively to convert strings to symbols in the binder. It is a function of one argument, a string or symbol. If the argument is a symbol, the function assumes it is already converted and does nothing.

If the argument is a string, this function uses the following special variables (re-bound by build-c-binding) to determine how the conversion is done:


*decode-intern-hyphen*

   NIL - no effect
    T  - Insert a hyphen at every lower-to-upper transition and every
         case-sensitive-to-insensitive transition:

        MenuItemFromPoint  ==>  Menu-Item-From-Point
        Menu3              ==>  Menu-3
         
*decode-intern-dash*

   NIL - no effect
    T  - translate underscore characters to hyphens


*decode-intern-case*
   
   a readtable   - use the value of readtable-case for that readtable
   :READER       - use the value of readtable-case for *readtable*
   :PRESERVE     - keep the case of the string unchanged
   :DOWNCASE     - convert the string to lowercase letters
   :UPCASE       - convert the string to uppercase letters
   :INVERT       - if all the case-sensitive letters are of one case
                   switch them all to the other case, otherwise leave it be


*decode-intern-res*

   :error - signal an error if a name conflict occurs
   :index - try adding 0, 1, 2, ... to the end of the string until 
            the conflict is resolved
   a list - take suffix strings in order from the list and append 
            to the foreign string until the conflict is resolved.  If
            the list runs out, signal an error.

If a different name translation scheme is desired, the function decode-intern must be redefined as required.
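To make the hyphen, dash and :INVERT rules concrete, here is a rough C sketch of the string transformations. It is not the binder's implementation (which is the Lisp function decode-intern); the function names are invented, and "case-insensitive character" is assumed here to mean a digit, which matches the Menu3 example but may be narrower than the real rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Applies the hyphen rule (if hyphen is nonzero) and the dash rule
   (if dash is nonzero) to in, writing the result to out.
   Returns 0 on success, -1 if out (capacity outsize) is too small. */
int decode_name(const char *in, int hyphen, int dash,
                char *out, size_t outsize)
{
    size_t n = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char prev = i > 0 ? (unsigned char)in[i - 1] : 0;
        unsigned char cur = (unsigned char)in[i];

        /* Hyphen at lower-to-upper and at letter-to-digit transitions. */
        if (hyphen && i > 0 &&
            ((islower(prev) && isupper(cur)) ||
             (isalpha(prev) && isdigit(cur)))) {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = '-';
        }
        if (n + 1 >= outsize)
            return -1;
        /* Dash rule: underscores become hyphens. */
        out[n++] = (dash && cur == '_') ? '-' : (char)cur;
    }
    if (n >= outsize)
        return -1;
    out[n] = '\0';
    return 0;
}

/* :INVERT rule: if all cased letters share one case, flip them all;
   otherwise leave the string unchanged. */
void invert_case(char *s)
{
    int upper = 0, lower = 0;
    char *p;

    assert(s != NULL);
    for (p = s; *p != '\0'; p++) {
        if (isupper((unsigned char)*p))
            upper = 1;
        else if (islower((unsigned char)*p))
            lower = 1;
    }
    if (upper && lower)
        return;
    for (p = s; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        *p = (char)(isupper(c) ? tolower(c) : toupper(c));
    }
}
```

With the hyphen rule enabled, decode_name turns MenuItemFromPoint into Menu-Item-From-Point and Menu3 into Menu-3, as in the examples above; invert_case turns WM_DDE_FIRST into wm_dde_first and leaves a mixed-case name such as AddAtom alone.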

Warnings and Notes in Generated Output

Messages of the form

 ... ;;WARNING: ... 

are emitted when the binder detects a situation where the generated code may function incorrectly or when the binder is unable to generate any correct code at all.

Messages of the form

 ... ;;NOTE: ... 

are emitted when the binder generates additional Lisp or C code that may be needed to use the interface effectively.

In many cases, inspection of the generated code and specific application knowledge will reveal that the generated code is adequate. In some cases it may be necessary to modify the generated code.

The binder generates possibly incorrect code with a warning in order to prevent the cascade of subsequent warnings that generating nothing might cause. This can happen, for example, when a type cannot be generated correctly:

(ff:bind-c-constant "foo" "StrData")  ;;WARNING: C code expects wide string

This warning is emitted when the C string is defined with the L modifier.


;; c-ex03.h:6 <7> typedef long long LongLong;

;;WARNING:  'long long' is implemented as a struct of 2 long!
(bind-c-type LongLong long-long)  

The binder defines and emits the following 64 and 128-bit types:

long long              -> FF:LONG-LONG
unsigned long long     -> FF:UNSIGNED-LONG-LONG
long double            -> FF:LONG-DOUBLE

These are defined as structs of two long or two double values. This definition has the correct storage size but may or may not have the correct alignment. In addition, none of these definitions has a numeric equivalent in ACL, and thus they do not behave as numbers in Lisp code.
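Whether a struct of two long values really matches long long depends on the C platform. The sketch below, with invented names, lets a C compiler report this directly. For instance, on platforms where long is itself 64 bits wide, a struct of two longs is twice the size of a 64-bit long long, so the size claim above should be checked for the target in question.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the binder's representation (hypothetical name). */
struct two_longs { long lo; long hi; };

/* Returns 1 when a struct of two longs occupies exactly the storage
   of a long long on this platform, 0 otherwise. */
int long_long_size_matches(void)
{
    return sizeof(struct two_longs) == sizeof(long long);
}

/* Returns 1 when both types land at the same offset after a char
   member, that is, when they have the same alignment requirement. */
int long_long_align_matches(void)
{
    struct probe_ll { char c; long long v; };
    struct probe_2l { char c; struct two_longs v; };

    return offsetof(struct probe_ll, v) == offsetof(struct probe_2l, v);
}
```

Running both checks with the same compiler and switches used to build the library gives a quick indication of whether structures containing these types will be laid out as the binder assumes.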

;;WARNING: Eval of above Lisp form resulted in error:
;;   Lisp error condition

This message is usually emitted following a foreign type definition. This result is likely to produce a cascade of subsequent error messages.

C Functions with Varying Number of Arguments


;; c-ex04.h:7 <9> extern int exec1( const char*, const char*, ELLIPSIS);

;;NOTE: Lisp args to this function will get default conversions only.
(bind-c-function exec1
     :unconverted-entry-name "exec1"
     :c-return-type ("int")
     :return-type :int
     :c-arg-types (("const" "char" "*") ("const" "char" "*") "...")
     :c-arg-names (Arg0 Arg1)
     :arguments nil
     )

This foreign function may be called with any number of arguments, but all the necessary type conversions must be implicit in the Lisp type of each argument.

When the C function requires a wrapper, only a fixed number of arguments may be passed through the wrapper. In that case we generate a warning of the form:


;; c-ex04.h:9 <10> extern int exec2( LongLong, const char*, ELLIPSIS);

;;NOTE: C wrapper needed to pass structure or union type
;;   LongLong
;;   as argument.

/* Wrapper function to dereference pointers to structure arguments. */
int  ACL_exec2(  LongLong * Arg0, const char * Arg1)
{
  return(exec2(*Arg0, Arg1));
}


;;WARNING: This wrapper function will only pass exactly 2 arguments
(bind-c-function exec2
     :unconverted-entry-name "ACL_exec2"
     :c-return-type ("int")
     :return-type :int
     :c-arg-types (("LongLong" "*") ("const" "char" "*"))
     :c-arg-names (Arg0 Arg1)
     :arguments ((* LongLong) (* :char))
     :prototype t
     )
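The limitation can be reproduced in plain C. In the sketch below, struct big and sum_after are hypothetical; the wrapper follows the style of the generated ACL_exec2 and forwards only the fixed arguments, so nothing can be supplied for the variable part.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical struct passed by value, and a variadic function that
   adds count extra int arguments to the two struct fields. */
struct big { long lo; long hi; };

long sum_after(struct big base, int count, ...)
{
    va_list ap;
    long total = base.lo + base.hi;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Fixed-arity wrapper: it dereferences the struct pointer but has no
   way to forward extra arguments, so count must be 0 here. */
long ACL_sum_after(struct big *base, int count)
{
    assert(base != 0);
    assert(count == 0);
    return sum_after(*base, count);
}
```

A direct C call such as sum_after(b, 2, 10, 20) can use the variable part, whereas a call through the wrapper is restricted to the two fixed arguments.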

Reporting Problems

The C parser and parse-tree decoder we use are far from complete from the point of view of C language semantics. We have attempted to handle a large collection of commonly found cases, but many others are still possible. The structure of the parse-tree decoder is modular and extensible so that new cases may be added easily when necessary.

When a new unhandled case is encountered, and the :verbose argument to build-c-binding is T, we print a warning message of the form

;;WARNING Unknown STATEMENT
#|
:tag1
parse-tree-dump-1
:tag2
parse-tree-dump-2
...
|#

and continue. In most cases, if you send us

we may be able to create an extension to the parse-tree decoder in short order.

Technical Details

The C Grammar

The parser used in this tool is derived from the GNU C compiler. It uses the Bison grammar for C included with the distribution of GCC. The specific version of the grammar and parser can be obtained in the usual manner by calling our modified version of cpp with the -version switch.

C Macros

C macros are processed from the output of the GNU C pre-processor called with the -dM switch. This produces a list of #define lines that does not reflect the input order of the definitions. When dependencies between definitions can be determined, we order the corresponding Lisp definitions accordingly. Otherwise, the Lisp definitions are ordered alphabetically by name.

Macro definitions are parsed by our own ad hoc parser that recognizes the following patterns:

Some other possible error messages are:

;;WARNING: Multiple definitions of symbol

;;WARNING: Ill-formed macro

;;WARNING: SysValErr: sys-info-dump

Installing the Tool

The tool consists of two Unix executable files and three Lisp source files. The Lisp files are loaded in the following sequence:

   :ld loadcb

All three Lisp files are required in order to run the Binder. Only one file, cdbind, is needed to compile and/or load the output of the binder. This file may also need to be modified if the output of the binder needs to be customized.

The following two variables must be initialized in loadcb.cl to the correct pathname for the two Unix executables:

ff:*c-parser-cpp*
ff:*c-parser-cc1*

Other variables:

ff:*default-foreign-symbol-package*

   The setting of this variable is discussed above.

ff:*export-foreign-symbols*

   When this variable is set to a non-NIL value, all symbols created in the
   above package are exported.  When this variable is non-NIL, foreign
   symbols are referenced with single-colon notation from other Lisp
   packages.  When this variable is set to NIL, double-colon notation
   must be used for all symbols in the foreign-symbol package.

NOTE: in the current implementation, only symbols passed to ff:bind-c-export are exported.

ff:*c-compiler-macro-names*

   The value of this variable is a list of strings naming variables
   that should be ignored by the interface generator.  These are typically
   state variables for the compilation and do not affect the usage of the
   interface.