Allegro CL version 11.0
When the :explain declaration (described in Help with declarations: the :explain declaration in compiling.html) is in effect during compilation, the compiler prints information about what it is doing. This information is labelled with abbreviations which identify what is being explained (boxing, types, calls, inlining, etc.) and provide additional information. For example, the lines:
;Tgen1:Examined a (possibly unboxed) call to +_2op with arguments:
;Targ2: symeval x type (single-float * *)
;Tinf1: variable-information: lexical: ((type (single-float * *)))
all relate to :types explanations (as indicated by the initial T).
This document lists all these labels and tells when they might come up and what they mean.
Because the explain output can be verbose, the compiler now prints a "While compiling ..." message before each lambda expression it works on. This separates the sections in a file, or in a function which has internal functions (e.g. flets or labels).
After the discussion of the labels, there are examples of compilations with various types of explanations enabled; see Examples using the :explain declaration.
Compiler explanations are printed in the following format:
;[Label] [details]
For example:
;Igen1: Arg0: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
Note this particular explanation runs over two lines. The same label starts each line.
The first letter of the label identifies the type of explanation:
Letter | Explain type |
B | :boxing |
C | :calls |
I | :inlining |
M | :tailmerging |
T | :types |
V | :variables |
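The table above can be read as a simple lookup on the first character of a label. The following is a minimal sketch of that lookup; explain-category is a hypothetical helper written for illustration and is not part of Allegro CL.

```lisp
;; Hypothetical helper (not an Allegro CL function): map an explanation
;; label such as "Igen1" to the :explain category listed in the table.
(defun explain-category (label)
  (ecase (char-upcase (char label 0))
    (#\B :boxing)
    (#\C :calls)
    (#\I :inlining)
    (#\M :tailmerging)
    (#\T :types)
    (#\V :variables)))

(explain-category "Igen1")   ; the I prefix maps to :inlining
(explain-category "Targ2")   ; the T prefix maps to :types
```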
The labels are abbreviations describing what the compiler is trying to do. In the subsections of this section, we list all these abbreviations and explain what they mean. Note that the sample messages may contain format directives (e.g. the ~s in ;Call2:Generated a non-inline self tail jump to ~s:), indicating that some value will be displayed when the actual message is printed.
See Boxing explanation for a definition of boxing. There is only one label for :boxing explanations, Bgen1:
;Bgen1:Generating a single-float box
;Bgen1:Generating a double-float box
;Bgen1:Generating a bignum box
For each of these messages, the compiler is noting that some consing will be done as a result of making a "box" to hold a result. To avoid consing for these situations, it is sometimes possible to propagate the generated value into a predefined box, i.e. a specialized array of element type single-float, double-float, or (signed-byte 32) or (signed-byte 64), depending on the width of the lisp. Note, however, that if arr is declared to be of type (simple-array single-float (*)) and the compiler sees a form like
(setf (aref arr n) <unboxed calculation>)
then a single-float box may still be generated, because the setf returns the value, and if the value is used (say, by returning) it will tend to be boxed. If the final destination of a calculation is to one of these specialized arrays, then it is better to ensure that the value is not used at the end of the function, e.g.:
(progn
  (setf (aref a n) <unboxed calculation>)
  nil)
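As a concrete sketch of that pattern, the function below stores a float sum into a caller-supplied specialized array and returns nil. The names store-sum and res are illustrative assumptions. The :explain declaration is specific to Allegro CL, so it is guarded with #+allegro here; whether a Bgen1 message appears depends on the optimization settings in effect.

```lisp
;; Illustrative only: store-sum is a hypothetical function.
(defun store-sum (a b res)
  (declare (optimize speed)
           (single-float a b)
           (type (simple-array single-float (*)) res))
  #+allegro (declare (:explain :boxing))   ; Allegro CL only
  (setf (aref res 0) (+ a b))              ; the array element acts as the box
  nil)                                     ; do not return the float itself

;; The caller allocates the "box" once and reuses it.
(let ((res (make-array 1 :element-type 'single-float :initial-element 0.0)))
  (store-sum 1.5 2.25 res)
  (aref res 0))
```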
The explanations generated by :calls all start with C. They are numbered.
;Call1:Generated a non-inline call to ~s:
The compiler is noting that a call is being generated to the function named in the message. After the function returns, execution continues in the caller.
;Call2:Generated a non-inline self tail jump to ~s:
The compiler is noting that a call is being generated to the function named in the message, which happens to be the function being compiled. The call is made by tail-jumping. The caller's frame is reused for the recursion, and it appears to a debugger that there is only one level of call to the function in question, no matter how many recursions are entered by tail-jump.
;Call2:Generated a non-inline non-self tail jump to ~s:
The compiler is noting that a call is being generated to the function named in the message, explicitly by tail-jumping. The caller is no longer on the stack, and it appears to a debugger that the caller's caller is calling the callee directly (though ghost frames can usually detect this situation).
;Call3:Generated a non-inline call to internal primitive ~s:
The compiler is noting that a call is being generated to the internal function named in the message. After the function returns, execution continues in the caller.
;Call4:Generated a non-inline non-self tail jump to internal primitive ~s:
The compiler is noting that a call is being generated to the internal function named in the message, explicitly by tail-jumping. The caller is no longer on the stack, and it appears to a debugger that the caller's caller is calling the primitive function directly (though ghost frames can usually detect this situation).
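To make the distinction between an ordinary call and a tail jump concrete, here is a sketch with two hypothetical functions. In count-down the recursive call is the last thing the function does, so it is a candidate for a self tail jump; in sum-to the addition still has to run after the recursive call returns, so that call cannot be a tail jump. Whether the compiler actually merges the tail call depends on the optimization settings and compiler switches in effect.

```lisp
;; Illustrative only: both functions are hypothetical.
(defun count-down (n)
  (declare (optimize speed) (fixnum n))
  #+allegro (declare (:explain :calls))    ; Allegro CL only
  (if (<= n 0)
      :done
      (count-down (1- n))))                ; self call in tail position

(defun sum-to (n)
  (declare (fixnum n))
  #+allegro (declare (:explain :calls))    ; Allegro CL only
  (if (<= n 0)
      0
      (+ n (sum-to (1- n)))))              ; not a tail call: + runs after the return
```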
There are quite a few :inlining explanations. We describe each type in a subsection.
For this section and its subsections, all :inlining explanations are identified immediately after their labels as either "Node" or "Arg~d". Thus if you consider a typical function call form:
(foo a b c)
then any examination of a's type is referred to as Arg0, examinations of b are referred to as Arg1, and examinations of c are referred to as Arg2. Any examinations of the whole result of the form are referred to as Node.
The Arg~d and Node identifiers can be positionally correlated to their types within the Targ and Tres messages, respectively. However, unlike :types explanations, :inlining examinations are not necessarily made in order: arguments will be examined in the most expedient way possible to get the best result, but that may mean that an argument is examined out of order, or two or more times, or not at all. Likewise, the node (the result of the form) might be examined before or after the arguments, or not at all.
These relate to addition and subtraction.
;Iadd1: Node: can fixnums be ruled out? (failure to inline may be desirable so type checks can be removed):
Normally slow safe + and - operations check their arguments for fixnums inline, and if they are both fixnums, a function call is not needed. But if the possibility for the arguments to be fixnums is ruled out, then it would be more efficient to compile the code without the inline fixnum test, since we know that the test will always fail. Hence, a failure to inline is more desirable when the arguments are known not to be fixnums.
;Iadd2: Node: must check for fixnum at runtime.
If as in Iadd1 it cannot be proved that both arguments are not fixnums, then the fixnum test is needed, and the inlining will (unfortunately) succeed.
;Iadd3: Arg~d: only one arg needs checking at runtime.
When compiling excl::+_2op or -_2op (the two operand versions of + and -), some architectures will optimize the fixnum check if one of the two arguments is a fixnum constant. Whichever argument is the constant, the other argument will be noted as being checked. Note that the result of the operation must still be checked for overflow, in which case the actual function call might still be made. Not all architectures benefit from this optimization; at the time of writing only x86 and x86-64 architectures have it.
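A form of the kind this message concerns is an addition where one operand is a fixnum constant. The sketch below is a hypothetical example; whether Iadd3 is actually reported for it depends on the architecture and on the optimization settings.

```lisp
;; Illustrative only: add-ten is a hypothetical function.
(defun add-ten (x)
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (+ x 10))   ; 10 is a fixnum constant, so only x would need a runtime fixnum check
```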
These relate to the number of arguments.
;Iarg1: Node: looking for ~d args - got ~d
;Iarg2: Node: looking for at least ~d args - got ~d
;Iarg3: Node: looking for between ~d and ~d args, got ~d
In all three cases, the number of arguments does not match the number of arguments required for a particular section of inlining to take effect.
These relate to calls to ash.
;Iash1: Arg~d: looking for shift type to be subtype of (integer -31 31) - got type (integer -536870912 536870911).
;Iash1: Arg~d: looking for shift type to be subtype of (integer -31 31) - got type (integer -10 10).
;Iash1: Arg~d: looking for shift type to be subtype of (integer -31 31) - got type (integer 5 10).
ash does best when it can shift integers that are 32 bits or less (or 64 bits, on a 64-bit machine). Note that in all three of these examples the shift amount has been declared an integer. In the first case, the shift amount is a fixnum (if it happens to be a large positive integer, the result would tend to be a bignum, or if it is a large negative integer then the result would tend to be 0 or -1). In the second case, the shift amount is a reasonable size for opencoding, but since it crosses the 0 boundary and thus could be either positive or negative, an extra test would have to be generated if inlining is chosen. In the last case, the shift amount is positive and small, so there is a class of values to be shifted that could be shifted without causing any consing and for which it makes sense to inline.
;Iash2: Arg~d: shift amount both negative and positive; must generate runtime test.
In one case for Iash1, the declared integer range is both negative and positive (i.e. the type has the form (integer m n) where (< m 0) => t and (> n 0) => t). If branches can be established in outer code which deal separately with the negative and positive cases, then the test within the ash expansion can be eliminated for each branch.
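One way to establish such branches in outer code is sketched below. The function name shift-split and the particular integer ranges are assumptions chosen for illustration. Each branch narrows the shift amount with the to a range on one side of zero, so that each ash form sees a shift type that does not cross the 0 boundary.

```lisp
;; Illustrative only: shift-split and its type ranges are hypothetical.
(defun shift-split (value amount)
  (declare (optimize speed)
           (type (unsigned-byte 16) value)
           (type (integer -10 10) amount))
  #+allegro (declare (:explain :inlining))           ; Allegro CL only
  (if (minusp amount)
      (ash value (the (integer -10 -1) amount))      ; right shifts only
      (ash value (the (integer 0 10) amount))))      ; left shifts (or no shift) only
```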
These relate to calls to comparisons.
;Icmp1: Node: will check only for fixnums inline.
If nothing is known about the arguments to a comparison, it is still more efficient to compile inline a test that both arguments are fixnums, and then if the test fails, call the actual out-of-line function which will handle all other types. This message appears if no other combination of argument types allowed the comparison to be completely compiled inline.
;Icmp2: Node: checking for comparison of an integer with a fixnum constant:
This message states that there is a possibility of comparing an unknown integer value with a fixnum constant, which can be done completely inline. The general messages that follow will indicate what actual tests are being performed.
These relate to constants used for things like array dimensions.
;Icon1: Arg~d: looking for fixnum constant - got 0
;Icon1: Arg~d: looking for fixnum constant - got nil
;Icon1: Arg~d: looking for one of these constants: <list of desired values> - got <nil or result>
One or more constants are useful for certain inlining efforts. The first example notes that the fixnum 0 was found, and the second form means that no constant was found. The third form allows the compiler to select from several different constant values, with the result displayed.
;Icon2: Arg~d: checking that the constant value is 0 - got 1
array-dimension is only inlined when its axis-number argument is the constant 0. If it is not, then the actual function is called to get the result.
These relate to dynamic-extent declarations and stack allocation.
;Idyn1: Consider declaring <var> dynamic-extent: (node: ~s).
The variable was not declared dynamic-extent, but the compiler determined that it might be useful to do so. Some let init forms which can generate stack-allocated lisp objects will do so if conditions are right, starting with a declaration of dynamic-extent for the let variable being initialized.
;Idyn2: Looking for stack-allocation for <var> (call to excl::.primcall):
If <var> has been declared dynamic-extent, and if the let init-form initializing it is appropriate, then the compiler is saying that it will try to transform the form to one which is to be compiled as stack-allocated. Another "dyn" message will follow, either Idyn3 or Idyn4, based on success or failure. Typical forms which can be transformed are calls to cons, list, list*, make-list, and make-array.
;Idyn3: - transformed to ~s
The compiler has succeeded in transforming an allocating form to a stack-allocating form. The node of the new call (which will include the node's name) is also shown. See Idyn1.
;Idyn4: - couldn't transform ~s to stack-allocated form.
For one or more reasons, the compiler could not transform a consing form to a stack-consing form. See Idyn1 for background. In particular, stack consing requires constants in many cases; for example, neither (make-array n) nor (make-list n) can be stack-allocated because the array is of variable size. Also, not all make-array options have corresponding stack-allocatable functionality; for example, specifying a :weak option will result in no transformation, as will specifying :initial-contents (although it is possible to specify :initial-element for many element-types). Also, some :element-type specifications cannot be transformed to stack-allocations.
;Idyn5: - Possible dynamic-extent transformation of a requires trusting declarations.
The Idyn5 message may be seen if declarations are not being trusted and a form is a candidate for dynamic-extent optimization. Idyn1 (described above) already displays a message if the optimization is missed due to the lack of dynamic-extent declaration, but a dynamic-extent declaration is not enough; declarations must also be trusted (so trust-declarations-switch must be true). So for the following code:
(defun clear-the-stack ()
  (declare (:explain :types :inlining)
           #+ignore (optimize speed))
  (let ((a (make-array 1000 :initial-element nil)))
    #+ignore (declare (dynamic-extent a))
    (identity a))
  nil)
You might get one or both of Idyn{1,5}:
;Idyn1: Consider declaring a dynamic-extent:
;Idyn1: (node: #<call .primcall new-simple-vector @ #x10013832c2>).
;Idyn5: Possible dynamic-extent transformation of a requires trusting declarations.
;Idyn5: (node: #<call .primcall new-simple-vector @ #x10013832c2>)
The former is given if the dynamic-extent declaration is missing, and the latter if declarations are not trusted.
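For contrast with clear-the-stack, here is a sketch of a form that supplies both ingredients described above: a dynamic-extent declaration on the let variable, and optimization settings under which declarations are normally trusted. The function sum-of-three is a hypothetical example, and whether the compiler reports Idyn3 or Idyn4 for it is not something this sketch can promise.

```lisp
;; Illustrative only: sum-of-three is a hypothetical function.
(defun sum-of-three (x y z)
  (declare (optimize (speed 3) (safety 1)))
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (let ((items (list x y z)))                ; fixed number of elements
    (declare (dynamic-extent items))         ; items never escapes this let
    (reduce #'+ items)))
```

Note that items is only read inside the let and is never returned or stored, which is what makes the dynamic-extent declaration safe.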
These relate to calls to equality tests.
;Ieql1: Arg~d: looking for atomic type wrt eql:
;Ieql1: Arg~d: looking for atomic type wrt equal:
If eql or equal receives at least one argument for which an eql/equal comparison is equivalent to an eq comparison, then the call to eql/equal can be changed to a call to eq. This is generally true for fixnums and symbols, and for eql, it is also true for lists.
;Ieql2: Node: must call excl::eql-not-eq.
;Ieql2: Node: must call excl::equal-not-eq.
If neither argument to eql/equal is of a type for which the call can be transformed to a call to eq, then out-of-line calls must be generated. Note that even if such an out-of-line call is required, an inline eq test is still done to determine if the two arguments are identical, and then the out-of-line call is to a function which assumes that the arguments are not eq. Note also that although an out-of-line call is made, there is some inlining done, so the compiler considers the call to be successfully inlined (the only way to fail to inline calls to eql/equal is to declare it to be notinline).
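As an illustration of the Ieql1 case, the hypothetical function below declares one argument to be a symbol. Since eql on a symbol is equivalent to eq, a call like this is a candidate for the eq transformation; the explanation output itself is not reproduced here.

```lisp
;; Illustrative only: same-tag-p is a hypothetical function.
(defun same-tag-p (a b)
  (declare (symbol a))
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (eql a b))   ; one argument is known to be a symbol, so eq would suffice
```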
These relate to calls to funcall.
;Ifnc1: Node: must primcall checking funcall.
When calling funcall, there are several options. The function argument can be declared as a symbol or a function, in which case inline instructions to call through the respective type of object are generated. Otherwise (assuming funcall is not being declared notinline), if the type of the function argument is unknown, or if the comp:verify-funcalls-switch returns true, then a "checking funcall" is compiled, which is a runtime-system call to funcall_check, which figures out what to do. Funcall_check knows how to call a symbol, a function object, a flavor instance, and even a list (as an interpreted function). It is of course slower than a fully-inline funcall, but is still considered by the compiler to be a successfully inlined call.
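A sketch of the declared case follows. The hypothetical function apply-twice declares its function argument to be of type function, which is the kind of information the paragraph above says allows a call through the function object without the checking funcall. Whether the checking funcall is actually avoided also depends on comp:verify-funcalls-switch.

```lisp
;; Illustrative only: apply-twice is a hypothetical function.
(defun apply-twice (fn x)
  (declare (function fn))
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (funcall fn (funcall fn x)))
```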
These labels relate to calls to foreign functions which request a direct foreign call via the :call-direct option to ff:def-foreign-call.
;Ifor1: Arg~d: looking for direct foreign call to ~s.
If the function name has been defined with :call-direct, then the foreign-call specification and the call itself are being examined to see if this optimization can be performed.
;Ifor2: Node: cannot convert <type> into immediate <immed> return value
If the :returning specification given in the ff:def-foreign-call is not appropriate for the type of the returned value, the optimization cannot be performed.
;Ifor3: Node: argument count ~d doesn't match argspec <argspec>.
If the argument count of the call isn't appropriate for the ff:def-foreign-call, then the optimization will not be performed.
;Ifor4: Arg~d: looking for vector or one of <type list> but found <type>.
The types of each argument must match those of the Lisp arguments. If they don't, then the optimization is not performed.
;Ifor5: Node: return type ~a cannot be converted.
The given return type is either on the list of bad return types for a direct call, or else is not on the list of possible return types for a direct call.
;Ifor6: Node: not wanting to see any of ~{ ~s~} but found ~a
The type of the argument was found to be in the list of bad foreign-arg types.
These are general messages.
;Igen1: Arg~d: checking for implied unboxable type (floats, or machine-integer):
;Igen1: Node: checking for implied unboxable type (floats, or machine-integer):
When a node or an argument is compiled, the compiler checks to see if it is appropriate or advisable to try to compile the node in an unboxed manner. If a node can be compiled in an unboxed manner, then it is likely that some boxing (see Boxing explanation) must be done later on, but if that boxing occurs much later, or if the boxing can be done into a pre-supplied box, then it is always desirable to perform the compilation unboxed. There are three types of unboxed value that can be represented by the compiler: single-float, double-float, and machine-integer (aka (signed-byte 32) in a 32-bit lisp, and (signed-byte 64) in a 64-bit lisp).
Note that a machine-integer is only compiled for if the type of the node is too large for a fixnum and at least as small as a (signed-byte 32/64). It may seem as though this restricts the range of unboxed integers that are likely, but such compilation does occur more often than might be originally thought. It is always possible to force an unboxed compilation by declaring the node to be (signed-byte 32) or (signed-byte 64).
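A minimal sketch of such declarations, assuming a (signed-byte 32) machine-integer, is shown below. The function add-words is hypothetical. The the form is a promise made by the programmer that the sum also fits in 32 bits; it is the caller's responsibility to make sure that holds.

```lisp
;; Illustrative only: add-words is a hypothetical function.
(defun add-words (a b)
  (declare (optimize speed)
           (type (signed-byte 32) a b))
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (the (signed-byte 32) (+ a b)))            ; declares the node's type as well
```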
The results of Igen1 are always given by an Igen2 message.
;Igen2: Arg~d: type is not trusted
;Igen2: Node: type is not trusted
;Igen2: Arg~d: type is not unboxable
;Igen2: Node: type is not unboxable
;Igen2: Arg~d: type is unboxable as <type>
;Igen2: Node: type is unboxable as <type>
One of these three basic messages will always follow an Igen1 message. The first pair informs that the compiler cannot trust the declarations, thus the node or argument cannot be compiled as unboxed. The second pair says that declarations are trusted, but that the type of the node or argument is not known or is one which cannot be compiled unboxed. The third pair of messages indicates success with unboxing, and will include the type of the node. (See Boxing explanation for a definition of boxing.)
;Igen3: Attempt to inline an unboxed call to ~s while not trusting declarations:
;Igen3: Attempt to inline a boxed call to ~s while not trusting declarations:
;Igen3: Attempt to inline an unboxed call to ~s while trusting declarations:
;Igen3: Attempt to inline a boxed call to ~s while trusting declarations:
Whenever the compiler sees a call expression (i.e. node) it will try to compile it based on its ability to be inlined and based on the surrounding context:
If the surrounding code is being compiled in an unboxed manner, an attempt to compile this node unboxed is made, and if it succeeds, the code is most efficient. If (in an unboxed context) this node cannot be compiled unboxed, then an attempt to compile it boxed is made, and whether or not the inline attempt is successful, the result ends up needing to be unboxed (for the context).
If the surrounding code is not being compiled in an unboxed manner, then care is taken to see if this node might be compiled in a boxed manner. Two attempts at inlining are performed:
An attempt is made to compile the node unboxed, and then to re-box the result. If successful, the second attempt is skipped.
An attempt is made to compile the node boxed. Whether or not the inlining is successful, the result need not be boxed, because it already is boxed.
Additionally, you are told whether declarations are or are not being trusted.
(See Boxing explanation for a definition of boxing.)
;Igen4: Inline attempt failed.
;Igen4: Inline attempt succeeded with a hybrid of inline code and a call.
;Igen4: Inline attempt failed, but succeeded in removing hybrid test code.
;Igen4: Inline attempt succeeded.
;Igen4: Inline attempt succeeded with tail-merging.
;Igen4: Inline attempt succeeded with primcall.
;Igen4: Inline attempt succeeded with tail-merged primcall.
;Igen4: Inline attempt succeeded with (primcall to ~s).
;Igen4: Inline attempt succeeded with short array.
;Igen4: Inline attempt succeeded with test between long or short array.
For every attempt to inline (both boxed and unboxed) a success or failure message is displayed. Success in inlining is not necessarily the highest goal; an out-of-line call may be more efficient when, given what is known about the arguments to the call, any inline code would be wasted. You will often see the qualified "failure" message "Inline attempt failed, but succeeded in removing hybrid test code" when a "failure" could be viewed as a success.
The "Inline attempt succeeded with a hybrid of inline code and a call" message means that some cases (for example, when the args are determined to be fixnums) succeeded in inlining while other cases resulted in a call out. This message often means that more declarations (if possible) would improve things. In particular, if you expected an earlier declaration to propagate and it (for whatever reason) did not, you might see this message. Adding declarations often improves things in that case.
(See Boxing explanation for a definition of boxing.)
If inlining attempts fail, then a straight out-of-line call to the function is generated. It is possible for an unboxed inlining to fail, but the consequent boxed inlining compilation to succeed. There are two failure messages: that inlining failed and that inlining failed but hybrid code was removed. (See Boxing explanation for a definition of boxing.)
Hybrid code is a mixture of inline code (just instructions) and out-of-line code (i.e. a call to a function). Normally we want to make a decision between inline code and a call, but sometimes there is something that can be done inline that performs part of the necessary functionality, combined with a test to see if this inline code is appropriate, and if not, the call is made to the function. For example: the inline expander for excl::+_2op (the two-operand-add) will normally generate hybrid code that tests that both operands are fixnums, then performs the add, and then checks for overflow. If the operands are not both fixnums, or if there is an overflow after the add, then the excl::+_2op function must be called to perform the addition.
On the pro side for hybrid code is that if it is appropriate for the operand types, it is very fast. On the con side for hybrid code are two issues: first, when the inline code is not appropriate, extra tests are made to determine that the out-of-line call should have been made; and second, if the hybrid code is appropriate 100% of the time, then the tests are also extraneous, and the out-of-line code would never be called.
If at compile time the types of the arguments can be narrowed down, then the hybrid tests can be removed and the appropriate more-specific code can be generated without the tests. In the excl::+_2op example, if the operands can be determined never to be both fixnums, then the hybrid test code can be removed and a simple, out-of-line call to excl::+_2op can be generated. Or, if the operands can be determined to be fixnum and the result of the addition can also be determined to be fixnum, then the hybrid tests can also be pulled out and the add operation done in one instruction, without any out-of-line calls. The second failure message says that even though the call is not inlined, the test was determined not to be necessary (so the call is always made) and so was removed.
Success in inlining may come with one of many modifiers; if no modifiers are present, then the inlining was simple and the call was replaced directly with inline code. If "with tail-merging" is present, then the call has in some way become a jump out of the function currently being compiled. If any of the "primcall" modifiers are present, then the inlining is in fact a call to a runtime-system function.
Two special success modifiers have to do with short-arrays and long (i.e. normal) arrays. If an array operation can be inlined, but must distinguish between short arrays and normal arrays, then the success might be mitigated by the need for an inline test on the array type, to see if it is short or long. This extra test consumes run-time, and should be avoided unless it is the goal of the code to make the test. The exact code generated can be controlled by the kind of declaration used:
(simple-array * *) indicates a normal or "long" array.
(excl:short-simple-array * *) indicates a short array.
(or (simple-array * *) (excl:short-simple-array * *)) indicates either short or long, and a run-time test will be made to determine which code path is taken.
The type (excl:dual-simple-array * *) is a deftype which expands to (or (simple-array * *) (excl:short-simple-array * *)) for notational convenience.
;Igen5: Node: checking for notinline declaration for ~s: none
;Igen5: Node: checking for notinline declaration for ~s: present
Before any inlining can be performed, the presence of a notinline declaration for this function must be tested. If there is one, either in the global environment or in a lexical environment, and that notinline declaration is not shadowed by an inline declaration, then the inlining will immediately fail. If no such unshadowed notinline declaration is present, then the inlining attempts may proceed.
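A lexical notinline declaration of the kind this check looks for can be sketched as follows. The function checked-length is a hypothetical example; with the declaration in place, the call to length is expected to be compiled as an ordinary out-of-line call.

```lisp
;; Illustrative only: checked-length is a hypothetical function.
(defun checked-length (seq)
  (declare (notinline length))               ; asks for an out-of-line call to length
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (length seq))
```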
;Igen6: Checking compiler:~s-switch for ~s (got ~s) - succeeded/failed
At various places in the compiler, the many compiler switches are tested, to see if they return true or false. In this message, the name of the switch is given, as well as an indicator of the desired return value (t or nil for true or false, respectively) and also the actual result and whether that constituted success or failure. In this context success is required in order for the inlining attempts to proceed, and so success is presumably desired. If the switch test fails, the inlining attempt itself may still succeed, but without the particular inlining that would have been tried if the switch had been successful.
;Igen7: Call to ~s elided because it is side-effect-free and unused.
If built-in inliners are present for a function which does not produce side-effects, but the result of that function is never used, then the call to the function itself can be elided. Note that the arguments are always compiled, so that if they have any side-effects then the correct effect is achieved. This message indicates that such a side-effect-free function call has been removed as a result of its compilation.
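The sketch below shows the shape of code this describes. In the hypothetical function bump, the result of + is discarded, so the addition itself is a candidate for elision, but the incf in its argument list has a side effect and must still be evaluated. The return value demonstrates that the side effect takes place regardless of how the + is compiled.

```lisp
;; Illustrative only: bump is a hypothetical function.
(defun bump (cell a b)
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (+ a (incf (car cell)) b)   ; result unused; the incf argument is still evaluated
  (car cell))
```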
;Igen8: Unboxed result must be boxed - consider expanding scope of unboxed forms.
This message alludes to the fact that a box has had to be consed, and that it may be possible to forego that boxing if some of the code around this code could be compiled unboxed. (See Boxing explanation for a definition of boxing.) For example, consider the following:
(defun foo (a b)
  (declare (optimize speed)
           (single-float a b)
           (:explain :inlining))
  (+ a b 1.0))
There are two excl::+_2op nodes in this compilation, and they are both successfully compiled in unboxed manner. However, the result of the line of computations must be boxed before returning from foo. But what if the result of the calculation were not returned from foo, but instead had been stored into a pre-allocated box, and nil returned instead? Consider this modified example:
(defun foo (a b res)
  (declare (optimize speed)
           (single-float a b)
           ((simple-array single-float (*)) res)
           (:explain :inlining))
  (setf (aref res 0) (+ a b 1.0))
  nil)
Note now that the Igen8 message disappears, because the store into the single-float array satisfied the need for the boxing of the result of the calculation (because the single-float array itself serves as a box), and no floats are being returned from foo.
Note well: The nil return is necessary; if it were absent, then the result of foo would be the result of the (setf aref), and since the setf returns as its result the last value stored, the unboxed value would have had to be boxed in order to return it, even though it had been successfully stored unboxed into the array.
It is possible, using an unofficially documented feature called "immediate args", to pass arguments to and return a value from a function unboxed, which allows a further increase in the scope of unboxed compilation. If this unofficial documentation is desired, contact Franz Inc.
;Igen9: No info available to inline ~s as boxed
If there is no built-in inliner for the call, then this message is displayed. Note that we do not support user-declared inlining via the inline declaration; users should define a compiler-macro for the function instead.
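A compiler macro of the kind suggested here can be written with the standard define-compiler-macro. The sketch below uses a hypothetical function square; the expansion binds the argument once so that it is evaluated exactly one time, as it would be in a real call.

```lisp
;; Illustrative only: square is a hypothetical function.
(defun square (x)
  (* x x))

;; Expand (square form) into inline arithmetic at compile time.
(define-compiler-macro square (x)
  (let ((v (gensym "X")))
    `(let ((,v ,x))       ; evaluate the argument form exactly once
       (* ,v ,v))))
```

The function definition remains in place, so uses such as (funcall #'square 4) or (mapcar #'square list) still work; only direct calls seen by the compiler are candidates for expansion.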
These relate to calls to "high order functions".
;Ihof1: This architecture has no instruction for ~s.
Some high-order-functions are implemented directly in some architectures. In particular, a floating-point abs is implemented in most architectures as a single instruction, and sqrt is implemented in several. On the other hand, only the x86 architecture implements sin, cos, and tan as single instructions, or exp and log as short sequences of instructions, as they are implemented on its x87 floating point unit. Note that the AMD64/EM64T architecture is not considered an x86 for these purposes; even though these processors do have x87 units, those units are not part of the standard runtime architecture and thus we do not make use of the x87 unit on these versions of Allegro CL.
If the function being called is a high-order-function supported by any architecture, but is not one supported by this architecture, then this message will be given, if other conditions are right.
;Ihof2: Arg~d: must be a float type.
The floating point high-order-functions are optimized for float arguments. If an unboxed inlining would have tried to compile one of these with a machine-integer argument, then this message appears and the unboxed inlining fails. A boxed inlining might succeed later but there is no guarantee. (See Boxing explanation for a definition of boxing.)
;Ihof3: Arg~d: type <type> is out of expected range:
Some high-order-functions are implemented mostly as single instructions in the architecture, but only match Common Lisp's requirements within a certain range of arguments. For example, cl:sqrt returns a complex value when the argument is negative, but the sqrt instruction available on many architectures signals or returns a NaN. On the x86 (x87) architecture, accurate results are not guaranteed beyond a certain range of argument values. Thus, in order for the inlining to succeed for one of these high-order-functions, the argument types must be within the stated ranges.
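One way to give the compiler range information is to declare bounds on the float arguments, as in the sketch below. The function hypotenuse and the particular bounds are assumptions chosen for illustration; the range a given architecture actually requires is the one reported in the Ihof3 message, not the one used here. With non-negative bounded inputs, the argument to sqrt is known to be non-negative.

```lisp
;; Illustrative only: hypotenuse and its bounds are hypothetical.
(defun hypotenuse (a b)
  (declare (optimize speed)
           (type (double-float 0d0 1d100) a b))
  #+allegro (declare (:explain :inlining))   ; Allegro CL only
  (sqrt (+ (* a a) (* b b))))                ; argument to sqrt cannot be negative
```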
It is possible, using an unofficially documented feature called "immediate args", to pass arguments to and return a value from a function unboxed, which allows a further increase in the scope of unboxed compilation. If this unofficial documentation is desired, contact Franz Inc.
These labels relate to calls to immediate-args functions.
;Iimm1: Arg~d: looking for immed-args call to <function>.
If the function name has an immediate-args property, then it and the call are being examined to see if this optimization can be performed.
;Iimm2: Node: cannot convert <type> into immediate <immed> return value.
If the immediate-args specification for the return value is not appropriate for the type of the return value, the optimization cannot be performed.
;Iimm3: Node: argument count <number> doesn't match argspec <argspec>.
If the argument count of the call isn't appropriate for the immediate-args specification, then the optimization will not be performed.
;Iimm4: Arg~d: looking for one of <type list> but found <type>.
The types of each argument must match those of the arguments of the call. If they don't, then the optimization is not performed.
These relate to calls to length.
;Ilen1: Arg~d: checking that type is explicitly non-list:
;Ilen1: Arg~d: checking that type is explicitly listp:
For a call to length, if it is not known explicitly that the argument is or is not a list, then run-time code must be generated to test to see whether the argument is a list, and inline code must be generated to perform the length operation for both cases. This message is preparing for some tests to see if this run-time test and thus one set of inline code can be elided.
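The two cases can be sketched as follows, assuming hypothetical function names. In the first the argument is explicitly a list and in the second it is explicitly a non-list, so in either case the compiler has enough information to drop the run-time listp test; whether it does so is platform dependent.

```lisp
;; Argument is explicitly a list: only the list-walking code is needed.
(defun count-list-items (items)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :inlining)
           (type list items))
  (length items))

;; Argument is explicitly a non-list: only the vector case is needed.
(defun count-vector-items (items)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :inlining)
           (type simple-vector items))
  (length items))
```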
;Ilen2: Node: using fast cdr access.
In a lisp which has :verify-car-cdr on the features list, if the verify-car-cdr-switch is returning nil, then the list-walking that is done to determine the length of the list is performed by single-instruction cdr operations. This is dangerous if the list is dotted, because there is no protection against trying to take the cdr of a non-list (the cdr of the dotted list).
;Ilen3: Node: must use call to qcdr.
In a lisp which has :verify-car-cdr on the features list, a safe walk of the list argument to length has been selected. On the x86 (and some amd64 versions) this means that cdr operations are performed by calling the "qcdr" runtime-system function, which is slightly slower than using a single cdr access instruction, but it is safe.
;Ilen4: Node: must check cons type for cdr access.
In a lisp which has :verify-car-cdr on the features list, a safe walk of the list argument to length has been selected. On Power architectures (AIX, MacOSX on PowerPC, and YellowDog PPC Linux) this means that inline code is generated to determine if the object being cdr'd down is indeed a cons, and an error is generated at run-time if not.
These relate to multiply/accumulate instructions.
;Imac1: Node: trying a multiply/accumlate instruction combination:
Some architectures have "multiply/accumulate" instructions. These are three-argument instructions which multiply the first two arguments and then to that result either add or subtract the other argument. Some of these architectures' multiply/accumulate instructions are appropriate to compilation by Allegro CL. If so, then this message is displayed and arguments checked whenever (+ (* op1 op3) op2) or (- (* op1 op3) op2) is seen.
;Imac2: Node: - multiply/accumlate attempt succeeded.
;Imac2: Node: - multiply/accumlate attempt failed.
One of these messages appears whenever Imac1 is seen. Either the operation succeeded or failed, based on what was given as arguments.
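A form of the shape described above might look like the following sketch. The function name is hypothetical, and whether Imac1 and Imac2 appear at all depends on whether the architecture has a suitable multiply/accumulate instruction.

```lisp
;; Hypothetical candidate for a multiply/accumulate instruction:
;; (+ (* op1 op3) op2) with all operands declared double-float.
(defun multiply-accumulate (a b c)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :inlining)
           (type double-float a b c))
  (+ (* a b) c))
```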
These relate to array references.
;Iref1: Arg~d: looking for (simple-array type ~d) where type is one of <element-types> - got <type>
;Iref1: Arg~d: looking for (simple-array type ~d) where type is one of <element-types> - failed
This message states that an array access is looking for a simple-array of some set of element-types and of a specific dimensionality. If successful, the actual type that was found is stated. In order to inline an array access, it may be declared as a short-simple-array, a simple-array (i.e., a long simple-array) or a dual-simple-array. Any of these three kinds of declarations will cause the inlining to continue, although if the array is dual-simple-array, a run-time test for short-ness must be included and two access code segments generated (only one of which will be taken for any given array). Thus it is more desirable to specify either simple-array or short-simple-array, since less code and testing is generated. The dimensionality is either a specific integer or * if it doesn't matter.
[general array index calculation: A multi-dimensional array has as its underlying implementation a simple one-dimensional array, large enough to hold all of the index combinations of the array indices. For example, a (simple-array t (2 2)) will have as its base a (simple-array t 4), and a (simple-array character (3 3 3)) will have as its base a (simple-array character 27). The final index is calculated by multiplying successive indices by the next dimension over and adding the next index, until all indices are exhausted. This calculation might be done at run-time or at compile-time, depending on how much information the compiler has.]
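The calculation described in the bracketed note can be written out in portable Common Lisp. This is only an illustration of the arithmetic, not the compiler's own code; `manual-row-major-index` is a hypothetical helper, checked here against the standard function array-row-major-index.

```lisp
;; Multiply the running index by the next dimension, then add the next subscript.
(defun manual-row-major-index (dimensions subscripts)
  (let ((index 0))
    (loop for dim in dimensions
          for sub in subscripts
          do (setf index (+ (* index dim) sub)))
    index))

;; A (3 3 3) array has 27 elements in its underlying one-dimensional base.
(defparameter *cube*
  (make-array '(3 3 3) :element-type 'character :initial-element #\a))
```

For subscripts (1 2 0) the running index goes 1, then 1*3+2 = 5, then 5*3+0 = 15, which is the same value that (array-row-major-index *cube* 1 2 0) returns.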
;Iref2: Arg~d: looking for all dimensions declared for faster indexing - failed
;Iref2: Arg~d: looking for all dimensions declared for faster indexing - got (1 2)
In any array access, if the dimensions of an array are known then faster code can be generated, since at least some of the indexing can be calculated at compile-time. If this test succeeds the message will show what dimensions it saw declared for the array. [see general array index calculation]
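A declaration that supplies every dimension might look like the sketch below. The function name and the 64 by 64 size are assumptions chosen for illustration; the exact explanation output will vary by platform.

```lisp
;; Both dimensions are declared, so part of the indexing can be computed
;; at compile time instead of reading the dimensions at run time.
(defun corner-sum (img)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :inlining)
           (type (simple-array (unsigned-byte 8) (64 64)) img))
  (+ (aref img 0 0) (aref img 63 63)))
```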
;Iref3: Node: must multiply actual dimensions and indices at runtime.
If Iref1 fails, then some multiplication must be done at runtime. The compiler generates the multiplication code to do this, and gives this message. [see general array index calculation]
;Iref4: Checking for all array index args to be fixnum constants
;Iref4: Checking for all array index args to be fixnum constants - failed
If all of the index args are fixnum constants, then compile-time calculations can be done. If not all are constants, the compiler drops down into the Iref5 section to see how many constants it can find. [see general array index calculation]
;Iref5: Node: checking if there are any constant index values to avoid multiplication
This message will be followed by any number of Icon1 messages and then by an Iref6 or an Iref7 message. Iref6 is given when the first index is not a constant; Iref7 is given when the first few arguments to the array access are constants that are useful for calculating the array index at compile-time, and shows how many such useful constants were found. [see general array index calculation]
;Iref6: Node: first index values not constant - must multiply at runtime.
This message will be preceded by an Iref5 message and a single failed Icon1 message. It indicates that because the first index was not a constant, it was not possible to pre-calculate any portion of the final index at compile-time. [see general array index calculation]
;Iref7: Node: found ~d useful index constant~:p.
This message will be preceded by an Iref5 message and any number of Icon1 messages; if the first few arguments to the array access are constants and are useful to calculating the array index at compile-time, then this message will show how many such useful constants were found. [see general array index calculation]
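As a sketch of a partly constant access, the function below fixes the first index and leaves the second variable, so the leading part of the index calculation could in principle be done at compile time. The function name and array shape are hypothetical, and the number of useful constants actually reported is up to the compiler.

```lisp
;; The first index is the constant 1; only the second index varies.
(defun second-row-element (m j)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :inlining)
           (type (simple-array double-float (4 4)) m)
           (type (integer 0 3) j))
  (aref m 1 j))
```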
These relate to table-driven case (defined below).
;Itab1:Attempt to optimize table-driven case.
When a case macro's keys are all fixnums close together or t or otherwise, then the case form can be optimized to what is known as a "table-driven case" or "table case" for short. In this situation the keyform is calculated and the proper match key is selected through a jump table, rather than by successive comparisons and conditional branches. This also results in a constant dispatch time for as many as 256 possible values for a case, rather than having to rely on placing the keys that are most likely at the beginning of the case form so they'll be tested sooner. Unfortunately the keyform calculations represent overhead that is not insignificant. In order to reduce the overhead of testing for overflows or fixnums, some architectures make use of a new compiler switch - trust-table-case-argument-switch, which only fires at speed = 3 and safety = 0 (because it is a non-safe optimization and thus relegated to code which has been carefully tested already). This message signals the start of the attempt to perform this optimization. Two things must happen in order for this optimization to complete: speed must be 3 and safety 0, and the type of the key-form must be declared to be (integer
Warning: This optimization creates the possibility for "correct" code to run incorrectly, if the case macro has a t/otherwise clause; note that the t/otherwise clause specifies action for any key-form value that is not one of the key values, even if that value is not even an integer. With this optimization such non-integer or out-of-range values can cause gross trouble. However, since it is being compiled with unsafe code, and a type of (integer x y) is also required, any violation of that type constitutes undefined behavior by the lisp, and the unsafe-ness is justified. The only valid values that the t/otherwise clause should see are those which do not match any keys but which are within the (integer x y) range.
;Itab2:looking for test type range to be ~d max - got ~d.
Each architecture has a maximum spread that all keys can be in order for a table-case to be performed (see Itab1). This statement shows that architecture-dependent value and the range specified by the key-form's type. If the key-form's type is too large, the optimization will fail.
;Itab3:looking for match range to be subtype of ~s - got (integer ~d ~d).
When it is determined by Itab2 that a table-case's key-form has an acceptable type range, the range of actual keys is compared to the declared type of the key-form (the keys are gathered and all holes filled to form a type of the form (integer min max) where min is the smallest key and max is the largest). If the key range is not a subtype of the declared key-form's type, the optimization fails.
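The conditions described for Itab1 through Itab3 can be illustrated with the sketch below. The function name, keys, and declared range are assumptions: the keys 0 through 4 fill out to (integer 0 4), which is a subtype of the declared (integer 0 7), and the t clause is only ever meant to see in-range values such as 5, 6, or 7.

```lisp
;; Hypothetical dispatch on small, closely spaced fixnum keys.
;; Compiled at speed 3, safety 0, so callers must honor the declared range.
(defun opcode-name (op)
  (declare (optimize (speed 3) (safety 0))
           #+allegro (:explain :inlining)
           (type (integer 0 7) op))
  (case op
    (0 :nop)
    (1 :load)
    (2 :store)
    (3 :add)
    (4 :sub)
    (t :unknown)))
```

Because safety is 0, passing anything outside (integer 0 7) to such a function is undefined behavior, as the warning above explains.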
;Itab4:Optimization attempt succeeded. Table case will compile more efficiently if supported.
This message is shown when all high-level criteria are satisfied for a table-case (see Itab1, Itab2, Itab3). If the architecture supports the optimization, then it will be implemented, otherwise it will be silently ignored and the safe table-driven case will be compiled.
These are general messages about types.
;Ityp1: Arg~d: looking for <type> - failed
;Ityp1: Arg~d: looking for <type> - got <normalized subtype>
;Ityp1: Arg~d: looking for one of <list of types> - failed
;Ityp1: Arg~d: looking for one of <list of types> - succeeded
;Ityp1: Arg~d: looking for (not fixnum) - got (double-float * *)
These general messages appear whenever the compiler is looking specifically for a value to have a certain type or types. The first two forms are looking for a single type; if successful, the normalized type is shown (see normalize-type). The third and fourth forms are looking for the value to be one of a list of types. The actual type found is not shown, but can be seen in a Tres1 message when :explain :types is enabled. The last message is similar to the first two; a single type is being looked for, but the type is of the form (not
;Ityp2: Node: checking that this node is purely <type>
Many times an expression of the form
(op arg1 arg2) -> result
can be optimized if all of arg1, arg2 and result are of the same type. Float values are handled by Ityp3, but fixnums and machine-integers (where the compiler might not be able to assume that the result is of that type, even if both arguments are of that type) are handled by this message. Ityp1 tests will be done on each of the arguments and on the node (the result of the expression). Note that the compiler cannot assume that two fixnums added together will result in a fixnum, except under special circumstances or under unsafe compilations where comp:declared-fixnums-remain-fixnum-switch returns true. Thus,
(let ((a x) (b y))
(declare (fixnum a b))
(+ a b))
may fail to be "purely fixnum", but
(let ((a x) (b y))
(declare (fixnum a b))
(the fixnum (+ a b)))
will always be purely fixnum (as long as declarations are trusted). If the Ityp1 messages after this one succeed, then the optimization may also succeed.
;Ityp3: Node: checking for only <type> args (including coercible constants):
For float values, the compiler can infer the type of the result of an expression, based on floating-point contagion of the types of the arguments. Some Ityp1 messages will be shown along with their results, and if successful, allow the optimization to succeed.
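A small sketch of what "coercible constants" means in practice: in the hypothetical function below, the integer constants 2 and 1 are combined with a single-float argument, so by floating-point contagion the result is a single-float.

```lisp
;; x is single-float; the integer constants are coercible to single-float,
;; so every operation here has a single-float result.
(defun scale-and-shift (x)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :inlining)
           (type single-float x))
  (+ (* x 2) 1))
```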
These relate to successful tail merging.
;Merg1: merged a call to ~s as a self-tail-jump
This message indicates that a call to the current function being compiled was changed to a jump to the beginning. The normal calling sequence is short-circuited, so arguments are not saved and tracing/stepping/call-counting will not be performed on this "call".
;Merg2: merged a call to ~s as a non-self-tail-jump
This message indicates that a call to a function was changed to a jump to the normal lisp function entry code. The function being called might be the same as the one being compiled, but the call-turned-jump is performed as if the call were to another function (i.e. the jump is through the name, the arguments are saved if argument-saving is on, and trace/step/call-counting all will work for this particular call).
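A self tail call of the kind these messages describe might look like the following sketch. The function name is hypothetical, and whether the call is reported as Merg1, Merg2, or one of the Mnot labels depends on the compiler switches in effect.

```lisp
;; The recursive call is the last thing the function does, so it is a
;; candidate for being turned into a jump.
(defun sum-to (n acc)
  (declare (optimize (speed 3) (safety 1) (debug 0))
           #+allegro (:explain :tailmerging)
           (type fixnum n acc))
  (if (zerop n)
      acc
      (sum-to (1- n) (+ acc n))))
```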
These relate to unsuccessful tail merging.
;Mnot1: failed to merge a call to ~s as a self-tail-jump because function is closed-over
This message states that tail merging was thwarted because the function being compiled/called is creating a closure over some variables, which makes it not conducive to tail-merging; the function that is a closure is not complete without its closure environment, which is set up as a part of a normal function call. Note that the self-call might still be merged as a jump in a non-self manner (where the closure environment is set up by a jump through the normal function calling sequence).
;Mnot2: failed to merge a call to ~s as a self-tail-jump because function is declared notinline
This message states that tail merging was thwarted because the function being called was declared notinline (the notinline declaration may or may not have been in effect at the start of the compilation of the function). The compiler takes the notinline declaration to be a request to treat an expression with that operator as a simple function call. This allows a call to foo within foo to be traced, etc. even though the call would otherwise have been mergable.
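As a sketch, a global notinline proclamation on a hypothetical self-recursive function asks the compiler to keep the recursive call as an ordinary call, which is the situation Mnot2 reports.

```lisp
;; Keep calls to traced-countdown as real function calls so they can be traced.
(declaim (notinline traced-countdown))

(defun traced-countdown (n)
  (declare (optimize (speed 3) (safety 1) (debug 0))
           #+allegro (:explain :tailmerging))
  (if (<= n 0)
      :done
      (traced-countdown (1- n))))
```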
;Mnot3: failed to merge a call to ~s as a self-tail-jump because unstructured stack allocations are present in the calling function
This message states that tail merging was thwarted because the function being compiled has unstructured stack-allocations. When objects are allocated in that unstructured way, it is impossible to tell whether they are alive or dead at the time. Unless it could be proved that these stack-allocated values are not passed in as arguments to the call, it can thus not be proved that the stack is not needed for the function being called again. Compare with Mnot9.
;Mnot4: failed to merge a call to ~s as a non-self-tail-jump because function is specially listed as not tailmergable
This message states that tail merging was thwarted because the function being called is used for debug purposes and is listed as one for which tail-merging should not be done. This guarantees that the stack will have enough information in it to do effective debugging of any problems.
;Mnot5: failed to merge a call to ~s as a non-self-tail-jump because tail-call-non-self-merge-switch is returning nil
This message states that tail merging was thwarted because the compiler switch that controls non-self tail-merging is disallowing the merging to take place.
;Mnot6: failed to merge a call to ~s as non-self-tail-jump because of
machine-dependent reasons (such as register limits)
This message states that tail merging was thwarted because the architecture didn't allow it. This message is usually given when the number of arguments is greater than both the number of arguments passed in registers on this architecture and also greater than the number of arguments that the calling function received. The reason this is important is: the tail merging operation will "take over" the stack space allotted to the calling function, for the purpose of whatever stack space the callee needs. The caller's caller allocated stack space for all arguments that the caller will use, but if the callee must receive more arguments than that, there is not enough stack space.
For example:
(defun foo (x1 x2)
  (bar x1 x2 1 2))
(defun bar (w x y z)
  (bas w x y z 10 20))
In this example, when compiled, foo will allocate at least slots for 4 arguments. On some architectures, such as PowerPC, it will allocate slots for 8 arguments, because the first 8 arguments are always passed in registers and the allocation always shadows the register set. On the x86, which passes only 2 arguments in registers, there are only exactly 4 slots allocated. But when bar is compiled, there must be at least 6 slots allocated for arguments. On the PowerPC, this is not a problem, because it is already known that bar's caller has allocated at least 8 arguments, and so there is room for at least 6 arguments for the tail-merge. But for the x86, the compilation of bar doesn't know how much stack space bar's caller allocated (in this situation, it was only 4 slots) so it can't assume that enough space was allocated to accommodate the call to bas. Thus the merge will fail on x86 but will succeed (or, at least, not fail for this reason) on PowerPC.
;Mnot7: failed to merge a call to ~s as a non-self-tail-jump because there are catches in the calling function
This message states that tail merging was thwarted because there are active catches at the point of the tail call, which means that the catch-frame must be unlinked on the way back from executing the callee. Note that if a catch was entered but then exited again, the tail-merging is still possible.
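The difference between an active catch and one that has already been exited can be sketched with two hypothetical functions. In the first, the call to report happens while the catch frame is still linked; in the second, the catch has been exited before the tail call, so this particular obstacle does not apply.

```lisp
(defun report (x)
  (list :reported x))

;; The catch is still active when report is called, so the catch frame
;; must be unlinked after report returns.
(defun call-inside-catch (x)
  (catch 'abort
    (report x)))

;; The catch has been entered and exited before the tail call.
(defun call-after-catch (x)
  (catch 'abort
    (when (minusp x)
      (throw 'abort nil)))
  (report x))
```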
;Mnot8: failed to merge a call to ~s as a non-self-tail-jump because the call is an applyn call
This message states that tail merging was thwarted because an applyn call (see doc on applyn in Stack consing, avoiding consing using apply, and stack allocation in compiling.html) requires the stack to be a specific format.
;Mnot9: failed to merge a call to ~s as a non-self-tail-jump because unstructured stack allocations are present in the calling function
This message states that tail merging was thwarted because the function being compiled has unstructured stack-allocations. When objects are allocated in that unstructured way, it is impossible to tell whether they are alive or dead at the time. Unless it could be proved that these stack-allocated values are not passed in as arguments to the call, it can thus not be proved that the stack is not needed for the new function being called. Compare with Mnot3.
;Mnota: failed to merge a call to ~s as a non-self-tail-jump because of an unknown reason
This message states that tail merging was thwarted because there is an internal compiler inconsistency. This message should never be seen. Please report if it is seen.
;Mnotb: failed to merge a call to ~s as non-self-tail-jump because structured stack allocations are not dead in the calling function
This message states that tail merging was thwarted because the function being compiled has structured lifo (last in, first out) stack-allocations. When objects are lifo allocated, the compiler knows that some of the objects are still alive. Unless it could be proved that these stack-allocated values are not passed in as arguments to the call, the stack is needed. Compare with Mnot9.
;Mnotc: failed to merge a call to ~s as self-tail-jump because
structured stack allocations are not dead in the calling function
This message states that tail merging was thwarted because the function being compiled has structured lifo (last in, first out) stack-allocations. When objects are lifo allocated, the compiler knows that some of the objects are still alive. Unless it could be proved that these stack-allocated values are not passed in as arguments to the call, the stack is needed. Compare with Mnot3.
These relate to arguments.
;Targ1: call to ~s
Within Targ1, arguments are examined to see what they are. This message indicates that a "call" node was found; i.e. the argument to the call is a call itself.
;Targ2: symeval ~s
;Targ2: symeval ~s (a special)
;Targ2: symeval ~s (an instance var)
;Targ2: symeval ~s (closed over)
Within Targ2, arguments are examined to see what they are. This message states that a symbolic variable was found, of the kind shown. Note that a symeval node that has as its type only one possible value is often treated by the compiler as a constant. So, for example, (let ((x 1)) (foo x)) can treat x as if it were the constant 1 - the type propagation assigns the type range (integer 1 1) to the value of x, and since it is only assigned in that spot, its value can be considered constant.
;Targ3: constant ~s
Within Targ3, arguments are examined to see what they are. This message indicates that a constant was seen.
;Targ4 a <(some other kind of node)>:
Within Targ4, arguments are examined to see what they are. This message indicates that the node was something other than a call, symbolic-variable, or constant. The kind of node is shown as an indicator.
These relate to general calls.
;Tgen1: Examined a call to ~s with arguments:
;Tgen1: Examined a (possibly unboxed) call to ~s with arguments:
This message starts the type examination of each call node. The node consists of arguments to an operator and the node itself, which is the result of the operation. Further messages will state what has been found. It is in the past tense because the examination has already been done and the reporting is just showing what was found. This is in contrast to how other :explain options work, since the other operations report what is being seen as progress is being made.
This message will be followed by a number of Targ messages (see above), one for each argument in the function call being examined. Each complete Targ message will correspond to arguments in the function call, in order from left to right. Following the arguments, a Tres message (see below) might be given if a result is present. This ordering can also be correlated to Arg~d and Node sublabels in :inlining explanations (see :inlining (I) labels).
These relate to calls to function or variable information.
;Tinf1: function-information: ~s
;Tinf1: variable-information: ~s
If in addition to regularly propagated type information, there is information available about a function or variable within the current lexical and/or dynamic environment, this message is displayed with that information. The propagated type and the environment type information are used in concert by the compiler which looks for as much information as possible in order to provide as much optimization as is possible.
;Tres1: which returns a value of type ~s
;Tres1: which returns a value in fixnum range of type ~s
This message indicates the type of the node being examined by the :types explanation; the node describes the value of the form. In the first form, the type is given simply as a normalized type. The second form is similar to the first, except that the compiler has also recognized that the type is a subtype of the fixnum type. This is useful information, because it is sometimes hard to know whether an integer range is within fixnum range or not.
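Outside the compiler, the same question can be asked with standard Common Lisp. The helper below is a hypothetical convenience, not part of Allegro CL; it uses subtypep to check whether an integer range lies within fixnum range on the running Lisp.

```lisp
;; True when every integer from low to high is a fixnum in this Lisp.
(defun fixnum-range-p (low high)
  (values (subtypep `(integer ,low ,high) 'fixnum)))
```

For instance, (fixnum-range-p 0 255) is true in any conforming implementation, while a range that extends past most-positive-fixnum is not.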
These messages are about data stored in registers.
;Vreg1: Variables stored in general data registers: ~s.
This message indicates that the list given names variables that are placed into general data registers. Not all architectures generate this message, because it would clutter up the output and cause it to become confusing. Ultra-temporary values which do not have variable names are not listed.
;Vreg2: Variables stored in permanent general registers: ~s.
This message indicates that the list given names variables that are assigned permanently (at least, during the life of the variable) to a particular register. These would be caller-saves registers, which might either be saved on the stack or via register windows during a function call. Ultra-temporary values which do not have variable names are not listed.
These messages are about where floating-point data is stored.
;Vflt1: Variables stored in floating point registers: ~s.
This message indicates that the list given names variables that are placed into floating-point registers. Ultra-temporary values which do not have variable names are not listed.
;Vflt2: Variables stored in floating point memory: ~s.
This message indicates that the list given names variables that are stored onto the stack, in an area reserved for non-gc'd data. This area is not volatile across function calls.
This message is about where non-floating-point data is stored.
;Vmem1: Variables stored in (non-floating-point) memory: ~s.
This message indicates that the list given names variables that are stored into the stack, in an area that contains lisp data. This area is not volatile across function calls.
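To see where variables end up, a function with a few named locals can be compiled with the :variables explanation enabled. The sketch below uses a hypothetical function; which of the Vreg, Vflt, and Vmem messages appear, and which variables they list, depends entirely on the platform.

```lisp
;; aa and bb are named double-float locals, so they may be reported as
;; living in floating-point registers or floating-point memory.
(defun hypotenuse (a b)
  (declare (optimize (speed 3) (safety 1))
           #+allegro (:explain :variables)
           (type double-float a b))
  (let ((aa (* a a))
        (bb (* b b)))
    (declare (type double-float aa bb))
    (sqrt (+ aa bb))))
```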
The format of the :explain declaration is described in Help with declarations: the :explain declaration in compiling.html. The sections above in this document list the labels used in :explain output. The subsections below describe particular :explain types and show examples.
Compilers in Allegro CL are specific to platforms, and particularly to the processor chip used. Because compilers work differently on different platforms, :explain output will differ on different platforms. The examples below are designed to show how to use the :explain feature and how to interpret the output. You will very likely see different results with your Lisp on your machine.
:calls should not be used with :inlining since calls information is provided by the :inlining explanation. :types provides different information and can be used with :inlining.
The example below shows simple use of the :explain declaration. The user is working with two-dimensional arrays of 8-bit data returned by an image scanner, and writes a function that computes the average data value in an array. Environment information is also available in Allegro CL (identifying, for example, lexical variables). See environments.html for information on how you might be able to augment the compilation environment, thus allowing variables that have not been setq'd within those blocks to be type-specified to a tighter type spec (resulting in better compilations).
cl-user(15): (defun 2d-avg (ar)
               (declare (:explain :types :calls)
                        (optimize (speed 3) (safety 1))
                        (type (array (unsigned-byte 8) (* *)) ar))
               (let ((sum 0))
                 (declare (fixnum sum))
                 (dotimes (x 64)
                   (dotimes (y 64)
                     (declare (:explain :calls :types))
                     (incf sum (aref ar x y))))
                 (/ sum 4096)))
2d-avg
cl-user(16): (compile *)
; While compiling 2d-avg:
;Tgen1:Examined a call to >=_2op with arguments:
;Targ2: symeval x type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 64 type in fixnum range (integer 64 64)
;Tres1: which returns a value of type t
;Tgen1:Examined a call to >=_2op with arguments:
;Targ2: symeval y type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 64 type in fixnum range (integer 64 64)
;Tres1: which returns a value of type t
;Call1:Generated a non-inline call to aref_2d:
;Tgen1:Examined a call to aref_2d with arguments:
;Targ2: symeval ar type (array (integer 0 255) (* *))
;Tinf1: variable-information: lexical: ((type (array # #)))
;Targ2: symeval x type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ2: symeval y type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Tres1: which returns a value in fixnum range of type (integer 0 255)
;Tgen1:Examined a call to +_2op with arguments:
;Targ2: symeval sum type in fixnum range (integer -536870912 536870911)
;Tinf1: variable-information: lexical: ((type
(integer -536870912 536870911)))
;Targ2: symeval g27 type in fixnum range (integer 0 255)
;Tres1: which returns a value in fixnum range of type
;Tres1: (integer -536870912 536870911)
;Tgen1:Examined a call to +_2op with arguments:
;Targ2: symeval y type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 1 type in fixnum range (integer 1 1)
;Tres1: which returns a value in fixnum range of type (integer 0 64)
;Tgen1:Examined a call to +_2op with arguments:
;Targ2: symeval x type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 1 type in fixnum range (integer 1 1)
;Tres1: which returns a value in fixnum range of type (integer 0 64)
;Call1:Generated a non-inline call to /_2op:
;Tgen1:Examined a call to /_2op with arguments:
;Targ2: symeval sum type in fixnum range (integer -536870912 536870911)
;Tinf1: variable-information: lexical: ((type
(integer -536870912 536870911)))
;Targ3: constant 4096 type in fixnum range (integer 4096 4096)
;Tres1: which returns a value of type (rational * *)
2d-avg
nil
nil
cl-user(17):
See :type (T) labels and :calls (C) labels: CallN for explanations of the labels in the output in the examples in this section.
Note a couple of points: (1) whenever the system prints that it examined a call to a function, that means the function is a candidate for opencoding; (2) if opencoding does not succeed, a message, typically with a Call label, is printed saying that the system generated a non-inline call to the function (often before the examined messages). Therefore, if no "Generated a non-inline call" message appears for a function, the function was inlined. In the example, there are non-inline calls to aref_2d and /_2op.
Your first reaction might be "what are those functions"? When optimizing code, Allegro CL often uses internal variants of standard functions. The associated standard function can usually be easily recognized: aref_2d is aref with a 2-d array argument. /_2op is two-argument divide.
So why didn't they opencode? In Supported operations in compiling.html we list various candidates for opencoding. There it says that aref of a simple array can opencode. Here, the argument to aref is declared to be a (general) array. First we change that declaration.
The last form divides the sum of values by 4096, so we are dividing two fixnums (we have declared that sum is a fixnum). But / doesn't opencode for fixnums, at least on the platform where this operation was tested. (Behavior may differ on different platforms: you should always try the examples yourself. They are here to illustrate the procedure only.) We could fix that by floating the values and getting a single-float result.
user(6): (defun 2d-avg (ar)
           (declare (optimize (speed 3) (safety 1)) (:explain :types :calls)
                    (type (simple-array (unsigned-byte 8) (* *)) ar))
           (let ((sum 0))
             (declare (fixnum sum))
             (dotimes (x 64)
               (dotimes (y 64)
                 (declare (:explain :calls :types))
                 (incf sum (aref ar x y))))
             (/ (float sum 1.0f0) 4096.0f0)))
2d-avg
user(7): (compile '2d-avg)
;Examining a call to >=_2op with arguments:
;Tgen1:Examined a call to >=_2op with arguments:
;Targ2: symeval x type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 64 type in fixnum range (integer 64 64)
;Tres1: which returns a value of type t
;Tgen1:Examined a call to >=_2op with arguments:
;Targ2: symeval y type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 64 type in fixnum range (integer 64 64)
;Tres1: which returns a value of type t
;Tgen1:Examined a call to aref_2d with arguments:
;Targ2: symeval ar type (simple-array (integer 0 255) (* *))
;Tinf1: variable-information: lexical: ((type (simple-array # #)))
;Targ2: symeval x type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ2: symeval y type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Tres1: which returns a value in fixnum range of type (integer 0 255)
;Tgen1:Examined a call to +_2op with arguments:
;Targ2: symeval sum type in fixnum range (integer -536870912 536870911)
;Tinf1: variable-information: lexical: ((type
(integer -536870912 536870911)))
;Targ2: symeval g36059 type in fixnum range (integer 0 255)
;Tres1: which returns a value in fixnum range of type (integer
-536870912
536870911)
;Tgen1:Examined a call to +_2op with arguments:
;Targ2: symeval y type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 1 type in fixnum range (integer 1 1)
;Tres1: which returns a value in fixnum range of type (integer 0 64)
;Tgen1:Examined a call to +_2op with arguments:
;Targ2: symeval x type in fixnum range (integer 0 64)
;Tinf1: variable-information: lexical: ((type (integer 0 64)))
;Targ3: constant 1 type in fixnum range (integer 1 1)
;Tres1: which returns a value in fixnum range of type (integer 0 64)
;Tgen1:Examined a (possibly unboxed) call to to-single-float with arguments:
;Targ2: symeval sum type in fixnum range (integer -536870912 536870911)
;Tinf1: variable-information: lexical: ((type
(integer -536870912 536870911)))
;Tres1: which returns a value of type (single-float * *)
;Tgen1:Examined a (possibly unboxed) call to /_2op with arguments:
;Targ1: call to to-single-float type (single-float * *)
;Targ3: constant 4096.0 type (single-float 4096.0 4096.0)
;Tres1: which returns a value of type (single-float * *)
2d-avg
nil
nil
cl-user(8):
This time, success.
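The difference made by floating the sum can be seen in portable Common Lisp, independent of any compiler explanation. The sketch below is illustrative only (the literal values are made up for the example): / applied to two integers is exact and may have to produce a ratio, while / applied to two single-floats produces a single-float, which is the situation the revised 2d-avg creates.

```lisp
;; Integer division with / is exact, so the result may be a ratio
;; rather than a fixnum.
(/ 100 4096)                                          ; => 25/1024

;; Floating the numerator first, as the revised 2d-avg does, keeps
;; the whole computation in single-float arithmetic.
(typep (/ (float 100 1.0f0) 4096.0f0) 'single-float)  ; => T
```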
Note, by the way, the declaration (optimize (speed 3) (safety 1)). The trust-declarations-switch must be true (as it is when speed is greater than safety) for the :explain :types and :calls declaration to have effect.
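The condition stated in that parenthetical can be written down as a tiny predicate. This is only a model of the behavior described here (true when speed is greater than safety); model-trust-declarations-p is a hypothetical name chosen for this sketch, not an Allegro CL function.

```lisp
;; Hypothetical model of the rule stated in the text; it does not
;; query the real compiler switch.
(defun model-trust-declarations-p (speed safety)
  "Return true when SPEED is greater than SAFETY."
  (> speed safety))

(model-trust-declarations-p 3 1)   ; => T
(model-trust-declarations-p 1 1)   ; => NIL
```

Under this model, the settings (speed 3) (safety 1) trust declarations, while (speed 1) (safety 1) do not.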
:boxing should not be used with :inlining since boxing information is provided by the :inlining explanation.
A number is boxed when it is converted from its machine representation to the Lisp representation. For floats, the machine representation is one (for singles) or two (for doubles) words. Lisp adds an extra word, which contains a pointer and a type code. For fixnums, boxing simply involves a left shift, of two bits in 32-bit Lisps and three bits in 64-bit Lisps. For bignums which are in the range of machine integers, boxing again adds an additional word.
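The left shift described for fixnums can be modeled with ordinary integer arithmetic. The following is a sketch of the arithmetic only: model-box-fixnum and model-unbox-fixnum are hypothetical names, and the code does not inspect or reproduce Allegro CL's actual internal representation.

```lisp
;; Hypothetical arithmetic model of fixnum boxing; not Allegro CL internals.
(defun model-box-fixnum (n shift)
  "Shift the machine integer N left by SHIFT bits."
  (ash n shift))

(defun model-unbox-fixnum (word shift)
  "Undo MODEL-BOX-FIXNUM by shifting WORD right by SHIFT bits."
  (ash word (- shift)))

(model-box-fixnum 5 2)      ; => 20  (two-bit shift, as in 32-bit Lisps)
(model-box-fixnum 5 3)      ; => 40  (three-bit shift, as in 64-bit Lisps)
(model-unbox-fixnum 40 3)   ; => 5
```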
A box can also exist for multiple Lisp numbers. For single-floats, an array specialized with :element-type 'single-float can serve as a box for as many single-floats as the size of the array. For double-floats, an array of double-floats serves as the multiple box. And for machine-integers, an array of :element-type '(signed-byte N), where N is 32 or 64, depending on whether the Lisp is a 32-bit Lisp or a 64-bit Lisp, will house machine-integers.
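As a minimal sketch of using a specialized array as a multiple box, the function below updates every element of a single-float vector in place and returns nil, so no float value has to be returned to the caller. The names scale-in-place and *samples* are hypothetical, and whether any individual box is generated is up to the compiler and platform; adding (:explain :boxing) to the declarations in Allegro CL is the way to check.

```lisp
;; Hypothetical example: the vector itself holds all the floats.
(defun scale-in-place (vec factor)
  (declare (type (simple-array single-float (*)) vec)
           (single-float factor)
           (optimize (speed 3) (safety 1)))
  (dotimes (i (length vec))
    ;; Read, multiply and store without returning the float.
    (setf (aref vec i) (* factor (aref vec i))))
  nil)

(defparameter *samples*
  (make-array 4 :element-type 'single-float :initial-element 1.5f0))

(scale-in-place *samples* 2.0f0)   ; => NIL
(aref *samples* 0)                 ; => 3.0
```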
Boxing obviously involves a computational overhead but, more importantly, it involves a space overhead. If a computation involves calculating thousands of floats, for example, thousands of bytes of space will be used. Often that space need not be used. Let us consider a simple example. darray is a vector of double floats. The function foo-insert takes a vector, an index, and a double float as arguments, does something to the double float (adds 2.0d0 but it could be anything) and stores the result into the vector at the indicated index. Suppose things were defined as follows:
(setq darray (make-array 10 :element-type 'double-float
:initial-element 0.0d0))
(defun foo-insert (arr index value)
(declare (type (simple-array double-float 1) arr)
(fixnum index)
(double-float value)
(optimize (speed 3))
(:explain :boxing))
(setf (aref arr index)(+ value 2.0d0)))
When we compile foo-insert, we are warned that one double-float is boxed:
user(16): (compile 'foo-insert)
;Bgen1:Generated a double-float box
foo-insert
nil
nil
user(17):
See :boxing (B) labels: Bgen1 for explanations of the labels in the output in the examples in this section.
Examining the code, we notice that foo-insert returns a double-float (the value of the setf form). That value must be boxed before it can be returned. This can be fixed by adding nil at the end of the definition of foo-insert, so it returns nil instead:
(defun foo-insert (arr index value)
(declare (type (simple-array double-float 1) arr)
(fixnum index)
(double-float value)
(optimize (speed 3))
(:explain :boxing))
(setf (aref arr index)(+ value 2.0d0))
nil)
We try compiling again and no boxing is reported:
user(28): (compile 'foo-insert)
foo-insert
nil
nil
user(29):
Local variables in functions can be stored, during a function call, in memory or in registers. Storing in registers is faster, but there are only so many registers to go around. The compiler works out the live ranges of locals (the live range goes from when a local is first assigned a value until that value is last accessed) and then schedules register assignments. Unassigned locals are assigned to a memory location, so an access requires a fetch from memory and a store to memory.
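The notion of a live range can be sketched with two functions that compute the same result. All names here are hypothetical, and how (or whether) register assignment differs between the two is platform-dependent; in Allegro CL, adding (:explain :variables) to each function is the way to see what the compiler actually decided.

```lisp
;; Hypothetical stand-in for any call that is not inlined.
(defun note-progress () nil)

(defun long-ranges (a b c)
  ;; x, y and z are assigned before the call and last used after it,
  ;; so their live ranges span the call to NOTE-PROGRESS.
  (let ((x (+ a b))
        (y (- a c))
        (z (* b c)))
    (note-progress)
    (list x y z)))

(defun short-ranges (a b c)
  ;; The intermediate values are computed only where they are needed,
  ;; so no extra locals are live across the call.
  (note-progress)
  (list (+ a b) (- a c) (* b c)))

(long-ranges 1 2 3)    ; => (3 -2 6)
(short-ranges 1 2 3)   ; => (3 -2 6)
```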
To illustrate explaining variables, consider the following function, which has many arguments (which are treated as locals) and locals. The call to bar between the let bindings and the call to list ensures that the live range of all the variables lasts until the call to list. We show compilations on various platforms to make clear how behavior is platform-dependent.
On x86 (such as Linux or Windows):
cl-user(1): (defun bar (&rest args) nil)
bar
cl-user(2): (defun foo (a b c d e f g)
(declare (:explain :variables))
(declare (single-float a b c) (double-float d))
(let ((h (+ a b))
(i (- a c))
(j (* c e))
(k (- d c))
(l f)
(m e)
(n g))
(bar)
(list a b c d e f g h i j k l m n)))
foo
cl-user(3): (compile *)
;Vmem1: Variables stored in (non-floating-point) memory: (a h b i j k).
foo
nil
nil
cl-user(4):
On sparc:
cl-user(3): (compile *)
;Vreg1: Variables stored in general data registers:
;Vreg1: (a b c d e b h c i c e j k a c d e).
;Vreg2: Variables stored in permanent general registers:
;Vreg2: (h h i i j k k).
foo
nil
nil
cl-user(4):
On PowerPC (used on MacOSX):
cl-user(3): (compile *)
;Vreg2: Variables stored in permanent general registers:
;Vreg2: (a d f g h b c e i).
;Vmem1: Variables stored in (non-floating-point) memory: (j k).
foo
nil
nil
cl-user(4):
See :variables (V) labels for explanations of the labels in the output in the examples in this section.
Note that we do not recommend proclaiming explanation of variables (with proclaim, so that it happens all the time), since reports will then be made on things compiled automatically (such as methods and foreign functions), causing unexpected messages to be printed to the terminal from time to time.
See tailmerging (M) labels for explanations of the labels in the output in the examples in this section.
There are two tailmerging compiler switches: tail-call-non-self-merge-switch and tail-call-self-merge-switch. When the :explain (:tailmerging t) declaration is in effect, failure to tailmerge for any reason will be noted. A common reason is that the settings of the optimization qualities make the switches return nil, as in the following example:
cl-user(14): :opt
A response of ? gets help on possible answers.
compiler optimize safety setting (0 is fastest): [1]
compiler optimize space setting: [1]
compiler optimize speed setting (3 is fastest): [1]
compiler optimize debug setting (3 is maximum): [2]
compiler optimize compilation-speed setting (3 is maximum): [1]
Compiler optimize setting is
(declaim (optimize (safety 1) (space 1) (speed 1) (debug 2)
(compilation-speed 1)))
cl-user(15): (print-startup-info :compiler-switches)
;; These are the values returned by the compiler switch functions given
;; the specified speed, safety, space and debug optimization qualities
;; (current values are used when values are not specified as arguments):
;;
;; [...]
;; compiler:tail-call-non-self-merge-switch nil
;; compiler:tail-call-self-merge-switch t
;; [...]
;;
nil
cl-user(16): (defun foo (x) (declare (:explain :tailmerging)) (bar x))
foo
cl-user(17): (compile 'foo)
;Mnot5: failed to merge a call to bar as non-self-tail-jump because comp:tail-call-non-self-merge-switch is returning nil
foo
nil
nil
cl-user(18): :opt
A response of ? gets help on possible answers.
compiler optimize safety setting (0 is fastest): [1]
compiler optimize space setting: [1]
compiler optimize speed setting (3 is fastest): [1] 3
compiler optimize debug setting (3 is maximum): [2] 0
compiler optimize compilation-speed setting (3 is maximum): [1]
Compiler optimize setting is
(declaim (optimize (safety 1) (space 1) (speed 3) (debug 0)
(compilation-speed 1)))
cl-user(19): (defun foo (x) (declare (:explain :tailmerging)) (bar x))
foo
cl-user(20): (compile 'foo)
;Merg1: merged a call to bar as a non-self-tail-jump
foo
nil
nil
cl-user(21):
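Rather than changing the global settings with :opt, the same qualities can be requested for a single function. The sketch below assumes that a local optimize declaration influences the tailmerging switch in the same way the global setting does in the transcript above; that is an assumption, not something shown there. The names tail-callee and tail-caller are hypothetical, and the :explain declaration is read only in Allegro CL so the code remains loadable elsewhere.

```lisp
;; Hypothetical callee; any ordinary function would do.
(defun tail-callee (x) x)

(defun tail-caller (x)
  (declare (optimize (speed 3) (debug 0))
           ;; Only Allegro CL understands :explain.
           #+allegro (:explain :tailmerging))
  ;; The call is in tail position, so it is a candidate for a
  ;; non-self tail merge.
  (tail-callee x))

(tail-caller 7)   ; => 7
```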
Self tail merges are done (by default) unless speed is 0. Continuing from above:
cl-user(25): (defun fact (n)
(declare (:explain :tailmerging))
(if (= n 0) 1 (fact (- n 1))))
fact
cl-user(26): (compile 'fact)
;Merg1: merged a call to fact as a self-tail-jump
fact
nil
nil
cl-user(27): :opt
A response of ? gets help on possible answers.
compiler optimize safety setting (0 is fastest): [1] 1
compiler optimize space setting: [1]
compiler optimize speed setting (3 is fastest): [1] 0
compiler optimize debug setting (3 is maximum): [2] 2
compiler optimize compilation-speed setting (3 is maximum): [1]
Compiler optimize setting is
(declaim (optimize (safety 1) (space 1) (speed 0) (debug 2)
(compilation-speed 1)))
cl-user(28): (defun fact (n)
(declare (:explain :tailmerging))
(if (= n 0) 1 (fact (- n 1))))
fact
cl-user(29): (compile 'fact)
;Mnot5: failed to merge a call to fact as non-self-tail-jump because comp:tail-call-non-self-merge-switch is returning nil
fact
nil
nil
cl-user(30):
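A self call can only be merged when it is in tail position. As a minimal sketch, the accumulator-passing factorial below keeps its recursive call in tail position; fact-iter and fact-acc are hypothetical names, and the :explain declaration is read only in Allegro CL. Whether the merge actually happens depends on the switch settings discussed above.

```lisp
(defun fact-iter (n acc)
  (declare (optimize (speed 3) (debug 0))
           ;; Only Allegro CL understands :explain.
           #+allegro (:explain :tailmerging))
  (if (= n 0)
      acc
      ;; Nothing remains to be done after this call returns, so it is
      ;; a self tail call.
      (fact-iter (- n 1) (* acc n))))

(defun fact-acc (n)
  (fact-iter n 1))

(fact-acc 5)   ; => 120
```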
The compiler will now provide information on attempts to inline functions in compiled code. This information will be supplied when the :explain :inlining declaration is in effect during compilation. The :explain declaration is described in the section Help with declarations: the :explain declaration.
You can turn on inlining explanation with a declaration like
(declare (:explain :inlining t))
(See Help with declarations: the :explain declaration in compiling.html for a complete description of the :explain declaration format.)
You can make the declaration pervasive with proclaim or declaim but be warned that inline explanations produce a lot of output. It is recommended that you add the declaration only to the functions where inlining is an issue.
There are also :explain :calls and :explain :boxing declarations. The information provided by these declarations is contained in the :inlining information so we recommend not asking for :calls or :boxing info when you ask for :inlining info.
To repeat what is said in compiling.html: the inline declaration is ignored by the compiler. At appropriate settings of speed and safety, the compiler will inline whatever it can. Only predefined system functions can be inlined. User defined functions are never compiled inline. (The compiler will observe the notinline declaration, however, so you can suppress inlining of specific functions if you want.)
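As a sketch of suppressing inlining for one call site, the two functions below differ only in the standard notinline declaration for +. The function names are hypothetical, and the :explain declaration is read only in Allegro CL; no output is shown because the messages depend on platform and version.

```lisp
(defun add-inlinable (x y)
  (declare (single-float x y)
           (optimize (speed 3) (safety 1))
           ;; Only Allegro CL understands :explain.
           #+allegro (:explain :inlining))
  (+ x y))

(defun add-not-inlined (x y)
  (declare (single-float x y)
           (optimize (speed 3) (safety 1))
           ;; Ask the compiler not to inline + within this function.
           (notinline +)
           #+allegro (:explain :inlining))
  (+ x y))

(add-inlinable 1.0f0 2.0f0)     ; => 3.0
(add-not-inlined 1.0f0 2.0f0)   ; => 3.0
```

Both functions return the same value; only the generated code, and therefore the explanation output, is expected to differ.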
The (:explain :inlining) declaration instructs the compiler to report what it is doing when it tries to inline functions as it compiles code. The information is reported in a series of short messages. When calls are inlined successfully, no information is reported. (The (:explain :calls) declaration also reports on whether calls are inlined but provides less specific information.)
See :inlining (I) labels for explanations of the labels in the output in the examples in this section.
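Here we have a short example just to give the flavor of the declaration. There are four aspects where we start wrong and then do right: the speed and safety settings (which determine whether declarations are trusted), the declared type of x and y, the value returned by the function, and the declaration of vec to be an array. The first function definition does the wrong thing in all four cases. Here is the compilation: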
cl-user(51): (defun foo (x y vec n)
(declare (:explain (:inlining t))
(optimize (speed 1) (safety 1))
(float x y) (fixnum n)
(type (array single-float (*)) vec))
(setf (aref vec n) (+ x y)))
foo
cl-user(51): (compile 'foo)
;Igen1: Node: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Node: type is not trusted
;Igen3: Attempt to inline a boxed call to +_2op:
;Igen5: Node: checking for notinline declaration for excl::+_2op: none
;Igen6: Checking compiler:trust-declarations-switch for t (got nil) - failed
;Iadd2: Node: must check for fixnum at runtime.
;Igen4: Inline attempt succeeded.
;Igen1: Node: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Node: type is not trusted
;Igen3: Attempt to inline a boxed call to .inv-s-aref:
;Igen5: Node: checking for notinline declaration for excl::.inv-s-aref: none
;Iref1: Arg1: looking for (simple-array type 1)
;Iref1: where type is one of (t character (unsigned-byte 4) fixnum
;Iref1: (unsigned-byte 8) (signed-byte 8)
;Iref1: (unsigned-byte 16) (signed-byte 16)
;Iref1: (signed-byte 32) single-float double-float bit)
;Iref1: - failed
;Igen4: Inline attempt failed.
foo
nil
nil
cl-user(52):
We are told several things: that trust-declarations-switch is nil and declarations are not trusted; that we wanted a simple-array (of a particular type) and did not find one; and that even though the call to + did inline, the inlined code has various checks such as a check for fixnums.
Here is the revised code, where speed is 3 and safety is 1, we return nil, and we declare x and y to be single-floats and vec to be a simple-array:
cl-user(53): (defun foo (x y vec n)
(declare (:explain (:inlining t))
(optimize (speed 3) (safety 1))
(single-float x y) (fixnum n)
(type (simple-array single-float (*)) vec))
(setf (aref vec n) (+ x y))
nil)
foo
cl-user(54): (compile 'foo)
;Igen1: Node: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Node: type is unboxable as single-float
;Igen3: Attempt to inline an unboxed call to +_2op:
;Igen5: Node: checking for notinline declaration for excl::+_2op: none
;Igen1: Arg0: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Arg0: type is unboxable as single-float
;Igen1: Arg1: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Arg1: type is unboxable as single-float
;Igen4: Inline attempt succeeded.
;Igen1: Node: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Node: type is unboxable as single-float
;Igen3: Attempt to inline an unboxed call to .inv-aref_1d:
;Igen5: Node: checking for notinline declaration for excl::.inv-aref_1d: none
;Iref1: Arg1: looking for (simple-array type 1)
;Iref1: where type is one of (single-float double-float (signed-byte 32)) -
;Iref1: got (simple-array (single-float * *) (*))
;Igen6: Checking compiler:trust-declarations-switch for t (got t) - succeeded
;Icon1: Arg2: looking for fixnum constant - got nil
;Igen1: Arg0: checking for implied unboxable type (floats, or
;Igen1: machine-integer):
;Igen2: Arg0: type is unboxable as single-float
;Igen4: Inline attempt succeeded.
;Igen4: Inline attempt succeeded.
foo
nil
nil
cl-user(55):
Things seemed to go much better.
Here is another example using the trusted-type declaration (see The excl:trusted-type declaration in compiling.html).
cl-user(1): (compile (defun foo (a y)
(declare (:explain :types :inlining)
(trusted-type (simple-array t (1 2)) a))
(aref a 0 y)))
; While compiling foo:
;Tgen1:Examined a call to aref with arguments:
;Targ2: symeval a type (simple-array t (1 2))
;Tinf1: variable-information: lexical: ((trusted-type (simple-array t #)))
;Targ3: constant 0 type in fixnum range (integer 0 0)
;Targ2: symeval y type t
;Tres1: which returns a value of type t
;Igen2: Node: type is not trusted
;Igen3: Attempt to inline a boxed call to aref while not trusting declarations:
;Igen5: Node: checking for notinline declaration for aref: none
;Iref1: Arg0: looking for (simple-array type 2)
;Iref1: where type is one of (t character (unsigned-byte 4) fixnum
;Iref1: (unsigned-byte 8) (signed-byte 8)
;Iref1: (unsigned-byte 16) (signed-byte 16)
;Iref1: (signed-byte 32) single-float double-float)
;Iref1: - got (simple-array t (1 2)) [trusted-type]
;Iref2: Arg0: looking for all dimensions declared for faster indexing - got
;Iref2: (1 2)
;Iref4: Checking for all array index args to be fixnum constants
;Iref4: - failed
;Iref5: Node: checking if there are any constant index values to avoid
;Iref5: multiplication
;Icon1: Arg1: looking for fixnum constant - got 0
;Icon1: Arg2: looking for fixnum constant - got nil
;Iref7: Node: found 1 useful index constant.
;Igen4: Inline attempt succeeded.
foo
nil
nil
cl-user(2):
Copyright (c) 2023, Franz Inc. Lafayette, CA., USA. All rights reserved.