diff options
Diffstat (limited to 'gcc/cp/gxxint.texi')
-rwxr-xr-x | gcc/cp/gxxint.texi | 1867 |
1 files changed, 1867 insertions, 0 deletions
diff --git a/gcc/cp/gxxint.texi b/gcc/cp/gxxint.texi new file mode 100755 index 0000000..3b8242d --- /dev/null +++ b/gcc/cp/gxxint.texi @@ -0,0 +1,1867 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header +@setfilename g++int.info +@settitle G++ internals +@setchapternewpage odd +@c %**end of header + +@node Top, Limitations of g++, (dir), (dir) +@chapter Internal Architecture of the Compiler + +This is meant to describe the C++ front-end for gcc in detail. +Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}. + +@menu +* Limitations of g++:: +* Routines:: +* Implementation Specifics:: +* Glossary:: +* Macros:: +* Typical Behavior:: +* Coding Conventions:: +* Templates:: +* Access Control:: +* Error Reporting:: +* Parser:: +* Copying Objects:: +* Exception Handling:: +* Free Store:: +* Mangling:: Function name mangling for C++ and Java +* Concept Index:: +@end menu + +@node Limitations of g++, Routines, Top, Top +@section Limitations of g++ + +@itemize @bullet +@item +Limitations on input source code: 240 nesting levels with the parser +stacksize (YYSTACKSIZE) set to 500 (the default), and requires around +16.4k swap space per nesting level. The parser needs about 2.09 * +number of nesting levels worth of stackspace. + +@cindex pushdecl_class_level +@item +I suspect there are other uses of pushdecl_class_level that do not call +set_identifier_type_value in tandem with the call to +pushdecl_class_level. It would seem to be an omission. + +@cindex access checking +@item +Access checking is unimplemented for nested types. + +@cindex @code{volatile} +@item +@code{volatile} is not implemented in general. + +@end itemize + +@node Routines, Implementation Specifics, Limitations of g++, Top +@section Routines + +This section describes some of the routines used in the C++ front-end. + +@code{build_vtable} and @code{prepare_fresh_vtable} is used only within +the @file{cp-class.c} file, and only in @code{finish_struct} and +@code{modify_vtable_entries}. + +@code{build_vtable}, @code{prepare_fresh_vtable}, and +@code{finish_struct} are the only routines that set @code{DECL_VPARENT}. + +@code{finish_struct} can steal the virtual function table from parents, +this prohibits related_vslot from working. When finish_struct steals, +we know that + +@example +get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0) +@end example + +@noindent +will get the related binfo. + +@code{layout_basetypes} does something with the VIRTUALS. + +Supposedly (according to Tiemann) most of the breadth first searching +done, like in @code{get_base_distance} and in @code{get_binfo} was not +because of any design decision. I have since found out the at least one +part of the compiler needs the notion of depth first binfo searching, I +am going to try and convert the whole thing, it should just work. The +term left-most refers to the depth first left-most node. It uses +@code{MAIN_VARIANT == type} as the condition to get left-most, because +the things that have @code{BINFO_OFFSET}s of zero are shared and will +have themselves as their own @code{MAIN_VARIANT}s. The non-shared right +ones, are copies of the left-most one, hence if it is its own +@code{MAIN_VARIANT}, we know it IS a left-most one, if it is not, it is +a non-left-most one. + +@code{get_base_distance}'s path and distance matters in its use in: + +@itemize @bullet +@item +@code{prepare_fresh_vtable} (the code is probably wrong) +@item +@code{init_vfields} Depends upon distance probably in a safe way, +build_offset_ref might use partial paths to do further lookups, +hack_identifier is probably not properly checking access. + +@item +@code{get_first_matching_virtual} probably should check for +@code{get_base_distance} returning -2. + +@item +@code{resolve_offset_ref} should be called in a more deterministic +manner. Right now, it is called in some random contexts, like for +arguments at @code{build_method_call} time, @code{default_conversion} +time, @code{convert_arguments} time, @code{build_unary_op} time, +@code{build_c_cast} time, @code{build_modify_expr} time, +@code{convert_for_assignment} time, and +@code{convert_for_initialization} time. + +But, there are still more contexts it needs to be called in, one was the +ever simple: + +@example +if (obj.*pmi != 7) + @dots{} +@end example + +Seems that the problems were due to the fact that @code{TREE_TYPE} of +the @code{OFFSET_REF} was not a @code{OFFSET_TYPE}, but rather the type +of the referent (like @code{INTEGER_TYPE}). This problem was fixed by +changing @code{default_conversion} to check @code{TREE_CODE (x)}, +instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it +was @code{OFFSET_TYPE}. + +@end itemize + +@node Implementation Specifics, Glossary, Routines, Top +@section Implementation Specifics + +@itemize @bullet +@item Explicit Initialization + +The global list @code{current_member_init_list} contains the list of +mem-initializers specified in a constructor declaration. For example: + +@example +foo::foo() : a(1), b(2) @{@} +@end example + +@noindent +will initialize @samp{a} with 1 and @samp{b} with 2. +@code{expand_member_init} places each initialization (a with 1) on the +global list. Then, when the fndecl is being processed, +@code{emit_base_init} runs down the list, initializing them. It used to +be the case that g++ first ran down @code{current_member_init_list}, +then ran down the list of members initializing the ones that weren't +explicitly initialized. Things were rewritten to perform the +initializations in order of declaration in the class. So, for the above +example, @samp{a} and @samp{b} will be initialized in the order that +they were declared: + +@example +class foo @{ public: int b; int a; foo (); @}; +@end example + +@noindent +Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be +initialized with 1, regardless of how they're listed in the mem-initializer. + +@item The Explicit Keyword + +The use of @code{explicit} on a constructor is used by @code{grokdeclarator} +to set the field @code{DECL_NONCONVERTING_P}. That value is used by +@code{build_method_call} and @code{build_user_type_conversion_1} to decide +if a particular constructor should be used as a candidate for conversions. + +@end itemize + +@node Glossary, Macros, Implementation Specifics, Top +@section Glossary + +@table @r +@item binfo +The main data structure in the compiler used to represent the +inheritance relationships between classes. The data in the binfo can be +accessed by the BINFO_ accessor macros. + +@item vtable +@itemx virtual function table + +The virtual function table holds information used in virtual function +dispatching. In the compiler, they are usually referred to as vtables, +or vtbls. The first index is not used in the normal way, I believe it +is probably used for the virtual destructor. + +@item vfield + +vfields can be thought of as the base information needed to build +vtables. For every vtable that exists for a class, there is a vfield. +See also vtable and virtual function table pointer. When a type is used +as a base class to another type, the virtual function table for the +derived class can be based upon the vtable for the base class, just +extended to include the additional virtual methods declared in the +derived class. The virtual function table from a virtual base class is +never reused in a derived class. @code{is_normal} depends upon this. + +@item virtual function table pointer + +These are @code{FIELD_DECL}s that are pointer types that point to +vtables. See also vtable and vfield. +@end table + +@node Macros, Typical Behavior, Glossary, Top +@section Macros + +This section describes some of the macros used on trees. The list +should be alphabetical. Eventually all macros should be documented +here. + +@table @code +@item BINFO_BASETYPES +A vector of additional binfos for the types inherited by this basetype. +The binfos are fully unshared (except for virtual bases, in which +case the binfo structure is shared). + + If this basetype describes type D as inherited in C, + and if the basetypes of D are E anf F, + then this vector contains binfos for inheritance of E and F by C. + +Has values of: + + TREE_VECs + + +@item BINFO_INHERITANCE_CHAIN +Temporarily used to represent specific inheritances. It usually points +to the binfo associated with the lesser derived type, but it can be +reversed by reverse_path. For example: + +@example + Z ZbY least derived + | + Y YbX + | + X Xb most derived + +TYPE_BINFO (X) == Xb +BINFO_INHERITANCE_CHAIN (Xb) == YbX +BINFO_INHERITANCE_CHAIN (Yb) == ZbY +BINFO_INHERITANCE_CHAIN (Zb) == 0 +@end example + +Not sure is the above is really true, get_base_distance has is point +towards the most derived type, opposite from above. + +Set by build_vbase_path, recursive_bounded_basetype_p, +get_base_distance, lookup_field, lookup_fnfields, and reverse_path. + +What things can this be used on: + + TREE_VECs that are binfos + + +@item BINFO_OFFSET +The offset where this basetype appears in its containing type. +BINFO_OFFSET slot holds the offset (in bytes) from the base of the +complete object to the base of the part of the object that is allocated +on behalf of this `type'. This is always 0 except when there is +multiple inheritance. + +Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example. + + +@item BINFO_VIRTUALS +A unique list of functions for the virtual function table. See also +TYPE_BINFO_VIRTUALS. + +What things can this be used on: + + TREE_VECs that are binfos + + +@item BINFO_VTABLE +Used to find the VAR_DECL that is the virtual function table associated +with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual +function table pointer, see CLASSTYPE_VFIELD. + +What things can this be used on: + + TREE_VECs that are binfos + +Has values of: + + VAR_DECLs that are virtual function tables + + +@item BLOCK_SUPERCONTEXT +In the outermost scope of each function, it points to the FUNCTION_DECL +node. It aids in better DWARF support of inline functions. + + +@item CLASSTYPE_TAGS +CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a +class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans +these and calls pushtag on them.) + +finish_struct scans these to produce TYPE_DECLs to add to the +TYPE_FIELDS of the type. + +It is expected that name found in the TREE_PURPOSE slot is unique, +resolve_scope_to_name is one such place that depends upon this +uniqueness. + + +@item CLASSTYPE_METHOD_VEC +The following is true after finish_struct has been called (on the +class?) but not before. Before finish_struct is called, things are +different to some extent. Contains a TREE_VEC of methods of the class. +The TREE_VEC_LENGTH is the number of differently named methods plus one +for the 0th entry. The 0th entry is always allocated, and reserved for +ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE. +Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL, +there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a +given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next +method that has the same name (but a different signature). It would +seem that it is not true that because the DECL_CHAIN slot is used in +this way, we cannot call pushdecl to put the method in the global scope +(cause that would overwrite the TREE_CHAIN slot), because they use +different _CHAINs. finish_struct_methods setups up one version of the +TREE_CHAIN slots on the FUNCTION_DECLs. + +friends are kept in TREE_LISTs, so that there's no need to use their +TREE_CHAIN slot for anything. + +Has values of: + + TREE_VECs + + +@item CLASSTYPE_VFIELD +Seems to be in the process of being renamed TYPE_VFIELD. Use on types +to get the main virtual function table pointer. To get the virtual +function table use BINFO_VTABLE (TYPE_BINFO ()). + +Has values of: + + FIELD_DECLs that are virtual function table pointers + +What things can this be used on: + + RECORD_TYPEs + + +@item DECL_CLASS_CONTEXT +Identifies the context that the _DECL was found in. For virtual function +tables, it points to the type associated with the virtual function +table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT. + +The difference between this and DECL_CONTEXT, is that for virtuals +functions like: + +@example +struct A +@{ + virtual int f (); +@}; + +struct B : A +@{ + int f (); +@}; + +DECL_CONTEXT (A::f) == A +DECL_CLASS_CONTEXT (A::f) == A + +DECL_CONTEXT (B::f) == A +DECL_CLASS_CONTEXT (B::f) == B +@end example + +Has values of: + + RECORD_TYPEs, or UNION_TYPEs + +What things can this be used on: + + TYPE_DECLs, _DECLs + + +@item DECL_CONTEXT +Identifies the context that the _DECL was found in. Can be used on +virtual function tables to find the type associated with the virtual +function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a +better access method. Internally the same as DECL_FIELD_CONTEXT, so +don't us both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and +DECL_CLASS_CONTEXT. + +Has values of: + + RECORD_TYPEs + + +What things can this be used on: + +@display +VAR_DECLs that are virtual function tables +_DECLs +@end display + + +@item DECL_FIELD_CONTEXT +Identifies the context that the FIELD_DECL was found in. Internally the +same as DECL_CONTEXT, so don't us both. See also DECL_CONTEXT, +DECL_FCONTEXT and DECL_CLASS_CONTEXT. + +Has values of: + + RECORD_TYPEs + +What things can this be used on: + +@display +FIELD_DECLs that are virtual function pointers +FIELD_DECLs +@end display + + +@item DECL_NAME + +Has values of: + +@display +0 for things that don't have names +IDENTIFIER_NODEs for TYPE_DECLs +@end display + +@item DECL_IGNORED_P +A bit that can be set to inform the debug information output routines in +the back-end that a certain _DECL node should be totally ignored. + +Used in cases where it is known that the debugging information will be +output in another file, or where a sub-type is known not to be needed +because the enclosing type is not needed. + +A compiler constructed virtual destructor in derived classes that do not +define an explicit destructor that was defined explicit in a base class +has this bit set as well. Also used on __FUNCTION__ and +__PRETTY_FUNCTION__ to mark they are ``compiler generated.'' c-decl and +c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,'' +and ``user-invisible variable.'' + +Functions built by the C++ front-end such as default destructors, +virtual destructors and default constructors want to be marked that +they are compiler generated, but unsure why. + +Currently, it is used in an absolute way in the C++ front-end, as an +optimization, to tell the debug information output routines to not +generate debugging information that will be output by another separately +compiled file. + + +@item DECL_VIRTUAL_P +A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is +wrong.) Used in VAR_DECLs to indicate that the variable is a vtable. +It is also used in FIELD_DECLs for vtable pointers. + +What things can this be used on: + + FIELD_DECLs and VAR_DECLs + + +@item DECL_VPARENT +Used to point to the parent type of the vtable if there is one, else it +is just the type associated with the vtable. Because of the sharing of +virtual function tables that goes on, this slot is not very useful, and +is in fact, not used in the compiler at all. It can be removed. + +What things can this be used on: + + VAR_DECLs that are virtual function tables + +Has values of: + + RECORD_TYPEs maybe UNION_TYPEs + + +@item DECL_FCONTEXT +Used to find the first baseclass in which this FIELD_DECL is defined. +See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT. + +How it is used: + + Used when writing out debugging information about vfield and + vbase decls. + +What things can this be used on: + + FIELD_DECLs that are virtual function pointers + FIELD_DECLs + + +@item DECL_REFERENCE_SLOT +Used to hold the initialize for the reference. + +What things can this be used on: + + PARM_DECLs and VAR_DECLs that have a reference type + + +@item DECL_VINDEX +Used for FUNCTION_DECLs in two different ways. Before the structure +containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a +FUNCTION_DECL in a base class which is the FUNCTION_DECL which this +FUNCTION_DECL will replace as a virtual function. When the class is +laid out, this pointer is changed to an INTEGER_CST node which is +suitable to find an index into the virtual function table. See +get_vtable_entry as to how one can find the right index into the virtual +function table. The first index 0, of a virtual function table it not +used in the normal way, so the first real index is 1. + +DECL_VINDEX may be a TREE_LIST, that would seem to be a list of +overridden FUNCTION_DECLs. add_virtual_function has code to deal with +this when it uses the variable base_fndecl_list, but it would seem that +somehow, it is possible for the TREE_LIST to pursist until method_call, +and it should not. + + +What things can this be used on: + + FUNCTION_DECLs + + +@item DECL_SOURCE_FILE +Identifies what source file a particular declaration was found in. + +Has values of: + + "<built-in>" on TYPE_DECLs to mean the typedef is built in + + +@item DECL_SOURCE_LINE +Identifies what source line number in the source file the declaration +was found at. + +Has values of: + +@display +0 for an undefined label + +0 for TYPE_DECLs that are internally generated + +0 for FUNCTION_DECLs for functions generated by the compiler + (not yet, but should be) + +0 for ``magic'' arguments to functions, that the user has no + control over +@end display + + +@item TREE_USED + +Has values of: + + 0 for unused labels + + +@item TREE_ADDRESSABLE +A flag that is set for any type that has a constructor. + + +@item TREE_COMPLEXITY +They seem a kludge way to track recursion, poping, and pushing. They only +appear in cp-decl.c and cp-decl2.c, so the are a good candidate for +proper fixing, and removal. + + +@item TREE_HAS_CONSTRUCTOR +A flag to indicate when a CALL_EXPR represents a call to a constructor. +If set, we know that the type of the object, is the complete type of the +object, and that the value returned is nonnull. When used in this +fashion, it is an optimization. Can also be used on SAVE_EXPRs to +indicate when they are of fixed type and nonnull. Can also be used on +INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor. + + +@item TREE_PRIVATE +Set for FIELD_DECLs by finish_struct. But not uniformly set. + +The following routines do something with PRIVATE access: +build_method_call, alter_access, finish_struct_methods, +finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType, +CWriteUseObject, compute_access, lookup_field, dfs_pushdecl, +GNU_xref_member, dbxout_type_fields, dbxout_type_method_1 + + +@item TREE_PROTECTED +The following routines do something with PROTECTED access: +build_method_call, alter_access, finish_struct, convert_to_aggr, +CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject, +compute_access, lookup_field, GNU_xref_member, dbxout_type_fields, +dbxout_type_method_1 + + +@item TYPE_BINFO +Used to get the binfo for the type. + +Has values of: + + TREE_VECs that are binfos + +What things can this be used on: + + RECORD_TYPEs + + +@item TYPE_BINFO_BASETYPES +See also BINFO_BASETYPES. + +@item TYPE_BINFO_VIRTUALS +A unique list of functions for the virtual function table. See also +BINFO_VIRTUALS. + +What things can this be used on: + + RECORD_TYPEs + + +@item TYPE_BINFO_VTABLE +Points to the virtual function table associated with the given type. +See also BINFO_VTABLE. + +What things can this be used on: + + RECORD_TYPEs + +Has values of: + + VAR_DECLs that are virtual function tables + + +@item TYPE_NAME +Names the type. + +Has values of: + +@display +0 for things that don't have names. +should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and + ENUM_TYPEs. +TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but + shouldn't be. +TYPE_DECL for typedefs, unsure why. +@end display + +What things can one use this on: + +@display +TYPE_DECLs +RECORD_TYPEs +UNION_TYPEs +ENUM_TYPEs +@end display + +History: + + It currently points to the TYPE_DECL for RECORD_TYPEs, + UNION_TYPEs and ENUM_TYPEs, but it should be history soon. + + +@item TYPE_METHODS +Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with +@code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a +class. + + +@item TYPE_DECL +Used to represent typedefs, and used to represent bindings layers. + +Components: + + DECL_NAME is the name of the typedef. For example, foo would + be found in the DECL_NAME slot when @code{typedef int foo;} is + seen. + + DECL_SOURCE_LINE identifies what source line number in the + source file the declaration was found at. A value of 0 + indicates that this TYPE_DECL is just an internal binding layer + marker, and does not correspond to a user supplied typedef. + + DECL_SOURCE_FILE + +@item TYPE_FIELDS +A linked list (via @code{TREE_CHAIN}) of member types of a class. The +list can contain @code{TYPE_DECL}s, but there can also be other things +in the list apparently. See also @code{CLASSTYPE_TAGS}. + + +@item TYPE_VIRTUAL_P +A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is +a virtual function table or a pointer to one. When used on a +@code{FUNCTION_DECL}, indicates that it is a virtual function. When +used on an @code{IDENTIFIER_NODE}, indicates that a function with this +same name exists and has been declared virtual. + +When used on types, it indicates that the type has virtual functions, or +is derived from one that does. + +Not sure if the above about virtual function tables is still true. See +also info on @code{DECL_VIRTUAL_P}. + +What things can this be used on: + + FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs + + +@item VF_BASETYPE_VALUE +Get the associated type from the binfo that caused the given vfield to +exist. This is the least derived class (the most parent class) that +needed a virtual function table. It is probably the case that all uses +of this field are misguided, but they need to be examined on a +case-by-case basis. See history for more information on why the +previous statement was made. + +Set at @code{finish_base_struct} time. + +What things can this be used on: + + TREE_LISTs that are vfields + +History: + + This field was used to determine if a virtual function table's + slot should be filled in with a certain virtual function, by + checking to see if the type returned by VF_BASETYPE_VALUE was a + parent of the context in which the old virtual function existed. + This incorrectly assumes that a given type _could_ not appear as + a parent twice in a given inheritance lattice. For single + inheritance, this would in fact work, because a type could not + possibly appear more than once in an inheritance lattice, but + with multiple inheritance, a type can appear more than once. + + +@item VF_BINFO_VALUE +Identifies the binfo that caused this vfield to exist. If this vfield +is from the first direct base class that has a virtual function table, +then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the +direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL} +on result to find out if it is a virtual base class. Related to the +binfo found by + +@example +get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) +@end example + +@noindent +where @samp{t} is the type that has the given vfield. + +@example +get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) +@end example + +@noindent +will return the binfo for the given vfield. + +May or may not be set at @code{modify_vtable_entries} time. Set at +@code{finish_base_struct} time. + +What things can this be used on: + + TREE_LISTs that are vfields + + +@item VF_DERIVED_VALUE +Identifies the type of the most derived class of the vfield, excluding +the class this vfield is for. + +Set at @code{finish_base_struct} time. + +What things can this be used on: + + TREE_LISTs that are vfields + + +@item VF_NORMAL_VALUE +Identifies the type of the most derived class of the vfield, including +the class this vfield is for. + +Set at @code{finish_base_struct} time. + +What things can this be used on: + + TREE_LISTs that are vfields + + +@item WRITABLE_VTABLES +This is a option that can be defined when building the compiler, that +will cause the compiler to output vtables into the data segment so that +the vtables maybe written. This is undefined by default, because +normally the vtables should be unwritable. People that implement object +I/O facilities may, or people that want to change the dynamic type of +objects may want to have the vtables writable. Another way of achieving +this would be to make a copy of the vtable into writable memory, but the +drawback there is that that method only changes the type for one object. + +@end table + +@node Typical Behavior, Coding Conventions, Macros, Top +@section Typical Behavior + +@cindex parse errors + +Whenever seemingly normal code fails with errors like +@code{syntax error at `\@{'}, it's highly likely that grokdeclarator is +returning a NULL_TREE for whatever reason. + +@node Coding Conventions, Templates, Typical Behavior, Top +@section Coding Conventions + +It should never be that case that trees are modified in-place by the +back-end, @emph{unless} it is guaranteed that the semantics are the same +no matter how shared the tree structure is. @file{fold-const.c} still +has some cases where this is not true, but rms hypothesizes that this +will never be a problem. + +@node Templates, Access Control, Coding Conventions, Top +@section Templates + +A template is represented by a @code{TEMPLATE_DECL}. The specific +fields used are: + +@table @code +@item DECL_TEMPLATE_RESULT +The generic decl on which instantiations are based. This looks just +like any other decl. + +@item DECL_TEMPLATE_PARMS +The parameters to this template. +@end table + +The generic decl is parsed as much like any other decl as possible, +given the parameterization. The template decl is not built up until the +generic decl has been completed. For template classes, a template decl +is generated for each member function and static data member, as well. + +Template members of template classes are represented by a TEMPLATE_DECL +for the class' parameters around another TEMPLATE_DECL for the member's +parameters. + +All declarations that are instantiations or specializations of templates +refer to their template and parameters through DECL_TEMPLATE_INFO. + +How should I handle parsing member functions with the proper param +decls? Set them up again or try to use the same ones? Currently we do +the former. We can probably do this without any extra machinery in +store_pending_inline, by deducing the parameters from the decl in +do_pending_inlines. PRE_PARSED_TEMPLATE_DECL? + +If a base is a parm, we can't check anything about it. If a base is not +a parm, we need to check it for name binding. Do finish_base_struct if +no bases are parameterized (only if none, including indirect, are +parms). Nah, don't bother trying to do any of this until instantiation +-- we only need to do name binding in advance. + +Always set up method vec and fields, inc. synthesized methods. Really? +We can't know the types of the copy folks, or whether we need a +destructor, or can have a default ctor, until we know our bases and +fields. Otherwise, we can assume and fix ourselves later. Hopefully. + +@node Access Control, Error Reporting, Templates, Top +@section Access Control +The function compute_access returns one of three values: + +@table @code +@item access_public +means that the field can be accessed by the current lexical scope. + +@item access_protected +means that the field cannot be accessed by the current lexical scope +because it is protected. + +@item access_private +means that the field cannot be accessed by the current lexical scope +because it is private. +@end table + +DECL_ACCESS is used for access declarations; alter_access creates a list +of types and accesses for a given decl. + +Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return +codes of compute_access and were used as a cache for compute_access. +Now they are not used at all. + +TREE_PROTECTED and TREE_PRIVATE are used to record the access levels +granted by the containing class. BEWARE: TREE_PUBLIC means something +completely unrelated to access control! + +@node Error Reporting, Parser, Access Control, Top +@section Error Reporting + +The C++ front-end uses a call-back mechanism to allow functions to print +out reasonable strings for types and functions without putting extra +logic in the functions where errors are found. The interface is through +the @code{cp_error} function (or @code{cp_warning}, etc.). The +syntax is exactly like that of @code{error}, except that a few more +conversions are supported: + +@itemize @bullet +@item +%C indicates a value of `enum tree_code'. +@item +%D indicates a *_DECL node. +@item +%E indicates a *_EXPR node. +@item +%L indicates a value of `enum languages'. +@item +%P indicates the name of a parameter (i.e. "this", "1", "2", ...) +@item +%T indicates a *_TYPE node. +@item +%O indicates the name of an operator (MODIFY_EXPR -> "operator ="). + +@end itemize + +There is some overlap between these; for instance, any of the node +options can be used for printing an identifier (though only @code{%D} +tries to decipher function names). + +For a more verbose message (@code{class foo} as opposed to just @code{foo}, +including the return type for functions), use @code{%#c}. +To have the line number on the error message indicate the line of the +DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want, +use @code{%+D}, or it will default to the first. + +@node Parser, Copying Objects, Error Reporting, Top +@section Parser + +Some comments on the parser: + +The @code{after_type_declarator} / @code{notype_declarator} hack is +necessary in order to allow redeclarations of @code{TYPENAME}s, for +instance + +@example +typedef int foo; +class A @{ + char *foo; +@}; +@end example + +In the above, the first @code{foo} is parsed as a @code{notype_declarator}, +and the second as a @code{after_type_declarator}. + +Ambiguities: + +There are currently four reduce/reduce ambiguities in the parser. They are: + +1) Between @code{template_parm} and +@code{named_class_head_sans_basetype}, for the tokens @code{aggr +identifier}. This situation occurs in code looking like + +@example +template <class T> class A @{ @}; +@end example + +It is ambiguous whether @code{class T} should be parsed as the +declaration of a template type parameter named @code{T} or an unnamed +constant parameter of type @code{class T}. Section 14.6, paragraph 3 of +the January '94 working paper states that the first interpretation is +the correct one. This ambiguity results in two reduce/reduce conflicts. + +2) Between @code{primary} and @code{type_id} for code like @samp{int()} +in places where both can be accepted, such as the argument to +@code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies +that these ambiguous constructs will be interpreted as @code{typename}s. +This ambiguity results in six reduce/reduce conflicts between +@samp{absdcl} and @samp{functional_cast}. + +3) Between @code{functional_cast} and +@code{complex_direct_notype_declarator}, for various token strings. +This situation occurs in code looking like + +@example +int (*a); +@end example + +This code is ambiguous; it could be a declaration of the variable +@samp{a} as a pointer to @samp{int}, or it could be a functional cast of +@samp{*a} to @samp{int}. Section 6.8 specifies that the former +interpretation is correct. This ambiguity results in 7 reduce/reduce +conflicts. Another aspect of this ambiguity is code like 'int (x[2]);', +which is resolved at the '[' and accounts for 6 reduce/reduce conflicts +between @samp{direct_notype_declarator} and +@samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r +conflicts between @samp{expr_or_declarator} and @samp{primary} over code +like 'int (a);', which could probably be resolved but would also +probably be more trouble than it's worth. In all, this situation +accounts for 17 conflicts. Ack! + +The second case above is responsible for the failure to parse 'LinppFile +ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave +Math.h++) as an object declaration, and must be fixed so that it does +not resolve until later. + +4) Indirectly between @code{after_type_declarator} and @code{parm}, for +type names. This occurs in (as one example) code like + +@example +typedef int foo, bar; +class A @{ + foo (bar); +@}; +@end example + +What is @code{bar} inside the class definition? We currently interpret +it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an +@code{after_type_declarator}. I believe that xlC is correct, in light +of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that +could possibly be a type name is taken as the @i{decl-specifier-seq} of +a @i{declaration}." However, it seems clear that this rule must be +violated in the case of constructors. This ambiguity accounts for 8 +conflicts. + +Unlike the others, this ambiguity is not recognized by the Working Paper. + +@node Copying Objects, Exception Handling, Parser, Top +@section Copying Objects + +The generated copy assignment operator in g++ does not currently do the +right thing for multiple inheritance involving virtual bases; it just +calls the copy assignment operators for its direct bases. What it +should probably do is: + +1) Split up the copy assignment operator for all classes that have +vbases into "copy my vbases" and "copy everything else" parts. Or do +the trickiness that the constructors do to ensure that vbases don't get +initialized by intermediate bases. + +2) Wander through the class lattice, find all vbases for which no +intermediate base has a user-defined copy assignment operator, and call +their "copy everything else" routines. If not all of my vbases satisfy +this criterion, warn, because this may be surprising behavior. + +3) Call the "copy everything else" routine for my direct bases. + +If we only have one direct base, we can just foist everything off onto +them. + +This issue is currently under discussion in the core reflector +(2/28/94). + +@node Exception Handling, Free Store, Copying Objects, Top +@section Exception Handling + +Note, exception handling in g++ is still under development. + +This section describes the mapping of C++ exceptions in the C++ +front-end, into the back-end exception handling framework. + +The basic mechanism of exception handling in the back-end is +unwind-protect a la elisp. This is a general, robust, and language +independent representation for exceptions. + +The C++ front-end exceptions are mapping into the unwind-protect +semantics by the C++ front-end. The mapping is describe below. + +When -frtti is used, rtti is used to do exception object type checking, +when it isn't used, the encoded name for the type of the object being +thrown is used instead. All code that originates exceptions, even code +that throws exceptions as a side effect, like dynamic casting, and all +code that catches exceptions must be compiled with either -frtti, or +-fno-rtti. It is not possible to mix rtti base exception handling +objects with code that doesn't use rtti. The exceptions to this, are +code that doesn't catch or throw exceptions, catch (...), and code that +just rethrows an exception. + +Currently we use the normal mangling used in building functions names +(int's are "i", const char * is PCc) to build the non-rtti base type +descriptors for exception handling. These descriptors are just plain +NULL terminated strings, and internally they are passed around as char +*. + +In C++, all cleanups should be protected by exception regions. The +region starts just after the reason why the cleanup is created has +ended. For example, with an automatic variable, that has a constructor, +it would be right after the constructor is run. The region ends just +before the finalization is expanded. Since the backend may expand the +cleanup multiple times along different paths, once for normal end of the +region, once for non-local gotos, once for returns, etc, the backend +must take special care to protect the finalization expansion, if the +expansion is for any other reason than normal region end, and it is +`inline' (it is inside the exception region). The backend can either +choose to move them out of line, or it can created an exception region +over the finalization to protect it, and in the handler associated with +it, it would not run the finalization as it otherwise would have, but +rather just rethrow to the outer handler, careful to skip the normal +handler for the original region. + +In Ada, they will use the more runtime intensive approach of having +fewer regions, but at the cost of additional work at run time, to keep a +list of things that need cleanups. When a variable has finished +construction, they add the cleanup to the list, when the come to the end +of the lifetime of the variable, the run the list down. If the take a +hit before the section finishes normally, they examine the list for +actions to perform. I hope they add this logic into the back-end, as it +would be nice to get that alternative approach in C++. + +On an rs6000, xlC stores exception objects on that stack, under the try +block. When is unwinds down into a handler, the frame pointer is +adjusted back to the normal value for the frame in which the handler +resides, and the stack pointer is left unchanged from the time at which +the object was thrown. This is so that there is always someplace for +the exception object, and nothing can overwrite it, once we start +throwing. The only bad part, is that the stack remains large. + +The below points out some things that work in g++'s exception handling. + +All completely constructed temps and local variables are cleaned up in +all unwinded scopes. Completely constructed parts of partially +constructed objects are cleaned up. This includes partially built +arrays. Exception specifications are now handled. Thrown objects are +now cleaned up all the time. We can now tell if we have an active +exception being thrown or not (__eh_type != 0). We use this to call +terminate if someone does a throw; without there being an active +exception object. uncaught_exception () works. Exception handling +should work right if you optimize. Exception handling should work with +-fpic or -fPIC. + +The below points out some flaws in g++'s exception handling, as it now +stands. + +Only exact type matching or reference matching of throw types works when +-fno-rtti is used. Only works on a SPARC (like Suns) (both -mflat and +-mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000, +PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9 may not +work. HPPA is mostly done, but throwing between a shared library and +user code doesn't yet work. Some targets have support for data-driven +unwinding. Partial support is in for all other machines, but a stack +unwinder called __unwind_function has to be written, and added to +libgcc2 for them. The new EH code doesn't rely upon the +__unwind_function for C++ code, instead it creates per function +unwinders right inside the function, unfortunately, on many platforms +the definition of RETURN_ADDR_RTX in the tm.h file for the machine port +is wrong. See below for details on __unwind_function. RTL_EXPRs for EH +cond variables for && and || exprs should probably be wrapped in +UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved. + +We only do pointer conversions on exception matching a la 15.3 p2 case +3: `A handler with type T, const T, T&, or const T& is a match for a +throw-expression with an object of type E if [3]T is a pointer type and +E is a pointer type that can be converted to T by a standard pointer +conversion (_conv.ptr_) not involving conversions to pointers to private +or protected base classes.' when -frtti is given. + +We don't call delete on new expressions that die because the ctor threw +an exception. See except/18 for a test case. + +15.2 para 13: The exception being handled should be rethrown if control +reaches the end of a handler of the function-try-block of a constructor +or destructor, right now, it is not. + +15.2 para 12: If a return statement appears in a handler of +function-try-block of a constructor, the program is ill-formed, but this +isn't diagnosed. + +15.2 para 11: If the handlers of a function-try-block contain a jump +into the body of a constructor or destructor, the program is ill-formed, +but this isn't diagnosed. + +15.2 para 9: Check that the fully constructed base classes and members +of an object are destroyed before entering the handler of a +function-try-block of a constructor or destructor for that object. + +build_exception_variant should sort the incoming list, so that it +implements set compares, not exact list equality. Type smashing should +smash exception specifications using set union. + +Thrown objects are usually allocated on the heap, in the usual way. If +one runs out of heap space, throwing an object will probably never work. +This could be relaxed some by passing an __in_chrg parameter to track +who has control over the exception object. Thrown objects are not +allocated on the heap when they are pointer to object types. We should +extend it so that all small (<4*sizeof(void*)) objects are stored +directly, instead of allocated on the heap. + +When the backend returns a value, it can create new exception regions +that need protecting. The new region should rethrow the object in +context of the last associated cleanup that ran to completion. + +The structure of the code that is generated for C++ exception handling +code is shown below: + +@example +Ln: throw value; + copy value onto heap + jump throw (Ln, id, address of copy of value on heap) + + try @{ ++Lstart: the start of the main EH region +|... ... ++Lend: the end of the main EH region + @} catch (T o) @{ + ...1 + @} +Lresume: + nop used to make sure there is something before + the next region ends, if there is one +... ... + + jump Ldone +[ +Lmainhandler: handler for the region Lstart-Lend + cleanup +] zero or more, depending upon automatic vars with dtors ++Lpartial: +| jump Lover ++Lhere: + rethrow (Lhere, same id, same obj); +Lterm: handler for the region Lpartial-Lhere + call terminate +Lover: +[ + [ + call throw_type_match + if (eq) @{ + ] these lines disappear when there is no catch condition ++Lsregion2: +| ...1 +| jump Lresume +|Lhandler: handler for the region Lsregion2-Leregion2 +| rethrow (Lresume, same id, same obj); ++Leregion2 + @} +] there are zero or more of these sections, depending upon how many + catch clauses there are +----------------------------- expand_end_all_catch -------------------------- + here we have fallen off the end of all catch + clauses, so we rethrow to outer + rethrow (Lresume, same id, same obj); +----------------------------- expand_end_all_catch -------------------------- +[ +L1: maybe throw routine +] depending upon if we have expanded it or not +Ldone: + ret + +start_all_catch emits labels: Lresume, + +@end example + +The __unwind_function takes a pointer to the throw handler, and is +expected to pop the stack frame that was built to call it, as well as +the frame underneath and then jump to the throw handler. It must +restore all registers to their proper values as well as all other +machine state as determined by the context in which we are unwinding +into. The way I normally start is to compile: + + void *g; + foo(void* a) @{ g = a; @} + +with -S, and change the thing that alters the PC (return, or ret +usually) to not alter the PC, making sure to leave all other semantics +(like adjusting the stack pointer, or frame pointers) in. After that, +replicate the prologue once more at the end, again, changing the PC +altering instructions, and finally, at the very end, jump to `g'. + +It takes about a week to write this routine, if someone wants to +volunteer to write this routine for any architecture, exception support +for that architecture will be added to g++. Please send in those code +donations. One other thing that needs to be done, is to double check +that __builtin_return_address (0) works. + +@subsection Specific Targets + +For the alpha, the __unwind_function will be something resembling: + +@example +void +__unwind_function(void *ptr) +@{ + /* First frame */ + asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */ + asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ + + /* Second frame */ + asm ("ldq $15, 8($30)"); /* fp */ + asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ + + /* Return */ + asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */ +@} +@end example + +@noindent +However, there are a few problems preventing it from working. First of +all, the gcc-internal function @code{__builtin_return_address} needs to +work given an argument of 0 for the alpha. As it stands as of August +30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c} +will definitely not work on the alpha. Instead, we need to define +the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe), +@code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new +definition for @code{RETURN_ADDR_RTX}. + +In addition (and more importantly), we need a way to reliably find the +frame pointer on the alpha. The use of the value 8 above to restore the +frame pointer (register 15) is incorrect. On many systems, the frame +pointer is consistently offset to a specific point on the stack. On the +alpha, however, the frame pointer is pushed last. First the return +address is stored, then any other registers are saved (e.g., @code{s0}), +and finally the frame pointer is put in place. So @code{fp} could have +an offset of 8, but if the calling function saved any registers at all, +they add to the offset. + +The only places the frame size is noted are with the @samp{.frame} +directive, for use by the debugger and the OSF exception handling model +(useless to us), and in the initial computation of the new value for +@code{sp}, the stack pointer. For example, the function may start with: + +@example +lda $30,-32($30) +.frame $15,32,$26,0 +@end example + +@noindent +The 32 above is exactly the value we need. With this, we can be sure +that the frame pointer is stored 8 bytes less---in this case, at 24(sp)). +The drawback is that there is no way that I (Brendan) have found to let +us discover the size of a previous frame @emph{inside} the definition +of @code{__unwind_function}. + +So to accomplish exception handling support on the alpha, we need two +things: first, a way to figure out where the frame pointer was stored, +and second, a functional @code{__builtin_return_address} implementation +for except.c to be able to use it. + +Or just support DWARF 2 unwind info. + +@subsection New Backend Exception Support + +This subsection discusses various aspects of the design of the +data-driven model being implemented for the exception handling backend. + +The goal is to generate enough data during the compilation of user code, +such that we can dynamically unwind through functions at run time with a +single routine (@code{__throw}) that lives in libgcc.a, built by the +compiler, and dispatch into associated exception handlers. + +This information is generated by the DWARF 2 debugging backend, and +includes all of the information __throw needs to unwind an arbitrary +frame. It specifies where all of the saved registers and the return +address can be found at any point in the function. + +Major disadvantages when enabling exceptions are: + +@itemize @bullet +@item +Code that uses caller saved registers, can't, when flow can be +transferred into that code from an exception handler. In high performance +code this should not usually be true, so the effects should be minimal. + +@end itemize + +@subsection Backend Exception Support + +The backend must be extended to fully support exceptions. Right now +there are a few hooks into the alpha exception handling backend that +resides in the C++ frontend from that backend that allows exception +handling to work in g++. An exception region is a segment of generated +code that has a handler associated with it. The exception regions are +denoted in the generated code as address ranges denoted by a starting PC +value and an ending PC value of the region. Some of the limitations +with this scheme are: + +@itemize @bullet +@item +The backend replicates insns for such things as loop unrolling and +function inlining. Right now, there are no hooks into the frontend's +exception handling backend to handle the replication of insns. When +replication happens, a new exception region descriptor needs to be +generated for the new region. + +@item +The backend expects to be able to rearrange code, for things like jump +optimization. Any rearranging of the code needs have exception region +descriptors updated appropriately. + +@item +The backend can eliminate dead code. Any associated exception region +descriptor that refers to fully contained code that has been eliminated +should also be removed, although not doing this is harmless in terms of +semantics. + +@end itemize + +The above is not meant to be exhaustive, but does include all things I +have thought of so far. I am sure other limitations exist. + +Below are some notes on the migration of the exception handling code +backend from the C++ frontend to the backend. + +NOTEs are to be used to denote the start of an exception region, and the +end of the region. I presume that the interface used to generate these +notes in the backend would be two functions, start_exception_region and +end_exception_region (or something like that). The frontends are +required to call them in pairs. When marking the end of a region, an +argument can be passed to indicate the handler for the marked region. +This can be passed in many ways, currently a tree is used. Another +possibility would be insns for the handler, or a label that denotes a +handler. I have a feeling insns might be the best way to pass it. +Semantics are, if an exception is thrown inside the region, control is +transferred unconditionally to the handler. If control passes through +the handler, then the backend is to rethrow the exception, in the +context of the end of the original region. The handler is protected by +the conventional mechanisms; it is the frontend's responsibility to +protect the handler, if special semantics are required. + +This is a very low level view, and it would be nice is the backend +supported a somewhat higher level view in addition to this view. This +higher level could include source line number, name of the source file, +name of the language that threw the exception and possibly the name of +the exception. Kenner may want to rope you into doing more than just +the basics required by C++. You will have to resolve this. He may want +you to do support for non-local gotos, first scan for exception handler, +if none is found, allow the debugger to be entered, without any cleanups +being done. To do this, the backend would have to know the difference +between a cleanup-rethrower, and a real handler, if would also have to +have a way to know if a handler `matches' a thrown exception, and this +is frontend specific. + +The stack unwinder is one of the hardest parts to do. It is highly +machine dependent. The form that kenner seems to like was a couple of +macros, that would do the machine dependent grunt work. One preexisting +function that might be of some use is __builtin_return_address (). One +macro he seemed to want was __builtin_return_address, and the other +would do the hard work of fixing up the registers, adjusting the stack +pointer, frame pointer, arg pointer and so on. + + +@node Free Store, Mangling, Exception Handling, Top +@section Free Store + +@code{operator new []} adds a magic cookie to the beginning of arrays +for which the number of elements will be needed by @code{operator delete +[]}. These are arrays of objects with destructors and arrays of objects +that define @code{operator delete []} with the optional size_t argument. +This cookie can be examined from a program as follows: + +@example +typedef unsigned long size_t; +extern "C" int printf (const char *, ...); + +size_t nelts (void *p) +@{ + struct cookie @{ + size_t nelts __attribute__ ((aligned (sizeof (double)))); + @}; + + cookie *cp = (cookie *)p; + --cp; + + return cp->nelts; +@} + +struct A @{ + ~A() @{ @} +@}; + +main() +@{ + A *ap = new A[3]; + printf ("%ld\n", nelts (ap)); +@} +@end example + +@section Linkage +The linkage code in g++ is horribly twisted in order to meet two design goals: + +1) Avoid unnecessary emission of inlines and vtables. + +2) Support pedantic assemblers like the one in AIX. + +To meet the first goal, we defer emission of inlines and vtables until +the end of the translation unit, where we can decide whether or not they +are needed, and how to emit them if they are. + +@node Mangling, Concept Index, Free Store, Top +@section Function name mangling for C++ and Java + +Both C++ and Jave provide overloaded function and methods, +which are methods with the same types but different parameter lists. +Selecting the correct version is done at compile time. +Though the overloaded functions have the same name in the source code, +they need to be translated into different assembler-level names, +since typical assemblers and linkers cannot handle overloading. +This process of encoding the parameter types with the method name +into a unique name is called @dfn{name mangling}. The inverse +process is called @dfn{demangling}. + +It is convenient that C++ and Java use compatible mangling schemes, +since the makes life easier for tools such as gdb, and it eases +integration between C++ and Java. + +Note there is also a standard "Jave Native Interface" (JNI) which +implements a different calling convention, and uses a different +mangling scheme. The JNI is a rather abstract ABI so Java can call methods +written in C or C++; +we are concerned here about a lower-level interface primarily +intended for methods written in Java, but that can also be used for C++ +(and less easily C). + +Note that on systems that follow BSD tradition, a C identifier @code{var} +would get "mangled" into the assembler name @samp{_var}. On such +systems, all other mangled names are also prefixed by a @samp{_} +which is not shown in the following examples. + +@subsection Method name mangling + +C++ mangles a method by emitting the function name, followed by @code{__}, +followed by encodings of any method qualifiers (such as @code{const}), +followed by the mangling of the method's class, +followed by the mangling of the parameters, in order. + +For example @code{Foo::bar(int, long) const} is mangled +as @samp{bar__C3Fooil}. + +For a constructor, the method name is left out. +That is @code{Foo::Foo(int, long) const} is mangled +as @samp{__C3Fooil}. + +GNU Java does the same. + +@subsection Primitive types + +The C++ types @code{int}, @code{long}, @code{short}, @code{char}, +and @code{long long} are mangled as @samp{i}, @samp{l}, +@samp{s}, @samp{c}, and @samp{x}, respectively. +The corresponding unsigned types have @samp{U} prefixed +to the mangling. The type @code{signed char} is mangled @samp{Sc}. + +The C++ and Java floating-point types @code{float} and @code{double} +are mangled as @samp{f} and @samp{d} respectively. + +The C++ @code{bool} type and the Java @code{boolean} type are +mangled as @samp{b}. + +The C++ @code{wchar_t} and the Java @code{char} types are +mangled as @samp{w}. + +The Java integral types @code{byte}, @code{short}, @code{int} +and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i}, +and @samp{x}, respectively. + +C++ code that has included @code{javatypes.h} will mangle +the typedefs @code{jbyte}, @code{jshort}, @code{jint} +and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i}, +and @samp{x}. (This has not been implemented yet.) + +@subsection Mangling of simple names + +A simple class, package, template, or namespace name is +encoded as the number of characters in the name, followed by +the actual characters. Thus the class @code{Foo} +is encoded as @samp{3Foo}. + +If any of the characters in the name are not alphanumeric +(i.e not one of the standard ASCII letters, digits, or '_'), +or the initial character is a digit, then the name is +mangled as a sequence of encoded Unicode letters. +A Unicode encoding starts with a @samp{U} to indicate +that Unicode escapes are used, followed by the number of +bytes used by the Unicode encoding, followed by the bytes +representing the encoding. ASSCI letters and +non-initial digits are encoded without change. However, all +other characters (including underscore and initial digits) are +translated into a sequence starting with an underscore, +followed by the big-endian 4-hex-digit lower-case encoding of the character. + +If a method name contains Unicode-escaped characters, the +entire mangled method name is followed by a @samp{U}. + +For example, the method @code{X\u0319::M\u002B(int)} is encoded as +@samp{M_002b__U6X_0319iU}. + + +@subsection Pointer and reference types + +A C++ pointer type is mangled as @samp{P} followed by the +mangling of the type pointed to. + +A C++ reference type as mangled as @samp{R} followed by the +mangling of the type referenced. + +A Java object reference type is equivalent +to a C++ pointer parameter, so we mangle such an parameter type +as @samp{P} followed by the mangling of the class name. + +@subsection Squangled type compression + +Squangling (enabled with the @samp{-fsquangle} option), utilizes the +@samp{B} code to indicate reuse of a previously seen type within an +indentifier. Types are recognized in a left to right manner and given +increasing values, which are appended to the code in the standard +manner. Ie, multiple digit numbers are delimited by @samp{_} +characters. A type is considered to be any non primitive type, +regardless of whether its a parameter, template parameter, or entire +template. Certain codes are considered modifiers of a type, and are not +included as part of the type. These are the @samp{C}, @samp{V}, +@samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting +constant, volatile, pointer, array, reference, unsigned, and restrict. +These codes may precede a @samp{B} type in order to make the required +modifications to the type. + +For example: +@example +template <class T> class class1 @{ @}; + +template <class T> class class2 @{ @}; + +class class3 @{ @}; + +int f(class2<class1<class3> > a ,int b, const class1<class3>&c, class3 *d) @{ @} + + B0 -> class2<class1<class3> + B1 -> class1<class3> + B2 -> class3 +@end example +Produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}. +The int parameter is a basic type, and does not receive a B encoding... + +@subsection Qualified names + +Both C++ and Java allow a class to be lexically nested inside another +class. C++ also supports namespaces (not yet implemented by G++). +Java also supports packages. + +These are all mangled the same way: First the letter @samp{Q} +indicates that we are emitting a qualified name. +That is followed by the number of parts in the qualified name. +If that number is 9 or less, it is emitted with no delimiters. +Otherwise, an underscore is written before and after the count. +Then follows each part of the qualified name, as described above. + +For example @code{Foo::\u0319::Bar} is encoded as +@samp{Q33FooU5_03193Bar}. + +Squangling utilizes the the letter @samp{K} to indicate a +remembered portion of a qualified name. As qualified names are processed +for an identifier, the names are numbered and remembered in a +manner similar to the @samp{B} type compression code. +Names are recognized left to right, and given increasing values, which are +appended to the code in the standard manner. ie, multiple digit numbers +are delimited by @samp{_} characters. + +For example +@example +class Andrew +@{ + class WasHere + @{ + class AndHereToo + @{ + @}; + @}; +@}; + +f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @} + + K0 -> Andrew + K1 -> Andrew::WasHere + K2 -> Andrew::WasHere::AndHereToo +@end example +Function @samp{f()} would be mangled as : +@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo} + +There are some occasions when either a @samp{B} or @samp{K} code could +be chosen, preference is always given to the @samp{B} code. Ie, the example +in the section on @samp{B} mangling could have used a @samp{K} code +instead of @samp{B2}. + +@subsection Templates + +A class template instantiation is encoded as the letter @samp{t}, +followed by the encoding of the template name, followed +the number of template parameters, followed by encoding of the template +parameters. If a template parameter is a type, it is written +as a @samp{Z} followed by the encoding of the type. + +A function template specialization (either an instantiation or an +explicit specialization) is encoded by an @samp{H} followed by the +encoding of the template parameters, as described above, followed by an +@samp{_}, the encoding of the argument types to the template function +(not the specialization), another @samp{_}, and the return type. (Like +the argument types, the return type is the return type of the function +template, not the specialization.) Template parameters in the argument +and return types are encoded by an @samp{X} for type parameters, or a +@samp{Y} for constant parameters, an index indicating their position +in the template parameter list declaration, and their template depth. + +@subsection Arrays + +C++ array types are mangled by emitting @samp{A}, followed by +the length of the array, followed by an @samp{_}, followed by +the mangling of the element type. Of course, normally +array parameter types decay into a pointer types, so you +don't see this. + +Java arrays are objects. A Java type @code{T[]} is mangled +as if it were the C++ type @code{JArray<T>}. +For example @code{java.lang.String[]} is encoded as +@samp{Pt6JArray1ZPQ34java4lang6String}. + +@subsection Static fields + +Both C++ and Java classes can have static fields. +These are allocated statically, and are shared among all instances. + +The mangling starts with a prefix (@samp{_} in most systems), which is +followed by the mangling +of the class name, followed by the "joiner" and finally the field name. +The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special +separator character. For historical reasons (and idiosyncracies +of assembler syntax) it can @samp{$} or @samp{.} (or even +@samp{_} on a few systems). If the joiner is @samp{_} then the prefix +is @samp{__static_} instead of just @samp{_}. + +For example @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax) +would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var} +(or rarely @samp{__static_Q23Foo3Bar_var}). + +If the name of a static variable needs Unicode escapes, +the Unicode indicator @samp{U} comes before the "joiner". +This @code{\u1234Foo::var\u3445} becomes @code{_U8_1234FooU.var_3445}. + +@subsection Table of demangling code characters + +The following special characters are used in mangling: + +@table @samp +@item A +Indicates a C++ array type. + +@item b +Encodes the C++ @code{bool} type, +and the Java @code{boolean} type. + +@item B +Used for squangling. Similar in concept to the 'T' non-squangled code. + +@item c +Encodes the C++ @code{char} type, and the Java @code{byte} type. + +@item C +A modifier to indicate a @code{const} type. +Also used to indicate a @code{const} member function +(in which cases it precedes the encoding of the method's class). + +@item d +Encodes the C++ and Java @code{double} types. + +@item e +Indicates extra unknown arguments @code{...}. + +@item E +Indicates the opening parenthesis of an expression. + +@item f +Encodes the C++ and Java @code{float} types. + +@item F +Used to indicate a function type. + +@item H +Used to indicate a template function. + +@item i +Encodes the C++ and Java @code{int} types. + +@item J +Indicates a complex type. + +@item K +Used by squangling to compress qualified names. + +@item l +Encodes the C++ @code{long} type. + +@item n +Immediate repeated type. Followed by the repeat count. + +@item N +Repeated type. Followed by the repeat count of the repeated type, +followed by the type index of the repeated type. Due to a bug in +g++ 2.7.2, this is only generated if index is 0. Superceded by +@samp{n} when squangling. + +@item P +Indicates a pointer type. Followed by the type pointed to. + +@item Q +Used to mangle qualified names, which arise from nested classes. +Also used for namespaces. +In Java used to mangle package-qualified names, and inner classes. + +@item r +Encodes the GNU C++ @code{long double} type. + +@item R +Indicates a reference type. Followed by the referenced type. + +@item s +Encodes the C++ and java @code{short} types. + +@item S +A modifier that indicates that the following integer type is signed. +Only used with @code{char}. + +Also used as a modifier to indicate a static member function. + +@item t +Indicates a template instantiation. + +@item T +A back reference to a previously seen type. + +@item U +A modifier that indicates that the following integer type is unsigned. +Also used to indicate that the following class or namespace name +is encoded using Unicode-mangling. + +@item u +The @code{restrict} type qualifier. + +@item v +Encodes the C++ and Java @code{void} types. + +@item V +A modifier for a @code{volatile} type or method. + +@item w +Encodes the C++ @code{wchar_t} type, and the Java @code{char} types. + +@item W +Indicates the closing parenthesis of an expression. + +@item x +Encodes the GNU C++ @code{long long} type, and the Java @code{long} type. + +@item X +Encodes a template type parameter, when part of a function type. + +@item Y +Encodes a template constant parameter, when part of a function type. + +@item Z +Used for template type parameters. + +@end table + +The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p} +also seem to be used for obscure purposes ... + +@node Concept Index, , Mangling, Top + +@section Concept Index + +@printindex cp + +@bye |