Paul Khuong [Tue, 21 May 2013 19:11:54 +0000 (15:11 -0400)]
Additional niceties and middle end support for short vector SIMD packs
* Allow FASL loading/dumping of (boxed) SIMD packs, and mark them as
trivially (i.e. without going through make-load-form) dumpable.
* SIMD packs print nicely, and take the element type into account while
doing so.
* (C)TYPE-OF is more accurate for SIMD packs; this enables IR2 conversion
to choose the right primitive type and storage class for constants.
The FASL code was kept on life support by Alexander Gavrilov for too many years,
and the printing logic is a very light adaptation of the output code he developed
for his branch.
Paul Khuong [Tue, 21 May 2013 19:11:26 +0000 (15:11 -0400)]
Back end work for short vector SIMD packs
* Platform-agnostic changes:
- Declare type testing/checking routines.
- Define three primitive types: simd-pack-double for packs
of doubles, simd-pack-single for packs of singles, and
simd-pack-int for packs of integer/unknown.
- Define a heap-representation for 128-bit SIMD packs,
along with reserving a widetag and filling the corresponding
entries in gencgc's tables.
- Make the simd-pack class definition fully concrete.
- Teach IR1 how to expand SIMD-PACK type checks.
- IR2-conversion maps SIMD-PACK types to the right primitive type.
- Increase the limit on the number of storage classes: SIMD packs
went way past the previous (arbitrary?) limit of 40.
* Platform-specific changes, in src/compiler/target/simd-pack:
- Create new storage classes (that are backed by the float-reg [i.e. SSE]
storage base): one for each of double, single and integer SSE packs.
- Also create the corresponding immediate-constant and stack storage
classes.
- Teach the assembler and the inline constant code about this new kind
of registers/constants, and how to map constant SIMD-PACKs to which SC.
- Define movement/conversion VOPs for SSE packs, along with VOP routines
needed for basic creation/manipulation of SSE packs.
- The type-checking VOP in generic/late-type-vops is extremely
x86-64-specific... IIRC, there are ordering issues I do not
want to tangle with.
* Implementation idiosyncrasy: while type *tests* (i.e. TYPEP calls) consider
the element type, type *checks* (e.g. THE or DECLARE) only check for
SIMD-PACKness, without looking at the element type. This is allowed by the
standard, is similar to what Python does for FUNCTION types, and helps
code remain efficient even when type checks can't be fully elided.
The vast majority of the code is verbatim or heavily inspired by Alexander
Gavrilov's branch.
Paul Khuong [Tue, 21 May 2013 19:10:50 +0000 (15:10 -0400)]
Front end infrastructure for short vector SIMD packs
* new feature, sb-simd-pack.
* define a new IR1 type for SIMD packs:
- (SB!KERNEL:SIMD-PACK [eltype]), where [eltype] is a subtype
of the platform-specific SIMD element type universe, or * (default),
the union of all these possibilities;
- Element types are always upgraded to the platform's element type
(small) universe, so we can easily manipulate unions of SIMD-PACK
types by working in terms of the element types.
* immediately specify the universe of SIMD pack element types
(sb!kernel:*simd-pack-element-types*) for x86-64, to ensure
#!+sb-simd-pack buildability.
* declare basic functions to create/manipulate SIMD packs:
- simd-pack-p is the basic type predicate;
- %simd-pack-tag returns a fixnum tag associated with each SIMD-PACK;
currently, we suppose it only encodes the element type, as the
position of the element type in *simd-pack-element-types*;
- %make-simd-pack creates a 128-bit SIMD pack from a tag and two
64 bit integers;
- %make-simd-pack-double creates an appropriately-tagged pack from
two double floats;
- %make-simd-pack-single creates a tagged pack from four single
floats;
- %make-simd-pack-ub{32,64} creates a tagged pack from four 32 bit
or two 64 bit integers;
- %simd-pack-{low,high} returns the low/high integer half of a
128 bit pack;
- %simd-pack-ub{32,64}s returns the four integer quarters or two
integer halves of a 128 bit pack;
- %simd-pack-singles returns the four singles in a 128 bit pack;
- %simd-pack-doubles returns the two doubles in a 128 bit pack.
Alexander Gavrilov kept a branch alive for the last couple years. The
creation/manipulation primitives are largely taken from that branch,
or informed by the branch's usage.
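As a rough illustration of the layout these primitives imply, the following portable Common Lisp sketch models a 128-bit pack as two 64-bit halves. It does not call the SBCL primitives: the MODEL-* names are hypothetical, and the element order (first element in the lowest bits) is an assumption.

```lisp
;; Portable model only: MODEL-* names are hypothetical, not SBCL functions.
(defun model-pack-from-ub32s (a b c d)
  "Return the low and high 64-bit halves of a pack built from four 32-bit lanes.
Assumes lane A occupies the lowest 32 bits of the low half."
  (values (logior (ldb (byte 32 0) a) (ash (ldb (byte 32 0) b) 32))
          (logior (ldb (byte 32 0) c) (ash (ldb (byte 32 0) d) 32))))

(defun model-pack-ub32s (low high)
  "Split two 64-bit halves back into four 32-bit lanes."
  (values (ldb (byte 32 0) low) (ldb (byte 32 32) low)
          (ldb (byte 32 0) high) (ldb (byte 32 32) high)))
```

Under these assumptions, building a pack from lanes and splitting it again round-trips the four values.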
Stas Boukarev [Tue, 21 May 2013 11:05:19 +0000 (15:05 +0400)]
Fix foreign-symbol-address transform on +sb-dynamic-core.
A badly placed ` produced a wrong result.
Paul Khuong [Tue, 21 May 2013 00:02:04 +0000 (20:02 -0400)]
Make some instances of IF/IF conversion more direct
When faced with CFGs that look like (if (if ...) ...), we duplicate
the outer NULL test forward in the branches (and jump to the correct
branch, so very little code is duplicated). However, this transform
depends on later ir1 optimisation to handle patterns like
(if (if ... nil t) ...). Try and get them right with a specialised
rewrite to get good code even when ir1opt doesn't run until fixpoint.
Also, refactored the code a bit while working on it.
Paul Khuong [Mon, 20 May 2013 22:14:43 +0000 (18:14 -0400)]
Exploit specialised VOPs for EQL of anything/constant fixnum
By swapping constant arguments to the right ourselves before
strength reducing EQL into EQ, rather than erroneously using
commutative-arg-swap.
Spotted by Douglas Katzman.
Paul Khuong [Mon, 20 May 2013 21:38:19 +0000 (17:38 -0400)]
More efficient integer=>word conversion and fixnump tests on x86-64
* Special-case on 63-bit fixnums to detect non-zero fixnum tag bits
with a shift right when converting fixnum-or-bignum to ub64.
* In fixnump/unsigned-byte-64, use MOVE to avoid useless mov x, x.
* In fixnump/signed-byte-64, use the conversion's left shift to
detect overflows.
* Based on a patch by Douglas Katzman.
Paul Khuong [Mon, 20 May 2013 20:58:30 +0000 (16:58 -0400)]
Cleverer handling of medium (32 < bit width <= 64) constants on x86-64
* Exploit sign-extension for large unsigned constants.
* Always force the remaining operand and the result in a register:
in the worst case, we use a RIP-relative unboxed constant.
* Based on a patch by Douglas Katzman.
Paul Khuong [Mon, 20 May 2013 19:26:44 +0000 (15:26 -0400)]
POPCNT instruction on x86-64
Patch by Douglas Katzman.
Paul Khuong [Mon, 20 May 2013 19:17:36 +0000 (15:17 -0400)]
Fix disassembly for BT* instructions on x86oids
* A dedicated instruction format gets the details right.
* Patch by Douglas Katzman.
Paul Khuong [Mon, 20 May 2013 19:02:45 +0000 (15:02 -0400)]
Annotate disassembly with unboxed constant values
* Only on x86-64, for qword-sized values.
* Patch by Douglas Katzman.
Paul Khuong [Mon, 20 May 2013 18:49:33 +0000 (14:49 -0400)]
Improved local call analysis for inlined higher-order functions
Locall analysis greatly benefits from forwarding function arguments to
their use site. Do that in locall and hopefully trigger further rewrites,
rather than waiting for a separate ir1opt phase to do its magic.
Paul Khuong [Mon, 20 May 2013 18:11:48 +0000 (14:11 -0400)]
Constant-fold backquote of constant expressions
* There is no guarantee that backquote expressions cons up fresh
storage, so we are free to allocate (sub)lists or vectors at
compile-time. In addition to regular constant-folding, perform
part of LIST/LIST*/APPEND at compile-time.
* Fix one instance of CL:SORT of now-literal data.
* Implement SB!IMPL:PROPER-LIST-P because BACKQ-APPEND needed that.
* Based on a patch by James Y Knight; closes lp#1026439.
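One practical consequence for user code: since a backquoted form with constant contents may be allocated at compile time, its result should be treated like a literal and not modified destructively. A minimal sketch, assuming a hypothetical function SORTED-DEFAULTS:

```lisp
;; SORTED-DEFAULTS is a hypothetical example function.
(defun sorted-defaults ()
  "Return a freshly sorted list without mutating the backquoted template."
  ;; SORT is destructive, so copy the (possibly literal) list first.
  (sort (copy-list `(3 1 2)) #'<))
```

Copying first keeps repeated calls independent of each other, whether or not the compiler folded the template into a constant.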
Paul Khuong [Mon, 20 May 2013 16:19:27 +0000 (12:19 -0400)]
Enable (type-directed) constant folding for LOGTEST on x86oids and PPC
* COMBINATION-IMPLEMENTATION-STYLE can return :maybe. Like :default,
it enables transforms, but transforms can call C-I-S themselves to
selectively disable rewrites.
* Implement type-directed constant folding for LOGTEST. !x86oids/PPC
platforms get that for free via inlining.
* Use :maybe to enable all LOGTEST transforms except inlining.
Paul Khuong [Mon, 20 May 2013 15:36:21 +0000 (11:36 -0400)]
Exploit associativity to fold more constants
* Implement transforms for logand, logior, logxor and logtest to
detect patterns like (f (f x k1) k2) => (f x (f k1 k2)).
* Same for + and * of rational values.
* Similar logic for mask-signed-field: we only need to keep the
narrowest width.
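The rewrite pattern above can be sketched on plain s-expressions. This is a toy source-level model, not the IR1 transform itself; FOLD-ASSOCIATIVE is a hypothetical name, and the sketch is limited to the bitwise operators, where the arguments are necessarily integers.

```lisp
;; Toy model of (f (f x k1) k2) => (f x (f k1 k2)); not SBCL's transform.
(defun fold-associative (form)
  "Fold two literal integer operands of nested LOGAND/LOGIOR/LOGXOR calls."
  (if (and (consp form)
           (member (first form) '(logand logior logxor))
           (= (length form) 3)
           (integerp (third form))
           (consp (second form))
           (eq (first (second form)) (first form))
           (= (length (second form)) 3)
           (integerp (third (second form))))
      (let ((op (first form))
            (x (second (second form)))
            (k1 (third (second form)))
            (k2 (third form)))
        ;; Combine the two constants now; keep the variable part as is.
        (list op x (funcall op k1 k2)))
      form))
```

For instance, (fold-associative '(logand (logand x #xff) #x0f)) yields (LOGAND X 15); forms that do not match the pattern are returned unchanged.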
Alastair Bridgewater [Mon, 20 May 2013 19:43:19 +0000 (15:43 -0400)]
room: Fix reconstituting CONS cells with unbound-marker in the CAR.
* When I originally rewrote ROOM in terms of RECONSTITUTE-OBJECT,
I looked at what constitutes a valid CONS according to the runtime.
I noticed that one of the immediate types was an unbound marker
and said to myself "nobody's going to put one of those in a list".
This turned out to be a mistake.
* x86 systems (and plausibly not any others) put unbound-markers
in lists when loading FASLs. I have no real idea how or why, but
they do. This would lead to an error, "Unrecognized widetag #x4A
in reconstitute-object".
* Fix, by recording unbound-marker-widetag as being valid as the
first word of a CONS cell.
* Issue reported by "scymtym" on #sbcl.
Alastair Bridgewater [Sun, 19 May 2013 17:00:52 +0000 (13:00 -0400)]
gencgc: Decide earlier about pinning large object pages.
* The old logic here called maybe_adjust_large_object(), and
then re-checked the pointer to preserve for validity. This is
non-optimal, as it means that maybe_adjust_large_object can't
promote pages to newspace directly, it instead merely adjusts the
page allocation to fit the possibly-shrunken object.
* It turns out that large_object pages can contain bignums,
vectors, code-objects, or in unusual cases instances. Neither
bignums, vectors, nor instances can contain embedded objects.
Code-objects can contain only functions or LRAs. None of these
objects have list-pointer-lowtag on their references. The "tail"
of a shrunken object is comprised of conses with both cells as
fixnum zero. The minor catch is that we allow untagged pointers
to pin code-allocated pages, but the saving grace here is that
code-objects don't shrink.
* Alter preserve_pointer() to test the lowtag and page type to
check for invalid pointers to large-object pages before calling
maybe_adjust_large_object() instead of bounds-checking the pointer
after the fact.
Alastair Bridgewater [Tue, 14 May 2013 22:45:30 +0000 (18:45 -0400)]
gencgc: Fix potential out-of-bounds access in page_ends_contiguous_block_p().
* If we're testing to see if the LAST page in dynamic space is
the end of a contiguous block, and it is a full page (bytes_used
is GENCGC_CARD_BYTES), we turn around and start investigating the
next page table entry... but there isn't one, it's beyond the end
of the allocation.
* Fix, by bounds-testing the page index against the index of the
high-water mark for dynamic space. This is guaranteed to be no
more than the total maximum for the page table, and is slightly
more micro-efficient than using the actual maximum, as any page
after the high-water mark will be page_free_p().
Alastair Bridgewater [Tue, 14 May 2013 22:39:06 +0000 (18:39 -0400)]
gencgc: Introduce a new predicate, page_ends_contiguous_block_p().
* There are a number of places in gencgc where a number of
attributes of a page and possibly the subsequent page are tested
for various values. Invariably, this is actually testing to see
if a page ends a contiguous block.
* Extract the various tests to a new inlined predicate function,
page_ends_contiguous_block_p(), thus revealing the intent of
what's going on far better than the bare tests, and coalescing the
code to a single copy to make it easier to fix if there is a bug
in it (and there is, but this is a refactoring commit, not a
behavior change commit).
Alastair Bridgewater [Tue, 14 May 2013 00:57:03 +0000 (20:57 -0400)]
gencgc: Introduce a new predicate, page_starts_contiguous_block_p().
* There are a number of places in gencgc where scan_start_offset
for a page is tested for zero. Invariably, this is actually
testing to see if a page starts a contiguous block... Or starts
on an object boundary.
* Extract the various tests for a zero scan_start_offset to a
new inlined predicate function, page_starts_contiguous_block_p(),
thus revealing the intent of what's going on far better than the
bare test.
Alastair Bridgewater [Mon, 13 May 2013 23:19:43 +0000 (19:19 -0400)]
gencgc: Rename page_table field region_start_offset to scan_start_offset.
* Let's call it what it is: The offset from where to start any
scan through the page to the start of the page. The only relation
this field has to an alloc_region is the way it is initialized.
Alastair Bridgewater [Mon, 13 May 2013 22:41:11 +0000 (18:41 -0400)]
gencgc: Commentary fix for struct page, field region_start_offset.
* Simply describing region_start_offset as being related to an
allocation region which contains a page is disingenuous at best,
and misleading at worst. Its relation with an alloc_region is due
to its initialization strategy, and has nothing to do with what
the value is for.
* Say it like it is, it's an offset to a known object boundary,
from where we can start a call to gc_search_space() or scavenge().
That's what it's for, not for keeping track of alloc_regions.
Alastair Bridgewater [Sun, 12 May 2013 14:49:55 +0000 (10:49 -0400)]
gencgc: Defer moving pinned pages to newspace as late as possible.
* Rather than moving pinned pages to newspace immediately, defer
moving them until just before we start to scavenge (evacuate) all
of the oldspace pages.
* This, in theory, makes it easier to move pages to newspace if
they are mostly-live, rather than having to allocate new pages for
the data (increasing peak address-space use during GC), assuming
that we know that some page meets such criteria.
* While we're here, commentary updates also replace an "XX I'd
rather not do this but the GC logic can't cope with not doing it"
with an actual explanation of WHY it needs to be done. In fact,
commentary updates explain it twice, in two different locations.
Alastair Bridgewater [Sun, 12 May 2013 15:43:09 +0000 (11:43 -0400)]
gencgc: Fix commentary for page table allocation field.
* The commentary for the page table allocation field was
misleading, presumably not updated when the definitions for the
constants used for its actual contents were last changed, and cost
me a bit of surprise and time spent trying to figure out why core
file saving and loading worked at all.
* Updated the commentary on the allocation field to match
current reality, and added cross-references between the field
itself and the definitions for its contents, so that a future
desync between commentary and reality is less likely.
Paul Khuong [Mon, 20 May 2013 14:40:00 +0000 (10:40 -0400)]
More robust function-name testing in CUT-TO-WIDTH
Let's use lvar-fun-name instead of replicating half the logic; as
a bonus, modularity transforms now heed NOTINLINE.
Paul Khuong [Mon, 20 May 2013 05:03:01 +0000 (01:03 -0400)]
Fix (CONCATENATE 'null ...) for generic sequences
* (CONCATENATE 'NULL SEQUENCE1 SEQUENCE2 ...) ensures that SEQUENCE1,
SEQUENCE2, ... are empty, but only did so for lists and
vectors. Instead, use new function EMPTYP which works for all
sequences. EMPTYP is not exported.
* Add generic function SEQUENCE:EMPTYP to which EMPTYP dispatches for
generic sequences. Methods for lists, vectors and generic sequences
use NULL or (ZEROP (LENGTH ...)).
* Test cases in seq.impure.lisp.
* Patch by Jan Moringen; fixes lp#1162301.
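The dispatch described above can be sketched in portable Common Lisp. The name EMPTYP* is hypothetical (chosen to avoid clashing with the real internal function), and this is only an approximation of the idea, not the code from the patch.

```lisp
;; Hypothetical stand-in for the emptiness test; not the SBCL definition.
(defgeneric emptyp* (sequence)
  (:documentation "Return true if SEQUENCE has no elements."))

(defmethod emptyp* ((sequence list))
  (null sequence))

(defmethod emptyp* ((sequence vector))
  (zerop (length sequence)))

(defmethod emptyp* ((sequence sequence))
  ;; Fallback for other sequence classes.
  (zerop (length sequence)))
```

A CONCATENATE to type NULL can then check each argument with one call, regardless of the sequence's representation.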
Paul Khuong [Mon, 20 May 2013 04:40:33 +0000 (00:40 -0400)]
Print intermediate evaluation results for some ASSERTed expressions
* The reports of errors signaled by ASSERT now print intermediate
evaluation results under the following conditions:
1. The ASSERTed expression is known to be a function call.
2. Arguments in the call are not constants.
* Test the new feature in condition.impure.lisp.
* Original patch from Alexandra Barchunova; closes lp#789497.
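The two conditions above suggest roughly the following shape. This is a simplified sketch and not the ASSERT implementation from the patch: ASSERT-SHOWING-ARGS is a hypothetical macro that only accepts a plain function call and has no restarts or places.

```lisp
;; Hypothetical macro illustrating the idea; not SBCL's ASSERT.
(defmacro assert-showing-args ((fn &rest args))
  "Signal an error listing the values of the non-constant ARGS if the call is false."
  (let ((vars (mapcar (lambda (arg) (declare (ignore arg)) (gensym "ARG"))
                      args)))
    ;; Evaluate each argument once so its value can be both used and reported.
    `(let ,(mapcar #'list vars args)
       (unless (,fn ,@vars)
         (error "The assertion ~S failed with ~{~S = ~S~^, ~}."
                '(,fn ,@args)
                (list ,@(loop for arg in args
                              for var in vars
                              unless (constantp arg)
                                append (list (list 'quote arg) var))))))))
```

With X bound to 2, (assert-showing-args (= x 3)) signals an error whose report mentions X = 2 but omits the constant 3.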
Paul Khuong [Mon, 20 May 2013 03:47:13 +0000 (23:47 -0400)]
Take bitwidth into account in BOOLEAN alien type
Some ABIs (x86/x86-64) allow garbage in unused upper bits of return
values; take that into account when converting BOOLEAN return types
to CL BOOLEANs.
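A minimal model of the conversion, assuming the raw return register is available as an integer: only the low bits that belong to the declared width are examined. RAW-TO-BOOLEAN is a hypothetical helper, not part of SB-ALIEN.

```lisp
;; Hypothetical helper modelling the conversion; not the SB-ALIEN code.
(defun raw-to-boolean (raw &optional (bits 8))
  "Interpret the low BITS bits of RAW as a C boolean, ignoring upper garbage."
  (not (zerop (ldb (byte bits 0) raw))))
```

Testing the whole register instead would turn a value such as #xFFFFFF00 into a spurious T, even though its low byte is zero.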
Paul Khuong [Mon, 20 May 2013 03:27:26 +0000 (23:27 -0400)]
Declare the argument type for float-radix
Otherwise, inlined copies sometimes skip the type check.
Paul Khuong [Sun, 19 May 2013 23:35:30 +0000 (19:35 -0400)]
Enable dumping huge (> 64k) pages in genesis
bvectors for a page can be smaller than the page size; zero-padding
explicitly as needed lets the build proceed further along before
failing, when configured with large GC card size.
Stas Boukarev [Sun, 19 May 2013 16:37:27 +0000 (20:37 +0400)]
Make ir1-convert-hairy-lambda safe for non-local exits.
The function it calls may throw to a tag, locall-already-let-converted,
which will leave a partially initialized optional-dispatch structure
in new-functionals of the current component, which may cause problems
down the line.
Fixes lp#1180992.
Also add a test-case for f3a2cd.. "Add a stub for %other-pointer-p.".
Paul Khuong [Sun, 19 May 2013 15:12:43 +0000 (11:12 -0400)]
Free-er form FILTER-LVAR
The DUMMY argument can now be in any argument position. Use that
in CUT-TO-WIDTH instead of ((lambda (...) ...) ...) hack.
Paul Khuong [Sun, 19 May 2013 14:14:35 +0000 (10:14 -0400)]
More robust FILTER-LVAR through CASTs
* IR1-conversion can insert casts between a combination and its
arguments. Handle that case via principal-lvar{-use}.
* Fixes a regression in b111015 (lp#1181684).
Christophe Rhodes [Sat, 18 May 2013 20:46:33 +0000 (21:46 +0100)]
NEWS entries for Unicode normalization work
Christophe Rhodes [Sat, 18 May 2013 12:37:55 +0000 (13:37 +0100)]
implement primary and canonical composition, and hence NFC/NFKC
Read in the non-algorithmically-specified composition exclusions from
Unicode's CompositionExclusions.txt file, and generate a hash table
using the concatenated 42 bits of code points. This is a bit of a
sucky hash-table key, particularly on 32-bit platforms; I have a plan
to reduce the key to 24 bits (using some auxiliary information in ucd)
but the advantage of getting this try in is...
... hook in NFC/NFKC into normalization tests, and check that tests
pass.
Christophe Rhodes [Sat, 18 May 2013 09:54:12 +0000 (10:54 +0100)]
actually run Part3 of Unicode Normalization tests
Christophe Rhodes [Thu, 9 May 2013 14:22:26 +0000 (15:22 +0100)]
better UCD treatment of characters not allocated by Unicode
fixes lp#1178038 (reported by Ken Harris)
Christophe Rhodes [Fri, 19 Apr 2013 08:17:15 +0000 (09:17 +0100)]
finish handling NormalizationTest test vectors
NFC/NFKC still not hooked in, but otherwise complete.
Christophe Rhodes [Thu, 18 Apr 2013 21:20:58 +0000 (22:20 +0100)]
first cut at testing unicode normalization
Parts 0 and 1 from Unicode NormalizationTest.txt, fully tested for
NFD and NFKD.
Christophe Rhodes [Thu, 18 Apr 2013 19:14:21 +0000 (20:14 +0100)]
add a comment about one-basing the character tables
Christophe Rhodes [Thu, 18 Apr 2013 17:03:09 +0000 (18:03 +0100)]
apply recursive decomposition in DECOMPOSE-STRING
We should really precompute the result of the recursion during the build;
working on getting tests up and running so that we can check whether
we've done that correctly.
Christophe Rhodes [Sun, 14 Apr 2013 19:02:08 +0000 (20:02 +0100)]
fix test for Blocked condition in canonical normalization
Would most likely otherwise fail in Jamo with combining characters in
between.
Christophe Rhodes [Sun, 14 Apr 2013 19:01:18 +0000 (20:01 +0100)]
improve normalize-string
* now works on non-simple strings;
* more likely to be correct under #!-sb-unicode
Christophe Rhodes [Sun, 14 Apr 2013 15:40:24 +0000 (16:40 +0100)]
comment on LSTRING implementation
Christophe Rhodes [Wed, 27 Mar 2013 12:58:19 +0000 (12:58 +0000)]
handle Hangul syllable decomposition
Entries for the codepoint range (#xac00 -- #xd7a3) have 1 for
their decomposition-info, a decomposition length of 2 or 3, but
a zero decomposition index (the decomposition is handled
algorithmically instead).
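For reference, the algorithmic decomposition follows the arithmetic given in the Unicode Standard. The sketch below is a portable restatement of that arithmetic, not the code from this commit; HANGUL-DECOMPOSITION is a hypothetical name.

```lisp
;; Standard Hangul syllable arithmetic; function name is hypothetical.
(defun hangul-decomposition (code)
  "Return the list of jamo code points for the Hangul syllable CODE."
  (assert (<= #xAC00 code #xD7A3))
  (let* ((s-index (- code #xAC00))
         ;; 588 = 21 vowels * 28 trailing positions per leading consonant.
         (lead (+ #x1100 (floor s-index 588)))
         (vowel (+ #x1161 (floor (mod s-index 588) 28)))
         (trail-index (mod s-index 28)))
    (if (zerop trail-index)
        (list lead vowel)
        (list lead vowel (+ #x11A7 trail-index)))))
```

The result has length 2 or 3, matching the decomposition lengths recorded in the table entries.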
Christophe Rhodes [Fri, 22 Mar 2013 12:03:24 +0000 (12:03 +0000)]
work-in-progress towards full normalization support
Christophe Rhodes [Sun, 17 Mar 2013 21:23:59 +0000 (21:23 +0000)]
beginnings of decomposition
Store enough information in output from ucd.lisp to be able to actually
decompose individual characters. Include proof-of-concept implementation
of decomposition, not hooked into anything yet.
Christophe Rhodes [Sun, 17 Mar 2013 09:23:36 +0000 (09:23 +0000)]
delete now-unused code from ucd.dat
Christophe Rhodes [Fri, 15 Mar 2013 21:44:31 +0000 (21:44 +0000)]
Incorporate some decomposition information in ucd table
Oh boy. This one is quite intricate. We have two bytes free in
the 8-byte entries for information about characters, so use one of
them to indicate if the character has a decomposition, and if so of
what kind it is. Adapt the ucd.lisp tools-for-build code to
parse and preserve that information.
However, this causes there to be more than 256 distinct possible
classes of character known to the system: not a problem in principle,
but Teemu Kalvas' implementation of the double indirection depended on
having a one-byte index. But since Unicode characters are limited to
21 bits, with a careful packing scheme we can in fact steal 3 more bits
for the index, at the cost of needing to do an extra memory reference
and some arithmetic to reconstruct the index. (In the process, change
the endianness of the ucd.dat filesystem representation, because it's
easier that way).
But wait, there's more. Before, there were only two kinds of
lower-case characters: those whose upper-case transformation
lowercases back to the original character, and those where there is
no round-trip. (The former are cl:lower-case-p, the latter aren't).
This gave rise to straightforward implementations of lower-case-p
and friends; in the new world, where there are multiple different
kinds of lower-case characters (with various decomposition classes)
we need to adjust the implementations, still fairly straightforward,
of lower-case-p and related functions.
The extra information provided in the ucd table by this commit
is largely useless on its own; the next step is to incorporate
the actual decomposition data. Stay tuned.
Christophe Rhodes [Fri, 15 Mar 2013 14:29:57 +0000 (14:29 +0000)]
MORE COMMENT regarding the careful format of the encoded UCD data
Christophe Rhodes [Wed, 13 Mar 2013 12:33:24 +0000 (12:33 +0000)]
update to unicode 6.2
Paul Khuong [Sat, 18 May 2013 00:22:44 +0000 (20:22 -0400)]
Complete cut-to-width
* Insert logand/mask-signed-field even around references to variables
in modular arithmetic: avoid recursive rewriting by disabling the
transform when the destination is a direct logand/mask-signed-field
combination.
* Fixes lp#1026634 (reported by Anton Marsden on sbcl-devel).
Paul Khuong [Fri, 17 May 2013 21:44:12 +0000 (17:44 -0400)]
More efficient MASK-SIGNED-FIELD
Word => signed-word and {word, signed-word} => fixnum conversions
are implemented with unchecked move VOPs.
Paul Khuong [Fri, 17 May 2013 21:21:55 +0000 (17:21 -0400)]
Insert typechecks before RAW-INSTANCE-INIT in structure constructors
* Usually, FTYPE declarations ensure that happens, but multiple
inlining of the same structure constructor causes strangeness.
* Fixed lp#1177703, reported by Jan Moringen.
Paul Khuong [Fri, 17 May 2013 20:54:06 +0000 (16:54 -0400)]
More robust erroneous local call detection
* When possible, convert known bad calls into calls to error-signaling
stubs.
* Fixes lp#504121 (and likely other occurrences of
"failed AVER (ZEROP (HASH-TABLE-COUNT ...))").
Paul Khuong [Fri, 17 May 2013 20:08:00 +0000 (16:08 -0400)]
COMPILE-FILE shouldn't "attempt to dump invalid structure" anymore
* When CAST nodes detect definite type mismatch, they are replaced
with debugging instrumentation to provide source locations at
compile- and run-time. When code is generated internally, the
source can include literal internal data structures. Skip those
when recovering source locations.
* Fixes lp#943953 and a bunch of equally baffling duplicates.
Paul Khuong [Fri, 17 May 2013 19:25:50 +0000 (15:25 -0400)]
Recover full backtraces with generic arithmetic on x86 and x86-64
* Errors in generic arithmetic (or comparisons) used to hide the caller
in the backtrace: it was replaced with a frame in the anonymous
assembly stub.
* Regression since 1.0.24.35, fixes lp#800343.
* Also remove a misleading FIXME in typed-accessor-definitions
(reported by Matt Novenstern in lp#1171646).
Stas Boukarev [Thu, 16 May 2013 17:08:48 +0000 (21:08 +0400)]
Add a stub for %other-pointer-p.
Otherwise the VOP isn't translated on literal objects, and
sb-sequence:do-sequence stops working on literal vectors. Having a
stub allows constant folding to work.
Reported by adeht on #lisp.
Stas Boukarev [Thu, 16 May 2013 12:51:47 +0000 (16:51 +0400)]
loop: remove code size-estimation.
Loop has a facility to determine whether it's ok to duplicate variable
initialization and stepping code when the variable preceding it has
different initialization and stepping forms. The code which determines
code size is quite strange and it may have been relevant 20 years ago
on primitive implementations, but not anymore, and people who really
care about code size would use functions, which will also improve code
readability.
As a side effect, it fixes a bug which was present in the
estimate-code-size function.
Fixes lp#1178989.
Stas Boukarev [Mon, 13 May 2013 15:40:20 +0000 (19:40 +0400)]
Fix describe-object for characters.
Don't prefix lines with ":_".
Alastair Bridgewater [Sat, 11 May 2013 16:31:40 +0000 (12:31 -0400)]
early-alieneval: Fix package-related thinko with saved-fp-and-pc logic.
* Forgot a package prefix for GET-LISP-OBJ-ADDRESS, because the
function symbol was exported from SB!ALIEN-INTERNALS yet defined
in SB!ALIEN, and the prefix wouldn't have been necessary from
SB!ALIEN-INTERNALS.
* Thanks to Stas Boukarev for the heads-up.
Alastair Bridgewater [Tue, 30 Apr 2013 02:56:14 +0000 (22:56 -0400)]
code/room: Completely rewrite MAP-ALLOCATED-OBJECTS.
* The old version of M-A-O consisted of bizarre toplevel logic,
a scheme for figuring out what each heap object was and its size
that did not parallel what the garbage collector used and may or
may not have been correct, and relied heavily on inlining to
reduce consing.
* This new version of M-A-O uses straightforward toplevel logic,
a scheme for figuring out what each heap object is and its size
that directly parallels what the garbage collector uses and is
verifiably correct, and relies heavily on the aligned unboxed
pointer to fixnum equivalence to reduce consing.
* The new interface to M-A-O no longer includes the optional
"careful" argument, as it gains us nothing once the underlying
mechanism is so obviously correct. sb-introspect has been updated
appropriately.
* The way the new implementation walks the heap and page table
requires direct access to a "static" global variable in gencgc.c,
so the "static" attribute has been removed.
* This implementation has been lightly tested on an x86-64 and
PPC, and it seems to work quite well, but there are still some
fairly obvious non-optimalities in terms of generated code (as
seen in the trace-file output from the cross compiler). It does
pass the two test cases that exhausted the heap on PPC with the
previous implementation.
Alastair Bridgewater [Sat, 27 Apr 2013 12:31:22 +0000 (08:31 -0400)]
code/room: Improve type-format database initialization for simple vector types.
* There has been a longstanding FIXME comment on a piece of code
which contains a hand-maintained list of specialized vector types
and the shift count for converting the length from elements to
octets.
* It turns out that all of this information, plus the type names
that we currently do a song-and-dance with INTERN, SUBSEQ, and
MISMATCH to obtain, plus information for the string types, is
available from *SPECIALIZED-ARRAY-ELEMENT-TYPE-PROPERTIES*. And
*S-A-E-T-P* is guaranteed to be up-to-date, as it's too central to
our implementation of UPGRADED-ARRAY-ELEMENT-TYPE and MAKE-ARRAY
for it to be allowed to break.
* So, replace nasty KLUDGE of an initialization for simple
vector types with something more principled, making it explicit
which properties need to be derived and which are simply already
available, and picking off the one specialized array type that
needs to be handled differently (SIMPLE-ARRAY-NIL).
Alastair Bridgewater [Sat, 11 May 2013 13:48:47 +0000 (09:48 -0400)]
NEWS updates.
* Forgot to add a NEWS update to my recent commit involving the
internal-error logic.
* And clarify that only vectors of boxed items may be stack-
allocated on PPC.
Alastair Bridgewater [Wed, 8 May 2013 01:46:43 +0000 (21:46 -0400)]
code/interr: Hook internal error contexts into the saved-fp-and-pc mechanism.
* This covers the unfortunate case of a signal handler not
having an unbroken stack frame chain to the interrupted context,
which actually occurs on threaded x86-64 FreeBSD systems.
* Use the existing saved-fp-and-pc mechanism (which ALIEN-FUNCALL
uses to cover for code compiled with -fomit-frame-pointer) to
treat the internal error context as an alien funcall point.
Alastair Bridgewater [Thu, 9 May 2013 21:16:58 +0000 (17:16 -0400)]
Allow inlining more calls to INVOKE-WITH-SAVED-FP-AND-PC during XC.
* The INVOKE-WITH-SAVED-FP-AND-PC mechanism was defined in
ALIENCOMP, which occurs well after the first uses of ALIEN-FUNCALL,
thus preventing it from being inlined when used during XC (by
default, only on x86).
* Fix, by relocating the mechanism from SB!C to
SB!ALIEN-INTERNALS and from COMPILER;ALIENCOMP to
CODE;EARLY-ALIENEVAL.
* Also relocate and publish symbols for all of the magic from
SB!ALIEN-INTERNALS.
Stas Boukarev [Tue, 7 May 2013 18:47:31 +0000 (22:47 +0400)]
sb-introspect:find-definition-sources-by-name: more defoptimizer types.
Look for sb-c:ir2-convert and sb-c::stack-allocate-result defoptimizer types.
Lutz Euler [Mon, 6 May 2013 17:37:23 +0000 (19:37 +0200)]
Make CONTAINING-INTEGER-TYPE take N-WORD-BITS into account.
Replace the hardcoded 32s in the function with N-WORD-BITS. This in turn
allows SOURCE-TRANSFORM-NUMERIC-TYPEP to find better transformations
for type checks for types larger than fixnum but smaller than a machine
word also under 64-bit word size. The most important improvement this
achieves is to avoid generic arithmetic for the bounds tests in these
cases.
For example, the test (TYPEP X '(UNSIGNED-BYTE 63)) runs about four
times as fast on x86-64 with this change.
(Tests for the exact types (SIGNED-BYTE 64) and (UNSIGNED-BYTE 64) are
unaffected as compiling them takes another code path, which already
generates well optimized code.)
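The kind of test that benefits looks like the following. The predicate name UB63-P is hypothetical, and the snippet only shows the shape of the check; it makes no claim about the generated code or the speedup on any particular build.

```lisp
;; Hypothetical predicate; a type wider than a fixnum but narrower than a word.
(defun ub63-p (x)
  "Return true if X is an integer in the range [0, 2^63)."
  (typep x '(unsigned-byte 63)))
```

Both bounds here fit in a machine word on a 64-bit platform, which is what lets the comparison avoid generic arithmetic.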
Lutz Euler [Mon, 6 May 2013 12:04:02 +0000 (14:04 +0200)]
Make %EMIT-ALIGNMENT be more friendly to multi-byte NOPs.
When %EMIT-ALIGNMENT needs to tighten the alignment, it used to emit
a fixed-size skip first and an alignment note afterwards. On x86-64,
where block headers are aligned using multi-byte NOPs, this could
lead to emitting one more such NOP than needed to span the desired
range, unnecessarily increasing the number of machine instructions
the processor needs to decode.
To avoid that, change %EMIT-ALIGNMENT to only emit an alignment note
(covering both the fixed-size skip and the alignment note from the
original version) in this situation.
An example of the difference, from the disassembly of
SB-C::FLATTEN-LIST:
Before:
    896: L0:   8F4508           POP QWORD PTR [RBP+8]
    899:       0F1F00           NOP
    89C:       0F1F4000         NOP
    8A0: L1:   4881F917001020   CMP RCX, 537919511
Afterwards:
    896: L0:   8F4508           POP QWORD PTR [RBP+8]
    899:       0F1F8000000000   NOP
    8A0: L1:   4881F917001020   CMP RCX, 537919511
Stas Boukarev [Sun, 5 May 2013 16:51:15 +0000 (20:51 +0400)]
Better type derivation for APPEND, NCONC, LIST.
The result types of APPEND/NCONC depend on the last argument and the
presence of conses in the middle.
For example (append 42) => 42, (append nil nil 42) => 42,
(append (list 1) 42) => (1 . 42), etc.
LIST returns NIL in case of no arguments and a cons in other
cases. That fact required an adjustment for a values-list optimizer,
which removed all arguments from a LIST call making it change the type
from LIST to NULL and confusing things.
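The cases above follow from standard Common Lisp semantics and can be
checked at the REPL; the variable name below is only illustrative:

```lisp
;; Standard Common Lisp behaviour; *APPEND-EXAMPLES* is an illustrative name.
(defparameter *append-examples*
  (list (append 42)             ; no conses: the last argument itself
        (append nil nil 42)     ; empty lists contribute nothing
        (append (list 1) 42)    ; a cons in the middle gives a dotted list
        (list)                  ; no arguments: NIL
        (list 1)))              ; at least one argument: a cons
```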
Closes lp#538957
Stas Boukarev [Sat, 4 May 2013 22:54:51 +0000 (02:54 +0400)]
Micro-optimize values-list.
Compare a register with nil-value directly, without going through a
temporary register.
Stas Boukarev [Sat, 4 May 2013 00:14:21 +0000 (04:14 +0400)]
sb-introspect:find-definition-sources-by-name: find VOPs by name.
(sb-introspect:find-definition-sources-by-name x :vop) now
also returns VOPs which do not translate any functions.
Martin Cracauer [Fri, 3 May 2013 22:19:30 +0000 (18:19 -0400)]
Committing fix by Doug Katzman: disassembler missing ",8" on SHLD
Alastair Bridgewater [Fri, 3 May 2013 03:37:56 +0000 (23:37 -0400)]
NEWS: Updates for recent PPC changes.
* I recently committed a small batch of PPC changes, but none of
them had NEWS entries. This was not quite an oversight, as the
original changes were written prior to the 1.1.7 release, but now
that they've been committed they should at least be mentioned as
NEWS.
Stas Boukarev [Wed, 1 May 2013 10:15:45 +0000 (14:15 +0400)]
Correct integer-length on fixnums on x86-64 when n-fixnum-tag-bits > 1.
Use SAR, not SHR for untagging, to preserve the sign.
Thanks to Paul Khuong.
Alastair Bridgewater [Thu, 25 Apr 2013 19:02:12 +0000 (15:02 -0400)]
tests/dynamic-extent.impure.lisp: One of the dx-vector test terms was misplaced.
* MAKE-ARRAY-ON-STACK-1 tries to create a specialized vector,
but was being called from a test that only claims to handle
vectors suitable for a precisely-scavenged control stack.
* Fix, by moving the call to the next test, which is for
specialized vectors (and thus only runs on conservative-stack
systems).
Alastair Bridgewater [Thu, 25 Apr 2013 18:55:13 +0000 (14:55 -0400)]
ppc support for stack-allocatable-vectors
* This turned out to be fairly straightforward. Unlike in a
heap-allocation-only regime, where a VOP is required to :TRANSLATE
ALLOCATE-VECTOR, the :STACK-ALLOCATABLE-VECTORS feature enables an
LTN-ANNOTATE optimizer for ALLOCATE-VECTOR that substitutes
invocations of one of two named VOPs.
* To convert from the old regime to the new, rename the old VOP
to fit the new naming scheme, and write a new VOP to do the stack
allocation.
* As a cleaning-up-a-loose-end matter, lose the :TRANSLATE
option for the old VOP.
* And as a "being somewhat cute about things" matter, make the
support for stack-allocatable-vectors selectable at build time,
which should provide a quick overview of how to make this work on
some other platform, should anyone else be interested later on.
Alastair Bridgewater [Wed, 24 Apr 2013 17:13:50 +0000 (13:13 -0400)]
gencgc: Compute bytes_allocated correctly during dynamic space pickup.
* Rather than computing bytes_allocated based on the number of
pages prior to the "alloc pointer" (really the heap high-water
mark), accumulate it based on the number of ALLOCATED pages.
* Fixes some lossage in write_generation_stats(), which seems to
be the only place where this value is checked against reality.
Alastair Bridgewater [Wed, 24 Apr 2013 17:12:23 +0000 (13:12 -0400)]
Add test cases for non-consing WITHOUT-GCING and WITH-PINNED-OBJECTS.
* Neither of these two constructs should cons under normal
circumstances, but WITH-PINNED-OBJECTS is occasionally broken in
this respect on some backends. May as well make it explicit and
official.
Alastair Bridgewater [Wed, 24 Apr 2013 17:07:35 +0000 (13:07 -0400)]
compiler/{sparc,ppc}/macros: with-pinned-objects improvements.
* For all precise gencgc backends, with-pinned-objects uses an
explicit "pin list". This pin list should be stack-allocated.
* Declare the pin list to be TRULY-DYNAMIC-EXTENT, for both
backends. This won't actually do anything unless the backend
also supports :stack-allocatable-fixed-objects or more than two
objects are to be pinned at once (one-arg LIST and two-arg LIST*
are both converted to CONS by the compiler, and CONS falls under
:stack-allocatable-fixed-objects rather than
:stack-allocatable-lists).
Alastair Bridgewater [Tue, 23 Apr 2013 17:23:01 +0000 (13:23 -0400)]
ppc: Implement :stack-allocatable-fixed-objects
* Alter SYS:SRC;COMPILER;PPC;MACROS.LISP, WITH-FIXED-ALLOCATION
to accept a parameter for requesting stack allocation instead of
heap allocation.
* Alter SYS:SRC;COMPILER;PPC;ALLOC.LISP, VOP FIXED-ALLOC to pass
the new stack-allocation parameter.
* And add :stack-allocatable-fixed-objects to the PPC section in
make-config.sh.
Jim Wise [Wed, 1 May 2013 01:23:05 +0000 (21:23 -0400)]
backtrace-interrupted-condition-wait now passes on darwin.
Stas Boukarev [Tue, 30 Apr 2013 19:32:43 +0000 (23:32 +0400)]
Micro-optimize integer-length on fixnums on x86-64.
INTEGER-LENGTH is implemented using the BSR instruction, which returns
the index of the most significant 1-bit. That index needs to be
incremented to get the width of the integer, and BSR doesn't work on 0,
so a branch is needed to handle 0.
But fixnums are tagged by being shifted left by n-fixnum-tag-bits, so
untagging by shifting right by only n-fixnum-tag-bits - 1 (if
n-fixnum-tag-bits = 1, no shifting is required) leaves the resulting
integer one bit wider, making the increment unnecessary.
Then, to avoid calling BSR on 0, OR the result with 1. That sets the
lowest bit to 1, and if all other bits are 0, BSR will return 0,
which is the correct value for INTEGER-LENGTH.
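The trick can be modelled in portable Lisp. This is only a sketch for
non-negative inputs: BSR is emulated with a hypothetical helper, TAG-BITS
stands in for n-fixnum-tag-bits, and none of these names come from the
actual VOP.

```lisp
;; Hypothetical model of the trick; not the VOP itself.
(defun bsr (x)
  "Index of the most significant 1-bit of the positive integer X."
  (assert (plusp x))
  (1- (integer-length x)))

(defun fixnum-integer-length (n &optional (tag-bits 1))
  "INTEGER-LENGTH of the non-negative integer N via shift, OR and BSR."
  (let* ((tagged (ash n tag-bits))                  ; tagging: shift left
         (shifted (ash tagged (- (1- tag-bits)))))  ; untag by one bit less
    ;; OR with 1 keeps BSR away from 0 and yields 0 for N = 0.
    (bsr (logior shifted 1))))
```

For every non-negative N this should agree with (INTEGER-LENGTH N),
without an increment or a branch for zero.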
Stas Boukarev [Tue, 30 Apr 2013 09:52:57 +0000 (13:52 +0400)]
Document the new :directory argument for run-program.
Lutz Euler [Mon, 29 Apr 2013 21:18:27 +0000 (23:18 +0200)]
Convert the MOVE macro on x86-64 into a function.
This is possible as the macro is used just to simulate an inline
function. Converting MOVE into a true function shrinks the core by
448 KiB and may even make the compiler run faster due to reduced
instruction cache pressure.
Some background: Only on x86-64 is MOVE sometimes used with float SCs.
It therefore needs to select different machine instructions depending on
the SC of its destination argument. This compiles to so much code that
inlining it can't be justified, especially given that MOVE is used in
several hundred VOPs.
While at it, correct the comment at the top of the file for 64-bitness.
Lutz Euler [Mon, 29 Apr 2013 20:57:41 +0000 (22:57 +0200)]
Faster ISQRT on small (about fixnum sized) numbers.
ISQRT is implemented using a recursive algorithm for arguments above 24
which is compiled using generic arithmetic only (as it must support both
fixnums and bignums).
Improve this by compiling this recursive part twice, once using generic
and once fixnum-only arithmetic, and dispatching on function entry into
the applicable part. For maximum speed, the fixnum part recurs directly
into itself, thereby avoiding further type dispatching.
This makes ISQRT run about three times as fast on fixnum inputs while
the generated code is about 40 percent larger (both measured on x86-64).
For bignums a speedup can be seen, too, as ISQRT always recurs into
fixnum territory eventually, but the relative gain obviously becomes
smaller very fast with increasing size of the argument.
I have changed the variable names in the recursive part; they no longer
have an "n-" prefix as this in SBCL by convention means "number of" and
as the argument of the recursive part is no longer visibly "n".
Slightly augment the test case.
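A rough sketch of the "compile the recursive part twice and dispatch on
entry" idea follows. It is not the code from this commit: the names
SKETCH-ISQRT and ISQRT-RECURSION are hypothetical, and the Newton-style
recursion is assumed here purely for illustration.

```lisp
;; Hypothetical sketch: one recursive body, expanded once per argument type.
(defmacro isqrt-recursion (arg type)
  `(labels ((rec (n)
              (declare (type ,type n))
              (if (<= n 24)
                  ;; Small arguments are answered directly.
                  (cond ((> n 15) 4) ((> n 8) 3) ((> n 3) 2) ((> n 0) 1) (t 0))
                  ;; Recurse on roughly the upper half of the bits, then
                  ;; refine the estimate from above with Newton steps.
                  (let* ((len-quarter (ash (integer-length n) -2))
                         (half (ash n (- (ash len-quarter 1))))
                         (init (ash (1+ (rec half)) len-quarter)))
                    (loop
                      (let ((next (ash (+ init (truncate n init)) -1)))
                        (unless (< next init) (return init))
                        (setq init next)))))))
     (rec ,arg)))

(defun sketch-isqrt (n)
  (declare (type unsigned-byte n))
  ;; Dispatch once on entry; each copy then recurs only into itself.
  (typecase n
    (fixnum (isqrt-recursion n (and fixnum unsigned-byte)))
    (t (isqrt-recursion n unsigned-byte))))
```

The fixnum copy never leaves fixnum territory because each recursive call
receives a strictly smaller argument.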
Lutz Euler [Mon, 29 Apr 2013 20:35:01 +0000 (22:35 +0200)]
Improve scaling of type derivation for LOG{AND,IOR,XOR}.
If the types of the arguments of LOG{AND,IOR,XOR} are known to be ranges
of non-negative integers the compiler currently derives the range of the
result using straightforward implementations of algorithms from
"Hacker's Delight". These take quadratic time in the number of bits of
the inputs in the worst case, potentially leading to unacceptably long
compilation times. (The algorithms are based on loops over the bits of
the inputs, doing calculations during each iteration that are themselves
linear in the number of bits of their operands.)
Instead implement bit-parallel algorithms I have found that take linear
time in all cases. Their runtime therefore stays much smaller for large
inputs, and it is comparable to that of the current algorithms for small
inputs, too; the new deriver for LOGXOR is in fact already faster than
the old one by a factor of two to ten in the latter case.
The (existing) test for these derivers compares their results with those
from a brute-force algorithm for all O(N^4) many pairs of input ranges
with endpoints from the set of N-bit unsigned integers. The brute-force
algorithm needs to consider O(N^2) input pairs for each pair of ranges,
making the total runtime O(N^6). Therefore the test normally runs with
N = 5. I have tested all three new derivers successfully with N = 7.
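The brute-force reference described above can be sketched as follows. The
function name and argument order are hypothetical and are not taken from
the test suite:

```lisp
;; Hypothetical reference implementation: try every pair of inputs from the
;; two ranges and record the smallest and largest result of OP.
(defun brute-force-bounds (op a-low a-high b-low b-high)
  (let ((low nil)
        (high nil))
    (loop for a from a-low to a-high do
      (loop for b from b-low to b-high do
        (let ((r (funcall op a b)))
          (when (or (null low) (< r low)) (setf low r))
          (when (or (null high) (> r high)) (setf high r)))))
    (values low high)))
```

A deriver is correct for a pair of ranges when its derived interval
contains both values returned here, for example
(brute-force-bounds #'logand 4 7 1 2).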
Replace LOG{AND,IOR,XOR}-DERIVE-UNSIGNED-{LOW,HIGH}-BOUND with
LOG{AND,IOR,XOR}-DERIVE-UNSIGNED-BOUNDS to make it possible to evaluate
expressions only once that the calculations for the low and the high
bound have in common. The callers always need both bounds anyway.
Adapt the test to this change. (It runs twice as fast now due to the
brute force loop calculating both bounds in one go.)
Add a test for the scaling behaviour. This needs a function to measure
runtimes over potentially large ranges; add this to test-util.lisp.
Fixes lp#1096444.
Lutz Euler [Mon, 29 Apr 2013 20:35:01 +0000 (22:35 +0200)]
Split bitops-derive-type.lisp out of srctran.lisp.
The moved part contains DERIVE-TYPE methods for LOGAND, LOGIOR, and
friends. The split is motivated by srctran.lisp being too large and
by planned changes to these type derivers.
Stas Boukarev [Mon, 29 Apr 2013 19:40:41 +0000 (23:40 +0400)]
Fix init-var-ignoring-errors.
Actually set the variable to the default value in case of an error.
Caught by Nikodemus Siivola.
Stas Boukarev [Mon, 29 Apr 2013 19:28:32 +0000 (23:28 +0400)]
Add :directory argument to sb-ext:run-program.
The implementation uses chdir(2) on Unices, the lpCurrentDirectory
argument to CreateProcessW on Windows.
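A minimal usage sketch on a Unix-like host, assuming that "/bin/pwd" and
the directory passed in both exist; the wrapper name PWD-IN is
hypothetical:

```lisp
;; SBCL-specific sketch; PWD-IN is a hypothetical wrapper.
(defun pwd-in (directory)
  "Run /bin/pwd with DIRECTORY as the child's working directory."
  (with-output-to-string (out)
    (sb-ext:run-program "/bin/pwd" '()
                        :directory directory
                        :output out)))
```

Calling (pwd-in "/tmp/") should return the child's working directory as a
string, which may differ textually from the argument if the path involves
symbolic links.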
Slightly adapted from the patch by Matthias Benkard.
Closes lp#791800
Stas Boukarev [Mon, 29 Apr 2013 17:15:57 +0000 (21:15 +0400)]
Handle environment initialization better.
Don't fail with mysterious errors and memory faults on startup during
initialization of *default-pathname-defaults* when the current
directory contains undecodable characters or is deleted. Similarly
catch decoding errors for things like *runtime-pathname* and
*posix-argv*.
Turn the errors into warnings, and ensure that streams are initialized
and the error messages can be printed.
Christophe Rhodes [Mon, 29 Apr 2013 14:12:05 +0000 (15:12 +0100)]
1.1.7: will be tagged as "sbcl-1.1.7"
Christophe Rhodes [Mon, 29 Apr 2013 14:11:21 +0000 (15:11 +0100)]
fix formatting of most recent "changes" line in NEWS
Christophe Rhodes [Tue, 23 Apr 2013 10:33:26 +0000 (11:33 +0100)]
sort NEWS into enhancement/bug fix/optimization order
Paul Khuong [Sat, 20 Apr 2013 13:31:30 +0000 (15:31 +0200)]
Trivial code cleanups
Declare a variable as ignored, and descriptors are 64 bit on
x86-64. The latter was brought to my attention by Douglas Katzman.
Paul Khuong [Sat, 20 Apr 2013 11:50:52 +0000 (13:50 +0200)]
Substitute constants with modular equivalents more safely
* Modular arithmetic sometimes lets us narrow constants down,
especially with signed arithmetic. We now update the receiving
LVAR's type conservatively when there are multiple uses; otherwise,
conflicting type information results in spurious dead code
elimination.
* Test case by Eric Marsden.
* Reported by Eric Marsden on sbcl-devel (2013-04-18).
Paul Khuong [Sat, 20 Apr 2013 11:43:00 +0000 (13:43 +0200)]
Fix the build on OS X 10.8.0
It seems our exception handler can be called before it's fully set up.
Handle that case without potentially leaking too many ports.
Reported by Gabriel Dos Reis on sbcl-devel.
Matthias Andreas Benkard [Tue, 16 Apr 2013 14:53:00 +0000 (16:53 +0200)]
Handle multiple-valued forms in TRACE :PRINT.
Closes: lp#457053.
Stas Boukarev [Mon, 15 Apr 2013 23:47:24 +0000 (03:47 +0400)]
Remove an unused VOP %make-symbol on x86-64.
%make-symbol is handled by define-primitive-object now.
The old VOP was copied into alloc.lisp during the x86_64 port, got
removed from x86 before the merge. It wasn't even ported to x86-64,
and was never invoked during the 9 years it has been sitting there.
Also remove the fast_random_state variable from the C runtime; it was
used by the VOP.
Stas Boukarev [Mon, 15 Apr 2013 19:50:12 +0000 (23:50 +0400)]
Disassemble: print the size into the right stream.
Print the newly introduced size information to the provided stream,
not to *standard-output*.
Reported by Jan Moringen.