1.0.10.14: remove locking and gc inhibition from hash-tables, power of 2 sizes
This commit removes a bunch of bottlenecks from the hash-table
implementation. It speeds up GETHASH, (SETF GETHASH) and
REMHASH by a factor of 2-4x (on platforms with a real
WITH-PINNED-OBJECTS) depending on the operation. On the flip
side, no automatic locking is done on tables any more, so
multi-threaded applications must do their own locking. (The
locking done by SBCL was always just an implementation detail,
not a part of the external interface). By popular demand it's
also still safe to have multiple readers on the same table
without locking.
Originally GCs were inhibited during most hash-table
operations for two reasons. To prevent the GC from rehashing a
table while a Lisp-side operation is going on, and to prevent
the GC from moving the key after the hash-value has been
calculated.
More recently, most hash-tables operations have acquired a
lock on the table in order to prevent two concurrent writers
from corrupting the chains. While it's never been the intent
for the standard data structures to be automatically
thread-safe in SBCL, this locking had to be done since corrupt
tables could lead to infinite GC loops.
Both the locking and the without-gcing are expensive
operations relative to the total cost of a hash-table lookup.
This commit removes both the gc inhibition and the locks.
Additionally we switch to power of two table size, which
allows calculating a cheaper hash -> bucket with cheaper
operations than MOD.
* The GC no longer does the rehashing itself, but just marks
the hash-table as needing a rehash, which will then be done
Lisp-side when the table is next accessed. While it's
possible to find cases where the former behaviour has better
performance, they're very contrived.
* The hash-table operations that work on the chains now check
for loops in the chains, and signal an error if one is found.
* The hash-table operations now pin the key before calculating
the hash value (needed for EQ-based hash functions).
* Add a GC epoch value that GETHASH can use to check whether
a GC happened during the lookup. This is needed since another
thread calling GETHASH on the same table might have caused it
to be rehashed.
* Kill the old MUST-REHASH vector header, and replace it with a
slot in the HASH-TABLE structure. The overloading of the header
caused missed rehashings when both the GC and %%PUTHASH modified
it at the same time.
* Switch to power of two table sizes, with a slightly more complex
hash value -> bucket calculation than just taking the low bits,
which in many cases have a very skewed distribution for the existing
SBCL hash functions. Still a lot faster than using MOD.
* Leave in locking and GC inhibition during rehashing (needed to
allow multiple readers to coexist) and for weak hash-tables
(they need some GC support, and the code is much simpler when
all of the logic is in the GC instead of interleaved in the GC and
Lisp-side). Neither of these cases is performance critical.
12 files changed: