Safe concurrent MySQL access in Haskell
Published on ; updated on
Update (2016-10-26). The mysql package has got a new maintainer, Paul Rouse. He merged my changes that address Issue 1 described below in mysql-0.1.2. The other issues are still relevant, though.
mysql, Bryan O’Sullivan’s low-level Haskell bindings to the libmysqlclient C library, powers a few popular high-level MySQL libraries, including mysql-simple, persistent-mysql, snaplet-mysql-simple, and groundhog-mysql.
Most users do not suspect that using mysql as it stands concurrently is unsafe.
This article describes the issues and their solutions.
Issue 1: unsafe foreign calls
As of version 0.1.1.8, mysql marks many of its ffi imports as unsafe. This is a common trick to make these calls go faster. In our case, the problem with unsafe calls is that they block a capability (that is, an OS thread that can execute Haskell code). This is bad for two reasons:
- Fewer threads executing Haskell code may result in less multicore utilization and degraded overall performance.
- If all capabilities get blocked executing related MySQL statements, they may deadlock.
Here’s a demonstration of such a deadlock:
{-# LANGUAGE OverloadedStrings #-}
import Database.MySQL.Simple
import Control.Concurrent
import Control.Concurrent.STM
import Control.Applicative
import Control.Monad
import Control.Exception
= do
main <- atomically $ newTVar 0
tv $ \conn -> do
withConn mapM_ (execute_ conn)
"drop table if exists test"
[ "create table test (x int)"
, "insert into test values (0)"
,
]
1..2] $ \n -> forkIO $ withConn $ \conn -> (do
forM_ ["begin"
execute_ conn putStrLn $ show n ++ " updating"
"update test set x = 42"
execute_ conn putStrLn $ show n ++ " waiting"
10^6)
threadDelay ("commit"
execute_ conn putStrLn $ show n ++ " committed"
`finally`
) $ modifyTVar tv (+1))
(atomically
$ check =<< (>=2) <$> readTVar tv
atomically where
= bracket (connect defaultConnectInfo) close withConn
If you run this with stock mysql-0.1.1.8, one capability
(i.e. without +RTS -Nx
), and either threaded or
non-threaded runtime, you’ll see:
1 updating
1 waiting
2 updating
1 committed
test: ConnectionError {
errFunction = "query",
errNumber = 1205,
errMessage = "Lock wait timeout exceeded; try restarting transaction"}
Here’s what’s going on:
- Both threads are trying to update the same row inside their transactions;
- MySQL lets the first update pass but blocks the second one until the first update committed (or rolled back);
- The first transaction never gets a chance to commit, because it has no OS threads (capabilities) to execute on. The only capability is blocked waiting for the second UPDATE to finish.
The solution is to patch mysql to mark its ffi calls as safe (and use the threaded runtime). Here’s what would happen:
- To compensate for the blocked OS thread executing the second UPDATE, the GHC runtime moves the capability to another thread (either fresh or drawn from a pool);
- The first transaction finishes on this unblocked capability;
- MySQL then allows the second UPDATE to go through, and the second transaction finishes as well.
Issue 2: uninitialized thread-local state in libmysqlclient
To quote the docs:
When you call mysql_init(), MySQL creates a thread-specific variable for the thread that is used by the debug library (among other things). If you call a MySQL function before the thread has called mysql_init(), the thread does not have the necessary thread-specific variables in place and you are likely to end up with a core dump sooner or later.
Here’s the definition of the thread-local state data structure, taken from mariadb-10.0.17:
struct st_my_thread_var
{
int thr_errno;
;
mysql_cond_t suspend;
mysql_mutex_t mutex* volatile current_mutex;
mysql_mutex_t * volatile current_cond;
mysql_cond_t ;
pthread_t pthread_self;
my_thread_id idint volatile abort;
;
my_bool initstruct st_my_thread_var *next,**prev;
void *keycache_link;
; /* used by conditional release the queue */
uint lock_typevoid *stack_ends_here;
*mutex_in_use;
safe_mutex_t #ifndef DBUG_OFF
void *dbug;
char name[THREAD_NAME_SIZE+1];
#endif
};
This data structure is used by both server and client code, although
it seems like most of these fields are used by the server, not client
(with the exception of the dbug
thing), which would explain
why Haskellers have gotten away with not playing by the rules so far.
However:
- I am not an expert, and I spent just about 20 minutes grepping the codebase. Am I sure that there’s no code path in the client that accesses this? No.
- Am I going to ignore the above warning and bet the stability of my production system on MySQL/MariaDB devs never making use of this thread-local state? Hell no!
What should we do to obey the rules?
First, make threads which work with MySQL bound, i.e. launch
them with forkOS
instead of forkIO
. Otherwise,
even if an OS thread is initialized, the Haskell thread may be later
scheduled on a different, uninitialized OS thread.
If you create a connection in a thread, use it, and dispose of it,
then using a bound thread should be enough. This is because mysql’s
connect
calls mysql_init
, which in turn calls
mysql_thread_init
.
However, if you are using a thread pool or otherwise sharing a
connection between threads, then connect
may occur on a
different OS thread than a subsequent use. Under this scenario, every
thread needs to call mysql_thread_init
prior to other MySQL
calls.
Issue 3: non-thread-safe calls
The mysql_library_init
function needs to be called prior
to any other MySQL calls. It only needs to be called once per process,
although it is harmless to call it more than once.
It is called implicitly by mysql_init
(which is in turn
called by connect
). However, this function is documented as
not thread-safe. If you connect from two threads simultaneously, bad or
unexpected things can happen.
Also, if you are calling mysql_thread_init
as described
above, it should be called after mysql_library_init
.
This is why it is a good idea to call mysql_library_init
in the very beginning, before you spawn any threads.
Using a connection concurrently
This is not specific to the Haskell bindings, just something to be aware of:
You should not use the same MySQL connection simultaneously from different threads.
First, the docs explicitly warn you about that:
Multiple threads cannot send a query to the MySQL server at the same time on the same connection
(there are some details on this in case you are interested)
Second, the MySQL wire protocol is not designed to multiplex several communication «threads» onto the same TCP connection (unlike, say, AMQP), and trying to do so will probably confuse both the server and the client.
Example
Here is, to the best of my knowledge, a correct example of
concurrently accessing a MySQL database. The example accepts request at
http://localhost/key
and looks up that key in a MySQL
table.
It needs to be compiled against my fork of mysql, which has the following changes compared to 0.1.1.8:
- Unsafe calls are marked as safe (the patch is due to Matthias Hörmann);
mysql_library_init
andmysql_thread_init
are exposed under the namesinitLibrary
andinitThread
.
(How to use a fork that is not on hackage? For example, through a stackage snapshot.)
{-# LANGUAGE OverloadedStrings, RankNTypes #-}
import Network.Wai
import qualified Network.Wai.Handler.Warp as Warp
import Network.HTTP.Types
import qualified Database.MySQL.Base as MySQL
import Database.MySQL.Simple
import Control.Exception (bracket)
import Control.Monad (void)
import Control.Concurrent (forkOS)
import qualified Data.Text.Lazy.Encoding as LT
import Data.Pool (createPool, destroyAllResources, withResource)
import Data.Monoid (mempty)
import GHC.IO (unsafeUnmask)
= do
main
MySQL.initLibrary$ \pool ->
bracket mkPool destroyAllResources 8000 . Warp.setFork forkOSWithUnmask $ Warp.defaultSettings) $
Warp.runSettings (Warp.setPort -> do
\req resp
MySQL.initThread$ \conn ->
withResource pool case pathInfo req of
-> do
[key] <- query conn "SELECT `desc` FROM `test` WHERE `key` = ?"
rs Only key)
(case rs of
Only result : _ -> resp $
responseLBS
ok200"text/plain")]
[(hContentEncoding, LT.encodeUtf8 result)
(-> resp e404
_ -> resp e404
_
where
= createPool (connect defaultConnectInfo) close 1 60 10
mkPool = responseLBS notFound404 [] mempty
e404 forkOSWithUnmask :: ((forall a . IO a -> IO a) -> IO ()) -> IO ()
= void $ forkOS (io unsafeUnmask) forkOSWithUnmask io
The forkWithUnmask
business is only an artifact of the
way warp spawns threads; normally a simple forkOS
would do.
On the other hand, this example shows that in the real world you
sometimes need to make an extra effort to have bound threads. Even warp
got this feature only recently.
Note that this isn’t the most efficient implementation, since it essentially uses OS threads instead of lightweight Haskell threads to serve requests.
On destructors
The *_init
functions allocate memory, so there are
complementary functions, mysql_thread_end
and
mysql_library_end
, which free that library.
However, you probably do not want to call them. Here’s why.
Most multithreaded Haskell programs have a small numbers of OS threads managed by the GHC runtime. These threads are also long-lived. Trying to free the resources associated with those threads won’t give much, and not doing so won’t do any harm.
Furthermore, suppose that you still want to free the resources. When should you do so?
Naively calling mysql_thread_end
after serving a request
would be wrong. It is only the lightweight Haskell thread that is
finishing. The OS thread executing the Haskell thread may be executing
other Haskell threads at the same time. If you suddenly destroy MySQL’s
thread-local state, the effect on other Haskell threads would be the
same as if you didn’t call mysql_thread_init
in the first
place.
And calling mysql_library_end
without
mysql_thread_end
makes MySQL upset when it sees that not
all threads have ended.
References
- GitHub issue bos/mysql#11: Address concurrency
- Leon P Smith: Concurrency And Foreign Functions In The Glasgow Haskell Compiler
- Edward Z. Yang: Safety first: FFI and threading
- Simon Marlow, Simon Peyton Jones, Wolfgang Thaller: Extending the Haskell Foreign Function Interface with Concurrency
- MySQL 5.6 Reference Manual: Writing C API Threaded Client Programs