Wikimedia Developer Support

[MediaWiki-l] php-fpm worker crashes when editing article


#1

After updating MediaWiki, PHP, and MySQL, I'm fairly consistently getting
php-fpm worker crashes when I try to edit articles on my wiki.

I think I can rule out database corruption. mysql_check says everything is OK,
and even after dropping the database and restoring from a mysqldump backup,
the crashes continue.

I've enabled error reporting & logging in php.ini, php-fpm.conf, and my.cnf.

The MySQL log reveals that the crash, if it happens (it doesn't always) is
always right after a certain MySQL query runs. Meanwhile, php_error.log says
nothing (but is working), and php-fpm.log says nothing of note, just "[pool
www] child 17491 exited on signal 11 (SIGSEGV - core dumped) after 0.000125
seconds from start".

After adjusting some kernel settings and rebuilding PHP with --enable-debug, I
managed to finally get some core dumps with symbols. Using gdb, I saw in one
of them that it looks like pcre could be to blame:


#2

What does total RAM look like, and is there a swap partition?

I'm wondering if Apache and MySQL are battling for RAM

Greg Rundlett
https://eQuality-Tech.com
https://freephile.org


#3

The BBB has 512 MB RAM, not expandable. I run FreeBSD from a 60 GB microSD
card, and I dedicated the internal eMMC drive's original 3.4 GB Linux
partition for swap. Very little of that swap is ever used, though. Every
time I look at 'top' it only shows 5% or so in use, and only by my login
shell, not the daemons.

I try to tune services to keep RAM use down. Typical 'top -tw' output:

last pid: 20543; load averages: 1.04, 0.98, 0.85 up 13+20:25:45 20:17:07
30 processes: 30 sleeping
CPU: 0.8% user, 0.0% nice, 0.0% system, 0.8% interrupt, 98.4% idle
Mem: 108M Active, 139M Inact, 115M Laundry, 112M Wired, 55M Buf, 26M Free
Swap: 1964M Total, 116M Used, 1848M Free, 5% Inuse

  PID USERNAME THR PRI NICE SIZE RES SWAP STATE TIME WCPU COMMAND
17391 mysql 35 40 0 229M 141M 0 select 14:02 0.15% mysqld
2878 root 1 40 0 65M 11M 0 select 1:10 0.08% perl
9513 mike 1 40 0 12M 4884K 0 select 0:51 0.03% sshd
2151 ntpd 1 40 0 8296K 8320K 0 select 4:00 0.02% ntpd
20053 root 1 40 0 9084K 6040K 0 select 0:00 0.02% sendmail
17431 root 1 4 0 100M 14M 0 kqread 0:04 0.01% php-fpm
18455 www 1 8 0 36M 15M 0 nanslp 0:07 0.01% php
  682 unbound 1 40 0 16M 8880K 0 select 6:10 0.00% local-unbound
17441 www 1 4 0 12M 5596K 0 kqread 0:31 0.00% nginx
  706 root 1 40 0 4860K 1776K 0 select 0:21 0.00% syslogd
2880 root 1 40 0 68M 27M 0 select 0:20 0.00% perl
3898 root 1 8 0 4860K 824K 0 nanslp 0:20 0.00% cron
20239 www 1 4 0 104M 36M 0 accept 0:13 0.00% php-fpm
20380 www 1 4 0 104M 31M 0 accept 0:06 0.00% php-fpm
9669 root 1 16 0 6432K 2272K 0 pause 0:05 0.00% tcsh
18375 mike 1 40 0 9572K 5904K 0 select 0:03 0.00% ssh
  583 root 1 40 0 3924K 412K 0 select 0:02 0.00% devd
15699 mysql 1 8 0 5264K 2148K 0 wait 0:01 0.00% sh
3888 root 1 51 0 11M 3968K 0 select 0:01 0.00% sshd
2881 root 1 40 0 65M 10M 0 select 0:00 0.00% perl
9514 mike 1 16 0 6768K 0 2056K pause 0:00 0.00% <tcsh>
9511 root 1 40 0 12M 4728K 0 select 0:00 0.00% sshd
18330 mike 1 16 0 6772K 3736K 0 pause 0:00 0.00% tcsh
9668 mike 1 8 0 5360K 0 1788K wait 0:00 0.00% <su>
18452 root 1 8 0 5360K 2628K 0 wait 0:00 0.00% su
18453 www 1 16 0 5516K 2696K 0 pause 0:00 0.00% csh
20057 smmsp 1 16 0 8660K 5792K 0 pause 0:00 0.00% sendmail
3948 root 1 40 0 4332K 1644K 0 ttyin 0:00 0.00% getty
20412 www 1 4 0 100M 14M 0 accept 0:00 0.00% php-fpm
17440 root 1 16 0 11M 4608K 0 pause 0:00 0.00% nginx

(PID 18455 = tt-rss feed updater; PID 2881 = SpamAssassin)

I watched 'top -tw' while producing the crash and there was no swap use at
all. Even during a successful edit, the top of 'top' is showing RAM's not a
problem:

CPU: 55.6% user, 0.0% nice, 13.1% system, 23.9% interrupt, 7.3% idle
Mem: 147M Active, 108M Inact, 118M Laundry, 111M Wired, 55M Buf, 14M Free
Swap: 1964M Total, 99M Used, 1864M Free, 5% Inuse

Here's 'vmstat -w 1' output before, during, and after a successful edit:

procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr mm0 mm1 in sy cs us sy id
3 0 2 691M 26M 0 0 0 0 0 60 1 0 44 163 97 0 1 99
3 0 2 691M 26M 0 0 0 0 0 60 0 0 39 169 79 0 3 97
3 0 2 691M 26M 0 0 0 0 0 54 0 0 42 155 95 1 1 98
5 0 2 695M 26M 164 0 0 0 0 60 0 0 111 599 238 2 5 93
4 0 2 691M 26M 35 0 0 0 196 60 0 0 52 224 92 0 2 98
3 0 2 691M 26M 0 0 0 0 0 60 0 0 43 174 86 1 1 98
4 0 2 691M 26M 0 0 0 0 0 60 0 0 42 147 92 0 1 99
2 1 2 691M 26M 61 0 0 0 20 60 25 0 2477 2217 3423 65 12 22
2 0 2 693M 25M 114 0 0 0 298 66 50 0 2640 1304 3370 89 8 2
2 2 2 693M 25M 65 0 0 0 120 60 16 0 1995 546 1883 91 9 0
1 0 2 693M 25M 77 0 0 0 61 60 25 0 2480 455 2818 93 7 0
1 0 2 693M 25M 0 0 0 0 0 60 4 0 1399 192 663 98 2 0
1 0 2 693M 25M 54 0 0 0 160 60 21 0 1492 1320 975 89 10 1
2 0 2 693M 25M 155 0 0 0 372 60 55 0 2957 3679 4283 78 20 2
7 3 2 696M 24M 126 0 0 0 2083 66 183 0 5895 1852 10022 51 49 0
5 4 2 696M 24M 107 0 0 0 1408 60 244 0 12751 2486 23605 51 49 0
3 0 2 696M 23M 104 0 0 0 797 60 180 0 8104 2135 14689 64 36 0
2 0 2 696M 20M 369 0 18 0 174 66 91 3 4300 2674 6677 67 33 0
2 0 2 694M 20M 132 0 4 0 817 60 70 0 3308 3420 4868 79 21 0
4 0 2 697M 18M 1193 0 1 0 538 60 19 0 1901 3219 1928 84 16 0
3 0 2 707M 16M 973 0 0 0 710 60 35 0 1914 3476 2104 79 21 0
2 0 2 705M 16M 191 0 0 0 407 60 16 0 1864 3265 1941 89 9 2
2 2 2 707M 16M 34 0 0 0 327 60 49 0 3901 4143 5965 77 23 0
1 0 2 707M 16M 16 0 0 0 216 60 59 0 4025 3740 6352 68 27 5
1 0 2 707M 16M 88 0 0 0 12 66 5 0 1397 4865 809 81 19 0
1 0 2 707M 15M 302 0 0 0 12 60 0 0 1139 6580 148 77 23 0
0 0 2 694M 22M 466 0 0 0 2593 60 0 0 353 1085 153 24 7 69
0 0 2 694M 22M 0 0 0 0 0 60 1 0 78 151 160 1 1 98
0 0 2 694M 22M 6 0 0 0 32 60 7 0 465 278 931 1 4 95
0 0 2 694M 22M 0 0 0 0 56 60 18 0 1232 221 2384 0 7 93
0 0 2 694M 22M 0 0 0 0 0 60 4 0 302 189 621 1 2 97
0 0 2 694M 22M 0 0 0 0 0 60 4 0 301 170 624 0 1 99
0 0 2 694M 22M 0 0 0 0 0 60 0 0 37 173 80 1 1 98
1 0 2 694M 22M 0 0 0 0 0 60 1 0 83 159 175 0 2 98

Here's the same thing when it crashes:

procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr mm0 mm1 in sy cs us sy id
1 0 2 699M 21M 0 0 0 0 0 66 4 0 323 169 628 0 2 98
0 0 2 699M 21M 0 0 0 0 0 60 2 0 72 184 133 0 3 97
2 0 2 699M 21M 6 0 0 0 31 54 3 0 477 1641 258 33 7 60
0 1 2 699M 20M 151 0 0 0 840 60 1109 0 8771 936 16021 50 36 15
1 0 2 699M 16M 582 0 0 0 1898 66 499 0 28483 147 56578 0 43 57
0 1 2 699M 17M 172 0 0 0 2265 60 376 0 28877 171 56985 0 48 52
0 1 2 699M 17M 636 0 0 0 1248 60 441 0 30110 162 59860 1 47 52
0 1 2 699M 17M 612 0 0 0 1224 60 417 0 29243 173 58092 0 41 59
0 1 2 699M 17M 636 0 0 0 1296 66 413 0 30507 147 60642 1 37 63
0 1 2 699M 17M 510 0 0 0 1584 60 386 0 31064 169 61611 0 46 54
0 1 2 699M 17M 480 0 0 0 960 60 314 0 22862 147 45330 1 27 72
0 1 2 699M 17M 514 0 0 0 1456 60 367 0 28493 189 56439 0 47 53
0 1 2 699M 17M 216 0 0 0 2416 60 320 0 32647 160 64335 0 47 53
0 1 2 699M 17M 450 0 0 0 1520 60 411 0 29764 169 58956 1 44 55
0 1 2 699M 17M 600 0 0 0 1200 66 403 0 28716 149 56910 1 45 54
0 1 2 699M 17M 596 0 0 0 1248 60 398 0 29501 169 58548 0 44 56
0 1 2 699M 17M 600 0 0 0 1224 60 391 0 28821 172 57211 1 44 55
0 1 2 699M 17M 276 0 0 0 552 66 179 0 13592 169 26593 0 17 83
0 1 2 699M 17M 132 0 0 0 264 60 89 0 6736 147 12862 0 12 88
0 1 2 699M 17M 432 0 0 0 864 60 280 0 20640 174 40563 0 32 68
0 1 2 699M 17M 204 0 0 0 412 60 151 0 11409 165 21299 0 19 81
0 1 2 699M 17M 504 0 0 0 1032 60 332 0 24438 188 48209 0 33 67
0 1 2 699M 17M 636 0 0 0 1248 60 456 0 30150 147 59932 0 38 62
1 0 2 699M 17M 624 0 0 0 1272 66 547 0 30275 173 60258 0 45 55
0 1 2 699M 17M 624 0 0 0 1248 60 604 0 29725 149 59148 0 36 64
1 0 2 694M 27M 309 0 0 0 4132 60 509 0 28489 276 56415 0 48 52
0 0 2 694M 27M 33 0 0 0 0 60 1 0 46 365 91 0 2 98
0 0 2 694M 27M 0 0 0 0 0 59 2 0 175 169 338 1 2 98
0 0 2 694M 27M 0 0 0 0 0 60 0 0 39 147 79 1 1 98
0 0 2 694M 27M 0 0 0 0 16 60 1 0 170 169 337 1 3 96

I'm no expert in interpreting this but I don't get the impression it's thrashing.

Any thoughts on the pcre clue and the fact that it always dies at right after
that one SQL query?


#4

Is the PCRE version 8+? This will tell you the version (if it's loaded):

php -r 'echo "pcre: " . ( extension_loaded( "pcre" ) ? PCRE_VERSION : "no"
) . "\n";'

I know older versions of PCRE can cause problems.

Greg Rundlett
https://eQuality-Tech.com
https://freephile.org


#5

Yes, it's 8.43, as I mentioned.

Is there an easy way to figure out where in the MediaWiki code the queries
are running? Maybe there will be a clue there.

2019-04-19T20:00:41.166828Z 2501 Connect offset@localhost on OffsetDB using Socket
2019-04-19T20:00:41.168849Z 2501 Init DB offsetdb
2019-04-19T20:00:41.170499Z 2501 Query SET group_concat_max_len = 262144, sql_mode = ''
2019-04-19T20:00:41.181845Z 2501 Query BEGIN /* Wikimedia\Rdbms\Database::query (User::loadFromDatabase) */
2019-04-19T20:00:41.184828Z 2501 Query SELECT /* User::loadFromDatabase */ user_id,user_name,user_real_name,user_email,user_touched,user_token,user_email_authenticated,user_email_token,user_email_token_expires,user_registration,user_editcount FROM `mw_user` WHERE user_id = '2' LIMIT 1
2019-04-19T20:00:41.199361Z 2501 Query SELECT /* UserGroupMembership::getMembershipsForUser */ ug_user,ug_group,ug_expiry FROM `mw_user_groups` WHERE ug_user = '2'
2019-04-19T20:00:41.210905Z 2501 Query SELECT /* User::loadOptions */ up_property,up_value FROM `mw_user_properties` WHERE up_user = '2'
2019-04-19T20:00:41.227391Z 2502 Connect offset@localhost on OffsetDB using Socket
2019-04-19T20:00:41.229184Z 2502 Init DB offsetdb
2019-04-19T20:00:41.230537Z 2502 Query SET group_concat_max_len = 262144, sql_mode = ''
2019-04-19T20:00:41.234579Z 2502 Query SHOW /* Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly */ GLOBAL VARIABLES LIKE 'read_only'
2019-04-19T20:00:41.258097Z 2502 Query SELECT /* SqlBagOStuff::getMulti */ keyname,value,exptime FROM `mw_objectcache` WHERE keyname = 'OffsetDB-mw_:MWSession:pbefr5e4bhrj5ek7lenuqq4vqv1bjg98'
2019-04-19T20:00:41.303730Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'deps' LIMIT 1
2019-04-19T20:00:41.312750Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'list' LIMIT 1
2019-04-19T20:00:41.366572Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'preload' LIMIT 1
2019-04-19T20:00:41.380583Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'preload' LIMIT 1
2019-04-19T20:00:41.388089Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'messages:title-invalid-empty' LIMIT 1
2019-04-19T20:00:41.408978Z 2501 Query SELECT /* Block::newLoad */ ipb_id,ipb_address,ipb_timestamp,ipb_auto,ipb_anon_only,ipb_create_account,ipb_enable_autoblock,ipb_expiry,ipb_deleted,ipb_block_email,ipb_allow_usertalk,ipb_parent_block_id,ipb_reason AS `ipb_reason_text`,NULL AS `ipb_reason_data`,NULL AS `ipb_reason_cid`,ipb_by,ipb_by_text,NULL AS `ipb_by_actor` FROM `mw_ipblocks` WHERE ipb_address = 'Mjb'
2019-04-19T20:00:41.425947Z 2501 Query SELECT /* WikiPage::pageData */ page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM `mw_page` WHERE page_namespace = '2' AND page_title = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software' LIMIT 1
2019-04-19T20:00:41.433949Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'namespaceGenderAliases' LIMIT 1
2019-04-19T20:00:41.442378Z 2501 Query SELECT /* Title::loadRestrictions */ pr_type,pr_expiry,pr_level,pr_cascade FROM `mw_page_restrictions` WHERE pr_page = '1206'
2019-04-19T20:00:41.464948Z 2501 Query SELECT /* MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds */ rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS `rev_comment_text`,NULL AS `rev_comment_data`,NULL AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name FROM `mw_revision` INNER JOIN `mw_page` ON ((page_id = rev_page)) LEFT JOIN `mw_user` ON ((rev_user != 0) AND (user_id = rev_user)) WHERE rev_page = '1206' AND page_id = '1206' AND rev_id = '3940' LIMIT 1
2019-04-19T20:00:41.480834Z 2501 Query SELECT /* MediaWiki\Revision\RevisionStore::loadSlotRecords */ slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model FROM `mw_slots` INNER JOIN `mw_content` ON ((slot_content_id = content_id)) WHERE slot_revision_id = '3940'
2019-04-19T20:00:41.488519Z 2501 Query SELECT /* MediaWiki\Storage\NameTableStore::loadTable */ role_id AS `id`,role_name AS `name` FROM `mw_slot_roles` ORDER BY id
2019-04-19T20:00:41.495793Z 2501 Query SELECT /* MediaWiki\Storage\NameTableStore::loadTable */ model_id AS `id`,model_name AS `name` FROM `mw_content_models` ORDER BY id
2019-04-19T20:00:41.531263Z 2501 Query SELECT /* Title::getCascadeProtectionSources */ pr_page,page_namespace,page_title,pr_expiry,pr_type,pr_level FROM `mw_templatelinks`,`mw_page_restrictions`,`mw_page` WHERE tl_namespace = '2' AND tl_title = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software' AND (tl_from=pr_page) AND pr_cascade = '1' AND (page_id=pr_page)
2019-04-19T20:00:41.546515Z 2501 Query SELECT /* Block::newLoad */ ipb_id,ipb_address,ipb_timestamp,ipb_auto,ipb_anon_only,ipb_create_account,ipb_enable_autoblock,ipb_expiry,ipb_deleted,ipb_block_email,ipb_allow_usertalk,ipb_parent_block_id,ipb_reason AS `ipb_reason_text`,NULL AS `ipb_reason_data`,NULL AS `ipb_reason_cid`,ipb_by,ipb_by_text,NULL AS `ipb_by_actor` FROM `mw_ipblocks` WHERE ipb_address = 'Mjb'
2019-04-19T20:00:41.557840Z 2501 Query SELECT /* MediaWiki\Storage\SqlBlobStore::fetchBlob */ old_text,old_flags FROM `mw_text` WHERE old_id = '3895' LIMIT 1
2019-04-19T20:00:41.574169Z 2501 Query SELECT /* WikiPage::pageData */ page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM `mw_page` WHERE page_namespace = '2' AND page_title = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software' LIMIT 1
2019-04-19T20:00:41.583145Z 2501 Query SELECT /* WikiPage::pageData */ page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM `mw_page` WHERE page_namespace = '2' AND page_title = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software' LIMIT 1
2019-04-19T20:00:41.594411Z 2501 Query SELECT /* MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds */ rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS `rev_comment_text`,NULL AS `rev_comment_data`,NULL AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name FROM `mw_revision` INNER JOIN `mw_page` ON ((page_id = rev_page)) LEFT JOIN `mw_user` ON ((rev_user != 0) AND (user_id = rev_user)) WHERE rev_page = '1206' AND page_id = '1206' AND rev_id = '3940' LIMIT 1
2019-04-19T20:00:41.606978Z 2501 Query SELECT /* MediaWiki\Revision\RevisionStore::loadSlotRecords */ slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model FROM `mw_slots` INNER JOIN `mw_content` ON ((slot_content_id = content_id)) WHERE slot_revision_id = '3940'
2019-04-19T20:00:41.625565Z 2501 Query SELECT /* LCStoreDB::get */ lc_value FROM `mw_l10n_cache` WHERE lc_lang = 'en' AND lc_key = 'magicWords' LIMIT 1

This is the point where it crashed. If it hadn't crashed, the next line would
have looked like this (with different datestamp & PID of course):

2019-04-19T19:56:42.461414Z 2479 Query SELECT /* WikiPage::pageData */ page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM `mw_page` WHERE page_namespace = '2' AND page_title = 'Mjb/FreeBSD_on_BeagleBone_Black/Additional_software' LIMIT 1


#6

sorry, I don't know what the problem is

Greg Rundlett
https://eQuality-Tech.com
https://freephile.org