Dynamic Parsoid configuration

parsoid

#1

Every wiki instance that wants to use Parsoid has to have its API endpoint set in config.yml.

I want to use Parsoid in a dynamic wiki family. New wiki instances can be created at any time by an automated script, and each new instance would then have to be added to config.yml. While a script could do this, Parsoid would also need to be restarted, which is probably a problem because it would make the service unavailable for all the other instances.

Are there any recommendations / suggestions on how to deal with this? Would localsettings.js be an option?


#2

See https://phabricator.wikimedia.org/T100841.


#3

Thanks! I was not aware that there is already some discussion on Phabricator. I will check there first in future.
It’s good to know that people at the WMF know about this. But until those patches are merged, do you think something like this would work?

ATTENTION: Pseudo-code, not tested. Just to express an idea.

In localsettings.js:

exports.setup = function(parsoidConfig) {
	// Untested idea: build the API config from the prefix instead of looking it up.
	parsoidConfig.mwApiMap.get = function( prefix ) {
		return {
			uri: 'https://' + prefix + '.wiki.company.local/api.php',
			domain: prefix + '.wiki.company.local',
			prefix: prefix
		};
	};
};

It would probably also be necessary to override parsoidConfig.reverseMwApiMap, wouldn’t it?


#4

For every parse request, Parsoid creates an MWParserEnvironment object, which then looks up the appropriate WikiConfig for the request based on the wiki. These wiki configs are cached.

Anyway, take a look at the MWParserEnvironment.getParserEnv function. It calls a switchToConfig function, which looks up or creates a wiki config object. Creating a wiki config in turn involves contacting the MediaWiki API to fetch siteinfo and other properties and populating the wiki config object. But that only requires the MW API URL.
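
The look-up-or-create flow described above can be sketched roughly as follows. This is only an illustration and not Parsoid's actual code: wikiConfigCache, fetchSiteInfo and getWikiConfig are hypothetical names, the siteinfo request is replaced by a synchronous stub, and the 'sales' prefix is made up to match the domain scheme from post #3.

```javascript
// Hypothetical sketch of a look-up-or-create cache; not Parsoid's real code.
const wikiConfigCache = new Map(); // prefix -> wiki config object
let siteInfoRequests = 0;          // counts how often the stub "API" is hit

// Stub standing in for the siteinfo request to the MediaWiki API.
function fetchSiteInfo(apiUri) {
  siteInfoRequests += 1;
  return { apiUri: apiUri };
}

// Return the cached config for a wiki, creating it on first use.
// Only the API URL is needed to create a new entry.
function getWikiConfig(prefix, apiUri) {
  if (!wikiConfigCache.has(prefix)) {
    wikiConfigCache.set(prefix, {
      prefix: prefix,
      siteInfo: fetchSiteInfo(apiUri)
    });
  }
  return wikiConfigCache.get(prefix);
}

// Two requests for the same (made-up) wiki share one config object.
const first = getWikiConfig('sales', 'https://sales.wiki.company.local/api.php');
const second = getWikiConfig('sales', 'https://sales.wiki.company.local/api.php');
console.log(first === second, siteInfoRequests);
```

The point of the sketch is that a wiki Parsoid has never seen before costs one extra API round trip and nothing else, provided the API URL can be resolved for it.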

The override you provided for ParsoidConfig.mwApiMap.get may do the trick. You will have to try it out to see if that change ensures all the above env machinery kicks in properly for the new wiki. It is hard to know if there is something else missing till you try it.

Hope that helps.


#5

If you look at the patch in https://gerrit.wikimedia.org/r/463979, you will see that we end up calling reverseMwApiMap.has(domain) first. That’s probably the method you’d need to monkey-patch in order to add your domain dynamically to reverseMwApiMap and mwApiMap. And then get() would work fine.
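
A rough, untested-against-Parsoid sketch of that monkey-patch idea, using a stand-in object instead of the real parsoidConfig. The shapes assumed here (mwApiMap keyed by prefix, reverseMwApiMap mapping domain to prefix) and the DOMAIN_PATTERN naming scheme are assumptions for illustration, so check them against the Parsoid version you run.

```javascript
// Stand-in for parsoidConfig; only the two maps named in this thread are modelled.
const config = {
  mwApiMap: new Map(),        // assumed: prefix -> { uri, domain, prefix }
  reverseMwApiMap: new Map()  // assumed: domain -> prefix
};

// Hypothetical naming scheme: <prefix>.wiki.company.local
const DOMAIN_PATTERN = /^([a-z0-9-]+)\.wiki\.company\.local$/;

// Keep the original has() so known domains are still answered from the map.
const originalHas = config.reverseMwApiMap.has.bind(config.reverseMwApiMap);

// Patched has(): register an unknown domain on first sight if it matches the scheme.
config.reverseMwApiMap.has = function (domain) {
  if (originalHas(domain)) {
    return true;
  }
  const match = DOMAIN_PATTERN.exec(domain);
  if (!match) {
    return false; // domains outside the wiki family stay unknown
  }
  const prefix = match[1];
  config.mwApiMap.set(prefix, {
    uri: 'https://' + domain + '/api.php',
    domain: domain,
    prefix: prefix
  });
  config.reverseMwApiMap.set(domain, prefix);
  return true;
};
```

Restricting registration to a fixed pattern matters here: without it, any client-supplied domain would make the service contact an arbitrary host.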

But it would probably be more useful if you tried the patch at 463979 and reported if it Worked For You ™. That would provide incentive to merge it.


#6

Thank you both for your input. @cscott, if I understand patch 463979 correctly I will have to add something like this in localsettings.js:

exports.setup = function(parsoidConfig) {
  parsoidConfig.dynamicConfig = function(domain) {
    parsoidConfig.setMwApi({
      uri: 'https://' + domain + '/api.php',
      domain: domain
    });
  };
};

I will give it a try and report back. Thanks again!


#7

Unfortunately, patch 463979 did not work for me. It looks like exports.setup in localsettings.js is called four times when the service starts, but the parsoidConfig.dynamicConfig callback never gets called. Requests end with the error “Invalid domain <client-provided-domain>” [1].

[1] https://github.com/wikimedia/parsoid/blob/f6c64caaf1becc5b5780c871d495a3d9f5aac7a8/lib/api/routes.js#L92-L95


#8

I’ve found a solution and submitted a patch. This discussion should probably take place on Phabricator, so I posted my findings there: https://phabricator.wikimedia.org/T100841#4648877