Wikimedia Developer Support

How to set up a "job queue"?

I have programmed an extension and it works quite well, but I want to automate some tasks via a job queue instead of starting them manually.
I want to
a) execute a task at the end of the month (store some statistical data),
b) execute a task four times a month (update something),
c) execute a task daily (update something else).
Triggered manually, all three tasks work fine, but I have no idea how to set up a job queue to do this.

I don’t think the Job queue allows scheduling jobs to be executed at a specific time/date.

Instead, I’d create a maintenance script for each of those and run them as cron jobs on the server, where you can schedule them as you like.
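A rough sketch of what such a maintenance script could look like — `UpdateMyExtensionStats`, `MyExtensionHooks` and the file paths are hypothetical names; only the `Maintenance` base class and the boilerplate around it are MediaWiki conventions:

```php
<?php
// Sketch of a maintenance script, assuming it lives in the extension's
// maintenance/ directory (adjust the relative path to core as needed).
require_once __DIR__ . '/../../../maintenance/Maintenance.php';

class UpdateMyExtensionStats extends Maintenance {
	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Update statistics for MyExtension' );
	}

	public function execute() {
		// Call the extension's existing update logic here.
		MyExtensionHooks::updateTextlength();
	}
}

$maintClass = UpdateMyExtensionStats::class;
require_once RUN_MAINTENANCE_IF_MAIN;
```

The scheduling itself then lives in the server’s crontab, e.g. (paths are examples):

```
# m h dom mon dow  command
0 3 * * *  php /var/www/wiki/extensions/MyExtension/maintenance/updateMyExtensionStats.php
0 4 1 * *  php /var/www/wiki/extensions/MyExtension/maintenance/storeMonthlyStats.php
```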

Firstly, I do not know how I could implement this solution either.

Secondly, it wasn’t my intention that those who use/install my extension should additionally have to set up cron jobs at the server level. … :frowning:

Even if the job queue does not allow scheduling jobs for a specific time/date, it still makes sense to run the task via the job queue — even if it has to be triggered manually.

The job queue is normally used for running expensive tasks as soon as possible (but outside the current request). For setting up scheduled tasks, you are on your own. You could do something like Drupal’s poormanscron - choose a hook that’s called often (e.g. BeforePageDisplay), do some kind of cheap test of when it was last called (this is trickier than it sounds, probably some combination of cache and DB flags), and if you have passed the scheduled time, push the tasks into the job queue (see the docs). An extension that exposes this functionality to other extensions would be pretty cool (I’m surprised it doesn’t exist yet).
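A rough sketch of that poor man’s cron idea, assuming the BeforePageDisplay hook and using only the object cache as the cheap test (as said, a DB flag would be needed to survive cache resets and races; `MyStatsJob` and the cache key are placeholder names):

```php
<?php
// Poor man's cron: on every page view, cheaply check whether the
// scheduled task is due, and if so, push it into the job queue.
class MyExtensionHooks {
	public static function onBeforePageDisplay( OutputPage $out, Skin $skin ) {
		$cache = ObjectCache::getLocalClusterInstance();
		$key = $cache->makeKey( 'myextension', 'last-stats-run' );
		$lastRun = $cache->get( $key );

		// Run at most once a day (86400 s). The cache is only a first
		// filter; it does not guarantee exactly-once execution.
		if ( $lastRun === false || $lastRun < time() - 86400 ) {
			$cache->set( $key, time(), 86400 );
			JobQueueGroup::singleton()->push(
				new MyStatsJob( Title::newMainPage(), [] )
			);
		}
	}
}
```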

If you want to write a maintenance script and call that from cron, see here.

Setting up scheduled tasks is just a side issue. Mainly I am faced with the problem of setting up a job queue at all, because I don’t understand the structure of the concept and therefore don’t know where to start and what to pay attention to.

So far I haven’t found any useful instructions on the topic.

I have a really expensive task: the correct calculation of text lengths for accounting with text collecting societies. This requires extensive parsing of the raw data. That is precisely why it should be processed via the job queue, regardless of whether it is technically feasible to start it automatically on a schedule.

Thus, scheduled triggering is only a “nice to have” feature at the moment. How to use and set up the job queue in general is the big problem for me.

The documentation suggests a simple way to set up the job queue.

I read the “documentation” first, before I came to this forum, and I didn’t find it as “simple” as you seem to. Saying “read the documentation” and “it is simple” is NOT a helpful answer, indeed.

In the section “Create a Job subclass” it says “class SynchroniseThreadArticleDataJob extends Job”. First of all, I am completely at a loss as to whether I should write “class SynchroniseMyClassJob extends Job” or “class SynchroniseMyClassJob extends SynchroniseThreadArticleDataJob”. Secondly, it bothers me that the example offers a solution for handling a single article. I want to calculate the text lengths of all articles in the wiki, not only those of one specific article.

Right now I have the following code, and the updateTextlength() function is started by clicking a button in the extension’s form:

public static function updateTextlength() {

	$rows = DBConnect::getListForUpdateTextlength();
	// Get the write connection once, not once per row.
	$dbw = DBConnect::getWritingConnect();

	foreach ( $rows as $row ) {
		$page_id = $row->page_id;
		$page_length = self::getTextLength( $page_id );

		// The options parameter of IDatabase::update() is an array.
		$dbw->update(
			'my_page',
			[ 'page_length' => $page_length ],
			[ 'page_id' => $page_id ],
			__METHOD__,
			[ 'IGNORE' ]
		);
	}
}

public static function getTextLength( $page_id ) {

	$title     = Title::newFromID( $page_id );
	$wiki_page = WikiPage::newFromID( $page_id );
	if ( $title === null || $wiki_page === null ) {
		return 0; // page does not exist
	}

	$content = $wiki_page->getContent();
	if ( $content === null ) {
		return 0; // e.g. revision is unavailable
	}

	$user = RequestContext::getMain()->getUser();
	$opt = ParserOptions::newFromUser( $user );

	global $wgParser;
	$parser = $wgParser->getFreshParser();

	$text = $content->getNativeData();
	$text = $parser->preprocess( $text, $title, $opt );
	$out  = $parser->parse( $text, $title, $opt, true, true );

	return strlen( $out->getText() ); // finally!
}

It starts with the fact that the so-called documentation contains the code lines
$limit = $this->params['limit'];
$cascade = $this->params['cascade'];
without even the slightest explanation of what they are about. So the so-called documentation gives me even more questions than answers.

What the hell is “limit” and what the hell is “cascade”?

Am I really forced to create 40,000 job objects (assuming the wiki contains 40,000 articles) and initialize each of them with a “$title”???
Or can I create a job that does the calculations for - say - 500 articles at once?
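What I would like is something along these lines — the chunking itself is plain PHP, and `MyBatchJob` is just a name I made up for a job that carries a list of page IDs in its parameters:

```php
<?php
// Split a list of page IDs into batches; each batch would become one
// job object that processes all of its IDs in run().
function makeBatches( array $pageIds, int $batchSize = 500 ): array {
	return array_chunk( $pageIds, $batchSize );
}

// With 40,000 articles this yields 80 batches instead of 40,000 jobs.
$batches = makeBatches( range( 1, 40000 ) );

// Each batch would then be queued as one job, e.g.:
// JobQueueGroup::singleton()->push(
//     new MyBatchJob( Title::newMainPage(), [ 'page_ids' => $batches[0] ] )
// );
```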

The documentation you mention is not so much helpful as confusing.
I came to this forum precisely because this documentation didn’t help me at all.

Another approach I’m considering is finding a hook that calculates the text length on each change/save of an article and writes the result into my extension’s table.

But this approach, too, involves jobs and the job queue, which so far is all Greek to me.

I tried my very best with the information from the documentation, but I have no idea whether it works like this:

class UpdateStatusJob extends Job {
	/**
	 * @param int $page_id ID of the page to process
	 * @param array $params Job parameters (lives, usleep)
	 */
	function __construct( $page_id, $params ) {
		// Keep the page ID in the params so run() can access it.
		$params['page_id'] = $page_id;
		// 'UpdateStatus' identifies the job type; it must match the key
		// used when registering the class in $wgJobClasses.
		parent::__construct( 'UpdateStatus', Title::newFromID( $page_id ), $params );
		if ( !isset( $this->params['lives'] ) ) {
			$this->params['lives'] = 1;
		}
		if ( !isset( $this->params['usleep'] ) ) {
			$this->params['usleep'] = 0;
		}
	}

	/**
	 * Execute the job
	 *
	 * @return bool
	 */
	public function run() {
		if ( $this->params['usleep'] > 0 ) {
			usleep( $this->params['usleep'] );
		}

		// Do the actual work for this page.
		$page_id = $this->params['page_id'];
		$hit_number  = $this->getHitNumber( $page_id );
		$text_length = $this->getTextLength( $page_id );
		$this->storeStatus( $this->getStatus( $hit_number, $text_length ) );

		if ( $this->params['lives'] > 1 ) {
			// 'limit' and 'cascade' in the documentation's example appear
			// to be just arbitrary job parameters read from $this->params;
			// they are not needed for this job.
			$params = $this->params;
			$params['lives']--;
			$job = new self( $page_id, $params );
			JobQueueGroup::singleton()->push( $job );
		}

		return true;
	}

	private function getHitNumber( $page_id ) {
		$hit_number = 0;	// Get the $hit_number from somewhere
		return $hit_number;
	}

	private function getTextLength( $page_id ) {
		$text_length = 0;	// Get the $text_length from somewhere
		return $text_length;
	}

	/**
	 * @param int $hit_number
	 * @param int $text_length
	 * @return int Text status for the given values
	 */
	private function getStatus( $hit_number, $text_length ) {
		$count_limit_1 = Settings::getInteger( 'count_limit_1' );
		$count_limit_2 = Settings::getInteger( 'count_limit_2' );
		$count_limit_3 = Settings::getInteger( 'count_limit_3' );
		$count_limit_4 = Settings::getInteger( 'count_limit_4' );
		$text_length_1 = Settings::getInteger( 'text_length_1' );
		$text_length_2 = Settings::getInteger( 'text_length_2' );
		$text_length_3 = Settings::getInteger( 'text_length_3' );
		$text_length_4 = Settings::getInteger( 'text_length_4' );

		$status_value = 0;	// Calculate the $status_value out of $hit_number and $text_length
		return $status_value;
	}

	private function storeStatus( $status_value ) {
		// Store the $status_value somewhere
	}
}
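For completeness, this is how I understand registration and pushing are supposed to work (please correct me if I’m wrong — the key 'UpdateStatus' has to match the name passed to the parent constructor above):

```php
// In the extension's setup file or LocalSettings.php (or via the
// "JobClasses" key in extension.json) the job type has to be registered:
$wgJobClasses['UpdateStatus'] = UpdateStatusJob::class;

// Then a job can be pushed, e.g. from the form's button handler:
JobQueueGroup::singleton()->push(
	new UpdateStatusJob( $page_id, [ 'lives' => 1 ] )
);

// Queued jobs are executed by maintenance/runJobs.php or, by default,
// piggybacked on web requests via $wgJobRunRate.
```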