|
|
| Author |
Message |
illes.farkas at gmail.com Guest
|
|
|
 |
|
|
 |
Platonides at gmail.com Guest
|
Posted: Tue Nov 18, 2008 9:27 pm Post subject: [Mediawiki-l] page abstracts for Yahoo: produced by humans o |
|
|
Farkas, Illes wrote:
| Quote: | Dear All,
Is the dump file containing the page abstracts for Yahoo produced by
humans or machines?
Thanks
|
It's produced by a machine that extracts the beginning of each article
(the articles themselves are human-created).
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l |
|
|
 |
brion at wikimedia.org Guest
|
Posted: Thu Nov 20, 2008 7:42 pm Post subject: [Mediawiki-l] page abstracts for Yahoo: produced by humans o |
|
|
Platonides wrote:
| Quote: | Farkas, Illes wrote:
| Quote: | Dear All,
Is the dump file containing the page abstracts for Yahoo produced by
humans or machines?
Thanks
|
It's produced by a machine that extracts the beginning of each article
(the articles themselves are human-created).
|
It's a machine attempting to pull the first two sentences of the article
as plaintext, sometimes more successfully than others. :)
I'm not sure these files are actually still being used, though.
You can find the code in:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/ActiveAbstract/
But I think the newer code here to pull the first sentence is more
reliable (requires current MediaWiki with new parser):
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/OpenSearchXml/
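For readers who want a feel for what "pull the first two sentences of the article as plaintext" involves, here is a minimal Python sketch. It is not the ActiveAbstract or OpenSearchXml code; the function names `strip_wiki_markup` and `first_sentences` and the regular expressions are assumptions made up for illustration, and they only handle the simplest wikitext.

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Very rough wikitext-to-plaintext conversion (illustrative only)."""
    # Drop simple, non-nested templates such as {{Infobox}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Turn [[target|label]] into "label" and [[target]] into "target".
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold/italic quote runs ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    # Remove HTML-like tags.
    text = re.sub(r"<[^>]+>", "", text)
    # Collapse whitespace.
    return re.sub(r"\s+", " ", text).strip()


def first_sentences(wikitext: str, count: int = 2) -> str:
    """Return the first `count` sentences of the stripped text."""
    plain = strip_wiki_markup(wikitext)
    # Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", plain)
    return " ".join(sentences[:count])


if __name__ == "__main__":
    sample = ("'''MediaWiki''' is a [[wiki software|wiki engine]]. "
              "It is written in [[PHP]]. It powers Wikipedia.")
    print(first_sentences(sample))
```

The naive sentence splitter breaks on abbreviations such as "e.g." and the template pattern ignores nesting, which gives an idea of why this kind of extraction works "sometimes more successfully than others".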
-- brion
|
|
 |
|