Named Entities Matching with Nazca

This IPython notebook shows some features of the Python Nazca library:

  • original notebook: here!
  • date: 2014-07-01
  • author: Vincent Michel (vincent.michel@logilab.fr, vm.michel@gmail.com) @HowIMetYourData

Named Entities Matching is the process of recognizing elements in a text and matching them to different types (e.g. Person, Organization, Place). It is related to Record Linkage (linking entities from a text corpus to those from a reference corpus) and to Named Entities Recognition.

In [1]:
from IPython.display import HTML
HTML('<iframe src=http://en.mobile.wikipedia.org/wiki/Named-entity_recognition?useformat=mobile width=700 height=350></iframe>')
Out[1]:

In Nazca, we provide tools to match named entities to an existing corpus or reference database. This may be used to contextualize your data, e.g. by recognizing in it elements from DBpedia.

Token, Sentence and Tokenizer - nazca.utils.tokenizer

A Sentence is:

  • the start index in the text (start)
  • the end index in the text (end)
  • the index of the sentence among all the other sentences (indice)
In [2]:
from nazca.utils.tokenizer import Sentence

sentence = Sentence(indice=0, start=0, end=38)
print sentence
Sentence(indice=0, start=0, end=38)

A Token is:

  • a word or part of a sentence
  • the start index in the text (start)
  • the end index in the text (end)
  • the Sentence which contains the Token
In [3]:
from nazca.utils.tokenizer import Token

token = Token(word='Hello everyone this', start=0, end=20, sentence=Sentence(indice=0, start=0, end=38))
print token
Token(word='Hello everyone this', start=0, end=20, sentence=Sentence(indice=0, start=0, end=38))
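
Sentence and Token are namedtuple-like objects, so their fields can be read directly as attributes; the examples below rely on this (e.g. token.word). A minimal sketch on the token built above:

token = Token(word='Hello everyone this', start=0, end=20,
              sentence=Sentence(indice=0, start=0, end=38))
print token.word, token.start, token.end
# -> Hello everyone this 0 20
print token.sentence
# -> Sentence(indice=0, start=0, end=38)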

Nazca provides a Tokenizer that creates tokens from a text:

class RichStringTokenizer(object):

    def iter_tokens(self, text):
        """ Iterate tokens over a text """

    def find_sentences(self, text):
        """ Find the sentences """

    def load_text(self, text):
        """ Load the text to be tokenized """

    def __iter__(self):
        """ Iterator over the text given in the object instantiation """

The tokenizer may be initialized with a minimum and a maximum size for the tokens. It finds all the sentences of the text using find_sentences and creates tokens in an iterative way.

In [4]:
from nazca.utils.tokenizer import RichStringTokenizer

text = 'Hello everyone, this is   me speaking. And me !Why not me ? Blup'
tokenizer = RichStringTokenizer(text, token_min_size=1, token_max_size=4)
sentences = tokenizer.find_sentences(text)

print sentences
[Sentence(indice=0, start=0, end=38), Sentence(indice=1, start=38, end=47), Sentence(indice=2, start=47, end=59), Sentence(indice=3, start=59, end=64)]
In [5]:
for token in tokenizer:
    print token
Token(word='Hello everyone this is', start=0, end=23, sentence=Sentence(indice=0, start=0, end=38))
Token(word='Hello everyone this', start=0, end=20, sentence=Sentence(indice=0, start=0, end=38))
Token(word='Hello everyone', start=0, end=14, sentence=Sentence(indice=0, start=0, end=38))
Token(word='Hello', start=0, end=5, sentence=Sentence(indice=0, start=0, end=38))
Token(word='everyone this is me', start=6, end=28, sentence=Sentence(indice=0, start=0, end=38))
Token(word='everyone this is', start=6, end=23, sentence=Sentence(indice=0, start=0, end=38))
Token(word='everyone this', start=6, end=20, sentence=Sentence(indice=0, start=0, end=38))
Token(word='everyone', start=6, end=14, sentence=Sentence(indice=0, start=0, end=38))
Token(word='this is me speaking', start=16, end=37, sentence=Sentence(indice=0, start=0, end=38))
Token(word='this is me', start=16, end=28, sentence=Sentence(indice=0, start=0, end=38))
Token(word='this is', start=16, end=23, sentence=Sentence(indice=0, start=0, end=38))
Token(word='this', start=16, end=20, sentence=Sentence(indice=0, start=0, end=38))
Token(word='is me speaking', start=21, end=37, sentence=Sentence(indice=0, start=0, end=38))
Token(word='is me', start=21, end=28, sentence=Sentence(indice=0, start=0, end=38))
Token(word='is', start=21, end=23, sentence=Sentence(indice=0, start=0, end=38))
Token(word='me speaking', start=26, end=37, sentence=Sentence(indice=0, start=0, end=38))
Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38))
Token(word='speaking', start=29, end=37, sentence=Sentence(indice=0, start=0, end=38))
Token(word='And me Why', start=39, end=50, sentence=Sentence(indice=1, start=38, end=47))
Token(word='And me', start=39, end=45, sentence=Sentence(indice=1, start=38, end=47))
Token(word='And', start=39, end=42, sentence=Sentence(indice=1, start=38, end=47))
Token(word='me Why', start=43, end=50, sentence=Sentence(indice=1, start=38, end=47))
Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=47))
Token(word='Why not me', start=47, end=57, sentence=Sentence(indice=2, start=47, end=59))
Token(word='Why not', start=47, end=54, sentence=Sentence(indice=2, start=47, end=59))
Token(word='Why', start=47, end=50, sentence=Sentence(indice=2, start=47, end=59))
Token(word='not me', start=51, end=57, sentence=Sentence(indice=2, start=47, end=59))
Token(word='not', start=51, end=54, sentence=Sentence(indice=2, start=47, end=59))
Token(word='me', start=55, end=57, sentence=Sentence(indice=2, start=47, end=59))
Token(word='Blup', start=60, end=64, sentence=Sentence(indice=3, start=59, end=64))
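
Varying token_min_size and token_max_size changes the n-grams that are produced; as a minimal sketch with the same API, restricting both to 1 should yield only single-word tokens:

tokenizer = RichStringTokenizer(text, token_min_size=1, token_max_size=1)
for token in tokenizer:
    print token.word
# -> Hello, everyone, this, is, me, speaking, And, me, Why, not, me, Blup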

Defining a source of entities - nazca.ner.sources

First, we should define the source of entities that should be retrieved in the tokens yielded by the tokenizer.

A source inherits from AbstractNerSource:

class AbstractNerSource(object):
    """ High-level source for Named Entities Recognition """

    def __init__(self, endpoint, query, name=None, use_cache=True, preprocessors=None):
    """ Initialise the class."""

    def add_preprocessors(self, preprocessor):
        """ Add a preprocessor """

    def recognize_token(self, token):
        """ Recognize a token """

    def query_word(self, word):
        """ Query a word for a Named Entities Recognition process """

It requires an endpoint and a query, and may recognize a word directly using query_word, or a token using recognize_token. In both cases, it returns a list of URIs/identifiers.

Sparql source

In [6]:
from nazca.ner.sources import NerSourceSparql
ner_source = NerSourceSparql('http://dbpedia.org/sparql',
                             '''SELECT distinct ?uri
                                WHERE{?uri rdfs:label "%(word)s"@en}''')
In [7]:
print ner_source.query_word('Victor Hugo')
[u'http://dbpedia.org/resource/Victor_Hugo', u'http://dbpedia.org/resource/Category:Victor_Hugo', u'http://sw.opencyc.org/2008/06/10/concept/en/VictorHugo', u'http://sw.opencyc.org/2008/06/10/concept/Mx4rve1ZXJwpEbGdrcN5Y29ycA', u'http://wikidata.dbpedia.org/resource/Q1459231', u'http://wikidata.dbpedia.org/resource/Q535', u'http://wikidata.dbpedia.org/resource/Q3557368']

Of course, you can use a more complex query reflecting restrictions and business logic.

In [8]:
from nazca.ner.sources import NerSourceSparql
ner_source = NerSourceSparql('http://dbpedia.org/sparql',
                              '''SELECT distinct ?uri
                                 WHERE{?uri rdfs:label "%(word)s"@en .
                                       ?p foaf:primaryTopic ?uri}''')
In [9]:
print ner_source.query_word('Victor Hugo')
[u'http://dbpedia.org/resource/Victor_Hugo']

You may also set use_cache to True to keep all the previous results.
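
For instance, a minimal sketch reusing the source above (the same keyword appears in the full pipeline at the end of this notebook):

ner_source = NerSourceSparql('http://dbpedia.org/sparql',
                             '''SELECT distinct ?uri
                                WHERE{?uri rdfs:label "%(word)s"@en .
                                      ?p foaf:primaryTopic ?uri}''',
                             use_cache=True)
# repeated recognitions of the same word should now be answered
# from the cache instead of hitting the endpoint again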

Rql source

In [10]:
from nazca.ner.sources import NerSourceRql

ner_source = NerSourceRql('http://www.cubicweb.org',
                          'Any U WHERE X cwuri U, X name "%(word)s"')
In [11]:
print ner_source.query_word('apycot')
[u'http://www.cubicweb.org/1310453', u'http://www.cubicweb.org/749162']

Lexicon source

A lexicon source is a source based on a dictionary.

In [12]:
from nazca.ner.sources import NerSourceLexicon

lexicon = {'everyone': 'http://example.com/everyone',
           'me': 'http://example.com/me'}
ner_source = NerSourceLexicon(lexicon)
In [13]:
print ner_source.query_word('me')
['http://example.com/me']
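
A source can also recognize a full Token through recognize_token (see the AbstractNerSource skeleton above); a minimal sketch, which should return the same list of URIs as query_word on the token's word:

token = Token(word='me', start=26, end=28,
              sentence=Sentence(indice=0, start=0, end=38))
print ner_source.recognize_token(token)
# -> ['http://example.com/me']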

Defining preprocessors - nazca.ner.preprocessors

Preprocessors are used to clean up/filter a token before recognizing it in a source. All preprocessors inherit from AbstractNerPreprocessor:

class AbstractNerPreprocessor(object):
    """ Preprocessor """

    def __call__(self, token):
        raise NotImplementedError

If a preprocessor returns None for a token, that token is dropped and not recognized by the source.

In [14]:
import nazca.ner.preprocessors as nnp

NerWordSizeFilterPreprocessor

This preprocessor removes tokens based on the size of the word.

In [15]:
preprocessor = nnp.NerWordSizeFilterPreprocessor(min_size=2, max_size=4)
token = Token('toto', 0, 4, None)
print token, '-->', preprocessor(token)
Token(word='toto', start=0, end=4, sentence=None) --> Token(word='toto', start=0, end=4, sentence=None)
In [16]:
token = Token('t', 0, 4, None)
print token, '-->', preprocessor(token)
Token(word='t', start=0, end=4, sentence=None) --> None
In [17]:
token = Token('tototata', 0, 4, None)
print token, '-->', preprocessor(token)
Token(word='tototata', start=0, end=4, sentence=None) --> None

NerLowerCaseFilterPreprocessor

This preprocessor removes tokens that are in lower case.

In [18]:
preprocessor = nnp.NerLowerCaseFilterPreprocessor()
token = Token('Toto', 0, 4, None)
print token, '-->', preprocessor(token)
Token(word='Toto', start=0, end=4, sentence=None) --> Token(word='Toto', start=0, end=4, sentence=None)
In [19]:
token = Token('toto', 0, 4, None)
print token, '-->', preprocessor(token)
Token(word='toto', start=0, end=4, sentence=None) --> None

NerLowerFirstWordPreprocessor

This preprocessor lowercases the first word of each sentence if it is a stopword.

In [20]:
preprocessor = nnp.NerLowerFirstWordPreprocessor()
sentence = Sentence(0, 0, 20)
token = Token('Toto tata', 0, 4, sentence)
print token.word, '-->', preprocessor(token).word
Toto tata --> Toto tata
In [21]:
token = Token('The tata', 0, 4, sentence)
print token.word, '-->', preprocessor(token).word
The tata --> the tata
In [22]:
token = Token('Tata The', 0, 4, sentence)
print token.word, '-->', preprocessor(token).word
Tata The --> Tata The

NerStopwordsFilterPreprocessor

This preprocessor removes stopwords from the tokens. If split_words is False, it only removes tokens that are themselves a stopword; if split_words is True, it splits each token into words and removes tokens that are entirely composed of stopwords.

In [23]:
preprocessor = nnp.NerStopwordsFilterPreprocessor(split_words=True)
token = Token('Toto', 0, 4, None)
print token.word, '-->', preprocessor(token).word
Toto --> Toto
In [24]:
token = Token('Us there', 0, 4, None)
print token.word, '-->', preprocessor(token)
Us there --> None
In [25]:
preprocessor = nnp.NerStopwordsFilterPreprocessor(split_words=False)
token = Token('Us there', 0, 4, None)
print token.word, '-->', preprocessor(token).word
Us there --> Us there

NerHashTagPreprocessor

This preprocessor cleans up hashtags and @-mentions.

In [26]:
preprocessor = nnp.NerHashTagPreprocessor()
token = Token('@BarackObama', 0, 4, None)
print token.word, '-->', preprocessor(token).word
@BarackObama --> BarackObama
In [27]:
token = Token('@Barack_Obama', 0, 4, None)
print token.word, '-->', preprocessor(token).word
@Barack_Obama --> Barack Obama
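
Preprocessors take effect once attached to a source with add_preprocessors (see the AbstractNerSource skeleton above) or passed to a NerProcess; a minimal sketch, assuming recognize_token applies the source's preprocessors before querying:

from nazca.ner.sources import NerSourceLexicon

source = NerSourceLexicon({'me': 'http://example.com/me',
                           'Toto': 'http://example.com/toto'})
source.add_preprocessors(nnp.NerLowerCaseFilterPreprocessor())
print source.recognize_token(Token('me', 0, 2, None))
# -> [] (lower case token, assumed to be filtered out before the query)
print source.recognize_token(Token('Toto', 0, 4, None))
# -> ['http://example.com/toto']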

Defining the NER process - nazca.ner

The entire NER process may be implemented in a NerProcess:

class NerProcess(object):

    def add_ner_source(self, source):
        """ Add a ner source """

    def add_preprocessors(self, preprocessor):
        """ Add a preprocessor """

    def add_filters(self, filter):
        """ Add a filter """

    def process_text(self, text):
        """ High level function for analyzing a text """

    def recognize_tokens(self, tokens):
        """ Recognize Named Entities from a tokenizer or an iterator yielding tokens. """

    def postprocess(self, named_entities):
        """ Postprocess the results by applying filters """

The main entry point of a NerProcess is the function process_text(), which yields triples (URI, source_name, token). The source_name may be set at source creation, and may be useful when working with different sources.

In [28]:
from nazca.ner import NerProcess

text = 'Hello everyone, this is   me speaking. And me.'
source = NerSourceLexicon({'everyone': 'http://example.com/everyone',
                           'me': 'http://example.com/me'})
ner = NerProcess((source,))
In [29]:
for infos in ner.process_text(text):
    print infos
('http://example.com/everyone', None, Token(word='everyone', start=6, end=14, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))
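
Here the second element of each triple is None because the source was not named; a minimal sketch of a named source, assuming NerSourceLexicon accepts the same name argument as AbstractNerSource.__init__:

named_source = NerSourceLexicon({'me': 'http://example.com/me'}, name='mylexicon')
ner = NerProcess((named_source,))
for infos in ner.process_text(text):
    print infos
# -> the triples should now carry 'mylexicon' instead of None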

Using multiple sources

Different sources may be given to the NER process.

In [30]:
source1 = NerSourceLexicon({'everyone': 'http://example.com/everyone',
                             'me': 'http://example.com/me'})
source2 = NerSourceLexicon({'me': 'http://example2.com/me'})
ner = NerProcess((source1, source2))

for infos in ner.process_text(text):
    print infos
('http://example.com/everyone', None, Token(word='everyone', start=6, end=14, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example2.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))
('http://example2.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))

It is possible to set the unique attribute to keep only the first match found for a token, following the order in which the sources are given.

In [31]:
ner = NerProcess((source1, source2), unique=True)

for infos in ner.process_text(text):
    print infos
('http://example.com/everyone', None, Token(word='everyone', start=6, end=14, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))
In [32]:
ner = NerProcess((source2, source1), unique=True)

for infos in ner.process_text(text):
    print infos
('http://example.com/everyone', None, Token(word='everyone', start=6, end=14, sentence=Sentence(indice=0, start=0, end=38)))
('http://example2.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example2.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))

Pretty printing the output

The output can be pretty printed with hyperlinks.

In [33]:
from nazca.utils.dataio import HTMLPrettyPrint

ner = NerProcess((source1, source2))
named_entities = ner.process_text(text)
html = HTMLPrettyPrint().pprint_text(text, named_entities)
print html
Hello <a href="http://example.com/everyone">everyone</a>, this is   <a href="http://example2.com/me">me</a> speaking. And <a href="http://example2.com/me">me</a>.

Defining the filters - nazca.ner.filters

It is sometimes useful to filter the named entities found by a NerProcess before returning them. This may be done using filters, which inherit from AbstractNerFilter:

class AbstractNerFilter(object):
    """ A filter used for cleaning named entities results """

    def __call__(self, named_entities):
        raise NotImplementedError
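
Since a filter is simply a callable taking the list of (URI, source_name, token) triples and returning a filtered list, a custom filter is easy to write; a minimal sketch (a hypothetical class, not part of Nazca) that keeps only URIs with a given prefix:

class UriPrefixFilter(object):
    """ Hypothetical filter: keep named entities whose URI starts with a prefix """

    def __init__(self, prefix):
        self.prefix = prefix

    def __call__(self, named_entities):
        return [ne for ne in named_entities if ne[0].startswith(self.prefix)]

It could then be passed to a NerProcess with filters=(UriPrefixFilter('http://example.com'),).
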
In [34]:
import nazca.ner.filters as nnf

NerOccurenceFilter

This filter is based on the number of occurrences of named entities in the results.

In [35]:
text = 'Hello everyone, this is   me speaking. And me.'
source1 = NerSourceLexicon({'everyone': 'http://example.com/everyone',
                            'me': 'http://example.com/me'})
source2 = NerSourceLexicon({'me': 'http://example2.com/me'})
_filter = nnf.NerOccurenceFilter(min_occ=2)

ner = NerProcess((source1, source2))
for infos in ner.process_text(text):
    print infos
('http://example.com/everyone', None, Token(word='everyone', start=6, end=14, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example2.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))
('http://example2.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))
In [36]:
ner = NerProcess((source1, source2), filters=(_filter,))
for infos in ner.process_text(text):
    print infos
('http://example.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example2.com/me', None, Token(word='me', start=26, end=28, sentence=Sentence(indice=0, start=0, end=38)))
('http://example.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))
('http://example2.com/me', None, Token(word='me', start=43, end=45, sentence=Sentence(indice=1, start=38, end=46)))

NerDisambiguationWordParts

This filter disambiguates named entities based on word parts, i.e. if a token is included in a larger recognized token, it is replaced by the larger one.

In [37]:
text = 'Hello Toto Tutu. And Toto.'
source = NerSourceLexicon({'Toto Tutu': 'http://example.com/toto_tutu',
                           'Toto': 'http://example.com/toto'})
_filter = nnf.NerDisambiguationWordParts()

ner = NerProcess((source,))
for infos in ner.process_text(text):
    print infos
('http://example.com/toto_tutu', None, Token(word='Toto Tutu', start=6, end=15, sentence=Sentence(indice=0, start=0, end=16)))
('http://example.com/toto', None, Token(word='Toto', start=21, end=25, sentence=Sentence(indice=1, start=16, end=26)))
In [38]:
ner = NerProcess((source,), filters=(_filter,))
for infos in ner.process_text(text):
    print infos
('http://example.com/toto_tutu', None, Token(word='Toto Tutu', start=6, end=15, sentence=Sentence(indice=0, start=0, end=16)))
('http://example.com/toto_tutu', None, Token(word='Toto', start=21, end=25, sentence=Sentence(indice=1, start=16, end=26)))

NerReplacementRulesFilter

This filter allows the definition of replacement rules for named entities.

In [39]:
text = 'Hello toto tutu. And toto.'
source = NerSourceLexicon({'toto tutu': 'http://example.com/toto_tutu',
                           'toto': 'http://example.com/toto'})
rules = {'http://example.com/toto': 'http://example.com/tata'}
_filter = nnf.NerReplacementRulesFilter(rules)

ner = NerProcess((source,))
for infos in ner.process_text(text):
    print infos
('http://example.com/toto_tutu', None, Token(word='toto tutu', start=6, end=15, sentence=Sentence(indice=0, start=0, end=16)))
('http://example.com/toto', None, Token(word='toto', start=21, end=25, sentence=Sentence(indice=1, start=16, end=26)))
In [40]:
ner = NerProcess((source,), filters=(_filter,))
for infos in ner.process_text(text):
    print infos
('http://example.com/toto_tutu', None, Token(word='toto tutu', start=6, end=15, sentence=Sentence(indice=0, start=0, end=16)))
('http://example.com/tata', None, Token(word='toto', start=21, end=25, sentence=Sentence(indice=1, start=16, end=26)))

NerRDFTypeFilter

This filter is based on the RDF types of the objects.

In [41]:
text = 'Hello Victor Hugo and Sony'
source = NerSourceSparql('http://dbpedia.org/sparql',
                         '''SELECT distinct ?uri
                            WHERE{?uri rdfs:label "%(word)s"@en .
                                  ?p foaf:primaryTopic ?uri}''')
_filter = nnf.NerRDFTypeFilter('http://dbpedia.org/sparql',
                               ('http://schema.org/Place',
                                'http://dbpedia.org/ontology/Agent',
                                'http://dbpedia.org/ontology/Place'))

ner = NerProcess((source,))
for infos in ner.process_text(text):
    print infos
(u'http://dbpedia.org/resource/Hello', None, Token(word='Hello', start=0, end=5, sentence=Sentence(indice=0, start=0, end=26)))
(u'http://dbpedia.org/resource/Victor_Hugo', None, Token(word='Victor Hugo', start=6, end=17, sentence=Sentence(indice=0, start=0, end=26)))
(u'http://dbpedia.org/resource/Sony', None, Token(word='Sony', start=22, end=26, sentence=Sentence(indice=0, start=0, end=26)))
In [42]:
ner = NerProcess((source,), filters=(_filter,))
for infos in ner.process_text(text):
    print infos
(u'http://dbpedia.org/resource/Victor_Hugo', None, Token(word='Victor Hugo', start=6, end=17, sentence=Sentence(indice=0, start=0, end=26)))
(u'http://dbpedia.org/resource/Sony', None, Token(word='Sony', start=22, end=26, sentence=Sentence(indice=0, start=0, end=26)))

Putting it all together

Get the data

In [43]:
import feedparser
from BeautifulSoup import BeautifulSoup  

data = feedparser.parse('http://rss.nytimes.com/services/xml/rss/nyt/World.xml')
entries = data.entries

Define the source

In [44]:
from nazca.ner.sources import NerSourceSparql

dbpedia_sparql_source = NerSourceSparql('http://dbpedia.org/sparql',
                                         '''SELECT distinct ?uri
                                            WHERE{
                                            ?uri rdfs:label "%(word)s"@en .
                                            ?p foaf:primaryTopic ?uri}''',
                                         'http://dbpedia.org/sparql',
                                         use_cache=True)
ner_sources = [dbpedia_sparql_source,]

Create the NER process

In [45]:
from nazca.ner.preprocessors import (NerLowerCaseFilterPreprocessor,
                                     NerStopwordsFilterPreprocessor)
from nazca.ner import NerProcess

preprocessors = [NerLowerCaseFilterPreprocessor(),
                 NerStopwordsFilterPreprocessor()]
process = NerProcess(ner_sources, preprocessors=preprocessors)

Process the data

In [46]:
from IPython.display import HTML
from nazca.utils.dataio import HTMLPrettyPrint

pprint_html = HTMLPrettyPrint()

html = []
for ind, entry in enumerate(entries[:20]):
    text = BeautifulSoup(entry['summary_detail']['value']).text
    named_entities = process.process_text(text)
    html.append(pprint_html.pprint_text(text, named_entities))
HTML('<br/><br/>'.join(html))
Out[46]:
A huge throng of people, mostly young, took to Hong Kong’s streets Tuesday, defying Beijing’s dwindling tolerance for challenges to its control.

The post, until now largely ceremonial, could become much more important under Mr. Erdogan, who has held power in Turkey for a decade.

The discovery of the teenagers’ bodies in the West Bank prompted vows of retaliation by Israel, which blamed the Palestinian group Hamas for the killings.

The South African track star’s agent and friend testified that the couple’s relationship was strong and that he did not intend to kill her.

The Japanese prime minister announced that his government would reinterpret the antiwar Constitution to allow the armed forces to come to the aid of friendly nations.

The first clues that led to the grisly discovery of the bodies came only hours after their abduction in the West Bank was reported.

The lawmakers were under pressure to name an inclusive government as insurgents mount a violent challenge north and west of Baghdad.

The only viable political future for the country is federation. But America’s first priority is to see ISIS crushed.

President Petro O. Poroshenko said he would resume full-scale efforts to quash the pro-Russian uprising in eastern Ukraine.

Nicolas Sarkozy, the former French president, has been under scrutiny for possible financial irregularities in his 2007 campaign and for other alleged offenses.

Myanmar is enjoying some new diplomatic clout, leading China to court the country as Beijing presses its territorial claims in the South China Sea.

The last remaining African teams in the World Cup, Algeria and Nigeria, were eliminated on Monday, ensuring that the continent would once again remember the 2014 event for off-the-field squabbles.

As Hong Kong prepared for its annual pro-democracy march Tuesday, a survey of residents found more discontent than ever with the Chinese government’s policies toward the city, especially among the young.

President Petro O. Poroshenko ended a 10-day cease-fire, saying that rebels had not put down their weapons and had persisted in attacking government troops.

At least 22 people were killed in the firefight — all of them assailants, the military said. One soldier was injured.

The giant French bank admitted to transferring billions of dollars on behalf of Sudan and other countries the United States has blacklisted.

The former chief justice of the Constitutional Court was sentenced to life in prison for corruption, the heaviest sentence ever for graft in one of the most corrupt countries in the world.

A former aide to former Prime Minister Petr Necas who later married him was found guilty of abuse of power on Monday in a scandal that exposed their affair and toppled the government a year ago.

The court found that in 1973 an American naval officer provided Chilean officials with information on two Americans, which led to their executions as part of a coup that ousted President Salvador Allende.

Mayor Rob Ford of Toronto returned to his job after undergoing drug and alcohol treatment, saying, “My top priority will be rebuilding trust.”