"The Law: Journal of Higher School of Economics" has published the article "A Study in Complexity of Sentences Constituting Russian Federation Legal Acts" by Denis Saveliev, an IRL researcher.
Official publication of regulatory acts is not in itself enough to ensure proper law enforcement. What matters is the clarity of legal texts and how accessible they are to understanding. The linguistic and legal quality of a text are interconnected: a text that is well written from a linguistic point of view helps to formulate more clearly the ideas embodied in a legal or judicial act. Linguistic assistance after a draft act has been written is insufficient; recommendations for clear writing need to be taken into account while the legal act is being drafted. The article presents the methodology and results of a study of Russian legislative texts, carried out in order to improve law enforcement and mobilization, to reduce the time spent on the perception of legal norms, and to improve the quality of legal acts.
The study used a corpus of texts from 199 thousand legal acts, segmented into 5.5 million sentences. Using artificial intelligence technologies, the sentences were given morphosyntactic markup identifying parts of speech and their properties. On this basis, metrics of the lexical and syntactic complexity of each sentence were calculated: length, lexical diversity, dependency lengths between words (Dependency Length), word lengths in syllables, etc. The metrics were selected to quantify the complexity of sentences in a legal text, which differs from a literary text. A technique is proposed for automatically finding, without manual labor, the sentences that are among the most difficult to read.
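As an illustration of the kind of per-sentence metrics listed above, here is a minimal Python sketch. It is not the author's implementation. It assumes each sentence has already been parsed into a list of tokens and a parallel list of 1-based head indices (0 for the root, as in CoNLL-U), uses a simple type-token ratio over word forms as a stand-in for lexical diversity, and approximates syllables by counting Russian vowel letters. The names `SentenceMetrics`, `count_syllables` and `sentence_metrics` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: syllables are approximated by counting Russian vowel letters.
RUSSIAN_VOWELS = set("аеёиоуыэюя")


@dataclass
class SentenceMetrics:
    length: int                     # number of tokens
    type_token_ratio: float         # distinct word forms / tokens
    mean_dependency_length: float   # mean |token position - head position|
    mean_syllables_per_word: float  # vowel letters per token


def count_syllables(word: str) -> int:
    """Count vowel letters as a rough proxy for syllables."""
    return sum(1 for ch in word.lower() if ch in RUSSIAN_VOWELS)


def sentence_metrics(tokens: list[str], heads: list[int]) -> SentenceMetrics:
    """Compute simple complexity metrics for one parsed sentence.

    heads[i] is the 1-based position of the head of tokens[i];
    0 marks the root, which has no dependency arc of its own.
    """
    if len(tokens) != len(heads):
        raise ValueError("tokens and heads must have the same length")
    n = len(tokens)
    if n == 0:
        return SentenceMetrics(0, 0.0, 0.0, 0.0)

    ttr = len({t.lower() for t in tokens}) / n
    distances = [abs(pos - head)
                 for pos, head in enumerate(heads, start=1) if head != 0]
    mean_dep = sum(distances) / len(distances) if distances else 0.0
    mean_syl = sum(count_syllables(t) for t in tokens) / n
    return SentenceMetrics(n, ttr, mean_dep, mean_syl)


# Hypothetical parsed sentence: "Суд отменил решение суда"
example = sentence_metrics(["Суд", "отменил", "решение", "суда"], [2, 0, 2, 3])
print(example)
```

A real pipeline would take tokens and heads from a morphosyntactic parser and would probably compute lexical diversity over lemmas rather than word forms; both choices here are simplifications.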
On the basis of this work, a dataset of poorly readable sentences from legal acts was created and made publicly available. It consists of a wider selection (sentences that are too long) and a narrower one (sentences that are worse than the majority on three metrics at the same time). This corpus is analyzed statistically to identify the authorities that write in a more difficult way and the subject areas of documents that contain more complexly written sentences. It is shown that the number of long sentences in legislation has increased significantly (5 times) in comparison with the first years of modern Russian statehood. Half of the sentences from acts of the Constitutional Court of the Russian Federation consist of more than 40 tokens. Using the NPMI method, the most frequently occurring phrases and the phrases that characterize the subject of a text are selected from the corpus. The published corpus may become a basis for more detailed work on improving the legal technique and content of legal and judicial acts.
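For readers unfamiliar with NPMI (normalized pointwise mutual information), the sketch below scores adjacent word pairs as NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)), which ranges from -1 to 1. It is a minimal illustration, not the procedure used in the article: restricting phrases to bigrams, estimating probabilities from bigram counts, the `min_count` cutoff, and the function name `npmi_bigrams` are all assumptions made here.

```python
import math
from collections import Counter


def npmi_bigrams(sentences: list[list[str]],
                 min_count: int = 1) -> dict[tuple[str, str], float]:
    """Return the NPMI score of each adjacent word pair seen >= min_count times.

    p(x, y) is the share of all bigrams equal to (x, y); p(x) and p(y) are
    the shares of bigrams with x in first and y in second position.
    """
    bigrams = Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))

    total = sum(bigrams.values())
    left = Counter()   # how often each word opens a bigram
    right = Counter()  # how often each word closes a bigram
    for (x, y), count in bigrams.items():
        left[x] += count
        right[y] += count

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / total
        if p_xy >= 1.0:
            # Only one distinct bigram in the data: -ln p(x, y) would be 0,
            # so use the limiting value for perfect co-occurrence.
            scores[(x, y)] = 1.0
            continue
        p_x = left[x] / total
        p_y = right[y] / total
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(x, y)] = pmi / -math.log(p_xy)
    return scores


# Hypothetical toy corpus of tokenized sentences.
toy_corpus = [
    ["федеральный", "закон", "вступает", "в", "силу"],
    ["федеральный", "закон", "утратил", "силу"],
    ["суд", "вступает", "в", "дело"],
]
ranked = sorted(npmi_bigrams(toy_corpus).items(),
                key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Pairs whose words almost always appear together score close to 1, while pairs that co-occur about as often as chance would predict score near 0. On a real corpus a higher `min_count` helps to suppress rare pairs that score highly by accident.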
See full text (in Russian): pdf.