Do boys and girls write the same? Analysis of n-grams of morphological categories (¿Niños y niñas escriben igual? Análisis de n-gramas de categorías morfológicas)
Culture and Education (IF 2.226), Pub Date: 2022-11-08, DOI: 10.1080/11356405.2022.2121130
Sheila Queralt, Jordi Cicres

ABSTRACT

The objective of this study is to characterize, by means of syntactic patterns, texts written in Catalan by primary-school boys and girls (aged seven to 12). The corpus contains 169 texts divided by sex (76 by boys and 93 by girls), with an average length of 200 words and a total of 33,763 words. From this corpus, we calculated the 40 most frequent n-grams (bigrams and trigrams) of morphological categories. The data were analysed statistically using ANOVA and Linear Discriminant Analysis; in a cross-validation experiment using both bigrams and trigrams, the writer's gender was predicted with 60.4% accuracy. When the children's age was taken into account, accuracy exceeded 70% in both the original classification and the cross-validation. Identifying the most discriminating bigrams and trigrams allowed us to determine that girls show greater expressive capacity, superior syntactic maturity, and greater lexical and syntactic richness.
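
As a rough illustration of the feature-extraction step described in the abstract, the sketch below counts n-grams over sequences of morphological category tags and converts them into per-text relative frequencies. It is not the authors' code: the function names, the tag labels (`DET`, `NOUN`, and so on), and the two toy texts are all assumptions made for the example, and the abstract does not say which tagger or tagset was used.

```python
from collections import Counter


def tag_ngrams(tags, n):
    """Return all contiguous n-grams from a sequence of category tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def top_ngrams(documents, n, k):
    """Return the k most frequent tag n-grams across all documents."""
    counts = Counter()
    for tags in documents:
        counts.update(tag_ngrams(tags, n))
    return [gram for gram, _ in counts.most_common(k)]


def relative_frequencies(tags, vocabulary, n):
    """Return each vocabulary n-gram's share of the n-grams in one text."""
    grams = tag_ngrams(tags, n)
    counts = Counter(grams)
    total = len(grams) or 1  # avoid division by zero for very short texts
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical tag sequences standing in for two tagged texts.
docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "ADV"],
]

# Keep the 3 most frequent bigrams (the study reports using the top 40).
vocab = top_ngrams(docs, n=2, k=3)

# One feature vector per text, aligned with the order of `vocab`.
features = [relative_frequencies(doc, vocab, n=2) for doc in docs]
```

Feature vectors of this kind, one per text, are what a classifier such as Linear Discriminant Analysis would take as input, with the writer's sex as the label.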




Updated: 2022-11-08