Elena Beretta

Politecnico di Torino

Elena Beretta received her M.Sc. in Cooperation, Development and Innovation in Global Economy from the University of Turin (Department of Economics and Statistics "Cognetti de Martiis") in September 2016; her experimental thesis investigated the diffusion of innovation using agent-based models.

She earned a second-level Master's degree in Data Science for Complex Economic Systems at the Collegio Carlo Alberto in Moncalieri (TO) in June 2017.
From April 2017 to September 2017 she completed an internship at DESPINA, the Laboratory on Big Data Analytics at the Department of Economics and Statistics of the University of Turin, working on the NoVELOG project ("New Cooperative Business Models and Guidance for Sustainable City Logistics").

In November 2017 she began collaborating, as a PhD student and effective member, with the Nexa Center for Internet & Society at the Polytechnic of Turin, working on a project on Data and Algorithms Ethics under the supervision of Professor Juan Carlos De Martin.

Her current PhD research focuses on the field of Fair Machine Learning.

Many decision-support software systems today use artificial intelligence techniques fed with large amounts of citizens’ data to generate recommendations or to make automated decisions. Credit scores, loan-granting decision systems, institution rankings, employment application screeners, and workplace wellness/control programs are just some examples. The decisions generated by these software systems are increasingly relevant to many facets of life: based on the collected data, their algorithms can deny a loan or reject a job application.

Given such relevance and impact on people's lives, a fundamental question is: "What does it mean, in an operational way, for AI software to be ethical?"

A first aspect emerging from this question is the need for a standard by which to measure ethical impact. For example, most problems of biased results arise from issues in the dataset. Sampling theory, and statistical theory more generally, provides several methods for measuring distortion; in statistics the term bias is used with regard to two concepts, the sample and the estimator. What statistical theory does not provide, however, are benchmark metrics for assessing whether an outcome or a sample should be considered ethical.

In addition, most collected data carry no labels or descriptions of their context and acquisition process, nor any measure of their representativeness with respect to the source population or of their quality.

On the basis of these motivations, and building on current research efforts in this direction, in my PhD work I propose a conceptual and operational data labeling framework, the “Ethical and Socially-Aware Labels (EASAL)”. At the current stage of our research, the framework is based on three metrics:
i) disproportion;
ii) collinearity and correlation;
iii) inherent data quality.

Experiments on real datasets will show benefits and limitations of this approach.
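
To give a concrete feel for what a dataset label built on the three metric families above might contain, here is a minimal Python sketch. It is only an illustration under stated assumptions and not the EASAL definitions: the function names, the toy records, and the specific formulas (rarest-to-most-common category ratio for disproportion, Pearson correlation for the second metric, share of non-missing values for data quality) are all hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def disproportion(values):
    """Ratio of the rarest to the most common category count (1.0 = balanced).

    Illustrative stand-in only; not the framework's own definition.
    """
    counts = Counter(v for v in values if v is not None)
    return min(counts.values()) / max(counts.values())


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def completeness(values):
    """Share of non-missing entries, used here as one simple quality indicator."""
    return sum(v is not None for v in values) / len(values)


# Hypothetical toy records for a loan-granting dataset (invented for illustration).
gender = ["F", "M", "M", "M", "F", "M", "M", "M"]
granted = [0, 1, 1, 1, 0, 1, 1, 1]
income = [1200, None, 2500, 3100, None, 2800, 4000, 3600]

# Encode the protected attribute numerically before correlating it with the outcome.
is_male = [int(g == "M") for g in gender]

label = {
    "disproportion(gender)": disproportion(gender),
    "corr(is_male, granted)": pearson(is_male, granted),
    "completeness(income)": completeness(income),
}
print(label)
```

In this toy data the protected attribute is both unevenly represented and perfectly aligned with the outcome, which is the kind of warning a label of this sort is meant to surface before a model is trained on the data.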