Monday Seminar of Computer Science and Informatics
On Monday, 1 August 2022, at 16:00, a lecture will be held remotely via online tools
as part of the MONDAY SEMINAR OF COMPUTER SCIENCE AND INFORMATICS
of the Departments of Information Sciences and Technologies at UP FAMNIT and UP IAM.
TIME/PLACE: 1 August 2022 at 16:00, online
At this seminar, three master's students of the Data Science programme will present the topics of their master's theses, so the seminar will, exceptionally, last two hours.
LECTURER: Uroš SERGAŠ
Uroš Sergaš was born in Koper in 1994. He completed his Bachelor's degree in Computer Science at FAMNIT, University of Primorska, in 2018. Alongside his work, he is enrolled in the master's study programme in Data Science, also at FAMNIT, University of Primorska. His interests include psychological, sociological and geopolitical topics that lend themselves to analysis with statistical methods and/or machine learning on big data. His master's thesis explores one such topic, and he intends to spend more time researching similar fields in the future.
TITLE: Tribalism and fake news: descriptive and predictive models on how belief influences news trust
Some studies have investigated the perception or the impact of trusting fake news. Other articles describe how divisions in society arise and what their consequences are. However, few studies have looked at the divisiveness of society on social networks and how it manifests itself in trust in (fake) news. Based on existing research, we created a questionnaire that combined demographic questions, questions about trust, the Big Five personality factors, a quiz in which participants were asked to spot the fake news, and questions designed to determine an individual's tribe. We also set up a website that mimicked currently popular social networks. Using this website, we recorded users' actions, which formed an integral part of each individual's participation in the research. There were 138 respondents in total, 69 men and 69 women, mostly from Slovenia and elsewhere in Europe, but also from Asia and North America. The data were cleaned, normalised, factorised and processed. We used various techniques to create new features from the existing data, which supported the next step: setting up various models and obtaining the highest possible prediction accuracy through nested cross-validation. The experiments we carried out show that, based on an individual's behaviour on a social network, it is possible to determine which tribe they belong to and which news stories they will believe. The results also show that exploring social science questions using machine learning has great potential for future work.
LECTURER: Đorđe KLISURA
Đorđe Klisura is a master's student in the Data Science programme at UP FAMNIT. He graduated from UP FAMNIT with a Bachelor's degree in Computer Science. He prepared his master's thesis abroad, spending four months at the University of Turku in Finland.
TITLE: Automated pipeline for binary classification of high-dimensional biomedical data
Machine learning (ML) is rapidly becoming a popular method for diagnosing illnesses in microbiome research. ML models may be used to improve our knowledge of the variance in the structure of existing data and to make predictions about new data. Researchers have employed machine learning models to identify and better understand illnesses including liver cirrhosis, colorectal cancer, and inflammatory bowel disease. Many human illnesses and environmental processes are now understood to be the result of numerous bacterial populations rather than a single organism. Traditional statistical methods are helpful in detecting cases where a single organism is linked to a process. Machine learning approaches, on the other hand, allow for the inclusion of the overall structure of microbial communities and the identification of correlations between community structure and disease status. If communities can be consistently classified, ML approaches may be used to identify the microbial populations within the communities that are essential for the classification. Our study uses two datasets. The first contains XCMS features for the training serum samples: a 4,851-dimensional feature vector of intensities for (mass/charge, retention time) pairs, measured by an Agilent mass spectrometer and processed by the R package XCMS. The second contains 118 samples classified as: i) Early Disseminated Lyme (EDL), ii) Early Localized Lyme (ELL), iii) Healthy Control Non-Endemic (HCN) and iv) Healthy Control Wormser (HCE1). We build an ML workflow whose models predict the outcome of Lyme disease diagnosis with high accuracy.
LECTURER: Aleksandar AVDALOVIĆ
Aleksandar Avdalović is a second-year student of the master's programme in Data Science at UP FAMNIT. He will present the topic of his master's thesis, which he is completing under the supervision of Prof. Mário Alberto Zenha-Rela and Prof. Raul Barbosa from the University of Coimbra and the co-supervision of Prof. Marko Tkalčič from UP FAMNIT.
TITLE: Translation of random forests to loop-free imperative programs for the purpose of formal verification
Throughout the years of society's digitalisation and transformation, the ethics of AI/ML-dependent critical systems have been continually questioned. For such systems, accuracy alone may not be a valid metric, because machine errors can lead to devastating consequences, including the loss of human life. This thesis takes a narrow approach to this problem, namely the formal verification of random forest, one of today's most effective and widespread supervised learning methods. Using the first-order logic statements and the algorithm developed in this study, we intend to determine whether the model has acquired factually grounded prior knowledge. In other words, we wish to confirm that the model is reliable in light of prior knowledge. Translating a random forest into a loop-free imperative program for verification purposes poses a technical challenge for this task. The core of the thesis is the algorithm provided in response to the research question of whether an algorithm for the described translation exists. We shall theoretically prove the algorithm's correctness, compute its efficiency, and demonstrate its application on three datasets (one from cardiology and two from rheumatology). A second, practical research problem examines whether the model has acquired particular prior knowledge in the fields of cardiology and rheumatology given the data. Results demonstrate that such an algorithm can be created and is highly efficient, and that it can be applied realistically in the health sector to validate ML models based on the random forest algorithm.
This seminar will be chaired by Prof. Michaël Mrissa and will be held in English in a Zoom "chat room" at the following address: