Identifying semantic characteristics of user interaction datasets through application of a data analysis

Bibliographic details

Title
Identifying semantic characteristics of user interaction datasets through application of a data analysis
Authors
Ricardo César Gonçalves Sant’Ana; Pedro Henrique Santos Bisi; Fernando de Assis Rodrigues
Year of publication
2020
Media type
Preprint
Data source
LISSA
sid-179-col-lissa

Access

This resource is freely available.

author_facet Ricardo César Gonçalves Sant’Ana
Pedro Henrique Santos Bisi
Fernando de Assis Rodrigues
author Ricardo César Gonçalves Sant’Ana
Pedro Henrique Santos Bisi
Fernando de Assis Rodrigues
author_sort ricardo césar gonçalves sant’ana
doi_str_mv 10.31229/OSF.IO/U8ATZ
facet_avail Online
format Preprint
fullrecord blob:ai-179-E0089-16C-906
id ai-179-E0089-16C-906
institution FID-BBI-DE-23
imprint 2020
imprint_str_mv 2020
language English
mega_collection LISSA
match_str santana2020identifyingsemanticcharacteristicsofuserinteractiondatasetsthroughapplicationofadataanalysis
publishDateSort 2020
record_id E0089-16C-906
recordtype ai
record_format ai
source_id 179
title Identifying semantic characteristics of user interaction datasets through application of a data analysis
topic Social and Behavioral Sciences
online social network
datasets
semantics
data analysis
bepress
Cataloging and Metadata
LIS Scholarship Archive
social network
Library and Information Science
url http://dx.doi.org/10.31229/OSF.IO/U8ATZ
http://osf.io/u8atz/
publishDate 2020
physical
description The goal of the study is to identify semantic characteristics of datasets, at the moment of data collection, from the dataset structures found in the data export interfaces of user interaction analysis tools, Internet communication channels, and statistical data access tools involved in the management of a scientific journal, by applying data analysis and data modeling techniques. The research universe was delimited to exportable dataset structures found in journal publishing systems, online social network statistics, search engines, and web analytics tools. The sample analyzed was restricted to dataset structures available in reports from Open Journal Systems (OJS), Google Analytics, Google Search Console, Twitter Analytics, and Facebook Insights. None of these resources exposed a version number, except for OJS (2.6). The data were collected in September 2017 from the accounts of the "Electronic Journal Digital Skills for Family Farming". An exploratory analysis methodology was adopted to identify how data are made available and structured in those resources, through a systematic description of the datasets, entities, and attributes related to the interaction between users and the communication channels of a scientific journal. A total of 255 exportable datasets were found, distributed across 5 file formats: Comma-Separated Values (CSV) (82), Google Docs Spreadsheet File Format (69), Excel Microsoft Office Open XML Format Spreadsheet file (50), Portable Document Format (50), and Excel Binary File Format (3). All file formats other than CSV were discarded, mainly because CSV is a machine-readable, open file format and is available in every data export interface analyzed. In total, 82 CSV datasets were collected, from Google Analytics (50), Google Search Console (20), Open Journal Systems (7), Facebook Insights (3), and Twitter Analytics (2).
To systematize the analysis, concepts from the Entity-Relationship (ER) Model (Silberschatz, Korth, & Sudarshan, 2010) were applied, with entities to store data collected from i) services, ii) resources available in the services, iii) datasets available in the resources, and iv) attributes available in the datasets. Two auxiliary tables were also developed: i) format, to store the file format types available for the datasets, and ii) data type, to store data types: "a named (and in practice finite) set of values" (Date, 2016, p. 228). This ER model provides a structure to store data about the entities and attributes of each dataset. Applying this ER structure to the data collected in this study made it possible to identify 82 entities and 2280 attributes, with a subset of 1342 unique attribute labels. The ER structure and data were stored in a Google Spreadsheet file, which was then uploaded to a Database Management System (DBMS) for further analysis. A Python script was developed to reorder the data stored in the DBMS into a new data structure, adopting the Online Analytical Processing (OLAP) cube as representation, with Service (s), Entity (e), and Attribute (a) as dimensions (Gray, Bosworth, Layman, & Pirahesh, 1996; Inmon, 1996; Kimball & Ross, 2011). The collected data were reordered into the OLAP cube dimensions by a pivot table process (Cornell, 2005). The intent was to observe, at the intersections of the OLAP cube, the characteristics shared internally and externally by services, entities, and attributes that can affect semantic aspects of data collection. The results show that 88.69% of the attributes are not accompanied by any description of their content. In addition, every attribute whose label is shared between distinct services came without a description at collection time.
This subset of attributes is significant for the interoperability of those datasets: such attributes can distinguish the context of the collection process and can belong to a group of potential primary keys or unique fields, helping to build relationships between data from these sources or to determine geographic, temporal, or linguistic context.
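The pivot step described in the abstract can be sketched in plain Python. This is a minimal illustration and not the authors' script: the `AttributeRecord` fields, the function names, and the sample rows are all assumptions made for this example, and the values it produces are unrelated to the study's figures. It pivots attribute records into a label-by-service table (one two-dimensional slice of a Service/Entity/Attribute cube), lists the labels shared by more than one service, and computes the share of attributes that carry no description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeRecord:
    """One attribute (column header) observed in one exported dataset.

    Hypothetical record layout, loosely following the service / dataset /
    attribute entities named in the abstract.
    """
    service: str                       # e.g. "Google Analytics"
    entity: str                        # exported dataset (report) name
    label: str                         # attribute label as exported
    description: Optional[str] = None  # None when the export documents nothing


def pivot_by_label(records):
    """Pivot records into {label: {service: count}}."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record.label][record.service] += 1
    # Convert nested defaultdicts to plain dicts for predictable comparisons.
    return {label: dict(by_service) for label, by_service in table.items()}


def shared_labels(pivot):
    """Return, sorted, the labels that occur in more than one distinct service."""
    return sorted(label for label, by_service in pivot.items() if len(by_service) > 1)


def undescribed_share(records):
    """Return the percentage of records with a missing or empty description."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if not record.description)
    return 100.0 * missing / len(records)


# Invented sample rows, for illustration only (not the study's data).
SAMPLE = [
    AttributeRecord("Google Analytics", "Audience Overview", "Date"),
    AttributeRecord("Google Analytics", "Audience Overview", "Sessions",
                    "Number of sessions"),
    AttributeRecord("Twitter Analytics", "Tweet activity", "Date"),
    AttributeRecord("Facebook Insights", "Page data", "Date"),
    AttributeRecord("Open Journal Systems", "Views report", "Article Title"),
]

if __name__ == "__main__":
    pivot = pivot_by_label(SAMPLE)
    print(shared_labels(pivot))
    print(undescribed_share(SAMPLE))
```

Keeping the pivot as nested dictionaries avoids any third-party dependency; with a real DBMS export the same grouping could equally be done with a SQL `GROUP BY` or a spreadsheet pivot table.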
collection sid-179-col-lissa
format_de105
format_de14
format_de15 Preprint
format_de520
format_de540
format_dech1
format_ded117
format_degla1
format_del152
format_del189
format_dezi4
format_dezwi2
format_finc Preprint
format_nrw
_version_ 1792366285028327438
geogr_code not assigned
last_indexed 2024-03-01T22:54:39.837Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fkatalog.fid-bbi.de%3Agenerator&rft.title=Identifying+semantic+characteristics+of+user+interaction+datasets+through+application+of+a+data+analysis&rft.date=2020-04-01&genre=article&rft_id=info%3Adoi%2F10.31229%2FOSF.IO%2FU8ATZ&atitle=Identifying+semantic+characteristics+of+user+interaction+datasets+through+application+of+a+data+analysis&au=Fernando+de+Assis+Rodrigues&rft.language%5B0%5D=eng