Identifying semantic characteristics of user interaction datasets through application of a data analysis

Title: Identifying semantic characteristics of user interaction datasets through application of a data analysis
Authors: Ricardo César Gonçalves Sant’Ana; Pedro Henrique Santos Bisi; Fernando de Assis Rodrigues
Year of publication: 2020
Media type: Preprint
Data source: LISSA
sid-179-col-lissa

Access

This resource is freely available.

Web links

author_facet	Ricardo César Gonçalves Sant’Ana, Pedro Henrique Santos Bisi, Fernando de Assis Rodrigues
author	Ricardo César Gonçalves Sant’Ana, Pedro Henrique Santos Bisi, Fernando de Assis Rodrigues
author_sort	ricardo césar gonçalves sant’ana
doi_str_mv	10.31229/OSF.IO/U8ATZ
facet_avail	Online
format	Preprint
fullrecord	blob:ai-179-E0089-16C-906
id	ai-179-E0089-16C-906
institution	FID-BBI-DE-23
imprint	2020
imprint_str_mv	2020
language	English
mega_collection	LISSA
match_str	santana2020identifyingsemanticcharacteristicsofuserinteractiondatasetsthroughapplicationofadataanalysis
publishDateSort	2020
record_id	E0089-16C-906
recordtype	ai
record_format	ai
source_id	179
title	Identifying semantic characteristics of user interaction datasets through application of a data analysis
title_unstemmed	Identifying semantic characteristics of user interaction datasets through application of a data analysis
title_full	Identifying semantic characteristics of user interaction datasets through application of a data analysis
title_fullStr	Identifying semantic characteristics of user interaction datasets through application of a data analysis
title_full_unstemmed	Identifying semantic characteristics of user interaction datasets through application of a data analysis
title_short	Identifying semantic characteristics of user interaction datasets through application of a data analysis
title_sort	identifying semantic characteristics of user interaction datasets through application of a data analysis
topic	Social and Behavioral Sciences, online social network, datasets, semantics, data analysis, bepress, Cataloging and Metadata, LIS Scholarship Archive, social network, Library and Information Science
url	http://dx.doi.org/10.31229/OSF.IO/U8ATZ, http://osf.io/u8atz/
publishDate	2020
physical
description	The goal of the study is to identify semantic characteristics of datasets at the moment of data collection, based on the dataset structures found in the data export interfaces of user interaction analysis tools, Internet communication channels, and statistical data access tools involved in managing a scientific journal, through the application of data analysis and data modeling techniques. The research universe was delimited to exportable dataset structures found in journal publishing systems, online social network statistics, search engines, and web analytics tools. The sample analyzed was restricted to dataset structures available in reports from Open Journal Systems (OJS), Google Analytics, Google Search Console, Twitter Analytics, and Facebook Insights. None of these resources presented version numbering, except for OJS (2.6). The data were collected in September 2017 from the accounts of the "Electronic Journal Digital Skills for Family Farming". An exploratory analysis methodology was adopted to identify how data are made available and structured in those resources, involving a systematic description of the datasets, entities, and attributes related to the interaction between users and the communication channels of a scientific journal. A total of 255 exportable datasets were found, distributed across 5 file formats: Comma-Separated Values (CSV) (82), Google Docs Spreadsheet File Format (69), Excel Microsoft Office Open XML Format Spreadsheet file (50), Portable Document Format (50), and Excel Binary File Format (3). All file formats other than CSV were discarded, mainly because CSV is a machine-readable, open file format and was available in every data export interface analyzed. The 82 CSV datasets were collected from Google Analytics (50), Google Search Console (20), Open Journal Systems (7), Facebook Insights (3), and Twitter Analytics (2).
To systematize the analysis, concepts from the Entity-Relationship (ER) Model (Silberschatz, Korth, & Sudarshan, 2010) were applied, with entities to store data collected from i) services, ii) resources available in the services, iii) datasets available in the resources, and iv) attributes available in the datasets. Two auxiliary tables were also developed: i) format, to store the file format types available for the datasets, and ii) data type, to store data types: "a named (and in practice finite) set of values" (Date, 2016, p. 228). This applied ER Model provides a structure to store data about the entities and attributes of each dataset. Applying this ER structure to the data collected in this study made it possible to identify 82 entities and 2280 attributes, with a subset of 1342 unique attribute labels. The ER structure and data were stored in a Google Spreadsheet file. The file was then uploaded to a Database Management System (DBMS) for further data analysis. A Python script was developed to reorder the data stored in the DBMS into a new data structure, adopting the Online Analytical Processing (OLAP) cube as representation, with Service (s), Entity (e), and Attribute (a) data used as dimensions (Gray, Bosworth, Layman, & Pirahesh, 1996; Inmon, 1996; Kimball & Ross, 2011). The collected data were reordered into the OLAP cube dimensions by a pivot table process (Cornell, 2005). The intention was to observe, at the intersections of the OLAP cube, the characteristics shared internally and externally by services, entities, and attributes that can affect semantic aspects of data collection. The results show that 88.69% of attributes are not accompanied by any description of their content. In addition, all attributes that share equal labels across distinct services came without a description at collection time.
This subset of attributes is significantly important for the interoperability of those datasets: it can distinguish the context of the collection process and can be part of a group of potential primary keys or unique fields, helping to build relationships between data from these sources, or to determine geographic, temporal, or linguistic context.
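The pivot step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the row layout, the function names (`build_cube`, `undescribed_share`, `labels_shared_across_services`), and the sample labels and descriptions are assumptions made for illustration only. It shows one plausible way to arrange Service, Entity, and Attribute as dimensions, to measure the share of attributes without a description, and to list labels shared between distinct services.

```python
from collections import defaultdict

# Hypothetical sample rows: (service, entity, attribute_label, description).
# The entity names, labels, and descriptions are invented for illustration;
# they are not taken from the study's data.
ROWS = [
    ("Google Analytics", "audience_overview", "Date", None),
    ("Google Analytics", "audience_overview", "Sessions", "Total number of sessions"),
    ("Twitter Analytics", "tweet_activity", "Date", None),
    ("Twitter Analytics", "tweet_activity", "Impressions", None),
    ("Facebook Insights", "page_data", "Date", None),
]


def build_cube(rows):
    """Pivot flat rows into a nested mapping: service -> entity -> attribute -> description."""
    cube = defaultdict(lambda: defaultdict(dict))
    for service, entity, attribute, description in rows:
        cube[service][entity][attribute] = description
    return cube


def undescribed_share(rows):
    """Return the percentage of attribute rows that carry no description."""
    missing = sum(1 for *_, description in rows if not description)
    return 100.0 * missing / len(rows)


def labels_shared_across_services(rows):
    """Return attribute labels that occur in more than one distinct service."""
    services_by_label = defaultdict(set)
    for service, _entity, attribute, _description in rows:
        services_by_label[attribute].add(service)
    return {
        label: sorted(services)
        for label, services in services_by_label.items()
        if len(services) > 1
    }
```

With the hypothetical rows above, `labels_shared_across_services(ROWS)` reports only the `Date` label, which appears in all three sample services and has no description in any of them. This mirrors, on toy data, the kind of observation the abstract reports for labels shared between services.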
collection	sid-179-col-lissa
format_de105
format_de14
format_de15	Preprint
format_de520
format_de540
format_dech1
format_ded117
format_degla1
format_del152
format_del189
format_dezi4
format_dezwi2
format_finc	Preprint
format_nrw
_version_	1792366285028327438
geogr_code	not assigned
last_indexed	2024-03-01T22:54:39.837Z
geogr_code_person	not assigned
openURL	url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fkatalog.fid-bbi.de%3Agenerator&rft.title=Identifying+semantic+characteristics+of+user+interaction+datasets+through+application+of+a+data+analysis&rft.date=2020-04-01&genre=article&rft_id=info%3Adoi%2F10.31229%2FOSF.IO%2FU8ATZ&atitle=Identifying+semantic+characteristics+of+user+interaction+datasets+through+application+of+a+data+analysis&au=Fernando+de+Assis+Rodrigues&rft.language%5B0%5D=eng