Identifying semantic characteristics of user interaction datasets through application of a data analysis
Bibliographic details
- Title
- Identifying semantic characteristics of user interaction datasets through application of a data analysis
- Authors
- Ricardo César Gonçalves Sant’Ana; Pedro Henrique Santos Bisi; Fernando de Assis Rodrigues
- Year of publication
- 2020
- Media type
- Preprint
- Data source
- LISSA
Access
This resource is freely available.
Internal format
author_facet |
Ricardo César Gonçalves Sant’Ana, Pedro Henrique Santos Bisi, Fernando de Assis Rodrigues |
---|---|
author |
Ricardo César Gonçalves Sant’Ana, Pedro Henrique Santos Bisi, Fernando de Assis Rodrigues |
author_sort |
ricardo césar gonçalves sant’ana |
doi_str_mv |
10.31229/OSF.IO/U8ATZ |
facet_avail |
Online |
format |
Preprint |
fullrecord |
blob:ai-179-E0089-16C-906 |
id |
ai-179-E0089-16C-906 |
institution |
FID-BBI-DE-23 |
imprint |
2020 |
imprint_str_mv |
2020 |
language |
English |
mega_collection |
LISSA |
match_str |
santana2020identifyingsemanticcharacteristicsofuserinteractiondatasetsthroughapplicationofadataanalysis |
publishDateSort |
2020 |
record_id |
E0089-16C-906 |
recordtype |
ai |
record_format |
ai |
source_id |
179 |
title |
Identifying semantic characteristics of user interaction datasets through application of a data analysis |
topic |
Social and Behavioral Sciences, online social network, datasets, semantics, data analysis, bepress, Cataloging and Metadata, LIS Scholarship Archive, social network, Library and Information Science |
url |
http://dx.doi.org/10.31229/OSF.IO/U8ATZ http://osf.io/u8atz/ |
publishDate |
2020 |
physical |
|
description |
The goal of this study is to identify semantic characteristics of datasets at the moment of data collection, based on the dataset structures found in the data export interfaces of user interaction analysis tools, Internet communication channels, and statistical data access tools involved in the management of a scientific journal, through the application of data analysis and data modeling techniques. The research universe was limited to exportable dataset structures found in journal publishing systems, online social network statistics, search engines, and web analytics tools. The sample was restricted to the dataset structures available in reports from Open Journal Systems (OJS), Google Analytics, Google Search Console, Twitter Analytics, and Facebook Insights. None of these resources displayed a version number, except OJS (2.6). The data were collected in September 2017 from the accounts of the "Electronic Journal Digital Skills for Family Farming". An exploratory analysis methodology was adopted to identify how data are made available and structured in these resources, through a systematic description of the datasets, entities, and attributes related to the interaction between users and the communication channels of a scientific journal. A total of 255 exportable datasets were found, distributed across 5 file formats: Comma-Separated Values (CSV) (82), Google Docs Spreadsheet File Format (69), Excel Microsoft Office Open XML Format Spreadsheet file (50), Portable Document Format (50), and Excel Binary File Format (3). All formats except CSV were discarded, mainly because CSV is an open, machine-readable format and was available in every export interface analyzed. The 82 CSV datasets came from Google Analytics (50), Google Search Console (20), Open Journal Systems (7), Facebook Insights (3), and Twitter Analytics (2).
To systematize the analysis, concepts from the Entity-Relationship (ER) model (Silberschatz, Korth, & Sudarshan, 2010) were applied, with entities storing data about i) services, ii) the resources available in each service, iii) the datasets available in each resource, and iv) the attributes available in each dataset. Two auxiliary tables were also created: i) format, to store the file format of each dataset, and ii) data type, to store data types, defined as "a named (and in practice finite) set of values" (Date, 2016, p. 228). This ER model provides a structure for storing the entities and attributes of each dataset. Applying it to the data collected in this study identified 82 entities and 2280 attributes, with a subset of 1342 unique attribute labels. The ER structure and data were stored in a Google Spreadsheet file, which was then uploaded to a Database Management System (DBMS) for further analysis. A Python script was developed to reorganize the data stored in the DBMS into a new data structure, adopting the Online Analytical Processing (OLAP) cube as the representation, with Service (s), Entity (e), and Attribute (a) as dimensions (Gray, Bosworth, Layman, & Pirahesh, 1996; Inmon, 1996; Kimball & Ross, 2011). The collected data were reorganized into the OLAP cube dimensions through a pivot table process (Cornell, 2005). The aim was to observe, at the intersections of the OLAP cube, the characteristics shared within and across services, entities, and attributes that can affect semantic aspects of data collection.
The results show that 88.69% of the attributes have no associated description of their content. Moreover, none of the attributes that share identical labels across distinct services came with a description at collection time. This subset of attributes is particularly important for the interoperability of these datasets: such attributes can help distinguish the context of the collection process and may belong to a group of potential primary keys or unique fields, helping to build relationships between data from these sources or to support geographic, temporal, or linguistic determination. |
collection |
sid-179-col-lissa |
format_de105 |
|
format_de14 |
|
format_de15 |
Preprint |
format_de520 |
|
format_de540 |
|
format_dech1 |
|
format_ded117 |
|
format_degla1 |
|
format_del152 |
|
format_del189 |
|
format_dezi4 |
|
format_dezwi2 |
|
format_finc |
Preprint |
format_nrw |
|
_version_ |
1792366285028327438 |
geogr_code |
not assigned |
last_indexed |
2024-03-01T22:54:39.837Z |
geogr_code_person |
not assigned |
openURL |
url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fkatalog.fid-bbi.de%3Agenerator&rft.title=Identifying+semantic+characteristics+of+user+interaction+datasets+through+application+of+a+data+analysis&rft.date=2020-04-01&genre=article&rft_id=info%3Adoi%2F10.31229%2FOSF.IO%2FU8ATZ&atitle=Identifying+semantic+characteristics+of+user+interaction+datasets+through+application+of+a+data+analysis&au=Fernando+de+Assis+Rodrigues&rft.language%5B0%5D=eng |
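The description field above outlines an Entity-Relationship structure with entities for services, resources, datasets, and attributes, plus auxiliary format and data type tables. The sketch below shows one way such a schema could look, using Python's built-in sqlite3 module as a stand-in for the DBMS (the record does not name the one actually used). It is not the authors' schema: all table and column names, the nullable description column, and the sample rows are hypothetical, chosen only to show how "total attributes" and "unique attribute labels" fall out of the structure.

```python
import sqlite3

# Hypothetical schema following the ER structure named in the abstract:
# service -> resource -> dataset -> attribute, plus the auxiliary tables
# file_format and data_type. All names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE service     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE resource    (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          service_id INTEGER NOT NULL REFERENCES service(id));
CREATE TABLE file_format (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE data_type   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dataset     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          resource_id INTEGER NOT NULL REFERENCES resource(id),
                          format_id INTEGER NOT NULL REFERENCES file_format(id));
CREATE TABLE attribute   (id INTEGER PRIMARY KEY, label TEXT NOT NULL,
                          description TEXT,  -- NULL when the export gives none
                          dataset_id INTEGER NOT NULL REFERENCES dataset(id),
                          data_type_id INTEGER REFERENCES data_type(id));
"""


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the ER relationships
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO service VALUES (?, ?)",
                     [(1, "Google Analytics"), (2, "Open Journal Systems")])
    conn.executemany("INSERT INTO resource VALUES (?, ?, ?)",
                     [(1, "Audience report", 1), (2, "Article report", 2)])
    conn.execute("INSERT INTO file_format VALUES (1, 'CSV')")
    conn.executemany("INSERT INTO data_type VALUES (?, ?)",
                     [(1, "text"), (2, "integer")])
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?)",
                     [(1, "audience.csv", 1, 1), (2, "articles.csv", 2, 1)])
    conn.executemany("INSERT INTO attribute VALUES (?, ?, ?, ?, ?)",
                     [(1, "Date", None, 1, 1),
                      (2, "Sessions", "Number of sessions", 1, 2),
                      (3, "Date", None, 2, 1),
                      (4, "Views", None, 2, 2)])
    return conn


def attribute_counts(conn: sqlite3.Connection) -> tuple:
    """Return (total attribute rows, number of distinct attribute labels)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT label) FROM attribute").fetchone()


print(attribute_counts(build_catalog()))  # (4, 3)
```

The gap between the two counts is the same kind of distinction the abstract draws between 2280 attributes and 1342 unique labels: the same label (here, "Date") can occur in several datasets.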
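The description field above also mentions reorganizing the data into an OLAP cube with Service (s), Entity (e), and Attribute (a) as dimensions through a pivot table, then checking how many attributes lack a description and which labels recur across services. The sketch below illustrates that idea under stated assumptions: a plain dictionary keyed by (s, e, a) stands in for a real OLAP engine, the rows are invented, and this is not the Python script referred to in the abstract.

```python
from collections import defaultdict

# Made-up rows: (service, entity, attribute label, description or None).
# In the study, entities correspond to exported CSV datasets; these values
# are illustrative only.
ROWS = [
    ("Google Analytics", "audience.csv", "Date", None),
    ("Google Analytics", "audience.csv", "Sessions", "Number of sessions"),
    ("Google Search Console", "queries.csv", "Date", None),
    ("Google Search Console", "queries.csv", "Clicks", None),
    ("Open Journal Systems", "articles.csv", "Views", None),
]


def build_cube(rows):
    """Sparse cube: each (service, entity, attribute) cell maps to its description."""
    return {(s, e, a): d for s, e, a, d in rows}


def pivot_labels_by_service(cube):
    """Roll up over entities: one row per label, one column per service, cell = count."""
    table = defaultdict(lambda: defaultdict(int))
    for service, _entity, label in cube:
        table[label][service] += 1
    return {label: dict(cols) for label, cols in table.items()}


def undescribed_share(cube):
    """Fraction of attribute cells that carry no content description."""
    missing = sum(1 for d in cube.values() if not d)
    return missing / len(cube)


def shared_labels(pivot):
    """Labels that appear under more than one distinct service."""
    return sorted(label for label, cols in pivot.items() if len(cols) > 1)


def all_undescribed(cube, labels):
    """True if every cell whose label is in `labels` lacks a description."""
    return all(not d for (_s, _e, a), d in cube.items() if a in labels)


cube = build_cube(ROWS)
pivot = pivot_labels_by_service(cube)
print(f"{undescribed_share(cube):.0%}")             # 80%
print(shared_labels(pivot))                         # ['Date']
print(all_undescribed(cube, shared_labels(pivot)))  # True
```

Summing over the entity dimension is what turns the three-dimensional cube into the label-by-service pivot; labels that fill more than one service column are the candidates the abstract singles out for linking datasets across services.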