17 research outputs found

    Identifying Relationships between Scientific Datasets

    Scientific datasets associated with a research project can proliferate over time as a result of activities such as sharing datasets among collaborators, extending existing datasets with new measurements, and extracting subsets of data for analysis. As such datasets begin to accumulate, it becomes increasingly difficult for a scientist to keep track of their derivation history, which complicates data sharing, provenance tracking, and scientific reproducibility. Understanding what relationships exist between datasets can help scientists recall their original derivation history. For instance, if dataset A is contained in dataset B, then the connection between A and B could be that A was extended to create B. We present a relationship-identification methodology as a solution to this problem. To examine the feasibility of our approach, we articulated a set of relevant relationships, developed algorithms for efficient discovery of these relationships, and organized these algorithms into a new system called ReConnect to assist scientists in relationship discovery. We also evaluated existing alternative approaches that rely on flagging differences between two spreadsheets and found that they were impractical for many relationship-discovery tasks. Additionally, we conducted a user study, which showed that relationships do occur in real-world spreadsheets, and that ReConnect can improve scientists' ability to detect such relationships between datasets. The promising results of ReConnect's evaluation encouraged us to explore a more automated approach for relationship discovery. In this dissertation, we introduce an automated end-to-end prototype system, ReDiscover, that identifies, from a collection of datasets, the pairs that are most likely related, and the relationship between them.
Our experimental results demonstrate the overall effectiveness of ReDiscover in predicting relationships in a scientist's or a small group of researchers' collections of datasets, and the sensitivity of the overall system to the performance of its various components.
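
The containment relationship mentioned in the abstract above (dataset A is contained in dataset B) can be illustrated with a minimal sketch. This is not the ReConnect or ReDiscover algorithm, which the abstract does not describe. The function name `is_contained` and the sample rows are hypothetical, and rows are compared as whole tuples with duplicates ignored.

```python
def is_contained(a_rows, b_rows):
    """Return True if every row of dataset A also appears in dataset B.

    Illustrative assumption: a dataset is a list of rows and two rows
    match only if all of their values are equal. Duplicate rows are ignored.
    """
    return set(map(tuple, a_rows)) <= set(map(tuple, b_rows))


# Hypothetical sample data: B extends A with one new measurement.
a = [("site1", 0.5), ("site2", 0.7)]
b = a + [("site3", 0.9)]

print(is_contained(a, b))  # A is contained in B
print(is_contained(b, a))  # B is not contained in A
```

If the first check holds and the second does not, one plausible reading is that A was extended to create B, as the abstract suggests.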

    Data Citation: A New Provenance Challenge

    Curriculum analysis for data systems education.

    The field of data systems has seen quick advances due to the popularization of data science, machine learning, and real-time analytics. In industry contexts, system features such as recommendation systems, chatbots and reverse image search require efficient infrastructure and data management solutions. Due to recent advances, it remains unclear (i) which topics are recommended to be included in data systems studies in higher education, (ii) which topics are a part of data systems courses and how they are taught, and (iii) which data-related skills are valued for roles such as software developers, data engineers, and data scientists. This working group aims to answer these points to explain the state of data systems education today and to uncover knowledge gaps and possible discrepancies between recommendations, course implementations, and industry needs. We expect the results to be applicable in tailoring various data systems courses to better cater to the needs of industry, and for teachers to share best practices.

    Data systems education: curriculum recommendations, course syllabi, and industry needs.

    Data systems have been an important part of computing curricula for decades, and an integral part of data-focused industry roles such as software developers, data engineers, and data scientists. However, the field of data systems encompasses a large number of topics ranging from data manipulation and database distribution to creating data pipelines and data analytics solutions. Due to the slow nature of curriculum development, it remains unclear (i) which data systems topics are recommended across diverse higher education curriculum guidelines, (ii) which topics are taught in higher education data systems courses, and (iii) which data systems topics are actually valued in data-focused industry roles. In this study, we analyzed computing curriculum guidelines, course contents, and industry needs regarding data systems to uncover discrepancies between them. Our results show, for example, that topics such as data visualization, data warehousing, and semi-structured data models are valued in industry, yet seldom taught in courses. This work allows professionals to further align curriculum guidelines, higher education, and data systems industry to better prepare students for their working life by focusing on relevant skills in data systems education.

    Green BIM Adoption, an Agile Approach

    The energy consumption issues of the United States cannot be discussed without the inclusion of the energy needs in the building sector. Currently there are approximately 76 million residential structures and 5 million commercial structures in the United States [1]. As the population grows upward of 311 million people, the need for additional buildings will correspondingly increase [2]. Currently, buildings account for approximately 40% of total energy and 70% of electricity usage [4]. The cost of energy in the United States has also been increasing. As the rest of the world develops and industrializes, the demand for energy is going to increase due to the economic elasticity in the energy sector.

    Data Citation: Giving Credit where Credit is Due

    An increasing amount of information is being published in structured databases and retrieved using queries, raising the question of how query results should be cited. Since there are a large number of possible queries over a database, one strategy is to specify citations to a small set of frequent queries - citation views - and use these to construct citations to other "general" queries. We present three approaches to implementing citation views and describe alternative policies for the use of citation views. Extensive experiments using both synthetic and realistic citation views and queries show the tradeoffs between the approaches in terms of the time to generate citations and the size of the resulting citation.

    Call Tracking - Technology Selection Model

    In this paper we evaluate the selection of a call tracking feature for an existing marketing automation solution. This type of selection process has become much more complex over time based on the sheer volume of offerings available, different technical approaches to implementation, and service plans (features plus costs). In order to manage this complexity for decision making, we gathered a set of core requirements from the client, assembled a panel of experts to rank the importance of requirements, and then evaluated the potential solutions based on those criteria. The decision-making methodology used in this study is the hierarchical decision model (HDM) [12], testing two alternative methods for evaluating the expert criteria ranking. In this case, focusing on client requirements, rather than specific technologies or implementation approaches, allows us to greatly simplify this complex decision-making process in the absence of a more detailed technical analysis of every possible solution.

    Project Selection for New Product Development utilizing a Hierarchical Decision Model

    Evaluation and selection of new projects is vital to the health of an organization. This paper explores the background of new project selection and the methodologies to evaluate new projects for selection. To illustrate the process, a new product selection evaluation is performed for North African Food Products (NAFP) using a Hierarchical Decision Model (HDM). NAFP is the pseudonym for an existing confidential food products company in North Africa. Two improved existing and eight new products were evaluated using the pairwise comparison method. HDM experts were consulted for construction of the model. Five products were selected for consideration by NAFP managers. The research methodology is discussed and appendixes for the data and interviews are provided. The new products selected for development were either improved existing or new products from NAFP's existing production lines. The results were validated by post-survey interviews with the NAFP managers. Conclusions and suggestions for future research are also provided.