24 research outputs found

    Computing Lyndon Arrays

    Get PDF
    There are at least two reasons to have an efficient algorithm for identifying all maximal Lyndon substrings in a string: first, in 2015, Bannai et al. introduced a linear algorithm to compute all runs in a string that relies on knowing all maximal Lyndon substrings of the input string, and second, in 2017, Franek et al. showed a linear co-equivalence of sorting suffixes and sorting maximal Lyndon substrings of a string (inspired by a novel suffix sorting algorithm of Baier). In 2016, Franek et al. presented a brief overview of algorithms for com- puting the Lyndon array that encodes the knowledge of maximal Lyndon substrings of the input string. It discussed four different algorithms. Two known algorithms for computing the Lyndon array: a quadratic in-place algorithm based on iterated Duval’s algorithm for Lyndon factorization and a linear algorithmic scheme based on linear suffix sorting, computing the inverse suffix array, and applying the NSV (Next Smaller Value) algorithm. The overview also discusses a recursive version of Duval’s algorithm with a quadratic complexity and an algorithm emulating the NSV approach with a possible O(n log(n)) complexity. The authors at that time did not know of Baier’s algorithm. In 2017, Paracha proposed in her Ph.D. thesis an algorithm for the Lyndon array. The proposed algorithm was interesting as it emulated Farach’s recursive approach for computing suffix trees in linear time and introduced τ-reduction; which might be of independent interest. This was the starting point of this Ph.D. thesis. The primary aim is: (a) developing, analyzing, proving correct, and implementing in C++ a linear algorithm for computing the Lyndon array based on Baier’s suffix sorting; (b) analyzing, proving correct, and implementing in C++ the algorithm proposed by Paracha; and (c) empirically comparing the performance of these two algorithms with the iterative version of Duval’s algorithm.DissertationDoctor of Philosophy (PhD

    Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases

    Full text link
    Due to the recent improvements and wide availability of Large Language Models (LLMs), they have posed a serious threat to academic integrity in education. Modern LLM-generated text detectors attempt to combat the problem by offering educators with services to assess whether some text is LLM-generated. In this work, we have collected 124 submissions from computer science students before the creation of ChatGPT. We then generated 40 ChatGPT submissions. We used this data to evaluate eight publicly-available LLM-generated text detectors through the measures of accuracy, false positives, and resilience. The purpose of this work is to inform the community of what LLM-generated text detectors work and which do not, but also to provide insights for educators to better maintain academic integrity in their courses. Our results find that CopyLeaks is the most accurate LLM-generated text detector, GPTKit is the best LLM-generated text detector to reduce false positives, and GLTR is the most resilient LLM-generated text detector. We also express concerns over 52 false positives (of 114 human written submissions) generated by GPTZero. Finally, we note that all LLM-generated text detectors are less accurate with code, other languages (aside from English), and after the use of paraphrasing tools (like QuillBot). Modern detectors are still in need of improvements so that they can offer a full-proof solution to help maintain academic integrity. Further, their usability can be improved by facilitating a smooth API integration, providing clear documentation of their features and the understandability of their model(s), and supporting more commonly used languages.Comment: 18 pages total (16 pages, 2 reference pages). In submissio

    MSMI1:Towards a Validated SQL Misconceptions Instrument

    Get PDF

    Student Usage of Q&A Forums: Signs of Discomfort?

    Full text link
    Q&A forums are widely used in large classes to provide scalable support. In addition to offering students a space to ask questions, these forums aim to create a community and promote engagement. Prior literature suggests that the way students participate in Q&A forums varies and that most students do not actively post questions or engage in discussions. Students may display different participation behaviours depending on their comfort levels in the class. This paper investigates students' use of a Q&A forum in a CS1 course. We also analyze student opinions about the forum to explain the observed behaviour, focusing on students' lack of visible participation (lurking, anonymity, private posting). We analyzed forum data collected in a CS1 course across two consecutive years and invited students to complete a survey about perspectives on their forum usage. Despite a small cohort of highly engaged students, we confirmed that most students do not actively read or post on the forum. We discuss students' reasons for the low level of engagement and barriers to participating visibly. Common reasons include fearing a lack of knowledge and repercussions from being visible to the student community.Comment: To be published at ITiCSE 202

    Opportunities for Adaptive Experiments to Enable Continuous Improvement that Trades-off Instructor and Researcher Incentives

    Full text link
    Randomized experimental comparisons of alternative pedagogical strategies could provide useful empirical evidence in instructors' decision-making. However, traditional experiments do not have a clear and simple pathway to using data rapidly to try to increase the chances that students in an experiment get the best conditions. Drawing inspiration from the use of machine learning and experimentation in product development at leading technology companies, we explore how adaptive experimentation might help in continuous course improvement. In adaptive experiments, as different arms/conditions are deployed to students, data is analyzed and used to change the experience for future students. This can be done using machine learning algorithms to identify which actions are more promising for improving student experience or outcomes. This algorithm can then dynamically deploy the most effective conditions to future students, resulting in better support for students' needs. We illustrate the approach with a case study providing a side-by-side comparison of traditional and adaptive experimentation of self-explanation prompts in online homework problems in a CS1 course. This provides a first step in exploring the future of how this methodology can be useful in bridging research and practice in doing continuous improvement

    Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception

    Full text link
    Personalized chatbot-based teaching assistants can be crucial in addressing increasing classroom sizes, especially where direct teacher presence is limited. Large language models (LLMs) offer a promising avenue, with increasing research exploring their educational utility. However, the challenge lies not only in establishing the efficacy of LLMs but also in discerning the nuances of interaction between learners and these models, which impact learners' engagement and results. We conducted a formative study in an undergraduate computer science classroom (N=145) and a controlled experiment on Prolific (N=356) to explore the impact of four pedagogically informed guidance strategies and the interaction between student approaches and LLM responses. Direct LLM answers marginally improved performance, while refining student solutions fostered trust. Our findings suggest a nuanced relationship between the guidance provided and LLM's role in either answering or refining student input. Based on our findings, we provide design recommendations for optimizing learner-LLM interactions

    Curriculum analysis for data systems education.

    Get PDF
    The field of data systems has seen quick advances due to the popularization of data science, machine learning, and real-time analytics. In industry contexts, system features such as recommendation systems, chatbots and reverse image search require efficient infrastructure and data management solutions. Due to recent advances, it remains unclear (i) which topics are recommended to be included in data systems studies in higher education, (ii) which topics are a part of data systems courses and how they are taught, and (iii) which data-related skills are valued for roles such as software developers, data engineers, and data scientists. This working group aims to answer these points to explain the state of data systems education today and to uncover knowledge gaps and possible discrepancies between recommendations, course implementations, and industry needs. We expect the results to be applicable in tailoring various data systems courses to better cater to the needs of industry, and for teachers to share best practices

    Spatial Skills and Demographic Factors in CS1

    Get PDF
    Motivation Prior studies have established that training spatial skills may improve outcomes in computing courses. Very few of these studies have, however, explored the impact of spatial skills training on women or examined its relationship with other factors commonly explored in the context of academic performance, such as socioeconomic background and self-efficacy. Objectives In this study, we report on a spatial skills intervention deployed in a computer programming course (CS1) in the first year of a post-secondary program. We explore the relationship between various demographic factors, course performance, and spatial skills ability at both the beginning and end of the term. Methods Data was collected using a combination of demographic surveys, existing self-efficacy and CS1 content instruments, and the Revised PVST:R spatial skills assessment. Spatial skills were evaluated both at the beginning of the term and at the end, after spatial skills training was provided. Results While little evidence was found to link spatial skills to socioeconomic status or self-efficacy, both gender identity and previous experience in computing were found to be correlated to spatial skills ability at the start of the course. Women initially recorded lower spatial skills ability, but after training, the distribution of spatial skills scores for women approached that of men. Discussion These findings suggest that, if offered early enough, spatial skills training may be able to remedy some differences in background that impact performance in computing courses

    ABScribe: Rapid Exploration of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models

    Full text link
    Exploring alternative ideas by rewriting text is integral to the writing process. State-of-the-art large language models (LLMs) can simplify writing variation generation. However, current interfaces pose challenges for simultaneous consideration of multiple variations: creating new versions without overwriting text can be difficult, and pasting them sequentially can clutter documents, increasing workload and disrupting writers' flow. To tackle this, we present ABScribe, an interface that supports rapid, yet visually structured, exploration of writing variations in human-AI co-writing tasks. With ABScribe, users can swiftly produce multiple variations using LLM prompts, which are auto-converted into reusable buttons. Variations are stored adjacently within text segments for rapid in-place comparisons using mouse-over interactions on a context toolbar. Our user study with 12 writers shows that ABScribe significantly reduces task workload (d = 1.20, p < 0.001), enhances user perceptions of the revision process (d = 2.41, p < 0.001) compared to a popular baseline workflow, and provides insights into how writers explore variations using LLMs

    Practice report: six studies of spatial skills training in introductory computer science

    Get PDF
    We have been training spatial skills for Computing Science students over several years with positive results, both in terms of the students’ spatial skills and their CS outcomes. The delivery and structure of the training has been modified over time and carried out at several institutions, resulting in variations across each intervention. This article describes six distinct case studies of training deliveries, highlighting the main challenges faced and some important takeaways. Our goal is to provide useful guidance based on our varied experience for any practitioner considering the adoption of spatial skills training for their students
    corecore