Computing Lyndon Arrays
There are at least two reasons to have an efficient algorithm for identifying all maximal Lyndon substrings in a string: first, in 2015, Bannai et al. introduced a linear algorithm to compute all runs in a string that relies on knowing all maximal Lyndon substrings of the input string, and second, in 2017, Franek et al. showed a linear co-equivalence of sorting suffixes and sorting maximal Lyndon substrings of a string (inspired by a novel suffix sorting algorithm of Baier).
In 2016, Franek et al. presented a brief overview of algorithms for computing the Lyndon array, which encodes the maximal Lyndon substrings of the input string. The overview discussed four algorithms. Two were already known: a quadratic in-place algorithm based on iterating Duval’s algorithm for Lyndon factorization, and a linear algorithmic scheme based on linear suffix sorting, computing the inverse suffix array, and applying the NSV (Next Smaller Value) algorithm. The other two were a recursive version of Duval’s algorithm with quadratic complexity and an algorithm emulating the NSV approach with a possible O(n log(n)) complexity. The authors did not know of Baier’s algorithm at that time. In 2017, Paracha proposed an algorithm for the Lyndon array in her Ph.D. thesis. The proposed algorithm was interesting because it emulated Farach’s recursive approach for computing suffix trees in linear time and introduced τ-reduction, which might be of independent interest.
This was the starting point of this Ph.D. thesis. The primary aims are: (a) developing, analyzing, proving correct, and implementing in C++ a linear algorithm for computing the Lyndon array based on Baier’s suffix sorting; (b) analyzing, proving correct, and implementing in C++ the algorithm proposed by Paracha; and (c) empirically comparing the performance of these two algorithms with the iterative version of Duval’s algorithm.
Dissertation, Doctor of Philosophy (PhD)
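The linear scheme mentioned in the abstract above (suffix sorting, inverse suffix array, then NSV) can be illustrated with a minimal Python sketch. It is not the C++ implementation from the thesis: the function name `lyndon_array` is hypothetical, and the suffix array is built here with a naive comparison sort as a stand-in for a linear suffix sorter, so only the NSV pass runs in linear time.

```python
def lyndon_array(s):
    """Return L where L[i] is the length of the maximal Lyndon substring starting at i.

    Sketch only: the suffix array is built naively (not in linear time).
    """
    n = len(s)

    # Step 1: suffix array via naive sorting (stand-in for linear suffix sorting).
    suffix_array = sorted(range(n), key=lambda i: s[i:])

    # Step 2: inverse suffix array, i.e. the rank of each suffix.
    inverse = [0] * n
    for rank, start in enumerate(suffix_array):
        inverse[start] = rank

    # Step 3: next smaller value (NSV) on the ranks, scanning right to left
    # with a stack of candidate positions.
    result = [0] * n
    stack = []
    for i in range(n - 1, -1, -1):
        # Discard positions whose suffix is lexicographically larger than suffix i.
        while stack and inverse[stack[-1]] > inverse[i]:
            stack.pop()
        next_smaller = stack[-1] if stack else n
        result[i] = next_smaller - i
        stack.append(i)
    return result


print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```

For "banana", the maximal Lyndon substring starting at position 1 is "an" (length 2), because the next lexicographically smaller suffix, "ana", starts at position 3.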
Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases
Owing to their recent improvements and wide availability, Large Language Models
(LLMs) pose a serious threat to academic integrity in education.
Modern LLM-generated text detectors attempt to combat the problem by offering
educators services to assess whether some text is LLM-generated. In this
work, we have collected 124 submissions from computer science students before
the creation of ChatGPT. We then generated 40 ChatGPT submissions. We used this
data to evaluate eight publicly-available LLM-generated text detectors through
the measures of accuracy, false positives, and resilience. The purpose of this
work is to inform the community of which LLM-generated text detectors work and
which do not, and to provide insights for educators to better maintain
academic integrity in their courses. Our results show that CopyLeaks is the
most accurate LLM-generated text detector, GPTKit is the best LLM-generated
text detector to reduce false positives, and GLTR is the most resilient
LLM-generated text detector. We also express concerns over 52 false positives
(of 114 human-written submissions) generated by GPTZero. Finally, we note that
all LLM-generated text detectors are less accurate with code, with languages
other than English, and after the use of paraphrasing tools (like QuillBot).
Modern detectors are still in need of improvements so that they can offer a
foolproof solution to help maintain academic integrity. Further, their
usability can be improved by facilitating a smooth API integration, providing
clear documentation of their features and the understandability of their
model(s), and supporting more commonly used languages.
Comment: 18 pages total (16 pages, 2 reference pages). In submission
Student Usage of Q&A Forums: Signs of Discomfort?
Q&A forums are widely used in large classes to provide scalable support. In
addition to offering students a space to ask questions, these forums aim to
create a community and promote engagement. Prior literature suggests that the
way students participate in Q&A forums varies and that most students do not
actively post questions or engage in discussions. Students may display
different participation behaviours depending on their comfort levels in the
class. This paper investigates students' use of a Q&A forum in a CS1 course. We
also analyze student opinions about the forum to explain the observed
behaviour, focusing on students' lack of visible participation (lurking,
anonymity, private posting). We analyzed forum data collected in a CS1 course
across two consecutive years and invited students to complete a survey about
perspectives on their forum usage. Despite a small cohort of highly engaged
students, we confirmed that most students do not actively read or post on the
forum. We discuss students' reasons for the low level of engagement and
barriers to participating visibly. Common reasons include fearing a lack of
knowledge and repercussions from being visible to the student community.
Comment: To be published at ITiCSE 202
Opportunities for Adaptive Experiments to Enable Continuous Improvement that Trades-off Instructor and Researcher Incentives
Randomized experimental comparisons of alternative pedagogical strategies
could provide useful empirical evidence in instructors' decision-making.
However, traditional experiments do not have a clear and simple pathway to
using data rapidly to try to increase the chances that students in an
experiment get the best conditions. Drawing inspiration from the use of machine
learning and experimentation in product development at leading technology
companies, we explore how adaptive experimentation might help in continuous
course improvement. In adaptive experiments, as different arms/conditions are
deployed to students, data is analyzed and used to change the experience for
future students. This can be done using machine learning algorithms to identify
which actions are more promising for improving student experience or outcomes.
These algorithms can then dynamically deploy the most effective conditions to
future students, resulting in better support for students' needs. We illustrate
the approach with a case study providing a side-by-side comparison of
traditional and adaptive experimentation of self-explanation prompts in online
homework problems in a CS1 course. This provides a first step in exploring how
this methodology can help bridge research and practice in continuous
improvement.
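The adaptive loop described in the abstract above can be sketched in Python. The abstract does not name a specific algorithm; this sketch assumes Thompson sampling with Beta priors over binary outcomes, and the function name `thompson_assign`, its parameters, and the simulated success rates are all hypothetical.

```python
import random


def thompson_assign(true_rates, n_students, seed=0):
    """Simulate assigning students to conditions with Thompson sampling.

    true_rates: assumed (simulated) success probability of each condition.
    Returns how many students were assigned to each condition.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    counts = [0] * n_arms

    for _ in range(n_students):
        # Draw a plausible success rate for each condition from its Beta posterior.
        draws = [
            rng.betavariate(1 + successes[arm], 1 + failures[arm])
            for arm in range(n_arms)
        ]
        # Assign the next student to the condition with the highest draw.
        arm = max(range(n_arms), key=draws.__getitem__)
        counts[arm] += 1

        # Observe a simulated binary outcome and update that condition's record.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return counts


counts = thompson_assign([0.1, 0.9], n_students=500)
print(counts)
```

In contrast, a traditional experiment would assign students to conditions with fixed, equal probability for the whole term, regardless of the outcomes observed so far.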
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception
Personalized chatbot-based teaching assistants can be crucial in addressing
increasing classroom sizes, especially where direct teacher presence is
limited. Large language models (LLMs) offer a promising avenue, with increasing
research exploring their educational utility. However, the challenge lies not
only in establishing the efficacy of LLMs but also in discerning the nuances of
interaction between learners and these models, which impact learners'
engagement and results. We conducted a formative study in an undergraduate
computer science classroom (N=145) and a controlled experiment on Prolific
(N=356) to explore the impact of four pedagogically informed guidance
strategies and the interaction between student approaches and LLM responses.
Direct LLM answers marginally improved performance, while refining student
solutions fostered trust. Our findings suggest a nuanced relationship between
the guidance provided and the LLM's role in either answering or refining student
input. Based on our findings, we provide design recommendations for optimizing
learner-LLM interactions.
Curriculum analysis for data systems education.
The field of data systems has seen rapid advances due to the popularization of data science, machine learning, and real-time analytics. In industry contexts, system features such as recommendation systems, chatbots, and reverse image search require efficient infrastructure and data management solutions. Due to these recent advances, it remains unclear (i) which topics are recommended for inclusion in data systems studies in higher education, (ii) which topics are part of data systems courses and how they are taught, and (iii) which data-related skills are valued for roles such as software developers, data engineers, and data scientists. This working group aims to address these questions to explain the state of data systems education today and to uncover knowledge gaps and possible discrepancies between recommendations, course implementations, and industry needs. We expect the results to be applicable in tailoring various data systems courses to better cater to the needs of industry, and for teachers to share best practices.
Spatial Skills and Demographic Factors in CS1
Motivation: Prior studies have established that training spatial skills may improve outcomes in computing courses. Very few of these studies have, however, explored the impact of spatial skills training on women or examined its relationship with other factors commonly explored in the context of academic performance, such as socioeconomic background and self-efficacy. Objectives: In this study, we report on a spatial skills intervention deployed in a computer programming course (CS1) in the first year of a post-secondary program. We explore the relationship between various demographic factors, course performance, and spatial skills ability at both the beginning and end of the term. Methods: Data was collected using a combination of demographic surveys, existing self-efficacy and CS1 content instruments, and the Revised PSVT:R spatial skills assessment. Spatial skills were evaluated both at the beginning of the term and at the end, after spatial skills training was provided. Results: While little evidence was found to link spatial skills to socioeconomic status or self-efficacy, both gender identity and previous experience in computing were found to be correlated with spatial skills ability at the start of the course. Women initially recorded lower spatial skills ability, but after training, the distribution of spatial skills scores for women approached that of men. Discussion: These findings suggest that, if offered early enough, spatial skills training may be able to remedy some differences in background that impact performance in computing courses.
ABScribe: Rapid Exploration of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models
Exploring alternative ideas by rewriting text is integral to the writing
process. State-of-the-art large language models (LLMs) can simplify writing
variation generation. However, current interfaces pose challenges for
simultaneous consideration of multiple variations: creating new versions
without overwriting text can be difficult, and pasting them sequentially can
clutter documents, increasing workload and disrupting writers' flow. To tackle
this, we present ABScribe, an interface that supports rapid, yet visually
structured, exploration of writing variations in human-AI co-writing tasks.
With ABScribe, users can swiftly produce multiple variations using LLM prompts,
which are auto-converted into reusable buttons. Variations are stored
adjacently within text segments for rapid in-place comparisons using mouse-over
interactions on a context toolbar. Our user study with 12 writers shows that
ABScribe significantly reduces task workload (d = 1.20, p < 0.001), enhances
user perceptions of the revision process (d = 2.41, p < 0.001) compared to a
popular baseline workflow, and provides insights into how writers explore
variations using LLMs.
Practice report: six studies of spatial skills training in introductory computer science
We have been training spatial skills for Computing Science students over several years with positive results, both in terms of the students’ spatial skills and their CS outcomes. The delivery and structure of the training have been modified over time and carried out at several institutions, resulting in variations across interventions. This article describes six distinct case studies of training deliveries, highlighting the main challenges faced and some important takeaways. Our goal is to provide useful guidance based on our varied experience for any practitioner considering the adoption of spatial skills training for their students.