Data poisoning: how artists are sabotaging AI to take revenge on image generators

Over the break we read and loved this article from The Conversation, originally published on 18 December 2023. We hope you do too!

T.J. Thomson, Author provided

T.J. Thomson, RMIT University and Daniel Angus, Queensland University of Technology

Imagine this. You need an image of a balloon for a work presentation and turn to a text-to-image generator, like Midjourney or DALL-E, to create a suitable image.

You enter the prompt: “red balloon against a blue sky” but the generator returns an image of an egg instead. You try again but this time, the generator shows an image of a watermelon.

What’s going on?

The generator you’re using may have been “poisoned”.

What is ‘data poisoning’?

Text-to-image generators work by being trained on large datasets that include millions or billions of images. Some generators, like those offered by Adobe or Getty, are only trained with images the generator’s maker owns or has a licence to use.

But other generators have been trained by indiscriminately scraping online images, many of which may be under copyright. This has led to a slew of copyright infringement cases where artists have accused big tech companies of stealing and profiting from their work.

This is also where the idea of “poison” comes in. Researchers who want to empower individual artists have recently created a tool named “Nightshade” to fight back against unauthorised image scraping.

The tool works by subtly altering an image’s pixels in a way that wreaks havoc on computer vision systems but leaves the image looking unchanged to the human eye.

If an organisation then scrapes one of these images to train a future AI model, its data pool becomes “poisoned”. This can result in the algorithm mistakenly learning to classify an image as something a human would visually know to be untrue. As a result, the generator can start returning unpredictable and unintended results.
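To see the principle at work – not Nightshade’s actual algorithm, which uses carefully optimised, targeted perturbations – here is a toy Python sketch of an “imperceptible” pixel change. All function names and numbers here are illustrative:

```python
import random

def perturb_image(pixels, epsilon=2):
    """Toy illustration only: nudge each pixel value by at most
    `epsilon` (out of 255), far below what a human viewer would
    notice. Real poisoning tools optimise perturbations
    adversarially; this sketch just shows the 'tiny change,
    same look' idea."""
    random.seed(0)  # reproducible for the example
    poisoned = []
    for row in pixels:
        poisoned.append([
            max(0, min(255, value + random.randint(-epsilon, epsilon)))
            for value in row
        ])
    return poisoned

# A tiny greyscale "image" as nested lists of 0-255 values:
image = [[120, 121, 119], [118, 122, 120]]
poisoned = perturb_image(image)
max_change = max(abs(a - b) for r1, r2 in zip(image, poisoned)
                 for a, b in zip(r1, r2))
print(max_change <= 2)  # → True: invisible to people, but cumulative across millions of images
```

The point of the sketch is that each individual change is trivial; the damage comes from a model statistically absorbing many such images during training.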

Symptoms of poisoning

As in our earlier example, a balloon might become an egg. A request for an image in the style of Monet might instead return an image in the style of Picasso.

Some of the issues with earlier AI models, such as trouble accurately rendering hands, for example, could return. The models could also introduce other odd and illogical features to images – think six-legged dogs or deformed couches.

The higher the number of “poisoned” images in the training data, the greater the disruption. Because of how generative AI works, the damage from “poisoned” images also affects related prompt keywords.

For example, if a “poisoned” image of a Ferrari is used in training data, prompt results for other car brands and for other related terms, such as vehicle and automobile, can also be affected.

Nightshade’s developer hopes the tool will make big tech companies more respectful of copyright, but it’s also possible users could abuse the tool and intentionally upload “poisoned” images to generators to try to disrupt their services.

Is there an antidote?

In response, stakeholders have proposed a range of technological and human solutions. The most obvious is paying greater attention to where input data are coming from and how they can be used. Doing so would result in less indiscriminate data harvesting.

This approach does challenge a common belief among computer scientists: that data found online can be used for any purpose they see fit.

Other technological fixes also include the use of “ensemble modeling” where different models are trained on many different subsets of data and compared to locate specific outliers. This approach can be used not only for training but also to detect and discard suspected “poisoned” images.
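As an illustration of the ensemble idea – not any particular vendor’s implementation – the sketch below trains several trivial “models” (per-label feature averages) on random subsets of the data, then flags training samples that disagree with the ensemble consensus. The feature values and threshold are invented for the example:

```python
import random

def ensemble_flag_outliers(samples, n_models=5, threshold=2.0):
    """Hedged sketch of ensemble-based poison detection: train
    several trivial 'models' (per-label feature means) on random
    subsets, then flag samples whose feature deviates from the
    ensemble consensus for their label.
    `samples` is a list of (feature, label) pairs."""
    random.seed(42)  # deterministic for the example
    models = []
    for _ in range(n_models):
        subset = random.sample(samples, k=max(1, len(samples) // 2))
        means = {}
        for feature, label in subset:
            means.setdefault(label, []).append(feature)
        models.append({lbl: sum(v) / len(v) for lbl, v in means.items()})
    flagged = []
    for feature, label in samples:
        estimates = [m[label] for m in models if label in m]
        consensus = sum(estimates) / len(estimates)
        if abs(feature - consensus) > threshold:
            flagged.append((feature, label))
    return flagged

# Mostly consistent 'balloon' features around 1.0, one poisoned outlier:
data = [(1.0, "balloon"), (1.1, "balloon"), (0.9, "balloon"), (9.0, "balloon")]
print((9.0, "balloon") in ensemble_flag_outliers(data))  # → True
```

Real systems compare full model behaviour rather than single feature averages, but the logic is the same: a poisoned sample looks like an outlier against models trained on data it was excluded from.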

Audits are another option. One audit approach involves developing a “test battery” – a small, highly curated, and well-labelled dataset – using “hold-out” data that are never used for training. This dataset can then be used to examine the model’s accuracy.
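The audit idea can be sketched in a few lines. The `audit_model` function and the toy “models” below are hypothetical stand-ins, not a real generator; the point is only that a curated hold-out set gives a stable yardstick:

```python
def audit_model(predict, test_battery, min_accuracy=0.9):
    """Sketch of a 'test battery' audit: `test_battery` is a small,
    curated list of (input, expected_label) pairs held out from
    training. A sudden accuracy drop on it is one symptom that the
    training pool may have been poisoned."""
    correct = sum(1 for x, expected in test_battery if predict(x) == expected)
    accuracy = correct / len(test_battery)
    return accuracy, accuracy >= min_accuracy

# A healthy model passes; a 'poisoned' one that swaps balloons for eggs fails.
battery = [("red balloon", "balloon"), ("blue sky", "sky"),
           ("green balloon", "balloon")]
healthy = lambda prompt: "balloon" if "balloon" in prompt else "sky"
poisoned = lambda prompt: "egg" if "balloon" in prompt else "sky"
print(audit_model(healthy, battery))   # → (1.0, True)
print(audit_model(poisoned, battery))  # accuracy ~0.33, fails the audit
```

Because the hold-out data never enter training, poisoned images cannot skew the yardstick itself.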

Strategies against technology

So-called “adversarial approaches” (those that degrade, deny, deceive, or manipulate AI systems), including data poisoning, are nothing new. Historically, they have included using make-up and costumes to circumvent facial recognition systems.

Human rights activists, for example, have been concerned for some time about the indiscriminate use of machine vision in wider society. This concern is particularly acute concerning facial recognition.

Systems like Clearview AI, which hosts a massive searchable database of faces scraped from the internet, are used by law enforcement and government agencies worldwide. In 2021, Australia’s privacy regulator determined Clearview AI had breached the privacy of Australians.

In response to facial recognition systems being used to profile specific individuals, including legitimate protesters, artists devised adversarial make-up patterns of jagged lines and asymmetric curves that prevent surveillance systems from accurately identifying them.

There is a clear connection between these cases and the issue of data poisoning, as both relate to larger questions around technological governance.

Many technology vendors will consider data poisoning a pesky issue to be fixed with technological solutions. However, it may be better to see data poisoning as an innovative solution to an intrusion on the fundamental moral rights of artists and users.

T.J. Thomson, Senior Lecturer in Visual Communication & Digital Media, RMIT University and Daniel Angus, Professor of Digital Communication, Queensland University of Technology

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Being Prompt with Prompt Engineering

Krista Yuen, The University of Waikato
Danielle Degiorgio, Edith Cowan University

Warning – ChatGPT and DALL-E were used in the making of this post.

Experienced AI users have been experimenting with the art of prompt engineering – crafting prompts that draw the most useful and accurate responses from generative AI systems – and have synthesised a set of techniques for getting the best output. As the use of AI continues to grow, crafting an effective prompt is arguably a skill anyone seeking information will need.

Whilst AI continues to improve, and many systems now encourage more precise prompting from their users, AI is still only as good as the prompts it is given. Essentially, if you want quality content, you must use quality prompts. A solid prompt requires critical thinking and reflection, both in its design and in how you interact with the output. While there are many ways to structure a prompt, these are the three most important things to remember when constructing yours:

Context

  • Provide background information
  • Set the scene
  • Use exact keywords
  • Specify audience
  • You could also give the AI tool a role to play, e.g. “Act as an expert community organiser!”

Task

  • Clearly define tasks
  • Be as specific as possible about exactly what you want the AI tool to do
  • Break down the steps involved if needed
  • Put in any extra detail, information or text that the AI tool needs

Output

  • Specify desired format, style, and tone
  • Specify inclusions and exclusions
  • Tell it how you would like the results formatted, e.g. a table, bullet point list or even in HTML or CSS.
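As a rough illustration (the function and field names are ours, not any tool’s API), the three parts above can be assembled programmatically, which also makes it harder to forget one of them:

```python
def build_prompt(context, task, output_spec):
    """Minimal sketch: real prompts are free text, but keeping the
    three parts explicit encourages each to be filled in deliberately."""
    return "\n\n".join([
        f"Context: {context}",
        f"Task: {task}",
        f"Output: {output_spec}",
    ])

prompt = build_prompt(
    context="Act as an expert community organiser writing for local volunteers.",
    task="Draft a one-paragraph invitation to a beach clean-up this Saturday.",
    output_spec="Friendly tone, under 80 words, plain text.",
)
print(prompt)
```

The assembled string can then be pasted into (or sent to) whichever generative AI tool you are using.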

Example prompt for text generation e.g., ChatGPT

You are an expert marketing and communications advisor working on a project for dolphin conservation and need to create a comprehensive marketing proposal. The goal is to raise awareness and promote actions that contribute to the protection of dolphins and their habitats. The target audience includes environmental activists and the general public who might be interested in marine conservation.

The proposal should highlight the current challenges faced by dolphins, including threats like pollution, overfishing, and habitat destruction. It should emphasise the importance of dolphins to marine ecosystems and their appeal to people due to their intelligence and playful nature. It should include five bullet points for each area: campaign objectives, target audience, key messages, marketing channels, content ideas, partnerships, budget estimation, timeline, and evaluation metrics.

Please structure it in a format that is easy to present to stakeholders, such as a PowerPoint presentation or a detailed report. It should be professionally written, persuasive, and visually appealing with suggestions for imagery and design elements that align with the theme of dolphin conservation.

Example prompt for image generation e.g., DALL∙E

Create a captivating and colourful image for a marketing campaign focused on dolphin conservation. The setting is a serene, crystal-clear ocean under a bright blue sky with soft, fluffy clouds. In the foreground, a group of three playful dolphins is leaping gracefully out of the water. These dolphins should appear joyful and full of life, symbolising the beauty and intelligence of marine life.

The central dolphin, a majestic bottlenose, is at the peak of its jump, with water droplets sparkling around it like diamonds under the sunlight. On the left, a smaller, younger dolphin mirrors its movement, adding a sense of playfulness and family. To the right, another dolphin is partially submerged, preparing to leap. In the background, a distant, unspoiled coastline with lush greenery and a few palm trees provides a natural, pristine environment. This idyllic scene should evoke a sense of peace and the importance of preserving such beautiful natural habitats.

This image was created with DALL·E 2 via ChatGPT 4 (November 22 Version).

Not getting the results you want?

If your first response has not given you exactly what you need, remember you can try and try again! You may need to add more guidelines to your prompt:

  • Try adding more words or ideas. What further instructions might draw more out of your prompt?
  • Provide some more context, like “I’m not an expert and I need this explained to me in simpler terms.”
  • Do you need more detailed information that will make your response more relevant and useful?

Want to learn more?

There are a few places you can go to learn more about developing good prompts for your generative AI tool:

LinkedIn Learning: How to write an effective prompt for AI

Learn Prompting: Prompt Engineering Guide

Is ChatGPT cheating? The complexities of AI use in tertiary education. 

Craig Wattam, Rachael Richardson-Bullock

Te Mātāpuna Library & Learning Services, Auckland University of Technology

“The university is at the stage of reviewing its rules for misconduct because they really don’t apply as much anymore.” 

– Tom, Student Advocate, on the Noisy Librarian Podcast

Cheating in the tertiary education sector is not new. Generative AI technologies, while presenting enormous opportunity, are the latest threat to academic integrity. AI tools like ChatGPT blur the lines between human-generated and machine-generated content. They present a raft of issues, including ambiguous standards for legitimate and illegitimate use, variations in acceptance and usage across disciplinary contexts, and little or inadequate evidence of their use. A nuanced response is required.

Fostering academic integrity through AI literacy

Academic integrity research argues persuasively that a systematic, multi-stakeholder, networked approach is the best way to foster a culture of academic integrity (Kenny & Eaton, 2022). Fortunately, this is also the way to foster ethical, critically reflective and skilful use of AI tools – in other words, a culture of AI literacy. Ironically, to support integrity, we must shift our attention away from merely preventing cheating to ensuring that students learn how to use these tools responsibly. Thus, we can ensure that our focus is on learning and helping students develop the skills necessary to navigate the digital age ethically and effectively.

Hybrid future 

So, the challenge of AI is both an opportunity and an imperative. As we humans continue to interact with technology in highly complex systems, the way we approach academic work will continue to develop. Rather than backing away or banning AI technologies from the classroom altogether, forging a hybrid future, where AI tools play a role in setting students up for success, will benefit both staff and students.

Information and academic literacy practitioners, and other educators, will need to be dexterous enough to respond to the eclipsing, revision, and constant evolution of some of our most ingrained concepts, such as authorship, originality, plagiarism, and acknowledgement.

What do students say? 

This was the topic of discussion in a recent episode of the Noisy Librarian Podcast. Featured guests were an academic and a student – a library Learning Advisor and a Student Advocate. The guests delved into the complexities of academic integrity in today’s digital landscape. Importantly, their discussion underscored the need for organisations to understand and hear from students about how AI is impacting them, how they are using it, and what they might be concerned about. Incorporating the student voice and understanding student perspectives is crucial for developing guidelines and support services that are truly effective and relevant.

Forget supervillains! 

Both podcast guests emphasised that few cases of student misconduct involve serial offenders or supervillains who have made a career out of gaming the system. Rather than stemming from an intention to cheat, misconduct is more often related to a lack of knowledge or skill. Meanwhile, universities are facing challenges – needing to adapt their misconduct rules and provide clear guidelines on the acceptable use of AI tools.

Listen to the Noisy Librarian podcast episode Is ChatGPT cheating? The complexities of AI use in tertiary education

Podbean

Or find us on Google Podcasts, Apple Podcasts or iHeartRadio

Reference:

Kenny, N., & Eaton, S. E. (2022). Academic Integrity Through a SoTL Lens and 4M Framework: An Institutional Self-Study. In Academic Integrity in Canada (pp. 573–592). Springer, Cham. https://doi.org/10.1007/978-3-030-83255-1_30

The power of large language models to augment human learning 

By Fernando Marmolejo-Ramos, Tim Simon and Rhoda Abadia; University of South Australia

In early 2023, OpenAI’s ChatGPT – a cutting-edge large language model (LLM) and part of the revolutionary generative AI movement – became the buzzword in the Artificial Intelligence (AI) world. Google’s Bard and Anthropic’s Claude are other notable LLMs in this league, transforming the way we interact with AI applications. LLMs are super-sized dynamic libraries that can respond to queries, abstract text, and even tackle complex mathematical problems. Ever since ChatGPT’s debut, there has been an overwhelming surge of academic papers and grey literature (including blogs and pre-prints) both praising and critiquing the impact of LLMs. In this discussion, we aim to emphasise the importance of recognising LLMs as technologies that can augment human learning. Through examples, we illustrate how interacting with LLMs can foster AI literacy and augment learning, ultimately boosting innovation and creativity in problem-solving scenarios.

In the field of education, LLMs have emerged as powerful tools with the potential to enhance the learning experience for both students and teachers. They can be used as powerful supplements for reading, research, and personalised tutoring, benefiting students in various ways. 

For students, LLMs offer the convenience of summarising lengthy textbook chapters and locating relevant literature with tools like ChatPDF, ChatDOC, Perplexity, or Consensus. We believe that these tools not only accelerate students’ understanding of the material but also enable a deeper grasp of the subject matter. LLMs can also act as personalised tutors that are readily available to answer students’ queries and provide guided explanations. 

For teachers, LLMs may help in reducing repetitive tasks like grading assignments. By analysing students’ essays and short answers, they can assess coherence, reasoning, and plagiarism, thereby saving valuable time for meaningful teaching. Additionally, LLMs have the potential to suggest personalised feedback and improvements for individual students, enhancing the overall learning experience. The caveat, though, is that human judgement is to be ‘in-the-loop’ as LLMs have limited understanding of teaching methodologies, curriculum, and student needs. UNESCO has recognised this importance and produced a short guide on the use of LLMs in higher education, providing valuable insights for educators (see table on page 10). 

Achieving remarkable results with LLMs is made possible through the art of “prompt engineering” (PE) – crafting effective prompts that guide these language models towards informed responses. For instance, a prompt could be as straightforward as “rewrite the following = X,” where X represents the text to be rephrased. Alternatively, a more complex prompt like “explain what Z is in layman’s terms” can help clarify intricate concepts. In Figure 1, we present an example demonstrating how students can use specific prompts to learn statistical concepts while simultaneously gaining familiarity with R coding.

Figure 1.  Example of a prompt given to ChatGPT to create R code. The plot on the right shows the result when the code is run in R. Note how the LLM features good code commenting practices and secures reproducibility via the ‘set.seed( )’ function.

Additionally, Figure 2 reveals that not all LLMs offer identical responses to the same prompts, highlighting the uniqueness of each model’s output.

Figure 2. Example of how ChatGPT (left) and Claude (right) respond to the same prompt. Claude seemed to give a better response than ChatGPT and provided an explanation of what was done.

However, the most interesting aspect of PE lies in formulating appropriate questions for the LLMs, making it a matter of problem formulation. We believe this crucial element is at the core of effective prompting in educational contexts. Seen this way, it’s clear that good prompts should provide context for the question being asked, as context provides reference points for the intended meaning. For example, a teacher or student could design a prompt like: “Given the information in texts A and B, produce a text that discusses concepts a1 and a2 in text A in terms of concepts b1 and b2 in text B”; where A and B are paragraphs or texts given along with the prompt and a1, a2, b1 and b2 are specific aspects from texts A and B. Admittedly, that template provides only skeletal context. Nonetheless, context-rich prompts can still be conceived (see Figure 3). These examples also hint at the idea that prompts work in a “rubbish prompts in; rubbish responses out” fashion; i.e. the quality of the prompt is directly proportional to the quality of the response.

Figure 3.  Example of a prompt with good context. This prompt was obtained via Bard through the prompt “construct a prompt on the subject of cognitive science and artificial intelligence that provides adequate context for any LLM to generate a meaningful response”.

PE is thus a process that involves engaging in a dialogue with the LLM to discover creative and innovative solutions to problems. One effective approach is “chain-of-thought” (CoT) prompting, which elicits more in-depth responses from the LLM by following up on previously introduced ideas. The example shown in Figure 4 was output by Bard after the prompt “provide an example of a chain of thought prompting to be submitted to a large language model”. The green box contains the initial prompt, the orange box represents three subsequent questions, and the blue box represents a potential answer given by the LLM. Another form of CoT prompting starts by setting a topic (e.g. “The Role of Artificial Intelligence (AI) in Education”) and then asking a sequence of questions such as:

  • “start by defining Artificial Intelligence (AI) and its relevance in the context of education, including its potential applications in learning, teaching, and educational administration.”
  • “explore how AI can personalise the learning experience for students, catering to individual needs, learning styles, and pace of progress.”
  • “discuss the benefits of AI-powered adaptive learning systems in identifying students’ strengths and weaknesses, providing targeted interventions, and improving overall academic performance.”
  • “examine the role of AI in automating administrative tasks, such as grading, scheduling, and resource management, to enhance efficiency and reduce the burden on educators.”

Figure 4. Example of a CoT prompt.
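The chained questioning described above can be sketched in code. The `chain_of_thought` function and `ask` parameter below are our own illustrative names: `ask` stands in for any real LLM call (e.g. an API client), and the stub simply returns a canned answer:

```python
def chain_of_thought(ask, topic, follow_ups):
    """Sketch of chain-of-thought-style prompting: start with a topic,
    then feed each follow-up question together with the conversation
    so far, so each answer can build on the previous ones."""
    transcript = f"Topic: {topic}"
    for question in follow_ups:
        answer = ask(transcript + "\nQ: " + question)
        transcript += f"\nQ: {question}\nA: {answer}"
    return transcript

# Stub LLM for illustration only -- a real one would generate text.
stub_llm = lambda prompt: "(model's answer here)"
result = chain_of_thought(
    stub_llm,
    "The Role of Artificial Intelligence (AI) in Education",
    ["Define AI and its relevance in education.",
     "How can AI personalise the learning experience?"],
)
print(result.count("Q:"))  # → 2: both questions were chained into the transcript
```

The key design choice is that the whole transcript is re-sent each turn, which is what lets the model follow up on previously introduced ideas.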

Variants of CoT prompting can be considered by generating several CoT reasoning paths (see the articles Tree of Thoughts: Deliberate Problem Solving with Large Language Models and Large Language Models Tree-of-Thoughts). Regardless of the CoT prompting used, the ultimate goal is to solve a problem in original and informative ways.

It’s crucial not to overlook AI technologies but rather embrace them, finding the right balance between tasks delegated to AI and those best suited for human involvement. Fine-tuning interactions between humans and AI is key when exchanging information, ensuring a seamless and effective collaboration between the two.

Library strategy and Artificial Intelligence

by Dr Andrew M Cox, Senior Lecturer, the Information School, University of Sheffield.

This post was originally published in the National Centre for AI blog, owned by Jisc. It is re-printed with permission from Jisc and the author.

On April 20th 2023 the Information School, University of Sheffield invited five guest speakers from across the library sectors to debate “Artificial Intelligence: Where does it fit into your library strategy?”

The speakers were:

  1. Nick Poole, CEO of CILIP
  2. Neil Fitzgerald, Head of Digital Research, British Library
  3. Sue Lacey-Bryant, Chief Knowledge Officer; Workforce, Training and Education Directorate of NHS England
  4. Sue Attewell, Head of Edtech, JISC
  5. John Cox, University Librarian, University of Galway

A capacity crowd of 250 had signed up online, and there was a healthy audience in the room in Sheffield.

Slides from the event can be downloaded here. These included updated results from the pre-event survey, which had 68 responses.

This blog is a personal response to the event and summary written by Andrew Cox and Catherine Robinson.

Impact of generative AI

Andrew Cox opened the proceedings by setting the discussion in the context of our culture’s fascination with AI, from ancient Greece, through movies from as early as the start of the 20th century, to current headlines in the Daily Star!

Later in the event, John Cox quoted several authors saying AI promised to produce a profound change in professional work. And it seemed to be agreed among all the speakers that we had entered a period of accelerating change, especially with ChatGPT and other generative AI.

These technologies offer many benefits. Sue Lacey-Bryant shared some examples of how colleagues were already experimenting with using ChatGPT in multiple ways: to search, organise content, design web pages, draft tweets and write policies. Sue Attewell mentioned JISC-sponsored AI pilots to accelerate grading, draft assessment tasks, and analyse open-text NSS comments.

And of course wider uses of AI are potentially very powerful. For example, Sue Lacey-Bryant shared how many hours of radiologists’ time AI was saving the NHS. Andrew Cox mentioned how ChatGPT functions would be realised within MS Office as Copilot. Specifically for libraries, the pre-event survey suggested that the most developed services currently were library chatbots and Text and Data Mining support; but the emphasis of future plans was “Promoting AI (and data) literacy for users”.

But it did mean uncertainty. Nick Poole compared the situation to the rise of Web 2.0 and suggested that many applications of generative AI were emerging and we didn’t know which might be the winners. User behaviour was changing and so there was a need to study this. As behaviour changed there would be side effects which required us to reflect holistically, Sue Attewell pointed out. For example, if generative AI can write bullet-point notes, how does this impact learning if writing those notes was itself how one learned? She suggested that the new technology cannot be banned. It may also not be detectable. There was no choice but to “embrace” it.

Ethics

The ethics of AI is a key concern. In the pre-event survey, ethics were the most frequently identified key challenge. Nick Poole talked about several of the novel challenges from generative AI, such as what is its implication for intellectual freedom? What should be preserved from generative AI (especially as it answers differently to each iteration of a question)? Nick identified that professional ethics have to be:

  • “Inclusive – adopting an informed approach to counter bias
  • Informed & evidence-based – geared towards helping information users to navigate the hype cycle
  • Critical & reflective – understanding our own biases and their impact
  • Accountable – focused on trust, referencing and replicability
  • Creative – helping information users to maximise the positive benefits of AI augmented services
  • Adaptive – enabling us to refresh our skills and expertise to navigate change”

Competencies

In terms of professional competencies for an AI world, Nick said that there was now wider recognition that critical thinking and empathy were key skills. He pointed out that the CILIP Professional Knowledge and Skills Base (PKSB) had been updated to reflect the needs of an AI world for example by including data stewardship and algorithmic literacy. Andrew Cox referred to some evidence that the key skills needed are social and influencing skills not just digital ones. Skills that respondents to the pre-event survey thought that libraries needed were:

  • General understanding of AI
  • How to get the best results from AI
  • Open-mindedness and willingness to learn
  • Knowledge of user behaviour and need
  • Copyright
  • Professional ethics and having a vision of benefits

Strategy

John Cox pointed to evidence that most academic library strategies were not yet encompassing AI. He attributed this to anxiety, hesitancy, ethics concerns, and inward-looking and linear thinking. But Neil Fitzgerald explained how the British Library is developing a strategy. The process was challenging, akin to “flying a plane while building it”. Sue Attewell emphasised the need for the whole sector to develop a view. The pre-event survey suggested that the most likely strategic responses were: to upskill existing staff, study sector best practice and collaborate with other libraries.

Andrew Cox suggested that some key issues for the profession were:

  • How do we scope the issue: As about data/AI or a wider digital transformation?
    • How does AI fit into our existing strategies – especially given the context of institutional alignment?
    • What constitutes a strategic response to AI? How does this differ between information sectors?
  • How do we meet the workforce challenge?
    • What new skills do we need to develop in the workforce?
    • How might AI impact equality and diversity in the profession?

Workshop discussions

Following the presentations from the speakers, those attending the event in person were given the opportunity to further discuss in groups the professional competencies needed for AI. Those attending online were asked to put any comments they had regarding this in the chat box. Some of the key discussion points were:

  • The need for professionals to rapidly upskill themselves in AI. This includes understanding what AI is and the concepts and applications of AI in individual settings (e.g. healthcare, HE etc.), along with understanding our role in supporting appropriate use. However, it was believed this should go beyond a general understanding to a knowledge of how AI algorithms work, how to use AI and actively adopting AI in our own professional roles in order to grow confidence in this area.
  • Horizon scanning and continuous learning – AI is a fast-paced area where technology is rapidly evolving. Professionals not only need to stay up-to-date with the latest developments, but also be aware of potential future developments to remain effective and ensure we are proactive, rather than reactive.
  • Upskilling should not just focus on professional staff, but all levels of library staff will require some level of upskilling in the area of AI (e.g. library assistants).
  • Importance of information literacy and critical thinking skills in order to assess the quality and relevance of AI outputs. AI should therefore be built into professional training around these skills.
  • Collaboration skills – As one group stated, this should be more ‘about people, not data’. AI requires collaboration with:
    • Information professionals across the sector to establish a consistent approach; 
    • Users (health professionals, students, researchers, public etc.) to establish how they are using AI and what for;
    • With other professionals (e.g. data scientists).
  • Recruitment problems were also discussed, with it noted that for some there had been a drop in people applying for library roles. This was impacting not only on the ability to bring new skillsets into the library (e.g. data scientists), but also on the ability to allow existing staff the time to upskill in the area of AI. It was discussed that there was a need to promote the lifestyle and wellbeing advantages of working in libraries to applicants.

Other issues that came up in the workshop discussions centered around how AI will impact on the overall library service, with the following points made:

  • There is the need to expand library services around AI, as well as embed it in current services;
  • Need to focus on where the library can add value in the area of AI (i.e. USP);
  • Libraries need to make a clear statement to their institution regarding their position on AI;
  • AI increases the importance of and further incentivises open access, open licensing and digitisation of resources;
  • Questions over whether there is a need to rebrand the library.

The attendees also identified that the following would be useful to help prepare the sector for AI:

  • Sharing of job descriptions to learn about what AI means in practice and help with workforce planning. It was noted that the RL (Research Libraries) Position Description Bank contains almost 4,000 position descriptions from research libraries, primarily from North America, although there are many examples from RLUK members;
  • A reading list and resource bank to help professionals upskill in AI;
  • Work shadowing;
  • Sharing of workshops delivered by professionals to users around the use of AI;
  • AI mailing lists (e.g. JISCmail);
  • Establishment of a Community of Practice to promote collaboration. Although it was noted that AI would probably change different areas of library practice (such as collecting or information literacy) so was likely to be discussed within the professional communities that already existed in these areas.

Workshop outcome

Following the workshop, Andrew Cox and Catherine Robinson worked on a draft working paper, which we invite you to comment on: Draft for comment: Developing a library strategic response to Artificial Intelligence: Working paper.

Getting comfortable with data: Ideas for explaining some basic data types

By Leah Gustafson

Leah is working for the Languages Data Commons of Australia project, a University of Queensland and ARDC co-funded project building digital infrastructure for preserving language data. The project team provides support to researchers and the broader community around making language data FAIR with CARE.

(This post is in part a summary of an article that first appeared in The Living Book of Digital Skills).

It seems that every which way we turn, the mysterious concept of data is ever present and lurking in the background of our everyday lives. In the professional setting of the library, data is not a foreign concept – we are surrounded by books and journals and often help students navigate the world of information. But being in such close and constant proximity to data can lead to elements of expert bias creeping in despite the best efforts to keep them at bay! This can make it difficult to explain data concepts in simple terms.

João Batista Neto, CC BY 3.0 https://creativecommons.org/licenses/by/3.0, via Wikimedia Commons

So, what are some ideas for demystifying the concept of data for a wide audience? Particularly those who might be exposed to many different types (and potentially without even realising)…

First, the word itself. Remember that in its purest form, data isn’t just digital! Wikipedia defines data as a “unit of information” about a person or object that could be a fact, statistic, or other item of information.

And then to progress to modern contexts (as so much of the data we deal with is digital), the Oxford English Dictionary entry states that it can be “quantities, characters, or symbols” in the form of electrical signals that can be used, stored, or transmitted with computer equipment. 

Once there is an understanding of what data is, trying to explain it further can suddenly become wildly more complicated! It may help to explain that data can be structured, meaning it is organised and ready for analysis. Otherwise it may be unstructured – perhaps because it has just been collected, or because multiple data sources are being combined to create a larger dataset. A dataset is just a collection of data points that belong together – maybe they came from the same source, or maybe they are about the same topic.

Terms that many people will be familiar with are qualitative and quantitative. Qualitative data is an opinion or generalisation about something – for example, a user gives a rating of 5 out of 5 for their experience watching a film. This type of data can be descriptive, be true or false, or give a rank. On the other hand, quantitative data is an objective measurement of something and is generally numerical – for instance, the piece of string is 23 centimetres long. It can also be a count of items or of the number of times something happened: there are 2 dishwashers and 1 cupboard, and the cupboard was opened 46 times today.
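For readers who like to see the distinction in code, here is a minimal Python sketch echoing the examples above (the film review and measurements are invented for illustration):

```python
# Qualitative data: opinions, descriptions, rankings – even when written as numbers.
film_review = "A wonderful, heartfelt story"   # descriptive
film_rating_out_of_5 = 5                       # a rank expressing an opinion

# Quantitative data: objective measurements and counts.
string_length_cm = 23                          # a measurement
cupboard_count = 1                             # a count of items
cupboard_openings_today = 46                   # a count of events

# Counts and measurements support arithmetic in a way opinions do not:
average_openings_per_hour = cupboard_openings_today / 23
print(average_openings_per_hour)  # 2.0
```

The point of the sketch is that only the quantitative values can meaningfully be summed, averaged or compared numerically; the review text would need a very different kind of analysis.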

Adapted from an image by Koen Leemans, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

For data to be used in an analysis, it must be structured in such a way that a computer program can interpret it. For example, data that is output from remote sensing equipment is generally already structured, whereas data that is gathered in a survey where someone’s experience was described would be unstructured.
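As a rough illustration of that difference (the sensor readings below are invented), structured data can be read directly by a program, while unstructured text needs interpretation first:

```python
import csv
import io

# Structured: named columns a program can interpret immediately.
sensor_csv = "timestamp,temperature_c\n2023-01-01T10:00,21.5\n2023-01-01T11:00,22.1\n"
rows = list(csv.DictReader(io.StringIO(sensor_csv)))
average_temp = round(sum(float(row["temperature_c"]) for row in rows) / len(rows), 2)
print(average_temp)  # average of the two readings, 21.8

# Unstructured: the same information as free text – a program cannot
# compute an average from this without extra processing.
survey_answer = "It felt like low twenties in the morning and a bit warmer later."
```

Turning the survey answer into something analysable (coding responses, extracting numbers) is exactly the structuring work mentioned above.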

This has been a very brief introduction to the intriguing world of data! More information can be found in the Chapter 2: Information literacy, media literacy and data literacy – Types of Data section of The Living Book of Digital Skills. There are also some helpful resources that provide more in-depth details about the fundamental data concepts discussed.

Sharing is caring: comment below on techniques and approaches that you find helpful when needing to explain data concepts!

Crypto scams will increase over the holidays – here’s what you need to know to not fall victim

by Ashish Nanda, CyberCRC Research Fellow, Centre for Cyber Security Research and Innovation (CSRI), Deakin University; Jeb Webb, Senior Research Fellow, Centre for Cyber Security Research and Innovation (CSRI), Deakin University; Jongkil Jay Jeong, CyberCRC Senior Research Fellow, Centre for Cyber Security Research and Innovation (CSRI), Deakin University; Mohammad Reza Nosouhi, CyberCRC Research Fellow, Centre for Cyber Security Research and Innovation (CSRI), Deakin University, Deakin University, and Syed Wajid Ali Shah, CSCRC Research Fellow, Centre for Cyber Security Research and Innovation, Deakin University

We loved this article from The Conversation, originally published on 14 November 2022

Each year, as the festive season arrives, we must also keep an eye out for potential scammers trying to ruin the fun. This is because scammers become more active during the holidays, targeting us while we have our guard down.

So far in 2022, Australians have lost around half a billion dollars to scams, which is already significantly more than had been lost by this time last year. The majority of these losses – around $300 million – have involved investment or cryptocurrency scams.

Researchers from Deakin University’s Centre for Cyber Security Research and Innovation had an opportunity to interview recent victims of these scams. Here is what we found.

Anyone can fall for a scam

I was shocked and could not accept that this happened to me although I was very careful […] I was numb for a couple of minutes as it was a large amount of money. – (26-year-old female office manager from South Australia)

These scams have become highly sophisticated and criminals have become less discriminating about whom they target. This is reflected in recent victim demographics, showing a wide variety of backgrounds, a more even distribution across several age groups, and an almost even split on gender.

So, how can you spot these scams and where can you get help if you have fallen victim?

If it sounds too good to be true, it might just be a scam

I was dumbfounded, to say that ground shattered under my feet would be an understatement, it will take me a very long time to recover from it, financially and mentally. – (36-year-old female, legal practitioner from Victoria)

Most crypto scams involve getting the victim to buy and send cryptocurrency to the perpetrator’s account for what appears to be a legitimate investment opportunity.

Cryptocurrency is the currency of choice for this type of crime because it is largely unregulated, difficult to trace, and transactions cannot be reversed.

Victims of such scams are targeted using a number of different methods, which include:

Investment scams: scammers pretend to be investment managers claiming high returns on crypto investments. They get the victim to transfer over funds and escape with them.

“Pump and dump”: scammers usually hype up a new cryptocurrency or an NFT project and artificially increase its value. Once enough victims invest, the scammers sell their stake, leaving the victims with worthless cryptocurrency or NFT.

Romance scams: scammers use dating platforms, social media or direct messaging to engage with you, gain your trust and pitch an amazing investment opportunity promising high returns, or ask for cryptocurrency to cover medical or travel expenses.

Phishing scams: an old but still effective scam involving malicious emails or messages with links to fake websites promising huge returns on investment or just outright stealing credentials to access users’ digital currency wallets.

Ponzi schemes: a type of investment scam where the scammers use cryptocurrency gathered from multiple victims to repay high interest to some of them; when victims invest more funds, the scammers escape with all the investments.

Mining scams: scammers try to convince victims to buy cryptocurrency to use in mining more of it, while in reality there is no mining happening – the scammers just make transfers that look like returns on the investment. Over time, the victim invests more, and the scammers keep taking it all.

Although methods evolve and change, the telltale signs of a potential scam remain relatively similar:

  • very high returns with promises of little or no risk
  • proprietary or secretive strategies to gain an advantage
  • lack of liquidity, requiring a minimum accumulation amount before funds are released.

Where to seek help if you’ve been scammed

I felt helpless, I didn’t know what to do, who to reach out to, I was too embarrassed and just kept blaming myself. – (72-year-old male, accountant from Victoria)

If you think you have fallen victim to one of these scams, here is what you need to do next:

  • inform the Australian Competition and Consumer Commission (ACCC) here or reach out to relevant authorities as per advice on the ScamWatch website
  • reach out to your friends and family members and inform them of the scam; they can also be a source of help and support during such times
  • as these events can have a psychological impact, it’s recommended you talk to your GP, a health professional, or someone you trust
  • you can also reach out to counselling services such as Lifeline, Beyond Blue, Suicide Call Back Service, MensLine, and more for help and support.

If you ever find yourself in a difficult situation, please remember help and support is available.

Finally, to prevent yourself becoming the next statistic over the holiday period, keep in mind the following advice:

  • don’t share your personal details with people online or over a call
  • don’t invest in something you don’t understand
  • if in doubt, talk to an expert or search online for resources yourself (don’t believe any links the scammers send you).

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Open knowledge activism for lifelong learning, independent research and knowledge translation

By Clare O’Hanlon, La Trobe University Library

e: c.ohanlon@latrobe.edu.au

Open knowledge activism in libraries is about more than negotiating transformative agreements and making research available in repositories and open access journals. It also involves helping researchers and students give research back to communities in an accessible and meaningful format for their needs and contexts. Academic library worker support for student and academic digital literacies development, particularly information, media, and data literacies; collaboration; community and participation; and digital creation, problem solving and innovation, plays a crucial role in this. Local public library and community archive and museum workers provide extensive digital literacies, local history, STEM, and creative programming in their communities. Together we can do more to support lifelong learning, independent research, and knowledge translation.

Open knowledge activism by night

Volunteering with the Australian Queer Archives (AQuA) by night to preserve and make research and more knowledge available for and with LGBTIQA+ communities within and beyond the academy in multiple formats (from queer history walks and exhibitions to an Honours thesis prize and beyond) has helped me see that research can be a collective, generative, and transformative process. Our collection and work may not be open in traditional academic “Open Access” ways, and it is not safe for our collection to be completely open to all, but we are open in the inclusive sense of the word. In her Open as in dangerous talk, Chris Bourg illustrates the importance of individual privacy and protection from abuse and harassment, and warns that Open Access publishing can perpetuate existing systems of oppression and inequality and that opening up collections can potentially lead to a loss of context that is then extracted and shared in diverse ways. Bourg’s warnings and my work at AQuA by night motivate me to advocate for the collective, generative, and transformative kind of research and openness in the sometimes extractive and competitive academic environment I work in by day.

The Australian Queer Archives reading room
Australian Queer Archives reading room ready for visitors (author supplied).

Other ways that library workers can support open knowledge activism by night might include participating in learning spaces outside of universities, including but not limited to:

Open knowledge activism by day

Below are some ways I have helped and seen others help support lifelong learning, independent research, and knowledge translation through open knowledge activism by day:

Additionally, we could help connect academics and students with local public library, archive and museum-based STEM, local history, literary and creative programming rather than compete with such programs. Some examples of this public library and related programming include:

We must keep in mind the amount of labour involved in opening up research, translating it into practice, and making it accessible to communities and recognise that this is not always adequately acknowledged and supported. With increasing focus on research impact and engagement, this is changing, and I hope this post will encourage academic and public library workers to collaborate with each other and academics and students to open research with and for communities.

Large protest on Flinders Street in Melbourne with a trans flag and placard with the words 'Change the System' written in rainbow-coloured letters and two Aboriginal flags on it.
Protest in Melbourne (author supplied).

The OER Capability Toolkit – Reflection and Learning

by Frank Ponte, Manager, Library Services (Teaching), RMIT University Library

E: frank.ponte@rmit.edu.au or
LinkedIn: https://www.linkedin.com/in/francoponte/ and Twitter: @ponte_frank

The OER Capability Toolkit

Cover of the OER Capability Toolkit from RMIT

Read and download the OER Capability Toolkit from:
https://rmit.pressbooks.pub/oercapabilitytoolkit/

Eighteen months ago, I formed a team to investigate how we would address OER awareness, adoption, support and capability for teaching staff. We addressed these needs through the development of an OER Capability Toolkit designed for the RMIT University audience but shared openly for others to adapt.

The authoring and development of this work was conducted remotely in a shared Teams environment. The OER Capability Toolkit was published in July 2022. The published work also spawned a set of four open education self-directed modules on the university HR platform for onboarding new staff and professional development, an authoring toolkit, and a style guide. Collectively, these works are the fundamental building blocks of open education knowledge building, all designed to provide the support structure educators require to successfully author an open work.

Building the OER Capability Toolkit allowed me to reflect on the process that was undertaken and share the learning from our project.  

Sustainability

Sustainability is a key driver in the development of an open publication. Educators are tasked with bringing together large groups of authors, and consequently need to ensure clarity and purpose. Therefore, a strong foundation of support is required. The library has provided this through the aforementioned publications, self-directed modules, and the Pressbooks authoring platform. In addition, the library created an open publishing team to reinforce our commitment to open education, streamline the support the library provides, and assign each open textbook project an open publishing team member to provide advice and guidance for a successful outcome.

A publishing workflow

When we embarked on our project to develop the OER Capability Toolkit, our understanding of an open publishing workflow was emergent. In retrospect, it would have been a simpler task if we had a clearer understanding of the fundamental principles, processes and tasks associated with publishing, rather than vacillating between authoring and addressing complex problems. The subsequent emergence of the CAUL publishing workflow now anchors our support for educators and ensures that the seven stages of publishing and associated tasks are addressed at the appropriate time.

Creative Commons licensing

The OER Capability Toolkit is a remix. That is, the publication is a combination of existing Creative Commons resources and original content. Lessons learned include:

  • Ensuring there is an understanding of the license type you are publishing under from the outset. This will determine what resources you have at your disposal and can use in the adaptation process.
  • Knowing a non-derivative license cannot be used in any adaptation.
  • Keeping track of what was used in the adaptation. Doing so assisted in creating the reference list and acknowledging the original resources.
  • Reflecting on your level of comfort with releasing an open work. That is, are you happy for your newly created work to be adapted, remixed, or monetised?

Formative and summative assessments – H5P activities

H5P is a plugin available in Pressbooks which allows the author to create formative and summative assessment tasks for learners. There is evidence to suggest that this kind of interactivity helps learners stay focused and engaged with the content. I wanted to include these activities in the OER Capability Toolkit, as learning and engagement were critical elements in building and delivering this work. The toolkit contains a number of H5P activities used as formative assessment, and presents a summative assessment called the “open pedagogy plan” in Part 5 as the culmination of this learning.

Open publications that contain formative and summative activities have the capacity to be embedded within the context of a broader course curriculum and provide the flexibilities required for educators to engage with open pedagogical practices.

Referencing

Ensure that attribution and citation are clearly defined and articulated from the beginning.  Even though the terms share characteristics, citations and attributions play different roles and appear in different places. A citation allows authors to provide the source of any quotations, ideas, and information that they include in their own work based on the copyrighted works of other authors. It is used in works for which broad permissions have not been granted.

Attribution, on the other hand, is used when a resource or text is released with an open licence. This legal requirement states that users must attribute (give credit to) the creator of the work, including these critical elements at a minimum:

  • Title of the work
  • Author (creator) of the work
  • Source (link) or where the work can be found
  • License of the work
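As an illustration, an attribution covering all four elements might look like the following (the work title, author and source link here are hypothetical):

```
"Open Textbook Cover Illustration" by A. Example, available at
https://example.org/cover-illustration, is licensed under CC BY 4.0
(https://creativecommons.org/licenses/by/4.0/).
```

Keeping attributions in this consistent shape from the first draft makes the final compilation of the acknowledgements far less painful.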

Peer review, front and back matter

Peer review was an important element to get right. We engaged in three rounds of peer review, starting by reviewing each other’s chapters within the authoring group. This exercise provided an initial opportunity to assess grammar, language, and the use (or overuse) of acronyms, and to finesse language and comprehension. The second round involved an external cohort of colleagues from other Australian universities, who provided a similar overview but from an external perspective. A third round was undertaken using a tool called Hypothesis. This tool is a plug-in for Pressbooks and allows for social annotation with students. It is also a useful tool to implement as part of a peer review process. All commentary is contextualised within the chapters, and responses are received by email and easily edited.


Front and back matter was important to include as part of the publication process. Including the front and back matter provided completeness to the work and offered context to the reader. The front matter introduced the new work and helped the reader understand the evolution of its creation and the back matter included a glossary and appendix.

In conclusion

The open education philosophy seamlessly interconnects with RMIT Library’s ethos of sharing knowledge and supporting learning. RMIT Library is well positioned to work with academic staff to create, produce, and disseminate open works via open platforms for maximum impact, and the library as publisher, can lead and shape the transformation of curriculum pedagogy where every learner is supported and valued.

Power BI: Data Wrangling and Fish

Danielle Degiorgio, Digital and Information Literacy Project Adviser, Edith Cowan University Library
Sue Khoo, Librarian (Digital and Information Literacy), Edith Cowan University Library

What is Power BI?

Power BI is a Microsoft data visualisation tool that displays data in an easy-to-read format and allows users to interact and show relationships between different data sets.

Why Power BI?

Our goal was simple: we wanted to connect the mapping of digital and information literacy skills across the course curriculum to the teaching and learning activities we were doing each semester. We just had one problem: we were recording our data and statistics in multiple spreadsheets.

As luck would have it, the 2021 VALA Tech Camp was hosted at Edith Cowan University Library, and we were introduced to Power BI through a series of workshops. Soon after, we decided to use Power BI to help us keep track of student statistics in a more visually appealing way, and because it let us connect multiple sources of data. This meant we could compare, filter, and visualise relationships between multiple spreadsheets, which allowed us – and more importantly our manager – to see our progress across courses.

Power BI: Visualisation of mapped digital and information literacy skills in courses.

Things we got Power BI to do:

  • Connect information from multiple spreadsheets to show how much digital and information literacy skills coverage we have in each course. 
  • Filter and display subsets of data. 
  • Be hosted in Microsoft Teams for ease of access where the report can be shared and displayed as a tab in Teams. 
  • Automatically update data from SharePoint – having all our sheets hosted on SharePoint / Microsoft Teams means we can easily add data into the model. Our Power BI reads directly from SharePoint files and updates at 9am every day.

Skills: What magic do you need?

  • Spreadsheet and table management – Power BI relies on external data. You must have the data cleaned and stored in a data source such as Excel (or databases such as Salesforce or Access).
  • Logic and relationship management – Connections can be 1-1 and 1-many but only one model may exist at a time. If there are conflicts Power BI will complain.
  • Ability to play with formulas and data types – If you need a relationship that isn’t expressed in the Power BI map you will need to learn to write the formula for it.
  • How to put together a graph – Knowing what graph suits your needs, be it a scatter plot, ribbon chart, pie chart, or fish.
  • Professional Google skills – If something goes wrong, be ready to Google it!
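To make the relationship idea above concrete, here is a rough Python analogue of a one-to-many model – one course row relating to many teaching-session rows – using invented course codes and numbers (in Power BI itself this would be a model relationship plus a measure):

```python
# One-to-many: each course appears exactly once on this side...
courses = [
    {"course": "NUR101", "school": "Nursing"},
    {"course": "ENG201", "school": "Engineering"},
]

# ...but can have many teaching sessions on this side.
sessions = [
    {"course": "NUR101", "students": 30},
    {"course": "NUR101", "students": 25},
    {"course": "ENG201", "students": 40},
]

# Aggregate across the relationship, as a Power BI measure would.
students_per_course = {}
for session in sessions:
    key = session["course"]
    students_per_course[key] = students_per_course.get(key, 0) + session["students"]

print(students_per_course)  # {'NUR101': 55, 'ENG201': 40}
```

If a session row had a missing or misspelled course code, it would silently fall out of the totals – exactly the “broken or dirty data” pitfall described below.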

Pitfalls: What to watch out for

  • A lot of trial and error and Googling – No training will prepare you for what you want to do. There may be things you want to do but Power BI only gives you the basic tools. You will have to build what you want from there.
  • Broken or dirty data – Power BI relies on relationships between different tables and inputs to build the model. If a piece of information is missing, and that piece is the connection in the model, Power BI will skip that line. This sometimes resulted in the display not matching our expectations.
  • Know your data story – Power BI does not do data interpretation. You need to know what you want to tell. This is one of the main issues on the final display of information.
  • Permissions – Our shared spreadsheets and the dashboard were stored in places where we didn’t have full access. Arrange the files so each input has the right permissions for SharePoint integration.

How do you get started?

But what about the fish?

The most important thing to remember is to be creative and have fun with your data!  

Power BI: Number of students seen per School using the Enlighten Aquarium visual. Enlighten Aquarium won a people’s choice award for the ‘Power BI Best Visual’ contest in 2016.