Friday, June 27, 2014

Understanding Teaching with Mike Rose

Note: This is a post originally written for an internal Macmillan Education blog on December 7, 2013

More and more, the work we do in higher education publishing necessarily and inevitably includes professional development for the teachers who will use our books and media. The business rationale for this is simple: if teachers understand the value, support, and insight our stuff brings to them and their students, they'll use it more, require it, depend upon it, and students will then be required to log into our software and services, required to do some of their learning with us, required to buy, open, and study from our books. Essential books and software equal profit. Simple. But even more important, our mission as educational publishers is to foster better teaching and learning, because the world requires good teaching and a humane and educated people. By all of us understanding teaching and learning, and then applying that understanding on campus from the first sales call to ongoing training and professional development for adopting professors, we enact the better natures of books and media. We become what we promise professors can be.

Mike Rose, who has three books in Bedford/St. Martin's professional resource series -- two as co-editor; An Open Language is the one solo authored, a collection of his essays -- has the first of three posts in the Washington Post Online looking at the complexities of teaching. He's posting as a guest blogger in Valerie Strauss's The Answer Sheet, a blog that covers education. Here's how Strauss sets up Mike's work:
Here is a thoughtful piece on the essence of teaching and the kind of teacher education programs we really need from Mike Rose, who is on the faculty of the UCLA Graduate School of Education and Information Studies and author of “Back to School: Why Everyone Deserves a Second Chance at Education.”  This is longer than your average blog post but well worth the time, and is the first of three pieces on teacher education by Rose.
Those who know Mike won't be surprised by the quality and kindness of thought. Here is but one excerpt that resonates for me, in the main because it reminds me of my wife, Barbara, a brilliant fourth grade teacher whose job teaching fourth grade is complicated by so many more factors and needs than my occasional job teaching college writing courses.
Teaching done well is complex intellectual work, and this is so in the primary grades as well as Advanced Placement physics. Teaching begins with knowledge: of subject matter, of instructional materials and technologies, of cognitive and social development. But it’s not just that teachers know things. Teaching is using knowledge to foster the growth of others. This takes us to the heart of what teaching is, and why defining it primarily as a craft, or a knowledge profession, or any other stock category is inadequate. I’m not sure there is any other work quite like it.

The teacher sets out to explain what a protein or metaphor is, or how to balance the terms in an algebraic equation, or the sociological dynamics of prejudice, but to do so needs to be thinking about how to explain these things: what illustrations, what analogies, what alternative explanations when the first one fails? This instruction is done not only to convey particular knowledge about metaphors or algebraic equations, but also to get students to understand and think about these topics. This involves hefty cognitive activity, as any parent knows from his or her experiences of explaining things to kids, but the teacher is doing it with a room full of young people—which brings a significant performative dimension to the task.

Thus teaching is a deeply social and emotional activity. You have to know your students and be able to read them quickly, and from that reading make decisions to slow down or speed up, stay with a point or return to it later, connect one student’s comment to another's. Simultaneously, you are assessing on the fly Susie’s silence, Pedro’s slump, Janelle’s uncharacteristic aggressiveness. Students are, to varying degrees, also learning from each other, learning all kinds of things, from how to carry oneself to how to multiply mixed numbers. How teachers draw on this dynamic interaction varies depending on their personal style, the way they organize their rooms, and so on—but it is an ever-present part of the work they do.
The full piece by Mike, which goes on to look at the role of college teacher education programs and the balance of classroom preparation against on-the-job experience, reminds those of us who do professional development that when we work with educators, we are teachers and those educators are students. And what matters so much in our work, then, is following good pedagogy, applying the things our books suggest. So a workshop might open with a prompt or question and then ask people to write for a few minutes on it before discussion: a simple writing-to-learn activity that helps focus the mind on the topic, gives people a private place to think though they're in a public space, assures that there will be things to say when one asks for responses, and helps support learner-to-learner discussion. But what we also need to remember is how people learn, the role of anxiety and aspiration. When we do workshops, our rooms are full of teachers who will have, to borrow from Mike, silences, slumps, aggressiveness, and more.

To understand professional development then, to understand training, we need to understand that we are educators and that what we are doing is as important and complex and demanding and necessary as the work the teachers we are supporting will go forth and do in their classrooms.

Thursday, June 26, 2014

Draft --> Writing Software with Analytics for Writers

Another item originally written for folks at Macmillan Education.

Traci Gardner, who contributes to Bits, a Bedford/St. Martin's blog for teachers, tweeted a link to a ProfHacker entry by Konrad Lawson about Draft, a tool for collaborative writing. Lawson observes,
Draft is designed for drafting and collaborative writing of text. It is not a blogging platform or a live editing environment like Google Docs, and the documents created within Draft are not designed to have their final home there. As it functions now, you write, collaborate, edit, and then export or directly publish the documents to your cloud hosting service such as Dropbox, Evernote, or Google Drive, to a social platform such as Blogger, WordPress, Tumblr, and Twitter or, if you use the Chrome or Firefox additions, directly back into a text box in another browser tab.

I want to draw attention here to the analytics features. Especially this idea, described by Draft's creator, Nathan Kontny:
One mistake I keep seeing people make, when they publish their writing, is that they don't pay enough attention to attributes that might affect how much traction that writing will get.  They'll publish 2000 word posts, when their audience would prefer 500. Or they publish on Friday night, when no one might be paying attention and Monday morning might be a better idea. I wanted to make this type of analysis a lot easier to understand, and help people, including myself, learn what makes our writing get more attention than other writing.

This is the kind of thing -- finding ways to give writers data they can use to make decisions about their writing -- that really allows the writer to adapt; it gives them agency. That's distinct from the kind of adaptive learning that pushes a learner toward one desired, preset end, where the software varies content and activities based on the learner's performance until they reach that end -- where the software is adapting, not the learner.

Note the kinds of things Kontny's trying to help with -- audience awareness, the ability for writers to see differences in drafts. A lot of online writing tools can give statistics on word count, or, in the case of Grammarly, the number of sentence-level errors made. And there's some use in that sometimes, but giving writers information on what in their writing finds an audience, when it's best to post, what the reading level of their writing is (Draft uses the Flesch reading level), and other information lets a writer see things over time. This intranet does some of that, by the way; it reports (click the "Content" tab and then filter to see your own content) which of your contributions have been viewed and how many times, commented on, liked, replied to. That's standard stuff in social networks and should be standard stuff in the learning spaces we make for students. What would up the value on that would be aggregation for totals of all those items, and a view that shows trends over time.
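To make the readability piece concrete, here's a minimal sketch of the Flesch reading ease formula that tools like Draft report. The syllable counter is a naive vowel-group heuristic of my own (real tools use pronunciation dictionaries), so treat the numbers as rough approximations.

```python
import re

def count_syllables(word):
    # Naive heuristic: count vowel groups, subtract a trailing silent 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = max(1, len(groups))
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return count

def flesch_reading_ease(text):
    # Flesch: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Higher scores mean easier prose: short sentences of one-syllable words score above 100, while dense academic prose can dip below 30.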

The other feature of Draft -- the ability for a writer to compare drafts side by side, whether one's own writing or writing being co-edited by fellow writers -- is a powerful tool as well. Teachers struggle to get writers to address global revision, what is sometimes called higher order revision. Students tend to focus on tweaking sentences here or there. A tool like Grammarly or Word's spell checker/grammar checker draws attention to surface-level issues. But a tool like Draft shows changes. That's the kind of thing we need in the writing tools we make for students and teachers. There needs to be a drafting tool that supports version control, to encourage writers to save as or upload drafts. And a tool that not only lets writers compare drafts, but that highlights changes. And not only highlighting changes, but it should also tell writers what percentage of the draft is changed: how many new words were added, how many new sentences were created, new paragraphs, how many deletions were made. The idea would be a tool that can distinguish larger revision from minor editing. And then something that aggregates that so that, over assignments and drafts, a writer sees trends.
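A sketch of how such a revision-summary feature might work, using Python's standard difflib. The function and its percentage measure are my own invention for illustration, not anything Draft actually exposes.

```python
import difflib

def revision_summary(old_draft, new_draft):
    # Word-level diff: tally additions and deletions, then report what
    # share of the two drafts changed -- a rough proxy for separating
    # global revision from sentence-level tweaking.
    old_words = old_draft.split()
    new_words = new_draft.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    added = deleted = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "insert"):
            added += j2 - j1
        if op in ("replace", "delete"):
            deleted += i2 - i1
    percent = 100 * (added + deleted) / max(1, len(old_words) + len(new_words))
    return {"words_added": added,
            "words_deleted": deleted,
            "percent_changed": round(percent, 1)}
```

Aggregated over a semester's drafts, numbers like these could show a teacher whether a student's revisions are substantial or cosmetic.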

Giving writers and writing teachers these views -- evidence-based insights into writing and revision processes -- gives them tools for articulating change and describing growth. A teacher can ask for a new draft and can tell the writer that she wants to see at least a 25% increase in new ideas. Peer reviewers can return to see which of their feedback ideas led to bigger revisions by the writer, which would help reviewers see the value of their work and why it matters.

A Look at the Automated Analysis of Constructed Responses, Or Why Jared is Thin

Note: Like the post on Art Graesser's work, this entry comes from notes taken from a meeting with innovative professors doing great research on using automated response to writing, in this case, writing to learn. One of the privileges of being in publishing is meeting professors with great ideas.

So: Jared, the Subway guy, lost a lot of weight. Where did the weight go?

That's the kind of question that requires biology students NOT simply to recall key concepts, but asks them to explain a process, and unlike a multiple choice question, it does not allow the student to look at choices for hints. It asks them to construct a response that answers the question. So, where did Jared's weight go? I won't tell you the answer, but it was the illustrative question used by the presenters to explore "How Can Automated Assessment of Constructed Responses (AACR) Provide Automatic Evaluation of Written Formative Assessment in LaunchPad?"

The presenters were:
Mark Urban-Lurain, Associate Professor and Associate Director of the Center for Engineering Education in the College of Engineering, Michigan State University, directs the technology development and implementation of AACR.

John Merrill, Director of the Biological Sciences Program, Michigan State U, led the development of the core biology curriculum, and provides disciplinary expertise in the biology portions of the work plan, coordinating with faculty who teach introductory biology courses to implement the materials in their courses.

John began by stating the essential problem: multiple choice questions are not adequate measures of what students know; writing -- in this project, short answers (from one to 40 words or so) -- reveals better student understanding (or misunderstanding). However, in large lecture courses, sometimes with up to four or five hundred students, even just skimming, let alone reading, sorting, scoring, and getting a composite understanding of what students know, doesn't work. So John and Mark embarked on a project to have students write, and then use machine analysis to not only score the writing but sort it by categories, and from those categories reveal where students have misconceptions or misunderstandings, giving instructors two things: a better gauge of student learning and the means to see broad trends, and thus to adjust their teaching to what students need clarification and help with. Currently AACR is an instructor tool: instructors download questions that AACR can score, sort, and report on; instructors deliver the questions to students, extract their answers as a spreadsheet, and upload that spreadsheet to AACR; and the AACR team runs the answers and delivers a report to the instructor. Cumbersome, yes, but the team has won a $6 million NSF grant to both improve how AACR works and -- this is key to the mission -- create a means for faculty development, with advice provided to and from faculty communities of users on what kinds of changes to make to their teaching in response to the data AACR provides.

Attending the presentation also were James Morriss, first listed author of Biology: How Life Works, and Melissa Michael, the lead assessment author on the book. Both James and Melissa concurred on the power of writing to foster learning and to reveal better than multiple choice questions what students know and do not. James told a story about how he sometimes uses both a multiple choice question and a short answer question on the same test, and students will have wildly different takes, getting more right on MC questions but then revealing in their written responses that they don't really know the subject matter, or at the very least cannot articulate it on their own. An MC question might have language that triggers recall of lecture notes or textbook language, but when students are asked to write, and thus required to call that up on their own, things fall apart.

So, as to writing. Note the acronym AACR and the phrase "constructed responses." While we might use the term open ended responses from our use of surveys or assessments we're used to designing, Merrill and Urban-Lurain used "constructed responses" to emphasize two things: first, students have to 'construct' a response, but the questions used are not meant to be open ended in the way survey items like "other" or "tell us more" are; instead the questions seek specific responses to key concepts in the course. And second, when fully considered as a concept in learning, a constructed response might not be text only (though AACR is for text-only written responses) but can include the use of images, artifacts, data, tables, and so on.

Earlier I posted on the meeting we had with Art Graesser and his use of Latent Semantic Analysis. The AACR project doesn't use that approach. Instead, they automate response analysis by doing a word analysis -- the presence of key words, nouns, in students' answers -- and matching the prevalence of those words to prior student answers. The machine is trained to score student writing and to categorize answers according to core ideas in -- for this demo -- biology, so that teachers can see which students are using language that correlates with understanding and which are not, and where students are not, where they are misunderstanding things. The software is built on a program called SPSS, statistical analytic and predictive software now owned by IBM. They chose it in part because it's designed for non-specialists to use but is also robust enough for their purposes. They applied NodeXL, a program that creates associational graphs of Excel data, to produce concept clusters and association graphs. (So a word used a lot has a bigger ball, and a word it appears with a lot in student answers has a thicker line to that word's ball; browse NodeXL graph images to get a sense.) The images you see from NodeXL are more complex than an example from a single question in AACR, where the data is easier to read from the graph, but John Merrill noted that even so, and even for science professors, teaching instructors how to read the graph, understand what it means about student learning, and then come up with a response in their teaching to address what they see is necessary.
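To make the NodeXL picture concrete, here's a small sketch of how word and word-pair frequencies -- the ball sizes and line thicknesses -- could be tallied from a set of student answers. This is my own simplification for illustration, not the AACR pipeline itself.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(answers):
    # Tally how often each word appears across answers (ball size) and how
    # often each pair of words appears in the same answer (line thickness).
    word_counts = Counter()
    pair_counts = Counter()
    for answer in answers:
        words = sorted(set(answer.lower().split()))  # crude tokenization
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return word_counts, pair_counts
```

Fed a few hundred answers, the two counters are exactly the node and edge weights a graphing tool needs to draw the association picture an instructor reads.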

Here's my understanding of how things worked:

SPSS was fed WordNet, an open source dictionary funded by the National Science Foundation. WordNet links/associates words not just by meaning, like a thesaurus, but by concepts and varied meanings -- a richer lexical matrix. Merrill and Urban-Lurain also added to the dictionary terms and meanings particular to biology, so that the additional and occasionally specialized language, or specialized meanings of common language (such as 'mean' in statistics), that students might use in their answers was accommodated.

As student answers were added, SPSS produced a report that pulled and analyzed key words -- nouns -- in student answers and did a first pass at suggesting categories/concepts for those words. Merrill and his team of biology professors then fine-tuned that, correcting the categories and which words should be associated with them. Categories were then associated with concepts key to the course. So, for example, on where Jared's weight went, 37.3% of answers correctly attributed the loss to cellular processes (metabolic rate, calorie burn, and weight leaving as CO2) and another 37% said it had to do with physiological actions (sweat, urine, feces, and other means of departure). Guess which is right, or rather, where the student answers should be clustering? With the results, at a glance and computed more quickly and more accurately than a professor could manage by reading, categorizing, and counting answers by category, a teacher sees whether students have the key concepts down, with greater accuracy even than an easier-to-auto-score multiple choice question. Very powerful information.
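A toy sketch of the kind of keyword-to-category matching described above. The category lexicons here are hypothetical stand-ins I made up for the Jared question, not the vocabularies AACR actually uses.

```python
from collections import Counter

# Hypothetical category lexicons -- illustrative only.
CATEGORIES = {
    "cellular": {"co2", "metabolism", "metabolic", "calorie", "respiration"},
    "physiological": {"sweat", "urine", "feces", "excretion"},
}

def categorize(answer):
    # Assign an answer to the category whose lexicon it overlaps most.
    words = set(answer.lower().split())  # crude tokenization
    scores = {cat: len(words & lexicon) for cat, lexicon in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"

def cluster_report(answers):
    # Percentage of answers in each category: the at-a-glance view
    # an instructor gets.
    counts = Counter(categorize(a) for a in answers)
    return {cat: round(100 * n / len(answers), 1) for cat, n in counts.items()}
```

Run over a class's answers, the report is the "37.3% cellular / 37% physiological" breakdown in miniature.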

Questions were drafted and student responses were uploaded. In the sample question we studied, 374 student responses were gathered and two things happened: first, human readers applied a rubric and scored those; then the answers, categories, and concepts were tweaked so that SPSS could give a predictive score -- note the word predictive -- that says, essentially: based on the vocabulary we see, the software predicts a human reader, reading the full answer, would give the answer score X. Over time that prediction matched human scorers in the 83% or higher range for high and low scores (of the three levels the humans used) but matched only 43% on mid-range scores (where humans also show the widest variance).

The labor intensity is in question authoring and adding vocabulary (though both of these would subside if more instructors used the program and added material -- one of the goals of the NSF grant, a kind of crowd sourcing to get more questions in). The labor also comes in establishing predictive outcomes from SPSS that match the scores of normed human graders (humans trained to apply a rubric consistently, so readers give the same, or nearly the same [depends on the rubric], score on a given sample of student constructed response). It took 374 items scored by humans, for example, to account for the range of responses and lexical variation of student work. That's a lot of norming for one question. Multiply that by just five per chapter to go with a book, and one can see a tremendously labor intensive process.

But that said, consider the pedagogical benefits and outcomes possible, and the ability to perhaps adapt the machine to not only score and give a report to an instructor, but to also give information directly back to students -- adaptive information (so imagine a LearningCurve made of "constructed responses" instead of just multiple choice questions) -- and you can see where this might go.

Right now the technology is young, despite ten years of research, and the NSF grant is only six months or so (out of five years) in. So there's time to see where this goes and what experiments the biology team can try. My own concern is for the humanities, where sales of our textbooks, which are the point of sale to pedagogy, are lower, significantly lower, and so the labor to build the questions that the current methodology uses would be something we couldn't afford. But boy, they got a lot right, and it'll be cool to see where this goes and whether biology and other science books in MHE can do experiments.

Art Graesser and MITSC

A colleague at Macmillan Education arranged a meeting in New York for editors to meet with Art Graesser, a psychology professor at the University of Memphis who researches and designs in the Memphis Intelligent Tutoring Systems Center (MITSC), which is part of the Advanced Distributed Learning Center for Intelligent Tutoring Systems Research and Development (ADL-CITSRD), a government partnership located in the FedEx Institute of Technology (FIT, and yes, there will be an acronym test at the end of this post; it is 50% of your grade). The purpose of the meeting was to learn about different approaches to automated writing assessment.

On the way to discussing automated assessment of writing, Art described some other projects from MITSC:

AutoTutor, where, as a student works at a self-paced tutorial, two "agents" -- software coded to track what the student is doing (or not doing) -- are triggered by student actions in the tutorial. So a student might make a mistake in identifying a key idea, and the first agent might trigger a text or audio message asking the student a question. The second agent might comment on the first agent's question or on the student's response, creating a kind of learning dialog among the two agents and the student about the item under study. The Turing Test becomes Turing Tutors. Now, if this sounds bizarre, wait: the research from Art and his team shows that students who study in the tutoring software do slightly better on shallow knowledge (recall, definition, summary) of content than students who read only, but do significantly better on deeper knowledge (reasoning, synthesis, and communication). The acts of dialog, of drawing student attention to thinking in new ways, of answering or at least considering the questions the agents pose, lead to deeper learning.

That's not surprising on the face of it, but what's powerful is the creation of software that helps a lone learner come to the kind of deeper engagement necessary for deeper learning.

And that's the nub of Art's work -- deeper learning through deeper engagement via dialog and writing.

Art and his colleagues at Memphis are also doing research with the Center for the Study of Adult Literacy (CSAL) at Georgia State University, which, like MITSC, has won grants from the Institute of Education Sciences (IES), a federal initiative that studies learning sciences. I'm linking to both CSAL and IES because they're sites worth visiting, doing a lot of useful research that we can draw on for validation and direction of editorial initiatives.

On writing, one of the first things you'll want to check out is the work MITSC has been doing with cohesion metrics of written text. To quote from the project, it involves the creation of a tool that generates "Automated Cohesion and Coherence Scores to Predict Text Readability and Facilitate Comprehension," affectionately dubbed, and this will be on the test so pay attention, Coh-Metrix. To put this in simpler terms, Coh-Metrix measures readability, only in ways more sophisticated than Lexile, Gunning-Fog, and, perhaps best known (because it's built into MS Word), Flesch-Kincaid. You can get a good explanation of what Coh-Metrix measures on the project's site, but what you might want most to do is go to their Text Easability Assessor (TEA), create an account, and have a TEA party* with some prose of your own.

So. On to auto assessing writing. In the discussion, Art described three broad ways to auto assess text:
  1. Compare the text to an ideal and score it for how close it comes -- We can do this crudely already with one-word answers, for example. A student writes a word into an answer, and if it matches the word we've designated as correct (correctly spelled, so our limited engine can fully compare), the student gets full points for the question. The software used in automated assessment allows answers more sophisticated than a single, correctly written word and more nuanced scoring than right or wrong.
  2. Using a cluster of answers and mapping to them. That is, instead of comparing to a single ideal, a range of responses -- A, B, C, and D answers -- might be available, and student submissions are compared for features that match somewhere in the cluster. So if the writing has features associated with an A -- vocabulary, length, and other measures -- the writing is scored an A, and so on.
  3. C-Rater level (C-Rater is an ETS tool that we see in use in writing courses as Criterion). Here, the software is trained on prompts and a corpus of sample student writing in response to those prompts. The prompts are designed so that submissions will fall into the range of samples given (so a bit of 1 and 2 above happens), but in addition to using that corpus as a tool and way to do the analysis, C-Rater also uses Latent Semantic Analysis, a means of analyzing the submitted text in more sophisticated ways.
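The first approach can be sketched very simply as bag-of-words similarity between a submission and a single ideal answer. This is a crude stand-in of my own for the more sophisticated engines described above, but it shows the basic move: turn both texts into word-count vectors and measure how close they are.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_against_ideal(answer, ideal):
    # Approach 1: score a submission by its similarity to one ideal answer.
    return cosine(Counter(answer.lower().split()),
                  Counter(ideal.lower().split()))
```

Approach 2 is the same idea pointed at a cluster of A, B, C, and D samples instead of one ideal; approach 3 replaces the raw word counts with a semantic representation.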

The psychology team and the biology team have done experiments with MITSC using the Latent Semantic Analysis (LSA) engine they've designed. The process went something like this:

1. A textbook was turned over to MITSC as .txt files.
2. Those files were scanned and described for the latent semantic features using the engine.
3. Wikipedia's entries on psychology were also scanned, to extend the corpus and to provide a richer semantic matrix. This creates an LSA space: a corpus of writing that student answers to short answer questions are analyzed and scored against.
4. In an experiment, six questions were used, and student answers were auto-scored and compared to human scores. Now, this process had some significant steps I won't get into, and the answers were short -- 20-100 words or so. But the results were interesting in two ways:
A. The scores on the first three test questions, the ones used to train the machine to score, matched human raters.
B. The machine was able to score the second three questions accurately, without being trained on them. That is, the algorithm designed on the first three test questions carried over and worked on the second set of three.
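For the curious, here's a bare-bones sketch of what building an LSA space involves: make a term-document matrix from the corpus (the textbook plus Wikipedia, in the experiment above), reduce it with SVD, and compare projected answers by cosine similarity. This is an illustration of the general technique, not the MITSC engine.

```python
import numpy as np

def lsa_space(corpus_docs, k=2):
    # Build a term-document count matrix and reduce it with SVD;
    # the top-k left singular vectors give word vectors in the "LSA space."
    vocab = sorted({w for doc in corpus_docs for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(corpus_docs)))
    for j, doc in enumerate(corpus_docs):
        for w in doc.lower().split():
            matrix[index[w], j] += 1
    U, S, Vt = np.linalg.svd(matrix, full_matrices=False)
    return index, U[:, :k]

def project(text, index, word_vectors):
    # Fold a student answer into the reduced space.
    counts = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            counts[index[w]] += 1
    return counts @ word_vectors

def similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

An answer would then be scored by its similarity to the projections of known strong (and weak) answers, which is why the richness of the corpus matters so much.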

The potential here is tremendous: the ability to deploy, in all texts, questions that draw on the text being read, scored for accuracy against a range of answers. Imagine this in a system where questions occur as students read, engaging them at key points to help them pause. And imagine the data that might result, where a student sees not only how they did but what class averages are. Imagine a tutoring agent stepping in from time to time as answers come in, helping students learn more deeply based on the answers. That is, with a well designed automated assessment tool, one doesn't have to give a score alone; the score could trigger a dialogic tutoring agent, or suggest a study group of fellow students to chat with, or trigger a message to an on-campus tutor.

It's not the score so much as what is done with the score to foster further learning. The key is that the score comes from writing, an act that research shows deepens learning more fully than just reading, just highlighting, or just multiple choice (even in LC) answering.

My understanding is that these questions were designed for convergent thinking (see the graphic in "Faultless Facilitation – Leveraging De Bono's Six Thinking Hats" on that) and not divergent.

There's more, but I'm out of time.

*I apologize

in praise of typo, errata, and error

The URL leads to a piece by Adrienne LaFrance, "A Corrected History of the Typo," with the subtitle, "In the beginning, print was not about perfection; it was a space for collaboration."

LaFrance interviews "Adam Smyth, an English literature fellow at the University of Oxford who specializes in the instability of early modern texts," and he walks her through early printing, the role of errata, and how the relationship of early writers, printers, and readers coalesced around the expected excavation of and discussion around error. LaFrance steers the discussion with Smyth to how error is treated online, where it often is erased in corrections (though wikis, most notably, keep versions). Together they observe, and this is just a sample:
Errata lists in the early days of printed books, then, were themselves a sort of early comment section—the place where revisions were made and ideas were exchanged. They were "confessional spaces" and "emblems of a new culture of accuracy," but errata lists were also a way of seeing books as a collaboration between reader and writer, rather than just the one-way broadcasting of a set of ideas. Which means that print, in its infancy, didn't actually lead to "better, more accurate texts," but to "the dissemination of blunders," Smyth says. It is in this way that the dawn of book printing sounds a bit like where we find ourselves today on the Internet—a fluid and collaborative space for ideas that sometimes seems to be equal parts information-rich and error-riddled. The difference in early print, though, is that errors "were not hidden away." And while screengrabs capture some evaporated Internet writing for posterity, much of what's published today simply disappears or changes with all the imperceptibility of a distant keystroke.
As a writing teacher, I'd assign this to my students. It's useful for exploring collaboration, for thinking "about tolerating, rather than eliminating, reasonable mistakes," about the social life of information, about the transition into new technologies and how the new starts out by aping what came before it. 

But on the idea of "reasonable mistakes," it would be especially useful for weaker, less confident writers whose fear of error might make it harder to draft, and as well for a discussion of peer review and workshopping, where it might be possible to create a practice of not calling error out for correction but using it as an occasion for discussion and exploration. 

Too, the other lesson to be drawn from this is one wikis teach as well -- reminding writers to use save-as and other techniques for storing drafts, preserving versions of work with errors intact. The goal for most writers will remain, on most occasions, to publish (or in a course, to turn in a final draft) with as little error as possible, all the way to no errors at all. But the point is, as Paul Krebs notes in another piece I'd assign writers, that error will happen, is in fact necessary for writing to proceed, for thinking to improve, for learning to happen.

Error is a good thing, and to hide it and treat it as a shame, to shame writers who make errors, is to shame writing.