Syllabus

The goal of this course is to introduce you to the field of binary program analysis. This is the art and practice of understanding the undocumented, internal principles of software that does not have source code available. We will be studying how program analysis techniques can be used to tell us things about how a program will behave, find vulnerabilities, and understand internal implementation details–all without access to source and with minimal human intervention.

As a secondary goal, this course aims to give you robust skills in both reading and writing academic research papers. To that end, we will be reading ~3 papers per week (though sometimes we will read a book chapter instead) and writing regularly. At the end of the semester, you will propose, research, and write a scientific paper related to some aspect of program analysis, reverse engineering, or software testing.

Note that this is primarily structured as a graduate research course. I realize that roughly half of you are undergraduates–that's okay! I will take it into account when approving and grading the research projects.

Grading

Grading is structured into three main components:

Weekly writing assignments (30%)
In-class participation (10%)
Final research project and presentation (60%)

Each of these components is elaborated on below.

Weekly Writing Assignments

To do research you must not only have technical skills but also be able to effectively communicate your results. The best scientists are also journalists, exploring a topic and then reporting on it in a way that is easy to read and understand. Even if you decide not to do research in computer science, facility with technical writing will serve you well in job applications, writing an email to your boss arguing for a particular technical approach or project, or just proving people wrong on the Internet.

To practice the mechanics of good scientific writing, each week you will submit a three paragraph write-up about that week's reading. The first two paragraphs should summarize the main themes of the reading: what problems the papers are trying to solve, how they solve them, and how effective their solutions are.

The final paragraph should propose an idea for additional research in the same area as the reading. You won't be graded on the quality of these research ideas; it's just meant to get you used to thinking about published research as a jumping off point for new work.

Each of these will be graded on a pass/fail basis. Grammar and spelling mistakes, awkward phrasing, repetitive word choice and sentence structure, and unnecessary verbosity will result in failing that week's writing assignment. However, I will provide detailed feedback, and if you submit a corrected version by the next class I will change your grade.

If you want to improve your writing, here are a few recommendations:

Read widely. The best way to get an ear for how good English prose sounds to read a lot of it. When you find a piece that's unusually enjoyable to read, try to figure out what stylistic choices the author made that contribute to that effect.
Get a guide to grammar and style. Strunk and White's The Elements of Style is a slim, cheap book that I've found very useful as a basic guide to clear writing. I've also heard that Steven Pinker's The Sense of Style is very good, but haven't had a chance to read it.
As you read the assigned papers, again look not only at the content but at the presentation. How are papers structured? What makes writing harder or easier to understand (aside from the inherent difficulty of the topic)?
Practice!

In-Class Participation

During lecture I will periodically pick a name from a hat and call on that person to answer some question about one of the papers in that day's reading. If you can provide a reasonable answer, you get participation credit. If you aren't there or your answer demonstrates that you didn't try to read and understand the paper, you get no credit.

Note that it's okay to provide an answer that indicates you read the paper but didn't understand the specific thing I'm asking about. Particularly early on, you won't always understand every detail of a paper, and learning how to efficiently skim and get the gist of what's going on is an important skill in itself.

Final Research Project

The final research project will consist of a conference-length CS research paper and an accompanying presentation. We will prepare for this throughout the semester by talking about how to write an abstract, related work section, etc. You may work in groups of up to three people.

Project ideas can be found here.

Schedule

Week 1
- Introduction and overview
- Course format
- Some basics of binary analysis
Week 2
- Theoretical background of static analysis
- Impossibility proofs
- Papers:
  
  William Landi. Undecidability of static analysis. ACM Letters on Programming Languages and Systems (LOPLAS). Volume 1 Issue 4, Dec. 1992.
Week 3
- Static Analysis I: CFG, dominance, CDG, PDG, SSA
- Papers:
  
  Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS). Volume 9 Issue 3, July 1987.
  
  Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (TOPLAS). Volume 13 Issue 4, Oct. 1991.
  
  Optional: Compilers: Principles, Techniques, and Tools Chapter 9.
Week 4
- Static Analysis II: Interprocedural analyses, program slicing
- Papers:
  
  Mary Jean Harrold and Mary Lou Soffa. Efficient computation of interprocedural definition-use chains. ACM Transactions on Programming Languages and Systems (TOPLAS). Volume 16 Issue 2, March 1994.
  
  Mark Weiser. Program slicing. Proceedings of the 5th international conference on Software engineering (ICSE). 1981.
- Code to look at:
  - Control dependence graph: https://github.com/angr/angr/blob/master/angr/analyses/cdg.py
  - Data dependence graph: https://github.com/angr/angr/blob/master/angr/analyses/ddg.py
  - Slicing: https://github.com/angr/angr/blob/master/angr/analyses/backward_slice.py
Week 5 & 6
- Static Analysis III: Value Set Analysis, Points-to Analysis
- Paper:
  
  Gogul Balakrishnan and Thomas Reps. WYSINWYX: What you see is not what you eXecute. ACM Transactions on Programming Languages and Systems (TOPLAS). Volume 32 Issue 6, August 2010.
- Code:
  - Value flow graph: https://github.com/angr/angr/blob/master/angr/analyses/vfg.py
  - Strided intervals, a-locs, and value sets: https://github.com/angr/claripy/tree/master/claripy/vsa
Week 7
- Static Analysis V: Symbolic execution
- Papers:
  
  James C. King. Symbolic Execution and Program Testing. Communications of the ACM, Volume 19 Issue 7, July 1976.
  
  Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. Proceedings of the 8th USENIX conference on Operating systems design and implementation (OSDI). 2008.
  
  Food for thought: why would there be a gap of 30 years between the first symbolic execution work and when it became mainstream?
Week 8
- Static Analysis IV: Type inference, decompilation
- Papers:
  
  JongHyup Lee, Thanassis Avgerinos, and David Brumley. TIE: Principled Reverse Engineering of Types in Binary Programs. Proceedings of the Network and Distributed Systems Symposium (NDSS). 2011.
  
  Edward J. Schwartz, JongHyup Lee, Maverick Woo, and David Brumley. Native x86 Decompilation Using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring. Proceedings of the USENIX Security Symposium. 2013. Video is available.
  
  Khaled Yakdan, Sebastian Eschweiler, Elmar Gerhards-Padilla, and Matthew Smith. No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantics-Preserving Transformations. Proceedings of the Network and Distributed Systems Symposium (NDSS). 2015. Slides are available.
  
  Note: We're skipping over some historical work by, e.g., Christina Cifuentes. Unfortunately, the best available resource for her work on decompilation is her PhD thesis, which is a bit too long to pack in with two other papers. It's highly recommended if you're interested in decompilation, though!
Week 9
- Dynamic Analysis I: Instrumentation, debuggers, and tools
- Papers: None
Week 10
- Dynamic Analysis II: Dynamic taint analysis, dynamic slicing
- Papers:
  
  James Newsome and Dawn Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Proceedings of the Network and Distributed Systems Symposium (NDSS). 2005.
  
  Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution (but might have been afraid to ask). Proceedings of the IEEE Symposium on Security and Privacy (Oakland). 2010.
- Homework: Related work for your project. 15 references minimum. See lecture slides for details.
Week 11
- Dynamic Analysis III: Record & Replay, multi-path execution
- Papers: TBD
Week 12
- Dynamic Analysis IV: Bounds checking, use-after-free detection, sanitizers
- Papers: TBD
Week 13:
- Obfuscation: Packing/Unpacking, Opaque Predicates, Control Flow Flattening, Virtualization Packers
- Papers: TBD
Week 14
- Final Presentations I
Exam period:
- Final Presentations II

Cooperation Policy

The paper writeups must be done on your own. For the final research projects, you can work in groups of up to 3.

Moses Center Statement of Disability

If you are student with a disability who is requesting accommodations, please contact New York University’s Moses Center for Students with Disabilities at 212-998-4980 or mosescsd@nyu.edu. You must be registered with CSD to receive accommodations. Information about the Moses Center can be found at www.nyu.edu/csd. The Moses Center is located at 726 Broadway on the 2nd floor.