Vivian Miranda Academic Website - Phy504

Why take Phy504 in the era of AI?

PHY 504 gives students a view of how programming and computers work that is difficult to build through improvisation. In scientific work, students move between terminal sessions, remote machines, compilers, numerical libraries, legacy code, and modern abstractions. At the same time, they cannot lose sight of the underlying physics, and this multitude of required competences is precisely what makes computational research difficult. Students must understand the physical model, the approximations, and the numerical goals, while also grasping how the code is run, how data are handled, how memory behaves, and why a program may produce the wrong answer. PHY 504 turns the computational side from a constant source of uncertainty, one that sends students to AI prompts in search of patchwork answers, into a set of tools and concepts they can use deliberately. That leaves them in a much better position to focus on the science itself.

PHY 504 builds a more effective relationship between students and AI prompts. AI tools are most effective when guided by a user who already understands the subject of the question. Good answers begin with creative but precise questions, which are usually written by people with enough background to see what is essential and what could go wrong. This matters all the more because AI companies design their interfaces to avoid angering users. If a user requests a quick fix without understanding how it fits into the larger pipeline and which bugs it can introduce, the AI will follow the request blindly (at best with some light pushback).

Scientists should not “vibe code” their way through serious computational work. Used well, AI can accelerate learning, clarify syntax, compare options, and help debug a local issue. Without a solid foundation, however, students can easily accept plausible-looking answers that hide numerical errors and memory bugs, many of which stem from misunderstandings of the language and of the tools and libraries they import. Students may doubt that AI will introduce bugs, given how capable it has become at software development, to the point that some consider it beyond human level. The key fact is that the AI sees only what is in the prompt, and students are the ones who decide what to include in the question and how to phrase the request. PHY 504 gives students the knowledge needed to use AI as a tool that becomes more powerful precisely because they already know enough to question, evaluate, and refine what it produces.

Agents require a separate discussion. They expand both usage and capabilities, and the fact that they can access an entire codebase could be a counterargument to my view of the benefits of PHY 504. Even with access to the entire codebase, however, agents still cannot produce new code out of thin air: users must be precise about what they request and should review every change with a tool such as SmartSynchronize. I have never used agents myself, but I have heard some impressive stories about them. A key consideration, however, is cost. Letting agents work on a problem for days, weeks, or even months will certainly carry astronomical costs, both financial and environmental.

How do we grade in PHY 504? Students are free to use AI however they see fit on the homework, which is not graded, and we provide guidance on best practices. However, I give students three written exams (one on Bash, one on C, and one on C++) during which they may consult unannotated printed copies of all the slides but have no access to AI. Exam questions are taken from the homework, so the students who benefit most are those who used AI to learn the subject. There is enough homework that memorizing every solution is nearly impossible, so developing a deeper understanding is necessary.

Phy504: course description

This course is intended for graduate students in physics, chemistry, and related fields who want a firmer computational foundation. Much of scientific work now depends on software, but many students are asked to use it before they have been taught how it is built, how it runs, or how it fails. The course addresses that gap: it begins with the Unix shell, moves through the C language, and concludes with an overview of old and modern C++. These three parts provide a practical introduction to the environments, languages, and habits that underpin much of modern scientific computing.

We include an opening section on Bash because many students are still uncomfortable in a terminal, and that discomfort becomes a real limitation when they work on a remote supercomputer. The course treats the shell as a practical research tool. Students learn how to navigate a filesystem, work safely with files and directories, use remote access tools such as SSH and SCP, consult manual pages, write simple scripts, and combine small commands into pipelines that perform useful work. The aim is not to turn students into system administrators, but to make the terminal a normal part of their scientific workflow. PHY 504 shows how common tools such as grep, find, cut, sort, uniq, tr, sed, and curl can be combined to search files, manipulate text, inspect data, and build small but effective workflows.

Once students are comfortable enough with the terminal, the course turns to the C language. In C, nothing essential is hidden for long. Students must confront the meaning of types, the effects of integer division, the limitations of floating-point arithmetic, the structure of arrays, the consequences of memory layout, and the reality of bugs that arise not from syntax alone but from incorrect assumptions about how the machine behaves.

PHY 504 pays careful attention to issues that matter in real computation: overflow and underflow, cancellation, casting, scientific notation, signed and unsigned arithmetic, uninitialized variables, out-of-bounds access, segmentation faults, memory leaks, and undefined behavior. These issues are part of the day-to-day reality of scientific programming. The course also emphasizes treating compiler warnings as part of scientific reasoning rather than mere noise. Good use of compiler flags, assertions, exit codes, and defensive checks is presented as ordinary practice.

The C section does not remain at the level of toy examples. It moves into the material that students are likely to encounter in serious code. Particularly important is the treatment of pointers and memory. Many students in the sciences can use numerical software for years without a clear picture of what stack memory, heap memory, pointer aliasing, shallow copies, or deallocation errors actually mean. This course teaches them directly, with concrete, sometimes surprising examples. By doing so, it gives students a vocabulary for understanding what their programs are doing and why certain classes of bugs are difficult to diagnose.

The course then extends that foundation into C++, but does so in a way that reflects the realities of research computing rather than the conventions of a standard software engineering curriculum. Scientific C++ is rarely encountered as a single, uniform, up-to-date language. Students inherit code written over many years by their advisors in a mixture of styles: older C++98-era conventions, C++11/14 idioms, more recent C++17 or C++20 features, and occasionally cutting-edge C++23/26. The course is designed with that fact in mind. Its goal is not to enforce one exclusive model of “correct modern C++,” but to give students enough fluency to read across standards and enough judgment to recognize the same underlying idea when it appears in different forms.

The covered topics are broad by design: namespaces, overloading, tuples, structured bindings, optional values, maps, vectors, lists, iterators, STL algorithms, ranges, references, lambdas, strings, I/O, classes, inheritance, memory management, move semantics, and templates all appear. The point is not to expect mastery of every feature; Phy504 gives students enough working familiarity to approach unfamiliar code without fear. In a research environment, that ability matters greatly, as we rarely have the luxury of beginning from a blank file; one must learn to read, adapt, and extend what already exists.

The course is shaped by practical considerations. Students use a Docker-based environment to reduce the difficulty of installing compilers and libraries across different systems. The use of libraries such as GSL, and of formatting and tooling support in later C++ examples, is not incidental: scientific computing almost always depends on ecosystems rather than on a language in isolation. The goal is for students to become comfortable not only with syntax but also with the habits of computational work: reading documentation, interpreting compiler messages, managing environments, checking assumptions, and using tools without becoming dependent on them.

PHY 504 does not promise to make students experts in a single semester. What it aims to do is more fundamental: to replace mystery with a working understanding. For students who expect computation to play a serious role in their research, that is a worthwhile foundation.

Bash Section (7 lectures)

The C Language (8 lectures)

The C++ Language (11 lectures)

Additional Topics (Future studies)

So, you have graduated from PHY 504. Congratulations. Now what?

PHY 504 is not the end of computational training; it is a foundation for continued study. Students who want to be effective in graduate research will eventually need additional tools, especially when working with large numerical codes, high-performance computing systems, machine-learning emulators, or legacy scientific software. For this reason, students are always welcome to continue their studies during the summer, and I gladly offer supervision to those who want to go deeper.

A few topics that commonly appear in my research, and that naturally continue the training started in PHY 504, are:

Python programming. Students should learn to write Python that is readable, concise, and fast enough for scientific work. This includes understanding NumPy arrays, memory layout, vectorization, profiling, and the use of scientific libraries.
The PyTorch library. PyTorch is central to many machine-learning applications in physics, including emulators and accelerated numerical experiments. Learning PyTorch properly requires more than copying neural-network examples. Students need to understand tensors, devices, automatic differentiation, batching, optimization loops, training diagnostics, and numerical failure modes in scientific pipelines.
Shared-memory parallelization in C/C++ with OpenMP. Many scientific codes are first accelerated by using multiple cores on a single machine. OpenMP is often the simplest way to parallelize loops, but using it correctly requires understanding race conditions, reductions, scheduling, memory locality, false sharing, thread oversubscription, and why parallel code can become slower or less reproducible.
Distributed-memory parallelization in C/C++ with MPI. Larger calculations often require many nodes on a cluster, where each process has its own memory. MPI introduces communication, synchronization, domain decomposition, collective operations, and scaling. Students should learn not only how to launch an MPI program, but also how to reason about whether the algorithm itself can scale.
GPU programming in C/C++ for scientific calculations. GPUs are not only for AI. Many physics calculations can benefit from GPU acceleration, but this requires understanding memory transfers, kernels, thread hierarchy, latency hiding, and the structure of calculations that actually run efficiently on GPU hardware.
Makefiles and CMake. Students should learn how to write clean Makefiles, manage compiler flags, dependencies, libraries, include paths, debug builds, optimized builds, and eventually use CMake for larger projects. A good build system is part of the scientific instrument.
Git version control. Many scientific groups rely on large code bases developed by multiple collaborators over many years. Students should learn commits, branches, merges, rebases, tags, pull requests, conflict resolution, and project history, with the goal of making changes that are small, reviewable, reversible, and understandable to others.
Fortran. Much of computational physics still depends on Fortran, and cosmology is no exception. Codes such as CAMB, a Boltzmann code widely used in cosmology, are written in Fortran. Students do not necessarily need to write new large Fortran projects, but they should be able to read Fortran code, modify it safely, understand its array conventions, compile it, and link it to C/C++ or Python.

I will gradually provide lecture notes for each of these additional topics.

The PyTorch Library


Phy504 - Computational Physics