Difference between comps in git repo and package repo

I was curious if anyone knew the difference between the git comps file

and the one found in the repo

https://dl.rockylinux.org/pub/rocky/9.3/BaseOS/x86_64/os/repodata/d20e81bf-d915-4b6d-b1cd-6a76cb8110aa-GROUPS.xml
https://dl.rockylinux.org/pub/rocky/9.3/AppStream/x86_64/os/repodata/956e246f-d2aa-4806-8fe8-b8ef9cb06d33-GROUPS.xml

From what I can tell, the comps from git tell the installer what to install, and the comps from the repo determine what gets installed for dnf groups. I wrote a script that pulls every package out of every group, and I saw that some packages were in the git comps but not in the repo comps, though some of those seem to be arch-specific. I haven't gone through every line, but there definitely were differences.

So my question is: what is the difference, and if I were to install “Server”, which file would be the authority on which packages get installed?

The comps are used by certain tools (such as pungi) to group packages into functional groups. Comps is not only part of the repo data, but it also can help serve as one end of dependency solving when forming repositories.

You have groups and environment groups.

A group is a collection of initial packages. Many of these groups are split between BaseOS and AppStream. BaseOS, while it can live by itself in some cases, may have groups whose packages also exist in AppStream. This means that BaseOS will not list AppStream packages, and both repositories have to be enabled to access the packages of both. This is why both BaseOS and AppStream are enabled by default.

An environment is a collection of groups, whether mandatory or optional.
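As a rough illustration of how an environment points at groups, here is a minimal Python sketch. The `<environment>`, `<grouplist>`, `<optionlist>`, and `<groupid>` elements are part of the comps XML format, but the sample document, its ids, its package names, and the helper `expand_environment` are made up for this example and are not taken from the Rocky comps.

```python
import xml.etree.ElementTree as ET

# Made-up comps fragment; ids and package names are placeholders.
SAMPLE_COMPS = """<comps>
  <group><id>group-one</id><packagelist>
    <packagereq type="mandatory">pkg-a</packagereq>
  </packagelist></group>
  <group><id>group-two</id><packagelist>
    <packagereq type="default">pkg-b</packagereq>
  </packagelist></group>
  <environment>
    <id>example-environment</id>
    <grouplist><groupid>group-one</groupid></grouplist>
    <optionlist><groupid>group-two</groupid></optionlist>
  </environment>
</comps>"""


def expand_environment(comps_xml, env_id):
    """Hypothetical helper: map an environment to its groups and their packages."""
    root = ET.fromstring(comps_xml)
    # Index every group by id so the environment's group ids can be resolved.
    groups = {
        g.findtext("id"): [(p.text or "").strip() for p in g.iter("packagereq")]
        for g in root.iter("group")
    }
    for env in root.iter("environment"):
        if env.findtext("id") != env_id:
            continue
        mandatory = [(g.text or "").strip() for g in env.findall("grouplist/groupid")]
        optional = [(g.text or "").strip() for g in env.findall("optionlist/groupid")]
        return {
            "mandatory": {gid: groups.get(gid, []) for gid in mandatory},
            "optional": {gid: groups.get(gid, []) for gid in optional},
        }
    raise KeyError(env_id)


print(expand_environment(SAMPLE_COMPS, "example-environment"))
```

A group id that the environment lists but that is missing from the same file simply maps to an empty list here; a real tool may treat that case differently.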

Both of these group types may mention “variant”, which locks a group/environment to a given repository, or “arch”, which locks a package, group, or environment to a given architecture. This means some groups/packages, while they could exist in the repositories, are not available to the group/environment. In most cases, though, packages for an unmentioned architecture may not even exist in the repositories at all, which is by design.
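To make the “variant” and “arch” filtering more concrete, here is a small Python sketch of the idea. It is not how pungi is actually implemented. The sample XML, the package names, the helper `packages_for`, and the assumption that “arch” holds a comma- or space-separated list are all illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Made-up source comps fragment; package names are placeholders.
SAMPLE_COMPS = """<comps>
  <group>
    <id>example-group</id>
    <packagelist>
      <packagereq type="mandatory" variant="BaseOS">pkg-a</packagereq>
      <packagereq type="default" variant="AppStream">pkg-b</packagereq>
      <packagereq type="default" variant="BaseOS" arch="x86_64">pkg-c</packagereq>
    </packagelist>
  </group>
</comps>"""


def packages_for(comps_xml, variant, arch):
    """Hypothetical helper: list each group's packages for one variant and arch."""
    root = ET.fromstring(comps_xml)
    result = {}
    for group in root.iter("group"):
        names = []
        for req in group.iter("packagereq"):
            req_variant = req.get("variant")
            req_arch = req.get("arch")
            # An entry without the attribute is treated as unrestricted.
            if req_variant is not None and req_variant != variant:
                continue
            # Assumption: "arch" is a comma- or space-separated list.
            if req_arch is not None and arch not in re.split(r"[,\s]+", req_arch.strip()):
                continue
            names.append((req.text or "").strip())
        result[group.findtext("id")] = names
    return result


print(packages_for(SAMPLE_COMPS, "BaseOS", "x86_64"))
print(packages_for(SAMPLE_COMPS, "BaseOS", "aarch64"))
print(packages_for(SAMPLE_COMPS, "AppStream", "x86_64"))
```

In this sketch the same source file yields a different package list per repository and architecture, which would be one reason a script sees packages in the git comps that are absent from a given repo's comps.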

I would look at “core”, “Development Tools”, and “Legacy UNIX Compatibility” for examples.

“Server” is an environment that encompasses multiple groups. Some of those groups reference both BaseOS and AppStream. It stands to reason that BaseOS and AppStream will always be the default repositories that are enabled on a given system. BaseOS, being the only repository that can live by itself (which you shouldn’t do nor attempt anyway), is typically the starting authority, because it will contain most required packages. You can especially tell when you look at the “Standard” group that it references: a large number of its packages are in BaseOS and a few are in AppStream (clear examples are rsyslog, wget, and vim-enhanced). Compare the “Standard” groups in the repodata of BaseOS and AppStream, and you may see how they’re split.
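One way to do that comparison is sketched below in Python. The two XML strings are made-up stand-ins for the BaseOS and AppStream GROUPS.xml files (in practice you would load the downloaded files instead), the package names are placeholders, and `group_packages` and `compare_group` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Made-up stand-ins for the two GROUPS.xml files; package names are placeholders.
BASEOS_COMPS = """<comps>
  <group><id>standard</id><packagelist>
    <packagereq type="mandatory">pkg-base-1</packagereq>
    <packagereq type="default">pkg-base-2</packagereq>
  </packagelist></group>
</comps>"""

APPSTREAM_COMPS = """<comps>
  <group><id>standard</id><packagelist>
    <packagereq type="default">pkg-app-1</packagereq>
  </packagelist></group>
</comps>"""


def group_packages(comps_xml, group_id):
    """Return the set of package names listed for one group id."""
    root = ET.fromstring(comps_xml)
    for group in root.iter("group"):
        if group.findtext("id") == group_id:
            return {(req.text or "").strip() for req in group.iter("packagereq")}
    return set()  # group not present in this file


def compare_group(comps_a, comps_b, group_id):
    """Show how one group's packages are split across two comps files."""
    a = group_packages(comps_a, group_id)
    b = group_packages(comps_b, group_id)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "both": sorted(a & b),
        "union": sorted(a | b),
    }


print(compare_group(BASEOS_COMPS, APPSTREAM_COMPS, "standard"))
```

The "union" entry is the interesting part: with both repositories enabled, the group as a whole is the combination of what each repository's comps lists for it.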

Thank you @nazunalika, this is super helpful! So, is it fair to say that the BaseOS GROUPS.xml and the AppStream GROUPS.xml should be just as accurate as the comps from gitlab in terms of what gets installed?